Query Chronicles

Unicode characters to Bypass Security Checks

Sim4n6
June 15, 2023

You are reading Sim4n6's newsletter, a publication designed for ethical hackers. Each issue features a few selected vulnerability reports, providing the straight-to-the-point trick to adopt.

This edition is about using Unicode characters to bypass validation logic.

GoSecure's presentation [PDF] offers a valuable insight into some unusual Unicode vulnerabilities that could byͥte. It recommends that "if you need to do normalization, normalize prior to a security validation"… But why?

Because normalization applied after a security check may reintroduce characters that the check was meant to keep out.

For instance, compatibility normalization (NFKC or NFKD) maps the character ＠ (U+FF20) to the regular @ (U+0040). If a security check looks only for the regular character, an input holding the Unicode equivalent passes it. Normalizing that input afterwards turns ＠ into @ and restores its dangerous state.

The following comparison evaluates to True in Python:

import unicodedata

unicodedata.normalize("NFKC", '＠') == '@'  # True
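
To get a feel for how many characters behave like this, the sketch below scans the Basic Multilingual Plane for code points whose NFKC form is exactly a given ASCII character. The helper name nfkc_lookalikes and the BMP-only range are assumptions made for illustration, and the exact results depend on the Unicode version shipped with the Python interpreter.

```python
import unicodedata


def nfkc_lookalikes(target):
    """Return the non-ASCII BMP characters whose NFKC form equals `target`."""
    return [
        chr(cp)
        for cp in range(0x80, 0x10000)
        if unicodedata.normalize("NFKC", chr(cp)) == target
    ]


# Print the code points that collapse to a few security-sensitive characters.
for ascii_char in "@<>":
    found = nfkc_lookalikes(ascii_char)
    print(ascii_char, [f"U+{ord(ch):04X}" for ch in found])
```

Any filter that runs before normalization would have to know every one of these characters, which is why normalizing first is the simpler rule.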

Breaking URL parser

Take, for instance, a URL parser that separates the user:password part from the host part by looking for the regular character @ (U+0040).

Suppose the application is supposed to deny the host evil.com. A malicious URL may use a Unicode equivalent of @ instead, such as ＠ (U+FF20) or ﹫ (U+FE6B), giving for example https://＠evil.com. No regular @ character can be found, so the parser sees the host ＠evil.com, which does not match evil.com and is not denied. When the URL is normalized after the deny check, it gets back to its malicious state https://@evil.com, whose host is evil.com.
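
The scenario above can be sketched with a deliberately naive parser. Everything named here (naive_host, is_denied, is_denied_safe, DENIED_HOSTS) is hypothetical and only mirrors the described logic; it is not taken from any real URL library.

```python
import unicodedata

DENIED_HOSTS = {"evil.com"}  # hypothetical denylist


def naive_host(url):
    """Extract the host, treating only the ASCII '@' as the userinfo separator."""
    authority = url.split("://", 1)[1].split("/", 1)[0]
    return authority.rpartition("@")[2]


def is_denied(url):
    """Check the URL as received, without any normalization."""
    return naive_host(url) in DENIED_HOSTS


def is_denied_safe(url):
    """Normalize first, then run the same check."""
    return is_denied(unicodedata.normalize("NFKC", url))


crafted = "https://\uff20evil.com"  # fullwidth commercial at, U+FF20

# Check first, normalize later: the deny check is bypassed.
passed_check = not is_denied(crafted)
later_url = unicodedata.normalize("NFKC", crafted)

print(passed_check)             # True: host seen as '＠evil.com'
print(later_url)                # https://@evil.com
print(is_denied_safe(crafted))  # True: normalizing first catches it
```

The only difference between is_denied and is_denied_safe is the order of the two steps, which is the whole point of the recommendation.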

Bypassing rXSS escaping

Take for instance this Python Flask snippet:

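import unicodedata

from flask import Flask, render_template, request
# flask.escape was removed in Flask 3.0; markupsafe.escape is the same function.
from markupsafe import escape

app = Flask(__name__)


@app.route("/")
def escape_nd_normalize():
    # Deliberately vulnerable order: escape first, normalize afterwards.
    ui_escaped = escape(request.args.get("ui", ""))
    # NFKC turns lookalikes such as U+FE64 and U+FE65 back into < and >.
    norm_ui = unicodedata.normalize("NFKC", ui_escaped)
    return render_template("result.html", ui=norm_ui)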

with the templates/result.html template:

<!DOCTYPE html>
<html lang="en">
<head>
    <title>Results</title>
</head>
<body>
    <h1>Results</h1>
    <p>
        {{ ui | safe }}
    </p>
  
</body>
</html>

Run the development web server using:

FLASK_APP=snippet.py flask run --reload

Now, hit the server using the following payloads:

# 1. No rXSS triggered 
http://127.0.0.1:5000/?ui=%3Cimg%20src=x%20onerror=print()%3E

# 2. A rXSS is triggered 
http://127.0.0.1:5000/?ui=%EF%B9%A4img%20src=x%20onerror=print()%EF%B9%A5

With the first payload, the escape() function successfully escapes the regular characters < and >, making the payload benign.
With the second payload, the escape function considers the Unicode equivalent characters ﹤ (U+FE64) and ﹥ (U+FE65) harmless. Thus, no escaping happens.
But when the late Unicode normalization runs with the NFKC form, it converts ﹤ (U+FE64) and ﹥ (U+FE65) back to the regular < and >, and the rXSS is triggered.
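
The effect of the ordering can be reproduced without a web server. In this minimal sketch, html.escape from the standard library stands in for the escape() call of the snippet; that substitution is an assumption made to keep the example free of dependencies, and the variable names are illustrative.

```python
import html
import unicodedata

# Second payload, URL-decoded: U+FE64 and U+FE65 instead of < and >.
payload = "\ufe64img src=x onerror=print()\ufe65"

# Vulnerable order: escape sees nothing to escape, normalization restores < and >.
escaped_then_normalized = unicodedata.normalize("NFKC", html.escape(payload))

# Safer order: normalize first, so the escaping step sees the real < and >.
normalized_then_escaped = html.escape(unicodedata.normalize("NFKC", payload))

print(escaped_then_normalized)  # <img src=x onerror=print()>
print(normalized_then_escaped)  # &lt;img src=x onerror=print()&gt;
```

In the Flask view, the equivalent change would be to normalize the ui parameter before passing it to escape().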

Impact

Normalizing after validation may lead to:

Breaking the URL parser and leaking credentials. For instance, https://www.evil.c℀.ms.com would become https://www.evil.ca/c.ms.com, because ℀ (U+2100) normalizes to a/c (CVE-2019-0654).
Account takeover due to character collision (CVE-2019-19844).
Bypassing escaping mechanisms.
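
The first two impacts can be observed directly with unicodedata. This is only an illustrative sketch: the usernames are hypothetical, and the collision shown is a generic NFKC collision, not a reproduction of the mechanism behind CVE-2019-19844.

```python
import unicodedata

# Host splitting: U+2100 (℀) expands to "a/c", which injects a path separator.
host = "www.evil.c\u2100.ms.com"
normalized_host = unicodedata.normalize("NFKC", host)
print(normalized_host)  # www.evil.ca/c.ms.com

# Character collision: two distinct strings become equal after normalization.
registered = "admin"
lookalike = "\uff41dmin"  # starts with fullwidth 'a', U+FF41
collides = unicodedata.normalize("NFKC", lookalike) == registered
print(lookalike != registered, collides)  # True True
```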

References:

Host/Split - Exploitable Anti-patterns in Unicode Normalization - https://i.blackhat.com/USA-19/Thursday/us-19-Birch-HostSplit-Exploitable-Antipatterns-In-Unicode-Normalization.pdf
Unicode vulnerabilities that could byte you - https://gosecure.github.io/presentations/2021-02-unicode-owasp-toronto/philippe_arteau_owasp_unicode_v4.pdf