Unicode characters to Bypass Security Checks
You are reading Sim4n6's newsletter, a publication designed for ethical hackers. Each issue features a few selected vulnerability reports, providing the straight-to-the-point trick to adopt.
This edition is about using Unicode characters to bypass validation logic.
GoSecure's presentation [PDF] offers a valuable insight into some unusual Unicode vulnerabilities that could byͥte. It recommends that "if you need to do normalization, normalize prior to a security validation". But why?
Because normalizing after the check may reintroduce the very characters the check was meant to reject.
When Unicode normalization is applied, for instance, to the character U+FF20 (＠, fullwidth commercial at), the result is the regular U+0040 (@). If a security check looks only for the regular character, and normalization then runs on input holding the Unicode equivalent, the dangerous character reappears after the check has already passed.
The following comparison evaluates to True in Python:
import unicodedata
unicodedata.normalize("NFKC", "\uFF20") == "@"  # True: U+FF20 (fullwidth @) becomes U+0040
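The ordering problem can be sketched in a few lines. This is a minimal illustration rather than code from the presentation: the helper contains_at_sign() and the sample input are hypothetical names chosen for the example.

```python
import unicodedata


def contains_at_sign(text: str) -> bool:
    """Hypothetical security check: looks only for the regular @ (U+0040)."""
    return "@" in text


# Sample input using U+FF20 (fullwidth commercial at) instead of U+0040
raw = "user\uFF20evil.com"

# Anti-pattern: validate first, normalize afterwards
check_before = contains_at_sign(raw)          # the check sees no regular @
normalized = unicodedata.normalize("NFKC", raw)
check_after = contains_at_sign(normalized)    # the @ is back after the check

print(check_before, check_after, normalized)
```

Running the check on the normalized string instead of the raw one is what the "normalize prior to a security validation" advice amounts to.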
Breaking URL parser
Consider a URL parser that separates the user:password part from the host part by looking for the regular character @ (U+0040).
Suppose the application is supposed to deny the host evil.com. A malicious URL may instead carry the Unicode equivalent character ﹫ (U+FE6B), as in https://﹫evil.com. No regular @ character can be found, so the parser sees the host as ﹫evil.com, which does not match the denied host. If normalization runs after the deny check, the URL reverts to its malicious form https://@evil.com, whose host is evil.com.
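The scenario can be reproduced with a deliberately naive parser. Everything here is an assumption made for illustration: naive_host(), is_denied() and the BLOCKED_HOSTS denylist are hypothetical and do not come from any real library. A hand-written splitter is used because recent versions of Python's own urllib.parse reject netlocs that change in this way under NFKC normalization.

```python
import unicodedata

BLOCKED_HOSTS = {"evil.com"}  # hypothetical denylist


def naive_host(url: str) -> str:
    """Hypothetical parser: the host is whatever follows the last regular @."""
    authority = url.split("://", 1)[1].split("/", 1)[0]
    return authority.rsplit("@", 1)[-1]


def is_denied(url: str) -> bool:
    return naive_host(url) in BLOCKED_HOSTS


# U+FE6B (small commercial at) in place of the regular @
url = "https://\uFE6Bevil.com"

# Anti-pattern: deny check on the raw URL, normalization afterwards
denied_raw = is_denied(url)
normalized = unicodedata.normalize("NFKC", url)
host_after = naive_host(normalized)

# Safer order: normalize first, then run the deny check
denied_normalized = is_denied(unicodedata.normalize("NFKC", url))

print(denied_raw, host_after, denied_normalized)
```

With the check placed after normalization, the denylist sees the same host that the rest of the application will later use.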
Bypassing rXSS escaping
Take for instance this Python Flask snippet:
import unicodedata
from flask import Flask, request, render_template
from markupsafe import escape  # recent Flask releases no longer export escape

app = Flask(__name__)

@app.route("/")
def escape_nd_normalize():
    # Deliberately vulnerable order: escape first, normalize afterwards
    ui_escaped = escape(request.args.get('ui', ''))
    norm_ui = unicodedata.normalize("NFKC", ui_escaped)
    return render_template('result.html', ui=norm_ui)
with the result.html template:
<!DOCTYPE html>
<html lang="en">
<head>
<title>Results</title>
</head>
<body>
<h1>Results</h1>
<p>
{{ ui | safe }}
</p>
</body>
</html>
Run the development web server using:
FLASK_APP=snippet.py flask run --reload
Now, hit the server using the following payloads:
# 1. No rXSS triggered
http://127.0.0.1:5000/?ui=%3Cimg%20src=x%20onerror=print()%3E
# 2. A rXSS is triggered
http://127.0.0.1:5000/?ui=%EF%B9%A4img%20src=x%20onerror=print()%EF%B9%A5
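To see what the second payload actually carries, its query value can be percent-decoded with the standard library. This is only a quick inspection sketch, not part of the original snippet, and the variable names are illustrative.

```python
from urllib.parse import quote, unquote

encoded = "%EF%B9%A4img%20src=x%20onerror=print()%EF%B9%A5"
decoded = unquote(encoded)

# The first and last characters are small-form variants, not ASCII < and >
first, last = decoded[0], decoded[-1]
print(f"U+{ord(first):04X}", f"U+{ord(last):04X}")

# Re-encoding (keeping =, ( and ) literal) gives the same payload back
round_trip = quote(decoded, safe="=()")
```

The bytes EF B9 A4 and EF B9 A5 are the UTF-8 encodings of U+FE64 and U+FE65, so the server never receives a regular < or >.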
With the first payload, the escape() function successfully escapes the regular characters < and >, which makes the payload benign.
With the second payload, the escape function treats the Unicode equivalents ﹤ (U+FE64) and ﹥ (U+FE65) as harmless, so it leaves them unescaped.
But when the late Unicode normalization runs with the NFKC form, it converts ﹤ (U+FE64) and ﹥ (U+FE65) back to the regular < and >, which triggers the rXSS.
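The effect of the two orderings can be compared without running a web server. The sketch below assumes the standard library's html.escape as a stand-in for the escaping function used in the Flask snippet; the two helper names are hypothetical.

```python
import html
import unicodedata


def escape_then_normalize(ui: str) -> str:
    """Vulnerable order: the escaper never sees a regular < or >."""
    return unicodedata.normalize("NFKC", html.escape(ui))


def normalize_then_escape(ui: str) -> str:
    """Safer order: normalization runs before the security-relevant step."""
    return html.escape(unicodedata.normalize("NFKC", ui))


# Second payload from above: U+FE64 and U+FE65 instead of < and >
payload = "\uFE64img src=x onerror=print()\uFE65"

vulnerable = escape_then_normalize(payload)
safe = normalize_then_escape(payload)

print(vulnerable)
print(safe)
```

In the Flask view, the equivalent change would be to normalize the request argument first and escape the normalized value, or to drop the safe filter and let the template engine escape the output.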
Impact
A post-Unicode normalization may lead to:
Breaking the URL parser and leaking credentials. For instance, https://www.evil.c℀.ms.com would become https://www.evil.ca/c.ms.com (CVE-2019-0654).
Account takeover due to character collision (CVE-2019-19844).
Bypassing escaping mechanisms.
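The first two impacts can be observed directly in a Python shell. This is a hedged sketch: the email addresses are made-up examples, and the collision shown relies on Unicode case mapping, which is an assumption about the general class of bug rather than a reproduction of either CVE.

```python
import unicodedata

# Host splitting: U+2100 (ACCOUNT OF, ℀) expands to the three characters "a/c"
host = "www.evil.c\u2100.ms.com"
split_host = unicodedata.normalize("NFKC", host)
print(split_host)

# Character collision: two distinct strings become equal after a transformation.
# U+0131 (dotless i) uppercases to the regular "I".
registered = "mike@example.org"        # hypothetical victim address
lookalike = "m\u0131ke@example.org"    # hypothetical attacker address
distinct = registered != lookalike
collide = registered.upper() == lookalike.upper()
print(distinct, collide)
```

In both cases the value that passes the check differs from the value the application ends up acting on, which is the common thread in this edition.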
References:
Host/Split - Exploitable Anti-patterns in Unicode Normalization - https://i.blackhat.com/USA-19/Thursday/us-19-Birch-HostSplit-Exploitable-Antipatterns-In-Unicode-Normalization.pdf
Unicode vulnerabilities that could byte you - https://gosecure.github.io/presentations/2021-02-unicode-owasp-toronto/philippe_arteau_owasp_unicode_v4.pdf