Unicode Characters to Bypass Security Checks

You are reading Sim4n6's newsletter, a publication designed for ethical hackers. Each issue features a few selected vulnerability reports, providing the straight-to-the-point trick to adopt.

This edition is about using Unicode normalization to bypass validation logic.

GoSecure's presentation [PDF] offers valuable insight into some unusual Unicode vulnerabilities that could byͥte. It recommends: "if you need to do normalization, normalize prior to a security validation". But why?

Because normalizing after the check may reintroduce the very characters the check was meant to reject.

When Unicode compatibility normalization (NFKC) is applied to, for instance, the character U+FF20 (＠, FULLWIDTH COMMERCIAL AT), the result is the regular U+0040 (@). If a security check only looks for the regular character, and the input holding the Unicode equivalent is normalized after that check, normalization brings the dangerous character back.

The following comparison evaluates to True in Python:

import unicodedata
unicodedata.normalize("NFKC", "\uff20") == "@"  # True: U+FF20 becomes U+0040
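To check which of the characters used in this edition fold back to plain ASCII, a small sketch along these lines can be run. It relies only on Python's standard unicodedata module; the SAMPLES list and the nfkc_table() helper are illustrative names, and the list simply collects the characters discussed here, including U+2100 (℀) from the URL example at the end.

```python
import unicodedata

# Characters discussed in this edition (illustrative selection).
SAMPLES = ["\uff20", "\ufe6b", "\ufe64", "\ufe65", "\u2100"]


def nfkc_table(chars):
    """Return (code point, Unicode name, NFKC result) for each character."""
    return [
        (f"U+{ord(c):04X}", unicodedata.name(c), unicodedata.normalize("NFKC", c))
        for c in chars
    ]


for code_point, name, folded in nfkc_table(SAMPLES):
    print(f"{code_point}  {name:<28} -> {folded!r}")
```

Every row maps a visually similar character onto a syntactically meaningful ASCII one (@, <, > or the a/c sequence containing a slash), which is exactly what a late normalization step can smuggle past a check.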

Breaking the URL parser

For instance, a URL parser separates the user:password part from the host part by looking for the regular character @ (U+0040).

Suppose a security check must deny the host evil.com. A malicious URL may instead use the Unicode equivalent character ﹫ (U+FE6B, SMALL COMMERCIAL AT), as in https://﹫evil.com. Since no regular @ character can be found, the parser treats ﹫evil.com as the host, which does not match evil.com, so the URL is not denied. When the URL is normalized after the check, it returns to its malicious form https://@evil.com, whose host is evil.com.
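The check-then-normalize ordering can be reproduced with a toy example. Everything here is hypothetical: BLOCKED_HOSTS, naive_host() and the two wrapper functions are made up for the sketch, and naive_host() is a deliberately simplified stand-in for a real URL parser. (Recent Python versions of urllib.parse.urlsplit() raise ValueError for a netloc containing characters that turn into @, /, ?, # or : under NFKC, so it would not reproduce the bug.)

```python
import unicodedata
from typing import Optional

BLOCKED_HOSTS = {"evil.com"}  # hypothetical deny-list


def naive_host(url: str) -> str:
    """Hypothetical, simplified host extraction.

    Drops the scheme, cuts the authority at the first '/', then removes any
    'user:password@' prefix by splitting on the last regular '@' (U+0040).
    Ports are not stripped.
    """
    authority = url.split("://", 1)[-1].split("/", 1)[0]
    return authority.rpartition("@")[2]


def is_allowed(url: str) -> bool:
    return naive_host(url) not in BLOCKED_HOSTS


def check_then_normalize(url: str) -> Optional[str]:
    # Vulnerable order: validate the raw input, normalize afterwards.
    if not is_allowed(url):
        return None
    return unicodedata.normalize("NFKC", url)


def normalize_then_check(url: str) -> Optional[str]:
    # Safer order: normalize first, then validate the value that is used.
    normalized = unicodedata.normalize("NFKC", url)
    if not is_allowed(normalized):
        return None
    return normalized


payload = "https://\ufe6bevil.com"  # U+FE6B SMALL COMMERCIAL AT
print(check_then_normalize(payload))  # https://@evil.com (slipped past the check)
print(normalize_then_check(payload))  # None (the deny-list now sees evil.com)
```

Normalizing first means the deny-list judges the same string the application later uses, which is the point of GoSecure's recommendation.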

Bypassing escaping to trigger a reflected XSS (rXSS)

Take, for instance, this Python Flask snippet, which escapes the user input and only then normalizes it:

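import unicodedata
from flask import Flask, request, render_template
from markupsafe import escape  # flask.escape is deprecated in recent Flask releases

app = Flask(__name__)

@app.route("/")
def escape_nd_normalize():
    # 1. Escape the user input: < and > become &lt; and &gt;
    ui_escaped = escape(request.args.get('ui', ''))
    # 2. Normalize AFTER escaping: NFKC maps U+FE64/U+FE65 back to < and >
    norm_ui = unicodedata.normalize("NFKC", ui_escaped)
    return render_template('result.html', ui=norm_ui)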

and the following result.html template, placed in Flask's default templates/ folder:

<!DOCTYPE html>
<html lang="en">
  <body>
    {{ ui | safe }}  {# | safe disables Jinja autoescaping for this value #}
  </body>
</html>

Save the Python code as snippet.py and run the development web server with:

FLASK_APP=snippet.py flask run --reload

Now, send the following payloads in the ui query parameter:

# 1. Regular < and > characters: no rXSS triggered

# 2. Unicode small forms ﹤ (U+FE64) and ﹥ (U+FE65): an rXSS is triggered
  1. With the first payload, the escape() function (from markupsafe, which earlier Flask versions re-export as flask.escape) successfully escapes the regular characters < and >, making the payload benign.

  2. With the second payload, escape() treats the Unicode equivalents ﹤ (U+FE64) and ﹥ (U+FE65) as harmless, so nothing is escaped.

    However, the late Unicode normalization with the NFKC form converts ﹤ (U+FE64) and ﹥ (U+FE65) back to the regular < and >, and the reflected XSS is triggered.
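The two orderings can be compared without running a server. This sketch swaps in the standard library's html.escape() as a stand-in for markupsafe's escape() (both turn < and > into &lt; and &gt;), and uses two hypothetical payloads: one with regular < and >, one with U+FE64/U+FE65.

```python
import html
import unicodedata

# html.escape() is a stdlib stand-in for markupsafe's escape() here;
# both turn < and > into &lt; and &gt;.


def escape_then_normalize(user_input: str) -> str:
    # Vulnerable order, as in the Flask snippet: escape first, normalize last.
    return unicodedata.normalize("NFKC", html.escape(user_input))


def normalize_then_escape(user_input: str) -> str:
    # Safer order: normalize first, so the escaper sees the final characters.
    return html.escape(unicodedata.normalize("NFKC", user_input))


regular = "<script>alert(1)</script>"
small_forms = "\ufe64script\ufe65alert(1)\ufe64/script\ufe65"  # U+FE64 / U+FE65

print(escape_then_normalize(regular))      # &lt;script&gt;alert(1)&lt;/script&gt;
print(escape_then_normalize(small_forms))  # <script>alert(1)</script>
print(normalize_then_escape(small_forms))  # &lt;script&gt;alert(1)&lt;/script&gt;
```

Only the second payload survives escaping and turns into live markup, and only when normalization runs last.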


In short, normalizing after a security check may lead to:

  • Breaking the URL parser and credential leakage, for instance:
    https://www.evil.c℀.ms.com would become https://www.evil.ca/c.ms.com, because ℀ (U+2100) normalizes to a/c (CVE-2019-0654).

  • Account takeover due to character collision (CVE-2019-19844).

  • Bypassing escaping mechanisms.
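A character collision can be illustrated with a hypothetical password-reset lookup. This is not the code behind the CVE above: the USERS store, canonical() and the two reset helpers are invented for the sketch. The crafted address starts with U+212A KELVIN SIGN, which NFKC maps to a plain K, so after lower-casing it collides with an existing account; the vulnerable helper then returns the attacker-supplied address as the reset recipient instead of the stored one.

```python
import unicodedata
from typing import Optional, Tuple

# Hypothetical user store: e-mail address on record -> username.
USERS = {"kate@example.org": "kate"}


def canonical(email: str) -> str:
    # Hypothetical matching rule: NFKC-normalize, then lower-case.
    return unicodedata.normalize("NFKC", email).lower()


def find_account(submitted: str) -> Optional[Tuple[str, str]]:
    """Return (username, stored_email) whose canonical form matches, else None."""
    key = canonical(submitted)
    for stored_email, username in USERS.items():
        if canonical(stored_email) == key:
            return username, stored_email
    return None


def reset_recipient_vulnerable(submitted: str) -> Optional[str]:
    # Sends the reset link to whatever address the requester typed.
    return submitted if find_account(submitted) else None


def reset_recipient_safer(submitted: str) -> Optional[str]:
    # Sends the reset link only to the address stored for the matched account.
    match = find_account(submitted)
    return match[1] if match else None


crafted = "\u212aate@example.org"  # first character is U+212A KELVIN SIGN
print(crafted == "kate@example.org")        # False: different strings
print(reset_recipient_vulnerable(crafted))  # the crafted address, not the one on record
print(reset_recipient_safer(crafted))       # kate@example.org
```

Matching on a normalized form is fine as long as every security-relevant action (here, where the reset link goes) uses the stored value rather than the submitted one.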