What is Unicode Encoding
Unicode is a standard for representing text and symbols from different writing systems in computers and digital environments. It assigns a unique number, called a code point, to every character or symbol, regardless of platform, program, or language.
HostSplit: Exploitable Antipatterns in Unicode Normalization refers to a class of vulnerabilities and design flaws caused by improper handling or normalization of Unicode strings. These issues make different components parse the same host, URL, or domain name inconsistently, which lets attackers bypass security checks or misroute requests.
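As a rough illustration of the idea, the sketch below scans a string for non-ASCII characters whose normalized form contains an ASCII character that URL parsers treat as syntax. It assumes NFKC as the normalization form, and the names `SYNTAX_CHARS` and `risky_chars` are hypothetical helpers made up for this example.

```python
import unicodedata

# Assumption: the ASCII characters a URL parser treats as delimiters.
SYNTAX_CHARS = set("/?#@:\\")


def risky_chars(text):
    """List non-ASCII characters whose NFKC form contains a delimiter.

    Returns (code point, normalized form) pairs in order of appearance.
    """
    found = []
    for ch in text:
        if ord(ch) < 128:
            continue  # plain ASCII is handled by the parser as usual
        normalized = unicodedata.normalize("NFKC", ch)
        if SYNTAX_CHARS & set(normalized):
            found.append((f"U+{ord(ch):04X}", normalized))
    return found


print(risky_chars("evil.c\u2100"))  # [('U+2100', 'a/c')]
print(risky_chars("example.com"))   # []
```

A check like this could run on a host name before it is normalized, so that a hidden delimiter is rejected instead of being silently introduced later.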
Unicode characters that normalize to ASCII characters with syntactic significance
Exploitation (DOMAIN)
# Domain
℀ (U+2100) --- Normalise to ---> a/c
https://evil.c℀ ---> https://evil.ca/c
https://a.com／b.com ---> https://a.com/b.com (／ is U+FF0F)
If the logs show a request to *.b.com, the target is not vulnerable; if they show a request to a.com/b..., it is vulnerable.
Can be used for obfuscation, phishing, redirection...
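The domain cases above can be reproduced with a small sketch that extracts the host from a URL before and after normalization. It assumes NFKC normalization and uses a deliberately naive host extractor; `naive_host` and `host_before_and_after` are hypothetical names, not part of any library.

```python
import unicodedata


def naive_host(url):
    """Return the text between '://' and the first '/', '?' or '#'."""
    rest = url.split("://", 1)[1]
    for sep in "/?#":
        rest = rest.split(sep, 1)[0]
    return rest


def host_before_and_after(url):
    """Return the host seen before and after NFKC normalization."""
    before = naive_host(url)
    after = naive_host(unicodedata.normalize("NFKC", url))
    return before, after


# U+2100 normalizes to "a/c", so the host ends earlier after normalization.
print(host_before_and_after("https://evil.c\u2100"))
# ('evil.c℀', 'evil.ca')

# U+FF0F normalizes to "/", which turns "b.com" into a path segment.
print(host_before_and_after("https://a.com\uFF0Fb.com"))
# ('a.com／b.com', 'a.com')
```

When the two values differ, a check performed on the first host does not describe the host that is actually contacted.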
Exploitation (PATH)
# Path
- Path Traversal
Code Page   Unicode   Character   Hexadecimal Equivalent
--------------------------------------------------------
874         U+FF0F    ／          0x002F (/)
874         U+FF3C    ＼          0x005C (\)
932         U+00A5    ¥           0x005C (\)
949         U+20A9    ₩           0x005C (\)
1250        U+2044    ⁄           0x002F (/)
1250        U+2215    ∕           0x002F (/)
1250        U+2216    ∖           0x005C (\)
1250        U+FF0F    ／          0x002F (/)
1250        U+FF3C    ＼          0x005C (\)
1251        U+FF0F    ／          0x002F (/)
1251        U+FF3C    ＼          0x005C (\)
1252        U+2044    ⁄           0x002F (/)
...         ...       ...         ...
More details: https://worst.fit/mapping/#ANSI:0x5C,0x2F
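To make the path traversal risk concrete, the sketch below simulates the code page conversion with a lookup table copied from a few rows of the table above. It does not call any real conversion API: `BEST_FIT`, `simulate_best_fit` and `is_traversal` are hypothetical names, and the mapping covers only the listed rows.

```python
# Assumption: a hand-copied subset of the table above,
# keyed by (code page, character) -> ASCII character it maps to.
BEST_FIT = {
    (874, "\uFF0F"): "/",
    (874, "\uFF3C"): "\\",
    (932, "\u00A5"): "\\",
    (949, "\u20A9"): "\\",
    (1250, "\u2044"): "/",
    (1250, "\u2215"): "/",
    (1250, "\u2216"): "\\",
    (1252, "\u2044"): "/",
}


def simulate_best_fit(text, code_page):
    """Replace characters listed for this code page; keep the rest."""
    return "".join(BEST_FIT.get((code_page, ch), ch) for ch in text)


def is_traversal(path):
    """Naive filter: does any path segment equal '..'?"""
    return ".." in path.replace("\\", "/").split("/")


payload = "..\u00A5..\u00A5secret.txt"  # uses U+00A5 instead of a backslash
converted = simulate_best_fit(payload, 932)

print(is_traversal(payload))    # False: the filter sees no separator
print(is_traversal(converted))  # True: the separators appear after conversion
```

The point of the sketch is the ordering: a filter that runs before the conversion sees no separators, while the code that opens the file after the conversion does.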
Exploitation (EMAILS)
# Email
Exploitation (HEADERS)
- More details soon...