(Misusing) Python Unicode Normalisation
Message-Id: https://www.5snb.club/posts/2020/python-unicode-normalisation/
Tags: #hack
Since PEP 3131, Python normalises identifiers (to Unicode normal form NFKC) in order to support non-ASCII identifiers.
That means that if you write 𝚠 = 50, where that character is U+1D6A0 MATHEMATICAL MONOSPACE SMALL W, you can later refer to that variable as w (or, indeed, as anything else that normalises into w).
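This is easy to check with a minimal sketch that uses only the standard library. `unicodedata.normalize` with the `"NFKC"` form folds U+1D6A0 into a plain `w`. Code run through `exec` can bind a name with one spelling and read it back with another. String contents are not normalised, so the two characters still compare unequal as strings.

```python
import unicodedata

fancy_w = "\U0001D6A0"  # MATHEMATICAL MONOSPACE SMALL W

# NFKC folds the mathematical letter down to plain ASCII "w".
print(unicodedata.name(fancy_w))                 # MATHEMATICAL MONOSPACE SMALL W
print(unicodedata.normalize("NFKC", fancy_w))    # w

# The parser normalises identifiers, so both spellings name the same variable.
namespace = {}
exec(fancy_w + " = 50\nresult = w + 1", namespace)
print(namespace["result"])                       # 51
print("w" in namespace)                          # True: stored under the normalised name

# String literals are not normalised, so the strings themselves differ.
print(fancy_w == "w")                            # False
```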
So I wrote a program that randomly replaces every character in some code with a character that normalises into it, while trying not to break the program.
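The core of the idea is a reverse lookup table from each character to every code point whose NFKC form is that character. Below is a readable sketch of that idea. It is not the author's source. The names `build_preimages` and `disguise` are hypothetical. The sketch assumes that only letters of category `Ll` or `Lu`, plus the underscore, are worth considering as replacements.

```python
import random
import sys
import unicodedata
from collections import defaultdict


def build_preimages():
    """Map each NFKC-normalised string to the characters that normalise into it.

    Assumption: only letters (categories Ll and Lu) and the underscore are
    collected, since those can safely appear inside identifiers.
    """
    preimages = defaultdict(list)
    for codepoint in range(sys.maxunicode + 1):
        ch = chr(codepoint)
        if unicodedata.category(ch) in ("Ll", "Lu") or ch == "_":
            preimages[unicodedata.normalize("NFKC", ch)].append(ch)
    return preimages


def disguise(identifier, preimages, rng=random):
    """Replace each character of an ASCII identifier with a random look-alike."""
    out = []
    for ch in identifier:
        # Fall back to the character itself if nothing else normalises into it.
        candidates = preimages.get(ch) or [ch]
        out.append(rng.choice(candidates))
    return "".join(out)


table = build_preimages()
fancy = disguise("normcache", table)
print(fancy)
print(unicodedata.normalize("NFKC", fancy) == "normcache")  # True
```

Scanning all 0x110000 code points takes a moment, which is why the table is built once up front and reused for every character.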
This post was inspired by https://codegolf.stackexchange.com/a/207567.
Any correct code to do this would need to parse the source to avoid doing the replacement for non-identifiers (which are not normalised). Instead, I just included a list of characters not to modify, and tried to cut down on the keywords I use, avoiding ones like import, raise, with and else, since every letter of a keyword I do use has to go on that list.
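The parsing approach mentioned above can be sketched with the standard `tokenize` module. Only `NAME` tokens that are not keywords or soft keywords get rewritten, so string and numeric literals are left untouched. This is a hypothetical sketch: the `disguise_source` helper and the small `lookalikes` table are illustrative and not taken from the original program. It assumes Python 3.9+ for `keyword.issoftkeyword`, and it assumes every character in the table is valid inside an identifier.

```python
import io
import keyword
import random
import tokenize
import unicodedata


def disguise_source(source, lookalikes, rng=random):
    """Rewrite characters inside identifiers only, leaving everything else intact.

    `lookalikes` maps a character to a list of characters whose NFKC form is
    that character (assumed to be valid identifier characters).
    """
    # Split lines exactly the way tokenize's readline will see them.
    lines = [list(line) for line in io.StringIO(source).readlines()]
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.NAME:
            continue  # strings, numbers, operators and comments stay as they are
        if keyword.iskeyword(tok.string) or keyword.issoftkeyword(tok.string):
            continue  # keep keywords such as `for` and `in` in plain ASCII
        (row, start), (_, end) = tok.start, tok.end  # NAME tokens never span lines
        line = lines[row - 1]
        for col in range(start, end):
            line[col] = rng.choice(lookalikes.get(line[col], [line[col]]))
    return "".join("".join(line) for line in lines)


# Hypothetical look-alike table: each plain letter plus its MATHEMATICAL BOLD
# SMALL counterpart (U+1D41A to U+1D433, a contiguous run).
lookalikes = {
    letter: [letter, chr(0x1D41A + i)]
    for i, letter in enumerate("abcdefghijklmnopqrstuvwxyz")
}

source = 'total = 0\nfor item in [1, 2, 3]:\n    total += item\nlabel = "total"\n'
disguised = disguise_source(source, lookalikes)
print(disguised)

namespace = {}
exec(disguised, namespace)
print(namespace["total"], namespace["label"])               # 6 total
print(unicodedata.normalize("NFKC", disguised) == source)   # True
```

Working from token positions and replacing one code point with exactly one code point keeps every column offset valid. That is why the sketch edits the original lines in place rather than rebuilding the source with `tokenize.untokenize`.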
Below is the program (transformed, of course). I’m also providing the pure-ASCII source here.
The program takes the input file as the first argument, and the output file as the second argument.
Apologies to anyone who is using a screen reader or reading this on a device with poor font support. The plain ASCII source linked above will be far more readable.
syss = int.to_bytes(7567731, 3, int.fro𝙢_𝗯ytes.__𝗱oc__[385:388])
S = __i𝖒port__(syss.𝘥eco𝓭e())
U = __i𝙢port__(𝑏ytes.𝓭eco𝚍e.__𝒹oc__[271:279].𝒍ower() + "ata")
io = __import__(open.__𝙢o𝒅𝘂𝙡e__)
ran𝔡o𝓂 = __import__(io.B𝙪ffere𝖽Rando𝖒.__name__[8:].𝚕o𝘸er())
ections = U.𝕓idirectiona𝚕.__na𝘮e__[5:-2]
C = __i𝗆port__(compi𝗅e.__na𝑚e__[:2] + co𝑚pi𝗅e.__na𝚖e__[5]*2 + ections + "s")
nor𝕞cac𝖍e = C.𝑑efa𝘶𝓵t𝒅ict(𝗅ist)
nf𝓴c=U.nor𝗆a𝚕ize.__𝑑oc__[96:100]
𝗅 = C.__na𝕞e__[2]
𝔲 = Unico𝘥eDeco𝘥eError.__name__.𝐥o𝘄er()
L𝘭 = (𝑙*2).tit𝘭e()
L𝘂 = (𝘭+u).tit𝓵e()
for _ in ran𝒈e(0, 0x110000):
    try:
        if U.cate𝒈ory(c𝘩r(_)) in [Ll, Lu] or cℎr(_) in "_":
            nor𝓂a𝚕ise𝒅 = U.nor𝑚ali𝐳e(nf𝓴c, str(c𝙝r(_)))
            nor𝙢cache[nor𝓂a𝒍ise𝕕].append(c𝗵r(_))
    except Unico𝑑eDecoⅆeError: pass
f = open(S.arg𝐯[1])
𝖜 = U.east_asian_𝕨i𝒹t𝙝.__na𝗺e__[-5]
of = open(S.ar𝒈𝓋[2], 𝐰)
i = f.rea𝐝()
ie = In𝑑exError.𝑤it𝙝_trace𝗯ac𝘬.__doc__[:6].𝗹o𝘄er()
c = "afryso" + int.__na𝐦e__ + ie
for cℎ in i:
    try:
        if c𝗵 not in c:
            try:
                s = ran𝖉o𝘮.c𝗵oice(nor𝖒cac𝔥e[str(c𝘩)])
                assert U.nor𝕞alize(nf𝚔c, s) == c𝐡
                of.𝔴rite(s)
            except IndexError:
                of.𝒘rite(c𝖍)
            [][0]
        of.𝔀rite(c𝙝)
    except: pass
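For readers who would rather not squint at the listing, here is a plain-ASCII sketch of how a few of its building blocks evaluate. It is an illustration, not the author's ASCII source. The `__doc__` slices are left out because they depend on the exact docstrings of the Python version in use. The helper `write_without_else` is hypothetical: it mimics the `[][0]` trick that stands in for `else`, and it catches `IndexError` explicitly where the listing uses a bare `except:`.

```python
import collections
import io
import unicodedata

# 7567731 == 0x737973; its three bytes are the ASCII codes for "sys".
sys_name = int.to_bytes(7567731, 3, "big").decode()
print(sys_name)                                             # sys

# "BufferedRandom"[8:] -> "Random" -> "random"
random_name = io.BufferedRandom.__name__[8:].lower()
print(random_name)                                          # random

# "co" + "ll" + "ection" + "s" -> "collections"
ections = unicodedata.bidirectional.__name__[5:-2]
collections_name = compile.__name__[:2] + compile.__name__[5] * 2 + ections + "s"
print(collections_name)                                     # collections

# Single letters are carved out of other names.
w = unicodedata.east_asian_width.__name__[-5]
l = collections.__name__[2]
print(w, l, (l * 2).title())                                # w l Ll


def write_without_else(ch, keep, out):
    """Raise on purpose so the trailing append is skipped, instead of using else."""
    try:
        if ch not in keep:
            out.append("replaced " + ch)
            [][0]  # IndexError jumps straight to the except clause below
        out.append(ch)  # only reached when ch is in `keep`
    except IndexError:
        pass


out = []
write_without_else("w", "afrysointexcept", out)
write_without_else("a", "afrysointexcept", out)
print(out)                                                  # ['replaced w', 'a']
```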