Detecting Malicious Unicode in Source Code and Pull Requests

Debian Bug report:


Debian lintian test unicode-trojan:
https://lintian.debian.org/tags/unicode-trojan



Bug report was rejected.

  • To keep things simple, all avoidable Unicode has been removed from the derivative-maker / Kicksecure / Whonix source code.
  • Before building Kicksecure / Whonix packages, and before building Kicksecure / Non-Qubes-Whonix VM images, the source code of derivative-maker and the source code in its /packages subfolder is now scanned for unexpected Unicode.
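
As an illustration of the pre-build scan described above, here is a minimal Python sketch. It is not the actual dm-check-unicode implementation; the function names `find_non_ascii` and `main` are made up for this example, and it assumes files are UTF-8 text. Undecodable bytes are replaced with U+FFFD and are therefore reported too.

```python
import os
import unicodedata


def find_non_ascii(root):
    """Yield (path, line_number, column, char) for each non-ASCII character under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Assumption: treat every file as UTF-8 text; bad bytes become U+FFFD.
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for line_number, line in enumerate(handle, start=1):
                        for column, char in enumerate(line, start=1):
                            if ord(char) > 127:
                                yield path, line_number, column, char
            except OSError:
                # Unreadable files are skipped in this sketch; a stricter check could fail here.
                continue


def main(root):
    """Print one line per finding and return 1 if anything was found, else 0."""
    findings = list(find_non_ascii(root))
    for path, line_number, column, char in findings:
        name = unicodedata.name(char, "UNKNOWN")
        print(f"{path}:{line_number}:{column}: U+{ord(char):04X} {name}")
    return 1 if findings else 0
```

A build script could call `main("path/to/source")` and abort when it returns a non-zero value. The sketch prints nothing when the tree is clean, so only real findings show up in the build log.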

Implementation:

The above is not a full solution or workaround for:

  • all the other projects on the internet (almost all of them) that would have to audit their existing source code for malicious Unicode and prevent malicious Unicode from being included in the future,
  • any of the other issues raised on https://trojansource.codes/ such as fixing compilers or text editors.

Alpinelinux:

NixOS:


Thank you. Outreach on this issue is certainly helpful.

Best to include the link to the original attack research:

It is already mentioned as a reference in Michael Altfield's article.

Patrick via Whonix Forum:

Didn’t try yet, interesting:

https://github.com/haveyoudebuggedit/trojansourcedetector


Gentoo:

https://bugs.gentoo.org/862372

Mint OS:


In an LKRG source code file, a comment includes a real name that contains this character: ł
It is non-malicious.
It triggers the dm-check-unicode check.
Therefore the files where this happens are excluded from the check.
This is clearly a non-ideal solution, but fixing this is an issue for the whole Free and Open Source community. See also Detecting Malicious Unicode in Source Code and Pull Requests

--exclude=LICENSE
--exclude=lkrg-openrc.sh
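
To make the effect of these exclusions concrete, here is a small Python sketch. It assumes the options behave like grep-style `--exclude` globs matched against a file's base name during a recursive scan; `EXCLUDE_GLOBS` and `is_excluded` are hypothetical names for this example, not part of dm-check-unicode.

```python
import fnmatch
import os

# Hypothetical exclusion list mirroring the two --exclude options above.
EXCLUDE_GLOBS = ["LICENSE", "lkrg-openrc.sh"]


def is_excluded(path, globs=EXCLUDE_GLOBS):
    """Return True when the file's base name matches any exclusion glob."""
    base = os.path.basename(path)
    return any(fnmatch.fnmatch(base, pattern) for pattern in globs)
```

Note the trade-off: an excluded file is skipped entirely, so any other unexpected Unicode in it goes unchecked as well.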

Could you review this please? @grass

First thing: I don't know Perl very well, but I can understand it. I tried to make grep print, but it wasn't working, so Perl seems better for this. Besides, grep's -P option stands for Perl, so we were already using it.

I used the tool to scan the files in the nickboucher/trojan-source repository on GitHub (Trojan Source: Invisible Vulnerabilities), especially the Bash directory. The GitHub web interface does not show all of the Unicode; you have to use a local editor or paste into a functional online viewer such as Bidi Viewer, which is made by the same person.

Another point is the pattern:

SEARCH_PATTERN='[^[:ascii:]]|[\x{061C}\x{200E}\x{200F}\x{202A}\x{202B}\x{202C}\x{202D}\x{202E}\x{2066}\x{2067}\x{2068}\x{2069}]'

I don't see the need for the second part, everything after the pipe |, because the negated ASCII class already matches every character listed there.

On this sample, using only [^[:ascii:]] detected all the problems. I also diffed the results for the whole directory using the full pattern and using only the non-ASCII part, and they were the same.
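
The redundancy can also be checked directly. The Python sketch below assumes the input is decoded as Unicode characters rather than raw bytes. Python's `re` module has no `[:ascii:]` class, so the range `[^\x00-\x7F]` stands in for `[^[:ascii:]]`; the names `BIDI_CODE_POINTS`, `NON_ASCII`, and `covered_by_non_ascii` are made up for this example.

```python
import re

# The twelve bidirectional control code points from the second half of SEARCH_PATTERN.
BIDI_CODE_POINTS = [
    0x061C, 0x200E, 0x200F, 0x202A, 0x202B, 0x202C,
    0x202D, 0x202E, 0x2066, 0x2067, 0x2068, 0x2069,
]

# Stand-in for [^[:ascii:]]: any character outside U+0000..U+007F.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def covered_by_non_ascii(code_points):
    """Return True if every given code point is matched by the non-ASCII class alone."""
    return all(NON_ASCII.search(chr(cp)) for cp in code_points)
```

Every listed code point is above U+007F, so `covered_by_non_ascii(BIDI_CODE_POINTS)` returns True. The explicit list therefore adds nothing to the matching here, although it could still serve as documentation of which characters matter most.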

One thing I don't like is printing "No spurious characters found", because it gets in the way of the really important part: whether any spurious characters were found. What do you think?


Yes.


GUYS thank you, this is fire
