Finding Backdoors in Freedom Software vs Non-Freedom Software

new:

Table: Finding Backdoors in Freedom Software vs Non-Freedom Software

Template:Backdoors - Whonix


This table is also incredibly misleading. It’s no more difficult to discover backdoors in proprietary software than it is in open source software. Even if you were to discover a vulnerability, there would be no way to determine if it was intentional or not. I’d recommend reading https://web.archive.org/web/20210112132451/https://blog.blueboxsec.org/post/the-illusion-of-open-source/

Also see:

Reverse engineering is not “exponentially more difficult”; it’s just different. There are many people who are highly experienced at it.

There’s also no guarantee that open source software hasn’t been obfuscated. A project can easily insert misleading variable/function names, useless/spaghetti code, deceptive comments, etc. to throw you off. It’s significantly more thorough to reverse engineer / decompile a program where all those things would be stripped and you’re left with the exact untouched code of the program.
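To make that concrete, here is a contrived C sketch of such source-level deception (the names and the “helpful” comment are hypothetical, not taken from any real project):

#include <stdio.h>
#include <string.h>

/* Deceptive comment: claims to "validate" the token, while the
   misleadingly named helper actually copies it into a fixed-size
   buffer with no bounds check. */
static int validate_token(const char *token)
{
    char checked[16];
    strcpy(checked, token); /* overflows for tokens of 16+ bytes */
    return checked[0] != '\0';
}

int main(void)
{
    printf("valid: %d\n", validate_token("AAAA")); /* looks harmless in review */
    return 0;
}

A reviewer skimming the names and comments would see nothing wrong; only reading the actual calls reveals the overflow.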

Likewise, there is no guarantee that proprietary programs are implementing obfuscation either.

A relevant excerpt from the link above:

Proprietary software is not an unauditable black box. It can be extensively audited and regularly is. Closed source projects can still be reverse engineered, fuzzed and so on. Reverse engineering a program actually allows you to analyze how it works far more thoroughly than simply reviewing the published source code can. You’re seeing exactly how the compiler configures things and how everything works at a deep level. It’s for this reason why many programs that are open source are still reverse engineered anyway to audit them. Reverse engineering also isn’t extremely difficult; it may seem daunting at first but once you get used to it, assembly can seem just like another programming language. Of course, obfuscation at the binary level is totally different however.

Finally, if you couldn’t modify proprietary software, then piracy, game modding, exploits, etc. wouldn’t be a thing and DRM would be useless.

Both proprietary and open source software can be thoroughly audited. The techniques in which to do so are simply different and most likely, you’re not going to uncover a legitimate backdoor either way. You always have to 100% trust the software you execute, regardless of the source model. If you don’t trust it, then that’s just it. There’s no way around it.

Please read what actual security researchers say on this topic.

I already agree with that on the template Template:Backdoors - Whonix. It’s called a bugdoor there.

If you believe that to be true… then why share a link titled Open source: Almost one in five bugs are planted for malicious purposes? How would they know?

Reading that article didn’t make me any wiser.

So I’ve followed the article to the source.
Octoverse 2023: The state of open source | The State of the Octoverse

Downloaded the PDF.

Most software vulnerabilities are mistakes, not malicious attacks. Analysis on a random sample of 521 advisories from across our six ecosystems found that 17% of the advisories were related to explicitly malicious behavior such as backdoor attempts. These malicious vulnerabilities were generally in seldom-used packages, but triggered just 0.2% of alerts. While malicious attacks are more likely to get attention in security circles, most vulnerabilities are caused by mistakes.

Analysis on a random sample of 521 advisories from across our six ecosystems finds that 17% of the advisories are related to explicitly malicious behavior such as backdoor attempts. Of those 17%, the vast majority come from the npm ecosystem. While 17% of malicious attacks will steal the spotlight in security circles, vulnerabilities introduced by mistake can be just as disruptive and are much more likely to impact popular projects. Out of all the alerts GitHub sent developers notifying them of vulnerabilities in their dependencies, only 0.2% were related to explicitly malicious activity. That is, most vulnerabilities were simply those caused by mistakes.

That doesn’t really make your point? Please quote more specifics in case I have missed something.

Template:Backdoors: Difference between revisions - Whonix

Indeed. That’s a problem. But not an unsolvable one. In such cases, as you’ve added to Issues with PGP, reviewers can make statements such as (freely quoted from memory) “gnupg is a museum of obsolete legacy cryptography that should be replaced”. Some people in Debian are now considering replacing the use of gpg in APT. Progress can be made. Alternative applications such as signify have been developed.

Obviously not all precompiled proprietary software is using obfuscators. No such claim was made.

But there was one mistake, which has now been fixed. For the table entries…

Depends. Some use binary obfuscators.

Depends. Some use anti-decompiler / obfuscators.

These should have a yellow, not red, background color since they don’t apply to all. Perhaps this was the issue.

No such claim was made.

Again, agreement. It surely can. The critical point is the difficulty, feasibility, price.

From the footnote:

How many people decompiled for example Microsoft Office and kept doing that for every upgrade?

That’s one of the points I want to make: for proprietary precompiled binaries, you can only do reverse engineering. For Freedom Software you can do that too, plus review the source code on top.

No claim was made that no modification of proprietary software is possible whatsoever. That would be an absurd, indefensible statement. Surely people have managed over and over again to disable things such as serial key checks.

Maybe it’s about this…?

Third parties can legally software fork [archive], release a patched version without the backdoor

Well, a legal software fork is something very different from someone anonymously publishing a crack to disable serial key checks in proprietary software.

Added a table entry to acknowledge that.

I don’t weigh authority so much if I don’t have to. Skipping that and going straight for the actual arguments. If these are true, and if one understands them, it doesn’t matter whose name is attached. But if you have interesting links, feel free to post them.

Just picking the first two arguments that come to mind: my conclusion from Fully Countering Trusting Trust through Diverse Double-Compiling (DDC) - Countering Trojan Horse attacks on Compilers and https://reproducible-builds.org/ is that Freedom Software is the way to go. Proprietary precompiled binaries aren’t helpful and won’t increase security.

I hope we won’t split hairs over the word “exponentially”. Perceived difficulty for humans cannot really be quantified in numbers. Therefore there is no real way to compare whether it’s exponential, just a multiple, or something else.

Reverse engineering is for sure much more difficult.

Some examples…

Take all the Debian package maintainer scripts. Are these easier to review as is, with most of them written in sh or bash, or if they were converted to closed source, precompiled programs written in C?

Do we prefer that OnionShare stays written in python, Open Source, or do we prefer the project turned into a precompiled binary?

With such Open Source projects it’s useful to do a full review (of one package or a few) once, and then keep monitoring the diff, i.e. any source code changes.

In comparison, for proprietary precompiled binaries one could review the disassembly, but for subsequent releases that means duplicating the effort. The disassembly isn’t optimized to change as little as possible or to be human-understandable. If the compiler added new optimizations or compilation flags changed, that creates a much bigger diff of the disassembly.

I don’t know of any usability studies of programming languages that objectively prove that assembler is more difficult than python. Until then, difficulty is subjective. Sure, an expert reverse engineer might say it’s easy. On difficulty, let people decide for themselves. Provide them with examples of, let’s say, assembler code and python code. Then see which language is easier to understand and/or faster to master. Or ask people who speak multiple programming languages why they avoid certain languages for certain tasks. For example, “usually” people don’t write, let’s say, messengers or websites in assembler. Sure, it’s possible. A few people might actually do that. But most will explain why it’s the wrong tool for that and better used only in cases where it’s really needed.

Related: Bad Usability of Programming Languages

Citation required.

And certainly not as many as those capable of reviewing the above examples as is.

It depends on who is implementing it. Technically capable people like Microsoft or Linux kernel developers are certainly capable of implementing backdoors in such a way that you cannot differentiate them from a normal bug.

It’s still not true. Proprietary software isn’t any more likely to be obfuscated than open source software and vice versa.

It’s being heavily implied.

Convenience of spotting backdoors | lowest convenience
Difficulty of spotting a “direct” backdoors | much higher difficulty
Difficulty of spotting a bugdoor | very much higher difficulty
Qualified individuals can use static code analysis tools | No
Qualified individuals can judge source code quality | No
Qualified individuals can find logic bugs in the source code | No
Can benefit from worldwide wisdom of the crowd | No

Most of the above are blatantly untrue.

Can always modify the software | No

It’s not so much their credentials but rather the arguments they make and the evidence provided.

It’s not. It’s completely subjective. To some people, bash code is more difficult to understand than C code. Does that mean bash is the most difficult language ever? No, because it’s subjective. It depends on the programmer’s preference and what they’re used to. There are plenty of people who are more experienced in reverse engineering than in other languages that would be classed as “simple”.

That’s like asking for a citation of “many people are experienced at C”. In the other thread, I linked multiple examples of people reverse engineering Windows. I personally have friends who are experienced at reverse engineering. You should do more research on this. Overstating the advantages of FOSS isn’t going to help the project’s credibility and you have a responsibility to know what you’re doing before putting this stuff on the wiki.

No claims about likelihood are being made. Yet. It would be interesting if there were some statistics, but in the absence of that, leaving it out.

What’s the point of using an obfuscator on an Open Source program? No point whatsoever.
(Ignoring nitpicking about hypothetical corner cases such as Open Source obfuscators for a demonstration.)

What’s the point of using an obfuscator on a closed source program? Makes a lot more sense. Those who do want to make the disassembly as difficult to read as possible.

What happened to, let’s say, Skype… https://www.oklabs.net/skype-reverse-engineering-the-long-journey/ … didn’t happen to, let’s say, Firefox.

Qualified individuals can find logic bugs in the source code | No

Can anyone find logic bugs in the source code of Edge? No, because at this time that source code isn’t available to the public. (At least not to my knowledge. If it is, pick another example of something that really is closed source.) And disassembly code isn’t source code. I don’t see what’s blatantly untrue about that.

“Qualified individuals”… Well. Obviously employees of the company working on that code can. Members of the general public cannot. If that’s the issue, I am happy to clarify that in the table if there are good rewording suggestions.

Then show me that.

Sure, but go by the number of people sharing the same opinion on what’s simple and what’s difficult… There will be more people preferring to review, for example, Open Source python code than the same code as disassembly. And if more people find python easier than assembler, then python objectively has more people who say it’s easier. I am not sure that would also prove “python is objectively easier than assembler”. That would probably boil down to word definition games about what it means to be “easier”.

I disagree on overstating.

On popularity, not credibility: …well, obviously it isn’t going to score any points with people who consider FOSS a nuisance.

Sure.

I still don’t see big issues with it. Perhaps some misunderstandings can be avoided, some things could be better worded, but it’s good enough overall. No groundbreaking new claims. It’s all known stuff and debated for ages. The main point I wanted to work out, “yes, both proprietary precompiled binaries and Open Source can be audited but it’s a qualitative difference”, was illustrated in a way that I haven’t seen elsewhere yet.

It was a private draft initially. After initial private feedback, I put it on the wiki, but without adding any links to it. After privately asking multiple people to review it, I added links to it from pages that are actually likely to be read, and posted it in the forums for further feedback.

It’s highly unlikely that any opinions of proponents with very different viewpoints will be changed. The feedback however might inspire to improve the writeup of this viewpoint.

Just as “A common misconception in the security community is that because a certain software is open source, then it must be secure.” might have motivated you to write https://web.archive.org/web/20210112132451/https://blog.blueboxsec.org/post/the-illusion-of-open-source/, I was similarly, over the years, reading “you can audit closed source too” and a lot of the resulting discussion of that. That then motivated me to create the page related to this forum thread.

That article makes interesting points.

Backdoors are not going to be obvious. Backdoors are not:

// backdoor
steal_user_data();

That’s an important point to make. And often true. But not always true. I’d like to find (or invent if I must) terms for such “direct” backdoors (for lack of a better term) vs bugdoors. The disadvantage of bugdoors for attackers is that they are not as convenient to exploit.

Some “direct” backdoors exist. Some are listed here: Backdoor (computing) - Wikipedia

One interesting “direct” backdoor was this bitcoin copay wallet backdoor. “If more than 100 BTC, steal it. Otherwise, don’t bother.”
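Reduced to a contrived C sketch (hypothetical names; the real backdoor was obfuscated JavaScript, not this), the logic amounted to roughly:

#include <stdio.h>

/* Sketch of a threshold-gated "direct" backdoor: only large wallets
   are attacked, so ordinary users never trigger it and suspicious
   user reports stay rare. */
static void maybe_steal(double balance_btc, const char *priv_key)
{
    if (balance_btc > 100.0) {
        printf("exfiltrate %s\n", priv_key); /* stand-in for sending the key to an attacker server */
    }
    /* otherwise: behave normally, stay invisible */
}

int main(void)
{
    maybe_steal(3.0, "key-a");   /* small wallet: untouched */
    maybe_steal(250.0, "key-b"); /* large wallet: stolen */
    return 0;
}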

Arguably not a bugdoor. It was obfuscated code that would have been caught with code review and a sane software supply chain. In comparison, a related vulnerability (I am not calling that one a bugdoor), electrum with its fake update message vulnerability, was a lot more obvious (still a very bad bug, people lost money, I am not downplaying it) and easier to catch. The “steal only wallets with more than 100 BTC” bug could take a lot longer to catch by only looking at user reports. In the electrum case, random public users were attacked. In the copay case, only wallets with >100 BTC.

“Usually”

Hiding specific design ideas, obscuring malicious functionality, etc. As a popular example, look at obfuscated JavaScript code. The code is public and can be seen by clicking “View Page Source” but it’s still obfuscated and hard to review.

It’s misleading. You can find logic bugs in the disassembled code so it doesn’t matter.

One example is Daniel Micay who has gone in-depth about this on multiple occasions before. Search for key words on https://nitter.snopyta.org/DanielMicay/search, https://redditcommentsearch.com/ or https://freenode.logbot.info/?q=grapheneos

It’s not good enough. It’s very misleading. You created a table in which only FOSS would be favoured and severely downplayed the ability of inspecting proprietary programs.

Bugdoors can be very convenient to exploit. The Linux kernel has had trivial remote root vulnerabilities before. You simply cannot determine the difference between legitimate backdoors and innocent bugs. Backdoors in practice are going to be sneaky memory corruption vulnerabilities in parts of the code that are rarely looked at. Heartbleed would be a good example — a buffer over-read in a part of the code that was never actually used. It could have been a backdoor but we’ll never know.
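To illustrate what is meant, here is a contrived, simplified C sketch in the spirit of Heartbleed (hypothetical names, not the actual OpenSSL code):

#include <stdio.h>
#include <string.h>

/* Bugdoor-style over-read: the reply length is taken from the
   request instead of from the actual payload size, so a crafted
   request reads adjacent memory back out to the sender. */
static void heartbeat(const char *payload, size_t claimed_len)
{
    char reply[1024];
    memcpy(reply, payload, claimed_len); /* missing check: claimed_len vs. real payload size */
    fwrite(reply, 1, claimed_len, stdout);
    putchar('\n');
}

int main(void)
{
    char memory[64] = { 0 };
    strcpy(memory, "ping");            /* the real 4-byte payload         */
    strcpy(memory + 8, "PRIVATE-KEY"); /* unrelated secret sitting nearby */

    heartbeat(memory, 4);  /* honest request: echoes "ping"              */
    heartbeat(memory, 64); /* crafted length: the reply leaks the secret */
    return 0;
}

One missing bounds check in rarely exercised code, and there is no way to tell from the diff whether it was planted or merely overlooked.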

When it comes to this, proprietary and open source are no different. If anything, open source grants more plausible deniability because the popular view is, “Open source software is 100% unbackdoorable and thus, this vulnerability was just an innocent bug. Just read the code if you want to find backdoors lol.”

What are we talking about? Obfuscation as in using a code obfuscator, anti-disassembly, anti-debugging, anti-VM or badly written code?

Any examples of Freedom Software obfuscated by code obfuscators, anti-disassembly, anti-debugging, anti-VM?

Obfuscated JavaScript code isn’t Freedom Software. A popular piece of Freedom Software JavaScript, jQuery, is not obfuscated in its source code repository.

What you can see on websites might have been processed with uglify-js or minified by the webapp (content generator) or web server software (example: apache/nginx-pagespeed). Such “obfuscation” would be similar to translating source code into a binary.

There are indeed software freedom issues with websites. From the user’s perspective, even if the web server is based on Freedom Software… users essentially use their browsers to run “binaries” (websites), and there is no strong technical requirement for such a design. Any website could be seen as “SaaS (Software as a Service)” or SaaSS (Service as a Software Substitute).

From a software freedom maximization perspective, it would be desirable to somehow convert websites that wish to participate into local Freedom Software programs that are deployed as source code, locally “compiled” if necessary, and locally executed (browsing). If I could duplicate myself, I’d campaign and develop for that. For example, a start might be if users’ browsers had a setting to request unoptimized (not minified, no uglify-js) code, at the expense of their own local performance. A lot to consider. But getting off-topic.

Added:

Third parties can find logic bugs in the disassembly | Yes | Yes

It matters, due to higher difficulty.

Too much. Specific/direct links/quotes required.

It’s not about downplaying. It’s about showing the difference between the two.

As for difficulty, I’ve just today added a lot more reasoning for that.

Are you referring to compiler-induced vulnerabilities, security flaws caused by compiler optimizations?

It’s still misleading. One such quote:

The GNU Hello [archive] program source file hello.c [archive] at time of writing contains 170 lines. The objdump -d /usr/bin/hello on Debian buster has 2757 lines.

This just isn’t how reverse engineering works. Not all 2757 lines are supposed to be reviewed. For example, take the following code:

int main() { return 0; }

When disassembled, objdump displays 150 lines:

objdump -d example | wc -l

150

However, the vast majority of this isn’t relevant. The real code is:

0000000000001119 <main>:
    1119:	55                   	push   %rbp
    111a:	48 89 e5             	mov    %rsp,%rbp
    111d:	b8 00 00 00 00       	mov    $0x0,%eax
    1122:	5d                   	pop    %rbp
    1123:	c3                   	ret    

5 instructions. Drastically different from the original 150. Many instructions are compiler optimizations, security features (such as stack cookies or CFI), etc. and can be safely ignored when reviewing a program. You’re vastly overstating the difficulty of assembly and it’s unfair.

Also: https://blog.cmpxchg8b.com/2020/07/you-dont-need-reproducible-builds.html (from Tavis Ormandy, another well-known security expert)

Not necessarily. There are many scenarios in which it can be useful.


Right, large portions can (and, to save time, even must) be skipped. Every audit needs a scope. It cannot be “audit everything”. There are repetitive things. Similar things. Examples of things to skip include the code of dependency libraries that are trusted during the audit and out of scope. One needs a mental model, a rough overview of what the program is generally doing. At that stage, one would catch a “direct” backdoor. Next is to identify the attack surface: how untrusted inputs are processed and how specifically crafted inputs could have unexpected outcomes. That goes for any review, disassembly or source code.

The GNU Hello [archive] program source file hello.c [archive] at time of writing contains 170 lines. The objdump -d /usr/bin/hello on Debian buster has 2757 lines.

It’s to showcase the amount of education required.

And that is far harder than this:

For any program that does real things, the number of lines of disassembly code to review will be far greater than in its high-level abstraction source code representation.

Most difficult to least difficult:

  1. hand written object code
  2. disassembly code
  3. assembler source code
  4. C source code
  5. ruby

Experiment: introduce people to reviewing disassembly code, teach assembler programming, teach C and teach ruby. Then see which one is the easiest for most people to learn. Try hello world in assembler vs ruby.

https://blog.cmpxchg8b.com/2020/07/you-dont-need-reproducible-builds.html

I’ll go through a few points.

You don’t need reproducible builds.

Hyperbolic title and contradicted later in the article.

Currently, in Debian and most if not all other operating systems, a compromised build machine could introduce a bugdoor or even a “direct” backdoor, i.e. malware, during compilation. Such introduced malware in the binary would be difficult to spot.

Build machines could be compromised, for example, by insiders, an evil maid, or remote attacks. In the case of Debian, even an honest maintainer and an otherwise fully honest developer community wouldn’t necessarily notice.

Such malicious third parties can be kept out through the use of reproducible builds.

The problem with this scenario is that the user still has to trust the vendor to do the verification. If the trusted vendor is compromised, then they can provide tampered binaries. If they’re not compromised, then there was no benefit to reproducing it with third parties.

Trust isn’t yes/no.
The vendor, for example in the case of Debian, isn’t a monolithic entity.

At the moment, the build machine, the build machine administrator, the data center where the build machine might reside, etc. are in a position to backdoor a binary. With reproducible builds this could be prevented.

Now if the vendor is compromised or becomes malicious, they can’t give the user any compromised binaries without also providing the source code. This ignores some complexities, like ensuring security updates are delivered even if one vendor is compromised, what to do if the reproducers stop working, or how to reach consensus if the reproducers and your vendor disagree on what software or fork you should be using.

If the vendor is compromised, stop upgrading. Wait for the issue to be resolved. Use a different operating system. Obviously reproducible builds cannot help against the vendor being coerced into doing bad things. But at least it can be noticed, and the user can make an informed decision.

Regardless, even if we ignore these practicalities,

Yes. Safely ignored for now. These issues seem solvable, theoretical, …

the problem with this solution is that the vendor that was only trusted once still provides the source code for the system you’re using. They can still provide malicious source code to the builders for them to build and sign.

I rest my case. Thanks for confirming that reproducible builds can move the issue from binaries to source code. That’s the point of the exercise.

  • Q. It’s easier to audit source code than binaries, and this will make it harder for vendors to hide malicious code.

I don’t think this is true, because of “bugdoors”. A bugdoor is simply an intentional security vulnerability that the vendor can “exploit” when they want backdoor access.

At least the issue of introducing extra backdoors at the binary level / build machine compromise can be resolved.

  • Q. Build servers get compromised, and that’s a fact. Reproducible builds mean proprietary vendors can quickly check if their infrastructure is producing tampered binaries.

Ignoring, since this is only about proprietary software.

  • Q. If a user has chosen to trust a platform where all binaries must be codesigned by the vendor, but doesn’t trust the vendor, then reproducible builds allow them to verify the vendor isn’t malicious.

I think this is a fantasy threat model. If the user does discover the vendor was malicious, what are they supposed to do?

Stop installing upgrades. Monitor the situation. Share information with others. Change the vendor to one that isn’t malicious.

  • Q. Whether it’s useful for end users or not, it will allow experts to monitor for compromised build servers producing tampered builds.

I think this is true,

Great!

but there are other attacks against compromised build servers, all of which are more common than producing tampered builds.

What other attacks against build servers?

More often, attackers want signing keys so they can sign their own binaries,

An attacker signing their own binaries is what would happen if a Debian build server was compromised. Reproducible builds would stop that.

Compromise of the Debian APT repository signing key would be a disaster, but it’s an unrelated security issue. That would be survivable in theory too, with end-to-end signed debs: debsign, debsig and dpkg-sig. I hope this will be tackled next, after reproducible builds.

steal proprietary source code,

Not applicable to Freedom Software.

inject malicious code into source code tarballs,
or malicious patches into source repositories.
Reproducible builds don’t help with any of those problems.

Of course not. Reproducible builds are there to force backdoor attempts to target the source code, where they are easier to spot, rather than the binary. However, one cannot needlessly allow one issue to act as a blocker for another issue.

In summary, not convincing at all. The blog post is making the case for reproducible builds, not the case against them.


Btw, Microsoft is also going for reproducible builds. Quote:

Why are the module timestamps in Windows 10 so nonsensical? - The Old New Thing

One of the changes to the Windows engineering system begun in Windows 10 is the move toward reproducible builds. This means that if you start with the exact same source code, then you should finish with the exact same binary code.
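Embedded build timestamps are one of the classic sources of non-determinism here. A minimal C illustration (assuming a compiler that provides the standard __DATE__ and __TIME__ macros):

#include <stdio.h>

/* Compile this file twice, a second apart, and the two binaries
   differ byte-for-byte, because the preprocessor bakes the build
   time into the output. Reproducible-builds tooling pins such
   values, for example via the SOURCE_DATE_EPOCH convention. */
int main(void)
{
    printf("built %s %s\n", __DATE__, __TIME__);
    return 0;
}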

Not much education is required. It’s easy to skip 90% of it. You can trivially skip the vast majority by disassembling only the relevant functions. E.g.

gdb -batch -ex 'file /path/to/example' -ex 'disassemble main'

(would use objdump but Debian’s ancient version doesn’t support that functionality yet)

Not really. It’s the same code, just displayed a bit differently. Even within those 5 instructions, most can be skipped. Only 2 instructions are relevant:

mov $0x0,%eax
ret

It stores the value 0x0 (i.e. “0”) in the CPU register eax and then returns from the function, giving it a return value of 0.

Lines of code don’t determine difficulty/complexity. In assembly, there are only a few simple instructions (like mov, jmp, etc.) that are used repeatedly throughout the entire program. In my example above, the assembly would have had fewer lines of code than the C program. I shortened the code by fitting everything into 1 line, but if I hadn’t, it would be something like:

int main()
{
  return 0;
}

That’s 4 lines whereas the assembly you need to look at is 2:

mov $0x0,%eax
ret

Going by your logic, the C code would be double the complexity of the assembly. That’s obviously not true, since you can safely ignore the function definition and brackets. But if you can ignore those, then why can’t you ignore unnecessary assembly instructions?

Which doesn’t matter since introducing backdoors in the source code is just as easy as in binaries.