Open Source Software Integrity - Should support from The People’s Republic of China worry us?

Donald Trump’s trade war with China is impacting confidence in areas Trump has probably never heard of. Regardless of whether Huawei is a real security threat or just guilty of out-competing US tech firms, US embargoes and sanctions are making life very difficult for users of Huawei technology, and people are wondering what other technologies may be affected.

Take OSS (Open Source Software). Someone has just asked me: “Suppose I’m a Chinese company that is the major commercial backer to an open source project. How easy would it be for me to sneak nefarious code of some sort into the software? I realise that in theory there is some sort of independent oversight committee that decides what gets into the released version but how effective is this and can it be circumvented? Does anybody really check?”

Well, the short answers are “not that easy”; “pretty effective”; “anything can be circumvented”; and “yes, people do check”. However, as you might expect, it’s a bit more complicated than this, and it probably depends on which OSS company one is talking about. Just as some COTS (Commercial Off The Shelf) software companies are more effective and more security-conscious than others, the same doubtless applies to Open Source Software companies.

It’s not, obviously, just a Chinese threat. The customer has to do “due diligence” on its suppliers, whether of COTS software or OSS. You can’t treat software vendors as a homogeneous whole – what is true for The Apache Software Foundation, say, may not be true for the publisher of a simple QR-code reader. And, if you are using OSS commercially (especially in a critical area), you should be using the supported “commercial open source” version, and subjecting it to effective Configuration Management as it applies to software assets. If you aren’t, all bets are off – but the same applies to COTS software. Would you let everybody install pirated commercial software at will? Or not read and keep track of your COTS software licences? Well, OSS has licences too, and similar disciplines apply (although licence enforcement is less aggressive).

Let’s look at Apache, because the two companies highlighted by my enquirer both support Apache projects. Ververica (previously dataArtisans, a German company), which was acquired by Alibaba, supports Apache Flink. Esgyn, which was spun out from HPE and was once based in both the US and China but is now run wholly out of China, supports Apache Trafodion.

Well, there isn’t an OSS police force any more than there is a COTS police force. According to Sally Khudairi (VP Marketing & Publicity and VP Sponsor Relations at The Apache Software Foundation): “whilst a company may be seemingly ‘backing’ an Apache project due to its use of it in its own products or offerings or having a number of team members involved in its development, the Apache Software Foundation (ASF) is strictly vendor neutral. Open Source projects and their communities seeking to become official Apache projects are often doing so for our community-led development process, ‘Apache Way’ governance model, and independence. No organization is able to control an Apache project’s direction or gain special privileges, irrespective of whether they employ Committers to work the project or if they support the ASF as a Sponsor”.

I believe Sally, because I’ve met ASF developers and they are fanatical about vendor independence – and, remember, they can see the code. As can anyone who is worried about a vendor inserting “nefarious code”, an option not usually available to someone worried about the possibility of an NSA backdoor, say, in a COTS product.

In more detail, I am told that: “all Apache projects are overseen by a self-selected, self-governing Project Management Committee (PMC). By definition, PMCs and project communities *must* be diverse – if a project is unfairly balanced towards a particular vendor or organization, that is disallowed, and either ASF Operations (President, etc.) or the Board will intervene. Over our 21-year history, there have been a small handful of projects that were at risk due to an apparent lack of diversity, but that has been resolved. We take project independence and neutrality very seriously. We will not be swayed”.

As for the specific cases mentioned:

Apache Flink is an open community which welcomes all contributors – see here. The Apache Flink VP (PMC Chair), Stephan Ewen, is co-founder/CTO of dataArtisans/Ververica – this is not uncommon for Apache projects that have a corporate entity built upon the technology, with the original developers still being active in the project’s future progress.
Trafodion is diverse, open, and has committers from several organisations. It welcomes a broad community of contributors – see here. Its VP, Pierre Smits, runs his own consultancy that’s unaffiliated with HP Labs, HPE, or Esgyn.

If either were being run and staffed by a secret cabal of Chinese spooks, they’d quickly fall foul of ASF diversity rules – and, once again, it is quite hard to hide nefarious code if anyone can read it. And, if something was wrong, and someone was trying to subvert their ASF openness, the Open Source Software community (which is open and uncontrolled by commercial interests) would hate them – peer-group social controls do work.

OK, that’s all fine, but suppose a “black hat” really did want to insert bad code? There are always boring parts of a codebase that no-one can face reading (although the ASF does have access to a fair few obsessive nerds).

I could imagine putting all new code through static code analysis as part of the commit process, and even computing a hash on the codebase so that unauthorised changes can be detected. In fact, I think both would be a good idea (especially the static code analysis) for all code, not just OSS code. Inserting nefarious code into COTS software is a possibility too; it just takes a very bright programmer with two employers, one of which you don’t like. There are, however, management and implementation issues. In the case of Apache OSS, there is the Apache Way governance process (there is a brief independent overview here), but individual projects are responsible for their own code quality, and I doubt that many have found such approaches necessary – yet.
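The “hash on the codebase” idea can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not an ASF process or tool: it assumes a local source checkout, plus a manifest of SHA-256 digests recorded when the code was last reviewed and kept somewhere an attacker cannot reach. The function names (`build_manifest`, `tree_digest`, `diff_manifests`) are hypothetical. Comparing the current tree against that manifest flags any file that has been added, removed or altered since sign-off.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def hash_file(path: Path) -> str:
    """Return the SHA-256 hex digest of one file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (as a relative POSIX path) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): hash_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def tree_digest(manifest: dict[str, str]) -> str:
    """Fold a manifest into one digest that changes if any path or file content changes."""
    digest = hashlib.sha256()
    for name in sorted(manifest):
        digest.update(f"{name}\0{manifest[name]}\n".encode("utf-8"))
    return digest.hexdigest()


def diff_manifests(approved: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """List files added, removed or modified since the approved manifest was taken."""
    return {
        "added": sorted(current.keys() - approved.keys()),
        "removed": sorted(approved.keys() - current.keys()),
        "modified": sorted(
            name for name in approved.keys() & current.keys()
            if approved[name] != current[name]
        ),
    }


if __name__ == "__main__":
    # Demo on a throwaway directory standing in for a source checkout.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "src").mkdir()
        (root / "src" / "parser.py").write_bytes(b"def parse(x):\n    return x\n")
        approved = build_manifest(root)  # hypothetically recorded at review time

        # Simulate an unreviewed change slipping in after sign-off.
        (root / "src" / "parser.py").write_bytes(b"def parse(x):\n    return eval(x)\n")
        current = build_manifest(root)

        print(tree_digest(approved) == tree_digest(current))  # False
        print(diff_manifests(approved, current))
        # {'added': [], 'removed': [], 'modified': ['src/parser.py']}
```

Note that this only detects *that* something changed, not whether the change is malicious; deciding that still needs human review or static analysis. Version-control systems such as Git already identify each commit by a hash of its content, so in practice the trusted record would more likely be a signed commit or release than a hand-rolled manifest like this one.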

Sally Khudairi again: “We don’t have a ‘code cop’ process, so to speak: each Apache Project and their communities are responsible for their code. Should something emerge, of course the ASF Security committee is there to help. Most of our projects are very careful with review prior to releasing new versions. We have processes in place for committing to a code base, so it would be very unlikely that someone would be able to do something that was Considered Harmful to Apache without it being seen/known. QA, too, is up to each project. From a Foundation standpoint, we don’t have a directive to follow best practices on the QA front. Of course, our people and processes change, so there may be a chance that something like this might happen in future.”

Obviously, there is a large degree of Trust in OSS communities on my part – but that applies to COTS software as well (how many Fitness for Purpose warranties do you see for COTS software?). Ultimately, it comes down to maturity and culture and I have just as much, if not more, faith in the responsibility and professionalism of OSS programmers as I have in COTS programmers. There is a lot of open accountability for OSS committers and it is hard to hide bad or lazy practice – not at all true, in my experience, of the COTS world. It is quite hard to bully OSS programmers (although someone has to be paying their mortgage) and they are, by definition, somewhat altruistic; whereas quite a few COTS programmers are motivated purely by money and some hate their employers. And the ASF has a Maturity Model that helps projects work towards best practices (with a “Quality” section). I wish all software vendors espoused something similar.

In conclusion, I’d like to emphasise that we are really talking about trusting software developers here, as any functioning Mutable Business must, and I’d make the point that OSS developers may well be inherently more trustworthy (on average) than COTS software developers – they are motivated by Doing Good rather than money, whereas COTS “wage slaves” are frequently disaffected and bullied. Read, for example, what Jim Whitehurst (past CEO of Red Hat) said in his book, The Open Organization (part 1, chapter 2), about what originally made Red Hat special: “it’s how Red Hat is able to attract the most talented people and engage them to the best of their abilities; we have a mission we believe in. Some companies dabble in offering some open source products, but Red Hat people are absolutely passionate about the fact that the only products we sell are 100% open source, which is something we all believe is fundamentally good for the world”. I’m sure something similar applies to Apache and other OSS organisations (and I wonder how it’ll survive at Red Hat under IBM, although I have a good feeling about it). Open Source Software communities are special and inspire Trust.