The open secret of open washing – why companies pretend to be open source
Allowing pretenders to co-opt the term is bad for everyone
Opinion If you believe Mark Zuckerberg, Meta's AI large language model (LLM) Llama 3 is open source.
It's not, despite what he says. The Open Source Initiative (OSI) spells it out in the Open Source Definition, and Llama 3's license – with clauses on litigation and branding – flunks it on several grounds.
Meta, unfortunately, is far from unique in wanting to claim that some of its software and models are open source. Indeed, the concept has its own name: open washing.
This is a deceptive practice in which companies or organizations present their products, services, or processes as "open" when they are not truly open in the spirit of transparency, access to information, participation, and knowledge sharing. This term is modeled after "greenwashing" and was coined by Michelle Thorne, an internet and climate policy scholar, in 2009.
With the rise of AI, open washing has become commonplace, as shown in a recent study. Andreas Liesenfeld and Mark Dingemanse of Radboud University's Center for Language Studies surveyed 45 text and text-to-image models that claim to be open. The pair found that while a handful of lesser-known LLMs, such as AllenAI's OLMo and BloomZ from BigScience Workshop and HuggingFace, could be considered open, most are not. Would it surprise you to know that, according to the study, the big-name ones from Google, Meta, and Microsoft aren't? I didn't think so.
But why do companies do this? Once upon a time, companies avoided open source like the plague. Steve Ballmer famously proclaimed in 2001 that "Linux is a cancer," because: "The way the license is written, if you use any open source software, you have to make the rest of your software open source." But that was a long time ago. Today, open source is seen as a good thing. Open washing enables companies to capitalize on the positive perception of open source and open practices without actually committing to them. This can help improve their public image and appeal to consumers who value transparency and openness.
Some corporations use open washing to shield their models and practices from scientific and regulatory scrutiny while benefiting from the "open" label.
Another major factor is that the EU AI Act provides special exemptions for "open source" models. This creates a powerful incentive for open washing: if companies' models count as open, they'll face far less restrictive requirements. That, in turn, means they'll need less money to meet regulatory requirements or to clean their datasets of copyright and other intellectual property (IP) issues.
However, the EU still doesn't have a clear definition of open source AI. In all fairness, no one does yet. The OSI will release its open source AI definition in the next few days. That said, the current crop of open washing licenses fails by anyone's definition – other than their creators'.
That's not to say all the big-name AI companies are lying about their open source street cred. For example, IBM's Granite 3.0 LLMs really are open source under the Apache 2 license.
Why is this important? Why do people like me insist that we properly use the term open source? It's not like, after all, the OSI is a government or regulatory organization. It's not. It's just a nonprofit that has created some very useful guidelines.
As Dan Lorenc, CEO of security company Chainguard, said in his keynote speech at the Secure Open Source Software (SOSS) Fusion Conference in Atlanta this week, no one can "force you to use the OSI's definitions." But "fortunately, many people, particularly lawyers, believe in this definition. They trust the work that the OSI does, and they trust and understand the protections that companies are granted when they use these licenses when they meet the open source criteria. That's why we see it showing up in procurement contracts of big companies all over the world."
Open source isn't just a legal and business matter. It also gives developers the freedom to operate the way they do. Without it, Lorenc said, they'll "lose the benefits that we've all grown accustomed to of being able to freely use code without having to know about or care about all the different terms in these licenses."
If we need to check every license for every bit of code, "developers are going to go to legal reviews every time you want to use a new library. Companies are going to be scared to publish things on the internet if they're not clear about the liabilities they're encountering when that source code becomes public."
Lorenc continued: "You might think this is only a big company problem, but it's not. It's a shared problem. Everybody who uses open source is going to be affected by this. It could cause entire projects to stop working. Security bugs aren't going to get fixed. Maintenance is going to get a lot harder. We must act together to preserve and defend the definition of open source. Otherwise, the lawyers are going to have to come back. No one wants the lawyers to come back."
I must add that I know a lot of IP lawyers. They do not need or want these headaches. Real open source licenses make life easier for everyone: businesses, programmers, and lawyers. Introducing "open except for someone who might compete with us" or "open except for someone who might deploy the code on a cloud" is just asking for trouble.
In the end, open washing will dirty the legal, business, and development work for everyone, including, ironically, the shortsighted companies now supporting this approach. After all, almost all their work, especially in AI, is ultimately based on open source. ®