The Open Source Initiative (OSI) on Monday released the result of a year-long global community initiative to create a standard defining what, exactly, constitutes an open source artificial intelligence (AI) system.

The Open Source AI Definition (OSAID) v1.0, unveiled at the All Things Open 2024 conference in Raleigh, North Carolina, is described as “the first stable version” of a project to establish a set of principles “that can recreate permissionless, pragmatic, and simplified collaboration for AI practitioners, similar to that which the Open Source Definition has done for the software ecosystem,” the OSI said in its FAQ.

The document was created via a co-design process involving more than 25 organizations, including leaders from commercial entities such as Microsoft, Google, Amazon, Meta, Intel, and Samsung, and groups including the Mozilla Foundation, the Linux Foundation, the Apache Software Foundation, and the United Nations' International Telecommunication Union. It has already been endorsed by organizations worldwide.

“Coming up with the proper open source definition is challenging, given restrictions on data, but I’m glad to see that the OSI v1.0 definition requires at least that the complete code for data processing (the primary driver of model quality) be open-source,” said Percy Liang, director of the Center for Research on Foundation Models at Stanford University, in a statement endorsing OSAID. “The devil is in the details, so I’m sure we’ll have more to say once we have concrete examples of people trying to apply this Definition to their models.”

OSI said that it is confident that its methodology has resulted in a standard that meets its original brief.

“The co-design process that led to version 1.0 of the Open Source AI Definition was well-developed, thorough, inclusive, and fair,” said Carlo Piana, OSI board chair, in a statement. “It adhered to the principles laid out by the board, and the OSI leadership and staff followed our directives faithfully. The board is confident that the process has resulted in a definition that meets the standards of Open Source as defined in the Open Source Definition and the Four Essential Freedoms, and we’re energized about how this definition positions OSI to facilitate meaningful and practical Open Source guidance for the entire industry.”

Four criteria for open source AI systems

The Open Source AI Definition specifies that, to be an open source AI, a system must fulfill four criteria derived from the Free Software Definition. The OSAID says,

“[The system] must be made available under terms and in a way that grant the freedoms to:

  • Use the system for any purpose and without having to ask for permission.
  • Study how the system works and inspect its components.
  • Modify the system for any purpose, including to change its output.
  • Share the system for others to use with or without modifications, for any purpose.”

The OSAID also specifies, “These freedoms apply both to a fully functional system and to discrete elements of a system. A precondition to exercising these freedoms is to have access to the preferred form to make modifications to the system.”

In addition, the OSAID describes the preferred form for modification of machine learning systems, specifying the data information, code, and parameters to be included.

However, the OSAID notes, “The Open Source AI Definition does not require a specific legal mechanism for assuring that the model parameters are freely available to all. They may be free by their nature or a license or other legal instrument may be required to ensure their freedom. We expect this will become clearer over time, once the legal system has had more opportunity to address Open Source AI systems.”

Even Nextcloud, a company that has its own specification for open source AI, is endorsing the OSAID and intends to embed it into its spec. “Users of AI solutions deserve transparency and control, which is why we introduced our Ethical AI rating in early 2023. Now, we see big tech firms trying to hijack the term open source AI,” said Frank Karlitschek, CEO and founder of Nextcloud, in a statement. “We fully endorse the creation of a clear definition of open source AI by the community to protect users and the market.”

Questions and concerns

Analyst Brian Jackson, principal research director at Info-Tech Research Group, has some concerns, however.

“In reading the Open Source Initiative’s brief on what it views as the open source AI standard, a few big questions come to mind for me,” he said. “Their standards are clear and consistent with the previous standards for releasing open source software. There are some key differences to look at with AI — it has training data, model weights, and a new architecture that’s not covered by traditional open source software licenses. That’s why this sort of standard is needed.”

Jackson noted that models are still permitted to withhold their training data, with the explanation that a model still could be open source even if it was illegal to release the data, as it would be with medical data. “I accept that reasoning,” he said, “but it doesn’t address the problem of copyright-protected content being included in the training data.”

He’s also worried about the harms that could emerge from open source AI, such as deepfakes and “nudify” apps that let users take photos of people and generate fake nude images.

And, he added, “We’ve already seen real-world harm as a result of open source AI. Child Sexual Abuse Material (CSAM) is one example of a malicious use of open source AI. The Internet Watch Foundation has reported on the rise in activity of dark web forums trafficking the material, and the creators of it stating preferences to use open source image generation models to get more accurate results. Harms also include fraud perpetrated by bad actors using open source AI. These models can be modified to be more helpful in creating convincing deepfakes, tailoring phishing messages, or conducting automated searches for users with vulnerabilities.”

The co-designers are less concerned. “The new definition requires Open Source models to provide enough information about their training data so that a ‘skilled person can recreate a substantially equivalent system using the same or similar data,’ which goes further than what many proprietary or ostensibly Open Source models do today,” said Ayah Bdeir, who leads AI strategy at Mozilla, in a statement. “This is the starting point to addressing the complexities of how AI training data should be treated, acknowledging the challenges of sharing full datasets while working to make open datasets a more commonplace part of the AI ecosystem. This view of AI training data in Open Source AI may not be a perfect place to be, but insisting on an ideologically pristine kind of gold standard that will not actually be met by any model builder could end up backfiring.”

The OSI itself is satisfied with OSAID v1.0, and views it as a starting point for further work.

“Arriving at today’s OSAID version 1.0 was a difficult journey, filled with new challenges for the OSI community,” said OSI Executive Director, Stefano Maffulli, in a statement. “Despite this delicate process, filled with differing opinions and uncharted technical frontiers—and the occasional heated exchange—the results are aligned with the expectations set out at the start of this two-year process. This is a starting point for a continued effort to engage with the communities to improve the definition over time as we develop with the broader Open Source community the knowledge to read and apply OSAID v.1.0.”