Open-Weight vs Open-Source AI

The Landscape 6 min read

In Short

Open-weight means you can download and run a model's trained weights, but not the data or recipe behind them. Open-source AI is a stronger standard that adds the training data information, code, and recipe needed to rebuild and study the model. Almost every model you hear called open, including Meta's Llama, is open-weight only, while fully open-source families such as OLMo and Pythia are rarer. The difference decides how far you can trust and reproduce a model and what its license lets you do.

Snapshot caveat: The core definitions, four freedoms, and license clauses are stable. What shifts is which models sit in each camp, plus version numbers and market figures. Re-check a model's own license and the OSI list before relying on specifics. Reflects June 2026.

01. What It Is

A trained AI model is, at bottom, a huge list of learned numbers, its weights or parameters. "Open-weight" means a company publishes those numbers so anyone can download, run, and fine-tune the model, while the training data and full recipe usually stay private.

"Open-source AI" is a stronger standard. As the Open Source Initiative (OSI) defines it, a system must release the weights plus enough information to reproduce and study them. Releasing the weights alone does not qualify.

02. Why It Matters

The distinction settles four practical questions. One is trust and provenance, since you cannot check an open-weight-only model for bias or copyrighted material without the data. Another is reproducibility, possible only when the data, code, and recipe are released. A third is commercial rights, set by the license, not by whether the weights are downloadable. The fourth is safety and accountability, which needs the training data, or information about it, open to audit.

03. How It Works

The spectrum from open-weight to fully-open

Openness runs along a range, not a yes-or-no switch. A closed model gives no weights at all and is reachable only through an API or product. An open-weight model gives you the finished numbers to run, but not the data or recipe behind them. Meta, Alibaba (Qwen), DeepSeek, and Mistral all do this, a pattern that the model-landscape-2026 file maps across the industry. A fully open-source model adds the data information, training code, and recipe, so the whole system can be rebuilt and inspected.
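
The three positions on that range can be sketched as a small classifier. This is a minimal illustration, not an official test: the `Release` fields and the `classify` function are hypothetical names invented here, and the sketch looks only at which artifacts are published, ignoring the license.

```python
from dataclasses import dataclass
from enum import Enum


class Openness(Enum):
    CLOSED = "closed"
    OPEN_WEIGHT = "open-weight"
    FULLY_OPEN = "fully open-source"


@dataclass(frozen=True)
class Release:
    """What a publisher makes available (hypothetical structure)."""
    weights: bool
    data_information: bool
    training_code: bool
    recipe: bool


def classify(release: Release) -> Openness:
    """Place a release on the spectrum by its published artifacts only."""
    if not release.weights:
        # No weights at all: reachable only through an API or product.
        return Openness.CLOSED
    if release.data_information and release.training_code and release.recipe:
        # Weights plus everything needed to rebuild and inspect the system.
        return Openness.FULLY_OPEN
    # Weights are downloadable, but part of the data or recipe is withheld.
    return Openness.OPEN_WEIGHT
```

For instance, `classify(Release(weights=True, data_information=False, training_code=False, recipe=False))` returns `Openness.OPEN_WEIGHT`.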

The OSI Open Source AI Definition (OSAID 1.0) and the data-disclosure dispute

No agreed meaning for "open source AI" existed until October 28, 2024, when OSI published version 1.0 of its Open Source AI Definition, the first formal industry definition. It came from a year-long co-design process involving more than 25 organizations, including Mozilla, the Linux Foundation, the Apache Software Foundation, and major technology companies.

The definition frames openness as four freedoms adapted from the Free Software Definition. Can you use the system for any purpose without permission, study it, modify it, and share it, changed or unchanged? Only a yes to all four qualifies.

To satisfy them, OSAID requires three things as the "preferred form to make modifications." Those are Data Information, the Code to train and run the system, and the Parameters (the trained weights). Releasing weights alone fails.

The argued-over part is "Data Information." OSAID does not require publishing the training dataset itself, only a description detailed enough for a skilled person to build a substantially equivalent system, plus a listing of the public and third-party data sources. The data can stay private.
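
The four freedoms and three required components can be read as a checklist. The sketch below is an informal illustration under that reading, not OSI's own procedure: `osaid_gaps` and the string labels are hypothetical. Note that the raw training dataset is deliberately absent from the required components, which mirrors the Data Information compromise.

```python
# Labels are illustrative shorthand for the items named in the text.
FOUR_FREEDOMS = ("use", "study", "modify", "share")
REQUIRED_COMPONENTS = ("data_information", "code", "parameters")


def osaid_gaps(freedoms: set[str], components: set[str]) -> list[str]:
    """Return what is missing; an empty list means the checklist passes."""
    missing = [f"freedom:{f}" for f in FOUR_FREEDOMS if f not in freedoms]
    missing += [f"component:{c}" for c in REQUIRED_COMPONENTS if c not in components]
    return missing


# A weights-only release under a permissive license still falls short.
weights_only = osaid_gaps(set(FOUR_FREEDOMS), {"parameters"})

# A release with Data Information passes even if the dataset stays private.
private_dataset = osaid_gaps(set(FOUR_FREEDOMS), set(REQUIRED_COMPONENTS))
```

Here `weights_only` lists the two missing components, while `private_dataset` is empty.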

That compromise is disputed. Defenders say the required Data Information keeps the recipe open even when the dataset stays private. Critics including Bradley Kuhn of the Software Freedom Conservancy call closed training data a loophole, and the Free Software Foundation is drafting separate criteria for "free machine learning applications."

License types and restrictions

"You can download it" is not "you can do anything with it." The license sets the real terms, in two broad groups.

Permissive, OSI-approved licenses come with few strings. Apache 2.0 allows free use, modification, redistribution, and commercial use with no field-of-use restriction, plus an explicit patent grant. Qwen3, gpt-oss, OLMo, and Pythia all ship under it. MIT is simpler still, with minimal conditions but no explicit patent grant.

Community licenses add conditions permissive ones do not. Meta's Llama 4 Community License, effective April 5, 2025, grants broad rights to use, modify, and distribute, yet it is a proprietary license, not OSI-approved. A licensee with more than 700 million monthly active users in the month before a release must request a separate license, granted at Meta's "sole discretion." Redistributors must display "Built with Llama," prefix any model trained on its outputs with "Llama," and reproduce a copyright notice under Meta's brand guidelines. It also adds an Acceptable Use Policy forbidding activities such as illegal acts, violence, and disinformation. That use restriction is the main reason OSI does not call Llama open source, since OSI requires use "for any purpose."
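
The conditions described above can be laid out as a rough decision sketch. It encodes only this section's summary of the clauses, not the license text itself, and the function name and parameters are hypothetical.

```python
MAU_THRESHOLD = 700_000_000  # monthly-active-user figure quoted in the text


def llama_style_obligations(
    monthly_active_users: int,
    redistributes: bool,
    trained_on_outputs: bool,
) -> list[str]:
    """List the conditions that apply, per the simplified summary above."""
    # The Acceptable Use Policy applies to every licensee.
    obligations = ["follow the Acceptable Use Policy"]
    if monthly_active_users > MAU_THRESHOLD:
        # "More than" 700 million: the threshold itself does not trigger it.
        obligations.append("request a separate license from Meta")
    if redistributes:
        obligations.append('display "Built with Llama"')
        obligations.append("reproduce the copyright notice")
        if trained_on_outputs:
            obligations.append('prefix the model name with "Llama"')
    return obligations
```

A small team that only runs the model internally gets back a single item, the Acceptable Use Policy. That one item is still enough to place the license outside OSI's "for any purpose" requirement.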

04. Key Terms

Term | Plain-language meaning
Weights (parameters) | The learned numbers that make a trained model work. Publishing them is "open-weight."
Open-weight model | A model whose trained weights are downloadable, usually without the data or recipe.
Open-source AI | OSI's stronger standard. Weights plus the data information, code, and recipe to rebuild and study it.
Training data | What a model learned from. The main dividing line between open-weight and open-source.
License | The legal terms on a model. Permissive ones (Apache 2.0, MIT) allow almost any use. Community ones add conditions.
Reproducibility | Rebuilding a model to a substantially equivalent result. Needs the data, code, and recipe.
Acceptable Use Policy | A forbidden-use list in some licenses. It keeps a license from counting as OSI open source, which requires use for any purpose.

05. Examples

Open-weight only:
Most "open" models you have heard of sit here. Meta's Llama is the leading example, marketed as open yet governed by a community license. Alibaba's Qwen, DeepSeek, and Mistral also publish weights without their training data.

OpenAI's gpt-oss is a useful borderline case. On August 5, 2025, OpenAI released gpt-oss-120b and gpt-oss-20b under Apache 2.0, its first open-weight language models since GPT-2. The weights are freely downloadable, yet the training data was not released, so the models stay open-weight, not open-source by the OSI standard.

Fully open-source:
OLMo, from Ai2 (the Allen Institute for AI), releases weights, training code, training data, and recipes under Apache 2.0. The OLMo 3 line publishes a full "model flow" with downloadable pretraining, mid-training, and post-training data. This is possible because Ai2 openly published its pretraining dataset, Dolma, around 3 trillion tokens of web text, code, and books under the ODC-BY license.

EleutherAI's Pythia is a second fully-open family, built for reproducible research and released in April 2023. It is a suite of 16 models trained on the public Pile dataset under Apache 2.0, with 154 saved checkpoints per model, so a full training run can be studied.
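
The 154 checkpoints can be enumerated and loaded by name. The sketch below assumes the checkpoint schedule and the `step<N>` revision naming described in the Pythia repository (step 0, ten log-spaced steps, then every 1,000 steps up to 143,000), along with the Hugging Face `transformers` library. None of those details come from this article, and the helper names are hypothetical.

```python
def pythia_checkpoint_steps() -> list[int]:
    """Training steps at which checkpoints were saved (assumed schedule)."""
    log_spaced = [0] + [2 ** i for i in range(10)]   # 0, 1, 2, 4, ..., 512
    linear = list(range(1000, 143_001, 1000))        # 1000, 2000, ..., 143000
    return log_spaced + linear


def revision_name(step: int) -> str:
    """Branch name assumed to hold the checkpoint for a given step."""
    return f"step{step}"


def load_checkpoint(step: int, model_id: str = "EleutherAI/pythia-70m-deduped"):
    """Download one intermediate checkpoint; needs network and transformers."""
    # Imported lazily so the helpers above work without the library installed.
    from transformers import GPTNeoXForCausalLM

    return GPTNeoXForCausalLM.from_pretrained(model_id, revision=revision_name(step))


steps = pythia_checkpoint_steps()
print(len(steps))  # 11 early checkpoints + 143 evenly spaced ones
```

Calling `load_checkpoint(3000)` would fetch the model as it stood partway through training, which is what lets a full run be studied step by step.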

To run an open-weight model yourself, see running-llms-locally.

06. Common Misconceptions

"Open-weight and open-source mean the same thing."
They do not. Open-weight releases only the finished weights. Open-source AI, per OSI, also requires the data information, code, and recipe to reproduce and study it.

"Llama is open source."
Meta markets it as open, but the Llama Community License is not OSI-approved. Its 700-million-user cap, Acceptable Use Policy, and branding rules put it outside the definition, so by OSI's standard it is open-weight, not open source.

"If a model is open, I can use it for anything with no strings."
It depends on the license. Apache 2.0 and MIT are close to no-strings, while community licenses attach real conditions, so read the license before building a product.

"I downloaded the model, so I can see how it was built."
The weights say almost nothing about how the model was trained or what data shaped it. Only fully-open models such as OLMo and Pythia ship the data, code, and recipe needed to audit or rebuild them.

"Fully open-source models are just weaker research toys."
They are often smaller than the frontier, but the distinction is about transparency and rights, not benchmark rank. For anyone who must audit training data for bias or copyright, or reproduce a result exactly, only full openness delivers it.