Risks and Opportunities of Open-Source Generative AI
Citation: Francisco Eiras, Aleksandar Petrov, Bertie Vidgen, Christian Schroeder, Fabio Pizzati, Katherine Elkins, Supratik Mukhopadhyay, Adel Bibi, Aaron Purewal, Csaba Botos, Fabro Steibel, Fazel Keshtkar, Fazl Barez, Genevieve Smith, Gianluca Guadagni, Jon Chun, Jordi Cabot, Joseph Imperial, Juan Arturo Nolazco, Lori Landay, Matthew Jackson, Philip H. S. Torr, Trevor Darrell, Yong Lee, Jakob Foerster. Risks and Opportunities of Open-Source Generative AI.
arXiv (preprint): arXiv:2405.08597 (DOI: 10.48550/arXiv.2405.08597)
Internet Archive Scholar (search for fulltext): Risks and Opportunities of Open-Source Generative AI
Wikidata (metadata): Q135644817
Download: https://arxiv.org/abs/2405.08597
Summary
Introduces an openness taxonomy for generative-AI pipelines that scores each component (pre-training data, SFT/alignment data, evaluation code/data, inference, architecture/weights) on a license-based scale (C1–C5 for code, D1–D5 for data) and applies it to 45 widely used LLMs. The analysis surfaces two empirical patterns: (O1) providers commonly open weights while keeping training and safety-evaluation data/code closed, and (O2) currently more-open models underperform leading closed systems on public leaderboards. The paper also frames risk discussion across near/mid/long-term horizons and argues that—with appropriate practices—opening more of the pipeline (including safety evals and logs) yields net benefits. An accompanying website https://open-source-llms.github.io/ tracks updates to the model taxonomy.
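The per-component scoring described above can be sketched as a small data structure. This is a hypothetical encoding, not the paper's code: the class names, field names, and the thresholds used to detect pattern O1 (weights open while training data stays closed) are illustrative assumptions; only the C1–C5/D1–D5 scale labels and the pipeline components come from the paper.

```python
from dataclasses import dataclass
from enum import IntEnum

# Hypothetical encoding of the license-based openness scales:
# C1 (closed) .. C5 (fully open) for code, D1 .. D5 for data.
# The meaning attached to each level here is illustrative, not
# the paper's verbatim rubric.

class CodeOpenness(IntEnum):
    C1 = 1  # unreleased / fully closed
    C2 = 2
    C3 = 3
    C4 = 4
    C5 = 5  # released under a permissive open-source license

class DataOpenness(IntEnum):
    D1 = 1  # undisclosed
    D2 = 2
    D3 = 3
    D4 = 4
    D5 = 5  # fully released and openly licensed

@dataclass
class ModelOpenness:
    """Per-component openness scores for one generative-AI pipeline."""
    name: str
    pretraining_data: DataOpenness
    sft_alignment_data: DataOpenness
    eval_code: CodeOpenness
    eval_data: DataOpenness
    inference_code: CodeOpenness
    weights: CodeOpenness  # architecture/weights release

    def matches_pattern_o1(self) -> bool:
        # Pattern O1: weights broadly open while pre-training data
        # stays closed. The C4/D2 cutoffs are assumptions.
        return (self.weights >= CodeOpenness.C4
                and self.pretraining_data <= DataOpenness.D2)

# Hypothetical matrix row for an "open-weights" release.
example = ModelOpenness(
    name="ExampleLM",
    pretraining_data=DataOpenness.D1,
    sft_alignment_data=DataOpenness.D2,
    eval_code=CodeOpenness.C2,
    eval_data=DataOpenness.D1,
    inference_code=CodeOpenness.C5,
    weights=CodeOpenness.C5,
)
print(example.matches_pattern_o1())  # → True
```

Scoring each component separately, rather than calling a whole model "open" or "closed", is what lets the paper's 45-model matrix surface patterns like O1 at a glance.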
Theoretical and Practical Relevance
Gives researchers, platforms, and policymakers a component-level measure of "openness" they can operationalize today (a license rubric, per-component scoring, and a model matrix) to compare releases and identify which components to open next (e.g., safety-evaluation code/data). The recommended practices (release training and safety-evaluation datasets; training, evaluation, and inference code; intermediate checkpoints; and training logs; publish thorough documentation; and keep openness conditions largely voluntary) offer a concrete path to expanded transparency without new hard legal constraints. The long-term section posits that greater openness can aid technical alignment and reduce extreme-risk scenarios.