Meta unveils a new large language model that can run on a single GPU

Benj Edwards / Ars Technica

On Friday, Meta announced a new large language model (LLM) called LLaMA-13B that it claims can outperform OpenAI’s GPT-3 model despite being “10x smaller.” Smaller AI models could make it possible to run ChatGPT-style language assistants locally on devices such as PCs and smartphones. LLaMA-13B is part of a new family of language models called “Large Language Model Meta AI,” or LLaMA for short.

The LLaMA collection of language models ranges from 7 billion to 65 billion parameters in size. By comparison, OpenAI’s GPT-3 model, the foundational model behind ChatGPT, has 175 billion parameters.
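To put those parameter counts in hardware terms, here is a rough back-of-the-envelope sketch. It multiplies each parameter count from the article by an assumed number of bytes per parameter. The byte widths (4 for 32-bit floats, 2 for 16-bit floats, 1 for 8-bit integers) are standard numeric sizes, not figures Meta has published, and the labels "LLaMA-7B" and "LLaMA-65B" are used here only to mark the two ends of the stated range. The estimate covers the weights alone and ignores the extra memory needed to actually run a model.

```python
# Rough weight-storage estimate: parameters x bytes per parameter.
# Parameter counts come from the article; the byte widths are assumptions
# about numeric precision, not numbers reported by Meta or OpenAI.
PARAMS = {
    "LLaMA-7B": 7e9,    # low end of the stated range (label is illustrative)
    "LLaMA-13B": 13e9,
    "LLaMA-65B": 65e9,  # high end of the stated range (label is illustrative)
    "GPT-3": 175e9,
}
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_gb(num_params, precision):
    """Approximate weight storage in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def size_ratio(larger, smaller):
    """How many times more parameters `larger` has than `smaller`."""
    return PARAMS[larger] / PARAMS[smaller]


if __name__ == "__main__":
    for name, count in PARAMS.items():
        row = ", ".join(
            f"{p}: {weight_gb(count, p):.0f} GB" for p in BYTES_PER_PARAM
        )
        print(f"{name}: {row}")
    print(f"GPT-3 / LLaMA-13B: {size_ratio('GPT-3', 'LLaMA-13B'):.1f}x")
```

Under these assumptions, a 13-billion-parameter model stored as 16-bit floats needs about 26 GB for its weights, while a 175-billion-parameter model needs about 350 GB. That gap is why a model this size is a plausible candidate for a single high-memory GPU.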

Meta trained its LLaMA models using publicly available datasets, such as Common Crawl, Wikipedia, and C4, which means the firm can potentially release the model and the weights as open source. That’s a dramatic new development in an industry where, up until now, the Big Tech players in the AI race have kept their most powerful AI technology to themselves.

“Unlike Chinchilla, PaLM, or GPT-3, we only use datasets publicly available, making our work compatible with open-sourcing and reproducible, while most existing models rely on data which is either not publicly available or undocumented,” tweeted project member Guillaume Lample.

Meta calls its LLaMA models “foundational models,” which means the firm intends the models to form the basis of future, more-refined AI models built off the technology, similar to how OpenAI built ChatGPT from a foundation of GPT-3. The company hopes that LLaMA will be useful in natural language research and potentially power applications such as “question answering, natural language understanding or reading comprehension, understanding capabilities and limitations of current language models.”

Independent AI researcher Simon Willison analyzed the impact of Meta’s new AI models in a Mastodon thread.

Currently, a stripped-down version of LLaMA is available on GitHub. Researchers who want the full code and weights (the parameter values a neural network learns during training) can request access through a form that Meta provides. Meta has not announced plans for a wider release of the model and weights at this time.
