On Wednesday, Stability AI released a new family of open source AI language models called StableLM. Stability hopes to repeat the catalyzing effects of its Stable Diffusion open source image synthesis model, launched in 2022. With refinement, StableLM could be used to build an open source alternative to ChatGPT.
StableLM is currently available in alpha form on GitHub in 3 billion and 7 billion parameter model sizes, with 15 billion and 65 billion parameter models to follow, according to Stability. The company is releasing the models under the Creative Commons BY-SA-4.0 license, which requires that adaptations credit the original creator and be shared under the same license.
Stability AI Ltd. is a London-based firm that has positioned itself as an open source rival to OpenAI, which, despite its “open” name, rarely releases open source models and keeps its neural network weights—the mass of numbers that defines the core functionality of an AI model—proprietary.
“Language models will form the backbone of our digital economy, and we want everyone to have a voice in their design,” writes Stability in an introductory blog post. “Models like StableLM demonstrate our commitment to AI technology that is transparent, accessible, and supportive.”
Like GPT-4, the large language model (LLM) behind the most capable version of ChatGPT, StableLM generates text by predicting the next token (word fragment) in a sequence. That sequence starts with information provided by a human in the form of a “prompt.” As a result, StableLM can compose human-like text and write programs.
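To make the idea of next-token prediction concrete, here is a minimal toy sketch in Python. It is not how StableLM or GPT-4 is implemented: those models use neural networks with billions of parameters, while this example simply counts which word follows which in a tiny made-up corpus. The function names `train_bigram` and `generate` and the sample corpus are hypothetical and exist only for illustration. What carries over is the loop itself: start from a prompt, predict one token, append it, and repeat.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count, for each token, which tokens follow it in the training text."""
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows


def generate(follows, prompt, max_new_tokens=5):
    """Extend a non-empty prompt one token at a time.

    At each step the most frequent follower of the last token is chosen
    (greedy decoding). Generation stops early if the last token was never
    seen with a follower in the training text.
    """
    output = list(prompt)
    for _ in range(max_new_tokens):
        candidates = follows.get(output[-1])
        if not candidates:
            break
        next_token = candidates.most_common(1)[0][0]
        output.append(next_token)
    return output


# Toy corpus (an assumption for illustration; real models train on far more text).
corpus = (
    "the model predicts the next token and "
    "the next token extends the sequence"
).split()

model = train_bigram(corpus)
print(" ".join(generate(model, ["the"], max_new_tokens=2)))
```

In this sketch the “prompt” is the single word `the`, and each generated word depends only on the word before it. A real LLM conditions on the entire preceding sequence and usually samples from a probability distribution rather than always taking the top choice.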
As with other recent “small” LLMs such as Meta’s LLaMA, Stanford Alpaca, Cerebras-GPT, and Dolly 2.0, StableLM purports to achieve performance similar to OpenAI’s benchmark GPT-3 model while using far fewer parameters: 7 billion for StableLM versus 175 billion for GPT-3.
According to Stability, StableLM was trained on a new experimental data set built on The Pile, but three times larger. Stability claims that the “richness” of this data set, the details of which it promises to release later, accounts for the “surprisingly high performance” of the model at smaller parameter sizes in conversational and coding tasks.
In our informal experiments with a fine-tuned version of StableLM’s 7B model built for dialog based on the Alpaca method, we found that it seemed to perform better (in terms of outputs you would expect given the prompt) than Meta’s raw 7B parameter LLaMA model, but not at the level of GPT-3. Larger-parameter versions of StableLM may prove more flexible and capable.
In August of last year, Stability funded and publicized the open source launch of Stable Diffusion, developed by researchers at the CompVis group at Ludwig Maximilian University of Munich.
As an early open source latent diffusion model that could generate images from prompts, Stable Diffusion kickstarted an era of rapid development in image-synthesis technology. It also created a strong backlash among artists and corporate entities, some of which have sued Stability AI. Stability’s move into language models could inspire similar results.
Users can test the 7 billion-parameter StableLM base model on Hugging Face and the fine-tuned model on Replicate. In addition, Hugging Face hosts a dialog-tuned version of StableLM with a conversation format similar to ChatGPT’s.
Stability says it will release a full technical report on StableLM “in the near future.”