Smol Models
Published on Aug 18, 2024
What are smol models?
There isn’t a single definition of what a smol model is. However, the term is often used to describe a small language model, usually one with fewer than 500 million parameters. Two examples that appear later in this post are SmolLM-360M-Instruct and Qwen-0.5B-Instruct.
Why are smol models interesting?
They enable near real-time inference for tasks like chat, autocomplete, and quick summarization. They are also small enough to run on mobile devices and in the browser. In short, they enable edge inference.
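To make the "runs on a phone or in the browser" point concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone take at different precisions. The function name and the 16-bit and 4-bit precisions are my own assumptions for illustration; the estimate ignores activations, the KV cache, and runtime overhead, so treat it as a lower bound rather than a measurement.

```typescript
// Approximate size of the model weights in megabytes (1 MB = 1e6 bytes).
// Hypothetical helper: covers weights only, not activations or KV cache.
function weightMemoryMB(paramCount: number, bitsPerParam: number): number {
  const bytes = (paramCount * bitsPerParam) / 8;
  return bytes / 1e6;
}

// A 360M-parameter model at 16-bit precision vs. 4-bit quantization.
console.log(weightMemoryMB(360e6, 16)); // 720
console.log(weightMemoryMB(360e6, 4)); // 180

// A 500M-parameter model, the rough upper end of "smol", at 4 bits.
console.log(weightMemoryMB(500e6, 4)); // 250
```

A few hundred megabytes is something a browser tab or a phone can plausibly download and hold in memory, which is a big part of why models at this scale are practical at the edge.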
SmolComplete
Here’s a fun 30-minute app that I built using a smol model. It’s like GitHub Copilot meets Google Docs.
Try it out here.
Instant SmolLM
As part of the launch of the Hugging Face SmolLM-360M-Instruct finetune, I helped build a demo of it running in the browser. It showcases the speed and quality of a smol model.
Instant Qwen
Here’s a demo of Qwen-0.5B-Instruct running in the browser.
It gets about 50 tokens per second on my 2024 Mac Pro. Super fast!
The demo runs in the browser in real time. It’s powered by a Qwen-0.5B-Instruct model with MLC WebLLM.
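If you want to check a tokens-per-second figure like the one above on your own machine, a small meter along these lines is one way to do it. This is a minimal sketch, not the code behind the demo: the `ThroughputMeter` class is hypothetical, and it assumes you call `recordToken` once per generated token from whatever streaming callback your runtime gives you. Timestamps are passed in explicitly, so in a browser you would feed it `performance.now()`.

```typescript
// Hypothetical helper that tracks decoding throughput from token timestamps.
class ThroughputMeter {
  private startMs: number;
  private lastMs: number;
  private tokens: number = 0;

  // startMs: timestamp (in milliseconds) taken just before generation begins.
  constructor(startMs: number) {
    this.startMs = startMs;
    this.lastMs = startMs;
  }

  // Call once per generated token with the current timestamp in milliseconds.
  recordToken(nowMs: number): void {
    this.tokens += 1;
    this.lastMs = nowMs;
  }

  // Average tokens per second from start to the most recent token.
  tokensPerSecond(): number {
    const elapsedMs = this.lastMs - this.startMs;
    return elapsedMs > 0 ? (this.tokens * 1000) / elapsedMs : 0;
  }
}

// Simulated run: 100 tokens arriving one every 20 ms.
const meter = new ThroughputMeter(0);
for (let i = 1; i <= 100; i++) {
  meter.recordToken(i * 20);
}
console.log(meter.tokensPerSecond()); // 50
```

At 50 tokens per second, a 100-token reply finishes in about two seconds, which is why the demo feels instant.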
The Future
I think we will see more and more smol models in the future. They will be silently embedded in your browser, PDF readers, and other apps.
It won’t even feel like you’re using AI. It will just be there summarizing things, helping you write, and answering questions.