Smol Models


What are smol models?

There isn’t a single definition of a smol model. However, the term is often used to describe a small language model, usually one with fewer than 500 million parameters. Some good examples of smol models are:

Why are smol models interesting?

They enable near real-time inference for tasks like chat, autocomplete, and quick summarization. They can also run on mobile devices and in the browser. In short, they enable edge inference.
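To get a feel for why a model this size fits on a phone or in a browser tab, here is a rough back-of-the-envelope sketch of weight memory. The 500 million figure is the upper bound mentioned above; the two precisions (16-bit and 4-bit) are assumptions picked for illustration, and the numbers cover weights only, not activations or runtime overhead.

```typescript
// Approximate bytes needed to hold a model's weights at a given precision.
// Ignores activations, KV cache, and runtime overhead, so treat it as a floor.
function weightBytes(paramCount: number, bitsPerParam: number): number {
  return (paramCount * bitsPerParam) / 8;
}

function toMegabytes(bytes: number): number {
  return bytes / 1e6;
}

// Upper bound used in this post for a "smol" model.
const params = 500_000_000;

// Illustrative precisions (assumption): 16-bit floats and 4-bit quantization.
const fp16Mb = toMegabytes(weightBytes(params, 16)); // 1000
const int4Mb = toMegabytes(weightBytes(params, 4)); // 250

console.log(`fp16: ~${fp16Mb} MB, 4-bit: ~${int4Mb} MB`);
```

So even at the top of the smol range, the weights come to roughly 1 GB at 16 bits and roughly 250 MB at 4 bits, which is small enough to download and keep in memory on many consumer devices.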

SmolComplete

Here’s a fun 30-minute app that I built using a smol model. It’s like GitHub Copilot meets Google Docs.


Try it out here.

Instant SmolLM

As part of the launch of the Hugging Face SmolLM-360M-Instruct finetune, I helped build a demo of it running in the browser. It showcases the speed and quality of a smol model.

Instant Qwen

Here’s a demo of Qwen-0.5B-Instruct running in the browser.

It gets about 50 tokens per second on my 2024 Mac Pro. Super fast!
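For context on what a number like that means in practice, here is a small sketch of the arithmetic. The function names and the sample measurement (100 tokens in 2000 ms) are hypothetical, chosen only so the result lands on 50 tokens per second.

```typescript
// Tokens per second from a token count and elapsed wall-clock milliseconds.
function tokensPerSecond(tokenCount: number, elapsedMs: number): number {
  if (elapsedMs <= 0) throw new RangeError("elapsedMs must be positive");
  return tokenCount / (elapsedMs / 1000);
}

// Estimated seconds to generate a reply of a given length at a given rate.
function secondsForReply(tokenCount: number, rate: number): number {
  if (rate <= 0) throw new RangeError("rate must be positive");
  return tokenCount / rate;
}

// Hypothetical measurement: 100 tokens generated in 2000 ms.
const rate = tokensPerSecond(100, 2000); // 50 tokens per second
const perTokenMs = 1000 / rate; // 20 ms per token
const replySeconds = secondsForReply(200, rate); // 4 seconds for 200 tokens

console.log({ rate, perTokenMs, replySeconds });
```

At 50 tokens per second, each token arrives about every 20 ms, so a short reply streams in faster than most people read.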

The Instant Qwen demo runs in the browser in real time. It’s powered by a Qwen-0.5B-Instruct model running on MLC WebLLM.
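This is not the demo’s actual code, but a minimal sketch of the streaming pattern a demo like this relies on: ask for a streamed chat completion and push each piece of text to the UI as it arrives. The interfaces below are assumptions written for this sketch, modeled on the OpenAI-style chat API that WebLLM exposes; `streamReply` is a hypothetical helper name.

```typescript
// Minimal structural types for an OpenAI-style streaming chat API.
// These are assumptions written for this sketch, not imported library types.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface StreamChunk {
  choices: Array<{ delta: { content?: string | null } }>;
}

interface StreamingChatEngine {
  chat: {
    completions: {
      create(request: {
        messages: ChatMessage[];
        stream: true;
      }): Promise<AsyncIterable<StreamChunk>>;
    };
  };
}

// Streams a reply, calling onText with each new piece so the UI can update
// as soon as text arrives. Returns the full reply once the stream ends.
async function streamReply(
  engine: StreamingChatEngine,
  prompt: string,
  onText: (piece: string) => void,
): Promise<string> {
  const stream = await engine.chat.completions.create({
    messages: [{ role: "user", content: prompt }],
    stream: true,
  });
  let reply = "";
  for await (const chunk of stream) {
    // Some chunks carry no text (for example, the final one), so skip those.
    const piece = chunk.choices[0]?.delta.content ?? "";
    if (piece) {
      reply += piece;
      onText(piece);
    }
  }
  return reply;
}
```

With WebLLM, an engine of roughly this shape would come from `CreateMLCEngine` in the `@mlc-ai/web-llm` package. The model ID to pass depends on the prebuilt model list of the installed version, so check that list rather than guessing a name.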

The Future

I think we will see more and more smol models in the future. They will be silently embedded in your browser, PDF readers, and other apps.

It won’t even feel like you’re using AI. It will just be there summarizing things, helping you write, and answering questions.