Intelligence at Scale

Intelligence too cheap to meter changes things. You should probably be using small language models for your enterprise and data tasks.


tl;dr: it’s better to throw 10M addresses at an extremely cheap LLM than at any conventional parser. There are probably dozens of other tasks like this hiding in your business.

It’s the intelligence age. Hyperscalers are burning billions to train ever-larger models. It’s July as I write this, and Moonshot AI just dropped a state-of-the-art open-weights model: Kimi-K2-Instruct at a cool 1T parameters.

Most of the noise is about pushing the frontier: PhD-level models that ace math proofs and write code better than most engineers. Far fewer people are talking about the small models, especially the open-weights ones that are basically free at this point.

I recently had to parse 10M addresses for a side project. Addresses, as you know, are sloppy, messy, incomplete, and nowhere near as standardized as you’d hope. Traditionally you had two choices:

  1. Roll your own parser (a terrible idea)
  2. Use something like libpostal

What is libpostal?

Libpostal is an open-source library that parses and normalizes street addresses worldwide. It’s a tiny model trained on ~1B addresses across 230+ countries.

Example:

Input: 48 1/2 Grant St, Unit A, St Augustine, FL 32084

Output:

```json
{
  "house_number": "48 1/2",
  "road": "Grant St",
  "unit": "Unit A",
  "city": "St Augustine",
  "state": "FL",
  "postcode": "32084"
}
```
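
If you want to try this yourself, libpostal ships Python bindings via the pypostal package. Here’s a minimal sketch, assuming the underlying C library and its data files are already installed:

```python
# Minimal sketch using libpostal's Python bindings (pypostal).
# Assumes the libpostal C library and its trained data files are installed.
from postal.parser import parse_address

components = parse_address("48 1/2 Grant St, Unit A, St Augustine, FL 32084")
# parse_address returns (value, label) tuples, e.g. ('48 1/2', 'house_number')
parsed = {label: value for value, label in components}
print(parsed)
```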

Experience with libpostal

In practice, the model isn’t as accurate as the docs suggest. You still have to download a multi-gigabyte model and dataset, figure out how to serve it, and so on. Infrastructure wasn’t a blocker (I just spun up serverless containers on Modal), but the results were rough: postal codes wrong, unit numbers dumped into the road field, entire states missing.

I dug deeper and found Senzing Inc.’s improved model. Better, but it still missed the long tail of weird addresses I was seeing.

Using an LLM

So I asked: what if I just throw an LLM at it? Overkill, sure, but let’s do the napkin math.

Prompt + address ≈ 90 input tokens
JSON reply ≈ 30 output tokens
10M addresses → 900M input tokens and 300M output tokens:

| Model | Input (900M) | Output (300M) | Total Cost |
| --- | --- | --- | --- |
| Claude Sonnet | $2,700 | $4,500 | $7,200 |
| GPT-4.1 | $1,800 | $2,400 | $4,200 |
| Gemini 2.5 Flash | $270 | $750 | $1,020 |
| liquid/lfm-7b | $9 | $3 | $12 |

I tossed in the big proprietary names plus liquid/lfm-7b (shout-out to Maxime Labonne at Liquid AI). The scale is staggering: twelve bucks to chew through 10M addresses, probably an order of magnitude cheaper than wrangling the old-school parsers, and that’s before you factor in your time, auto-scaling to zero, batching, etc.
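
If you want to sanity-check the table, here’s the napkin math as code. The per-million-token rates are back-calculated from the table itself, so treat them as point-in-time assumptions rather than official pricing:

```python
# Napkin math: 10M addresses x (90 input + 30 output) tokens each.
# Rates are ($/1M input tokens, $/1M output tokens), back-calculated
# from the table above; treat as assumptions, not official price quotes.
ADDRESSES = 10_000_000
IN_TOKENS, OUT_TOKENS = 90, 30

rates = {
    "Claude Sonnet": (3.00, 15.00),
    "GPT-4.1": (2.00, 8.00),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "liquid/lfm-7b": (0.01, 0.01),
}

for model, (r_in, r_out) in rates.items():
    cost_in = ADDRESSES * IN_TOKENS / 1e6 * r_in
    cost_out = ADDRESSES * OUT_TOKENS / 1e6 * r_out
    print(f"{model}: ${cost_in:,.0f} in + ${cost_out:,.0f} out = ${cost_in + cost_out:,.0f}")
```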

To put this in perspective: liquid/lfm-7b costs just $0.01 per 1M tokens, input or output. That’s borderline insane. We’re talking about a model that can parse, reason, and format data at a cost so low it’s basically free. At those prices, you start questioning whether it’s even worth building traditional parsers anymore.
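
For completeness, here’s roughly what the per-address call looks like. A minimal sketch assuming an OpenAI-compatible endpoint (OpenRouter here); the prompt wording and setup are illustrative, not a canonical recipe:

```python
# Sketch of the LLM-as-parser approach via an OpenAI-compatible API.
# Assumes an OpenRouter account and API key; prompt wording is illustrative.
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",  # supplied by you
)

PROMPT = (
    "Parse this address into JSON with keys house_number, road, unit, "
    "city, state, postcode. Reply with JSON only.\n\nAddress: {addr}"
)

def parse(addr: str) -> dict:
    resp = client.chat.completions.create(
        model="liquid/lfm-7b",
        messages=[{"role": "user", "content": PROMPT.format(addr=addr)}],
        temperature=0,  # keep output as deterministic as possible
    )
    # In production you'd validate the reply and strip any stray code fences.
    return json.loads(resp.choices[0].message.content)

print(parse("48 1/2 Grant St, Unit A, St Augustine, FL 32084"))
```

At 10M rows you’d obviously wrap this in async batching or a provider’s batch API, but the shape of the call doesn’t change.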

That’s the magic: take a tiny, dirt-cheap model and apply its intelligence at scale. There are hundreds of tasks inside most enterprises that are still done manually, less accurately, or at far higher cost. Overkill? Maybe. But if it costs pennies and nails the accuracy, who cares.