4 min read

Why I picked Haiku for Field Kit's Decision Simulator

For structured generation against a schema, Haiku 4.5 beats Sonnet on first-token latency by ~700ms and costs roughly 5x less. Pick a model per task, not per product.

Field Kit's Decision Simulator takes a binary product question and returns a structured score across inferred dimensions, plus the conditions under which the call flips. It needs to feel fast and arrive reasoned. I picked Claude Haiku 4.5 over Sonnet 4.6 and Opus 4.7. Here's the call.

The task shape decides the model

Decision Simulator does three things in sequence: infer 3-5 evaluation dimensions from the user's question, score each option against each dimension, and surface the flip conditions. All three are structured generation tasks. The reasoning lives in the prompt, not in the model.

When the task is “follow a schema and fill it in with sensible defaults,” you don't need the heaviest model. You need the fastest one that can reliably hit your Zod schema and write copy that sounds like a human who understood the question.
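To make that concrete, here's a minimal sketch of the kind of Zod schema the simulator fills in. The field names are illustrative guesses, not Field Kit's actual schema:

```ts
import { z } from "zod";

// Hypothetical shape of a Decision Simulator result. Illustrative only,
// not Field Kit's actual schema.
const DimensionScore = z.object({
  dimension: z.string(),                // e.g. "time to market"
  optionA: z.number().min(0).max(10),   // score for the first option
  optionB: z.number().min(0).max(10),   // score for the second option
  rationale: z.string(),                // one sentence that sounds like a human wrote it
});

const DecisionResult = z.object({
  dimensions: z.array(DimensionScore).min(3).max(5), // inferred from the user's question
  recommendation: z.enum(["optionA", "optionB"]),
  flipConditions: z.array(z.string()),  // what would have to change for the call to flip
});

type DecisionResult = z.infer<typeof DecisionResult>;
```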

The numbers I cared about

  • First-token latency. Users abandon if nothing happens within ~800ms. Haiku 4.5 streams its first token in ~400-600ms vs. ~1.1-1.5s for Sonnet on the same input length.
  • Cost per run. Haiku is roughly 5x cheaper than Sonnet, ~25x cheaper than Opus. Field Kit is on a portfolio site, not an enterprise contract. Latency wins, and so does my Vercel bill.
  • Schema compliance. I ran 50 test prompts through each model. Haiku 4.5 hit the schema cleanly 49 times; Sonnet and Opus both went 50 for 50. For this use case, one retry on the single miss is fine (a sketch of that retry wrapper follows this list).
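
The retry itself is a few lines. A sketch, reusing the DecisionResult schema from above and assuming a hypothetical callHaiku() helper in place of the real API call:

```ts
// callHaiku() is a stand-in for the real Anthropic API call, not actual Field Kit code.
declare function callHaiku(prompt: string): Promise<string>;

async function runWithRetry(prompt: string, maxAttempts = 2): Promise<DecisionResult> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const raw = await callHaiku(prompt);
      return DecisionResult.parse(JSON.parse(raw)); // throws if the output misses the schema
    } catch (err) {
      lastError = err; // one retry covers the rare miss
    }
  }
  throw lastError;
}
```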

Where Sonnet did win

The other two Field Kit tools, Experience Mapper and Compass Check, both use Sonnet 4.6. Experience Mapper has to do retrieval-augmented reasoning over my portfolio data and produce a multi-paragraph narrative. Compass Check returns a one-sentence judgment that has to land emotionally. Both reward the bigger model.

The lesson: don't pick a model per product. Pick one per task. Most teams default to the biggest model they can afford and overpay in latency and cost. The right call is per-task, often per-step inside a task.

What I'd do differently

If I were starting today, I'd also test Haiku 4.5 with a small caching layer on the dimension-inference step. Most product questions cluster (build vs. buy, ship vs. wait, hire vs. consult), so the dimensions repeat. A 100-entry semantic cache would probably cut the first leg to near-zero latency for warm queries.
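
Roughly what I have in mind, assuming a hypothetical embed() helper and plain cosine similarity over an in-memory array; none of this is in Field Kit today:

```ts
// A hypothetical semantic cache for the dimension-inference step. Sketch only.
declare function embed(text: string): Promise<number[]>; // e.g. a small embedding model

type CacheEntry = { vector: number[]; dimensions: string[] };
const cache: CacheEntry[] = [];
const MAX_ENTRIES = 100;
const SIMILARITY_THRESHOLD = 0.9; // above this, two questions count as "the same"

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function cachedDimensions(
  question: string,
  infer: (q: string) => Promise<string[]>, // the existing Haiku dimension-inference call
): Promise<string[]> {
  const vector = await embed(question);
  const hit = cache.find((entry) => cosine(entry.vector, vector) >= SIMILARITY_THRESHOLD);
  if (hit) return hit.dimensions;            // warm query: skip the model call entirely
  const dimensions = await infer(question);  // cold query: fall through to Haiku
  if (cache.length >= MAX_ENTRIES) cache.shift(); // drop the oldest entry
  cache.push({ vector, dimensions });
  return dimensions;
}
```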

Field Kit's model panel surfaces all of this live: which model ran, how long it took, what it cost. The judgment is the product.
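
The panel's payload is small. Something like this shape, which is my guess at the fields rather than Field Kit's actual types:

```ts
// My guess at the per-run telemetry the panel renders; not Field Kit's actual types.
interface ModelRunInfo {
  model: string;        // e.g. "claude-haiku-4-5"
  firstTokenMs: number; // time to first streamed token
  totalMs: number;      // wall clock for the full response
  costUsd: number;      // computed from input and output token counts
}
```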

FAQ

Why use Claude Haiku 4.5 instead of Sonnet or Opus for a product decision tool?
Because the task is structured generation against a Zod schema, not open-ended reasoning. Haiku 4.5 hits the schema cleanly, streams first token in 400-600ms, and costs about 5x less than Sonnet. Users abandon slow tools. Haiku wins.
When should an AI product team pick Sonnet over Haiku?
When the task needs multi-step reasoning, retrieval over a corpus, or copy that has to land emotionally. In Field Kit, Experience Mapper and Compass Check both use Sonnet for those exact reasons.
How do you decide which Claude model fits a product feature?
Pick per task, not per product. Structured generation: Haiku. Narrative reasoning over a corpus: Sonnet. Long-context judgment or research: Opus. Most teams over-route to the biggest model and pay for it in latency and cost.