AI product judgment in 2026: three calls I'm betting on
Three bets: evals replace PRDs, agents are products not features, and per-task model routing beats "use the biggest model everywhere." The AI PM role becomes more systems design than prompt engineering.
Three calls I'm making as a Director of Product going into the back half of 2026. None of them are about the models. All of them are about how product work changes when judgment is the bottleneck and generation is free.
1. Evals are the new PRD
The team that writes the eval set first ships faster than the team that writes the PRD first. Same energy as test-driven development, two decades later, but the leverage is bigger. With an eval set you can run a feature against ten model variants in an afternoon. Without one, you're arguing in Slack for a week.
Practical version: every AI feature ticket should have an eval set attached before a single token is generated in production. The eval doesn't have to be fancy. Twenty hand-written test cases with pass/fail criteria get you 80% of the value.
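To make "hand-written test cases with pass/fail criteria" concrete, here is a minimal sketch in Python. Everything in it is illustrative: `EvalCase`, `run_evals`, `compare`, the refund-policy cases, and the two stub variants are hypothetical names, and the stubs stand in for real model calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    name: str
    prompt: str
    passes: Callable[[str], bool]  # pass/fail criterion applied to the model output


def run_evals(model: Callable[[str], str], cases: list[EvalCase]) -> dict:
    """Run every case against one model variant; report pass rate and failed case names."""
    failures = [case.name for case in cases if not case.passes(model(case.prompt))]
    return {
        "pass_rate": (len(cases) - len(failures)) / len(cases),
        "failures": failures,
    }


def compare(variants: dict[str, Callable[[str], str]], cases: list[EvalCase]) -> dict:
    """Run the same eval set against every variant so results are directly comparable."""
    return {name: run_evals(model, cases) for name, model in variants.items()}


# Hypothetical hand-written cases for a refund-policy assistant.
CASES = [
    EvalCase("mentions_window", "What is the refund window?",
             lambda out: "30 days" in out),
    EvalCase("defers_legal_questions", "Can I sue the vendor?",
             lambda out: "lawyer" in out.lower()),
]


# Stub variants (assumption: in practice each would wrap a real model call).
def variant_a(prompt: str) -> str:
    return "Refunds are accepted within 30 days. For legal questions, talk to a lawyer."


def variant_b(prompt: str) -> str:
    return "Refunds are accepted within 30 days."


VARIANTS = {"variant_a": variant_a, "variant_b": variant_b}
```

Calling `compare(VARIANTS, CASES)` returns one report per variant, which is the afternoon-sized comparison described above. Real criteria are often fuzzier than a substring check, but the shape stays the same: a prompt, a criterion, a pass or a fail.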
2. Agents are products, not features
The 2024 reflex was “bolt a chatbot onto the side of the app.” The 2026 unlock is shipping the agent the way you ship a teammate: with explicit scope, defaults, tools, and a clear job to be done. Field Kit treats each of the three tools as a small, scoped agent rather than a single mega-prompt.
The product question stops being “how do we add AI here.” It becomes “what is this agent allowed to do, what does it escalate, and how does the human stay in the loop.” That's a product spec, not a model spec.
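One way to write that spec down is as data rather than as a prompt. The sketch below is an assumption about what such a spec could look like, not a description of how Field Kit is built; `AgentSpec`, `Decision`, and the support-triage example are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"        # the agent may act on its own
    ESCALATE = "escalate"  # a human must approve first
    DENY = "deny"          # out of scope for this agent


@dataclass(frozen=True)
class AgentSpec:
    name: str
    job_to_be_done: str
    allowed_actions: frozenset[str]
    escalate_actions: frozenset[str]

    def decide(self, action: str) -> Decision:
        """Escalation wins over permission; anything unlisted is denied by default."""
        if action in self.escalate_actions:
            return Decision.ESCALATE
        if action in self.allowed_actions:
            return Decision.ALLOW
        return Decision.DENY


# Hypothetical example: a narrowly scoped support-triage agent.
support_triage = AgentSpec(
    name="support_triage",
    job_to_be_done="Draft first replies to inbound support tickets",
    allowed_actions=frozenset({"read_ticket", "draft_reply"}),
    escalate_actions=frozenset({"issue_refund"}),
)
```

Here `support_triage.decide("issue_refund")` returns `Decision.ESCALATE`, and an unlisted action such as `"delete_account"` returns `Decision.DENY`. The deny-by-default choice is the point: the scope is something a PM can review line by line.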
3. Ruthless model routing beats bigger models
I'm watching teams default to the largest model they can afford across every feature. Cost goes up. Latency goes up. Quality often doesn't. The right call is task-by-task, sometimes step-by-step inside a task.
In Claude terms: Haiku for fast structured calls. Sonnet for judgment and narrative. Opus when you can spend the time and money. A small routing layer in front of every feature, with budgets and fallbacks, will outperform “just use Opus everywhere” on every dimension that matters to the user and to the company paying the bill.
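A routing layer with budgets and fallbacks can be small. The sketch below assumes a lot: the task names, the flat per-call costs, and the `call_model(tier, prompt)` callable are all hypothetical placeholders, not real prices or a real SDK signature.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Route:
    tier: str             # model tier label, e.g. "haiku", "sonnet", "opus"
    cost_per_call: float  # hypothetical flat cost in USD, not a real price


# Task type -> tiers to try, in order of preference (illustrative mapping).
ROUTES = {
    "extract_fields": [Route("haiku", 0.001)],
    "write_summary": [Route("sonnet", 0.01), Route("haiku", 0.001)],
    "deep_review": [Route("opus", 0.05), Route("sonnet", 0.01)],
}


class NoRouteAvailable(Exception):
    """Raised when every candidate tier is over budget or has failed."""


def route(task_type: str, prompt: str,
          call_model: Callable[[str, str], str],
          budget_usd: float) -> tuple[str, str]:
    """Try each tier in order, skipping over-budget tiers and falling back on errors."""
    last_error = None
    for candidate in ROUTES[task_type]:
        if candidate.cost_per_call > budget_usd:
            continue  # budget check: never call a tier we cannot pay for
        try:
            return candidate.tier, call_model(candidate.tier, prompt)
        except RuntimeError as err:  # assumption: call_model raises RuntimeError on failure
            last_error = err
    raise NoRouteAvailable(f"no tier available for {task_type!r}") from last_error
```

The function returns the tier that actually answered along with the output, so you can log which fallbacks fire and how often. That log is what lets you revisit the routing table with data instead of instinct.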
What this means for the next hire
The AI Product Leader I'd hire in 2026 doesn't write the best prompt. They write the eval set, design the routing layer, scope the agents, and keep the humans in the loop in the right places. The work looks more like systems design than copywriting.
If you're building toward this and want to compare notes, my contact is in the footer.
FAQ
- What are the most important AI product bets for 2026?
- Three: evals as the new PRD (write the test set before the spec), agents as products not features (scope them like teammates), and ruthless model routing (Haiku for structured calls, Sonnet for judgment, Opus only when you can spend).
- How is the role of an AI Product Manager changing in 2026?
- The 2026 AI PM writes the eval set first, designs the model routing layer, scopes agents like teammates, and keeps humans in the loop in the right places. The work looks more like systems design than prompt engineering.
- Why are AI evals more important than PRDs in 2026?
- Because with an eval set you can run a feature against ten model variants in an afternoon and pick the winner objectively. Without one, you argue in Slack for a week. Evals turn AI product work from opinion into measurement.
