
Our platform runs 11 AI subagents to automate development workflows — code review, test generation, security analysis, and documentation. All defaulted to Claude Sonnet 4.6 at $3/$15 per 1M tokens. That became our biggest AI cost line, and most of it was unnecessary.
This post describes the hybrid architecture we designed to fix it.
The problem
Subagents like code-simplifier, docs-maintainer, and pm-issue-writer perform small, structured prose tasks. They receive a diff or a lint report, produce a structured text output, and stop. Running them on Sonnet is like using a staff engineer to fill out a form.
Meanwhile, code-generator spent Sonnet tokens on file I/O loops that a CLI tool backed by OpenRouter can handle at 10× lower cost.
The OPENROUTER_API_KEY in our .env.local is a paid key. No free-tier rate limits. That unlocked the migration.
Provider inventory
Verified prices as of 2026-04-22:
| Model | $/1M in | $/1M out | Best for |
|---|---|---|---|
deepseek/deepseek-v3.2 |
0.26 | 0.38 | Code generation, refactor |
qwen/qwen3-coder-next |
0.15 | 0.80 | TypeScript, test generation |
x-ai/grok-4.1-fast |
0.20 | 0.50 | Fast parallel review |
google/gemini-2.5-flash |
0.075 | 0.30 | Docs, ADRs, prose |
google/gemini-2.5-flash-lite |
0.037 | 0.15 | Lint classification, small calls |
| Claude Haiku 4.5 | 1.00 | 5.00 | Orchestration, validation gates |
| Claude Sonnet 4.6 | 3.00 | 15.00 | High-blast-radius tasks only |
Four patterns
Pattern A — Skill replaces subagent
For pure analytical tasks whose input is a small artifact (diff, lint output) and whose output is a structured blob (report, suggestion list).
parent Claude → Skill (thin bash) → HTTP POST to OpenRouter → output
No agentic loop, no Sonnet at all. The /fix-review skill already does this — three OpenRouter model rounds, one Sonnet Arbiter. code-simplifier is the first to migrate via Pattern A.
Pattern B — Haiku orchestrator + OpenRouter worker
For agents that need a validation gate: generate output with a cheap model, then have Haiku verify shape or run tests.
Haiku subagent
→ OpenRouter model generates candidate
→ Haiku validates + runs verifier
→ success: commit | failure: retry
Applies to test-generator, docs-maintainer, pm-issue-writer, security-reviewer, static-analysis.
Pattern C — Haiku orchestrator + CLI worker
When the task requires real file I/O and iterative tool use (multi-file edits, build loops), delegate to a CLI with a built-in tool-use loop.
Haiku subagent
→ codex exec / opencode run (via openrouter/deepseek/deepseek-v3.2)
→ Haiku reviews diff + lint + test output before commit
Applies to code-generator. We already validated this pattern during a session where Claude’s subagent quota was exhausted — five PRs shipped cleanly via codex exec.
Pattern D — Keep Sonnet
For agents where project-specific conventions matter more than cost, and where mistakes are high-blast-radius.
Stays on Sonnet: bug-fixer, migration-generator, project-devops.
Decision matrix
| Agent | Pattern | Primary model |
|---|---|---|
code-generator |
C | openrouter/deepseek/deepseek-v3.2 via codex exec |
test-generator |
B | openrouter/qwen/qwen3-coder-next |
code-simplifier |
A | openrouter/deepseek/deepseek-v3.2 |
security-reviewer |
B | chorus review (3 OR models parallel) |
static-analysis |
B | openrouter/google/gemini-2.5-flash-lite |
docs-maintainer |
B | openrouter/google/gemini-2.5-flash |
pm-issue-writer |
B | openrouter/google/gemini-2.5-flash |
ci-build-agent |
B (Haiku-only) | Haiku 4.5 |
bug-fixer |
D | Sonnet 4.6 |
migration-generator |
D | Sonnet 4.6 |
project-devops |
D | Sonnet 4.6 |
Migration order
code-simplifier→ Pattern A. Smallest scope, baseline measurement.static-analysis→ Pattern B.docs-maintainer→ Pattern B. Prose on Gemini Flash.pm-issue-writer→ Pattern B.security-reviewer→ Pattern B wrapped around existingchorus review.test-generator→ Pattern B with test execution loop.code-generator→ Pattern C. Largest, highest-risk.ci-build-agent→ Pattern B (Haiku-only).
Each step ships as its own PR, followed by 3–5 real-task invocations before rolling to the next agent. Stop condition: success rate drops >10% vs Sonnet baseline.
Risks worth knowing
Haiku has lower context awareness. Prompts must inline the most relevant conventions rather than assume the agent knows the codebase. For each migration, expand the system prompt to include the exact rules the agent needs.
OpenRouter model drift. Pin model strings explicitly. A model ID that works today may swap its back-end next week.
CLI sandboxing. codex exec needed -s danger-full-access on this host (bubblewrap failure). Each CLI migration needs a smoke test.
Free vs paid confusion. Free :free tiers have silent rate limits. Always use paid openrouter/* model IDs.
Don’t delete Sonnet agent definitions on rollback. If a migrated agent regresses, keep the Sonnet definition in place for downstream callers and revert the model assignment only.
Expected outcome
8 of 11 subagents on Haiku + OpenRouter delegate. 3 agents remain on Sonnet. Target: 70–85% reduction in subagent spend with equivalent quality on analytical tasks and gated quality on generative ones — measured over a rolling 7-day window.