On April 16, 2026, Anthropic shipped Claude Opus 4.7. Within the week, three managers asked me whether they should migrate their agent. The honest answer: probably not for most of you, maybe yes if you do code. And above all: the choice of LLM is rarely what makes or breaks an SMB agent.

Here's what I look at when a client asks me "Claude, GPT or Gemini?" — and why it's not always the right question.

The 3 models in one sentence (April 2026)

| Model | Positioning |
|---|---|
| Claude Opus 4.7 (Anthropic, shipped April 16, 2026) | Best at code and long-running agents; the most expensive; the most professional tone. |
| GPT-5.4 (OpenAI, shipped March 5, 2026) | First to surpass the human expert on OSWorld (computer use); very balanced. |
| Gemini 3.1 Pro (Google, available since March 2026) | The cheapest; huge context (1M tokens); excellent on multimodal. |

The three are within a few points of each other on 80% of tasks. Differences show up on edge cases: long-running coding agents, computer use, huge document volumes.

Benchmarks — what we can compare cleanly

| Metric | Opus 4.7 | GPT-5.4 | Gemini 3.1 Pro |
|---|---|---|---|
| SWE-bench Verified (code) | 87.6% | 82.0% | 80.6% |
| OSWorld-Verified (computer use) | 78.0% | 75.0% | 72.1% |
| Max context | 200k tokens (1M preview) | 400k tokens | 1M tokens |
| Input / output price (USD / 1M tokens) | 15 / 75 | 2.50 / 15 | 1.25 / 10 |
| Typical latency (500 output tokens) | ~2.5s | ~1.8s | ~1.5s |

Beware of benchmarks. They measure standardised tasks. On your specific use case, the real delta between the 3 is often 3 to 5 points, not 20.

Case 1 — Level-1 support agent (volume, cost)

A support agent handling 5,000 tickets/month, with 800 tokens average prompt and 400 tokens response. Total volume: 5M input tokens + 2M output tokens per month.

| Model | Monthly API cost | Perceived latency |
|---|---|---|
| Opus 4.7 | 5 × $15 + 2 × $75 = $225/month | ~2.5s |
| GPT-5.4 | 5 × $2.50 + 2 × $15 = $42.50/month | ~1.8s |
| Gemini 3.1 Pro | 5 × $1.25 + 2 × $10 = $26.25/month | ~1.5s |
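The monthly figures above are a straight volume × price calculation. A quick sketch to reproduce them (the per-1M-token rates come from the benchmark table; the model keys are just labels for this example):

```python
# USD per 1M tokens, from the pricing table above.
PRICES = {
    "opus-4.7":       {"input": 15.00, "output": 75.00},
    "gpt-5.4":        {"input": 2.50,  "output": 15.00},
    "gemini-3.1-pro": {"input": 1.25,  "output": 10.00},
}

def monthly_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    """API cost in USD for a monthly volume given in millions of tokens."""
    p = PRICES[model]
    return input_mtok * p["input"] + output_mtok * p["output"]

# 5,000 tickets/month x (800 input + 400 output tokens) = 5M in + 2M out
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 5, 2):.2f}/month")
```

Running it with any other volume (say, 20,000 tickets/month) immediately shows when the price gap starts to hurt.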

On level-1 support, Opus 4.7 is almost always overkill. Response quality between the three is indistinguishable for the end customer, but cost ranges from $26 to $225/month. I default to Gemini 3.1 Pro or GPT-5.4.

The exception: legal, medical or highly technical support — there, Opus 4.7's rigour on phrasing and its refusal to hallucinate procedures matter. At that point, paying 5 to 8 times more is acceptable.

Case 2 — Document extraction (precision before cost)

An agent extracting 8 key fields from 15-40 page PDF contracts. Volume: 300 contracts/month, average context 30k tokens per contract.

Here, extraction precision is everything. One error on a contract costs more than a month of API. I measured on a client case (anonymised, 200 contracts tested):

| Model | 8-field extraction precision | Cost / 300 contracts |
|---|---|---|
| Opus 4.7 | 96.5% | ~$180/month |
| GPT-5.4 | 94.2% | ~$30/month |
| Gemini 3.1 Pro | 93.1% | ~$15/month |

The 96.5% vs 94.2% gap looks ridiculous. In practice: on 300 contracts × 8 fields = 2,400 extractions/month, Opus makes 84 errors, GPT-5.4 makes 139 and Gemini 165. If each error costs 5 minutes of human correction at €55/h (legal counsel), Gemini's human overhead vs Opus comes to (165 − 84) × 5 min × €55/60 = €371/month.
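That correction-cost arithmetic is worth automating when you compare models on your own evaluation set. A minimal sketch, using the figures from this case (the precision values and hourly rate are this client's numbers, not universal constants):

```python
# Human-correction overhead vs the most precise model.
EXTRACTIONS = 300 * 8  # 300 contracts x 8 fields = 2,400 extractions/month
PRECISION = {"opus-4.7": 0.965, "gpt-5.4": 0.942, "gemini-3.1-pro": 0.931}

def errors(model: str) -> int:
    """Monthly error count, truncated to match the article's figures."""
    return int(EXTRACTIONS * (1 - PRECISION[model]))

def overhead_vs_opus(model: str, min_per_error: int = 5,
                     rate_eur_h: float = 55.0) -> float:
    """EUR/month of human correction for this model's extra errors vs Opus."""
    extra = errors(model) - errors("opus-4.7")
    return extra * min_per_error * rate_eur_h / 60
```

Plugging in Gemini gives the €371/month figure from the text; add the API bills on top and you get the real total cost of each model.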

Result: Opus 4.7 becomes profitable as soon as precision matters. On this specific case, I picked Opus, and the client never went back.

Case 3 — Internal coding / DevOps agent

For an agent modifying code or doing automated debugging, Opus 4.7 crushes the other two. SWE-bench Verified at 87.6%, nearly 7 points above GPT-5.4. In practice on a properly scoped dev agent, this means PRs passing CI twice as often without a human in the loop.

But careful: a coding agent in an SMB is rare. 95% of my agent engagements aren't code. Paying for Opus to do extraction or support is like buying an F1 car to go pick up bread.

What's (really) changing in 2026

The landscape has flattened. In 2024, gaps of 20 to 30 points between the best and worst LLM on standard tasks were normal. In 2026, on typical SMB use cases, the three candidates do the job 90-95% the same. Differences come down to:

  • Price (factor 10 between Gemini and Opus).
  • Latency (factor 2).
  • Context (factor 5).
  • Specific cases (coding for Opus, computer use for GPT-5.4 and Opus, multimodal for Gemini).

The choice of model must not determine your agent's architecture. A good agent must be able to swap LLM in 2 config lines. I insist on this with every client: you don't build "a Claude agent" or "a GPT agent", you build an agent that uses an LLM — interchangeable. When Opus 5 ships in 4 months, you'll want to switch without redoing 6 weeks of work.
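What "swap LLM in 2 config lines" means concretely: the agent depends on a thin interface, never on a vendor SDK directly. A minimal sketch — the class names, model identifiers and `complete` method are illustrative, not a specific library:

```python
from dataclasses import dataclass

@dataclass
class LLMConfig:
    provider: str  # "anthropic" | "openai" | "google"
    model: str     # model identifier, e.g. "claude-opus-4-7" (placeholder)

class LLMClient:
    """The only LLM surface the agent ever sees."""
    def __init__(self, config: LLMConfig):
        self.config = config

    def complete(self, prompt: str) -> str:
        # Dispatch to the vendor SDK for self.config.provider.
        # The rest of the agent never imports a vendor SDK.
        raise NotImplementedError

# The "2 config lines" — switching models means editing only these:
config = LLMConfig(provider="anthropic", model="claude-opus-4-7")
agent_llm = LLMClient(config)
```

When the next model ships, you change `provider` and `model`, rerun your evaluations, and ship — no rewrite of prompts, tools or orchestration.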

What systematically goes wrong

1. Picking the LLM before scoping the use case. "We want Claude" or "we want GPT" comes up in 1 brief out of 3. That's the wrong starting point. First the use case, evaluations, volumes — then the LLM that fits.

2. Picking the most expensive model "to be safe". I've seen SMBs burn $2,000/month on Opus 4.7 when a Gemini 3.1 Pro at $150/month would have done the same job on their specific case. Rule: always start with the cheapest, move up only if evaluations justify it.

3. Ignoring sovereignty. All three are hosted outside the EU (mainly US). If your data is sensitive, you can go through Bedrock (Europe) for Anthropic, Azure France for OpenAI, or a self-hosted open source model (Mistral Large, Llama 3.3). See the security checklist I apply on every engagement.

Decision grid — 3 questions

Question 1 — How critical is precision?

  • Standard (support, FAQ, summary): Gemini 3.1 Pro or GPT-5.4.
  • High (extraction, legal, medical): Opus 4.7.
  • Maximum with human validation: Opus 4.7 with systematic human-in-the-loop.

Question 2 — What monthly volume?

  • Less than 10M tokens/month: doesn't matter, pick on quality.
  • 10 to 100M tokens/month: Gemini or GPT-5.4, unless specific case.
  • More than 100M tokens/month: cost out each model, the bill changes everything.

Question 3 — Code or computer use?

  • Yes: Opus 4.7.
  • No: any of the three.
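The three questions above fit in a dozen lines. A sketch of the grid as a function — the return strings are labels for this article's candidates, and the thresholds are the rules of thumb stated above, not hard limits:

```python
def pick_model(precision: str, monthly_mtok: float,
               code_or_computer_use: bool) -> str:
    """Encode the 3-question decision grid.

    precision: "standard" | "high" | "maximum"
    monthly_mtok: monthly volume in millions of tokens
    """
    if code_or_computer_use:
        return "opus-4.7"
    if precision in ("high", "maximum"):
        return "opus-4.7"  # "maximum" also implies human-in-the-loop
    if monthly_mtok > 100:
        return "cost out each model"  # at this volume, the bill decides
    return "gemini-3.1-pro or gpt-5.4"
```

It's deliberately crude: the point is that the LLM choice is a short, auditable decision, not a matter of brand preference.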

Where to start

If you have an AI agent that works but costs too much, or if you're hesitating on the LLM to pick for a new project, 30 minutes are enough. I look at your case, your volume, your evaluations — and tell you frankly which model you should use, with the figures. I never push Opus if a Gemini does the job.


To go further