Your AI System's Biggest Cost Driver Isn't the Model


Your AI system's biggest cost driver isn't the model. It's what you put in the prompt.

On an AI assistant I worked on, every organization got its own Bedrock Agent with a dedicated knowledge base and action groups. The decision that mattered wasn't which model we picked. It was what context went into each request.

Context = cost. A single query pulled 3 knowledge base chunks + 2 action group responses + conversation history. Well-scoped retrieval vs. naively dumping everything into the prompt: a 3-4x difference in token cost per request.
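Back-of-the-envelope math makes the gap concrete. A minimal sketch, with hypothetical token counts and a placeholder per-token price (the real numbers depend on your chunk sizes, model, and region):

```python
# Rough per-request input cost: scoped retrieval vs. "dump everything".
# All numbers below are illustrative placeholders, not measured values.

PRICE_PER_1K_INPUT_TOKENS = 0.003  # placeholder rate; check your model's pricing

def request_cost(chunk_tokens, num_chunks, action_group_tokens, history_tokens):
    total = chunk_tokens * num_chunks + action_group_tokens + history_tokens
    return total / 1000 * PRICE_PER_1K_INPUT_TOKENS

scoped = request_cost(chunk_tokens=400, num_chunks=3, action_group_tokens=600, history_tokens=1000)
naive  = request_cost(chunk_tokens=400, num_chunks=12, action_group_tokens=600, history_tokens=4000)

print(f"scoped: ${scoped:.4f}/request, naive: ${naive:.4f}/request ({naive / scoped:.1f}x)")
```

With these made-up numbers the naive request costs about 3.4x the scoped one, and the gap compounds across every request your agents handle.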

Context = latency. Every RAG retrieval and action group call adds round-trip time before the model starts generating. Serving structured data through action groups (fast, deterministic) instead of inlining it in the prompt (slow, token-expensive) was the key design decision.
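In Bedrock terms, scoping retrieval is mostly a configuration decision. A minimal sketch using the Knowledge Bases Retrieve API via boto3; the knowledge base ID, query text, and result count are placeholders:

```python
import boto3

# Cap retrieval at a few high-relevance chunks instead of letting the
# agent pull everything that vaguely matches the query.
client = boto3.client("bedrock-agent-runtime")

response = client.retrieve(
    knowledgeBaseId="YOUR_KB_ID",  # placeholder
    retrievalQuery={"text": "user question here"},
    retrievalConfiguration={
        "vectorSearchConfiguration": {"numberOfResults": 3}
    },
)

for result in response["retrievalResults"]:
    print(result["content"]["text"][:80])
```

Dropping `numberOfResults` from a permissive default to a deliberate small number is the single cheapest lever here: fewer chunks means fewer input tokens on every call.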

Context engineering is FinOps for AI. Most teams optimize model choice and ignore prompt architecture.
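The same discipline applies to conversation history: trim it to a token budget before every call instead of forwarding the whole transcript. A minimal sketch; the chars-per-token heuristic is a rough approximation, not a real tokenizer:

```python
def trim_history(messages, budget_tokens=1500):
    """Keep the most recent messages that fit within a token budget.

    Uses a rough len/4 chars-per-token heuristic; swap in your
    model's tokenizer for accurate counts.
    """
    kept, used = [], 0
    for msg in reversed(messages):  # newest first
        tokens = len(msg["content"]) // 4 + 1
        if used + tokens > budget_tokens:
            break
        kept.append(msg)
        used += tokens
    return list(reversed(kept))  # restore chronological order
```

A hard budget like this turns history growth from an unbounded cost into a fixed one per request.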

How are you managing context costs in production AI? Aggressive retrieval scoping, prompt caching, or hoping the bill stays reasonable?