The Challenge
What Ozzie Was Facing
Ozzie is a conversational AI assistant embedded in US enterprise workflows. The hard engineering problem was not calling an LLM — it was building the infrastructure layer that made the product reliable and cost-efficient at scale: streaming responses without connection timeouts, intelligent routing between model providers by cost and latency, conversation context storage that did not blow up database spend, and budget guardrails that did not break user experience.
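The budget-guardrail requirement above is worth making concrete: the guardrail should degrade the experience gracefully rather than hard-fail a conversation mid-stream. A minimal sketch of that idea, with hypothetical names and thresholds (the `Budget` type, tier names, and the 10% cutoff are illustrative assumptions, not Ozzie's actual implementation):

```python
from dataclasses import dataclass

@dataclass
class Budget:
    monthly_limit_usd: float  # hypothetical per-tenant monthly cap
    spent_usd: float          # spend accrued so far this month

def pick_tier(budget: Budget) -> str:
    """Choose a service tier from remaining budget.

    Degrades gracefully: as spend approaches the cap, route to cheaper
    models; past the cap, fall back to cached answers rather than erroring.
    """
    remaining = budget.monthly_limit_usd - budget.spent_usd
    if remaining <= 0:
        return "cached-only"  # never a hard error for the end user
    if remaining < 0.1 * budget.monthly_limit_usd:
        return "economy"      # cheapest model tier
    return "standard"
```

The key design point is that the guardrail changes *which* model serves the request, not *whether* the request is served.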
The Solution
What We Built
We built an API-first platform on AWS Lambda and API Gateway, using WebSocket APIs for streaming responses. A model router service selected providers in real time based on latency and per-token cost metrics logged to DynamoDB. Conversation history was compressed and archived to S3, with a retrieval index in Redis that kept context retrieval under 40 ms. The full infrastructure was defined in AWS CDK, with blue-green deployments enabling zero-downtime model integration updates.

Results
