The Challenge
What MetekuAI Was Facing
MetekuAI analyzes product reviews at scale, using LLMs to extract structured sentiment, feature mentions, and competitive signals. The engineering challenge was to design an inference pipeline that could process over a million records cost-efficiently, since synchronous LLM calls were too expensive and too slow for bulk work, while guaranteeing that every record was processed exactly once and that structured outputs were validated before being stored.
The Solution
What We Built
We designed the inference pipeline as an async job queue on SQS with long polling. An ingestion service enqueued record batches; auto-scaling ECS workers consumed them, called the LLM API with structured output schemas (JSON mode), validated the results against Pydantic schemas, and wrote them to PostgreSQL. Idempotency was enforced via a fingerprint deduplication table. Cost was managed by model-tier routing: simpler tasks used a cheaper model, complex extraction used a higher-tier model, with the routing decision made per record from text length and category signals. The sketches below walk through the main pieces: the worker's consume loop, output validation, the idempotency check, and the routing decision.
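A minimal sketch of the worker's consume loop, assuming each SQS message body carries a JSON array of records. The queue URL and the `process_batch` callback are placeholders, not production values; the key detail is long polling (`WaitTimeSeconds=20`) and deleting the message only after the batch succeeds, so a crashed worker's messages reappear after the visibility timeout.

```python
import json

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/inference-jobs"  # placeholder


def consume_forever(process_batch):
    while True:
        # Long polling: wait up to 20s for messages instead of hammering the API.
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=20,
        )
        for msg in resp.get("Messages", []):
            records = json.loads(msg["Body"])
            process_batch(records)  # LLM call + validation + write happen here
            # Delete only after successful processing; a crash before this
            # point lets SQS redeliver the message to another worker.
            sqs.delete_message(
                QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"]
            )
```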
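The validation step, sketched with Pydantic v2. The field names here are illustrative; the real schemas mirrored the JSON-mode schema sent to the LLM, so anything the model returned either parsed cleanly or was rejected before reaching PostgreSQL.

```python
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError


class ReviewAnalysis(BaseModel):
    # Illustrative fields; the production schema matched the extraction prompt.
    sentiment: Literal["positive", "neutral", "negative"]
    feature_mentions: list[str]
    competitor_mentions: list[str]
    confidence: float


def validate_output(raw_json: str) -> Optional[ReviewAnalysis]:
    try:
        # Parses and validates the model's JSON output in one step.
        return ReviewAnalysis.model_validate_json(raw_json)
    except ValidationError:
        # Invalid outputs never reach the database; the record can be
        # retried or flagged for review instead.
        return None
```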
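One way the fingerprint deduplication table can work, assuming a table like `processed (fingerprint TEXT PRIMARY KEY)` and the psycopg driver; the table name, column, and hashing choice are assumptions for illustration. A worker claims a record by inserting its content hash; `ON CONFLICT DO NOTHING` makes the claim atomic, so a record that was already processed (or is being processed by another worker) is skipped.

```python
import hashlib

import psycopg  # psycopg 3; any DB-API driver works the same way


def fingerprint(record_text: str) -> str:
    return hashlib.sha256(record_text.encode("utf-8")).hexdigest()


def claim_record(conn: psycopg.Connection, record_text: str) -> bool:
    """Return True if this worker claimed the record, False if already processed."""
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO processed (fingerprint) VALUES (%s) "
            "ON CONFLICT (fingerprint) DO NOTHING",
            (fingerprint(record_text),),
        )
        claimed = cur.rowcount == 1  # 0 means another insert got there first
    conn.commit()
    return claimed
```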
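Finally, a sketch of the per-record routing decision. The length threshold, category set, and model identifiers are illustrative stand-ins; the production values were tuned, not fixed constants like these.

```python
CHEAP_MODEL = "small-model"    # placeholder model identifiers
PREMIUM_MODEL = "large-model"

# Assumed examples of categories with dense, technical feature language.
COMPLEX_CATEGORIES = {"electronics", "software"}


def pick_model(text: str, category: str) -> str:
    # Long reviews and complex categories go to the higher-tier model;
    # everything else is handled by the cheaper one.
    if len(text) > 2000 or category in COMPLEX_CATEGORIES:
        return PREMIUM_MODEL
    return CHEAP_MODEL
```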

Results
