CallMissed Blog
Insights on AI communication, voice agents, WhatsApp automation, and the future of customer engagement.
Tutorial: Build a Production RAG App in 2 Hours
This tutorial walks through building a production-grade RAG (Retrieval-Augmented Generation) app from scratch in roughly two hours. Not a toy, but a system with chunking, hybrid retrieval, reranking, eval, and citations. Code samples are Python with widely used 2026 libraries; substitute whatever you p…
Tutorial: Fine-Tune Llama 4 Scout for Your Domain
Llama 4 Scout, Meta's 17B-active-parameter MoE released in April 2025 with a 10M-token context window, is one of the most capable open models available for domain fine-tuning in 2026. This tutorial walks through a LoRA fine-tune of Llama 4 Scout for a domain task, covering dataset prep, training, …
Tutorial: Stream LLM Responses from a FastAPI Backend
Streaming LLM responses from a FastAPI backend looks easy in tutorials and gets messy in production: client disconnects, post-stream cleanup, error propagation, usage tracking, and observability all surface only when traffic ramps up. This tutorial covers the production-shape pattern: SSE (Server-Sent…