CallMissed Blog

Insights on AI communication, voice agents, WhatsApp automation, and the future of customer engagement.

#API Design · 4 posts
6 min read
Guide · May 16, 2026

Streaming AI Responses: SSE, WebSockets, and the Pitfalls

A streaming LLM response feels fast even when total generation takes ten seconds, because the user sees tokens arriving immediately. The trade is operational: streaming is a long-lived connection with backpressure, partial-failure modes, and a different shape from a normal HTTP request. Here is what…

5 min read
Comparison · May 8, 2026

Structured Output vs Tool Use: Which When

By 2026 the "JSON parsing with regex" era is over. Both major model APIs offer constrained-decoding paths that produce schema-valid output, and tool use is mature enough that one or the other handles 90% of structured generation workloads. The remaining question is which to reach for — and the answe…

5 min read
Guide · May 8, 2026

Prompt Caching Explained: Anthropic, OpenAI, and the Math

Prompt caching is the single highest-leverage cost lever for most production LLM workloads in 2026. The idea is simple — reuse the prefill compute of a previously seen prompt prefix instead of recomputing it. The implementations are different across providers, and the math of when it pays off is wor…

6 min read
Guide · May 8, 2026

Rate Limiting AI APIs: Strategies That Actually Work

Rate limiting an AI API is harder than rate limiting a regular API. A "request" can cost $0.0001 or $5.00 depending on prompt size, model, and output length. A noisy tenant can starve a paying tenant. An agent loop can fire 100 model calls per user action. The "100 requests per minute" rules from RE…