Every Token You Add to Context Costs Money, Latency, and Accuracy
There is a comfortable assumption behind a lot of context handling: the window has a ceiling, so anything up to that ceiling is fair game. Retrieve a few extra passages in case they help. Keep the whole conversation because trimming is work. Carry a generous system prompt because instructions feel safer when there are more of them. The window fits it…
