Chad Brown
  • Every Token You Add to Context Costs Money, Latency, and Accuracy

    There is a comfortable assumption behind a lot of context handling: the window has a ceiling, so anything up to that ceiling is fair game. Retrieve a few extra passages in case they help. Keep the whole conversation because trimming is work. Carry a generous system prompt because instructions feel safer when there are more of them. The window fits it…

  • The Context Window Is the Only Working Memory a Model Has

    A language model has no memory of you. It has no memory of the last thing it told you, no running notion of the task, no internal notebook that persists from one call to the next. What looks like memory is an illusion reconstructed on every request. The single place that illusion is assembled is the context window, and that window…
