The quest for lifelong AI memory has researchers pushing the boundaries of Context Engineering 2.0. This paradigm shift aims to revolutionize how AI handles memory and context, moving away from short-lived context windows towards a Semantic Operating System that can store, update, and forget information over decades, mirroring human memory more closely.
The evolution of context engineering is traced through four phases. In the 1990s, early systems forced users to translate intentions into rigid, machine-readable commands, limiting their ability to process unstructured inputs. This changed with models like GPT-3 in 2020, which began interpreting natural language and understanding implications, shifting context engineering to focus on unstructured, human-style input and semi-permanent memories.
Anthropic's recent focus on prompt engineering, the term's coinage by Riley Goodside in early 2023, and the discussion of context engineering by Shopify CEO Tobi Lutke and former OpenAI researcher Andrej Karpathy in the summer of 2025 all highlight the growing importance of context engineering in AI development.
The researchers envision Era 3.0, centered on human-level interpretation, including social cues and emotions, and Era 4.0, where systems understand people better than they understand themselves, making new connections on their own. However, debate continues over whether current technology can realistically reach this level of sophistication.
The paper highlights the familiar issue of models losing accuracy as context grows, with many systems degrading even when their context window is only half full. Computational cost is another constraint: doubling the context quadruples the workload. This is why feeding an entire PDF into a chat window is usually a bad idea; models work better when the input is trimmed to what matters.
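The "doubling quadruples" claim follows from self-attention comparing every token with every other token. A quick back-of-the-envelope calculation (a simplified cost model counting only attention-score computation, not a full transformer FLOP count) shows the quadratic scaling:

```python
def attention_cost(context_len: int, d_model: int = 64) -> int:
    """Rough operation count for self-attention scores: each of the
    n tokens attends to all n tokens, so cost grows as n^2 * d."""
    return context_len * context_len * d_model

base = attention_cost(4_096)
doubled = attention_cost(8_192)
print(doubled // base)  # doubling the context quadruples the work -> 4
```

The same arithmetic explains why trimming input helps: halving the context cuts attention work by a factor of four.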
The Semantic Operating System is proposed to overcome these limitations by storing and managing context in a more durable, structured way. It requires four key capabilities: large-scale semantic storage, human-like memory management, new architectures for handling time and sequence, and built-in interpretability for user inspection and correction.
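Two of those capabilities, durable semantic storage with human-like forgetting and built-in interpretability, can be sketched as a minimal interface. All class and method names below are hypothetical illustrations, not an API from the paper:

```python
from dataclasses import dataclass, field
import time

@dataclass
class MemoryEntry:
    content: str
    timestamp: float = field(default_factory=time.time)
    importance: float = 0.5  # consulted by the forgetting policy

class SemanticStore:
    """Minimal sketch of a durable context store (names are assumptions)."""

    def __init__(self) -> None:
        self.entries: list[MemoryEntry] = []

    def store(self, content: str, importance: float = 0.5) -> None:
        self.entries.append(MemoryEntry(content, importance=importance))

    def forget(self, max_age: float, min_importance: float) -> None:
        """Drop old, low-importance memories: a crude analogue of
        human-like decay rather than an unbounded append-only log."""
        now = time.time()
        self.entries = [e for e in self.entries
                        if now - e.timestamp < max_age
                        or e.importance >= min_importance]

    def inspect(self) -> list[str]:
        """Built-in interpretability: the user can read (and could
        correct) everything the system has retained."""
        return [e.content for e in self.entries]
```

The design choice to make `forget` an explicit, inspectable policy rather than an opaque eviction reflects the paper's requirement that users be able to inspect and correct stored context.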
The paper reviews various methods for processing textual context, including timestamping, organizing information into functional roles, and converting context into question-answer pairs or hierarchies. Each method has trade-offs, and the challenge lies in balancing clarity and flexibility.
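Two of those strategies, timestamping and conversion into question-answer pairs with a functional-role tag, can be combined in a single record. The schema and field names below are illustrative assumptions, not a format defined by the paper:

```python
from datetime import datetime, timezone

def to_qa_memory(statement: str, question: str, role: str = "episodic") -> dict:
    """Structure a raw statement as a timestamped question-answer pair,
    tagged with a functional role (schema is a hypothetical example)."""
    return {
        "question": question,
        "answer": statement,
        "role": role,  # e.g. "episodic", "preference", "task"
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

entry = to_qa_memory("The user prefers concise answers.",
                     "What answer style does the user prefer?",
                     role="preference")
```

The QA form trades flexibility for clarity: retrieval becomes a matter of matching questions, but nuance outside the chosen question is lost, which is exactly the trade-off the paper flags.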
Modern AI must handle multimodal data, combining text, images, audio, video, code, and sensor data. Three main strategies are described for multimodal processing: embedding data into a shared vector space, feeding multiple modalities into a single transformer, and using cross-attention. However, unlike the human brain, technical systems rely on fixed mappings, and the Semantic Operating System's central concept of 'self-baking' aims to turn fleeting impressions into stable, structured memories.
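The self-baking idea, condensing fleeting context into a stable, structured memory, can be sketched as a condensation step over raw conversation turns. The function name and schema are assumptions, and `summarize` stands in for an LLM call the paper does not prescribe:

```python
def self_bake(raw_turns: list[str], summarize) -> dict:
    """'Self-baking' sketch: turn a transient conversation into a
    stable, structured memory record (schema is hypothetical)."""
    return {
        "summary": summarize(" ".join(raw_turns)),
        "n_source_turns": len(raw_turns),
        "kind": "baked_memory",
    }

# Toy stand-in for a summarizer: keep the first sentence only.
memory = self_bake(
    ["User asked about quadratic attention cost.",
     "We explained the n^2 scaling of self-attention."],
    summarize=lambda text: text.split(".")[0] + ".",
)
```

Once baked, the compact record can be stored and searched long after the original turns have been evicted from the context window.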
Early signs of the Semantic Operating System are emerging in projects like Anthropic's research agent, Google's Gemini CLI, and Alibaba's Tongyi DeepResearch, which condense information into summaries for future searches. Brain-computer interfaces are also seen as a potential way to reshape context collection by recording focus, emotional intensity, and cognitive effort, expanding memory systems to internal thoughts.
The paper concludes with a philosophical perspective, drawing on Karl Marx's idea that people are shaped by their social relationships. It argues that digital traces now play a similar role, with conversations, decisions, and interactions defining us. The Semantic Operating System is envisioned as providing the technical foundation for a future where context becomes a lasting form of knowledge, memory, and identity, continuing to interact with the world even after a person's life ends.