Building a Real-Time Business Memory System
AI assistants that forget everything between conversations are frustrating. A user explains their KPIs, describes their data model, and provides context about their business, and the next day they have to do it all over again. Business memory systems solve this by accumulating context across conversations, building a persistent understanding of each organization's data, terminology, and priorities.
This article covers the architecture for building a real-time business memory system, including entity extraction, vector storage, retrieval strategies, and the challenge of keeping memory current as business context evolves.
What Business Memory Captures
A business memory system is not a simple conversation log. It extracts and organizes structured knowledge from unstructured interactions. The types of information worth persisting fall into several categories.
Entity Knowledge
Entities are the named things that matter to the business: products, customers, metrics, teams, and systems. When a user mentions "the Enterprise plan" or "the APAC sales team," the memory system should record these entities and accumulate attributes about them over time.
Entity records include:
- Name and aliases (users might say "Enterprise plan," "Ent tier," or "Plan E")
- Type (product, team, metric, customer segment)
- Attributes learned from conversations (price points, ownership, relationships)
- Last referenced timestamp
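An entity record like this fits naturally into a small data class. The following is a minimal Python sketch. The class name `EntityRecord`, its field names, and the helper methods are hypothetical choices for illustration, not a schema taken from any particular product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class EntityRecord:
    """One named business entity (hypothetical schema for illustration)."""
    name: str
    entity_type: str  # e.g. "product", "team", "metric", "customer_segment"
    aliases: set[str] = field(default_factory=set)
    attributes: dict[str, str] = field(default_factory=dict)
    last_referenced: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def matches(self, mention: str) -> bool:
        """Case-insensitive check against the canonical name and every alias."""
        needle = mention.strip().lower()
        return needle == self.name.lower() or needle in {a.lower() for a in self.aliases}

    def record_attribute(self, key: str, value: str) -> None:
        """Store or overwrite an attribute and refresh the last-referenced timestamp."""
        self.attributes[key] = value
        self.last_referenced = datetime.now(timezone.utc)


plan = EntityRecord("Enterprise plan", "product", aliases={"Ent tier", "Plan E"})
plan.record_attribute("price_per_month_usd", "299")
print(plan.matches("ent tier"))  # True
```

Each call to `record_attribute` also refreshes `last_referenced`. The decay and pruning logic described later can use that timestamp directly.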
Metric Definitions
Business metrics are often ambiguous. "Revenue" might mean ARR, MRR, bookings, or recognized revenue depending on the team. When a user defines or clarifies a metric's meaning, the memory system captures that definition and applies it consistently in future conversations.
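One way to apply definitions consistently is a key-value map keyed by the term plus the context in which the definition holds. In this hypothetical sketch, that context is a team name. A team-specific definition takes precedence, and an organization-wide definition serves as the fallback. `MetricDefinition`, `MetricGlossary`, and their methods are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetricDefinition:
    term: str                    # the word users say, e.g. "revenue"
    meaning: str                 # what it resolves to, e.g. "monthly recurring revenue (MRR)"
    scope: Optional[str] = None  # team the definition applies to; None means organization-wide


class MetricGlossary:
    """Key-value store of metric definitions, keyed by (term, scope)."""

    def __init__(self) -> None:
        self._definitions: dict[tuple[str, Optional[str]], MetricDefinition] = {}

    def define(self, definition: MetricDefinition) -> None:
        # A later definition for the same (term, scope) replaces the earlier one.
        self._definitions[(definition.term.lower(), definition.scope)] = definition

    def resolve(self, term: str, team: Optional[str] = None) -> Optional[MetricDefinition]:
        """Prefer the asking team's definition, then fall back to the org-wide one."""
        key = term.lower()
        if team is not None and (key, team) in self._definitions:
            return self._definitions[(key, team)]
        return self._definitions.get((key, None))


glossary = MetricGlossary()
glossary.define(MetricDefinition("revenue", "monthly recurring revenue (MRR)"))
glossary.define(MetricDefinition("revenue", "bookings", scope="sales"))
print(glossary.resolve("Revenue").meaning)                # monthly recurring revenue (MRR)
print(glossary.resolve("revenue", team="sales").meaning)  # bookings
```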
Query Patterns
When a user asks a question and the AI generates a successful SQL query, that mapping from natural language to SQL becomes a reusable pattern. Future similar questions can leverage the proven query structure, improving both accuracy and speed.
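A stored pattern can be modeled as a template with named slots, which corresponds to the "template with parameters" strategy in the memory-type table. The sketch assumes a hypothetical `QueryPattern` format. A regular expression matches the question, and Python string formatting fills in the SQL. Column names cannot be passed as bound SQL parameters, so each slot value is checked against an allowlist before substitution. The `sales` table and its columns are placeholders.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryPattern:
    """A proven question-to-SQL mapping with named slots (hypothetical format)."""
    question_regex: str           # regex with one named group per slot
    sql_template: str             # SQL with matching {slot} placeholders
    allowed: dict[str, set[str]]  # permitted values per slot

    def render(self, question: str) -> Optional[str]:
        """Return SQL for a matching question, or None if it does not fit the pattern."""
        match = re.fullmatch(self.question_regex, question.strip().lower())
        if match is None:
            return None
        slots = match.groupdict()
        # Identifiers cannot be bound parameters, so reject anything not allowlisted.
        if any(value not in self.allowed.get(name, set()) for name, value in slots.items()):
            return None
        return self.sql_template.format(**slots)


pattern = QueryPattern(
    question_regex=r"show (?P<metric>\w+) by (?P<dimension>\w+)",
    sql_template="SELECT {dimension}, SUM({metric}) AS total FROM sales GROUP BY {dimension}",
    allowed={"metric": {"revenue", "units"}, "dimension": {"region", "product"}},
)
print(pattern.render("Show revenue by region"))
# SELECT region, SUM(revenue) AS total FROM sales GROUP BY region
```

A real system would match questions more loosely, for example by embedding similarity between the new question and stored ones. The allowlist check still applies before any template is filled.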
Preferences and Conventions
Users have preferences about how results are presented: chart types, date formats, rounding conventions, default time ranges, and summary styles. Capturing these preferences eliminates repetitive configuration.
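At its core, a preference store needs a clear precedence order. This sketch uses Python's `collections.ChainMap` so that an explicit user preference overrides an organization default, which in turn overrides a system default. The organization layer, the keys, and the default values are assumptions made for illustration.

```python
from collections import ChainMap

# Hypothetical system-wide defaults; real keys and values depend on the product.
SYSTEM_DEFAULTS = {
    "chart_type": "bar",
    "date_format": "%Y-%m-%d",
    "decimal_places": 2,
    "default_range_days": 30,
}


def effective_preferences(user_prefs: dict, org_prefs: dict) -> ChainMap:
    """Resolve each setting from user prefs first, then org prefs, then system defaults."""
    return ChainMap(user_prefs, org_prefs, SYSTEM_DEFAULTS)


prefs = effective_preferences({"chart_type": "line"}, {"decimal_places": 0})
print(prefs["chart_type"], prefs["decimal_places"], prefs["date_format"])  # line 0 %Y-%m-%d
```

Writes through a `ChainMap` go only to its first mapping. Saving a new preference through `prefs` therefore updates the user layer and never alters shared defaults.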
| Memory Type | Example | Storage Strategy |
|---|---|---|
| Entity knowledge | "APAC team is led by Sarah Chen" | Entity graph with attributes |
| Metric definition | "Revenue means monthly recurring revenue" | Key-value with context |
| Query pattern | "Show revenue by region" mapped to SQL | Template with parameters |
| Preference | "Always show charts in dark mode" | User preference store |
| Relationship | "Enterprise plan is sold to Fortune 500" | Graph edges between entities |
Architecture
Entity Extraction Pipeline
Every conversation turn passes through an entity extraction pipeline that identifies mentioned entities, attributes, and relationships. The pipeline operates in three stages.
Recognition identifies entity mentions in the text. This uses a combination of named entity recognition (NER) for standard entity types and schema-aware matching for business-specific entities (table names, column values, product names).
Resolution links mentions to existing entities in the memory store. "The Enterprise plan," "Ent plan," and "our top tier" might all resolve to the same entity. Resolution uses a combination of exact matching, fuzzy matching, and embedding similarity.
Extraction pulls attributes and relationships from the surrounding context. If a user says "the Enterprise plan costs $299 per month," the system extracts a price attribute for the Enterprise plan entity.
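The resolution and extraction stages can be approximated with standard-library tools. In this sketch, `resolve_mention` first tries an exact lookup after stripping a leading "the" or "our". If that fails, it falls back to fuzzy string matching with `difflib.get_close_matches`. The embedding-similarity stage is omitted. `extract_price` is a deliberately narrow regular expression for the pricing sentence in the Extraction paragraph; a production extractor would more likely use an NER model or an LLM. All names, the alias table, and the 0.8 cutoff are hypothetical.

```python
import difflib
import re
from typing import Optional

LEADING_ARTICLE = re.compile(r"^(the|our)\s+")
PRICE = re.compile(r"costs \$(?P<amount>\d+(?:\.\d+)?) per (?P<period>month|year)", re.IGNORECASE)


def resolve_mention(mention: str, alias_index: dict[str, str], cutoff: float = 0.8) -> Optional[str]:
    """Map a surface mention to an entity ID, or None if nothing is close enough.

    alias_index maps lowercase names and aliases to entity IDs.
    """
    key = LEADING_ARTICLE.sub("", mention.strip().lower())
    if key in alias_index:  # exact match on a name or alias
        return alias_index[key]
    # Fuzzy match to absorb typos and small spelling variants.
    close = difflib.get_close_matches(key, list(alias_index), n=1, cutoff=cutoff)
    # An embedding-similarity fallback would go here (omitted in this sketch).
    return alias_index[close[0]] if close else None


def extract_price(sentence: str) -> Optional[dict[str, str]]:
    """Pull a price attribute out of sentences like 'X costs $299 per month'."""
    match = PRICE.search(sentence)
    if match is None:
        return None
    return {"price_usd": match.group("amount"), "billing_period": match.group("period").lower()}


alias_index = {
    "enterprise plan": "ent-001",
    "ent tier": "ent-001",
    "plan e": "ent-001",
    "apac sales team": "team-apac",
}
print(resolve_mention("The Enterprise plan", alias_index))  # ent-001
print(resolve_mention("Enterprse plan", alias_index))       # ent-001 (fuzzy match)
print(extract_price("the Enterprise plan costs $299 per month"))
```

A mention like "our top tier" returns None here. That is exactly the kind of mention the embedding-similarity step is meant to catch.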
Vector Storage for Semantic Retrieval
Not all business memory fits neatly into structured entities. Contextual knowledge like "our APAC team focuses on enterprise accounts in Japan and Singapore" is best stored as vector embeddings that can be retrieved based on semantic similarity.
The vector store indexes memory entries with their embeddings, metadata (source conversation, timestamp, confidence score), and tenant isolation tags. At query time, the user's question is embedded and used to retrieve the most relevant memory entries.
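The retrieval path can be illustrated with a brute-force, in-memory store. This sketch rests on several assumptions:

- Embeddings are plain lists of floats produced by some external embedding model (not shown).
- The three-dimensional vectors below are toy stand-ins for real model output.
- Similarity is cosine similarity.
- `MemoryEntry` and `InMemoryVectorStore` are hypothetical names.

Because the store filters on the tenant tag before scoring, another tenant's entries are never even candidates. A production system would use a vector database or an approximate nearest-neighbor index instead of a linear scan.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    text: str
    embedding: list[float]
    tenant_id: str            # isolation tag: every query is scoped to one tenant
    source_conversation: str
    created_at: float         # Unix timestamp
    confidence: float         # 0.0 to 1.0


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Linear-scan store for illustration; production systems use an ANN index."""

    def __init__(self) -> None:
        self._entries: list[MemoryEntry] = []

    def add(self, entry: MemoryEntry) -> None:
        self._entries.append(entry)

    def search(self, tenant_id: str, query_embedding: list[float],
               k: int = 5) -> list[tuple[float, MemoryEntry]]:
        # Filter on the tenant tag first, so other tenants' entries are never scored.
        candidates = [e for e in self._entries if e.tenant_id == tenant_id]
        scored = [(cosine_similarity(query_embedding, e.embedding), e) for e in candidates]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


store = InMemoryVectorStore()
store.add(MemoryEntry("APAC team focuses on enterprise accounts in Japan and Singapore",
                      [0.9, 0.1, 0.0], "acme", "conv-1", 1_700_000_000.0, 0.9))
store.add(MemoryEntry("Revenue means MRR", [0.0, 1.0, 0.0], "acme", "conv-2", 1_700_000_000.0, 1.0))
store.add(MemoryEntry("Another tenant's APAC note", [0.9, 0.1, 0.0], "globex", "conv-7", 1_700_000_000.0, 0.9))
for score, entry in store.search("acme", [1.0, 0.0, 0.0], k=2):
    print(f"{score:.3f} {entry.text}")
```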
Memory Relevance Scoring
Not all memories are equally useful for a given query. The retrieval system scores each candidate memory on multiple dimensions:
- Semantic relevance: How closely the memory's embedding matches the current query
- Recency: More recently created or referenced memories score higher
- Frequency: Memories that have been referenced across multiple conversations score higher
- Confidence: Memories established through explicit user statements score higher than those inferred from context
The final score is a weighted combination of these factors, with weights that adapt based on user feedback (see the section on learning integration below).
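Below is a minimal sketch of the weighted combination, assuming every signal is first normalized to [0, 1]. Recency uses exponential decay with a 30-day half-life. Frequency uses a log scale capped at 20 references. The starting weights are arbitrary placeholders. None of these constants come from the article; they are the values a feedback loop would tune.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringWeights:
    # Hypothetical starting weights; a feedback loop would adjust them over time.
    semantic: float = 0.5
    recency: float = 0.2
    frequency: float = 0.15
    confidence: float = 0.15


def relevance_score(
    similarity: float,            # cosine similarity to the query, assumed in [0, 1]
    days_since_reference: float,  # time since the memory was created or last referenced
    reference_count: int,         # number of conversations that referenced it
    confidence: float,            # 1.0 for explicit statements, lower for inferences
    weights: ScoringWeights = ScoringWeights(),
    recency_half_life_days: float = 30.0,
    frequency_cap: int = 20,
) -> float:
    """Weighted combination of four signals, each normalized to [0, 1]."""
    recency = 0.5 ** (days_since_reference / recency_half_life_days)
    # Log scaling so the 20th reference adds less than the 2nd; capped at 1.0.
    frequency = min(1.0, math.log1p(reference_count) / math.log1p(frequency_cap))
    w = weights
    total_weight = w.semantic + w.recency + w.frequency + w.confidence
    weighted = (w.semantic * similarity + w.recency * recency
                + w.frequency * frequency + w.confidence * confidence)
    return weighted / total_weight
```

Dividing by the weight total keeps scores in [0, 1] even after feedback changes the weights. Scores from different weight settings therefore stay comparable.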
Context Packing for Prompt Budgets
Retrieved memories must fit within the AI model's context window alongside the system prompt, tool definitions, conversation history, and current query. Context packing optimizes which memories to include given a fixed token budget.
This is an instance of the 0/1 knapsack problem. Each memory has a value (its relevance score) and a cost (its token count), and the system selects the subset that maximizes total value within the budget. An exact solution is rarely necessary. A greedy pass that takes memories in descending order of value per token usually comes close to optimal, because individual memories are small relative to the total budget.
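Here is a sketch of that greedy selection. `Candidate` and `pack_context` are hypothetical names, and token counts are assumed to come from the model's tokenizer. The sketch adds one standard refinement: it also checks the single highest-scoring memory that fits on its own and returns it if it beats the greedy set. With that check, the result is guaranteed to reach at least half of the optimal total value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    memory_id: str
    score: float  # relevance score: the knapsack "value"
    tokens: int   # prompt cost: the knapsack "weight"


def pack_context(candidates: list[Candidate], budget: int) -> list[Candidate]:
    """Greedy 0/1 knapsack approximation for a fixed token budget."""
    chosen: list[Candidate] = []
    remaining = budget
    # Take memories in descending order of value per token, skipping any that no longer fit.
    for c in sorted(candidates, key=lambda m: m.score / max(m.tokens, 1), reverse=True):
        if c.tokens <= remaining:
            chosen.append(c)
            remaining -= c.tokens
    # Guard against greedy's worst case: one large, high-value memory it skipped.
    fitting = [c for c in candidates if c.tokens <= budget]
    best_single = max(fitting, key=lambda m: m.score, default=None)
    if best_single is not None and best_single.score > sum(c.score for c in chosen):
        return [best_single]
    return chosen


candidates = [
    Candidate("apac-focus", 1.0, 90),
    Candidate("revenue-definition", 0.5, 30),
    Candidate("chart-preference", 0.4, 30),
    Candidate("q3-project-notes", 0.3, 50),
]
print([c.memory_id for c in pack_context(candidates, 60)])   # ['revenue-definition', 'chart-preference']
print([c.memory_id for c in pack_context(candidates, 100)])  # ['apac-focus']
```

With a 100-token budget, plain greedy would return two small memories worth 0.9 in total. The single-item check catches the 1.0-value memory that greedy skipped.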
Keeping Memory Current
Business context changes. Products are renamed, team structures shift, and metric definitions evolve. A memory system that accumulates stale information degrades rather than improves.
Decay and Refresh
Memories have a decay function that reduces their relevance score over time. Memories that are consistently reinforced through new conversations maintain high scores. Memories that stop being referenced gradually fade.
The decay rate should be calibrated to the type of information. Metric definitions change slowly (months), so their decay rate should be low. Project-specific context changes quickly (weeks), so its decay rate should be higher.
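The decay can be sketched as exponential decay with a per-type half-life. The half-life values below are hypothetical; the article only fixes their ordering, with metric definitions lasting months and project context lasting weeks. Age is measured from the most recent reference. Resetting `days_since_last_reference` to zero whenever a memory is mentioned again is what lets reinforced memories keep their scores.

```python
# Hypothetical half-lives, in days. Only their relative ordering comes from the text.
HALF_LIFE_DAYS = {
    "metric_definition": 180.0,
    "preference": 120.0,
    "entity": 90.0,
    "project_context": 14.0,
}


def decayed_score(base_score: float, memory_type: str, days_since_last_reference: float) -> float:
    """Halve the score once per half-life, counted from the most recent reinforcement."""
    half_life = HALF_LIFE_DAYS[memory_type]
    return base_score * 0.5 ** (days_since_last_reference / half_life)


# After four weeks without reinforcement:
print(round(decayed_score(1.0, "project_context", 28), 3))    # 0.25
print(round(decayed_score(1.0, "metric_definition", 28), 3))  # 0.898
```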
Conflict Resolution
When new information contradicts existing memory, the system must decide which to keep. The default strategy is to prefer the more recent statement, but with a confidence threshold. If the existing memory has high confidence (established through multiple conversations), a single contradictory statement is flagged for review rather than immediately overwriting.
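The policy above can be expressed as a small decision function. In this sketch, "high confidence" means a confidence of at least 0.8 backed by at least two conversations. Those thresholds, the `Fact` and `Decision` types, and the rule that a contradiction supported by several conversations may overwrite directly are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    OVERWRITE = "overwrite"
    FLAG_FOR_REVIEW = "flag_for_review"
    KEEP_EXISTING = "keep_existing"


@dataclass
class Fact:
    value: str
    confidence: float              # 0.0 to 1.0
    supporting_conversations: int
    observed_at: float             # Unix timestamp of the latest supporting statement


def resolve_conflict(existing: Fact, incoming: Fact,
                     confidence_threshold: float = 0.8, min_support: int = 2) -> Decision:
    """Decide what to do when an incoming statement contradicts a stored fact."""
    if incoming.observed_at < existing.observed_at:
        return Decision.KEEP_EXISTING  # default rule: prefer the more recent statement
    well_established = (existing.confidence >= confidence_threshold
                        and existing.supporting_conversations >= min_support)
    if well_established and incoming.supporting_conversations < min_support:
        return Decision.FLAG_FOR_REVIEW  # one contradiction should not erase a settled fact
    return Decision.OVERWRITE


settled = Fact("revenue means MRR", 0.95, 5, 1_700_000_000.0)
print(resolve_conflict(settled, Fact("revenue means ARR", 0.6, 1, 1_700_100_000.0)))
# Decision.FLAG_FOR_REVIEW
```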
Pruning
Over time, the memory store accumulates entries that are no longer useful: references to completed projects, departed team members, or deprecated products. A periodic pruning process removes memories that have decayed below a minimum score threshold and have not been referenced in a configurable time window.
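A pruning pass only needs the two conditions described above, and both must hold before a memory is removed. The score floor of 0.05, the 90-day inactivity window, and the `StoredMemory` and `prune` names are hypothetical. The function returns the pruned entries instead of deleting them in place, so the caller can log or archive them first.

```python
from dataclasses import dataclass


@dataclass
class StoredMemory:
    memory_id: str
    score: float                      # current score, after decay
    days_since_last_reference: float


def prune(memories: list[StoredMemory], min_score: float = 0.05,
          inactivity_window_days: float = 90.0) -> tuple[list[StoredMemory], list[StoredMemory]]:
    """Split memories into (kept, pruned); both conditions must hold to prune."""
    kept: list[StoredMemory] = []
    pruned: list[StoredMemory] = []
    for m in memories:
        if m.score < min_score and m.days_since_last_reference > inactivity_window_days:
            pruned.append(m)
        else:
            kept.append(m)
    return kept, pruned


kept, pruned = prune([
    StoredMemory("completed-project-x", 0.01, 200),
    StoredMemory("rarely-used-but-recent", 0.02, 10),
    StoredMemory("core-metric-definition", 0.70, 120),
])
print([m.memory_id for m in pruned])  # ['completed-project-x']
```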
Learning Integration
The memory system should integrate with the platform's learning engine. When retrieved memories lead to successful interactions (the user accepts the result without correction), the memory's confidence increases. When they lead to corrections, the system analyzes the correction to determine whether the memory was wrong or simply not applicable in that context.
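The confidence update itself can be very simple once each correction has been classified. This sketch assumes that classification ("the memory was wrong" versus "the memory was right but not applicable") happens upstream. The reward and penalty rates and the `Feedback` and `update_confidence` names are hypothetical. On acceptance, confidence moves a fraction of the remaining distance toward 1.0, so gains diminish as it approaches certainty.

```python
from enum import Enum


class Feedback(Enum):
    ACCEPTED = "accepted"              # user took the result without correction
    MEMORY_WRONG = "memory_wrong"      # the correction shows the memory itself was wrong
    NOT_APPLICABLE = "not_applicable"  # the memory was right but irrelevant to this query


def update_confidence(confidence: float, feedback: Feedback,
                      reward_rate: float = 0.1, penalty_rate: float = 0.5) -> float:
    """Return an updated confidence in [0, 1] after one round of feedback."""
    if feedback is Feedback.ACCEPTED:
        # Move a fraction of the remaining distance toward 1.0 (diminishing gains).
        return confidence + reward_rate * (1.0 - confidence)
    if feedback is Feedback.MEMORY_WRONG:
        # Cut confidence sharply so a wrong memory stops dominating retrieval.
        return confidence * (1.0 - penalty_rate)
    # NOT_APPLICABLE: the memory is still true, so its confidence is left alone.
    return confidence


c = 0.6
for fb in [Feedback.ACCEPTED, Feedback.ACCEPTED, Feedback.MEMORY_WRONG]:
    c = update_confidence(c, fb)
print(round(c, 3))  # 0.338
```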
This feedback loop creates a memory system that genuinely improves with use. Platforms like Skopx implement this by tracking response feedback and using it to adjust memory relevance scoring, ensuring that the most useful context surfaces in future conversations.
Privacy and Isolation
In multi-tenant systems, memory isolation is critical. Each tenant's memory must be completely separate, with no possibility of cross-contamination. This means:
- Tenant-scoped vector indices or strict namespace isolation
- Row-level security on all memory tables
- Separate entity graphs per tenant
- Audit logging for all memory access
Users should also have visibility into what the system remembers about their organization and the ability to edit or delete specific memories. Transparency builds trust in the system.
Measuring Memory Quality
The key metrics for evaluating a business memory system are:
- Retrieval precision: Percentage of retrieved memories that are relevant to the current query
- User correction rate: How often users correct the AI's understanding (lower is better)
- Context utilization: Percentage of included memories that the model actually uses in its response
- Memory freshness: Average age of memories used in responses
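These four metrics can be computed from per-interaction logs. The sketch assumes each interaction records which memory IDs were retrieved, which were judged relevant (for example, by reviewers), which were packed into the prompt, and which the model actually used (for example, through citations it emits). The `InteractionLog` fields and the `memory_quality` function are hypothetical. Running this over successive batches of interactions shows whether correction rates fall and precision rises over time.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class InteractionLog:
    retrieved: set[str]   # memory IDs returned by retrieval
    relevant: set[str]    # IDs judged relevant to the query
    included: set[str]    # IDs packed into the prompt
    used: set[str]        # IDs the model actually drew on in its answer
    user_corrected: bool  # did the user correct the AI's understanding?
    used_memory_ages_days: list[float] = field(default_factory=list)


def memory_quality(logs: list[InteractionLog]) -> dict[str, float]:
    """Aggregate the four quality metrics over a batch of interactions."""
    n_retrieved = sum(len(log.retrieved) for log in logs)
    n_relevant = sum(len(log.retrieved & log.relevant) for log in logs)
    n_included = sum(len(log.included) for log in logs)
    n_used = sum(len(log.used & log.included) for log in logs)
    ages = [age for log in logs for age in log.used_memory_ages_days]
    return {
        "retrieval_precision": n_relevant / n_retrieved if n_retrieved else 0.0,
        "user_correction_rate": sum(log.user_corrected for log in logs) / len(logs) if logs else 0.0,
        "context_utilization": n_used / n_included if n_included else 0.0,
        "memory_freshness_days": mean(ages) if ages else 0.0,
    }


report = memory_quality([
    InteractionLog({"a", "b", "c", "d"}, {"a", "b", "c"}, {"a", "b"}, {"a"}, False, [2.0]),
    InteractionLog({"e", "f"}, {"e"}, {"e", "f"}, {"e", "f"}, True, [10.0, 6.0]),
])
print(report)
```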
A well-functioning memory system should show decreasing correction rates and increasing retrieval precision over the first few weeks of use as it accumulates accurate business context. If these metrics plateau or regress, the memory system needs tuning.
Alexis Kelly
The Skopx engineering and product team