Building a Real-Time Business Memory System

Alexis Kelly
May 29, 2026
10 min read

AI assistants that forget everything between conversations are frustrating. A user explains their KPIs, describes their data model, and provides context about their business, only to repeat all of it the next day. Business memory systems solve this by accumulating context across conversations, building a persistent understanding of each organization's data, terminology, and priorities.

This article covers the architecture for building a real-time business memory system, including entity extraction, vector storage, retrieval strategies, and the challenge of keeping memory current as business context evolves.

What Business Memory Captures

A business memory system is not a simple conversation log. It extracts and organizes structured knowledge from unstructured interactions. The types of information worth persisting fall into several categories.

Entity Knowledge

Entities are the named things that matter to the business: products, customers, metrics, teams, and systems. When a user mentions "the Enterprise plan" or "the APAC sales team," the memory system should record these entities and accumulate attributes about them over time.

Entity records include:

  • Name and aliases (users might say "Enterprise plan," "Ent tier," or "Plan E")
  • Type (product, team, metric, customer segment)
  • Attributes learned from conversations (price points, ownership, relationships)
  • Last referenced timestamp
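
The record above might be modeled as a small data class. This is a minimal sketch, assuming an in-process Python representation; the class name `EntityRecord`, its field names, and the `price_usd_per_month` attribute key are hypothetical, not a documented schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass
class EntityRecord:
    """One business entity in the memory store (hypothetical schema)."""
    name: str                                   # canonical name
    entity_type: str                            # e.g. "product", "team", "metric"
    aliases: set[str] = field(default_factory=set)
    attributes: dict[str, Any] = field(default_factory=dict)
    last_referenced: datetime | None = None

    def matches(self, mention: str) -> bool:
        """Case-insensitive match against the canonical name or any alias."""
        needle = mention.strip().lower()
        return needle == self.name.lower() or needle in {a.lower() for a in self.aliases}

    def touch(self, when: datetime | None = None) -> None:
        """Record that the entity was mentioned, defaulting to the current UTC time."""
        self.last_referenced = when or datetime.now(timezone.utc)


enterprise = EntityRecord("Enterprise plan", "product", aliases={"Ent tier", "Plan E"})
enterprise.attributes["price_usd_per_month"] = 299   # learned from a conversation
enterprise.touch()
print(enterprise.matches("ent tier"))  # True
```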

Metric Definitions

Business metrics are often ambiguous. "Revenue" might mean ARR, MRR, bookings, or recognized revenue depending on the team. When a user defines or clarifies a metric's meaning, the memory system captures that definition and applies it consistently in future conversations.
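
One way to apply a captured definition consistently is a glossary keyed by term, with an optional team scope so that Finance's "revenue" can differ from the organization-wide default. A minimal sketch; `MetricDefinition`, `MetricGlossary`, and the team-then-organization fallback order are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    term: str            # what users say, e.g. "revenue"
    meaning: str         # the agreed definition, e.g. "monthly recurring revenue"
    scope: str | None    # team the definition applies to; None means organization-wide
    source: str          # where it was learned, e.g. a conversation id


class MetricGlossary:
    """Key-value store of metric definitions with an optional team scope (hypothetical)."""

    def __init__(self) -> None:
        self._defs: dict[tuple[str, str | None], MetricDefinition] = {}

    def define(self, d: MetricDefinition) -> None:
        # A newer definition for the same term and scope replaces the older one.
        self._defs[(d.term.lower(), d.scope)] = d

    def resolve(self, term: str, team: str | None = None) -> MetricDefinition | None:
        key = term.lower()
        # Prefer a team-specific definition, then fall back to the organization-wide one.
        return self._defs.get((key, team)) or self._defs.get((key, None))


glossary = MetricGlossary()
glossary.define(MetricDefinition("Revenue", "monthly recurring revenue", None, "conv-101"))
glossary.define(MetricDefinition("revenue", "recognized revenue", "Finance", "conv-245"))
```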

Query Patterns

When a user asks a question and the AI generates a successful SQL query, that mapping from natural language to SQL becomes a reusable pattern. Future similar questions can leverage the proven query structure, improving both accuracy and speed.
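
A reusable pattern can be stored as a question shape with named slots plus a SQL template that uses the same slots. In this sketch the `QueryPattern` class, the brace slot syntax, and the `sales` table are hypothetical; matching is a simple regular expression rather than the semantic matching a production system would likely need, and slot values are checked against an allowlist of schema identifiers so free text is never interpolated into SQL.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class QueryPattern:
    """A question shape mapped to a parameterized SQL template (hypothetical structure)."""
    question: str   # slots in braces, e.g. "show {metric} by {dimension}"
    sql: str        # same slots, e.g. "SELECT {dimension}, SUM({metric}) ..."

    def match(self, text: str) -> dict[str, str] | None:
        """Return slot values if `text` fits the question shape, else None."""
        # re.split with a capture group alternates literal text and slot names.
        parts = re.split(r"\{(\w+)\}", self.question)
        regex = "".join(
            re.escape(p) if i % 2 == 0 else f"(?P<{p}>\\w+)" for i, p in enumerate(parts)
        )
        m = re.fullmatch(regex, text.strip().lower())
        return m.groupdict() if m else None

    def render(self, slots: dict[str, str], allowed_identifiers: set[str]) -> str:
        """Fill the SQL template, accepting only identifiers known from the schema."""
        for value in slots.values():
            if value not in allowed_identifiers:
                # Never interpolate free text into SQL; reject anything not in the allowlist.
                raise ValueError(f"unknown identifier: {value!r}")
        return self.sql.format(**slots)


pattern = QueryPattern(
    question="show {metric} by {dimension}",
    sql="SELECT {dimension}, SUM({metric}) AS total FROM sales GROUP BY {dimension}",
)
slots = pattern.match("Show revenue by region")  # {'metric': 'revenue', 'dimension': 'region'}
print(pattern.render(slots, allowed_identifiers={"revenue", "region", "product"}))
```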

Preferences and Conventions

Users have preferences about how results are presented: chart types, date formats, rounding conventions, default time ranges, and summary styles. Capturing these preferences eliminates repetitive configuration.
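
If preferences can be set at both the organization and the user level (an assumption; the article only mentions a user preference store), Python's `collections.ChainMap` gives a compact layered lookup where a user's own setting wins over an organization default, which wins over a system default. The preference keys and default values below are hypothetical.

```python
from collections import ChainMap

# Hypothetical system-wide defaults.
SYSTEM_DEFAULTS = {
    "chart_theme": "light",
    "date_format": "YYYY-MM-DD",
    "decimal_places": 2,
    "default_range_days": 30,
}


def effective_preferences(user_prefs: dict, org_prefs: dict) -> ChainMap:
    """Look up each key in the user's settings, then the organization's, then the defaults."""
    return ChainMap(user_prefs, org_prefs, SYSTEM_DEFAULTS)


prefs = effective_preferences({"chart_theme": "dark"}, {"decimal_places": 0})
print(prefs["chart_theme"], prefs["decimal_places"], prefs["default_range_days"])  # dark 0 30
```

Because a `ChainMap` writes only to its first mapping, recording a new preference through it updates the user's layer and never alters organization or system defaults.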

Memory Type | Example | Storage Strategy
Entity knowledge | "APAC team is led by Sarah Chen" | Entity graph with attributes
Metric definition | "Revenue means monthly recurring revenue" | Key-value with context
Query pattern | "Show revenue by region" mapped to SQL | Template with parameters
Preference | "Always show charts in dark mode" | User preference store
Relationship | "Enterprise plan is sold to Fortune 500" | Graph edges between entities

Architecture

Entity Extraction Pipeline

Every conversation turn passes through an entity extraction pipeline that identifies mentioned entities, attributes, and relationships. The pipeline operates in three stages.

Recognition identifies entity mentions in the text. This uses a combination of named entity recognition (NER) for standard entity types and schema-aware matching for business-specific entities (table names, column values, product names).
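
The schema-aware half of recognition can be approximated with a dictionary (gazetteer) of known terms drawn from table names, column values, and product names, scanned longest-match-first. This sketch covers only that half; the NER component for standard entity types is not shown, and the function name `recognize` and the four-word window are assumptions.

```python
from __future__ import annotations

import re


def recognize(text: str, known_terms: set[str], max_words: int = 4) -> list[str]:
    """Return known business terms in `text`, preferring the longest match at each position."""
    lookup = {t.lower() for t in known_terms}
    words = re.findall(r"[\w$%-]+", text.lower())
    found: list[str] = []
    i = 0
    while i < len(words):
        # Try the widest window first so "enterprise plan" wins over "enterprise".
        for n in range(min(max_words, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in lookup:
                found.append(phrase)
                i += n
                break
        else:
            i += 1  # no known term starts at this word
    return found


# Terms would come from the schema: table names, column values, product names.
schema_terms = {"Enterprise plan", "Enterprise", "APAC", "orders"}
print(recognize("How did the Enterprise plan do in APAC last quarter?", schema_terms))
# ['enterprise plan', 'apac']
```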

Resolution links mentions to existing entities in the memory store. "The Enterprise plan," "Ent plan," and "our top tier" might all resolve to the same entity. Resolution uses a combination of exact matching, fuzzy matching, and embedding similarity.
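
A minimal sketch of the first two resolution tiers, using exact alias lookup and then character-level similarity from the standard library's `difflib.SequenceMatcher`. The 0.8 threshold and the `entities` mapping shape are assumptions, and the embedding-similarity tier is left as a placeholder: a paraphrase like "our top tier" shares too few characters with any alias to be caught by fuzzy string matching alone.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def resolve(mention: str, entities: dict[str, set[str]], fuzzy_threshold: float = 0.8) -> str | None:
    """Map a mention to a canonical entity name, or None if no tier finds a match."""
    m = mention.strip().lower()
    # Tier 1: exact match on the canonical name or any known alias.
    for name, aliases in entities.items():
        if m == name.lower() or m in {a.lower() for a in aliases}:
            return name
    # Tier 2: fuzzy match, keeping the best character-level similarity ratio.
    best_name, best_score = None, 0.0
    for name, aliases in entities.items():
        for candidate in {name, *aliases}:
            score = SequenceMatcher(None, m, candidate.lower()).ratio()
            if score > best_score:
                best_name, best_score = name, score
    if best_score >= fuzzy_threshold:
        return best_name
    # Tier 3 (not shown): compare embeddings of the mention and each entity description.
    return None


entities = {"Enterprise plan": {"Ent tier", "Plan E"}, "Pro plan": {"Pro tier"}}
print(resolve("ent tier", entities))         # Enterprise plan (exact alias)
print(resolve("Enterprize plan", entities))  # Enterprise plan (fuzzy, typo)
print(resolve("our top tier", entities))     # None (needs embedding similarity)
```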

Extraction pulls attributes and relationships from the surrounding context. If a user says "the Enterprise plan costs $299 per month," the system extracts a price attribute for the Enterprise plan entity.
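
For illustration, the price example can be handled by a single hand-written pattern. This is a deliberately narrow stand-in: the `PRICE_PATTERN` regex and `extract_price` function are hypothetical, and a real extractor would need to cover far more phrasings, for example with a trained relation extractor or a language model. The returned subject would then be passed through resolution to find the canonical entity.

```python
from __future__ import annotations

import re

# One hand-written pattern: "<subject> costs $<amount> per <month|year>".
PRICE_PATTERN = re.compile(
    r"(?:the\s+)?(?P<subject>.+?)\s+costs\s+\$(?P<amount>[\d,]+(?:\.\d+)?)\s+per\s+(?P<period>month|year)",
    re.IGNORECASE,
)


def extract_price(sentence: str) -> dict | None:
    """Return a price attribute for the subject of `sentence`, or None if the pattern does not apply."""
    m = PRICE_PATTERN.search(sentence)
    if m is None:
        return None
    return {
        "subject": m.group("subject"),  # still needs resolution to a canonical entity
        "attribute": f"price_per_{m.group('period').lower()}",
        "value": float(m.group("amount").replace(",", "")),
    }


print(extract_price("the Enterprise plan costs $299 per month"))
# {'subject': 'Enterprise plan', 'attribute': 'price_per_month', 'value': 299.0}
```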

Vector Storage for Semantic Retrieval

Not all business memory fits neatly into structured entities. Contextual knowledge like "our APAC team focuses on enterprise accounts in Japan and Singapore" is best stored as vector embeddings that can be retrieved based on semantic similarity.

The vector store indexes memory entries with their embeddings, metadata (source conversation, timestamp, confidence score), and tenant isolation tags. At query time, the user's question is embedded and used to retrieve the most relevant memory entries.
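
The retrieval step can be sketched as a brute-force cosine-similarity search over an in-memory list. The embedding model is out of scope, so the toy three-dimensional vectors below stand in for real embeddings; `MemoryEntry` and `InMemoryVectorStore` are hypothetical names, and a production system would use a dedicated vector index. Note that the tenant filter runs before ranking, not after.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    text: str
    embedding: list[float]
    tenant_id: str
    source_conversation: str
    created_at: float        # Unix timestamp
    confidence: float        # 0.0 to 1.0


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class InMemoryVectorStore:
    """Toy brute-force index; a real deployment would use a vector database."""
    entries: list[MemoryEntry] = field(default_factory=list)

    def add(self, entry: MemoryEntry) -> None:
        self.entries.append(entry)

    def search(self, tenant_id: str, query_embedding: list[float], k: int = 5) -> list[tuple[float, MemoryEntry]]:
        # Filter by tenant BEFORE ranking so one tenant's memories never surface for another.
        candidates = [e for e in self.entries if e.tenant_id == tenant_id]
        scored = [(cosine(query_embedding, e.embedding), e) for e in candidates]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


store = InMemoryVectorStore()
store.add(MemoryEntry("APAC team focuses on enterprise accounts in Japan and Singapore",
                      [0.9, 0.1, 0.0], "acme", "conv-12", 1_780_000_000.0, 0.9))
store.add(MemoryEntry("Fiscal year starts in February",
                      [0.0, 1.0, 0.2], "acme", "conv-31", 1_780_100_000.0, 0.8))
for score, entry in store.search("acme", query_embedding=[1.0, 0.0, 0.0], k=1):
    print(f"{score:.2f}  {entry.text}")
```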

Memory Relevance Scoring

Not all memories are equally useful for a given query. The retrieval system scores each candidate memory on multiple dimensions:

  • Semantic relevance: How closely the memory's embedding matches the current query
  • Recency: More recently created or referenced memories score higher
  • Frequency: Memories that have been referenced across multiple conversations score higher
  • Confidence: Memories established through explicit user statements score higher than those inferred from context

The final score is a weighted combination of these factors, with weights that adapt based on user feedback (see the section on learning integration below).
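
A minimal sketch of the weighted combination, assuming each signal is first normalized to [0, 1] and that semantic similarity arrives already in that range. The weights, the 30-day recency half-life, and the log-scaled frequency cap of 20 references are all hypothetical starting values; in the design described above they would be tuned from feedback rather than fixed.

```python
import math

# Hypothetical starting weights (they sum to 1.0 so the score stays in [0, 1]).
WEIGHTS = {"semantic": 0.5, "recency": 0.2, "frequency": 0.15, "confidence": 0.15}


def relevance_score(similarity, age_days, reference_count, confidence,
                    weights=WEIGHTS, recency_half_life_days=30.0, max_references=20):
    """Weighted combination of four signals, each normalized to the range [0, 1]."""
    # 1.0 for a brand-new memory, 0.5 after one half-life, 0.25 after two.
    recency = 0.5 ** (age_days / recency_half_life_days)
    # Log scaling so the 20th reference adds less than the 2nd; capped at 1.0.
    frequency = min(1.0, math.log1p(reference_count) / math.log1p(max_references))
    signals = {"semantic": similarity, "recency": recency,
               "frequency": frequency, "confidence": confidence}
    return sum(weights[name] * value for name, value in signals.items())
```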

Context Packing for Prompt Budgets

Retrieved memories must fit within the AI model's context window alongside the system prompt, tool definitions, conversation history, and current query. Context packing optimizes which memories to include given a fixed token budget.

The approach is analogous to the 0/1 knapsack problem: each memory has a value (its relevance score) and a cost (its token count), and the system selects the combination of memories that maximizes total value within the budget. Solving this exactly is expensive in general, so in practice a greedy pass that takes memories in order of value per token usually comes close to optimal, particularly when each memory is small relative to the budget.
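
The greedy value-per-token approach can be sketched as below. It adds one safeguard from the standard knapsack approximation: if a single memory that fits the budget is worth more than the entire greedy selection, take it instead, which guarantees at least half of the optimal value. `Candidate` and `pack_context` are hypothetical names, and token counts are assumed to be precomputed with the model's tokenizer.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    memory_id: str
    score: float    # relevance score: the knapsack "value"
    tokens: int     # token count: the knapsack "cost"


def pack_context(candidates: list[Candidate], budget: int) -> list[Candidate]:
    """Greedy knapsack approximation: take memories by score per token while they fit."""
    ranked = sorted(candidates, key=lambda c: c.score / max(c.tokens, 1), reverse=True)
    chosen: list[Candidate] = []
    used = 0
    for c in ranked:
        if used + c.tokens <= budget:  # skip items that would overflow, keep trying smaller ones
            chosen.append(c)
            used += c.tokens
    # Guard against the classic failure case: one large, high-value memory can beat
    # the whole greedy selection on its own.
    fitting = [c for c in candidates if c.tokens <= budget]
    best_single = max(fitting, key=lambda c: c.score, default=None)
    if best_single is not None and best_single.score > sum(c.score for c in chosen):
        return [best_single]
    return chosen
```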

Keeping Memory Current

Business context changes. Products are renamed, team structures shift, and metric definitions evolve. A memory system that accumulates stale information degrades rather than improves.

Decay and Refresh

Memories have a decay function that reduces their relevance score over time. Memories that are consistently reinforced through new conversations maintain high scores. Memories that stop being referenced gradually fade.

The decay rate should be calibrated to the type of information. Metric definitions change slowly (months), so their decay rate should be low. Project-specific context changes quickly (weeks), so its decay rate should be higher.
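
Exponential decay with a per-type half-life is one simple way to encode these different rates; measuring age from the last reference, rather than from creation, means every reinforcement resets the clock. The half-life values below are hypothetical and only illustrate the ordering the paragraph describes (metric definitions slow, project context fast).

```python
# Hypothetical half-lives: slow-changing knowledge decays slowly, project context quickly.
HALF_LIFE_DAYS = {
    "metric_definition": 180.0,
    "preference": 120.0,
    "entity": 90.0,
    "project_context": 14.0,
}


def decayed_score(base_score, memory_type, days_since_last_reference):
    """Exponential decay that restarts whenever the memory is referenced again."""
    half_life = HALF_LIFE_DAYS[memory_type]
    return base_score * 0.5 ** (days_since_last_reference / half_life)
```

With these values, project context loses half its score in two weeks, while a metric definition keeps about 95% of its score over the same period.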

Conflict Resolution

When new information contradicts existing memory, the system must decide which to keep. The default strategy is to prefer the more recent statement, but with a confidence threshold. If the existing memory has high confidence (established through multiple conversations), a single contradictory statement is flagged for review rather than immediately overwriting.
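
The rule above can be sketched as a small function over a stored fact. The confidence bookkeeping here (new facts start at 0.5, each confirmation adds 0.1, and 0.8 counts as well established) is a hypothetical stand-in for however confidence is actually tracked.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Fact:
    value: str
    confidence: float                 # 0.0 to 1.0; grows as conversations confirm the fact
    pending_review: list[str] = field(default_factory=list)


def apply_statement(existing: Fact | None, new_value: str, high_confidence: float = 0.8) -> Fact:
    """Prefer the newer statement unless the existing fact is well established."""
    if existing is None:
        return Fact(value=new_value, confidence=0.5)
    if new_value == existing.value:
        # Confirmation from another conversation: reinforce the existing fact.
        existing.confidence = min(1.0, existing.confidence + 0.1)
        return existing
    if existing.confidence >= high_confidence:
        # A single contradiction does not overwrite an established fact; queue it for review.
        existing.pending_review.append(new_value)
        return existing
    # Low-confidence fact: the more recent statement wins.
    return Fact(value=new_value, confidence=0.5)
```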

Pruning

Over time, the memory store accumulates entries that are no longer useful: references to completed projects, departed team members, or deprecated products. A periodic pruning process removes memories that have decayed below a minimum score threshold and have not been referenced in a configurable time window.
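
Note that pruning requires both conditions, a low decayed score and a long idle period, so a memory that is merely old but still scores well is kept. A minimal sketch; the 0.05 threshold, the 180-day window, and the `StoredMemory` shape are assumptions. Returning the removed entries, rather than silently dropping them, leaves room for audit logging or a soft-delete step.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StoredMemory:
    memory_id: str
    score: float               # current decayed relevance score
    last_referenced: datetime


def prune(
    memories: list[StoredMemory],
    min_score: float = 0.05,
    idle_window: timedelta = timedelta(days=180),
    now: datetime | None = None,
) -> tuple[list[StoredMemory], list[StoredMemory]]:
    """Split memories into (kept, removed). Removal requires BOTH a low score and a long idle period."""
    now = now or datetime.now()
    kept: list[StoredMemory] = []
    removed: list[StoredMemory] = []
    for m in memories:
        idle = now - m.last_referenced > idle_window
        if m.score < min_score and idle:
            removed.append(m)
        else:
            kept.append(m)
    return kept, removed
```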

Learning Integration

The memory system should integrate with the platform's learning engine. When retrieved memories lead to successful interactions (the user accepts the result without correction), the memory's confidence increases. When they lead to corrections, the system analyzes the correction to determine whether the memory was wrong or simply not applicable in that context.
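
The confidence update can be sketched as follows. How the system decides whether a correction means the memory was wrong or merely not applicable is outside this sketch; it is passed in as the hypothetical `memory_was_wrong` flag, and the step and penalty sizes are assumptions. The asymmetry (small reward, larger penalty) is one design choice among many.

```python
def update_confidence(confidence, outcome, memory_was_wrong=False, step=0.05, penalty=0.2):
    """Nudge a memory's confidence after an interaction in which it was used.

    outcome: "accepted" (no correction) or "corrected".
    memory_was_wrong: result of analyzing the correction (hypothetical input).
    """
    if outcome not in ("accepted", "corrected"):
        raise ValueError(f"unknown outcome: {outcome!r}")
    if outcome == "accepted":
        # Used without correction: small, bounded increase.
        return min(1.0, confidence + step)
    if memory_was_wrong:
        # The correction contradicted the memory itself: larger decrease.
        return max(0.0, confidence - penalty)
    # Corrected, but the memory was simply not applicable in this context: leave it alone.
    return confidence
```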

This feedback loop creates a memory system that genuinely improves with use. Platforms like Skopx implement this by tracking response feedback and using it to adjust memory relevance scoring, ensuring that the most useful context surfaces in future conversations.

Privacy and Isolation

In multi-tenant systems, memory isolation is critical. Each tenant's memory must be completely separate, with no possibility of cross-contamination. This means:

  • Tenant-scoped vector indices or strict namespace isolation
  • Row-level security on all memory tables
  • Separate entity graphs per tenant
  • Audit logging for all memory access
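
A minimal sketch of application-level tenant scoping with audit logging, using the standard library's `sqlite3` so it runs anywhere. SQLite has no row-level security, so here the tenant filter lives in a wrapper class that binds every query to one tenant; in PostgreSQL the same guarantee could additionally be enforced in the database with row-level security policies. The table layout and the `TenantMemoryStore` class are hypothetical. The `delete` method also illustrates the user-facing ability to remove specific memories.

```python
from __future__ import annotations

import sqlite3
from datetime import datetime, timezone


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical layout: every memory row and audit row carries its tenant id.
    conn.executescript(
        """
        CREATE TABLE memories (id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, text TEXT NOT NULL);
        CREATE TABLE audit_log (tenant_id TEXT NOT NULL, actor TEXT NOT NULL, action TEXT NOT NULL,
                                memory_id INTEGER, at TEXT NOT NULL);
        """
    )


class TenantMemoryStore:
    """Binds every query to one tenant and records each access in the audit log."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: str, actor: str) -> None:
        self.conn, self.tenant_id, self.actor = conn, tenant_id, actor

    def _audit(self, action: str, memory_id: int | None) -> None:
        self.conn.execute(
            "INSERT INTO audit_log (tenant_id, actor, action, memory_id, at) VALUES (?, ?, ?, ?, ?)",
            (self.tenant_id, self.actor, action, memory_id, datetime.now(timezone.utc).isoformat()),
        )

    def add(self, text: str) -> int:
        cur = self.conn.execute(
            "INSERT INTO memories (tenant_id, text) VALUES (?, ?)", (self.tenant_id, text)
        )
        self._audit("write", cur.lastrowid)
        return cur.lastrowid

    def list_memories(self) -> list[str]:
        # The tenant filter is applied here, never left to the caller.
        rows = self.conn.execute(
            "SELECT text FROM memories WHERE tenant_id = ? ORDER BY id", (self.tenant_id,)
        ).fetchall()
        self._audit("read", None)
        return [text for (text,) in rows]

    def delete(self, memory_id: int) -> bool:
        # Deleting another tenant's row is a no-op because of the tenant_id condition.
        cur = self.conn.execute(
            "DELETE FROM memories WHERE id = ? AND tenant_id = ?", (memory_id, self.tenant_id)
        )
        self._audit("delete", memory_id)
        return cur.rowcount == 1
```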

Users should also have visibility into what the system remembers about their organization and the ability to edit or delete specific memories. Transparency builds trust in the system.

Measuring Memory Quality

The key metrics for evaluating a business memory system are:

  • Retrieval precision: Percentage of retrieved memories that are relevant to the current query
  • User correction rate: How often users correct the AI's understanding (lower is better)
  • Context utilization: Percentage of included memories that the model actually uses in its response
  • Memory freshness: Average age of memories used in responses
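
These four metrics can be computed from per-turn logs, as sketched below. The `TurnLog` fields are a hypothetical logging schema; in particular, deciding whether a retrieved memory was relevant or whether the response actually used it requires labeling (by reviewers or an automated judge), which the sketch takes as given.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TurnLog:
    """What happened on one answered question (hypothetical logging schema)."""
    retrieved: int                 # memories retrieved
    relevant: int                  # of those, judged relevant to the query
    included: int                  # memories packed into the prompt
    used: int                      # of those, actually reflected in the response
    corrected: bool                # did the user correct the AI's understanding?
    memory_ages_days: list[float]  # ages of the memories used in the response


def memory_quality(turns: list[TurnLog]) -> dict[str, float]:
    retrieved = sum(t.retrieved for t in turns)
    included = sum(t.included for t in turns)
    ages = [a for t in turns for a in t.memory_ages_days]
    return {
        "retrieval_precision": sum(t.relevant for t in turns) / retrieved if retrieved else 0.0,
        "user_correction_rate": sum(t.corrected for t in turns) / len(turns) if turns else 0.0,
        "context_utilization": sum(t.used for t in turns) / included if included else 0.0,
        "memory_freshness_days": sum(ages) / len(ages) if ages else 0.0,
    }
```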

A well-functioning memory system should show decreasing correction rates and increasing retrieval precision over the first few weeks of use as it accumulates accurate business context. If these metrics plateau or regress, the memory system needs tuning.

Alexis Kelly

The Skopx engineering and product team
