
Real-Time Anomaly Detection with AI: Architecture and Implementation

Sarah Chen
January 28, 2026
10 min read


Real-time anomaly detection is the process of automatically identifying unusual patterns in streaming data as they occur, rather than discovering them hours or days later in batch analysis. Skopx's anomaly detection system processes metric snapshots every 15 minutes, identifies deviations from learned baselines within 45 seconds, and generates actionable insights with 87% precision (meaning 87% of flagged anomalies are confirmed as genuine issues by users).

What Is Anomaly Detection in the Context of AI Analytics?

Anomaly detection in AI analytics is the automated identification of data points, patterns, or behaviors that deviate significantly from expected norms. In Skopx, this means detecting unusual patterns across connected data sources: a sudden spike in error rates from your database, an unexpected drop in deployment frequency from GitHub, or an unusual pattern in customer support tickets from Jira. The system learns what "normal" looks like for each metric and each team, then alerts when reality diverges.

How Does the Metric Collection Pipeline Work?

Our anomaly detection starts with the MetricCollector, which runs on a 15-minute cron cycle. Each cycle, the collector queries all connected data sources for their latest metrics: row counts, error rates, response times, commit frequency, ticket volumes, and any custom metrics the team has configured.

Metrics are stored as time-series snapshots in the metric_snapshots table in Supabase PostgreSQL. Each snapshot records the metric name, value, timestamp, source, and tenant ID. We retain 90 days of snapshots at 15-minute granularity, which gives us sufficient history for seasonal pattern detection.

// MetricCollector core loop
interface MetricSnapshot {
  metric_name: string;
  value: number;
  timestamp: Date;
  source_id: string;
  organization_id: string;
  metadata: Record<string, unknown>;
}

async function collectMetrics(orgId: string): Promise<MetricSnapshot[]> {
  const sources = await getConnectedSources(orgId);
  const snapshots: MetricSnapshot[] = [];

  for (const source of sources) {
    try {
      const metrics = await source.adapter.getLatestMetrics();
      snapshots.push(...metrics.map(m => ({
        ...m,
        timestamp: new Date(),
        source_id: source.id,
        organization_id: orgId
      })));
    } catch (err) {
      // One unreachable source should not abort the whole collection cycle
      console.error(`Metric collection failed for source ${source.id}`, err);
    }
  }

  const { error } = await supabase.from('metric_snapshots').insert(snapshots);
  if (error) throw error;
  return snapshots;
}

How Does Baseline Learning Work?

Before we can detect anomalies, we need to know what "normal" looks like. Our baseline model uses exponential moving averages (EMAs) with debiasing, a bias-correction technique borrowed from optimization algorithms such as Adam. We maintain one EMA for the mean and one for the variance of each metric, adapting to gradual trends while remaining sensitive to sudden shifts.

We use a warmup period of 48 hours (192 data points at 15-minute intervals) before activating anomaly detection for a new metric. During warmup, we only collect data and build the baseline. After warmup, the adaptive threshold adjusts based on the metric's observed volatility: stable metrics get tighter thresholds (1.5 standard deviations), while volatile metrics get wider ones (3.0 standard deviations).

The threshold adapts over time. If a metric consistently triggers false alarms, the system widens its threshold automatically. If users consistently acknowledge alerts as genuine, the threshold tightens. This feedback loop reduced our false positive rate from 34% to 13% over three months of production use.

How Does the AI Insight Generator Classify Anomalies?

When the statistical detector flags an anomaly, the InsightGenerator classifies it and generates a human-readable explanation. We use a three-tier AI strategy to balance cost and quality.

Tier 1 (70% of anomalies): Local pattern matching. Simple rule-based classification that runs entirely locally with zero AI cost. Patterns like "metric dropped to zero" (service outage), "metric exceeded 2x historical max" (traffic spike), or "metric has been steadily increasing for 7 days" (growth trend) are handled without any LLM call.

Tier 2 (25% of anomalies): Claude Haiku. For anomalies that do not match simple patterns, we send the metric history and context to Claude 3.5 Haiku for classification. Haiku analyzes the time series, correlates it with other metrics from the same time period, and generates an explanation. Cost: approximately $0.001 per insight.

Tier 3 (5% of anomalies): Claude Sonnet for deep analysis. Complex anomalies involving multiple correlated metrics or requiring business context understanding are escalated to Sonnet. These are typically cross-source anomalies like "error rate spiked in the database at the same time deployment frequency dropped in GitHub, suggesting a bad deploy that has stalled further releases."
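A minimal sketch of how such tier routing might look. The rule conditions and the AnomalyContext fields are illustrative assumptions drawn from the patterns described above; the real classifier presumably uses richer context.

```typescript
// Illustrative three-tier router (simplified assumptions, not Skopx's code).
type Tier = 'local' | 'haiku' | 'sonnet';

interface AnomalyContext {
  current: number;              // latest metric value
  historicalMax: number;        // max observed over the retention window
  trendDays: number;            // days of monotonic increase, 0 if none
  correlatedAnomalies: number;  // concurrent anomalies in related metrics
}

function routeAnomaly(ctx: AnomalyContext): Tier {
  // Tier 1: simple rule-based patterns handled locally, zero AI cost.
  const matchesLocalPattern =
    ctx.current === 0 ||                    // "metric dropped to zero"
    ctx.current > 2 * ctx.historicalMax ||  // "exceeded 2x historical max"
    ctx.trendDays >= 7;                     // "steadily increasing for 7 days"

  if (matchesLocalPattern && ctx.correlatedAnomalies === 0) return 'local';

  // Tier 3: multi-metric, cross-source anomalies escalate to the larger model.
  if (ctx.correlatedAnomalies >= 2) return 'sonnet';

  // Tier 2: everything else gets a cheap classification call.
  return 'haiku';
}
```

The ordering matters: the free local check runs first, and escalation to the expensive tier is gated on evidence of cross-metric correlation rather than on deviation magnitude alone.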

How Does Cross-Source Correlation Work?

The most valuable anomalies are not single-metric deviations but correlated patterns across multiple data sources. Our EntityGraph maintains relationships between metrics from different sources. When an anomaly is detected in one metric, the system checks correlated metrics for concurrent anomalies.

For example, the EntityGraph knows that the "orders" table in PostgreSQL is related to the "payment-service" repository in GitHub and the "Payment Issues" Jira board. When order volume drops anomalously, the system automatically checks whether there was a recent deployment to the payment service or a spike in payment-related Jira tickets.

This cross-source correlation catches issues that single-source monitoring misses. In our production data, 23% of the highest-value insights (rated 5/5 by users) involved correlations across two or more data sources.
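The concurrency check at the heart of this could be sketched as follows. The Anomaly shape and findConcurrentAnomalies are hypothetical names, and the related-metric list stands in for an EntityGraph lookup.

```typescript
// Hypothetical sketch of the cross-source correlation check.
interface Anomaly {
  metricName: string;
  sourceId: string;
  detectedAt: Date;
}

function findConcurrentAnomalies(
  anomaly: Anomaly,
  relatedMetricNames: string[],  // would come from an EntityGraph query
  recentAnomalies: Anomaly[],
  windowMs = 30 * 60 * 1000      // two 15-minute collection cycles
): Anomaly[] {
  const related = new Set(relatedMetricNames);
  return recentAnomalies.filter(a =>
    related.has(a.metricName) &&
    a.sourceId !== anomaly.sourceId &&  // cross-source matches only
    Math.abs(a.detectedAt.getTime() - anomaly.detectedAt.getTime()) <= windowMs
  );
}
```

In the orders example above, an anomalous drop in the PostgreSQL table would trigger this check against deployment and ticket metrics from GitHub and Jira within the same window.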

How Do You Handle Alert Fatigue?

Alert fatigue is the primary failure mode of anomaly detection systems. We mitigate it through four mechanisms.

First, severity scoring. Each anomaly receives a severity score from 0-100 based on deviation magnitude, metric importance (user-configurable), and historical false positive rate for that metric. Only anomalies above the user's configured threshold generate notifications.

Second, deduplication. If the same anomaly persists across multiple collection cycles, we update the existing insight rather than creating duplicates. Users see one insight that says "Error rate has been elevated for 3 hours" rather than twelve separate alerts.

Third, notification preferences. Users configure per-category notification preferences: they might want immediate notifications for database anomalies but weekly digests for code quality trends.

Fourth, automatic dismissal. If an anomaly self-resolves within two collection cycles (30 minutes), we automatically mark it as resolved and downgrade its severity. Transient spikes generate insights but not notifications.
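The severity side of these mechanisms might look like the following sketch. The multiplicative weighting and the six-sigma saturation point are our assumptions for illustration, not Skopx's actual formula.

```typescript
// Illustrative severity score in 0-100 (weights are assumptions).
function severityScore(
  zScore: number,            // deviation magnitude in standard deviations
  importance: number,        // user-configured metric importance, 0..1
  falsePositiveRate: number  // historical FP rate for this metric, 0..1
): number {
  const magnitude = Math.min(zScore / 6, 1);  // saturate at 6 sigma
  const trust = 1 - falsePositiveRate;        // historically noisy metrics score lower
  return Math.round(100 * magnitude * importance * trust);
}
```

Under this shape, a 6-sigma deviation on a maximally important, historically reliable metric scores 100, while the same deviation on a metric that is wrong half the time scores far lower and may fall below the user's notification threshold.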

What Are the Accuracy Metrics?

We measure anomaly detection quality across three dimensions: precision (87%, percentage of flagged anomalies confirmed as genuine), recall (73%, percentage of genuine anomalies that were flagged), and time-to-detection (median 22 minutes from anomaly onset to user notification). The precision-recall trade-off is tunable per metric via the threshold configuration. We default to higher precision because false alarms erode user trust faster than missed anomalies.
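For reference, the two headline numbers follow directly from the standard definitions; the confusion-matrix counts below are illustrative, not our actual totals.

```typescript
// Standard precision/recall definitions over confusion-matrix counts.
function precision(truePositives: number, falsePositives: number): number {
  return truePositives / (truePositives + falsePositives);
}

function recall(truePositives: number, falseNegatives: number): number {
  return truePositives / (truePositives + falseNegatives);
}

// e.g. 87 confirmed out of 100 flagged -> precision 0.87;
// 73 caught out of 100 genuine anomalies -> recall 0.73
```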

Key Takeaways

Effective anomaly detection requires statistical baselines with adaptive thresholds, tiered AI classification to manage costs, cross-source correlation for high-value insights, and aggressive alert fatigue mitigation. The most impactful architectural decision was the three-tier AI strategy, handling 70% of anomalies locally with zero AI cost while reserving expensive model calls for complex, multi-source patterns. The hardest ongoing challenge is recall improvement: detecting subtle, slow-moving anomalies that do not trigger statistical thresholds but represent genuine business issues.


Sarah Chen

Contributing writer at Skopx
