Univariate Analysis: Methods, Examples, and When to Use It

Saad Selim
May 4, 2026

Univariate analysis examines one variable at a time. It is the simplest form of statistical analysis and the essential first step before exploring relationships between variables. Before asking "does X affect Y?" you need to understand X on its own: its distribution, central tendency, spread, and outliers.

Why Start with Univariate Analysis

Every thorough analysis begins here because univariate analysis lets you:

  1. Detect data quality issues. Impossible values, unexpected distributions, and missing-data patterns become visible.
  2. Understand each variable independently. Know the range, typical values, and shape before combining variables.
  3. Identify outliers. Extreme values can distort later analysis if they go unnoticed.
  4. Choose appropriate methods. The distribution of a variable determines which statistical tests are valid.

Measures of Central Tendency

These describe the "typical" value:

Mean (Average)

Sum of all values divided by the number of values.

When to use: Data is roughly symmetric without extreme outliers.

When NOT to use: Skewed data or data with outliers (the mean is pulled toward extremes).

Example: Salaries in a department: $60K, $65K, $70K, $72K, $75K, $500K (CEO).

Mean = $140K (misleading because one extreme value distorts it)

Median

The middle value when data is sorted. Half the values are above, half below.

When to use: Skewed data, data with outliers, ordinal data.

Advantage: Not affected by extreme values.

Example: Same salaries: Median = $71K (much more representative of "typical")
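The salary comparison can be reproduced in a few lines. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and the figures are the six salaries from the example, in thousands of dollars.

```python
from statistics import mean, median

# Salaries in thousands of dollars, taken from the example above.
salaries = [60, 65, 70, 72, 75, 500]

avg = mean(salaries)    # pulled upward by the single 500 value
mid = median(salaries)  # average of the two middle values, 70 and 72

print(f"mean   = ${avg:.0f}K")
print(f"median = ${mid:.0f}K")
```

Dropping the $500K value and rerunning shows how far the mean moves while the median barely changes.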

Mode

The most frequently occurring value.

When to use: Categorical data, finding the most common category.

Example: Shirt sizes sold: S(20), M(45), L(38), XL(15). Mode = M (most popular).
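For categorical data the mode is just the category with the highest count. A small sketch with `collections.Counter`, using the shirt-size counts from the example (the variable names are illustrative):

```python
from collections import Counter

# Units sold per shirt size, from the example above.
sales = Counter({"S": 20, "M": 45, "L": 38, "XL": 15})

# most_common(1) returns a one-element list holding the (value, count)
# pair with the highest count.
mode_size, mode_count = sales.most_common(1)[0]
print(mode_size, mode_count)
```

If two categories tie for the highest count, `most_common(1)` returns only one of them, so it is worth checking for ties when counts are close.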

Measures of Spread (Dispersion)

These describe how spread out the data is:

Range

Maximum minus minimum value.

Formula: Range = Max - Min

Limitation: Extremely sensitive to outliers (one extreme value makes the range huge).

Interquartile Range (IQR)

The range of the middle 50% of data (Q3 - Q1).

Formula: IQR = 75th percentile - 25th percentile

Advantage: Largely unaffected by outliers, because it ignores the top and bottom 25% of values.

Use: Often used to define outliers (values below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR).
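The 1.5 x IQR rule translates directly into code. The sketch below is one possible implementation using `statistics.quantiles`; the function name `iqr_outliers` is made up for illustration, and the quartile values depend on the interpolation convention (Python's default "exclusive" method is assumed here, and other tools may give slightly different cut points).

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return (low_fence, high_fence, outliers) using the k x IQR rule."""
    q1, _, q3 = quantiles(values, n=4)  # three quartile cut points
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return low_fence, high_fence, outliers

# Department salaries in thousands of dollars, from the mean example.
salaries = [60, 65, 70, 72, 75, 500]
low, high, flagged = iqr_outliers(salaries)
print(flagged)
```

On the salary data only the $500K value falls outside the fences.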

Standard Deviation

Roughly the typical distance of values from the mean. Formally, it is the square root of the average squared deviation from the mean.

Interpretation:

  • Small SD: Data points cluster tightly around the mean
  • Large SD: Data points are spread widely

Rule of thumb (normal distribution):

  • 68% of values within 1 SD of mean
  • 95% within 2 SD
  • 99.7% within 3 SD
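The rule-of-thumb percentages come from the normal distribution's cumulative distribution function. As a quick check, this sketch uses `statistics.NormalDist` (Python 3.8+); the helper name `share_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

These shares hold only for data that is approximately normal; for skewed data the percentages can differ substantially.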

Variance

Standard deviation squared. Used in formulas but harder to interpret directly (units are squared).
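A short sketch of how the two measures relate, again with the standard `statistics` module and the salary figures from the mean example. It also shows the population versus sample distinction, which the text above does not cover and which is included here only as a practical note.

```python
from math import sqrt
from statistics import pstdev, pvariance, stdev, variance

salaries = [60, 65, 70, 72, 75, 500]  # thousands of dollars

# Population versions divide by n; sample versions divide by n - 1.
pop_var, pop_sd = pvariance(salaries), pstdev(salaries)
sample_var, sample_sd = variance(salaries), stdev(salaries)

# SD is the square root of variance, so it is back in the original units.
print(f"sample SD       = {sample_sd:.1f} (thousand dollars)")
print(f"sample variance = {sample_var:.1f} (thousand dollars, squared)")
print(f"sqrt(variance)  = {sqrt(sample_var):.1f}")
```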

Coefficient of Variation (CV)

Standard deviation divided by mean, expressed as percentage. Allows comparison of spread across variables with different scales.

Example: Comparing variability of revenue ($1M average, $200K SD) vs. orders (500 average, 100 SD).

  • Revenue CV = 20%
  • Orders CV = 20%
  • Same relative variability despite different scales.
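The same comparison as a small function. The name `coefficient_of_variation` is illustrative, and the inputs are the summary figures from the example above.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sd / mean * 100

revenue_cv = coefficient_of_variation(sd=200_000, mean=1_000_000)
orders_cv = coefficient_of_variation(sd=100, mean=500)
print(f"revenue CV = {revenue_cv:.0f}%, orders CV = {orders_cv:.0f}%")
```

CV is only meaningful for ratio-scale data with a positive mean; it becomes unstable as the mean approaches zero.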

Distribution Shape

Skewness

Measures asymmetry of the distribution:

  • Right-skewed (positive): Tail extends right. Typically Mean > Median. Examples: income, house prices, website session duration.
  • Left-skewed (negative): Tail extends left. Typically Mean < Median. Examples: age at retirement, exam scores (with ceiling effect).
  • Symmetric: Mean and median are approximately equal. Examples: height, IQ scores.
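Skewness can also be computed as a number. The sketch below uses the moment-based (population) definition, the mean of cubed z-scores; statistical packages often apply a small-sample correction, so their values may differ slightly. The function name is illustrative.

```python
from statistics import fmean, pstdev

def skewness(values):
    """Moment-based skewness: the mean of cubed z-scores.

    Positive means a longer right tail, negative a longer left tail.
    Raises ZeroDivisionError if all values are identical (SD of zero).
    """
    mu = fmean(values)
    sigma = pstdev(values, mu)
    return fmean([((v - mu) / sigma) ** 3 for v in values])

salaries = [60, 65, 70, 72, 75, 500]  # right-skewed by one large value
symmetric = [1, 2, 3, 4, 5]

print(f"salaries:  {skewness(salaries):+.2f}")
print(f"symmetric: {skewness(symmetric):+.2f}")
```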

Kurtosis

Measures the "tailedness" of the distribution:

  • High kurtosis: Heavy tails, more outliers than normal distribution
  • Low kurtosis: Light tails, fewer outliers
  • Normal kurtosis: Similar to bell curve
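Kurtosis is usually reported as excess kurtosis, meaning the fourth standardized moment minus 3, so that a normal distribution scores about 0. A minimal sketch under that convention follows; the function name and both sample lists are made up for illustration.

```python
from statistics import fmean, pstdev

def excess_kurtosis(values):
    """Mean of z-scores raised to the fourth power, minus 3.

    Above 0 suggests heavier tails than a normal distribution,
    below 0 suggests lighter tails.
    """
    mu = fmean(values)
    sigma = pstdev(values, mu)
    return fmean([((v - mu) / sigma) ** 4 for v in values]) - 3

flat = [1, 2, 3, 4, 5]                        # evenly spread, light tails
heavy = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]     # mostly central, two extremes

print(f"flat:  {excess_kurtosis(flat):+.2f}")
print(f"heavy: {excess_kurtosis(heavy):+.2f}")
```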

Visualization Methods for Univariate Analysis

Method | Best For | Shows
Histogram | Continuous data distribution shape | Frequency distribution, skewness, modes
Box plot | Summary statistics and outliers | Median, IQR, range, outliers
Bar chart | Categorical frequency | Count or proportion per category
Density plot | Smooth distribution estimate | Shape without bin sensitivity
Dot plot | Small datasets | Individual values
QQ plot | Checking normality | How closely data follows normal distribution

Reading a Histogram

A histogram divides continuous data into bins and counts values in each bin:

  • Bell-shaped: Data is approximately normal
  • Right-skewed: Long tail on right (most values on left)
  • Bimodal: Two peaks (possibly two subgroups in the data)
  • Uniform: All bins roughly equal height (no preferred value)
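Binning is simple enough to do by hand, which helps when reading what a plotting library produces. The sketch below counts values per fixed-width bin and prints a text histogram; the function name, the bin width, and the sample values are all made up for illustration.

```python
from collections import Counter

def bin_counts(values, bin_width):
    """Count values per fixed-width bin and return {bin_start: count}."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Made-up response times in ms, chosen to show a right-skewed shape.
sample = [120, 140, 150, 160, 170, 180, 190, 210, 230, 260, 320, 410, 650, 980]

bins = bin_counts(sample, bin_width=200)
for start, count in bins.items():
    # Each bin covers start <= value < start + 200.
    print(f"[{start:>3}, {start + 200:>4}) {'#' * count}")
```

Empty bins are skipped in this sketch. Changing `bin_width` can change the apparent shape, which is the bin sensitivity that density plots avoid.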

Reading a Box Plot

Component | Meaning
Box bottom (Q1) | 25th percentile
Line in box | Median (50th percentile)
Box top (Q3) | 75th percentile
Whiskers | Extend to the most extreme data points within 1.5 x IQR of the box edges
Dots beyond whiskers | Outliers
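The components in the table can be computed directly. This sketch assumes the common convention that each whisker stops at the most extreme data point still inside the 1.5 x IQR fence, and it uses Python's default quartile method; plotting libraries may use slightly different quartile definitions. The function name is illustrative.

```python
from statistics import quantiles

def box_plot_parts(values, k=1.5):
    """Return the values a box plot would draw for one variable."""
    q1, med, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": min(inside),    # lowest point inside the fence
        "whisker_high": max(inside),   # highest point inside the fence
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }

parts = box_plot_parts([60, 65, 70, 72, 75, 500])
print(parts)
```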

Univariate Analysis in Practice

Example: Analyzing Response Times

Data: 10,000 API response times from the past week.

Step 1: Summary statistics

  • Mean: 245ms
  • Median: 180ms
  • SD: 320ms
  • Min: 15ms, Max: 8,500ms

Step 2: Interpretation

  • Mean > Median indicates right skew (confirmed by histogram)
  • Large SD relative to mean suggests high variability
  • Max of 8.5s is an extreme outlier worth investigating

Step 3: Distribution analysis

  • 90% of requests complete under 400ms (acceptable)
  • 5% take 400-1000ms (slow)
  • 5% take over 1 second (problematic)
  • The long tail distorts the mean; median is more representative of typical experience

Step 4: Action

  • Report P50 (180ms) and P95 (950ms) rather than mean
  • Investigate the 5% > 1 second (likely a specific endpoint or condition)
  • Set SLO at P99 < 2000ms
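The percentile reporting in Step 4 can be sketched as follows. The latencies here are synthetic (drawn from a log-normal distribution with arbitrary parameters) and will not reproduce the figures above. The `percentile` helper uses the nearest-rank definition, which is one of several conventions in use.

```python
import math
import random

def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Synthetic, right-skewed latencies in ms; not the article's data.
random.seed(42)
latencies = [random.lognormvariate(5.2, 0.8) for _ in range(10_000)]

p50, p95, p99 = (percentile(latencies, p) for p in (50, 95, 99))
mean_latency = sum(latencies) / len(latencies)
meets_slo = p99 < 2000  # the SLO threshold proposed in Step 4

print(f"mean={mean_latency:.0f}ms p50={p50:.0f}ms p95={p95:.0f}ms p99={p99:.0f}ms")
print(f"P99 under 2000ms: {meets_slo}")
```

As in the example, the mean of right-skewed data sits above the median, which is why percentiles describe typical and worst-case experience better than a single average.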

Example: Analyzing Deal Sizes

Data: 500 closed deals from the past year.

Step 1: Summary statistics

  • Mean: $42K
  • Median: $28K
  • SD: $38K
  • Min: $2K, Max: $350K

Step 2: Distribution shape

  • Strongly right-skewed (a few large enterprise deals pull the mean up)
  • Bimodal: peaks around $15K (SMB) and $60K (enterprise)

Step 3: Insight

The bimodal distribution suggests two distinct segments purchasing differently. Analyzing them separately would yield better insights than treating all deals as one population.
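One way to act on a bimodal shape is to split the data in the valley between the peaks and summarize each part separately. The deal sizes and the $35K cutoff below are hypothetical, chosen only to illustrate the idea; in practice a segment label from the CRM would be a better split than a value threshold.

```python
from statistics import median

# Illustrative deal sizes in thousands of dollars; not the article's dataset.
deals = [9, 12, 14, 15, 16, 18, 22, 48, 55, 58, 60, 63, 70, 180]

# Hypothetical cutoff placed in the valley between the two peaks.
CUTOFF_K = 35

smb = [d for d in deals if d < CUTOFF_K]
enterprise = [d for d in deals if d >= CUTOFF_K]

print(f"all deals:  median ${median(deals):.0f}K")
print(f"SMB:        median ${median(smb):.0f}K (n={len(smb)})")
print(f"enterprise: median ${median(enterprise):.0f}K (n={len(enterprise)})")
```

In this made-up data the overall median falls between the two groups and describes neither of them well.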

Univariate Analysis for Different Data Types

Continuous Data (numbers with any value)

  • Summary statistics: mean, median, SD, IQR
  • Visualization: histogram, box plot, density plot
  • Shape: skewness, kurtosis, modality

Discrete Data (countable numbers)

  • Summary statistics: mean, median, mode
  • Visualization: bar chart, frequency table
  • Special consideration: zero-inflation (many zeros)

Categorical Data (groups/labels)

  • Summary: frequency counts, proportions, mode
  • Visualization: bar chart, pie chart (2-5 categories only)
  • Analysis: chi-square goodness-of-fit test
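As an illustration of the goodness-of-fit test, the sketch below checks whether the shirt-size counts from the mode example are consistent with all four sizes selling equally. The equal-sales hypothesis is an assumption made for this example. Only the test statistic is computed, and it is compared with 7.815, the standard critical value for 3 degrees of freedom at the 5% level.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [20, 45, 38, 15]  # S, M, L, XL units sold
total = sum(observed)

# Null hypothesis for this example: every size sells equally often.
expected = [total / len(observed)] * len(observed)

stat = chi_square_statistic(observed, expected)
CRITICAL_DF3_ALPHA_05 = 7.815  # chi-square critical value, df = 3, alpha = 0.05

print(f"chi-square = {stat:.2f}")
print("reject equal-sales hypothesis:", stat > CRITICAL_DF3_ALPHA_05)
```

A p-value would require the chi-square distribution function, which the Python standard library does not provide; a statistics package can supply it.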

Ordinal Data (ordered categories)

  • Summary: median, mode, percentiles (not mean)
  • Visualization: bar chart (ordered), cumulative frequency
  • Special consideration: do not treat as continuous (intervals may be unequal)

Tools for Univariate Analysis

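SQL (written for PostgreSQL; percentile syntax varies across databases):

-- Summary statistics for a single numeric column
SELECT
    COUNT(*) AS n,
    AVG(amount) AS mean,
    -- PERCENTILE_CONT interpolates between neighboring values when needed
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) AS median,
    -- STDDEV_SAMP is the sample standard deviation (divides by n - 1)
    STDDEV_SAMP(amount) AS std_dev,
    MIN(amount) AS min_val,
    MAX(amount) AS max_val,
    -- Q1 and Q3 bound the middle 50% of values; IQR = q3 - q1
    PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY amount) AS q1,
    PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY amount) AS q3
FROM orders;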

AI-powered tools: Platforms like Skopx let you ask "describe the distribution of order amounts" or "show me a histogram of response times" in natural language and get the analysis instantly.

Summary

Univariate analysis is the foundation of all statistical work. Before exploring relationships between variables, understand each variable independently: its center, spread, shape, and outliers. This step catches data quality issues, informs method selection, and often reveals insights on its own. Never skip it.
