
Univariate Analysis: Methods, Examples, and When to Use It

Saad Selim

May 4, 2026

9 min read

Univariate analysis examines one variable at a time. It is the simplest form of statistical analysis and the essential first step before exploring relationships between variables. Before asking "does X affect Y?" you need to understand X on its own: its distribution, central tendency, spread, and outliers.

Why Start with Univariate Analysis

Every thorough analysis begins here because univariate analysis helps you:

Detect data quality issues. Impossible values, unexpected distributions, and missing-data patterns become visible.
Understand each variable independently. Know the range, typical values, and shape before combining variables.
Identify outliers. Find extreme values that might distort later analysis.
Choose appropriate methods. A variable's distribution helps determine which statistical tests are valid.

Measures of Central Tendency

These describe the "typical" value:

Mean (Average)

Sum of all values divided by the number of values.

When to use: Data is roughly symmetric without extreme outliers.
When NOT to use: Skewed data or data with outliers (the mean is pulled toward extreme values).

Example: Salaries in a department: $60K, $65K, $70K, $72K, $75K, $500K (CEO). Mean = $842K / 6 ≈ $140K, which is misleading because one extreme value distorts it.

Median

The middle value when data is sorted. Half the values are above, half below.

When to use: Skewed data, data with outliers, ordinal data.
Advantage: Not affected by extreme values.

Example: Same salaries: Median = ($70K + $72K) / 2 = $71K, the average of the two middle values and much more representative of a "typical" salary.
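The salary example can be checked with Python's standard-library statistics module. This is a minimal sketch using the article's example figures in thousands of dollars; it also shows how much each measure moves when the single $500K value is removed.

```python
import statistics

# Department salaries from the example, in thousands of dollars ($K).
salaries = [60, 65, 70, 72, 75, 500]

mean = statistics.mean(salaries)      # 842 / 6, pulled up by the $500K value
median = statistics.median(salaries)  # average of the two middle values, 70 and 72

print(f"mean   = ${mean:.1f}K")
print(f"median = ${median:.1f}K")

# Dropping the single extreme value moves the mean far more than the median.
without_ceo = salaries[:-1]
print(f"mean without CEO   = ${statistics.mean(without_ceo):.1f}K")
print(f"median without CEO = ${statistics.median(without_ceo):.1f}K")
```

Removing one value shifts the mean by roughly $72K but the median by only $1K, which is the robustness argument in numbers.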

Mode

The most frequently occurring value.

When to use: Categorical data, finding the most common category.

Example: Shirt sizes sold: S(20), M(45), L(38), XL(15). Mode = M (most popular).
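For already-tallied categories like the shirt sizes, the mode is the category with the highest count. A minimal Python sketch, assuming the counts are stored as a size-to-units mapping:

```python
import statistics
from collections import Counter

# Shirt sizes sold, from the example: size -> units sold.
sales = Counter({"S": 20, "M": 45, "L": 38, "XL": 15})

# most_common(1) returns a one-element list holding the (value, count)
# pair with the highest count.
mode, count = sales.most_common(1)[0]
print(f"mode = {mode} ({count} sold)")

# For raw observations, statistics.multimode reports every tied value
# instead of silently picking one.
print(statistics.multimode(["S", "M", "M", "L", "L"]))
```

Ties are common with categorical data, so multimode is a safer default than reporting a single winner.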

Measures of Spread (Dispersion)

These describe how spread out the data is:

Range

Maximum minus minimum value.

Formula: Range = Max - Min
Limitation: Extremely sensitive to outliers; a single extreme value can make the range huge.

Interquartile Range (IQR)

The range of the middle 50% of data (Q3 - Q1).

Formula: IQR = Q3 - Q1 (75th percentile minus 25th percentile)
Advantage: Robust to outliers, since extreme values fall outside the middle 50% of the data.
Use: Often used to flag outliers, i.e., values below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR.
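The contrast between the fragile range and the robust IQR is easy to verify on the salary example. This Python sketch computes both with and without the $500K value. It assumes the "inclusive" quantile method (linear interpolation); other quartile conventions can give slightly different Q1 and Q3 on small samples.

```python
import statistics

def spread(values):
    """Return (range, IQR) for a list of numbers."""
    # method="inclusive" interpolates linearly between data points, the same
    # convention as SQL's PERCENTILE_CONT and NumPy's default percentile.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return max(values) - min(values), q3 - q1

salaries = [60, 65, 70, 72, 75, 500]  # $K, including the CEO
rng_all, iqr_all = spread(salaries)
rng_staff, iqr_staff = spread(salaries[:-1])

print(f"with CEO:    range = {rng_all}, IQR = {iqr_all}")
print(f"without CEO: range = {rng_staff}, IQR = {iqr_staff}")
```

The single extreme value inflates the range from 15 to 440, while the IQR only moves from 7 to 8.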

Standard Deviation

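The square root of the average squared deviation from the mean; informally, the typical distance of values from the mean. (The sample standard deviation divides by n - 1 instead of n.)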

Interpretation:

Small SD: Data points cluster tightly around the mean
Large SD: Data points are spread widely

Rule of thumb for a normal distribution (the 68-95-99.7 rule):

About 68% of values fall within 1 SD of the mean
About 95% within 2 SD
About 99.7% within 3 SD
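A minimal Python sketch of the definition and the rule of thumb, using a small hypothetical sample. It computes the population SD by hand, checks it against statistics.pstdev, and derives the 68-95-99.7 shares from the standard normal CDF rather than from simulated data.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

# Population SD: square root of the mean squared deviation from the mean.
mu = sum(values) / len(values)
pop_sd = math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))
assert math.isclose(pop_sd, statistics.pstdev(values))  # sanity check against the stdlib

# Sample SD divides by n - 1 instead of n (statistics.stdev).
sample_sd = statistics.stdev(values)
print(f"population SD = {pop_sd:.3f}, sample SD = {sample_sd:.3f}")

# The 68-95-99.7 rule: share of a normal distribution within k SDs of the mean.
std_normal = statistics.NormalDist(mu=0, sigma=1)
for k in (1, 2, 3):
    share = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} SD: {share:.1%}")
```

The shares apply only to normally distributed data; for skewed data like the salary example they can be far off.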

Variance

The square of the standard deviation. It appears throughout statistical formulas but is harder to interpret directly because its units are squared (e.g., dollars squared).

Coefficient of Variation (CV)

Standard deviation divided by the mean, expressed as a percentage. It allows comparison of spread across variables with different scales and is only meaningful for data measured on a ratio scale with a positive mean.

Example: Comparing the variability of revenue ($1M average, $200K SD) with orders (500 average, 100 SD):

Revenue CV = $200K / $1M = 20%
Orders CV = 100 / 500 = 20%
Both have the same relative variability despite very different scales.
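The CV arithmetic from the example as a small Python function. The function name is illustrative, and the guard reflects the assumption that CV is only used with a positive mean.

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage: SD relative to the mean. Assumes mean > 0."""
    if mean <= 0:
        raise ValueError("CV is only meaningful for a positive mean")
    return 100 * sd / mean

revenue_cv = coefficient_of_variation(sd=200_000, mean=1_000_000)
orders_cv = coefficient_of_variation(sd=100, mean=500)
print(f"revenue CV = {revenue_cv:.0f}%, orders CV = {orders_cv:.0f}%")
```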

Distribution Shape

Skewness

Measures asymmetry of the distribution:

Right-skewed (positive): Tail extends right. The mean is usually greater than the median. Examples: income, house prices, website session duration.
Left-skewed (negative): Tail extends left. The mean is usually less than the median. Examples: age at retirement, exam scores (with a ceiling effect).
Symmetric: Mean ≈ median. Examples: height, IQ scores.
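Skewness is usually reported as the moment coefficient, the third central moment divided by the second central moment to the power 1.5. A minimal Python sketch of the population (uncorrected) form, which should agree with scipy.stats.skew at its default settings; the symmetric list is a hypothetical comparison case.

```python
import statistics

def skewness(values):
    """Moment coefficient of skewness (Fisher-Pearson g1, population form)."""
    mu = statistics.fmean(values)
    n = len(values)
    m2 = sum((x - mu) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mu) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

salaries = [60, 65, 70, 72, 75, 500]  # $K, right-skewed by one large value
symmetric = [1, 2, 3, 4, 5]

for name, data in [("salaries", salaries), ("symmetric", symmetric)]:
    print(f"{name}: skew = {skewness(data):.2f}, "
          f"mean = {statistics.fmean(data):.1f}, median = {statistics.median(data)}")
```

A positive value goes together with the mean sitting above the median, as in the salary example; a perfectly symmetric list gives zero.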

Kurtosis

Measures the "tailedness" of the distribution:

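High kurtosis (excess kurtosis > 0): Heavier tails and more extreme values than a normal distribution
Low kurtosis (excess kurtosis < 0): Lighter tails, fewer extreme values
Normal kurtosis (kurtosis ≈ 3, excess kurtosis ≈ 0): Tails similar to a bell curve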
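Kurtosis is commonly reported as excess kurtosis, the fourth central moment over the squared second moment, minus 3, so that a normal distribution scores 0. A minimal Python sketch of the population form, which should match scipy.stats.kurtosis at its default settings:

```python
import statistics

def excess_kurtosis(values):
    """Population excess kurtosis g2 = m4 / m2**2 - 3 (0 for a normal distribution)."""
    mu = statistics.fmean(values)
    n = len(values)
    m2 = sum((x - mu) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mu) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

print(excess_kurtosis([1, 2, 3, 4, 5]))            # evenly spread, light tails: negative
print(excess_kurtosis([60, 65, 70, 72, 75, 500]))  # one extreme value: positive
```

On small samples both skewness and kurtosis are noisy, so they are best read alongside a histogram or box plot.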

Visualization Methods for Univariate Analysis

Method	Best For	Shows
Histogram	Continuous data distribution shape	Frequency distribution, skewness, modes
Box plot	Summary statistics and outliers	Median, IQR, range, outliers
Bar chart	Categorical frequency	Count or proportion per category
Density plot	Smooth distribution estimate	Shape without histogram bins (though sensitive to the smoothing bandwidth)
Dot plot	Small datasets	Individual values
QQ plot	Checking normality	How closely data follows normal distribution

Reading a Histogram

A histogram divides continuous data into bins and counts values in each bin:

Bell-shaped: Data is approximately normal
Right-skewed: Long tail on right (most values on left)
Bimodal: Two peaks (possibly two subgroups in the data)
Uniform: All bins roughly equal height (no preferred value)
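Binning is the core of a histogram. This Python sketch uses fixed-width bins and prints a text histogram; it assumes integer values, and the deal sizes (in $K) are hypothetical, chosen to show two peaks. Changing bin_width can noticeably change the apparent shape, so it is worth trying more than one width.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count integer values in fixed-width bins [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    # Include empty bins between the lowest and highest bin so gaps stay visible.
    lo, hi = min(counts), max(counts)
    return {start: counts.get(start, 0) for start in range(lo, hi + bin_width, bin_width)}

def print_histogram(bins, bin_width):
    for start, count in bins.items():
        print(f"{start:>5}-{start + bin_width - 1:<5} {'#' * count}")

# Hypothetical deal sizes in $K with two clusters.
deal_sizes_k = [12, 14, 15, 16, 18, 55, 58, 60, 61, 63, 13, 17]
bins = histogram(deal_sizes_k, bin_width=10)
print_histogram(bins, bin_width=10)
```

The empty bins between the clusters are what make a bimodal shape visible.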

Reading a Box Plot

Component	Meaning
Box bottom (Q1)	25th percentile
Line in box	Median (50th percentile)
Box top (Q3)	75th percentile
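Whiskers	Extend to the most extreme data points within 1.5 x IQR of the box edges (a common convention; some tools draw them to the min and max)
Dots beyond whiskers	Potential outliers (points beyond the 1.5 x IQR fences)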
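A minimal Python sketch of how the box plot components can be computed, using the salary figures from the mean and median example. It assumes the Tukey-style convention with a 1.5 x IQR whisker rule and linear-interpolation quartiles; plotting libraries may use other quartile methods.

```python
import statistics

def box_plot_stats(values, whisker=1.5):
    """Tukey-style box plot components; the 1.5 multiplier is the common default."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whisker * iqr, q3 + whisker * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        # Whiskers stop at the most extreme data points inside the fences,
        # not at the fences themselves.
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }

print(box_plot_stats([60, 65, 70, 72, 75, 500]))
```

For the salaries, the box spans $66.25K to $74.25K, the whiskers stop at $60K and $75K, and the $500K value is drawn as an outlier.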

Univariate Analysis in Practice

Example: Analyzing Response Times

Data: 10,000 API response times from the past week.

Step 1: Summary statistics

Mean: 245ms
Median: 180ms
SD: 320ms
Min: 15ms, Max: 8,500ms

Step 2: Interpretation

Mean > Median suggests right skew (confirmed by the histogram)
An SD larger than the mean (CV ≈ 130%) indicates high variability
The max of 8.5s is an extreme outlier worth investigating

Step 3: Distribution analysis

90% of requests complete under 400ms (acceptable)
5% take 400-1000ms (slow)
5% take over 1 second (problematic)
The long tail distorts the mean; the median better represents the typical experience

Step 4: Action

Report P50 (180ms) and P95 (950ms) rather than the mean
Investigate the 5% of requests taking over 1 second (likely a specific endpoint or condition)
Set an SLO of P99 < 2000ms
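A minimal Python sketch of a percentile-based latency summary. The ten samples are hypothetical, not the article's 10,000 requests, and the 1000ms "slow" threshold is an illustrative parameter. With so few samples, P95 and P99 are interpolations between the two largest values; real monitoring needs far more data for stable tail percentiles.

```python
import statistics

def latency_report(samples_ms, slow_threshold_ms=1000):
    """Summarize response times with percentiles instead of relying on the mean."""
    # 99 cut points; index k - 1 holds the k-th percentile (linear interpolation).
    pct = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(samples_ms),
        "p50": pct[49],
        "p95": pct[94],
        "p99": pct[98],
        "share_slow": sum(s > slow_threshold_ms for s in samples_ms) / len(samples_ms),
    }

# Hypothetical sample: mostly fast requests with a long tail.
samples = [120, 150, 160, 170, 180, 190, 210, 250, 900, 4000]
report = latency_report(samples)
for key, value in report.items():
    print(f"{key:>10}: {value:.2f}")
```

As in the article's example, one slow request drags the mean far above the median, which is why P50 and P95 are the better headline numbers.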

Example: Analyzing Deal Sizes

Data: 500 closed deals from the past year.

Step 1: Summary statistics

Mean: $42K
Median: $28K
SD: $38K
Min: $2K, Max: $350K

Step 2: Distribution shape

Strongly right-skewed (a few large enterprise deals pull the mean up)
Bimodal: peaks around $15K (SMB) and $60K (enterprise)

Step 3: Insight

The bimodal distribution suggests two distinct segments that purchase differently. Analyzing them separately would yield better insights than treating all deals as one population.
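Splitting by segment is a short group-by. This Python sketch assumes each deal already carries a segment label (for example, a customer tier from a CRM); the deals and labels are hypothetical.

```python
import statistics
from collections import defaultdict

# Hypothetical deals as (segment, amount in $K).
deals = [("SMB", 12), ("SMB", 15), ("SMB", 18), ("SMB", 14),
         ("Enterprise", 55), ("Enterprise", 62), ("Enterprise", 60), ("Enterprise", 240)]

by_segment = defaultdict(list)
for segment, amount in deals:
    by_segment[segment].append(amount)

overall_median = statistics.median(amount for _, amount in deals)
print(f"all deals: median = ${overall_median}K")

# Each segment gets its own univariate summary.
for segment, amounts in by_segment.items():
    print(f"{segment}: n = {len(amounts)}, median = ${statistics.median(amounts)}K, "
          f"mean = ${statistics.fmean(amounts):.1f}K")
```

Here the pooled median falls between the two clusters and describes neither segment well, while each segment's own median does.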

Univariate Analysis for Different Data Types

Continuous Data (numbers that can take any value within a range)

Summary statistics: mean, median, SD, IQR
Visualization: histogram, box plot, density plot
Shape: skewness, kurtosis, modality

Discrete Data (countable numbers)

Summary statistics: mean, median, mode
Visualization: bar chart, frequency table
Special consideration: zero-inflation (many zeros)

Categorical Data (groups/labels)

Summary: frequency counts, proportions, mode
Visualization: bar chart, pie chart (2-5 categories only)
Analysis: chi-square goodness-of-fit test (do observed category frequencies match an expected distribution?)
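A minimal Python sketch of the goodness-of-fit statistic, applied to the shirt-size counts under the hypothetical null hypothesis that all sizes are equally popular. It computes only the Pearson statistic and compares it to the standard table value (7.815 for 3 degrees of freedom at alpha = 0.05); scipy.stats.chisquare returns both the statistic and a p-value.

```python
def chi_square_gof(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [20, 45, 38, 15]  # shirt sizes S, M, L, XL
total = sum(observed)
expected = [total / len(observed)] * len(observed)  # null: all sizes equally popular

stat = chi_square_gof(observed, expected)
df = len(observed) - 1
CRITICAL_0_05_DF3 = 7.815  # chi-square table value, df = 3, alpha = 0.05

verdict = "reject" if stat > CRITICAL_0_05_DF3 else "do not reject"
print(f"chi-square = {stat:.2f} (df = {df}): {verdict} equal popularity")
```

The test assumes independent observations and expected counts that are not too small (a common rule of thumb is at least 5 per category).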

Ordinal Data (ordered categories)

Summary: median, mode, percentiles (not mean)
Visualization: bar chart (ordered), cumulative frequency
Special consideration: do not treat as continuous (intervals may be unequal)
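For ordinal data, one approach is to map each level to its rank, take a median that is always an observed rank, and report cumulative frequencies. This Python sketch uses a hypothetical five-level satisfaction scale; choosing median_low (rather than median_high) when the two middle values differ is a convention.

```python
import statistics
from collections import Counter

# Ordered levels of a hypothetical satisfaction survey.
LEVELS = ["Very dissatisfied", "Dissatisfied", "Neutral", "Satisfied", "Very satisfied"]
RANK = {label: i for i, label in enumerate(LEVELS)}

responses = ["Satisfied", "Neutral", "Very satisfied", "Satisfied", "Dissatisfied",
             "Satisfied", "Very dissatisfied", "Neutral"]

# median_low returns an actual observed rank, so the result maps back to a real
# category instead of a value "between" two levels (no averaging of labels).
median_rank = statistics.median_low(RANK[r] for r in responses)
median_label = LEVELS[median_rank]
print(f"median response: {median_label}")

# Cumulative frequency in level order.
counts = Counter(responses)
cumulative = 0
for label in LEVELS:
    cumulative += counts[label]
    print(f"{label:<18} {counts[label]}  cumulative {cumulative / len(responses):.0%}")
```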

Tools for Univariate Analysis

SQL:

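-- PostgreSQL syntax. PERCENTILE_CONT interpolates linearly between values,
-- and STDDEV is an alias for STDDEV_SAMP (the sample standard deviation).
SELECT
    COUNT(*) AS n,
    COUNT(*) - COUNT(amount) AS n_missing,  -- the aggregates below skip NULLs
    AVG(amount) AS mean,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) AS median,
    STDDEV(amount) AS std_dev,
    MIN(amount) AS min_val,
    MAX(amount) AS max_val,
    PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY amount) AS q1,
    PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY amount) AS q3
FROM orders;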

AI-powered tools: Platforms like Skopx let you ask "describe the distribution of order amounts" or "show me a histogram of response times" in natural language and get the analysis instantly.

Summary

Univariate analysis is the foundation of all statistical work. Before exploring relationships between variables, understand each variable independently: its center, spread, shape, and outliers. This step catches data quality issues, informs method selection, and often reveals insights on its own. Never skip it.


Saad Selim

The Skopx engineering and product team
