Back to Resources
SQL

SQL for Data Analysis: The Essential Commands Every Analyst Needs

Saad Selim
May 4, 2026
12 min read

SQL is the language of data analysis. Whether you use Python, Excel, or AI-powered tools, understanding SQL gives you the ability to ask any question of any database. This guide covers the commands and patterns analysts use daily, organized by analytical task rather than by SQL syntax.

The Analytical SQL Toolkit

Counting and Summarizing

-- How many customers do we have?
SELECT COUNT(*) AS total_customers FROM customers;

-- How many by segment?
SELECT segment, COUNT(*) AS count
FROM customers
GROUP BY segment
ORDER BY count DESC;

-- How many unique values?
SELECT COUNT(DISTINCT country) AS countries FROM customers;

Aggregating Metrics

-- Revenue summary
SELECT
    SUM(amount) AS total_revenue,
    AVG(amount) AS avg_order,
    MIN(amount) AS smallest_order,
    MAX(amount) AS largest_order,
    COUNT(*) AS order_count
FROM orders
WHERE order_date >= '2026-01-01';

Grouping and Comparing

-- Revenue by product category
SELECT
    p.category,
    COUNT(DISTINCT o.customer_id) AS customers,
    COUNT(*) AS orders,
    SUM(o.amount) AS revenue,
    AVG(o.amount) AS avg_order_value
FROM orders o
JOIN products p ON o.product_id = p.id
WHERE o.order_date >= '2026-01-01'
GROUP BY p.category
ORDER BY revenue DESC;

Time-Based Analysis

-- Monthly revenue trend
SELECT
    DATE_TRUNC('month', order_date) AS month,
    SUM(amount) AS revenue,
    COUNT(*) AS orders,
    COUNT(DISTINCT customer_id) AS unique_customers
FROM orders
GROUP BY 1
ORDER BY 1;

Period-Over-Period Comparison

-- Month-over-month growth
WITH monthly AS (
    SELECT
        DATE_TRUNC('month', order_date) AS month,
        SUM(amount) AS revenue
    FROM orders
    GROUP BY 1
)
SELECT
    month,
    revenue,
    LAG(revenue) OVER (ORDER BY month) AS prev_month,
    ROUND((revenue - LAG(revenue) OVER (ORDER BY month)) * 100.0
        / LAG(revenue) OVER (ORDER BY month), 1) AS growth_pct
FROM monthly
ORDER BY month DESC
LIMIT 12;

Ranking and Top-N

-- Top 10 customers by revenue
SELECT
    customer_id,
    customer_name,
    SUM(amount) AS total_revenue,
    COUNT(*) AS order_count
FROM orders o
JOIN customers c ON o.customer_id = c.id
GROUP BY 1, 2
ORDER BY total_revenue DESC
LIMIT 10;

-- Rank within groups
SELECT
    department,
    employee_name,
    salary,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;

Cohort Analysis

-- Customer retention by signup cohort
WITH first_purchase AS (
    SELECT customer_id, MIN(order_date) AS cohort_date
    FROM orders GROUP BY 1
)
SELECT
    DATE_TRUNC('month', fp.cohort_date) AS cohort_month,
    DATEDIFF('month', fp.cohort_date, o.order_date) AS months_since_first,
    COUNT(DISTINCT o.customer_id) AS active_customers
FROM orders o
JOIN first_purchase fp ON o.customer_id = fp.customer_id
GROUP BY 1, 2
ORDER BY 1, 2;

Funnel Analysis

-- Conversion funnel
SELECT
    COUNT(DISTINCT CASE WHEN visited THEN user_id END) AS visitors,
    COUNT(DISTINCT CASE WHEN signed_up THEN user_id END) AS signups,
    COUNT(DISTINCT CASE WHEN trial_started THEN user_id END) AS trials,
    COUNT(DISTINCT CASE WHEN paid THEN user_id END) AS paying,
    ROUND(COUNT(DISTINCT CASE WHEN paid THEN user_id END) * 100.0 /
        NULLIF(COUNT(DISTINCT CASE WHEN visited THEN user_id END), 0), 1) AS overall_conversion
FROM user_events
WHERE event_date >= CURRENT_DATE - INTERVAL '30 days';

Window Functions for Running Calculations

-- Running total and moving average
SELECT
    order_date,
    daily_revenue,
    SUM(daily_revenue) OVER (ORDER BY order_date) AS cumulative_revenue,
    AVG(daily_revenue) OVER (
        ORDER BY order_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS moving_7day_avg
FROM (
    SELECT order_date, SUM(amount) AS daily_revenue
    FROM orders GROUP BY 1
) daily;

Finding Anomalies

-- Days with unusual order volume (> 2 standard deviations from mean)
WITH daily_stats AS (
    SELECT
        order_date,
        COUNT(*) AS order_count
    FROM orders
    GROUP BY 1
),
stats AS (
    SELECT AVG(order_count) AS avg_count, STDDEV(order_count) AS std_count
    FROM daily_stats
)
SELECT ds.order_date, ds.order_count
FROM daily_stats ds
CROSS JOIN stats s
WHERE ds.order_count > s.avg_count + 2 * s.std_count
   OR ds.order_count < s.avg_count - 2 * s.std_count
ORDER BY ds.order_date DESC;

Segmentation

-- RFM segmentation
WITH customer_metrics AS (
    SELECT
        customer_id,
        DATEDIFF('day', MAX(order_date), CURRENT_DATE) AS recency_days,
        COUNT(*) AS frequency,
        SUM(amount) AS monetary
    FROM orders
    GROUP BY 1
)
SELECT
    customer_id,
    CASE
        WHEN recency_days <= 30 AND frequency >= 10 AND monetary >= 5000 THEN 'Champion'
        WHEN recency_days <= 60 AND frequency >= 5 THEN 'Loyal'
        WHEN recency_days <= 30 AND frequency <= 2 THEN 'New'
        WHEN recency_days > 90 AND frequency >= 5 THEN 'At Risk'
        WHEN recency_days > 180 THEN 'Lost'
        ELSE 'Regular'
    END AS segment
FROM customer_metrics;

CTEs: Organizing Complex Analysis

Common Table Expressions (WITH clauses) make complex queries readable:

WITH
-- Step 1: Get customer-level metrics
customer_summary AS (
    SELECT
        customer_id,
        MIN(order_date) AS first_order,
        MAX(order_date) AS last_order,
        COUNT(*) AS total_orders,
        SUM(amount) AS lifetime_revenue
    FROM orders
    GROUP BY 1
),
-- Step 2: Calculate retention
retention AS (
    SELECT
        customer_id,
        lifetime_revenue,
        DATEDIFF('day', first_order, last_order) AS customer_lifespan_days,
        DATEDIFF('day', last_order, CURRENT_DATE) AS days_since_last_order
    FROM customer_summary
)
-- Step 3: Classify
SELECT
    CASE
        WHEN days_since_last_order > 90 THEN 'churned'
        WHEN days_since_last_order > 30 THEN 'at_risk'
        ELSE 'active'
    END AS status,
    COUNT(*) AS customer_count,
    AVG(lifetime_revenue) AS avg_ltv
FROM retention
GROUP BY 1;

Key SQL Patterns for Analysts

PatternWhen to Use
GROUP BY + aggregatesSummarize by category
Window functions (OVER)Running totals, rankings, comparisons
CTEs (WITH clause)Break complex queries into readable steps
CASE WHENCreate categories, conditional logic
JOINsCombine data from multiple tables
SubqueriesFilter based on calculated values
DATE_TRUNCGroup by time periods
LAG/LEADCompare to previous/next period
PERCENTILE_CONTFind medians and percentiles
COALESCEHandle NULL values gracefully

Beyond Manual SQL

For teams that want analytical power without writing SQL, platforms like Skopx translate natural language questions into optimized SQL automatically. Ask "What is our customer retention rate by acquisition channel?" and get the query, result, and visualization instantly. This democratizes the analytical patterns above to anyone in the organization.

Summary

SQL for data analysis is not about memorizing syntax. It is about knowing which patterns solve which business questions. Master GROUP BY for summarization, window functions for sequential analysis, CTEs for organization, and JOINs for combining data sources. These patterns cover 90% of analytical queries you will ever need.

Share this article

Saad Selim

The Skopx engineering and product team

Stay Updated

Get the latest insights on AI-powered code intelligence delivered to your inbox.