SQL Subquery: Types, Syntax, and Practical Examples
A subquery is a query nested inside another query. It runs first, produces a result, and that result is used by the outer query. Subqueries let you break complex problems into manageable steps and answer questions that a single query cannot.
Basic Subquery Syntax
-- Subquery in WHERE clause
SELECT employee_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
The inner query calculates the average salary. The outer query returns all employees who earn more than that average.
Types of Subqueries
1. Scalar Subquery (Returns One Value)
Returns exactly one row with one column. Can be used anywhere a single value is expected.
-- Compare to a single computed value
SELECT product_name, price,
price - (SELECT AVG(price) FROM products) AS diff_from_avg
FROM products
ORDER BY diff_from_avg DESC;
-- Use in SELECT to add a computed column
SELECT
department,
COUNT(*) AS dept_size,
(SELECT COUNT(*) FROM employees) AS total_company,
ROUND(COUNT(*) * 100.0 / (SELECT COUNT(*) FROM employees), 1) AS pct_of_company
FROM employees
GROUP BY department;
2. Row Subquery (Returns One Row, Multiple Columns)
Returns a single row with multiple columns. Useful for comparisons against multiple values.
-- Find the employee with the highest salary in each department
SELECT * FROM employees
WHERE (department, salary) IN (
SELECT department, MAX(salary)
FROM employees
GROUP BY department
);
3. Table Subquery (Returns Multiple Rows)
Returns a result set that the outer query can filter against or join with.
-- IN subquery: find customers who have placed orders
SELECT customer_name
FROM customers
WHERE customer_id IN (
SELECT DISTINCT customer_id
FROM orders
WHERE order_date >= '2026-01-01'
);
-- NOT IN: find customers who have NOT placed orders
SELECT customer_name
FROM customers
WHERE customer_id NOT IN (
SELECT customer_id
FROM orders
WHERE order_date >= '2026-01-01'
);
4. Correlated Subquery (References Outer Query)
A correlated subquery references columns from the outer query and executes once per outer row.
-- Find employees who earn more than their department average
SELECT e.employee_name, e.department, e.salary
FROM employees e
WHERE e.salary > (
SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.department = e.department -- References outer query
);
-- Find the most recent order for each customer
SELECT o.*
FROM orders o
WHERE o.order_date = (
SELECT MAX(o2.order_date)
FROM orders o2
WHERE o2.customer_id = o.customer_id
);
5. EXISTS Subquery
Tests whether a subquery returns any rows. Often more efficient than IN for large datasets.
-- Customers who have placed at least one order over $1000
SELECT c.customer_name
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
AND o.amount > 1000
);
-- Products that have never been ordered
SELECT p.product_name
FROM products p
WHERE NOT EXISTS (
SELECT 1
FROM order_items oi
WHERE oi.product_id = p.product_id
);
Subqueries in Different Clauses
In FROM (Derived Tables)
Use a subquery as a virtual table:
SELECT segment, avg_revenue, customer_count
FROM (
SELECT
CASE
WHEN total_orders > 10 THEN 'power_user'
WHEN total_orders > 3 THEN 'regular'
ELSE 'casual'
END AS segment,
AVG(total_revenue) AS avg_revenue,
COUNT(*) AS customer_count
FROM (
SELECT customer_id,
COUNT(*) AS total_orders,
SUM(amount) AS total_revenue
FROM orders
GROUP BY customer_id
) customer_stats
GROUP BY 1
) segments
ORDER BY avg_revenue DESC;
In SELECT (Scalar Subqueries)
Add computed columns from other tables:
SELECT
d.department_name,
(SELECT COUNT(*) FROM employees e WHERE e.dept_id = d.id) AS headcount,
(SELECT AVG(salary) FROM employees e WHERE e.dept_id = d.id) AS avg_salary,
(SELECT MAX(hire_date) FROM employees e WHERE e.dept_id = d.id) AS last_hire
FROM departments d;
In HAVING
Filter groups based on a subquery result:
-- Departments where average salary exceeds company average
SELECT department, AVG(salary) AS dept_avg
FROM employees
GROUP BY department
HAVING AVG(salary) > (SELECT AVG(salary) FROM employees);
Subquery vs. JOIN
Many subqueries can be rewritten as JOINs and vice versa:
-- Subquery approach
SELECT customer_name
FROM customers
WHERE customer_id IN (SELECT customer_id FROM orders WHERE amount > 1000);
-- Equivalent JOIN approach
SELECT DISTINCT c.customer_name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.amount > 1000;
When to prefer subqueries:
- The logic is clearer as nested steps
- You only need to check existence (EXISTS is often faster than JOIN + DISTINCT)
- The subquery result is small and reusable
When to prefer JOINs:
- You need columns from both tables in the result
- Performance is critical (JOINs are often better optimized by query planners)
- The relationship is straightforward
Subquery vs. CTE
Common Table Expressions (CTEs) are an alternative to subqueries that improve readability:
-- Complex nested subqueries (hard to read)
SELECT * FROM (
SELECT customer_id, SUM(amount) AS total
FROM orders
WHERE order_date >= '2026-01-01'
GROUP BY customer_id
) totals
WHERE total > (SELECT AVG(total) FROM (
SELECT SUM(amount) AS total FROM orders GROUP BY customer_id
) avgs);
-- Same logic as CTE (much clearer)
WITH customer_totals AS (
SELECT customer_id, SUM(amount) AS total
FROM orders
WHERE order_date >= '2026-01-01'
GROUP BY customer_id
),
avg_total AS (
SELECT AVG(total) AS avg_val FROM customer_totals
)
SELECT ct.*
FROM customer_totals ct
CROSS JOIN avg_total
WHERE ct.total > avg_total.avg_val;
Performance Considerations
-
Correlated subqueries execute per row. For a 1M-row outer query, the subquery runs 1M times. Use JOINs or window functions when possible.
-
IN with large result sets is slow. If the subquery returns 100K+ rows, consider EXISTS (short-circuits on first match) or a JOIN.
-
NOT IN with NULLs is dangerous. If the subquery can return NULL, NOT IN returns no rows. Use NOT EXISTS instead.
-- Dangerous: returns nothing if any order has NULL customer_id
SELECT * FROM customers
WHERE customer_id NOT IN (SELECT customer_id FROM orders);
-- Safe: handles NULLs correctly
SELECT * FROM customers c
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id);
- Materialize repeated subqueries. If the same subquery appears multiple times, use a CTE (WITH clause) so it executes once.
Real-World Patterns
Find records with no matching entry (anti-join):
SELECT p.* FROM products p
WHERE NOT EXISTS (
SELECT 1 FROM order_items oi WHERE oi.product_id = p.id
);
Get the top N per group:
SELECT * FROM employees e
WHERE (
SELECT COUNT(*) FROM employees e2
WHERE e2.department = e.department AND e2.salary > e.salary
) < 3; -- Top 3 earners per department
Conditional aggregation with subquery:
SELECT
product_id,
sales_count,
CASE
WHEN sales_count > (SELECT AVG(sales_count) * 2 FROM products) THEN 'best_seller'
WHEN sales_count < (SELECT AVG(sales_count) * 0.5 FROM products) THEN 'underperformer'
ELSE 'average'
END AS performance_tier
FROM products;
Platforms like Skopx handle subquery complexity automatically. Ask "Which products have never been ordered?" and the AI generates the appropriate NOT EXISTS subquery without you writing SQL.
Summary
| Subquery Type | Returns | Use When |
|---|---|---|
| Scalar | Single value | Comparing to a computed value |
| Row | One row, multiple columns | Multi-column comparison |
| Table (IN/NOT IN) | Multiple rows | Filtering against a list |
| Correlated | Depends on outer row | Per-row calculations |
| EXISTS | Boolean (any rows?) | Checking existence efficiently |
| Derived table (FROM) | Result set | Creating intermediate calculations |
Master subqueries to break complex analytical questions into logical steps that SQL can execute efficiently.
Saad Selim
The Skopx engineering and product team