· SQL Guroo · SQL Tutorial · 3 min read
SQL Subqueries vs CTEs - When to Use What in Data Analysis
Master the art of writing cleaner and more efficient SQL queries by understanding when to use subqueries versus Common Table Expressions (CTEs).
Introduction
As data analysts, we often need to write complex SQL queries that involve multiple steps. Two powerful features help us handle these situations: Subqueries and Common Table Expressions (CTEs). In this guide, we’ll understand when to use each approach with practical examples.
Why This Matters?
Imagine you’re analyzing customer data and need to answer questions like:
- Which customers spend more than average?
- How has each customer’s spending changed over time?
- Who are our most valuable customers?
You can solve these problems using either subqueries or CTEs. The right choice can make your code easier to read, maintain, and debug.
Let’s use this sample data for our examples:
-- Create and populate tables
CREATE TABLE orders (
order_id INTEGER,
customer_id INTEGER,
order_date DATE,
total_amount DECIMAL(10,2)
);
CREATE TABLE customers (
customer_id INTEGER,
name VARCHAR(100),
signup_date DATE
);
-- Sample data
INSERT INTO orders VALUES
(1, 101, '2024-01-01', 150.00),
(2, 102, '2024-01-01', 200.00),
(3, 101, '2024-01-02', 75.00),
(4, 103, '2024-01-02', 300.00);
INSERT INTO customers VALUES
(101, 'Alice', '2023-12-01'),
(102, 'Bob', '2023-12-15'),
(103, 'Charlie', '2024-01-01');
Main Content: Understanding Through Examples
Example 1: Finding Above-Average Orders
Let’s find customers who placed orders larger than the average order amount.
Using a Subquery:
SELECT
c.name,
o.order_date,
o.total_amount,
(SELECT AVG(total_amount) FROM orders) as avg_order_amount
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.total_amount > (
SELECT AVG(total_amount)
FROM orders
);
This query:
- Calculates the average order amount in a subquery
- Uses it to filter orders above this average
- Shows customer details for these orders
Results:
name | order_date | total_amount | avg_order_amount
---------|------------|--------------|------------------
Bob | 2024-01-01 | 200.00 | 181.25
Charlie | 2024-01-02 | 300.00 | 181.25
Using a CTE:
WITH avg_order AS (
-- Step 1: Calculate average order amount
SELECT AVG(total_amount) as avg_amount
FROM orders
),
above_average_orders AS (
-- Step 2: Find orders above average
SELECT
o.customer_id,
o.order_date,
o.total_amount
FROM orders o
CROSS JOIN avg_order
WHERE o.total_amount > avg_order.avg_amount
)
-- Step 3: Get customer details
SELECT
c.name,
aao.order_date,
aao.total_amount,
avg_order.avg_amount as avg_order_amount
FROM above_average_orders aao
JOIN customers c ON aao.customer_id = c.customer_id
CROSS JOIN avg_order;
This CTE approach:
- First calculates the average in a named step (avg_order)
- Uses this to find qualifying orders (above_average_orders)
- Finally joins with customer data
- Makes each step clear and readable
Example 2: Customer Spending Analysis
Let’s analyze how customer spending has changed between their first and latest orders.
WITH customer_orders AS (
-- Step 1: Add row numbers to orders
SELECT
customer_id,
order_date,
total_amount,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY order_date
) as first_order,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY order_date DESC
) as last_order
FROM orders
)
SELECT
c.name,
first_order.total_amount as first_order_amount,
last_order.total_amount as latest_order_amount,
last_order.total_amount - first_order.total_amount as amount_change
FROM customers c
JOIN customer_orders first_order
ON c.customer_id = first_order.customer_id
AND first_order.first_order = 1
JOIN customer_orders last_order
ON c.customer_id = last_order.customer_id
AND last_order.last_order = 1
WHERE first_order.order_date != last_order.order_date;
This query:
- Uses a CTE to identify first and last orders
- Calculates the change in order amounts
- Shows how spending patterns have evolved
Results:
name | first_order_amount | latest_order_amount | amount_change
-------|-------------------|--------------------|--------------
Alice | 150.00 | 75.00 | -75.00
Best Practices
Use Subqueries When:
- You need a simple lookup value
- The logic is straightforward
- You’re doing a one-time calculation
Use CTEs When:
- You need to break down complex logic
- You want to reuse intermediate results
- Readability is important
General Tips:
- Start simple and add complexity as needed
- Test with small data sets first
- Comment your code for future reference
Remember: The best approach is often the simplest one that clearly expresses your intent.
Happy querying! 🚀