· SQL Guroo · SQL Tutorial  · 3 min read

SQL Subqueries vs CTEs - When to Use What in Data Analysis

Master the art of writing cleaner and more efficient SQL queries by understanding when to use subqueries versus Common Table Expressions (CTEs).

Master the art of writing cleaner and more efficient SQL queries by understanding when to use subqueries versus Common Table Expressions (CTEs).

Introduction

As data analysts, we often need to write complex SQL queries that involve multiple steps. Two powerful features help us handle these situations: Subqueries and Common Table Expressions (CTEs). In this guide, we’ll understand when to use each approach with practical examples.

Why This Matters?

Imagine you’re analyzing customer data and need to answer questions like:

  • Which customers spend more than average?
  • How has each customer’s spending changed over time?
  • Who are our most valuable customers?

You can solve these problems using either subqueries or CTEs. The right choice can make your code easier to read, maintain, and debug.

Let’s use this sample data for our examples:

-- Create and populate tables
CREATE TABLE orders (
    order_id INTEGER,
    customer_id INTEGER,
    order_date DATE,
    total_amount DECIMAL(10,2)
);

CREATE TABLE customers (
    customer_id INTEGER,
    name VARCHAR(100),
    signup_date DATE
);

-- Sample data
INSERT INTO orders VALUES
(1, 101, '2024-01-01', 150.00),
(2, 102, '2024-01-01', 200.00),
(3, 101, '2024-01-02', 75.00),
(4, 103, '2024-01-02', 300.00);

INSERT INTO customers VALUES
(101, 'Alice', '2023-12-01'),
(102, 'Bob', '2023-12-15'),
(103, 'Charlie', '2024-01-01');

Main Content: Understanding Through Examples

Example 1: Finding Above-Average Orders

Let’s find customers who placed orders larger than the average order amount.

Using a Subquery:

SELECT 
    c.name,
    o.order_date,
    o.total_amount,
    (SELECT AVG(total_amount) FROM orders) as avg_order_amount
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.total_amount > (
    SELECT AVG(total_amount) 
    FROM orders
);

This query:

  1. Calculates the average order amount in a subquery
  2. Uses it to filter orders above this average
  3. Shows customer details for these orders

Results:

name     | order_date | total_amount | avg_order_amount
---------|------------|--------------|------------------
Bob      | 2024-01-01 |      200.00 |          181.25
Charlie  | 2024-01-02 |      300.00 |          181.25

Using a CTE:

WITH avg_order AS (
    -- Step 1: Calculate average order amount
    SELECT AVG(total_amount) as avg_amount
    FROM orders
),
above_average_orders AS (
    -- Step 2: Find orders above average
    SELECT 
        o.customer_id,
        o.order_date,
        o.total_amount
    FROM orders o
    CROSS JOIN avg_order
    WHERE o.total_amount > avg_order.avg_amount
)
-- Step 3: Get customer details
SELECT 
    c.name,
    aao.order_date,
    aao.total_amount,
    avg_order.avg_amount as avg_order_amount
FROM above_average_orders aao
JOIN customers c ON aao.customer_id = c.customer_id
CROSS JOIN avg_order;

This CTE approach:

  1. First calculates the average in a named step (avg_order)
  2. Uses this to find qualifying orders (above_average_orders)
  3. Finally joins with customer data
  4. Makes each step clear and readable

Example 2: Customer Spending Analysis

Let’s analyze how customer spending has changed between their first and latest orders.

WITH customer_orders AS (
    -- Step 1: Add row numbers to orders
    SELECT 
        customer_id,
        order_date,
        total_amount,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id 
            ORDER BY order_date
        ) as first_order,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id 
            ORDER BY order_date DESC
        ) as last_order
    FROM orders
)
SELECT 
    c.name,
    first_order.total_amount as first_order_amount,
    last_order.total_amount as latest_order_amount,
    last_order.total_amount - first_order.total_amount as amount_change
FROM customers c
JOIN customer_orders first_order 
    ON c.customer_id = first_order.customer_id 
    AND first_order.first_order = 1
JOIN customer_orders last_order 
    ON c.customer_id = last_order.customer_id 
    AND last_order.last_order = 1
WHERE first_order.order_date != last_order.order_date;

This query:

  1. Uses a CTE to identify first and last orders
  2. Calculates the change in order amounts
  3. Shows how spending patterns have evolved

Results:

name   | first_order_amount | latest_order_amount | amount_change
-------|-------------------|--------------------|--------------
Alice  |            150.00 |             75.00 |       -75.00

Best Practices

  1. Use Subqueries When:

    • You need a simple lookup value
    • The logic is straightforward
    • You’re doing a one-time calculation
  2. Use CTEs When:

    • You need to break down complex logic
    • You want to reuse intermediate results
    • Readability is important
  3. General Tips:

    • Start simple and add complexity as needed
    • Test with small data sets first
    • Comment your code for future reference

Remember: The best approach is often the simplest one that clearly expresses your intent.

Happy querying! 🚀

Back to Blog

Related Posts

View All Posts »