· SQL Guroo · SQL Tutorial  · 4 min read

Advanced SQL Aggregations - Beyond GROUP BY for Data Science Applications

Master advanced SQL aggregation techniques to transform raw data into meaningful insights for data science and analytics workflows.

Master advanced SQL aggregation techniques to transform raw data into meaningful insights for data science and analytics workflows.

Introduction

While basic SQL aggregations like COUNT(), SUM(), and AVG() are essential, modern data analysis often requires more sophisticated techniques. In this guide, we’ll explore advanced aggregation methods that will help you extract deeper insights from your data.

Why Advanced Aggregations Matter?

As a data analyst, you often need to answer complex questions like:

  • “What percentage of our customers are in each spending tier?”
  • “How many products sell above their category’s average price?”
  • “What’s the distribution of order values by customer type?”

These questions require combining multiple aggregation techniques. Let’s learn through practical examples.

Main Content: Advanced Aggregation Techniques

Let’s use this sample data for our examples:

-- Create our tables
CREATE TABLE sales (
    sale_date DATE,
    product_id INTEGER,
    category VARCHAR(50),
    customer_type VARCHAR(20),  -- 'new' or 'returning'
    quantity INTEGER,
    unit_price DECIMAL(10,2),
    total_amount DECIMAL(10,2)
);

-- Insert sample data
INSERT INTO sales VALUES
('2024-01-01', 1, 'Electronics', 'new', 1, 1200.00, 1200.00),
('2024-01-01', 2, 'Electronics', 'returning', 2, 25.00, 50.00),
('2024-01-02', 1, 'Electronics', 'returning', 1, 1200.00, 1200.00),
('2024-01-02', 3, 'Clothing', 'new', 3, 50.00, 150.00),
('2024-01-03', 4, 'Clothing', 'returning', 2, 75.00, 150.00),
('2024-01-03', 1, 'Electronics', 'new', 1, 1200.00, 1200.00);

Example 1: Conditional Aggregations with CASE

Let’s analyze sales patterns between new and returning customers:

SELECT 
    category,
    COUNT(*) as total_sales,
    -- Count sales by customer type
    SUM(CASE 
        WHEN customer_type = 'new' THEN 1 
        ELSE 0 
    END) as new_customer_sales,
    -- Calculate revenue by customer type
    SUM(CASE 
        WHEN customer_type = 'new' THEN total_amount 
        ELSE 0 
    END) as new_customer_revenue,
    -- Calculate average order value by customer type
    ROUND(
        AVG(CASE 
            WHEN customer_type = 'new' THEN total_amount 
        END),
    2) as new_customer_avg_order
FROM sales
GROUP BY category;

Let’s break down this query:

  1. COUNT(*) as total_sales:

    • Counts all sales in each category
    • Simple baseline metric
  2. First CASE statement:

    • Counts sales to new customers
    • Returns 1 for new customers, 0 for others
    • SUM adds up these 1s and 0s
  3. Second CASE statement:

    • Sums revenue from new customers
    • Uses actual total_amount for new customers
    • Returns 0 for returning customers
  4. Third CASE statement:

    • Calculates average order value for new customers
    • NULL for returning customers (ignored by AVG)
    • ROUND for cleaner numbers

Results:

category    | total_sales | new_customer_sales | new_customer_revenue | new_customer_avg_order
------------|-------------|-------------------|---------------------|----------------------
Electronics |           4 |                 2 |             2400.00 |               1200.00
Clothing    |           2 |                 1 |              150.00 |                150.00

Example 2: Comparing to Group Averages

Let’s identify products that sell above their category’s average price:

WITH category_stats AS (
    -- Step 1: Calculate category averages
    SELECT 
        category,
        AVG(unit_price) as avg_category_price,
        COUNT(DISTINCT product_id) as product_count
    FROM sales
    GROUP BY category
)
SELECT 
    s.category,
    s.product_id,
    s.unit_price,
    cs.avg_category_price,
    ROUND(
        ((s.unit_price - cs.avg_category_price) / 
        cs.avg_category_price * 100),
    2) as price_diff_percent,
    CASE
        WHEN s.unit_price > cs.avg_category_price THEN 'Above Average'
        WHEN s.unit_price < cs.avg_category_price THEN 'Below Average'
        ELSE 'Average'
    END as price_category
FROM sales s
JOIN category_stats cs ON s.category = cs.category
GROUP BY 
    s.category, 
    s.product_id, 
    s.unit_price, 
    cs.avg_category_price;

Let’s understand what’s happening:

  1. The CTE (category_stats):

    • Calculates average price per category
    • Counts unique products per category
    • Creates a reference point for comparisons
  2. Main Query:

    • Joins sales with category averages
    • Calculates percentage difference from average
    • Categorizes products based on price

Results:

category    | product_id | unit_price | avg_category_price | price_diff_percent | price_category
------------|------------|------------|-------------------|-------------------|---------------
Electronics |          1 |    1200.00 |            808.33 |             48.45 | Above Average
Electronics |          2 |      25.00 |            808.33 |            -96.91 | Below Average
Clothing    |          3 |      50.00 |             62.50 |            -20.00 | Below Average
Clothing    |          4 |      75.00 |             62.50 |             20.00 | Above Average

Best Practices

  1. Use CTEs for Complex Aggregations

    • Break calculations into logical steps
    • Makes code easier to understand and maintain
    • Helps with debugging
  2. Handle NULL Values Properly

    • Use COALESCE for default values
    • Consider NULLIF to avoid division by zero
    • Document NULL handling in comments
  3. Optimize Performance

    • Index commonly grouped columns
    • Use WHERE before GROUP BY
    • Consider materialized views for complex calculations
  4. Write Clear Code

    • Use descriptive column aliases
    • Add comments for complex logic
    • Break long queries into CTEs

Remember: Advanced aggregations are powerful tools for data analysis. Start with simple cases and build up to more complex scenarios as you become comfortable with the techniques.

Happy analyzing! 🚀

Back to Blog

Related Posts

View All Posts »