· SQL Guroo · SQL Tutorial · 4 min read
Advanced SQL Aggregations - Beyond GROUP BY for Data Science Applications
Master advanced SQL aggregation techniques to transform raw data into meaningful insights for data science and analytics workflows.
Introduction
While basic SQL aggregations like COUNT(), SUM(), and AVG() are essential, modern data analysis often requires more sophisticated techniques. In this guide, we’ll explore advanced aggregation methods that will help you extract deeper insights from your data.
Why Advanced Aggregations Matter?
As a data analyst, you often need to answer complex questions like:
- “What percentage of our customers are in each spending tier?”
- “How many products sell above their category’s average price?”
- “What’s the distribution of order values by customer type?”
These questions require combining multiple aggregation techniques. Let’s learn through practical examples.
Main Content: Advanced Aggregation Techniques
Let’s use this sample data for our examples:
-- Create our tables
CREATE TABLE sales (
sale_date DATE,
product_id INTEGER,
category VARCHAR(50),
customer_type VARCHAR(20), -- 'new' or 'returning'
quantity INTEGER,
unit_price DECIMAL(10,2),
total_amount DECIMAL(10,2)
);
-- Insert sample data
INSERT INTO sales VALUES
('2024-01-01', 1, 'Electronics', 'new', 1, 1200.00, 1200.00),
('2024-01-01', 2, 'Electronics', 'returning', 2, 25.00, 50.00),
('2024-01-02', 1, 'Electronics', 'returning', 1, 1200.00, 1200.00),
('2024-01-02', 3, 'Clothing', 'new', 3, 50.00, 150.00),
('2024-01-03', 4, 'Clothing', 'returning', 2, 75.00, 150.00),
('2024-01-03', 1, 'Electronics', 'new', 1, 1200.00, 1200.00);
Example 1: Conditional Aggregations with CASE
Let’s analyze sales patterns between new and returning customers:
SELECT
category,
COUNT(*) as total_sales,
-- Count sales by customer type
SUM(CASE
WHEN customer_type = 'new' THEN 1
ELSE 0
END) as new_customer_sales,
-- Calculate revenue by customer type
SUM(CASE
WHEN customer_type = 'new' THEN total_amount
ELSE 0
END) as new_customer_revenue,
-- Calculate average order value by customer type
ROUND(
AVG(CASE
WHEN customer_type = 'new' THEN total_amount
END),
2) as new_customer_avg_order
FROM sales
GROUP BY category;
Let’s break down this query:
COUNT(*) as total_sales
:- Counts all sales in each category
- Simple baseline metric
First CASE statement:
- Counts sales to new customers
- Returns 1 for new customers, 0 for others
- SUM adds up these 1s and 0s
Second CASE statement:
- Sums revenue from new customers
- Uses actual total_amount for new customers
- Returns 0 for returning customers
Third CASE statement:
- Calculates average order value for new customers
- NULL for returning customers (ignored by AVG)
- ROUND for cleaner numbers
Results:
category | total_sales | new_customer_sales | new_customer_revenue | new_customer_avg_order
------------|-------------|-------------------|---------------------|----------------------
Electronics | 4 | 2 | 2400.00 | 1200.00
Clothing | 2 | 1 | 150.00 | 150.00
Example 2: Comparing to Group Averages
Let’s identify products that sell above their category’s average price:
WITH category_stats AS (
-- Step 1: Calculate category averages
SELECT
category,
AVG(unit_price) as avg_category_price,
COUNT(DISTINCT product_id) as product_count
FROM sales
GROUP BY category
)
SELECT
s.category,
s.product_id,
s.unit_price,
cs.avg_category_price,
ROUND(
((s.unit_price - cs.avg_category_price) /
cs.avg_category_price * 100),
2) as price_diff_percent,
CASE
WHEN s.unit_price > cs.avg_category_price THEN 'Above Average'
WHEN s.unit_price < cs.avg_category_price THEN 'Below Average'
ELSE 'Average'
END as price_category
FROM sales s
JOIN category_stats cs ON s.category = cs.category
GROUP BY
s.category,
s.product_id,
s.unit_price,
cs.avg_category_price;
Let’s understand what’s happening:
The CTE (
category_stats
):- Calculates average price per category
- Counts unique products per category
- Creates a reference point for comparisons
Main Query:
- Joins sales with category averages
- Calculates percentage difference from average
- Categorizes products based on price
Results:
category | product_id | unit_price | avg_category_price | price_diff_percent | price_category
------------|------------|------------|-------------------|-------------------|---------------
Electronics | 1 | 1200.00 | 808.33 | 48.45 | Above Average
Electronics | 2 | 25.00 | 808.33 | -96.91 | Below Average
Clothing | 3 | 50.00 | 62.50 | -20.00 | Below Average
Clothing | 4 | 75.00 | 62.50 | 20.00 | Above Average
Best Practices
Use CTEs for Complex Aggregations
- Break calculations into logical steps
- Makes code easier to understand and maintain
- Helps with debugging
Handle NULL Values Properly
- Use COALESCE for default values
- Consider NULLIF to avoid division by zero
- Document NULL handling in comments
Optimize Performance
- Index commonly grouped columns
- Use WHERE before GROUP BY
- Consider materialized views for complex calculations
Write Clear Code
- Use descriptive column aliases
- Add comments for complex logic
- Break long queries into CTEs
Remember: Advanced aggregations are powerful tools for data analysis. Start with simple cases and build up to more complex scenarios as you become comfortable with the techniques.
Happy analyzing! 🚀