Chapter 11: Grouping Data with GROUP BY and HAVING
In the previous chapter, you learned about aggregate functions that summarize data across an entire table (or a filtered set of it). But what if you want to know the count of products per category, or the total sales per user, or the average price of products per active status? Simply using COUNT()
, SUM()
, etc., won’t give you this breakdown.
This is where the GROUP BY
clause becomes indispensable. It allows you to divide the rows in your result set into groups based on common values in one or more columns, and then apply aggregate functions to each of those groups. The HAVING
clause then allows you to filter these groups (whereas WHERE
filters individual rows).
The Purpose of GROUP BY
The GROUP BY
clause groups rows that have the same values in specified columns into a summary row for each group. For each group, you can then apply aggregate functions (like COUNT
, SUM
, AVG
, MIN
, MAX
) to get a single result per group.
Think of it like this: If you have a products
table and you GROUP BY category
, the database will internally collect all products belonging to ‘Electronics’ into one group, all products belonging to ‘Accessories’ into another, and so on. Then, any aggregate function you use in the SELECT
list will operate independently on each of these groups.
The Purpose of HAVING
The WHERE
clause filters individual rows before they are grouped. The HAVING
clause filters groups after the GROUP BY
clause has aggregated the data. You cannot use aggregate functions directly in a WHERE
clause because WHERE
operates on individual rows, not aggregated results. HAVING
is specifically designed for filtering the results of aggregate functions.
Order of Execution (Conceptual):
FROM
: Determine the source table(s).WHERE
: Filter individual rows based on conditions.GROUP BY
: Group the filtered rows based on specified columns.HAVING
: Filter the created groups based on aggregate conditions.SELECT
: Choose which columns (including aggregate results) to display.ORDER BY
: Sort the final result set.LIMIT
: Limit the number of returned rows.
SQL Syntax & Examples
Let’s use our scriptbuzz_db
and the data we’ve populated. Remember to run USE scriptbuzz_db;
first.
GROUP BY
with Aggregate Functions
Syntax:
SELECT column_to_group_by, AGGREGATE_FUNCTION(column_to_aggregate) AS alias
FROM table_name
WHERE condition (optional, filters rows BEFORE grouping)
GROUP BY column_to_group_by;
Important Rule for SELECT
with GROUP BY
: Any column in the SELECT
list that is not part of an aggregate function must also be listed in the GROUP BY
clause. This ensures that for each row in your result, the values for the non-aggregated columns are consistent within that group.
Example 1: Count products per category
SELECT category, COUNT(product_id) AS product_count
FROM products
GROUP BY category;
Expected Output (example):
category | product_count |
Accessories | 1 |
Electronics | 3 |
NULL | 4 |
What it does: It groups products by their category
. For each unique category, it counts how many products belong to that category. Products with a NULL
category form their own group.
Example 2: Sum of stock quantity per category and average price per category
SELECT category,
SUM(stock_quantity) AS total_stock_in_category,
AVG(price) AS average_price_in_category
FROM products
GROUP BY category;
Expected Output (example):
category | total_stock_in_category | average_price_in_category |
Accessories | 150 | 30.000000 |
Electronics | 110 | 541.666667 |
NULL | 330 | 55.122500 |
What it does: Groups by category
, then for each group, calculates the sum of stock_quantity
and the average price
.
Grouping by Multiple Columns
You can group by more than one column. The groups will be formed based on the unique combination of values in all specified GROUP BY
columns.
Example 3: Count users by is_admin
status and by created_at
date (grouping by date part only)
To group by just the date, we use DATE()
function in MySQL/MariaDB.
SELECT DATE(created_at) AS registration_date,
is_admin,
COUNT(user_id) AS user_count
FROM users
GROUP BY DATE(created_at), is_admin
ORDER BY registration_date ASC, is_admin DESC;
Expected Output (example – based on previous insertions):
registration_date | is_admin | user_count |
2025-06-29 | 1 | 1 |
2025-06-29 | 0 | 2 |
What it does: Groups users by the day they registered AND whether they are an admin or not. Then counts users within each unique combination.
Filtering Groups with HAVING
HAVING
is used to filter the groups created by GROUP BY
based on conditions involving aggregate functions.
Syntax:
SELECT column_to_group_by, AGGREGATE_FUNCTION(column_to_aggregate) AS alias
FROM table_name
GROUP BY column_to_group_by
HAVING aggregate_condition; -- This condition uses the aggregate alias or function directly
Example 4: Categories with more than 1 product
SELECT category, COUNT(product_id) AS product_count
FROM products
GROUP BY category
HAVING COUNT(product_id) > 1;
Expected Output (example):
category | product_count |
Electronics | 3 |
NULL | 4 |
What it does: First, it groups products by category and counts them. Then, it filters those groups, showing only the categories that have more than 1 product. Notice you can’t use WHERE COUNT(product_id) > 1
here.
Example 5: Categories where the average price is over $100
SELECT category, AVG(price) AS average_price
FROM products
GROUP BY category
HAVING AVG(price) > 100;
Expected Output (example):
category | average_price |
Electronics | 541.666667 |
What it does: Groups by category, calculates the average price for each. Then, it filters these groups to show only those where the calculated average_price
is greater than 100.
Combining WHERE
, GROUP BY
, and HAVING
You can use all three clauses together. WHERE
filters rows before grouping, and HAVING
filters groups after aggregation.
Example 6: Count active products per category, but only for categories with at least 2 active products
SELECT category, COUNT(product_id) AS active_product_count
FROM products
WHERE is_active = 1 -- Filters out inactive products first
GROUP BY category
HAVING COUNT(product_id) >= 2; -- Filters groups where active product count is >= 2
Expected Output (example):
category | active_product_count |
Electronics | 3 |
NULL | 3 |
What it does:
- Filters rows: Only products where
is_active = 1
are considered. - Groups: The remaining active products are grouped by
category
. - Aggregates:
COUNT(product_id)
is performed on each active category group. - Filters Groups: Only groups where the
active_product_count
is 2 or more are returned.
Common Pitfalls and Tricky Aspects
WHERE
vs.HAVING
: This is the most common confusion.- Use
WHERE
for conditions on individual rows. - Use
HAVING
for conditions on aggregated groups.
- Use
- Non-Aggregated Columns in
SELECT
: Any column inSELECT
that is not inside an aggregate function must also appear in theGROUP BY
clause. Otherwise, the database wouldn’t know which value to pick for that column for each group (since a group contains multiple rows). - Performance:
GROUP BY
can be resource-intensive, especially on large tables with many distinct values in the grouping columns. Indexes onGROUP BY
columns can help.
Notes:
- The
GROUP BY
clause groups rows with identical values in specified columns into summary rows, allowing aggregate functions to operate on these groups. - Any non-aggregated column in the
SELECT
list must also be in theGROUP BY
clause. - The
HAVING
clause filters the results ofGROUP BY
based on conditions that involve aggregate functions. - Order of operations:
WHERE
(filters rows) ->GROUP BY
(groups rows) ->HAVING
(filters groups). GROUP BY
is essential for generating summary reports and statistics per category or dimension.
Small Exercise
Using your scriptbuzz_db
:
- Count the number of users for each
is_admin
status (0 or 1). - Find the average price of products for each
category
, but only show categories where the average price is less than $100. - If you have an
orders
table, calculate the totaltotal_amount
for eachuser_id
, but only for users who have placed orders totaling more than $500. - If you have a
posts
table, count the number of posts peruser_id
, but only show users who have written at least 2 published posts.
You’ve now unlocked advanced analytical capabilities in SQL! Next, we’ll shift back to modifying data with the UPDATE
statement.