💡 Ask Tutor

Chapter 11: Grouping Data with GROUP BY and HAVING

In the previous chapter, you learned about aggregate functions that summarize data across an entire table (or a filtered set of it). But what if you want to know the count of products per category, or the total sales per user, or the average price of products per active status? Simply using COUNT(), SUM(), etc., won’t give you this breakdown.

This is where the GROUP BY clause becomes indispensable. It allows you to divide the rows in your result set into groups based on common values in one or more columns, and then apply aggregate functions to each of those groups. The HAVING clause then allows you to filter these groups (whereas WHERE filters individual rows).

The Purpose of GROUP BY

The GROUP BY clause groups rows that have the same values in specified columns into a summary row for each group. For each group, you can then apply aggregate functions (like COUNT, SUM, AVG, MIN, MAX) to get a single result per group.

Think of it like this: If you have a products table and you GROUP BY category, the database will internally collect all products belonging to ‘Electronics’ into one group, all products belonging to ‘Accessories’ into another, and so on. Then, any aggregate function you use in the SELECT list will operate independently on each of these groups.

The Purpose of HAVING

The WHERE clause filters individual rows before they are grouped. The HAVING clause filters groups after the GROUP BY clause has aggregated the data. You cannot use aggregate functions directly in a WHERE clause because WHERE operates on individual rows, not aggregated results. HAVING is specifically designed for filtering the results of aggregate functions.

Order of Execution (Conceptual):

  1. FROM: Determine the source table(s).
  2. WHERE: Filter individual rows based on conditions.
  3. GROUP BY: Group the filtered rows based on specified columns.
  4. HAVING: Filter the created groups based on aggregate conditions.
  5. SELECT: Choose which columns (including aggregate results) to display.
  6. ORDER BY: Sort the final result set.
  7. LIMIT: Limit the number of returned rows.

SQL Syntax & Examples

Let’s use our scriptbuzz_db and the data we’ve populated. Remember to run USE scriptbuzz_db; first.

GROUP BY with Aggregate Functions

Syntax:

SQL
SELECT column_to_group_by, AGGREGATE_FUNCTION(column_to_aggregate) AS alias
FROM table_name
WHERE condition (optional, filters rows BEFORE grouping)
GROUP BY column_to_group_by;

Important Rule for SELECT with GROUP BY: Any column in the SELECT list that is not part of an aggregate function must also be listed in the GROUP BY clause. This ensures that for each row in your result, the values for the non-aggregated columns are consistent within that group.

Example 1: Count products per category

SQL
SELECT category, COUNT(product_id) AS product_count
FROM products
GROUP BY category;

Expected Output (example):

categoryproduct_count
Accessories1
Electronics3
NULL4

What it does: It groups products by their category. For each unique category, it counts how many products belong to that category. Products with a NULL category form their own group.

Example 2: Sum of stock quantity per category and average price per category

SQL
SELECT category,
       SUM(stock_quantity) AS total_stock_in_category,
       AVG(price) AS average_price_in_category
FROM products
GROUP BY category;

Expected Output (example):

categorytotal_stock_in_categoryaverage_price_in_category
Accessories15030.000000
Electronics110541.666667
NULL33055.122500

What it does: Groups by category, then for each group, calculates the sum of stock_quantity and the average price.

Grouping by Multiple Columns

You can group by more than one column. The groups will be formed based on the unique combination of values in all specified GROUP BY columns.

Example 3: Count users by is_admin status and by created_at date (grouping by date part only)

To group by just the date, we use DATE() function in MySQL/MariaDB.

SQL
SELECT DATE(created_at) AS registration_date,
       is_admin,
       COUNT(user_id) AS user_count
FROM users
GROUP BY DATE(created_at), is_admin
ORDER BY registration_date ASC, is_admin DESC;

Expected Output (example – based on previous insertions):

registration_dateis_adminuser_count
2025-06-2911
2025-06-2902

What it does: Groups users by the day they registered AND whether they are an admin or not. Then counts users within each unique combination.

Filtering Groups with HAVING

HAVING is used to filter the groups created by GROUP BY based on conditions involving aggregate functions.

Syntax:

SQL
SELECT column_to_group_by, AGGREGATE_FUNCTION(column_to_aggregate) AS alias
FROM table_name
GROUP BY column_to_group_by
HAVING aggregate_condition; -- This condition uses the aggregate alias or function directly

Example 4: Categories with more than 1 product

SQL
SELECT category, COUNT(product_id) AS product_count
FROM products
GROUP BY category
HAVING COUNT(product_id) > 1;

Expected Output (example):

categoryproduct_count
Electronics3
NULL4

What it does: First, it groups products by category and counts them. Then, it filters those groups, showing only the categories that have more than 1 product. Notice you can’t use WHERE COUNT(product_id) > 1 here.

Example 5: Categories where the average price is over $100

SQL
SELECT category, AVG(price) AS average_price
FROM products
GROUP BY category
HAVING AVG(price) > 100;

Expected Output (example):

categoryaverage_price
Electronics541.666667

What it does: Groups by category, calculates the average price for each. Then, it filters these groups to show only those where the calculated average_price is greater than 100.

Combining WHERE, GROUP BY, and HAVING

You can use all three clauses together. WHERE filters rows before grouping, and HAVING filters groups after aggregation.

Example 6: Count active products per category, but only for categories with at least 2 active products

SQL
SELECT category, COUNT(product_id) AS active_product_count
FROM products
WHERE is_active = 1 -- Filters out inactive products first
GROUP BY category
HAVING COUNT(product_id) >= 2; -- Filters groups where active product count is >= 2

Expected Output (example):

categoryactive_product_count
Electronics3
NULL3

What it does:

  1. Filters rows: Only products where is_active = 1 are considered.
  2. Groups: The remaining active products are grouped by category.
  3. Aggregates: COUNT(product_id) is performed on each active category group.
  4. Filters Groups: Only groups where the active_product_count is 2 or more are returned.

Common Pitfalls and Tricky Aspects

  • WHERE vs. HAVING: This is the most common confusion.
    • Use WHERE for conditions on individual rows.
    • Use HAVING for conditions on aggregated groups.
  • Non-Aggregated Columns in SELECT: Any column in SELECT that is not inside an aggregate function must also appear in the GROUP BY clause. Otherwise, the database wouldn’t know which value to pick for that column for each group (since a group contains multiple rows).
  • Performance: GROUP BY can be resource-intensive, especially on large tables with many distinct values in the grouping columns. Indexes on GROUP BY columns can help.

Notes:

  • The GROUP BY clause groups rows with identical values in specified columns into summary rows, allowing aggregate functions to operate on these groups.
  • Any non-aggregated column in the SELECT list must also be in the GROUP BY clause.
  • The HAVING clause filters the results of GROUP BY based on conditions that involve aggregate functions.
  • Order of operations: WHERE (filters rows) -> GROUP BY (groups rows) -> HAVING (filters groups).
  • GROUP BY is essential for generating summary reports and statistics per category or dimension.

Small Exercise

Using your scriptbuzz_db:

  1. Count the number of users for each is_admin status (0 or 1).
  2. Find the average price of products for each category, but only show categories where the average price is less than $100.
  3. If you have an orders table, calculate the total total_amount for each user_id, but only for users who have placed orders totaling more than $500.
  4. If you have a posts table, count the number of posts per user_id, but only show users who have written at least 2 published posts.

You’ve now unlocked advanced analytical capabilities in SQL! Next, we’ll shift back to modifying data with the UPDATE statement.

🚀 Explore Popular Learning Tracks