Chapter 16: Subqueries (Nested Queries)
So far, you’ve combined data from multiple tables primarily using JOINs. While JOINs are incredibly powerful and often the most efficient way to link related data, there are situations where you might need to use the result of one query as an input for another query. This is precisely what subqueries (also known as nested queries or inner queries) allow you to do.
A subquery is a SELECT
statement embedded within another SQL statement (e.g., another SELECT
, INSERT
, UPDATE
, DELETE
, or even within WHERE
, FROM
, HAVING
clauses). The inner query executes first, and its result is then used by the outer query.
How Subqueries Work
The database executes the innermost query first. The result of this subquery is then passed to the outer query, which then executes using that result.
Types of Subqueries (based on where they are used):
- In the
WHERE
clause (most common): Used to filter rows based on a condition that depends on the result of another query.IN
/NOT IN
: Checks if a value is present/absent in a list returned by the subquery.=
/!=
/>
/<
etc.: Checks against a single value returned by the subquery (subquery must return only one row and one column).EXISTS
/NOT EXISTS
: Checks for the existence of rows returned by the subquery.
- In the
FROM
clause (Derived Tables / Inline Views): The subquery acts as a temporary, virtual table that the outer query can then query. This is useful for pre-aggregating data or simplifying complex joins. - In the
SELECT
clause (Scalar Subqueries): The subquery returns a single value (one row, one column) for each row in the outer query’s result.
When to Use Subqueries (and when not to)
- Use when:
- You need to filter data based on values that are themselves derived from another query.
- You need to compute aggregates or complex conditions that are then used in the main query.
- You need to check for existence of related records without retrieving all their details.
- You are performing an
INSERT
,UPDATE
, orDELETE
based on a condition determined by another query.
- Be cautious when:
- An equivalent
JOIN
can achieve the same result. JOINs are often more efficient and readable, especially for large datasets. Subqueries can sometimes lead to performance issues if not used carefully.
- An equivalent
SQL Syntax & Examples
Let’s use our scriptbuzz_db
. Remember to run USE scriptbuzz_db;
first.
1. Subqueries in the WHERE
Clause (Filtering)
Example 1: Find users who have placed orders Instead of a JOIN
, we can use IN
with a subquery.
SELECT user_id, username, email
FROM users
WHERE user_id IN (SELECT user_id FROM orders);
What it does:
- Inner Query:
SELECT user_id FROM orders
returns a list of alluser_id
s that exist in theorders
table (e.g.,(1, 2, 1, 1, 2)
which becomes(1, 2)
distinct values). - Outer Query:
SELECT ... FROM users WHERE user_id IN (1, 2)
retrieves users whoseuser_id
is present in that list.
Expected Output (example):
user_id | username | |
1 | john_doe | john.new.email@example.com |
2 | admin_user | new.admin@example.com |
Equivalent JOIN (often preferred for performance/readability):
SELECT DISTINCT u.user_id, u.username, u.email
FROM users AS u
INNER JOIN orders AS o ON u.user_id = o.user_id;
Example 2: Find products that have a price higher than the average product price
SELECT product_id, name, price
FROM products
WHERE price > (SELECT AVG(price) FROM products);
What it does:
- Inner Query:
SELECT AVG(price) FROM products
calculates the average price of all products (e.g.,264.212857
). - Outer Query:
SELECT ... FROM products WHERE price > 264.212857
filters products whose price is greater than that average.
Expected Output (example):
product_id | name | price |
1 | Laptop Pro | 1320.00 |
7 | Gaming Monitor | 385.00 |
Example 3: Find users who have NOT placed any orders (using NOT IN
)
SELECT user_id, username, email
FROM users
WHERE user_id NOT IN (SELECT user_id FROM orders);
What it does: Returns users whose user_id
is not found in the list of user_id
s from the orders
table.
Expected Output (example, assuming user 3 ‘jane_doe’ has no orders):
user_id | username | |
3 | jane_doe | jane.doe@example.com |
Example 4: Using EXISTS
to find users with orders EXISTS
is efficient because it stops scanning as soon as it finds one match in the subquery.
SELECT u.user_id, u.username
FROM users AS u
WHERE EXISTS (SELECT 1 FROM orders AS o WHERE o.user_id = u.user_id);
What it does: For each user in the outer query, the inner query checks if any record exists in orders
where o.user_id
matches u.user_id
. If EXISTS
returns true (even if multiple matches exist), the outer query includes that user. It’s usually more efficient than IN
when the subquery is correlated (references a column from the outer query).
2. Subqueries in the FROM
Clause (Derived Tables)
When a subquery is used in the FROM
clause, its result set is treated as a temporary, inline table. It must be given an alias.
Example 5: Find the average number of products per active category
SELECT AVG(products_in_category) AS avg_products_per_active_category
FROM (
SELECT category, COUNT(product_id) AS products_in_category
FROM products
WHERE is_active = 1
GROUP BY category
) AS category_counts;
What it does:
Inner Query (Derived Table): (SELECT category, COUNT(product_id) AS products_in_category FROM products WHERE is_active = 1 GROUP BY category)
first calculates the count of active products for each category. This temporary result set might look like:
category | products_in_category |
---|---|
Accessories | 1 |
Electronics | 3 |
NULL | 3 |
Outer Query: SELECT AVG(products_in_category) FROM category_counts
then calculates the average of the products_in_category
column from this temporary table.
Expected Output (example):
avg_products_per_active_category |
2.3333 |
3. Subqueries in the SELECT
Clause (Scalar Subqueries)
A scalar subquery returns a single value (one row, one column). If it returns more than one value, an error occurs.
Example 6: List each user and their total order amount (or 0 if no orders)
SELECT
u.user_id,
u.username,
(SELECT COALESCE(SUM(o.total_amount), 0)
FROM orders AS o
WHERE o.user_id = u.user_id
) AS total_order_amount
FROM users AS u;
What it does: For each user retrieved by the outer query, the inner subquery executes. It sums the total_amount
for orders belonging to that specific user. If the user has no orders, SUM()
returns NULL
, and COALESCE
converts it to 0
.
Expected Output (example):
user_id | username | total_order_amount |
1 | john_doe | 380.75 |
2 | admin_user | 150.25 |
3 | jane_doe | 0.00 |
Equivalent LEFT JOIN
(often preferred for performance):
SELECT
u.user_id,
u.username,
COALESCE(SUM(o.total_amount), 0) AS total_order_amount
FROM users AS u
LEFT JOIN orders AS o ON u.user_id = o.user_id
GROUP BY u.user_id, u.username;
Common Pitfalls and Best Practices
- Performance: Subqueries can sometimes be less efficient than equivalent JOINs, especially correlated subqueries (where the inner query depends on the outer query, like in
EXISTS
or scalar subqueries). The inner query might be re-executed for every row of the outer query. Databases often optimize JOINs better. - Readability: For complex logic, subqueries can sometimes be more readable by breaking down the problem. However, for simple filtering or direct relationships, JOINs are usually clearer.
- Single Value Requirement: Scalar subqueries (
SELECT
clause subqueries) must return only one row and one column. If they return multiple, it’s an error. - Naming Derived Tables: Always alias derived tables (subqueries in the
FROM
clause). - When to choose?
- JOIN when you’re primarily combining columns from multiple tables.
- Subquery in
WHERE
when you’re filtering based on a condition derived from another query’s result set. - Subquery in
FROM
when you need to perform aggregations or complex filtering before joining or applying further logic to the main query. - Subquery in
SELECT
when you need a single, aggregated value for each row of the main query.
Notes:
- A subquery (or nested query) is a
SELECT
statement embedded within another SQL statement. - Subqueries execute first, and their results are used by the outer query.
- Common uses:
WHERE
clause: For filtering rows (IN
,NOT IN
,=
,>
,<>
,EXISTS
,NOT EXISTS
).FROM
clause (Derived Tables): Treats the subquery’s result as a temporary table. Must be aliased.SELECT
clause (Scalar Subqueries): Returns a single value per row of the outer query.
- While powerful, often an equivalent
JOIN
can be more efficient and readable. Choose wisely based on the complexity and desired outcome.
Small Exercise
Using your scriptbuzz_db
:
- Find the
username
andemail
of users who have not purchased ‘Laptop Pro’. (Hint: You’ll needproducts
,order_items
,orders
, andusers
. Think about subqueries withNOT IN
onuser_id
after finding users who did buy ‘Laptop Pro’). - List the
name
andprice
of products that are more expensive than any product in the ‘Accessories’ category. (Hint: UseMAX()
in a subquery andWHERE
clause). - Calculate the total number of orders placed by the user with the most orders. (Hint: This will involve a subquery in the
WHERE
clause withCOUNT()
andMAX()
, or a derived table approach).
You’ve now expanded your SQL arsenal with subqueries! This gives you more tools to solve complex data retrieval problems. Next, we’ll dive into another essential aspect of database design and performance: managing table structures.