💡 Ask Tutor

Chapter 16: Subqueries (Nested Queries)

So far, you’ve combined data from multiple tables primarily using JOINs. While JOINs are incredibly powerful and often the most efficient way to link related data, there are situations where you might need to use the result of one query as an input for another query. This is precisely what subqueries (also known as nested queries or inner queries) allow you to do.

A subquery is a SELECT statement embedded within another SQL statement (e.g., another SELECT, INSERT, UPDATE, DELETE, or even within WHERE, FROM, HAVING clauses). The inner query executes first, and its result is then used by the outer query.

How Subqueries Work

The database executes the innermost query first. The result of this subquery is then passed to the outer query, which then executes using that result.

Types of Subqueries (based on where they are used):

  1. In the WHERE clause (most common): Used to filter rows based on a condition that depends on the result of another query.
    • IN / NOT IN: Checks if a value is present/absent in a list returned by the subquery.
    • = / != / > / < etc.: Checks against a single value returned by the subquery (subquery must return only one row and one column).
    • EXISTS / NOT EXISTS: Checks for the existence of rows returned by the subquery.
  2. In the FROM clause (Derived Tables / Inline Views): The subquery acts as a temporary, virtual table that the outer query can then query. This is useful for pre-aggregating data or simplifying complex joins.
  3. In the SELECT clause (Scalar Subqueries): The subquery returns a single value (one row, one column) for each row in the outer query’s result.

When to Use Subqueries (and when not to)

  • Use when:
    • You need to filter data based on values that are themselves derived from another query.
    • You need to compute aggregates or complex conditions that are then used in the main query.
    • You need to check for existence of related records without retrieving all their details.
    • You are performing an INSERT, UPDATE, or DELETE based on a condition determined by another query.
  • Be cautious when:
    • An equivalent JOIN can achieve the same result. JOINs are often more efficient and readable, especially for large datasets. Subqueries can sometimes lead to performance issues if not used carefully.

SQL Syntax & Examples

Let’s use our scriptbuzz_db. Remember to run USE scriptbuzz_db; first.

1. Subqueries in the WHERE Clause (Filtering)

Example 1: Find users who have placed orders Instead of a JOIN, we can use IN with a subquery.

SQL
SELECT user_id, username, email
FROM users
WHERE user_id IN (SELECT user_id FROM orders);

What it does:

  1. Inner Query: SELECT user_id FROM orders returns a list of all user_ids that exist in the orders table (e.g., (1, 2, 1, 1, 2) which becomes (1, 2) distinct values).
  2. Outer Query: SELECT ... FROM users WHERE user_id IN (1, 2) retrieves users whose user_id is present in that list.

Expected Output (example):

user_idusernameemail
1john_doejohn.new.email@example.com
2admin_usernew.admin@example.com

Equivalent JOIN (often preferred for performance/readability):

SQL
SELECT DISTINCT u.user_id, u.username, u.email
FROM users AS u
INNER JOIN orders AS o ON u.user_id = o.user_id;

Example 2: Find products that have a price higher than the average product price

SQL
SELECT product_id, name, price
FROM products
WHERE price > (SELECT AVG(price) FROM products);

What it does:

  1. Inner Query: SELECT AVG(price) FROM products calculates the average price of all products (e.g., 264.212857).
  2. Outer Query: SELECT ... FROM products WHERE price > 264.212857 filters products whose price is greater than that average.

Expected Output (example):

product_idnameprice
1Laptop Pro1320.00
7Gaming Monitor385.00

Example 3: Find users who have NOT placed any orders (using NOT IN)

SQL
SELECT user_id, username, email
FROM users
WHERE user_id NOT IN (SELECT user_id FROM orders);

What it does: Returns users whose user_id is not found in the list of user_ids from the orders table.

Expected Output (example, assuming user 3 ‘jane_doe’ has no orders):

user_idusernameemail
3jane_doejane.doe@example.com

Example 4: Using EXISTS to find users with orders EXISTS is efficient because it stops scanning as soon as it finds one match in the subquery.

SQL
SELECT u.user_id, u.username
FROM users AS u
WHERE EXISTS (SELECT 1 FROM orders AS o WHERE o.user_id = u.user_id);

What it does: For each user in the outer query, the inner query checks if any record exists in orders where o.user_id matches u.user_id. If EXISTS returns true (even if multiple matches exist), the outer query includes that user. It’s usually more efficient than IN when the subquery is correlated (references a column from the outer query).

2. Subqueries in the FROM Clause (Derived Tables)

When a subquery is used in the FROM clause, its result set is treated as a temporary, inline table. It must be given an alias.

Example 5: Find the average number of products per active category

SQL
SELECT AVG(products_in_category) AS avg_products_per_active_category
FROM (
    SELECT category, COUNT(product_id) AS products_in_category
    FROM products
    WHERE is_active = 1
    GROUP BY category
) AS category_counts;

What it does:

Inner Query (Derived Table): (SELECT category, COUNT(product_id) AS products_in_category FROM products WHERE is_active = 1 GROUP BY category) first calculates the count of active products for each category. This temporary result set might look like:

categoryproducts_in_category
Accessories1
Electronics3
NULL3

Outer Query: SELECT AVG(products_in_category) FROM category_counts then calculates the average of the products_in_category column from this temporary table.

Expected Output (example):

avg_products_per_active_category
2.3333

3. Subqueries in the SELECT Clause (Scalar Subqueries)

A scalar subquery returns a single value (one row, one column). If it returns more than one value, an error occurs.

Example 6: List each user and their total order amount (or 0 if no orders)

SQL
SELECT
    u.user_id,
    u.username,
    (SELECT COALESCE(SUM(o.total_amount), 0)
     FROM orders AS o
     WHERE o.user_id = u.user_id
    ) AS total_order_amount
FROM users AS u;

What it does: For each user retrieved by the outer query, the inner subquery executes. It sums the total_amount for orders belonging to that specific user. If the user has no orders, SUM() returns NULL, and COALESCE converts it to 0.

Expected Output (example):

user_idusernametotal_order_amount
1john_doe380.75
2admin_user150.25
3jane_doe0.00

Equivalent LEFT JOIN (often preferred for performance):

SQL
SELECT
    u.user_id,
    u.username,
    COALESCE(SUM(o.total_amount), 0) AS total_order_amount
FROM users AS u
LEFT JOIN orders AS o ON u.user_id = o.user_id
GROUP BY u.user_id, u.username;

Common Pitfalls and Best Practices

  • Performance: Subqueries can sometimes be less efficient than equivalent JOINs, especially correlated subqueries (where the inner query depends on the outer query, like in EXISTS or scalar subqueries). The inner query might be re-executed for every row of the outer query. Databases often optimize JOINs better.
  • Readability: For complex logic, subqueries can sometimes be more readable by breaking down the problem. However, for simple filtering or direct relationships, JOINs are usually clearer.
  • Single Value Requirement: Scalar subqueries (SELECT clause subqueries) must return only one row and one column. If they return multiple, it’s an error.
  • Naming Derived Tables: Always alias derived tables (subqueries in the FROM clause).
  • When to choose?
    • JOIN when you’re primarily combining columns from multiple tables.
    • Subquery in WHERE when you’re filtering based on a condition derived from another query’s result set.
    • Subquery in FROM when you need to perform aggregations or complex filtering before joining or applying further logic to the main query.
    • Subquery in SELECT when you need a single, aggregated value for each row of the main query.

Notes:

  • A subquery (or nested query) is a SELECT statement embedded within another SQL statement.
  • Subqueries execute first, and their results are used by the outer query.
  • Common uses:
    • WHERE clause: For filtering rows (IN, NOT IN, =, >, <>, EXISTS, NOT EXISTS).
    • FROM clause (Derived Tables): Treats the subquery’s result as a temporary table. Must be aliased.
    • SELECT clause (Scalar Subqueries): Returns a single value per row of the outer query.
  • While powerful, often an equivalent JOIN can be more efficient and readable. Choose wisely based on the complexity and desired outcome.

Small Exercise

Using your scriptbuzz_db:

  1. Find the username and email of users who have not purchased ‘Laptop Pro’. (Hint: You’ll need products, order_items, orders, and users. Think about subqueries with NOT IN on user_id after finding users who did buy ‘Laptop Pro’).
  2. List the name and price of products that are more expensive than any product in the ‘Accessories’ category. (Hint: Use MAX() in a subquery and WHERE clause).
  3. Calculate the total number of orders placed by the user with the most orders. (Hint: This will involve a subquery in the WHERE clause with COUNT() and MAX(), or a derived table approach).

You’ve now expanded your SQL arsenal with subqueries! This gives you more tools to solve complex data retrieval problems. Next, we’ll dive into another essential aspect of database design and performance: managing table structures.

🚀 Explore Popular Learning Tracks