Chapter 14: Joining Tables (INNER JOIN)

So far, all your queries have focused on a single table. However, in a relational database, data is often distributed across multiple tables to avoid redundancy and maintain data integrity (a concept called normalization). For instance, user information is in the users table, and product details are in the products table. But what if you want to see which user placed a particular order, or which products are associated with a specific category?

You can’t get all this information from a single table. This is where JOINs come in. JOINs are used to combine rows from two or more tables based on a related column between them. This chapter will introduce you to the most common type of join: the INNER JOIN.

Why Join Tables?

  • Normalization: Storing data efficiently by splitting it into related tables.
  • Completeness: Retrieving a complete picture of information that spans across multiple related entities.
  • Efficiency: Avoiding redundant data entry and storage.

The Concept of a Relational Database

Recall that relational databases organize data into tables, and these tables are related to each other through common columns.

  • Primary Key (PK): A column (or set of columns) that uniquely identifies each row in a table. (e.g., user_id in users).
  • Foreign Key (FK): A column (or set of columns) in one table that refers to the Primary Key in another table. It establishes a link between the two tables. (e.g., if orders has a user_id, that user_id is a Foreign Key referencing the user_id Primary Key in the users table).

When you join tables, you’re telling the database how to use these related columns to match rows from one table with rows from another.

What is an INNER JOIN?

An INNER JOIN (often just called JOIN) returns only the rows that have matching values in both tables based on the join condition. If a row in one table doesn’t have a match in the other table, it is excluded from the result set.

Think of it as: “Show me the records where there’s a match in both Table A AND Table B.”

SQL Syntax & Examples

Let’s use our scriptbuzz_db. First, let’s ensure our users and products tables are set up with potential related data (we’ll primarily use the orders table which you created as an exercise for a clear FK example).

To set up an orders table that correctly links to users, let’s make sure it has a user_id column. If you haven’t, or if you want to ensure the foreign key constraint is there (which is good practice, but not strictly necessary for the JOIN syntax itself, but is for data integrity), you might re-create it:

SQL
USE scriptbuzz_db;

-- Drop orders table if it exists to add FK constraint cleanly
DROP TABLE IF EXISTS orders;

CREATE TABLE orders (
    order_id INT AUTO_INCREMENT PRIMARY KEY,
    user_id INT NOT NULL, -- This will be our Foreign Key
    order_date DATETIME DEFAULT CURRENT_TIMESTAMP,
    total_amount DECIMAL(10,2) NOT NULL,
    status VARCHAR(20) DEFAULT 'pending',
    -- Define Foreign Key constraint
    FOREIGN KEY (user_id) REFERENCES users(user_id)
);

-- Insert some sample orders that link to existing users
INSERT INTO orders (user_id, total_amount, status) VALUES
(1, 150.75, 'completed'),
(1, 200.00, 'pending'),
(2, 50.25, 'shipped'),
(1, 30.00, 'completed'),
(2, 100.00, 'processing');

-- Let's also ensure user 3 (jane_doe) doesn't have an order for now
-- so we can see what happens with non-matches in other join types later.

Now, let’s dive into INNER JOIN examples.

Syntax for INNER JOIN

SQL
SELECT columns
FROM table1
INNER JOIN table2
ON table1.common_column = table2.common_column;

  • SELECT columns: Specify the columns you want from either table. Use table_name.column_name to avoid ambiguity if columns have the same name (e.g., id).
  • FROM table1: The first table.
  • INNER JOIN table2: The second table you want to join with.
  • ON table1.common_column = table2.common_column: The join condition. This specifies which columns in each table should be matched. Usually, this is the Foreign Key in one table matching the Primary Key in the other.

Example 1: Get orders along with the username of the customer who placed them

We need data from orders and users. The common column is user_id.

SQL
SELECT
    orders.order_id,
    users.username,
    orders.total_amount,
    orders.status,
    orders.order_date
FROM orders
INNER JOIN users
ON orders.user_id = users.user_id;

Expected Output (example):

order_idusernametotal_amountstatusorder_date
1john_doe150.75completed2025-06-29 21:40:00
2john_doe200.00pending2025-06-29 21:40:00
3admin_user50.25shipped2025-06-29 21:40:00
4john_doe30.00completed2025-06-29 21:40:00
5admin_user100.00processing2025-06-29 21:40:00

What it does: It combines rows from orders with rows from users wherever the user_id in orders matches a user_id in users. Notice that ‘jane_doe’ (user_id 3) is not in the result because she has no corresponding orders, and INNER JOIN only returns matches from both tables.

Typing table_name.column_name can become cumbersome, especially with long table names or many joins. You can use table aliases to shorten names within your query.

Syntax:

SQL
SELECT t1.column1, t2.column2
FROM table1 AS t1
INNER JOIN table2 AS t2
ON t1.common_column = t2.common_column;

AS is optional for table aliases, so FROM table1 t1 is common.

Example 2: Same query with table aliases

SQL
SELECT
    o.order_id,
    u.username,
    o.total_amount,
    o.status,
    o.order_date
FROM orders AS o
INNER JOIN users AS u
ON o.user_id = u.user_id;

What it does: This query is identical to Example 1 in its result, but it’s much more concise and readable due to the o and u aliases.

3. Joining Multiple Tables

You can join more than two tables by chaining INNER JOIN clauses.

Scenario: Let’s say we had an order_items table that links orders to products. (Create this table if you want to follow along exactly):

SQL
-- Drop order_items table if it exists
DROP TABLE IF EXISTS order_items;

CREATE TABLE order_items (
    item_id INT AUTO_INCREMENT PRIMARY KEY,
    order_id INT NOT NULL,
    product_id INT NOT NULL,
    quantity INT NOT NULL,
    price_per_item DECIMAL(10,2) NOT NULL,
    FOREIGN KEY (order_id) REFERENCES orders(order_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

-- Insert some sample order items
-- Assuming product_id 1 is 'Laptop Pro', 2 is 'Wireless Mouse', 3 is 'Mechanical Keyboard'
INSERT INTO order_items (order_id, product_id, quantity, price_per_item) VALUES
(1, 1, 1, 1200.00), -- Order 1: 1 Laptop Pro
(1, 2, 2, 25.50),   -- Order 1: 2 Wireless Mouse
(2, 3, 1, 99.99),   -- Order 2: 1 Mechanical Keyboard
(3, 2, 1, 25.50),   -- Order 3: 1 Wireless Mouse
(4, 1, 1, 1200.00), -- Order 4: 1 Laptop Pro (Oops, total_amount for order 4 was 30.00, this is just for example)
(5, 3, 1, 99.99);   -- Order 5: 1 Mechanical Keyboard

Example 3: Get order details, customer username, and product names for each item in an order

SQL
SELECT
    o.order_id,
    u.username,
    p.name AS product_name,
    oi.quantity,
    oi.price_per_item
FROM orders AS o
INNER JOIN users AS u
    ON o.user_id = u.user_id
INNER JOIN order_items AS oi
    ON o.order_id = oi.order_id
INNER JOIN products AS p
    ON oi.product_id = p.product_id
ORDER BY o.order_id, p.name;

Expected Output (example):

order_idusernameproduct_namequantityprice_per_item
1john_doeLaptop Pro11200.00
1john_doeWireless Mouse225.50
2john_doeMechanical Keyboard199.99
3admin_userWireless Mouse125.50
4john_doeLaptop Pro11200.00
5admin_userMechanical Keyboard199.99

What it does: This query performs three INNER JOIN operations to link orders with users, then with order_items, and finally order_items with products to retrieve all the interconnected information.

4. Combining JOIN with WHERE, GROUP BY, ORDER BY, LIMIT

All the clauses you’ve learned so far can be combined with JOINs. The order remains important: FROM/JOIN -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY -> LIMIT.

Example 4: Get usernames and total amount of their completed orders, only for users with more than one completed order, ordered by total amount (descending)

SQL
SELECT
    u.username,
    SUM(o.total_amount) AS total_completed_orders_amount,
    COUNT(o.order_id) AS number_of_completed_orders
FROM users AS u
INNER JOIN orders AS o
    ON u.user_id = o.user_id
WHERE o.status = 'completed'
GROUP BY u.username
HAVING COUNT(o.order_id) > 1
ORDER BY total_completed_orders_amount DESC;

Expected Output (example – based on current data, only John Doe has multiple ‘completed’ orders):

usernametotal_completed_orders_amountnumber_of_completed_orders
john_doe180.752

What it does:

  1. JOINs: Links users and orders based on user_id.
  2. WHERE: Filters the joined rows to only include orders with status = 'completed'.
  3. GROUP BY: Groups the remaining rows by username.
  4. HAVING: Filters these groups to include only users who have COUNT(o.order_id) greater than 1 completed orders.
  5. SELECT: Selects the username and calculates the SUM of total_amount and COUNT of order_id for each group.
  6. ORDER BY: Sorts the final result set by total_completed_orders_amount in descending order.

Common Pitfalls and Best Practices

  • Ambiguous Column Names: When joining tables, if two tables have columns with the same name (e.g., id in both users and products), you must qualify them with the table name or alias (e.g., users.id or u.id). It’s good practice to always qualify, even if not strictly necessary, for clarity.
  • Join Condition: Ensure your ON clause correctly specifies the relationship between the tables (Primary Key = Foreign Key). An incorrect join condition will lead to wrong results or a “Cartesian product” (every row from table 1 joined with every row from table 2, which is usually undesirable and massive).
  • Performance: Joining many large tables can be performance-intensive. Indexes on the join columns (especially primary and foreign keys) are crucial for fast joins.
  • Table Aliases: Always use table aliases (e.g., u for users, o for orders) for readability and conciseness, especially when joining multiple tables.
  • Understanding INNER JOIN Behavior: Remember, INNER JOIN only returns matching rows. If a user_id in orders doesn’t exist in users, that order will not appear in the result set. If a user_id exists in users but has no corresponding orders, that user will not appear either.

Notes:

  • JOINs combine rows from two or more tables based on a related column.
  • INNER JOIN (or just JOIN) returns only rows that have matching values in both tables.
  • The ON clause specifies the join condition, typically linking a Foreign Key to a Primary Key.
  • Use table aliases (e.g., SELECT u.username FROM users AS u JOIN orders AS o ON u.user_id = o.user_id;) for clearer and more concise queries, especially with multiple joins.
  • You can chain multiple INNER JOIN clauses to combine data from many tables.
  • All other SQL clauses (WHERE, GROUP BY, HAVING, ORDER BY, LIMIT) can be used with JOIN statements, following their specific order of execution.

Small Exercise

Using your scriptbuzz_db (assuming orders and order_items tables exist and have data):

  1. List all products that have been part of any order, along with the product_id and name from the products table. (Hint: Join products and order_items).
  2. Retrieve the order_id, the username of the customer, and the name of the product for all items in “pending” orders. (Hint: You’ll need to join orders, users, order_items, and products, and use a WHERE clause).
  3. Count how many different products each user has ever ordered. Display the username and the total_distinct_products_ordered. (Hint: Join users, orders, order_items. You’ll need GROUP BY and COUNT(DISTINCT column_name)).

You’ve now taken a monumental leap in your SQL journey! Understanding joins is fundamental to working with relational databases effectively. Next, we’ll explore other types of joins that handle non-matching data.

🚀 Explore Popular Learning Tracks