Chapter 14: Joining Tables (INNER JOIN)
So far, all your queries have focused on a single table. However, in a relational database, data is often distributed across multiple tables to avoid redundancy and maintain data integrity (a concept called normalization). For instance, user information is in the users
table, and product details are in the products
table. But what if you want to see which user placed a particular order, or which products are associated with a specific category?
You can’t get all this information from a single table. This is where JOINs come in. JOINs are used to combine rows from two or more tables based on a related column between them. This chapter will introduce you to the most common type of join: the INNER JOIN.
Why Join Tables?
- Normalization: Storing data efficiently by splitting it into related tables.
- Completeness: Retrieving a complete picture of information that spans across multiple related entities.
- Efficiency: Avoiding redundant data entry and storage.
The Concept of a Relational Database
Recall that relational databases organize data into tables, and these tables are related to each other through common columns.
- Primary Key (PK): A column (or set of columns) that uniquely identifies each row in a table. (e.g.,
user_id
inusers
). - Foreign Key (FK): A column (or set of columns) in one table that refers to the Primary Key in another table. It establishes a link between the two tables. (e.g., if
orders
has auser_id
, thatuser_id
is a Foreign Key referencing theuser_id
Primary Key in theusers
table).
When you join tables, you’re telling the database how to use these related columns to match rows from one table with rows from another.
What is an INNER JOIN
?
An INNER JOIN
(often just called JOIN
) returns only the rows that have matching values in both tables based on the join condition. If a row in one table doesn’t have a match in the other table, it is excluded from the result set.
Think of it as: “Show me the records where there’s a match in both Table A AND Table B.”
SQL Syntax & Examples
Let’s use our scriptbuzz_db
. First, let’s ensure our users
and products
tables are set up with potential related data (we’ll primarily use the orders
table which you created as an exercise for a clear FK example).
To set up an orders
table that correctly links to users
, let’s make sure it has a user_id
column. If you haven’t, or if you want to ensure the foreign key constraint is there (which is good practice, but not strictly necessary for the JOIN
syntax itself, but is for data integrity), you might re-create it:
USE scriptbuzz_db;
-- Drop orders table if it exists to add FK constraint cleanly
DROP TABLE IF EXISTS orders;
CREATE TABLE orders (
order_id INT AUTO_INCREMENT PRIMARY KEY,
user_id INT NOT NULL, -- This will be our Foreign Key
order_date DATETIME DEFAULT CURRENT_TIMESTAMP,
total_amount DECIMAL(10,2) NOT NULL,
status VARCHAR(20) DEFAULT 'pending',
-- Define Foreign Key constraint
FOREIGN KEY (user_id) REFERENCES users(user_id)
);
-- Insert some sample orders that link to existing users
INSERT INTO orders (user_id, total_amount, status) VALUES
(1, 150.75, 'completed'),
(1, 200.00, 'pending'),
(2, 50.25, 'shipped'),
(1, 30.00, 'completed'),
(2, 100.00, 'processing');
-- Let's also ensure user 3 (jane_doe) doesn't have an order for now
-- so we can see what happens with non-matches in other join types later.
Now, let’s dive into INNER JOIN
examples.
Syntax for INNER JOIN
SELECT columns
FROM table1
INNER JOIN table2
ON table1.common_column = table2.common_column;
SELECT columns
: Specify the columns you want from either table. Usetable_name.column_name
to avoid ambiguity if columns have the same name (e.g.,id
).FROM table1
: The first table.INNER JOIN table2
: The second table you want to join with.ON table1.common_column = table2.common_column
: The join condition. This specifies which columns in each table should be matched. Usually, this is the Foreign Key in one table matching the Primary Key in the other.
Example 1: Get orders along with the username of the customer who placed them
We need data from orders
and users
. The common column is user_id
.
SELECT
orders.order_id,
users.username,
orders.total_amount,
orders.status,
orders.order_date
FROM orders
INNER JOIN users
ON orders.user_id = users.user_id;
Expected Output (example):
order_id | username | total_amount | status | order_date |
1 | john_doe | 150.75 | completed | 2025-06-29 21:40:00 |
2 | john_doe | 200.00 | pending | 2025-06-29 21:40:00 |
3 | admin_user | 50.25 | shipped | 2025-06-29 21:40:00 |
4 | john_doe | 30.00 | completed | 2025-06-29 21:40:00 |
5 | admin_user | 100.00 | processing | 2025-06-29 21:40:00 |
What it does: It combines rows from orders
with rows from users
wherever the user_id
in orders
matches a user_id
in users
. Notice that ‘jane_doe’ (user_id 3) is not in the result because she has no corresponding orders, and INNER JOIN
only returns matches from both tables.
2. Using Table Aliases (Highly Recommended!)
Typing table_name.column_name
can become cumbersome, especially with long table names or many joins. You can use table aliases to shorten names within your query.
Syntax:
SELECT t1.column1, t2.column2
FROM table1 AS t1
INNER JOIN table2 AS t2
ON t1.common_column = t2.common_column;
AS
is optional for table aliases, so FROM table1 t1
is common.
Example 2: Same query with table aliases
SELECT
o.order_id,
u.username,
o.total_amount,
o.status,
o.order_date
FROM orders AS o
INNER JOIN users AS u
ON o.user_id = u.user_id;
What it does: This query is identical to Example 1 in its result, but it’s much more concise and readable due to the o
and u
aliases.
3. Joining Multiple Tables
You can join more than two tables by chaining INNER JOIN
clauses.
Scenario: Let’s say we had an order_items
table that links orders
to products
. (Create this table if you want to follow along exactly):
-- Drop order_items table if it exists
DROP TABLE IF EXISTS order_items;
CREATE TABLE order_items (
item_id INT AUTO_INCREMENT PRIMARY KEY,
order_id INT NOT NULL,
product_id INT NOT NULL,
quantity INT NOT NULL,
price_per_item DECIMAL(10,2) NOT NULL,
FOREIGN KEY (order_id) REFERENCES orders(order_id),
FOREIGN KEY (product_id) REFERENCES products(product_id)
);
-- Insert some sample order items
-- Assuming product_id 1 is 'Laptop Pro', 2 is 'Wireless Mouse', 3 is 'Mechanical Keyboard'
INSERT INTO order_items (order_id, product_id, quantity, price_per_item) VALUES
(1, 1, 1, 1200.00), -- Order 1: 1 Laptop Pro
(1, 2, 2, 25.50), -- Order 1: 2 Wireless Mouse
(2, 3, 1, 99.99), -- Order 2: 1 Mechanical Keyboard
(3, 2, 1, 25.50), -- Order 3: 1 Wireless Mouse
(4, 1, 1, 1200.00), -- Order 4: 1 Laptop Pro (Oops, total_amount for order 4 was 30.00, this is just for example)
(5, 3, 1, 99.99); -- Order 5: 1 Mechanical Keyboard
Example 3: Get order details, customer username, and product names for each item in an order
SELECT
o.order_id,
u.username,
p.name AS product_name,
oi.quantity,
oi.price_per_item
FROM orders AS o
INNER JOIN users AS u
ON o.user_id = u.user_id
INNER JOIN order_items AS oi
ON o.order_id = oi.order_id
INNER JOIN products AS p
ON oi.product_id = p.product_id
ORDER BY o.order_id, p.name;
Expected Output (example):
order_id | username | product_name | quantity | price_per_item |
1 | john_doe | Laptop Pro | 1 | 1200.00 |
1 | john_doe | Wireless Mouse | 2 | 25.50 |
2 | john_doe | Mechanical Keyboard | 1 | 99.99 |
3 | admin_user | Wireless Mouse | 1 | 25.50 |
4 | john_doe | Laptop Pro | 1 | 1200.00 |
5 | admin_user | Mechanical Keyboard | 1 | 99.99 |
What it does: This query performs three INNER JOIN
operations to link orders
with users
, then with order_items
, and finally order_items
with products
to retrieve all the interconnected information.
4. Combining JOIN
with WHERE
, GROUP BY
, ORDER BY
, LIMIT
All the clauses you’ve learned so far can be combined with JOINs. The order remains important: FROM
/JOIN
-> WHERE
-> GROUP BY
-> HAVING
-> SELECT
-> ORDER BY
-> LIMIT
.
Example 4: Get usernames and total amount of their completed orders, only for users with more than one completed order, ordered by total amount (descending)
SELECT
u.username,
SUM(o.total_amount) AS total_completed_orders_amount,
COUNT(o.order_id) AS number_of_completed_orders
FROM users AS u
INNER JOIN orders AS o
ON u.user_id = o.user_id
WHERE o.status = 'completed'
GROUP BY u.username
HAVING COUNT(o.order_id) > 1
ORDER BY total_completed_orders_amount DESC;
Expected Output (example – based on current data, only John Doe has multiple ‘completed’ orders):
username | total_completed_orders_amount | number_of_completed_orders |
john_doe | 180.75 | 2 |
What it does:
- JOINs: Links
users
andorders
based onuser_id
. - WHERE: Filters the joined rows to only include orders with
status = 'completed'
. - GROUP BY: Groups the remaining rows by
username
. - HAVING: Filters these groups to include only users who have
COUNT(o.order_id)
greater than 1 completed orders. - SELECT: Selects the
username
and calculates theSUM
oftotal_amount
andCOUNT
oforder_id
for each group. - ORDER BY: Sorts the final result set by
total_completed_orders_amount
in descending order.
Common Pitfalls and Best Practices
- Ambiguous Column Names: When joining tables, if two tables have columns with the same name (e.g.,
id
in bothusers
andproducts
), you must qualify them with the table name or alias (e.g.,users.id
oru.id
). It’s good practice to always qualify, even if not strictly necessary, for clarity. - Join Condition: Ensure your
ON
clause correctly specifies the relationship between the tables (Primary Key = Foreign Key). An incorrect join condition will lead to wrong results or a “Cartesian product” (every row from table 1 joined with every row from table 2, which is usually undesirable and massive). - Performance: Joining many large tables can be performance-intensive. Indexes on the join columns (especially primary and foreign keys) are crucial for fast joins.
- Table Aliases: Always use table aliases (e.g.,
u
forusers
,o
fororders
) for readability and conciseness, especially when joining multiple tables. - Understanding
INNER JOIN
Behavior: Remember,INNER JOIN
only returns matching rows. If auser_id
inorders
doesn’t exist inusers
, that order will not appear in the result set. If auser_id
exists inusers
but has no corresponding orders, that user will not appear either.
Notes:
- JOINs combine rows from two or more tables based on a related column.
INNER JOIN
(or justJOIN
) returns only rows that have matching values in both tables.- The
ON
clause specifies the join condition, typically linking a Foreign Key to a Primary Key. - Use table aliases (e.g.,
SELECT u.username FROM users AS u JOIN orders AS o ON u.user_id = o.user_id;
) for clearer and more concise queries, especially with multiple joins. - You can chain multiple
INNER JOIN
clauses to combine data from many tables. - All other SQL clauses (
WHERE
,GROUP BY
,HAVING
,ORDER BY
,LIMIT
) can be used withJOIN
statements, following their specific order of execution.
Small Exercise
Using your scriptbuzz_db
(assuming orders
and order_items
tables exist and have data):
- List all products that have been part of any order, along with the
product_id
andname
from theproducts
table. (Hint: Joinproducts
andorder_items
). - Retrieve the
order_id
, theusername
of the customer, and thename
of the product for all items in “pending” orders. (Hint: You’ll need to joinorders
,users
,order_items
, andproducts
, and use aWHERE
clause). - Count how many different products each user has ever ordered. Display the
username
and thetotal_distinct_products_ordered
. (Hint: Joinusers
,orders
,order_items
. You’ll needGROUP BY
andCOUNT(DISTINCT column_name)
).
You’ve now taken a monumental leap in your SQL journey! Understanding joins is fundamental to working with relational databases effectively. Next, we’ll explore other types of joins that handle non-matching data.