Chapter 18: Understanding and Using Indexes
Imagine you have a physical dictionary (or phone book) with millions of words (or names). If you had to find a specific word, and the dictionary wasn’t sorted or didn’t have an alphabetical index, you’d have to read every single page, word by word, until you found it. This is a full table scan, and it’s very slow.
Now, imagine using a real dictionary. You quickly flip to the section starting with ‘P’, then ‘Pe’, then ‘Per’, etc., until you find your word. This is how a database index works.
An index is a special lookup table that the database search engine can use to speed up data retrieval. It’s like an index at the back of a book, pointing to where specific topics or names can be found.
What is an Index?
- An index is a data structure (most commonly a B-tree) that stores the values of one or more columns from a table, along with pointers to the physical location of the corresponding rows in the table.
- They are created on columns that are frequently used in
WHERE
clauses,JOIN
conditions,ORDER BY
clauses, andGROUP BY
clauses. - Indexes are designed to speed up
SELECT
queries.
Why Use Indexes? (Pros)
- Faster Data Retrieval (
SELECT
): This is the primary benefit. Queries involving indexed columns inWHERE
,JOIN
,ORDER BY
,GROUP BY
clauses run significantly faster. - Faster Sorting (
ORDER BY
): If you often sort by an indexed column, the database can use the index’s pre-sorted order, avoiding an expensive sort operation. - Faster Aggregations (
GROUP BY
): Similar to sorting, grouping by an indexed column can speed up the process. - Enforce Uniqueness:
PRIMARY KEY
andUNIQUE
constraints automatically create an index (unique index) on the specified columns, ensuring data uniqueness.
Why Not Index Everything? (Cons)
- Increased Storage Space: Indexes consume disk space.
- Slower Data Modification (
INSERT
,UPDATE
,DELETE
): When you modify data in an indexed column, the database not only has to update the table data but also the index data structure. This adds overhead and can slow down write operations. - Maintenance Overhead: Indexes need to be maintained by the database.
- Too Many Indexes: Over-indexing can actually hurt performance, as the overhead of maintaining many indexes can outweigh the benefits for reads.
Types of Indexes
- Clustered Index: (Usually your
PRIMARY KEY
). This type of index sorts and stores the actual data rows in the table based on the index’s key values. A table can only have one clustered index because the data rows themselves can only be stored in one physical order. It determines the physical storage order of the data. - Non-Clustered Index: A separate data structure that contains the indexed column’s value and a pointer to the actual data row. A table can have many non-clustered indexes.
(Note: MySQL’s InnoDB storage engine implements primary keys as clustered indexes by default. Other engines or databases might handle this differently.)
SQL Syntax & Examples (MySQL/MariaDB)
Let’s use our scriptbuzz_db
. Remember to run USE scriptbuzz_db;
first.
1. Creating an Index (CREATE INDEX
)
Syntax:
CREATE [UNIQUE] INDEX index_name
ON table_name (column1, column2, ...);
UNIQUE
: (Optional) Creates a unique index, ensuring all values in the indexed column(s) are unique. (This is automatically applied forPRIMARY KEY
andUNIQUE
constraints).index_name
: A descriptive name for your index (e.g.,idx_users_username
).table_name
: The table on which to create the index.(column1, column2, ...)
: The column(s) to include in the index. For multiple columns, the order matters for query optimization.
Example 1: Create an index on username
in the users
table
This will speed up queries like SELECT * FROM users WHERE username = 'john_doe';
CREATE INDEX idx_users_username ON users (username);
What it does: Creates a non-unique index named idx_users_username
on the username
column of the users
table.
Example 2: Create a unique index on email
in the users
table
This will also enforce uniqueness on the email
column, similar to ADD CONSTRAINT UNIQUE
.
-- First, ensure no duplicate emails exist to avoid error
-- You might have added a UNIQUE constraint in the previous chapter, which already created an index.
-- If so, you'd drop that constraint first, or skip this example if it already exists.
-- ALTER TABLE users DROP CONSTRAINT UQ_email; -- If you named it UQ_email
CREATE UNIQUE INDEX idx_users_email_unique ON users (email);
What it does: Creates a unique index named idx_users_email_unique
on the email
column. Any attempt to insert a duplicate email will fail.
Example 3: Create a composite (multi-column) index
If you frequently query WHERE category = 'Electronics' AND price > 100
, a composite index on (category, price)
can be beneficial. The order of columns in a composite index is important. For WHERE category = X AND price = Y
, (category, price)
is good. For WHERE category = X
, (category, price)
is also good (can use the leftmost part of the index). For WHERE price = Y
only, it’s not as effective.
CREATE INDEX idx_products_category_price ON products (category, price);
What it does:
Creates an index that sorts data first by category
, then by price
within each category. This is useful for queries filtering or sorting by both.
2. Listing Indexes
To see existing indexes on a table:
Syntax:
SHOW INDEX FROM table_name;
-- OR
SHOW INDEXES FROM table_name;
Example 4: Show indexes on the users
table
SHOW INDEX FROM users;
Expected Output (example): You’ll see indexes for PRIMARY
(on user_id
), and any other indexes you’ve created (idx_users_username
, idx_users_email_unique
).
Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible |
users | 0 | PRIMARY | 1 | user_id | A | 3 | NULL | NULL | BTREE | YES | |||
users | 1 | idx_users_username | 1 | username | A | 3 | NULL | NULL | YES | BTREE | YES | ||
users | 0 | idx_users_email_unique | 1 | A | 3 | NULL | BTREE | YES |
Non_unique
: 0 for unique index, 1 for non-unique.Key_name
: The name of the index.Column_name
: The column(s) in the index.Index_type
: UsuallyBTREE
(B-tree), a common data structure for indexes.
3. Dropping an Index (DROP INDEX
)
Syntax:
DROP INDEX index_name ON table_name;
Example 5: Drop the idx_users_username
index
DROP INDEX idx_users_username ON users;
What it does: Removes the specified index. This frees up disk space and reduces write overhead but might slow down queries that previously benefited from this index.
Verification:
SHOW INDEX FROM users;
When to Create Indexes
- Primary Keys & Unique Constraints: Automatically indexed.
- Foreign Keys: Should almost always be indexed on the child table (the FK column) to speed up
JOIN
operations and ensure referential integrity checks are fast. - Columns in
WHERE
clauses: If you frequently search or filter by a column. - Columns in
ORDER BY
andGROUP BY
clauses: Can help avoid expensive sort operations. - Columns in
JOIN
conditions: Speeds up the matching process.
Tools for Index Analysis
EXPLAIN
statement: This is a crucial SQL command to analyze query performance. You can prefix anySELECT
statement withEXPLAIN
to see how MySQL will execute the query, including which indexes it uses (or doesn’t use) and why.
EXPLAIN SELECT * FROM users WHERE username = 'john_doe';
- Looking at the
key
androws
columns in theEXPLAIN
output is particularly insightful. - Monitoring Tools: Database monitoring tools can help identify slow queries that would benefit from indexing.
Common Pitfalls and Best Practices
- Over-indexing: Don’t index every column. The performance benefits for reads can be outweighed by the overhead for writes and increased storage.
- Index on appropriate data types: Indexes are most effective on columns with high cardinality (many distinct values, e.g.,
email
,user_id
). They are less effective on columns with low cardinality (few distinct values, e.g.,is_active
(0 or 1),gender
). - Composite Index Order: For multi-column indexes, put the most commonly queried/filtered column first.
(category, price)
is good forWHERE category = X
andWHERE category = X AND price = Y
. It’s generally not useful forWHERE price = Y
alone. LIKE
with leading wildcard (%abc
): Indexes are generally not used forLIKE
queries that start with a wildcard (e.g.,WHERE name LIKE '%abc'
). They can be used forWHERE name LIKE 'abc%'
.- Indexes are not magic: They don’t solve all performance problems. Badly written queries, inefficient data models, or hardware limitations can still lead to slow performance.
Notes:
- Indexes are special lookup structures that speed up data retrieval (
SELECT
queries). - They are like the index of a book, allowing the database to quickly find specific rows.
- Pros: Faster
SELECT
,ORDER BY
,GROUP BY
, enforce uniqueness. - Cons: Consume storage, slow down
INSERT
,UPDATE
,DELETE
operations. CREATE INDEX
: To add an index.SHOW INDEX FROM
: To view indexes on a table.DROP INDEX
: To remove an index.EXPLAIN
: A crucial tool to analyze query execution and verify index usage.- Index columns frequently used in
WHERE
,JOIN
,ORDER BY
,GROUP BY
clauses, and foreign keys. - Avoid over-indexing and understand the impact of data cardinality and
LIKE
operators.
Small Exercise
Using your scriptbuzz_db
:
- Create an index on the
order_date
column in theorders
table. Why might this be useful? - Create a composite index on
product_id
andorder_id
in theorder_items
table. Why might this be useful? - Use
EXPLAIN
on a simpleSELECT
query that filters byorder_date
(e.g.,SELECT * FROM orders WHERE order_date > '2025-06-01';
). Observe if your new index is used. - Drop the composite index you created in step 2.
Congratulations! Understanding and effectively using indexes is a mark of a capable database developer. You’re now equipped to not only retrieve data but to make your data retrieval lightning fast. Next, we’ll dive into Transactions, a critical concept for data integrity.