💡 Ask Tutor

Chapter 18: Understanding and Using Indexes

Imagine you have a physical dictionary (or phone book) with millions of words (or names). If you had to find a specific word, and the dictionary wasn’t sorted or didn’t have an alphabetical index, you’d have to read every single page, word by word, until you found it. This is a full table scan, and it’s very slow.

Now, imagine using a real dictionary. You quickly flip to the section starting with ‘P’, then ‘Pe’, then ‘Per’, etc., until you find your word. This is how a database index works.

An index is a special lookup table that the database search engine can use to speed up data retrieval. It’s like an index at the back of a book, pointing to where specific topics or names can be found.

What is an Index?

  • An index is a data structure (most commonly a B-tree) that stores the values of one or more columns from a table, along with pointers to the physical location of the corresponding rows in the table.
  • They are created on columns that are frequently used in WHERE clauses, JOIN conditions, ORDER BY clauses, and GROUP BY clauses.
  • Indexes are designed to speed up SELECT queries.

Why Use Indexes? (Pros)

  1. Faster Data Retrieval (SELECT): This is the primary benefit. Queries involving indexed columns in WHERE, JOIN, ORDER BY, GROUP BY clauses run significantly faster.
  2. Faster Sorting (ORDER BY): If you often sort by an indexed column, the database can use the index’s pre-sorted order, avoiding an expensive sort operation.
  3. Faster Aggregations (GROUP BY): Similar to sorting, grouping by an indexed column can speed up the process.
  4. Enforce Uniqueness: PRIMARY KEY and UNIQUE constraints automatically create an index (unique index) on the specified columns, ensuring data uniqueness.

Why Not Index Everything? (Cons)

  1. Increased Storage Space: Indexes consume disk space.
  2. Slower Data Modification (INSERT, UPDATE, DELETE): When you modify data in an indexed column, the database not only has to update the table data but also the index data structure. This adds overhead and can slow down write operations.
  3. Maintenance Overhead: Indexes need to be maintained by the database.
  4. Too Many Indexes: Over-indexing can actually hurt performance, as the overhead of maintaining many indexes can outweigh the benefits for reads.

Types of Indexes

  • Clustered Index: (Usually your PRIMARY KEY). This type of index sorts and stores the actual data rows in the table based on the index’s key values. A table can only have one clustered index because the data rows themselves can only be stored in one physical order. It determines the physical storage order of the data.
  • Non-Clustered Index: A separate data structure that contains the indexed column’s value and a pointer to the actual data row. A table can have many non-clustered indexes.

(Note: MySQL’s InnoDB storage engine implements primary keys as clustered indexes by default. Other engines or databases might handle this differently.)

SQL Syntax & Examples (MySQL/MariaDB)

Let’s use our scriptbuzz_db. Remember to run USE scriptbuzz_db; first.

1. Creating an Index (CREATE INDEX)

Syntax:

SQL
CREATE [UNIQUE] INDEX index_name
ON table_name (column1, column2, ...);

  • UNIQUE: (Optional) Creates a unique index, ensuring all values in the indexed column(s) are unique. (This is automatically applied for PRIMARY KEY and UNIQUE constraints).
  • index_name: A descriptive name for your index (e.g., idx_users_username).
  • table_name: The table on which to create the index.
  • (column1, column2, ...): The column(s) to include in the index. For multiple columns, the order matters for query optimization.

Example 1: Create an index on username in the users table

This will speed up queries like SELECT * FROM users WHERE username = 'john_doe';

SQL
CREATE INDEX idx_users_username ON users (username);

What it does: Creates a non-unique index named idx_users_username on the username column of the users table.

Example 2: Create a unique index on email in the users table

This will also enforce uniqueness on the email column, similar to ADD CONSTRAINT UNIQUE.

SQL
-- First, ensure no duplicate emails exist to avoid error
-- You might have added a UNIQUE constraint in the previous chapter, which already created an index.
-- If so, you'd drop that constraint first, or skip this example if it already exists.
-- ALTER TABLE users DROP CONSTRAINT UQ_email; -- If you named it UQ_email

CREATE UNIQUE INDEX idx_users_email_unique ON users (email);

What it does: Creates a unique index named idx_users_email_unique on the email column. Any attempt to insert a duplicate email will fail.

Example 3: Create a composite (multi-column) index

If you frequently query WHERE category = 'Electronics' AND price > 100, a composite index on (category, price) can be beneficial. The order of columns in a composite index is important. For WHERE category = X AND price = Y, (category, price) is good. For WHERE category = X, (category, price) is also good (can use the leftmost part of the index). For WHERE price = Y only, it’s not as effective.

SQL
CREATE INDEX idx_products_category_price ON products (category, price);

What it does:

Creates an index that sorts data first by category, then by price within each category. This is useful for queries filtering or sorting by both.

2. Listing Indexes

To see existing indexes on a table:

Syntax:

SQL
SHOW INDEX FROM table_name;
-- OR
SHOW INDEXES FROM table_name;

Example 4: Show indexes on the users table

SQL
SHOW INDEX FROM users;

Expected Output (example): You’ll see indexes for PRIMARY (on user_id), and any other indexes you’ve created (idx_users_username, idx_users_email_unique).

TableNon_uniqueKey_nameSeq_in_indexColumn_nameCollationCardinalitySub_partPackedNullIndex_typeCommentIndex_commentVisible
users0PRIMARY1user_idA3NULLNULLBTREEYES
users1idx_users_username1usernameA3NULLNULLYESBTREEYES
users0idx_users_email_unique1emailA3NULLBTREEYES
  • Non_unique: 0 for unique index, 1 for non-unique.
  • Key_name: The name of the index.
  • Column_name: The column(s) in the index.
  • Index_type: Usually BTREE (B-tree), a common data structure for indexes.

3. Dropping an Index (DROP INDEX)

Syntax:

SQL
DROP INDEX index_name ON table_name;

Example 5: Drop the idx_users_username index

SQL
DROP INDEX idx_users_username ON users;

What it does: Removes the specified index. This frees up disk space and reduces write overhead but might slow down queries that previously benefited from this index.

Verification:

SQL
SHOW INDEX FROM users;

When to Create Indexes

  • Primary Keys & Unique Constraints: Automatically indexed.
  • Foreign Keys: Should almost always be indexed on the child table (the FK column) to speed up JOIN operations and ensure referential integrity checks are fast.
  • Columns in WHERE clauses: If you frequently search or filter by a column.
  • Columns in ORDER BY and GROUP BY clauses: Can help avoid expensive sort operations.
  • Columns in JOIN conditions: Speeds up the matching process.

Tools for Index Analysis

  • EXPLAIN statement: This is a crucial SQL command to analyze query performance. You can prefix any SELECT statement with EXPLAIN to see how MySQL will execute the query, including which indexes it uses (or doesn’t use) and why.
SQL
EXPLAIN SELECT * FROM users WHERE username = 'john_doe';

  • Looking at the key and rows columns in the EXPLAIN output is particularly insightful.
  • Monitoring Tools: Database monitoring tools can help identify slow queries that would benefit from indexing.

Common Pitfalls and Best Practices

  • Over-indexing: Don’t index every column. The performance benefits for reads can be outweighed by the overhead for writes and increased storage.
  • Index on appropriate data types: Indexes are most effective on columns with high cardinality (many distinct values, e.g., email, user_id). They are less effective on columns with low cardinality (few distinct values, e.g., is_active (0 or 1), gender).
  • Composite Index Order: For multi-column indexes, put the most commonly queried/filtered column first. (category, price) is good for WHERE category = X and WHERE category = X AND price = Y. It’s generally not useful for WHERE price = Y alone.
  • LIKE with leading wildcard (%abc): Indexes are generally not used for LIKE queries that start with a wildcard (e.g., WHERE name LIKE '%abc'). They can be used for WHERE name LIKE 'abc%'.
  • Indexes are not magic: They don’t solve all performance problems. Badly written queries, inefficient data models, or hardware limitations can still lead to slow performance.

Notes:

  • Indexes are special lookup structures that speed up data retrieval (SELECT queries).
  • They are like the index of a book, allowing the database to quickly find specific rows.
  • Pros: Faster SELECT, ORDER BY, GROUP BY, enforce uniqueness.
  • Cons: Consume storage, slow down INSERT, UPDATE, DELETE operations.
  • CREATE INDEX: To add an index.
  • SHOW INDEX FROM: To view indexes on a table.
  • DROP INDEX: To remove an index.
  • EXPLAIN: A crucial tool to analyze query execution and verify index usage.
  • Index columns frequently used in WHERE, JOIN, ORDER BY, GROUP BY clauses, and foreign keys.
  • Avoid over-indexing and understand the impact of data cardinality and LIKE operators.

Small Exercise

Using your scriptbuzz_db:

  1. Create an index on the order_date column in the orders table. Why might this be useful?
  2. Create a composite index on product_id and order_id in the order_items table. Why might this be useful?
  3. Use EXPLAIN on a simple SELECT query that filters by order_date (e.g., SELECT * FROM orders WHERE order_date > '2025-06-01';). Observe if your new index is used.
  4. Drop the composite index you created in step 2.

Congratulations! Understanding and effectively using indexes is a mark of a capable database developer. You’re now equipped to not only retrieve data but to make your data retrieval lightning fast. Next, we’ll dive into Transactions, a critical concept for data integrity.

🚀 Explore Popular Learning Tracks