Composite Index in Database
- Authors
- Name
- Pallavi
- Published on
I am Pallavi, Co-Founder @ Outcome School, experienced in creating robust and scalable backend systems. I love working on performance optimization and security. I am passionate about knowledge sharing, mentoring developers, and contributing to technical blogs.
Join Outcome School and get high paying tech job: Outcome School
In this blog, we will learn about the Composite Index in database and why it offers better performance. We will also explore the impact of column order in a composite index.
What is a Composite Index?
A Composite Index is an index created using multiple columns of a table.
Let’s take an example to understand this.
Suppose there is a payments
table in a database with the columns id
, customer_id
, and amount
. A customer can have multiple entries in this table for the various payments they have made. For this example, we have inserted 1 million rows into the table.
Let’s consider the query below, where we need to fetch all payments where customer_id
equals 20 and the amount
is between 100 and 300.
SELECT * FROM payments where customer_id = 20 and amount >= 100 AND amount <= 300.
First, let's check the number of rows that satisfy the query above. We will use count
here.
SELECT count(*) FROM payments WHERE customer_id = 20 AND amount >= 100 AND amount <= 300;
count(*) |
---|
4070 |
Here, we have total 4070 rows satisfying the above query.
Scenario 1: There is no index on customer_id
and amount
column.
Let’s analyze and understand how MySQL executes the query. We can view the execution plan using the EXPLAIN
keyword.
EXPLAIN SELECT count(*) FROM payments WHERE customer_id = 20 AND amount >= 100 AND amount <= 300;
id | table | possible_keys | key | key_len | rows | filtered |
---|---|---|---|---|---|---|
1 | payments | NULL | NULL | NULL | 998222 | 1.11 |
The rows column in the execution plan generated by MySQL's EXPLAIN
indicates the estimated number of rows that MySQL needs to examine to execute the query.
In this case, it is approximately a full table scan, as the value is 998222 and the number of rows in the table is 1 million.
The time taken to execute the query is 0.189 seconds.
Scenario 2: There are two indexes on the payments
table - one on the customer_id
column and one on the amount
column.
CREATE INDEX idx_customer on payments (customer_id);
CREATE INDEX idx_amount on payments (amount);
Let’s analyze and understand how MySQL executes the query in this scenario.
EXPLAIN SELECT count(*) FROM payments WHERE customer_id = 20 AND amount >= 100 AND amount <= 300;
id | table | possible_keys | key | key_len | rows | filtered |
---|---|---|---|---|---|---|
1 | payments | idx_customer, idx_amount | idx_customer | 5 | 9994 | 50.00 |
In this case the MySQL will scan approximately 10,000 rows. The time taken to execute the query is 0.027 seconds which is much better than the full table scan of 1 million rows when no index is present.
Scenario 3: When composite index is in payments
table - on customer_id
and amount
columns.
Let's create a composite index idx_customer_amount
using two columns: customer_id
, amount
.
CREATE INDEX idx_customer_amount on payments (customer_id, amount);
Let’s analyze and understand how MySQL executes the query in this scenario.
EXPLAIN SELECT count(*) FROM payments WHERE customer_id = 20 AND amount >= 100 AND amount <= 300;
id | table | possible_keys | key | key_len | rows | filtered |
---|---|---|---|---|---|---|
1 | payments | idx_customer_amount | idx_customer_amount | 10 | 4070 | 100.00 |
In this case, MySQL is scanning exactly 4070 rows which is equal to the number of records satisfying the query. The time taken to execute the query is 0.0021 seconds. This has much better performance than the case where two indexes are created. Here, we can clearly see that composite index has significantly improved the performance of SQL query.
Let's try to understand why we are getting better performance when using the composite index.
In our payments
table example, for the composite index idx_customer_amount
(customer_id
, amount
), the index data is first ordered based on customer_id
, and then for every entry of customer_id
, the index is sorted based on the amount
column.
You can see below how the data is sorted.
customer_id 1
| amount 120
| amount 250
| amount 310
customer_id 2
| amount 200
| amount 300
| amount 500
customer_id 3
| amount 50
| amount 250
| amount 550
customer_id 4
| amount 250
| amount 340
| amount 500
With a composite index, the matching rows can be found quickly, as the index is first sorted by customer_id
and then, for a given customer_id
, it is further sorted by the amount
column.
Now that we are convinced using the composite index in the above example gives us the best performance for the given query, we must learn about the impact of the order of columns considered when creating the composite index, as only then can we get the most out of it.
Impact of the Order of columns in Composite Index
The order of columns in a composite index determine how index is sorted.
Consider the example: idx_a_b_c
(A, B, C)
For the above composite index, the index data would first be sorted based on column A, then for every entry in column A, it would be sorted again based on column B, and for every entry in column B, the index data would be sorted based on column C.
Now, let's go back to our payments
table example. For the composite index idx_customer_amount
(customer_id
, amount
), we will examine different queries to understand the impact of the column order.
Query 1:
SELECT * FROM payments where customer_id >= 1 AND customer_id <= 3
This query is efficient as the composite index has leftmost column as customer_id
, so the index would be sorted on customer_id
.
Query 2:
SELECT * FROM payments where customer_id = 2 and amount >= 500 AND amount <= 5000;
With a composite index, the matching rows can be found quickly, as the index is first sorted by customer_id
and then, for a given customer_id
, it is further sorted by the amount
column. So, this query is efficient.
Query 3:
SELECT * FROM payments where amount >= 200 AND amount <= 2000
As the index is sorted on customer_id
first so the database would have to scan the entire index to find the matching amount. So, this query is not efficient.
With the three queries above, we can easily notice the importance of the column order when creating the composite index.
Conclusion: Based on our query pattern, we should decide whether to use a composite index or not. Also, when using a composite index, we should carefully consider the order of the columns, as it all depends on your query patterns.
Now, we have understood the Composite Index in Database.
That's it for now.
Thanks
Pallavi
Co-Founder @ Outcome School
You can connect with me on:
Follow Outcome School on: