Composite Index in Database

Authors
  • Pallavi
    Name
    Pallavi
    Published on
Composite Index in Database

I am Pallavi, Co-Founder @ Outcome School, experienced in creating robust and scalable backend systems. I love working on performance optimization and security. I am passionate about knowledge sharing, mentoring developers, and contributing to technical blogs.

Join Outcome School and get high paying tech job: Outcome School

In this blog, we will learn about the Composite Index in database and why it offers better performance. We will also explore the impact of column order in a composite index.

What is a Composite Index?

A Composite Index is an index created using multiple columns of a table.

Let’s take an example to understand this.

Suppose there is a payments table in a database with the columns id, customer_id, and amount. A customer can have multiple entries in this table for the various payments they have made. For this example, we have inserted 1 million rows into the table.

Let’s consider the query below, where we need to fetch all payments where customer_id equals 20 and the amount is between 100 and 300.

SELECT * FROM payments where customer_id = 20 and amount >= 100 AND amount <= 300.

First, let's check the number of rows that satisfy the query above. We will use count here.

SELECT count(*)  FROM payments WHERE customer_id = 20 AND amount >=  100 AND amount <= 300;
count(*)
4070

Here, we have total 4070 rows satisfying the above query.

Scenario 1: There is no index on customer_id and amount column.

Let’s analyze and understand how MySQL executes the query. We can view the execution plan using the EXPLAIN keyword.

EXPLAIN SELECT count(*)  FROM payments WHERE customer_id = 20 AND amount >=  100 AND amount <= 300;
idtablepossible_keyskeykey_lenrowsfiltered
1paymentsNULLNULLNULL9982221.11

The rows column in the execution plan generated by MySQL's EXPLAIN indicates the estimated number of rows that MySQL needs to examine to execute the query.

In this case, it is approximately a full table scan, as the value is 998222 and the number of rows in the table is 1 million.

The time taken to execute the query is 0.189 seconds.

Scenario 2: There are two indexes on the payments table - one on the customer_id column and one on the amount column.

CREATE INDEX idx_customer on payments (customer_id);
CREATE INDEX idx_amount on payments (amount);

Let’s analyze and understand how MySQL executes the query in this scenario.

EXPLAIN SELECT count(*)  FROM payments WHERE customer_id = 20 AND amount >=  100 AND amount <= 300;
idtablepossible_keyskeykey_lenrowsfiltered
1paymentsidx_customer, idx_amountidx_customer5999450.00

In this case the MySQL will scan approximately 10,000 rows. The time taken to execute the query is 0.027 seconds which is much better than the full table scan of 1 million rows when no index is present.

Scenario 3: When composite index is in payments table - on customer_id and amount columns.

Let's create a composite index idx_customer_amount using two columns: customer_id, amount.

CREATE INDEX idx_customer_amount on payments (customer_id, amount);

Let’s analyze and understand how MySQL executes the query in this scenario.

EXPLAIN SELECT count(*)  FROM payments WHERE customer_id = 20 AND amount >=  100 AND amount <= 300;
idtablepossible_keyskeykey_lenrowsfiltered
1paymentsidx_customer_amountidx_customer_amount104070100.00

In this case, MySQL is scanning exactly 4070 rows which is equal to the number of records satisfying the query. The time taken to execute the query is 0.0021 seconds. This has much better performance than the case where two indexes are created. Here, we can clearly see that composite index has significantly improved the performance of SQL query.

Let's try to understand why we are getting better performance when using the composite index.

In our payments table example, for the composite index idx_customer_amount (customer_id, amount), the index data is first ordered based on customer_id, and then for every entry of customer_id, the index is sorted based on the amount column.

You can see below how the data is sorted.

customer_id 1
    | amount 120
    | amount 250
    | amount 310

customer_id 2
    | amount 200
    | amount 300
    | amount 500

customer_id 3
    | amount 50
    | amount 250
    | amount 550

customer_id 4
    | amount 250
    | amount 340
    | amount 500

With a composite index, the matching rows can be found quickly, as the index is first sorted by customer_id and then, for a given customer_id, it is further sorted by the amount column.

Now that we are convinced using the composite index in the above example gives us the best performance for the given query, we must learn about the impact of the order of columns considered when creating the composite index, as only then can we get the most out of it.

Impact of the Order of columns in Composite Index

The order of columns in a composite index determine how index is sorted.

Consider the example: idx_a_b_c (A, B, C)

For the above composite index, the index data would first be sorted based on column A, then for every entry in column A, it would be sorted again based on column B, and for every entry in column B, the index data would be sorted based on column C.

Now, let's go back to our payments table example. For the composite index idx_customer_amount (customer_id, amount), we will examine different queries to understand the impact of the column order.

Query 1:

SELECT * FROM payments where customer_id >= 1 AND customer_id <= 3

This query is efficient as the composite index has leftmost column as customer_id, so the index would be sorted on customer_id.

Query 2:

SELECT * FROM payments where customer_id = 2 and amount >= 500 AND amount <= 5000;

With a composite index, the matching rows can be found quickly, as the index is first sorted by customer_id and then, for a given customer_id, it is further sorted by the amount column. So, this query is efficient.

Query 3:

SELECT * FROM payments where amount >= 200 AND amount <= 2000

As the index is sorted on customer_id first so the database would have to scan the entire index to find the matching amount. So, this query is not efficient.

With the three queries above, we can easily notice the importance of the column order when creating the composite index.

Conclusion: Based on our query pattern, we should decide whether to use a composite index or not. Also, when using a composite index, we should carefully consider the order of the columns, as it all depends on your query patterns.

Now, we have understood the Composite Index in Database.

That's it for now.

Thanks

Pallavi
Co-Founder @ Outcome School

You can connect with me on:

Follow Outcome School on:

Read all of our high-quality blogs here.