Fix Slow Queries Now: Must-Know Indexing Hacks!

Indexes are a powerful tool that can significantly speed up query execution in databases. They work like the index at the back of a book, helping you locate the necessary information without reading every page. This article explains how indexes work, their internal structure, advantages, and limitations.

Why Are Indexes Needed?

When executing a query, the database searches for rows that meet the conditions. If the table contains a large number of records, searching without an index can be slow, as every row must be checked—this is called a full table scan. This increases query execution time and server load. Indexes help avoid this by providing a more efficient way to search, directing the query to relevant rows and minimizing read operations.

Index Types

B-Trees

  • Algorithmic Complexity:
    • Search: O(log n)
    • Insertion: O(log n)
    • Deletion: O(log n)
  • Structure: A balanced tree whose nodes hold sorted keys and pointers to child nodes. Because keys are kept in order, the index supports both lookups and sorted output.
  • Search: Logarithmic, because a high branching factor keeps the tree shallow even for large tables.
  • Disk Usage: Depends on key width and row count. Figures around 20-30% of the indexed data size are a rough rule of thumb and vary by engine and schema.
  • Disk Storage: Stored as fixed-size pages, so descending one level of the tree typically costs one page read.
  • Advantages:
    • Suitable for range queries like WHERE age BETWEEN 20 AND 30.
    • Universal and used for most search operations.
  • Disadvantages:
    • Insertions and deletions become slower when nodes must be split or merged to keep the tree balanced.
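
To make the range-query advantage concrete, here is a minimal sketch in Python. It is not a real B-tree: a sorted list searched with `bisect` stands in for the ordered keys, and the sample rows and the `range_lookup` helper are hypothetical. It only illustrates why keeping keys in order lets a query like WHERE age BETWEEN 20 AND 30 locate its start and end with two logarithmic searches.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sample rows: (row_id, name, age).
rows = [(1, "Ann", 34), (2, "Bob", 25), (3, "Cy", 19), (4, "Di", 30), (5, "Ed", 25)]

# The "index": (age, row_id) pairs kept in key order, as B-tree leaves would be.
index = sorted((age, row_id) for row_id, _name, age in rows)
keys = [age for age, _row_id in index]


def range_lookup(low, high):
    """Return row ids with low <= age <= high using two binary searches."""
    start = bisect_left(keys, low)   # first position with key >= low
    end = bisect_right(keys, high)   # first position with key > high
    return [row_id for _age, row_id in index[start:end]]


print(range_lookup(20, 30))  # rows matching WHERE age BETWEEN 20 AND 30
print(range_lookup(34, 34))  # rows matching WHERE age = 34
```

A point lookup is simply a range whose two bounds are equal, which is why one ordered structure serves both kinds of query.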

B+ Trees (Enhanced B-Trees)

  • Algorithmic Complexity:
    • Search: O(log n)
    • Insertion: O(log n)
    • Deletion: O(log n)
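  • Structure: In a B+ tree, data (or row pointers) is stored only in leaf nodes, and the leaves are linked together in key order. Internal nodes contain only keys and child pointers, so more keys fit per node and the tree stays shallow.
  • Search: Every lookup descends to a leaf. Range scans then follow the leaf links instead of traversing the tree again.
  • Disk Usage: Somewhat higher than a plain B-tree, because keys that appear in internal nodes are repeated in the leaves. Figures around 30-40% of the indexed data size are rough estimates and vary by engine and key width.
  • Disk Storage: Stored as pages. Engines try to keep leaf pages close together on disk, although inserts can fragment them over time.
  • Advantages:
    • Good performance for scans that read many consecutive keys.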
    • Supports sequential reading for ranges.
  • Disadvantages:
    • Consumes more disk space due to key duplication.

Hash Indexes

  • Algorithmic Complexity:
    • Search: O(1) (average)
    • Insertion: O(1) (average)
    • Deletion: O(1) (average)
  • Structure: Applies a hash function to each key and stores the entry in the bucket that the hash value selects, which makes exact matches fast.
  • Search: Extremely fast for exact matches, e.g., WHERE id = 42.
  • Disk Usage: Roughly 10-20% of the indexed data size as a rule of thumb; the actual figure depends on the engine and the number of buckets.
  • Disk Storage: Stored as buckets, with overflow areas for keys that collide.
  • Advantages:
    • High speed for exact matches.
  • Disadvantages:
    • Does not support ordered or range queries.
    • Collisions and bucket overflow can waste space and slow lookups as the number of records grows.
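
The trade-off above can be sketched with a Python dictionary standing in for a hash index. This is an illustration under assumptions, not how any particular database implements it: the sample rows and the `exact_lookup` and `range_scan` helpers are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows: (row_id, name, age).
rows = [(1, "Ann", 34), (2, "Bob", 25), (3, "Cy", 19), (4, "Di", 30), (5, "Ed", 25)]

# The "hash index": key -> list of row ids. A dict hashes the key internally.
hash_index = defaultdict(list)
for row_id, _name, age in rows:
    hash_index[age].append(row_id)


def exact_lookup(age):
    """Average O(1): hash the key and read a single bucket."""
    return hash_index.get(age, [])


def range_scan(low, high):
    """No ordering to exploit, so every key has to be examined."""
    return sorted(
        row_id
        for age, row_ids in hash_index.items()
        if low <= age <= high
        for row_id in row_ids
    )


print(exact_lookup(25))    # fast exact match
print(range_scan(20, 30))  # works, but only by visiting every key
```

The range query still returns the right rows, but only by checking every key, which is exactly the work an index is supposed to avoid.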

Full-Text Indexes

  • Algorithmic Complexity (approximate, per search term; depends on the implementation):
    • Search: O(log n)
    • Insertion: O(log n)
    • Deletion: O(log n)
  • Structure: Typically an inverted index that maps each token (word) to the rows or documents containing it.
  • Search: Supports phrase matching, stemming, and relevance ranking.
  • Disk Usage: Requires additional space for tokenized data and metadata.
  • Disk Storage: Stored in compressed formats for fast retrieval of textual data.
  • Advantages:
    • Efficient text searching in large datasets.
    • Supports complex search operations like keyword proximity and partial word matching.
  • Disadvantages:
    • Requires specific configuration and may significantly increase storage.
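
As a rough illustration of the token-to-document mapping behind full-text search, the sketch below builds a tiny inverted index in Python. It is deliberately simplified: there is no stemming, ranking, or phrase matching, and the sample documents and the `tokenize` and `search` helpers are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical documents keyed by id.
documents = {
    1: "Indexes speed up queries",
    2: "Full table scans are slow",
    3: "Slow queries need indexes",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


# The inverted index: token -> set of ids of documents containing it.
inverted = defaultdict(set)
for doc_id, text in documents.items():
    for token in tokenize(text):
        inverted[token].add(doc_id)


def search(query):
    """Return ids of documents that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return []
    result = set(inverted.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= inverted.get(token, set())
    return sorted(result)


print(search("slow queries"))
print(search("Indexes"))
```

Each query term costs one dictionary lookup plus a set intersection, so no document text is scanned at query time.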

Index Maintenance

Why is index maintenance important?
Over time, indexes can become fragmented due to frequent insert, update, and delete operations. This leads to reduced performance. Regular index maintenance helps maintain their efficiency.

Key Maintenance Methods

  • REINDEX (PostgreSQL): Rebuilds the index, eliminating fragmentation and restoring performance.
    REINDEX INDEX index_name;
  • OPTIMIZE (MySQL): Eliminates table and associated index fragmentation.
    OPTIMIZE TABLE table_name;
  • Automatic Maintenance: Some databases provide built-in or schedulable maintenance. For example, SQL Server maintenance plans can rebuild or reorganize indexes on a schedule.

Index Comparison

Index Type | Search Complexity | Insert/Delete Complexity | Disk Usage | Main Advantages | Main Disadvantages
B-Tree | O(log n) | O(log n) | Moderate | Supports range queries | May require rebalancing
B+ Tree | O(log n) | O(log n) | High | Speeds up sequential reads | Consumes more disk space
Hash | O(1) (average) | O(1) (average) | Low | Fast exact-match search | Does not support range queries
Full-Text | O(log n) | O(log n) | High | Text searching | High storage requirements
Bitmap | O(1) | O(n) | Very Low | Efficient for low cardinality | Not suitable for write-heavy workloads
Composite | O(log n) | O(log n) | High | Speeds up multi-column queries | Requires careful column ordering
Covering | O(log n) | O(log n) | Very High | Avoids table lookups | Increases storage, slows writes
Spatial | O(log n) | O(log n) | Moderate | Efficient for geospatial queries | Complex setup, limited scenarios
Unique | O(log n) | O(log n) | Moderate | Ensures data uniqueness | Slightly reduces insert performance
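
The table notes that composite indexes require careful column ordering. The sketch below shows one way to observe this, using Python's built-in sqlite3 module as a stand-in engine. The people table, the idx_people_name index, and the `plan` helper are hypothetical, and other databases word their plans differently and may apply optimizations that SQLite does not use here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, last_name TEXT, "
    "first_name TEXT, email TEXT)"
)
conn.executemany(
    "INSERT INTO people (last_name, first_name, email) VALUES (?, ?, ?)",
    [(f"last{i}", f"first{i}", f"user{i}@example.com") for i in range(500)],
)
# Composite index: last_name is the leading column.
conn.execute("CREATE INDEX idx_people_name ON people (last_name, first_name)")


def plan(sql, params=()):
    """Return the human-readable detail strings from EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


# Filtering on the leading column can use the index.
plan_leading = plan("SELECT * FROM people WHERE last_name = ?", ("last7",))
# Filtering only on the second column cannot seek into it.
plan_second = plan("SELECT * FROM people WHERE first_name = ?", ("first7",))

print(plan_leading)
print(plan_second)
```

The exact plan text depends on the SQLite version, but the first query should report a search using idx_people_name while the second falls back to scanning the table.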

Using EXPLAIN for Query Optimization

EXPLAIN is a tool provided by most relational databases that allows analysis of query execution plans. It shows how the database plans to execute the query, including index usage, the number of rows to be read, and the order of operations.

How to Use EXPLAIN

  1. Execute the query with the EXPLAIN prefix:
     EXPLAIN SELECT * FROM users WHERE id = 42;
  2. The result will show:
    • Execution type (e.g., index search or full table scan).
    • Index used (if any).
    • The number of rows to be processed.
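
For a hands-on feel, the sketch below runs the same kind of check with Python's built-in sqlite3 module. SQLite's counterpart is EXPLAIN QUERY PLAN, and its output format differs from PostgreSQL or MySQL. The users table here deliberately has no primary key so that the effect of adding an index is visible; the table contents, the idx_users_id name, and the `plan` helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No primary key on purpose, so the table starts with no index on id.
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(i, f"user{i}") for i in range(1, 1001)],
)


def plan(sql, params=()):
    """Return the human-readable detail strings from EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


query = "SELECT * FROM users WHERE id = ?"

plan_before = plan(query, (42,))  # expected to describe a full table scan
conn.execute("CREATE INDEX idx_users_id ON users (id)")
plan_after = plan(query, (42,))   # expected to describe an index search

print(plan_before)
print(plan_after)
```

Comparing the two plans before and after creating the index is the basic workflow: change one thing, rerun EXPLAIN, and confirm the planner actually picked it up.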

How EXPLAIN Helps Optimize Queries

  • Spotting Missing or Unused Indexes: If the query performs a full table scan instead of using an index, this may indicate the need to create or modify an index.
  • Reducing Processed Rows: The estimated row count shows how much data each step reads, which helps judge whether an additional index would pay off.
  • JOIN Optimization: The plan shows how and in what order tables are joined, revealing suboptimal join sequences.

Example:

EXPLAIN SELECT * FROM orders WHERE order_date BETWEEN '2024-12-01' AND '2024-12-31';

The result may show:

  • Use of an index for range queries.
  • Expected number of rows to be processed.

Index Selectivity

Definition: Index selectivity measures how effectively an index narrows down the rows a query must examine. It is calculated as the ratio of distinct values in the indexed column to the total number of rows in the table, so it ranges from near 0 (few distinct values) to 1 (every value unique).

High Selectivity:

  • Characterized by a large number of unique values in the column.
  • Example: Primary keys or unique identifiers.
  • Advantages: Such indexes significantly reduce the search space and improve performance.

Low Selectivity:

  • Characterized by a small number of unique values (e.g., Boolean fields).
  • Example: Columns with gender or status data.
  • Limitations: Such indexes are less effective for filtering large datasets.

Tip: Prioritize high-selectivity indexes for better performance.
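
The ratio in the definition can be computed directly with COUNT(DISTINCT ...) and COUNT(*). The sketch below does so with Python's built-in sqlite3 module; the users table, its contents, and the `selectivity` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO users (id, status) VALUES (?, ?)",
    [(i, "active" if i % 2 else "inactive") for i in range(1, 1001)],
)

ALLOWED_COLUMNS = {"id", "status"}


def selectivity(column):
    """Distinct values divided by total rows for one column of users."""
    # Column names cannot be bound as parameters, so check a whitelist
    # before interpolating the name into the SQL text.
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unknown column: {column}")
    sql = f"SELECT COUNT(DISTINCT {column}) * 1.0 / COUNT(*) FROM users"
    return conn.execute(sql).fetchone()[0]


print("id:", selectivity("id"))          # every value unique
print("status:", selectivity("status"))  # only two distinct values
```

In this sample the id column scores 1.0 while status scores 0.002, which matches the high and low selectivity cases described above.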

When Indexes Fail to Work

  1. When the query matches a large share of the table's rows, the planner may choose a full table scan even though an index exists, because reading the table sequentially is cheaper than many index lookups.
  2. Low index selectivity makes the index less effective.
  3. Functions applied to an indexed field in the query (e.g., WHERE LOWER(name) = 'john') can prevent index usage.
  4. Conditions that do not match the column order of a composite index, for example filtering only on its second column, usually cannot use that index efficiently.

Example: SELECT * FROM users WHERE LOWER(name) = 'john';

  • Even with an index on the name field, a plain index generally cannot be used here, because it stores name values rather than LOWER(name) values.
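
One common workaround, where the engine supports it, is an index on the expression itself. The sketch below demonstrates the idea with Python's built-in sqlite3 module, which supports indexes on expressions in reasonably recent SQLite versions. The table contents, index names, and the `plan` helper are hypothetical, and the syntax for expression indexes differs between databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (name) VALUES (?)",
    [(f"User{i}",) for i in range(500)] + [("John",)],
)
# A plain index on the raw column values.
conn.execute("CREATE INDEX idx_users_name ON users (name)")


def plan(sql, params=()):
    """Return the human-readable detail strings from EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


query = "SELECT * FROM users WHERE LOWER(name) = ?"

plan_plain = plan(query, ("john",))  # the plain index does not help

# An index on the expression stores LOWER(name) values directly.
conn.execute("CREATE INDEX idx_users_lower_name ON users (LOWER(name))")
plan_expr = plan(query, ("john",))
matches = conn.execute(query, ("john",)).fetchall()

print(plan_plain)
print(plan_expr)
print(matches)
```

The expression in the query has to match the indexed expression for the planner to use it, so it helps to keep the two written the same way.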

Examples of Index Usage

Simple Example: Searching by ID

Suppose we have a users table with a million rows, and we want to find the user with id = 42:

  1. Without Index:
    • The database scans every table row until the desired one is found.
    • In the worst case, a million operations are needed.
  2. With Index:
    • The database descends the B-Tree from the root to the leaf that holds id = 42.
    • The search takes only a few steps, because a B-Tree over a million keys is typically only a few levels deep.
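
To see why only a few steps are needed, the sketch below estimates how many tree levels are required to address a given number of keys. It is a back-of-the-envelope model, assuming every node has the same number of children (the fanout); the `estimate_levels` helper and the fanout values are illustrative, and real page sizes and fill factors vary.

```python
def estimate_levels(num_keys, fanout):
    """Smallest number of levels such that fanout ** levels >= num_keys."""
    levels = 0
    capacity = 1
    while capacity < num_keys:
        capacity *= fanout  # each extra level multiplies reachable keys
        levels += 1
    return levels


for fanout in (2, 100, 500):
    print(f"fanout={fanout}: {estimate_levels(1_000_000, fanout)} levels")
```

With a fanout of 100, three levels are enough to reach a million keys, compared with twenty levels for a binary tree. This is why wide nodes matter so much when each level costs a page read.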

More Complex Example: Range Query

Suppose we have an orders table with a column order_date, and we want to find all orders made in December 2024:

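SELECT * FROM orders WHERE order_date BETWEEN '2024-12-01' AND '2024-12-31';

  1. Without Index:
    • The database performs a full table scan, checking each row.
  2. With Index:
    • With an index on the order_date column, the database seeks to the first entry on or after 2024-12-01 and reads index entries in order until it passes 2024-12-31, touching only the matching rows.
    • If order_date stores a time as well as a date, the upper bound '2024-12-31' is usually interpreted as midnight at the start of that day. A half-open range (order_date >= '2024-12-01' AND order_date < '2025-01-01') avoids missing orders placed later on December 31.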
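
The range query above can be tried end to end with Python's built-in sqlite3 module. This is a sketch under assumptions: dates are stored as ISO-8601 text (which sorts correctly as strings), one hypothetical order is generated per day of 2024, and the idx_orders_order_date name and the `plan` helper are made up for the example.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT, total REAL)"
)
# One hypothetical order per day of 2024 (a leap year, so 366 days).
first_day = date(2024, 1, 1)
conn.executemany(
    "INSERT INTO orders (order_date, total) VALUES (?, ?)",
    [((first_day + timedelta(days=d)).isoformat(), 10.0 + d) for d in range(366)],
)


def plan(sql):
    """Return the human-readable detail strings from EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = (
    "SELECT * FROM orders "
    "WHERE order_date BETWEEN '2024-12-01' AND '2024-12-31'"
)

plan_before = plan(query)
rows_before = conn.execute(query).fetchall()

conn.execute("CREATE INDEX idx_orders_order_date ON orders (order_date)")

plan_after = plan(query)
rows_after = conn.execute(query).fetchall()

print(plan_before)
print(plan_after)
print(len(rows_after), "orders in December 2024")
```

The index changes how the rows are found, not which rows are returned, so both runs yield the same 31 December orders.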

What to read next?

  • Database Systems: The Complete Book by Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom
  • SQL Performance Explained by Markus Winand
  • High Performance MySQL by Silvia Botros, Jeremy Tinley, and Baron Schwartz
  • PostgreSQL 14 Internals by Egor Rogov
  • Designing Data-Intensive Applications by Martin Kleppmann

Conclusion

Indexes are a powerful tool for optimizing database performance. Understanding how they work and applying them effectively helps significantly speed up operations on large datasets. However, it is important to consider their limitations and use indexes wisely to avoid unnecessary resource overhead.

This article does not cover advanced index types such as GiST, SP-GiST, GIN, and BRIN, which are useful for specialized use cases like full-text search, spatial data, and large sequential datasets. You can learn more about these indexes in the PostgreSQL Index Documentation.

Hi, I’m Ilya Isaev
