Database Indexing Strategies That Actually Work

Indexes are the most impactful performance optimization in any database. A single well-chosen index can turn a 10-second query into a 10-millisecond query. But indexes consume disk space and slow down writes.

1. B-Tree vs Hash Indexes

Property B-Tree Hash
Lookup type Equality + Range Equality only
ORDER BY support Yes (in-order traversal) No
LIKE 'prefix%' Yes No
-- B-tree (default)
CREATE INDEX idx_email ON users(email);

-- Hash (MySQL/MariaDB, Memory engine only)
CREATE INDEX idx_email ON users(email) USING HASH;

-- PostgreSQL hash (WAL-logged since PG 10)
CREATE INDEX idx_email ON users(email) USING HASH;

Rule of thumb: Use B-tree. Hash only helps for pure equality lookups. B-tree is right for 95% of indexes.

2. Composite Indexes and Column Order

CREATE INDEX idx_user_status_created ON users(status, created_at);

Column order rules:

  1. Equality columns first (WHERE col = ?)
  2. Range columns second (WHERE col > ?, BETWEEN)
  3. Sort columns last (ORDER BY)
-- Query:
SELECT * FROM orders
WHERE customer_id = 123 AND status = 'shipped'
ORDER BY created_at DESC;

-- Optimal index:
CREATE INDEX idx_customer_status_created
  ON orders(customer_id, status, created_at);

The "skip scan" limitation: An index on (a, b, c) supports queries on a, a,b, and a,b,c. It does NOT support queries on b only, or a,c (with b skipped) efficiently.

3. Covering Indexes

A covering index contains all columns needed by a query, so the DB never accesses the table:

-- Without covering index: index + table access
EXPLAIN SELECT id, email, name FROM users WHERE email = '[email protected]';
-- Extra: NULL (needs to read table row)

-- With covering index:
CREATE INDEX idx_email_covering ON users(email, name);
EXPLAIN SELECT id, email, name FROM users WHERE email = '[email protected]';
-- Extra: Using index (no table access needed)

4. Cardinality and Selectivity

-- Check cardinality
SHOW INDEX FROM orders;
Column Cardinality Selectivity Indexing Value
id (PK) 1M 100% Excellent
email 1M 100% Excellent
status 5 0.0005% Poor alone, useful in composite
is_active 2 0.0001% Very poor alone

The cardinality trap: An index on status (5 values) matching 20% of rows — full table scan is cheaper. But (status, created_at) is excellent.

5. Partial and Functional Indexes

-- Partial index (PostgreSQL, SQL Server, SQLite)
CREATE INDEX idx_active_users ON users(email) WHERE status = 'active';

-- Functional index (PostgreSQL, MySQL 8.0+, MariaDB 10.2+)
CREATE INDEX idx_lower_email ON users(LOWER(email));

6. Index Maintenance

# PostgreSQL: rebuild indexes
REINDEX INDEX idx_name;

# MySQL/MariaDB: rebuild
ALTER TABLE my_table ENGINE=InnoDB;
# Or: OPTIMIZE TABLE my_table;

When to rebuild: After bulk DELETE (>30% rows), when index size grows disproportionately, during low-traffic windows.

7. The Index Decision Framework

For every slow query:

  1. Is the WHERE clause using an index? (EXPLAIN: key column)
  2. Is ORDER BY using an index? (check for "Using filesort")
  3. Is the query covered? (check for "Using index")
  4. Are JOIN columns indexed? (every FK needs an index)
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 0.1;
EXPLAIN FORMAT=JSON SELECT ... G

Summary

Strategy When Example
B-tree Default — equality + range CREATE INDEX ... (default)
Composite Multi-column WHERE + ORDER BY (eq_cols, range_col, sort_col)
Covering High-frequency queries Include all SELECT columns
Partial Filtered subset WHERE status = 'active'
Functional Expression-based queries LOWER(email)

The most important skill is reading EXPLAIN output. Every DDL change should start with "what does the query plan look like?" and end with "did it improve?"