Indexes are the most impactful performance optimization in any database. A single well-chosen index can turn a 10-second query into a 10-millisecond query. But indexes consume disk space and slow down writes.
1. B-Tree vs Hash Indexes
| Property | B-Tree | Hash |
|---|---|---|
| Lookup type | Equality + Range | Equality only |
| ORDER BY support | Yes (in-order traversal) | No |
| LIKE 'prefix%' | Yes | No |
-- B-tree (default)
CREATE INDEX idx_email ON users(email);
-- Hash (MySQL/MariaDB, Memory engine only)
CREATE INDEX idx_email ON users(email) USING HASH;
-- PostgreSQL hash (WAL-logged since PG 10)
CREATE INDEX idx_email ON users(email) USING HASH;
Rule of thumb: Use B-tree. Hash only helps for pure equality lookups. B-tree is right for 95% of indexes.
2. Composite Indexes and Column Order
CREATE INDEX idx_user_status_created ON users(status, created_at);
Column order rules:
- Equality columns first (
WHERE col = ?) - Range columns second (
WHERE col > ?,BETWEEN) - Sort columns last (
ORDER BY)
-- Query:
SELECT * FROM orders
WHERE customer_id = 123 AND status = 'shipped'
ORDER BY created_at DESC;
-- Optimal index:
CREATE INDEX idx_customer_status_created
ON orders(customer_id, status, created_at);
The "skip scan" limitation: An index on (a, b, c) supports queries on a, a,b, and a,b,c. It does NOT support queries on b only, or a,c (with b skipped) efficiently.
3. Covering Indexes
A covering index contains all columns needed by a query, so the DB never accesses the table:
-- Without covering index: index + table access
EXPLAIN SELECT id, email, name FROM users WHERE email = '[email protected]';
-- Extra: NULL (needs to read table row)
-- With covering index:
CREATE INDEX idx_email_covering ON users(email, name);
EXPLAIN SELECT id, email, name FROM users WHERE email = '[email protected]';
-- Extra: Using index (no table access needed)
4. Cardinality and Selectivity
-- Check cardinality
SHOW INDEX FROM orders;
| Column | Cardinality | Selectivity | Indexing Value |
|---|---|---|---|
| id (PK) | 1M | 100% | Excellent |
| 1M | 100% | Excellent | |
| status | 5 | 0.0005% | Poor alone, useful in composite |
| is_active | 2 | 0.0001% | Very poor alone |
The cardinality trap: An index on status (5 values) matching 20% of rows — full table scan is cheaper. But (status, created_at) is excellent.
5. Partial and Functional Indexes
-- Partial index (PostgreSQL, SQL Server, SQLite)
CREATE INDEX idx_active_users ON users(email) WHERE status = 'active';
-- Functional index (PostgreSQL, MySQL 8.0+, MariaDB 10.2+)
CREATE INDEX idx_lower_email ON users(LOWER(email));
6. Index Maintenance
# PostgreSQL: rebuild indexes
REINDEX INDEX idx_name;
# MySQL/MariaDB: rebuild
ALTER TABLE my_table ENGINE=InnoDB;
# Or: OPTIMIZE TABLE my_table;
When to rebuild: After bulk DELETE (>30% rows), when index size grows disproportionately, during low-traffic windows.
7. The Index Decision Framework
For every slow query:
- Is the WHERE clause using an index? (EXPLAIN:
keycolumn) - Is ORDER BY using an index? (check for "Using filesort")
- Is the query covered? (check for "Using index")
- Are JOIN columns indexed? (every FK needs an index)
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 0.1;
EXPLAIN FORMAT=JSON SELECT ... G
Summary
| Strategy | When | Example |
|---|---|---|
| B-tree | Default — equality + range | CREATE INDEX ... (default) |
| Composite | Multi-column WHERE + ORDER BY | (eq_cols, range_col, sort_col) |
| Covering | High-frequency queries | Include all SELECT columns |
| Partial | Filtered subset | WHERE status = 'active' |
| Functional | Expression-based queries | LOWER(email) |
The most important skill is reading EXPLAIN output. Every DDL change should start with "what does the query plan look like?" and end with "did it improve?"