Database Partitioning: Splitting Large Tables for Performance

Database partitioning divides a large table into smaller, more manageable pieces called partitions. It improves query performance by allowing the database to scan only relevant partitions and simplifies data archival and deletion.

Query Execution Plan: Understanding How Databases Run Queries

A query execution plan is the database's roadmap for executing a SQL query. It shows every step the database takes, in order, to retrieve the requested data. Understanding execution plans is the single most important skill for SQL performance optimization. Without reading execution plans, you are guessing at why queries are slow. With them, you see exactly what the database is doing and where the bottlenecks are.

Execution plans reveal whether indexes are being used, whether joins are efficient, and where the database is spending most of its time. To understand execution plans properly, it is helpful to be familiar with SQL basics, database indexing, and SQL query optimization.

Query execution plan overview:
SQL Query
    │
    ▼
Parser (checks syntax)
    │
    ▼
Optimizer (chooses cheapest plan)
    │
    ▼
Execution Plan (step-by-step instructions)
    │
    ▼
Executor (runs the plan, returns results)

Common operations in execution plans:
Seq Scan / Table Scan  → Full table read (slow on large tables)
Index Scan             → Using an index (fast)
Index Only Scan        → All data from index (fastest)
Nested Loop Join       → Join by iterating (good for small tables)
Hash Join              → Build hash table (good for large tables)
Merge Join             → Join sorted data (good for pre-sorted data)

What Is a Query Execution Plan

A query execution plan is a sequence of operations that the database engine uses to execute a SQL query. The query optimizer analyzes the query, considers available indexes and table statistics, and chooses what it estimates to be the cheapest execution plan. The execution plan shows this chosen path, including which indexes are used, how tables are joined, and what order operations are performed.

  • Node/Operation: Each step in the plan (e.g., Seq Scan, Index Scan, Hash Join).
  • Cost: The optimizer's estimate of how expensive an operation is.
  • Rows: The estimated number of rows produced by each operation.
  • Width: The estimated average row size in bytes.
  • Actual Time: The real execution time (with ANALYZE).
  • Loops: How many times the operation was executed.

Why Execution Plans Matter

Execution plans are the only reliable way to understand why a query is slow. They reveal exactly what the database is doing, eliminating guesswork.

  • Identify missing indexes: Table scans indicate missing indexes on filtered columns.
  • Find inefficient joins: See which join algorithm is used and whether it is appropriate.
  • Detect incorrect index usage: Sometimes the optimizer chooses the wrong index; the plan shows this.
  • Locate sorting operations: Expensive sorts often indicate missing indexes on ORDER BY columns.
  • Compare plan changes: See how adding an index or rewriting a query changes the execution plan.
  • Validate optimization efforts: Confirm that your changes actually improved the plan.

How to Generate Execution Plans

PostgreSQL:
-- Estimated plan (no execution)
EXPLAIN SELECT * FROM users WHERE email = 'john@example.com';

-- Actual plan with timing (executes the query)
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'john@example.com';

-- Verbose output (more detail)
EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT * FROM users WHERE email = 'john@example.com';
MySQL:
-- Estimated plan
EXPLAIN SELECT * FROM users WHERE email = 'john@example.com';

-- Actual plan (MySQL 8.0.18+)
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'john@example.com';

-- JSON format (more detail)
EXPLAIN FORMAT=JSON SELECT * FROM users WHERE email = 'john@example.com';
SQL Server:
-- Estimated plan
SET SHOWPLAN_XML ON;
GO
SELECT * FROM users WHERE email = 'john@example.com';
GO
SET SHOWPLAN_XML OFF;

-- Actual plan
SET STATISTICS PROFILE ON;
SELECT * FROM users WHERE email = 'john@example.com';
SET STATISTICS PROFILE OFF;

Reading Execution Plan Operations

Scan Operations

Operation Description Performance What It Means
Seq Scan / Table Scan Scans every row in the table Slow on large tables (O(n)) Missing index on WHERE/JOIN columns
Index Scan Scans entire index in order Moderate (better than full scan) No filter, or non-selective filter
Index Seek Navigates index tree to find specific rows Fast (O(log n)) Good index usage
Index Only Scan All data comes from index Fastest possible Covering index, excellent
Example: Seq Scan (bad)
EXPLAIN SELECT * FROM users WHERE email = 'john@example.com';

Seq Scan on users  (cost=0.00..2500.00 rows=1 width=100)
  Filter: (email = 'john@example.com')

Problem: No index on email column → full table scan
Example: Index Scan (good)
CREATE INDEX idx_users_email ON users(email);

EXPLAIN SELECT * FROM users WHERE email = 'john@example.com';

Index Scan using idx_users_email on users  (cost=0.00..8.27 rows=1 width=100)
  Index Cond: (email = 'john@example.com')

Solution: Index used for exact lookup

Join Operations

Join Type How It Works Best For Cost
Nested Loop Join For each row in outer table, scan inner table Small outer table, indexed inner table O(n × m) without index, O(n × log m) with index
Hash Join Build hash table from smaller table, probe larger table Larger tables, no indexes O(n + m)
Merge Join Both tables sorted, merge like merging two sorted lists Pre-sorted data, large tables O(n + m)
Example: Nested Loop Join (good with index)
EXPLAIN SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id;

Nested Loop  (cost=0.00..1000.00 rows=1000 width=200)
  -> Seq Scan on customers c  (cost=0.00..10.00 rows=100 width=100)
  -> Index Scan using idx_orders_customer_id on orders o  (cost=0.00..9.00 rows=10 width=100)
        Index Cond: (customer_id = c.id)
Example: Hash Join (good for large tables)
EXPLAIN SELECT * FROM large_orders o JOIN large_customers c ON o.customer_id = c.id;

Hash Join  (cost=1000.00..50000.00 rows=1000000 width=200)
  Hash Cond: (o.customer_id = c.id)
  -> Seq Scan on large_orders o  (cost=0.00..40000.00 rows=1000000 width=100)
  -> Hash  (cost=1000.00..1000.00 rows=100000 width=100)
        -> Seq Scan on large_customers c  (cost=0.00..1000.00 rows=100000 width=100)

Understanding Cost Estimates

The optimizer calculates cost estimates for each possible execution plan and chooses the one with the lowest total cost. Costs are measured in arbitrary units, not seconds.

Cost components (PostgreSQL):
cost=0.28..8.29 rows=1 width=100
       │      │      │        │
       │      │      │        └── Row width in bytes
       │      │      └── Estimated rows returned
       │      └── Estimated total cost (after this node)
       └── Estimated startup cost (before first row)

Total cost = (Seq Scan cost) + (CPU cost) + (I/O cost)

Rules:
- Lower cost is better
- Cost difference of 10x or more is significant
- Compare costs between different query versions

Common Execution Plan Patterns

1. Missing Index (Seq Scan)

Seq Scan on orders  (cost=0.00..50000.00 rows=10000 width=100)
  Filter: (status = 'pending')

Solution: CREATE INDEX idx_orders_status ON orders(status);

2. Index Used Correctly (Index Seek)

Index Scan using idx_orders_status on orders  (cost=0.00..50.00 rows=1000 width=100)
  Index Cond: (status = 'pending')

3. Function on Indexed Column (Cannot Use Index)

Seq Scan on users  (cost=0.00..2500.00 rows=500 width=100)
  Filter: (lower(email) = 'john@example.com')

Problem: LOWER() prevents index usage
Solution: CREATE INDEX idx_users_email_lower ON users(lower(email));

4. Sort Operation (Missing Index on ORDER BY)

Sort  (cost=5000.00..5200.00 rows=10000 width=100)
  Sort Key: created_at
  -> Seq Scan on orders  (cost=0.00..4000.00 rows=10000 width=100)

Solution: CREATE INDEX idx_orders_created_at ON orders(created_at);

Analyzing Actual vs Estimated Rows

When using EXPLAIN ANALYZE, you can compare estimated rows (optimizer guess) against actual rows (what actually happened). Large discrepancies indicate outdated statistics.

EXPLAIN ANALYZE SELECT * FROM users WHERE status = 'active';

Seq Scan on users  (cost=0.00..2500.00 rows=10000 width=100)
                  (actual time=0.05..50.00 rows=50000 loops=1)

Estimated: 10,000 rows
Actual: 50,000 rows
Difference: 5x (statistics outdated)

Solution: ANALYZE users;  -- Update statistics

Common Performance Issues in Execution Plans

Issue What the Plan Shows Likely Cause Solution
Full Table Scan Seq Scan on large table No index on filtered column Create index on WHERE/JOIN columns
Slow Join Nested Loop on large tables without index Missing index on join column Index foreign key columns
Expensive Sort Sort node with high cost No index on ORDER BY column Index the ORDER BY column(s)
Wrong Index Choice Index used but not optimal Outdated statistics, low selectivity Update statistics, adjust index
Type Mismatch Index not used, Seq Scan with filter Column type vs query type mismatch Cast column or query to match types

Execution Plan Best Practices

  • Always use EXPLAIN on slow queries: Never guess why a query is slow. Look at the execution plan.
  • Use EXPLAIN ANALYZE for actual numbers: Estimated plans are useful, but actual timing reveals real performance.
  • Look for Seq Scans on large tables: These are the most common performance problem.
  • Check for type mismatches: Compare column types with query values.
  • Compare costs before and after changes: Verify that your optimization actually improved the plan.
  • Update statistics regularly: Outdated statistics cause bad plan choices.
  • Test with realistic data volume: A plan that works on 1000 rows may fail on 1 million.
Execution plan analysis checklist:
□ Any Seq Scan on tables > 10,000 rows?
□ Are indexes being used for WHERE conditions?
□ Are indexes being used for JOIN conditions?
□ Is the join type appropriate (Hash/Merge for large tables)?
□ Is there an expensive Sort operation?
□ Are estimated rows close to actual rows?
□ Any type mismatch warnings?
□ Is the query using covering indexes?

Visual Execution Plan Tools

  • pgAdmin (PostgreSQL): Graphical execution plan visualizer with color coding.
  • MySQL Workbench: Visual explain output with execution plan diagrams.
  • SQL Server Management Studio (SSMS): Graphical execution plan with operator details.
  • Explain.depesz.com: Online PostgreSQL plan visualizer and analyzer.
  • pev2 (PostgreSQL): Open-source execution plan visualizer.

Frequently Asked Questions

  1. What is the difference between EXPLAIN and EXPLAIN ANALYZE?
    EXPLAIN shows the estimated plan without executing the query. EXPLAIN ANALYZE executes the query and shows actual execution times, row counts, and I/O statistics. Use EXPLAIN for planning, EXPLAIN ANALYZE for debugging real performance issues.
  2. Why does the optimizer choose a different plan than I expect?
    The optimizer uses table statistics (row count, data distribution, index selectivity) to estimate costs. Outdated statistics, low selectivity, or correlated columns can cause suboptimal plans. Run ANALYZE to update statistics.
  3. What does "cost=0.00..2500.00" mean?
    The first number (0.00) is the startup cost before returning the first row. The second number (2500.00) is the total cost after returning all rows. Lower is better.
  4. Why does my query use a Seq Scan even with an index?
    The optimizer may choose a Seq Scan if the index is not selective enough (e.g., 50% of rows match the condition). The cost of reading the index plus the table may be higher than a simple Seq Scan.
  5. How often should I update statistics?
    After significant data changes (10%+ of rows inserted/updated/deleted). Most databases have auto-vacuum/auto-analyze that handles this automatically.
  6. What should I learn next after execution plans?
    After mastering execution plans, explore SQL optimization techniques, advanced indexing, database performance tuning, and database monitoring for complete performance mastery.