PostgreSQL Performance Tuning

PostgreSQL is incredibly powerful, but default settings aren't optimized for production workloads. This guide covers the most impactful tuning options and techniques to get the best performance from your database.

📊 Always benchmark before and after changes. Performance tuning is highly workload-dependent - what works for OLTP might hurt OLAP.

Configuration Tuning

These postgresql.conf settings have the biggest impact on performance:

ini
# Memory Settings
# ===============

# Shared buffers: PostgreSQL's main memory cache
# Start with 25% of available RAM, max 8GB on most systems
shared_buffers = 4GB

# Effective cache size: Estimate of memory available for disk caching
# Set to 75% of total system memory
effective_cache_size = 12GB

# Work memory: Memory for sorts and hash operations per operation
# Be careful - this is per operation, not total
work_mem = 256MB

# Maintenance work memory: Memory for VACUUM, CREATE INDEX, etc.
maintenance_work_mem = 1GB

# Write-Ahead Log (WAL) Settings
# ==============================

# WAL buffers: Memory for WAL data before writing to disk
wal_buffers = 64MB

# Checkpoint settings: Balance between recovery time and I/O load
checkpoint_completion_target = 0.9
checkpoint_timeout = 15min
max_wal_size = 4GB

# Query Planner Settings
# ======================

# For SSDs, random reads are nearly as fast as sequential
random_page_cost = 1.1

# Number of concurrent I/O operations the disk can handle
effective_io_concurrency = 200

# Statistics target: Higher = better plans, slower ANALYZE
default_statistics_target = 200

What do these settings do?
shared_buffers - The most important setting. PostgreSQL uses this cache for all operations. Too low = excessive disk I/O. Too high = OS cache starvation.

work_mem - Dangerous if set too high because it's per-operation. A query with 10 sorts could use 10x this value. Monitor with EXPLAIN (ANALYZE, BUFFERS).

effective_cache_size - Doesn't allocate memory, just tells the planner how much caching to expect. Affects index vs. sequential scan decisions.

random_page_cost - Default of 4.0 assumes spinning disks. On SSDs, set to 1.1-1.5 to encourage index usage.
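
These settings can also be applied from a SQL session with ALTER SYSTEM instead of editing postgresql.conf by hand, and work_mem can be raised for a single transaction when one query genuinely needs a large sort area. A minimal sketch (the values are the examples from above; shared_buffers and wal_buffers still require a restart to take effect):

sql
-- Written to postgresql.auto.conf, which is read on top of postgresql.conf
ALTER SYSTEM SET random_page_cost = 1.1;
ALTER SYSTEM SET effective_cache_size = '12GB';
ALTER SYSTEM SET work_mem = '256MB';

-- Reload settings that don't require a restart
SELECT pg_reload_conf();

-- Raise work_mem for one transaction instead of globally
BEGIN;
SET LOCAL work_mem = '512MB';
-- ... run the single large reporting query here ...
COMMIT;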

Query Analysis with EXPLAIN

EXPLAIN ANALYZE is your best friend for understanding query performance:

sql
-- Basic explain
EXPLAIN SELECT * FROM posts WHERE author_id = 123;

-- With actual execution stats (runs the query!)
EXPLAIN ANALYZE SELECT * FROM posts WHERE author_id = 123;

-- With buffer usage information
EXPLAIN (ANALYZE, BUFFERS)
SELECT p.*, a.name as author_name
FROM posts p
JOIN authors a ON p.author_id = a.id
WHERE p.status = 'PUBLISHED'
ORDER BY p.published_at DESC
LIMIT 10;

-- Output as JSON for programmatic analysis
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT * FROM posts WHERE title ILIKE '%python%';
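
Because ANALYZE actually executes the statement, data-modifying queries are safest inside a transaction that you roll back. A small sketch (the UPDATE and its filter are only illustrative):

sql
-- Real execution stats for an UPDATE, without keeping its effects
BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
UPDATE posts SET status = 'ARCHIVED' WHERE published_at < '2020-01-01';
ROLLBACK;
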
sql
-- Example output interpretation
/*
Limit  (cost=0.42..12.56 rows=10) (actual time=0.152..0.398 rows=10)
  ->  Index Scan using idx_posts_published on posts
        (cost=0.42..1842.66 rows=15234) (actual time=0.150..0.394 rows=10)
        Filter: (status = 'PUBLISHED')
        Rows Removed by Filter: 2
        Buffers: shared hit=15

Planning Time: 0.234 ms
Execution Time: 0.425 ms
*/

-- Key things to look for:
-- 1. Estimated "rows" vs actual "rows" - large differences indicate estimation problems
-- 2. "Rows Removed by Filter" - high numbers suggest missing indexes
-- 3. "Buffers: shared hit" vs "read" - reads mean cache misses
-- 4. Sequential Scan on large tables - usually means missing index
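
When estimated and actual row counts diverge badly, refreshing or increasing statistics is usually the first fix; a short sketch (the column choice is illustrative):

sql
-- Refresh planner statistics for one table
ANALYZE posts;

-- Collect more detailed statistics for a column the planner keeps misestimating
ALTER TABLE posts ALTER COLUMN status SET STATISTICS 500;
ANALYZE posts;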

Index Strategies

Good indexes are the single biggest performance win. Here are patterns that work:

sql
-- Composite index for common queries
-- Column order matters: equality-filtered columns first, then the range/sort column
CREATE INDEX idx_posts_status_published
ON posts (status, published_at DESC);

-- Partial index: only index rows you'll actually query
-- Smaller = faster to scan and update
-- Note: index predicates must be immutable, so NOW() can't be used here;
-- use a fixed cutoff date and rebuild the index periodically if needed
CREATE INDEX idx_posts_published_recent
ON posts (published_at DESC)
WHERE status = 'PUBLISHED' AND published_at > DATE '2024-01-01';

-- Covering index: includes all columns needed by query
-- Enables index-only scans (no table access needed)
CREATE INDEX idx_posts_list
ON posts (status, published_at DESC)
INCLUDE (title, excerpt, author_id);

-- Expression index: for computed values
CREATE INDEX idx_users_email_lower
ON users (LOWER(email));
-- Now this query uses the index:
-- SELECT * FROM users WHERE LOWER(email) = 'user@example.com'

-- GIN index for full-text search
CREATE INDEX idx_posts_search
ON posts USING GIN (to_tsvector('english', title || ' ' || content));

-- GIN index for JSONB queries
CREATE INDEX idx_posts_metadata
ON posts USING GIN (metadata);
-- Supports: metadata @> '{"featured": true}'

⚠️ Every index slows down writes. Only create indexes that serve actual queries. Use pg_stat_user_indexes to find unused indexes.
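
On a busy table, a plain CREATE INDEX blocks writes for the duration of the build; the CONCURRENTLY option avoids that at the cost of a slower build. A short sketch (the index name and column are illustrative):

sql
-- Build without blocking concurrent writes (cannot run inside a transaction block)
CREATE INDEX CONCURRENTLY idx_posts_author
ON posts (author_id);

-- Remove an index that pg_stat_user_indexes shows is never used
DROP INDEX CONCURRENTLY idx_posts_author;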

Connection Pooling

PostgreSQL connections are expensive. Use PgBouncer for connection pooling:

ini
# pgbouncer.ini
[databases]
myapp = host=localhost port=5432 dbname=myapp

[pgbouncer]
listen_port = 6432
listen_addr = *
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

# Pool mode: transaction is best for most web apps
pool_mode = transaction

# Pool size per user/database pair
default_pool_size = 20
max_client_conn = 1000
max_db_connections = 50
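
Point the application at port 6432 instead of 5432 and PgBouncer handles the rest. Its admin console (the virtual pgbouncer database) is useful for confirming the pool is actually being used; a small sketch, assuming your user is listed in admin_users or stats_users:

sql
-- Connect first with e.g.: psql -h localhost -p 6432 -U myuser pgbouncer
SHOW POOLS;   -- clients waiting vs. server connections per pool
SHOW STATS;   -- query counts, bytes, and average times per database

One caveat with transaction pooling: session state (SET commands, temporary tables, advisory locks) does not carry across transactions, so features that depend on it need adjustment.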

Monitoring Queries

sql
-- Enable query statistics: pg_stat_statements must also be listed in
-- shared_preload_libraries in postgresql.conf (requires a restart)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Find slowest queries by total time
SELECT
    round(total_exec_time::numeric, 2) as total_time_ms,
    calls,
    round(mean_exec_time::numeric, 2) as avg_time_ms,
    round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) as pct,
    left(query, 80) as query_preview
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

-- Find unused indexes
SELECT
    schemaname || '.' || relname as table,
    indexrelname as index,
    pg_size_pretty(pg_relation_size(i.indexrelid)) as size,
    idx_scan as index_scans
FROM pg_stat_user_indexes ui
JOIN pg_index i ON ui.indexrelid = i.indexrelid
WHERE idx_scan < 50
  AND NOT i.indisunique  -- keep indexes that back unique/primary key constraints
  AND pg_relation_size(i.indexrelid) > 1024 * 1024  -- > 1MB
ORDER BY pg_relation_size(i.indexrelid) DESC;

-- Table bloat (dead tuples)
SELECT
    schemaname || '.' || relname as table,
    n_live_tup as live_rows,
    n_dead_tup as dead_rows,
    round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1) as dead_pct,
    last_vacuum,
    last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;
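
A quick sanity check on shared_buffers sizing is the buffer cache hit ratio; for typical OLTP workloads it should sit above roughly 99%. A small sketch using pg_stat_database:

sql
-- Buffer cache hit ratio for the current database
SELECT
    datname,
    round(100.0 * blks_hit / NULLIF(blks_hit + blks_read, 0), 2) as cache_hit_pct
FROM pg_stat_database
WHERE datname = current_database();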

Quick Wins

  1. Set random_page_cost = 1.1 for SSD storage
  2. Increase shared_buffers to 25% of RAM
  3. Use connection pooling (PgBouncer) for web apps
  4. Add indexes for columns in WHERE and JOIN clauses
  5. Use partial indexes for frequently filtered subsets
  6. Run VACUUM ANALYZE after bulk data changes (see the sketch below)
  7. Monitor with pg_stat_statements
  8. Use covering indexes to enable index-only scans
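
For item 6, a minimal sketch (the table name is illustrative):

sql
-- Reclaim dead tuples and refresh statistics after a bulk load or mass delete
VACUUM (ANALYZE, VERBOSE) posts;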
