Performance Tuning

This guide covers optimizing BWS for maximum performance, throughput, and efficiency in production environments.

Performance Overview

BWS is built on Pingora, providing excellent performance characteristics:

High-performance async I/O
Minimal memory footprint
Efficient connection handling
Built-in caching capabilities
Horizontal scaling support

System-Level Optimization

Operating System Tuning

Network Stack Optimization

# /etc/sysctl.d/99-bws-performance.conf

# TCP/IP stack tuning
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 5000
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_max_tw_buckets = 1440000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_intvl = 15

# Buffer sizes
net.core.rmem_default = 262144
net.core.rmem_max = 16777216
net.core.wmem_default = 262144
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

# Connection tracking
net.netfilter.nf_conntrack_max = 1048576
net.netfilter.nf_conntrack_tcp_timeout_established = 300

# Apply settings
sysctl -p /etc/sysctl.d/99-bws-performance.conf

File Descriptor Limits

# /etc/security/limits.d/bws-performance.conf
bws soft nofile 1048576
bws hard nofile 1048576
bws soft nproc 32768
bws hard nproc 32768

# For systemd services
# /etc/systemd/system/bws.service.d/limits.conf
[Service]
LimitNOFILE=1048576
LimitNPROC=32768

CPU and Memory Optimization

# CPU governor for performance
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

# Disable swap for consistent performance
swapoff -a
# Comment out swap in /etc/fstab

# NUMA optimization (for multi-socket systems)
echo 0 > /proc/sys/kernel/numa_balancing

Storage Optimization

File System Tuning

# Mount options for static files (add to /etc/fstab)
/dev/sdb1 /opt/bws/static ext4 defaults,noatime,nodiratime,data=writeback 0 0

# For high-performance scenarios
/dev/nvme0n1 /opt/bws/cache xfs defaults,noatime,largeio,inode64 0 0

# Temporary files in memory
tmpfs /tmp tmpfs defaults,size=2G,mode=1777 0 0
tmpfs /var/tmp tmpfs defaults,size=1G,mode=1777 0 0

I/O Scheduler Optimization

# For SSDs
echo noop > /sys/block/sda/queue/scheduler

# For HDDs
echo deadline > /sys/block/sda/queue/scheduler

# Make persistent (add to /etc/rc.local or systemd service)
echo 'noop' > /sys/block/sda/queue/scheduler

BWS Configuration Optimization

Performance-Focused Configuration

# /etc/bws/performance.toml
[performance]
# Worker thread configuration
worker_threads = 16  # 2x CPU cores for I/O bound workloads
max_blocking_threads = 512

# Connection settings
max_connections = 100000
keep_alive_timeout = 60
request_timeout = 30
response_timeout = 30

# Buffer sizes
read_buffer_size = "64KB"
write_buffer_size = "64KB"
max_request_size = "10MB"

# Connection pooling
connection_pool_size = 1000
connection_pool_idle_timeout = 300

[caching]
enabled = true
max_memory = "1GB"
ttl_default = 3600
ttl_static = 86400

[compression]
enabled = true
level = 6  # Balance between CPU and compression ratio
min_size = 1024  # Don't compress small files

[[sites]]
name = "high-performance"
hostname = "0.0.0.0"
port = 8080
static_dir = "/opt/bws/static"

# Optimized headers for caching
[sites.headers]
"Cache-Control" = "public, max-age=31536000, immutable"
"Vary" = "Accept-Encoding"
"X-Content-Type-Options" = "nosniff"

Memory Management

[memory]
# Garbage collection tuning
gc_threshold = 0.8  # Trigger GC at 80% memory usage
max_heap_size = "4GB"

# Buffer pool settings
buffer_pool_size = "512MB"
buffer_pool_max_buffers = 10000

# Static file caching
file_cache_size = "2GB"
file_cache_max_files = 100000

CPU Optimization

[cpu]
# Thread affinity (if supported)
pin_threads = true
cpu_affinity = [0, 1, 2, 3, 4, 5, 6, 7]  # Pin to specific cores

# Async runtime tuning
io_uring = true  # Use io_uring if available (Linux 5.1+)
event_loop_threads = 4

Load Testing and Benchmarking

Benchmarking Tools

wrk - HTTP Benchmarking

# Install wrk
sudo apt install wrk

# Basic load test
wrk -t12 -c400 -d30s --latency http://localhost:8080/

# Custom script for complex scenarios
cat > test-script.lua << 'EOF'
wrk.method = "GET"
wrk.headers["User-Agent"] = "wrk-benchmark"

request = function()
    local path = "/static/file" .. math.random(1, 1000) .. ".jpg"
    return wrk.format(nil, path)
end
EOF

wrk -t12 -c400 -d30s -s test-script.lua http://localhost:8080/

Apache Bench (ab)

# Simple test
ab -n 10000 -c 100 http://localhost:8080/

# With keep-alive
ab -n 10000 -c 100 -k http://localhost:8080/

# POST requests
ab -n 1000 -c 10 -p post_data.json -T application/json http://localhost:8080/api/test

hey - Modern Load Testing

# Install hey
go install github.com/rakyll/hey@latest

# Basic test
hey -n 10000 -c 100 http://localhost:8080/

# With custom headers
hey -n 10000 -c 100 -H "Accept: application/json" http://localhost:8080/api/health

# Rate limited test
hey -n 10000 -q 100 http://localhost:8080/

Performance Testing Strategy

Baseline Testing

#!/bin/bash
# performance-baseline.sh

SERVER_URL="http://localhost:8080"
RESULTS_DIR="/tmp/performance-results"
DATE=$(date +%Y%m%d_%H%M%S)

mkdir -p "$RESULTS_DIR"

echo "Running BWS performance baseline tests - $DATE"

# Test 1: Static file serving
echo "Testing static file serving..."
wrk -t4 -c50 -d60s --latency "$SERVER_URL/static/test.html" > "$RESULTS_DIR/static-$DATE.log"

# Test 2: API endpoints
echo "Testing API endpoints..."
wrk -t4 -c50 -d60s --latency "$SERVER_URL/health" > "$RESULTS_DIR/api-$DATE.log"

# Test 3: High concurrency
echo "Testing high concurrency..."
wrk -t12 -c1000 -d60s --latency "$SERVER_URL/" > "$RESULTS_DIR/concurrency-$DATE.log"

# Test 4: Sustained load
echo "Testing sustained load..."
wrk -t8 -c200 -d300s --latency "$SERVER_URL/" > "$RESULTS_DIR/sustained-$DATE.log"

echo "Performance tests completed. Results in $RESULTS_DIR"

Stress Testing

#!/bin/bash
# stress-test.sh

# Gradually increase load
for connections in 100 500 1000 2000 5000; do
    echo "Testing with $connections connections..."
    wrk -t12 -c$connections -d30s --latency http://localhost:8080/ > "stress-$connections.log"
    
    # Check if server is still responsive
    if ! curl -f -s http://localhost:8080/health > /dev/null; then
        echo "Server failed at $connections connections"
        break
    fi
    
    # Cool down period
    sleep 10
done

Performance Monitoring

Real-time Monitoring Script

#!/bin/bash
# monitor-performance.sh

LOG_FILE="/var/log/bws/performance.log"
INTERVAL=5

log_metrics() {
    local timestamp=$(date '+%Y-%m-%d %H:%M:%S')
    local cpu_usage=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | cut -d'%' -f1)
    local memory_usage=$(free | grep Mem | awk '{printf "%.1f", $3/$2 * 100.0}')
    local connections=$(ss -tuln | grep :8080 | wc -l)
    local load_avg=$(uptime | awk -F'load average:' '{print $2}' | cut -d',' -f1 | xargs)
    
    echo "$timestamp,CPU:$cpu_usage%,Memory:$memory_usage%,Connections:$connections,Load:$load_avg" >> "$LOG_FILE"
}

echo "Starting performance monitoring (interval: ${INTERVAL}s)"
echo "Timestamp,CPU,Memory,Connections,LoadAvg" > "$LOG_FILE"

while true; do
    log_metrics
    sleep $INTERVAL
done

Performance Dashboard Script

#!/bin/bash
# performance-dashboard.sh

# Real-time performance dashboard
watch -n 1 '
echo "=== BWS Performance Dashboard ==="
echo "Time: $(date)"
echo ""
echo "=== System Resources ==="
echo "CPU Usage: $(top -bn1 | grep "Cpu(s)" | awk "{print \$2}")"
echo "Memory: $(free -h | grep Mem | awk "{printf \"Used: %s/%s (%.1f%%)\", \$3, \$2, \$3/\$2*100}")"
echo "Load Average: $(uptime | awk -F"load average:" "{print \$2}")"
echo ""
echo "=== Network ==="
echo "Active Connections: $(ss -tuln | grep :8080 | wc -l)"
echo "TCP Connections: $(ss -s | grep TCP | head -1)"
echo ""
echo "=== BWS Process ==="
BWS_PID=$(pgrep bws)
if [ ! -z "$BWS_PID" ]; then
    echo "Process ID: $BWS_PID"
    echo "Memory Usage: $(ps -o pid,ppid,cmd,%mem,%cpu -p $BWS_PID | tail -1)"
    echo "File Descriptors: $(lsof -p $BWS_PID | wc -l)"
else
    echo "BWS process not found"
fi
'

Optimization Strategies

Static File Optimization

File Compression

#!/bin/bash
# precompress-static-files.sh

STATIC_DIR="/opt/bws/static"

# Pre-compress static files
find "$STATIC_DIR" -type f \( -name "*.css" -o -name "*.js" -o -name "*.html" -o -name "*.svg" \) | while read file; do
    # Gzip compression
    if [ ! -f "$file.gz" ] || [ "$file" -nt "$file.gz" ]; then
        gzip -k -9 "$file"
        echo "Compressed: $file"
    fi
    
    # Brotli compression (if available)
    if command -v brotli > /dev/null; then
        if [ ! -f "$file.br" ] || [ "$file" -nt "$file.br" ]; then
            brotli -k -q 11 "$file"
            echo "Brotli compressed: $file"
        fi
    fi
done

# Optimize images
find "$STATIC_DIR" -name "*.jpg" -o -name "*.jpeg" | while read file; do
    if command -v jpegoptim > /dev/null; then
        jpegoptim --strip-all --max=85 "$file"
    fi
done

find "$STATIC_DIR" -name "*.png" | while read file; do
    if command -v optipng > /dev/null; then
        optipng -o2 "$file"
    fi
done

CDN Integration

# Configure BWS for CDN usage
[[sites]]
name = "cdn-optimized"
hostname = "localhost"
port = 8080
static_dir = "/opt/bws/static"

[sites.headers]
"Cache-Control" = "public, max-age=31536000, immutable"
"Access-Control-Allow-Origin" = "*"
"Vary" = "Accept-Encoding"
"X-CDN-Cache" = "MISS"

# Separate configuration for CDN edge
[[sites]]
name = "edge"
hostname = "edge.example.com"
port = 8081
static_dir = "/opt/bws/edge-cache"

[sites.headers]
"Cache-Control" = "public, max-age=604800"  # 1 week for edge cache
"X-CDN-Cache" = "HIT"

Database and Cache Optimization

Redis Integration (if using caching)

# Redis performance tuning
# /etc/redis/redis.conf

# Memory settings
maxmemory 2gb
maxmemory-policy allkeys-lru

# Persistence settings (adjust based on needs)
save 900 1
save 300 10
save 60 10000

# Network settings
tcp-keepalive 60
timeout 0

# Performance settings
tcp-backlog 511
databases 16

Memcached Configuration

# /etc/memcached.conf
-m 1024  # 1GB memory
-c 1024  # Max connections
-t 4     # Number of threads
-l 127.0.0.1  # Listen address
-p 11211 # Port

Reverse Proxy Optimization

Nginx Performance Configuration

# /etc/nginx/nginx.conf
user nginx;
worker_processes auto;
worker_rlimit_nofile 100000;

events {
    worker_connections 4000;
    use epoll;
    multi_accept on;
}

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 30;
    keepalive_requests 100000;
    reset_timedout_connection on;
    client_body_timeout 10;
    send_timeout 2;
    
    # Gzip compression
    gzip on;
    gzip_vary on;
    gzip_min_length 1024;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types
        text/plain
        text/css
        text/xml
        text/javascript
        application/json
        application/javascript
        application/xml+rss
        application/atom+xml
        image/svg+xml;
    
    # Buffer sizes
    client_max_body_size 10m;
    client_body_buffer_size 128k;
    client_header_buffer_size 1k;
    large_client_header_buffers 4 4k;
    output_buffers 1 32k;
    postpone_output 1460;
    
    # Proxy settings
    proxy_buffering on;
    proxy_buffer_size 128k;
    proxy_buffers 4 256k;
    proxy_busy_buffers_size 256k;
    
    upstream bws_backend {
        least_conn;
        server 127.0.0.1:8080 max_fails=3 fail_timeout=30s;
        server 127.0.0.1:8081 max_fails=3 fail_timeout=30s;
        keepalive 300;
    }
    
    server {
        listen 80 default_server;
        
        location / {
            proxy_pass http://bws_backend;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            
            # Caching
            proxy_cache_valid 200 1h;
            proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
        }
    }
}

Performance Monitoring and Analysis

Key Performance Metrics

Response Time Monitoring

#!/bin/bash
# response-time-monitor.sh

URL="http://localhost:8080"
LOG_FILE="/var/log/bws/response-times.log"

while true; do
    TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
    RESPONSE_TIME=$(curl -o /dev/null -s -w "%{time_total}" "$URL/health")
    RESPONSE_CODE=$(curl -o /dev/null -s -w "%{http_code}" "$URL/health")
    
    echo "$TIMESTAMP,$RESPONSE_TIME,$RESPONSE_CODE" >> "$LOG_FILE"
    sleep 1
done

Throughput Measurement

#!/bin/bash
# throughput-monitor.sh

# Monitor requests per second
LOG_FILE="/var/log/bws/throughput.log"
ACCESS_LOG="/var/log/nginx/access.log"

while true; do
    TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
    RPS=$(tail -60 "$ACCESS_LOG" | grep "$(date '+%d/%b/%Y:%H:%M')" | wc -l)
    
    echo "$TIMESTAMP,$RPS" >> "$LOG_FILE"
    sleep 60
done

Performance Analysis Tools

Log Analysis

#!/bin/bash
# analyze-performance.sh

LOG_FILE="/var/log/nginx/access.log"
OUTPUT_DIR="/tmp/performance-analysis"
DATE=$(date +%Y%m%d)

mkdir -p "$OUTPUT_DIR"

echo "Analyzing performance for $DATE"

# Top requested URLs
echo "=== Top Requested URLs ===" > "$OUTPUT_DIR/top-urls-$DATE.txt"
awk '{print $7}' "$LOG_FILE" | sort | uniq -c | sort -rn | head -20 >> "$OUTPUT_DIR/top-urls-$DATE.txt"

# Response codes distribution
echo "=== Response Codes ===" > "$OUTPUT_DIR/response-codes-$DATE.txt"
awk '{print $9}' "$LOG_FILE" | sort | uniq -c | sort -rn >> "$OUTPUT_DIR/response-codes-$DATE.txt"

# Response time analysis (if logged)
echo "=== Response Time Analysis ===" > "$OUTPUT_DIR/response-times-$DATE.txt"
awk '{print $NF}' "$LOG_FILE" | grep -E '^[0-9]+\.[0-9]+$' | \
awk '{
    sum += $1
    count++
    if ($1 > max) max = $1
    if (min == 0 || $1 < min) min = $1
}
END {
    print "Average:", sum/count
    print "Min:", min
    print "Max:", max
}' >> "$OUTPUT_DIR/response-times-$DATE.txt"

# Hourly request distribution
echo "=== Hourly Request Distribution ===" > "$OUTPUT_DIR/hourly-requests-$DATE.txt"
awk '{print substr($4, 14, 2)}' "$LOG_FILE" | sort | uniq -c >> "$OUTPUT_DIR/hourly-requests-$DATE.txt"

echo "Analysis complete. Results in $OUTPUT_DIR"

Resource Usage Trends

#!/bin/bash
# resource-trends.sh

DURATION=${1:-3600}  # Default 1 hour
INTERVAL=10
SAMPLES=$((DURATION / INTERVAL))

echo "Collecting resource usage data for $DURATION seconds..."
echo "timestamp,cpu_percent,memory_mb,connections,load_avg" > resource-usage.csv

for i in $(seq 1 $SAMPLES); do
    TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
    CPU_PERCENT=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | cut -d'%' -f1)
    MEMORY_MB=$(ps -o pid,rss -p $(pgrep bws) | tail -1 | awk '{print $2/1024}')
    CONNECTIONS=$(ss -tuln | grep :8080 | wc -l)
    LOAD_AVG=$(uptime | awk -F'load average:' '{print $2}' | cut -d',' -f1 | xargs)
    
    echo "$TIMESTAMP,$CPU_PERCENT,$MEMORY_MB,$CONNECTIONS,$LOAD_AVG" >> resource-usage.csv
    
    sleep $INTERVAL
done

echo "Resource usage data collected in resource-usage.csv"

Optimization Recommendations

Hardware Recommendations

CPU Optimization

Use modern CPUs with high single-thread performance
Consider NUMA topology for multi-socket systems
Enable CPU turbo boost for peak performance
Pin BWS processes to specific CPU cores for consistency

Memory Optimization

Use ECC RAM for data integrity
Ensure sufficient RAM for file caching
Consider NUMA-aware memory allocation
Monitor for memory leaks and fragmentation

Storage Optimization

Use NVMe SSDs for static file storage
Implement RAID for redundancy without performance penalty
Consider separate storage for logs and static files
Use tmpfs for temporary files and cache

Network Optimization

Use 10Gbps+ network interfaces for high-traffic scenarios
Implement network bonding for redundancy
Optimize network driver settings
Consider SR-IOV for virtualized environments

Application-Level Optimization

Code Optimization

#![allow(unused)]
fn main() {
// Example optimizations in BWS configuration
use pingora::prelude::*;
use tokio::runtime::Builder;

// Custom runtime configuration
let runtime = Builder::new_multi_thread()
    .worker_threads(num_cpus::get() * 2)
    .max_blocking_threads(512)
    .thread_keep_alive(Duration::from_secs(60))
    .enable_all()
    .build()
    .unwrap();

// Connection pooling optimization
let pool_config = ConnectionPoolConfig {
    max_connections_per_host: 100,
    idle_timeout: Duration::from_secs(300),
    connect_timeout: Duration::from_secs(10),
};
}

Memory Pool Configuration

[memory_pool]
small_buffer_size = "4KB"
medium_buffer_size = "64KB"
large_buffer_size = "1MB"
pool_size = 1000

Scaling Strategies

Horizontal Scaling

# Kubernetes deployment example
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bws-deployment
spec:
  replicas: 6
  selector:
    matchLabels:
      app: bws
  template:
    metadata:
      labels:
        app: bws
    spec:
      containers:
      - name: bws
        image: ghcr.io/yourusername/bws:latest
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
        ports:
        - containerPort: 8080
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: bws-service
spec:
  selector:
    app: bws
  ports:
  - port: 80
    targetPort: 8080
  type: LoadBalancer

Vertical Scaling

# Automatic resource scaling script
#!/bin/bash
# auto-scale.sh

CPU_THRESHOLD=80
MEMORY_THRESHOLD=80
CHECK_INTERVAL=60

while true; do
    CPU_USAGE=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | cut -d'%' -f1)
    MEMORY_USAGE=$(free | grep Mem | awk '{printf "%.0f", $3/$2 * 100.0}')
    
    if (( $(echo "$CPU_USAGE > $CPU_THRESHOLD" | bc -l) )); then
        echo "High CPU usage detected: $CPU_USAGE%"
        # Scale up logic here
    fi
    
    if (( MEMORY_USAGE > MEMORY_THRESHOLD )); then
        echo "High memory usage detected: $MEMORY_USAGE%"
        # Scale up logic here
    fi
    
    sleep $CHECK_INTERVAL
done

Performance Best Practices

Configuration Best Practices

Right-size worker threads: 2x CPU cores for I/O-bound workloads
Optimize buffer sizes: Match your typical request/response sizes
Enable compression: For text-based content
Use connection pooling: Reuse connections for better performance
Configure appropriate timeouts: Balance responsiveness and resource usage

Monitoring Best Practices

Monitor key metrics: Response time, throughput, error rate, resource usage
Set up alerting: For performance degradation
Regular performance testing: Catch regressions early
Capacity planning: Monitor trends and plan for growth
Document baselines: Know your normal performance characteristics

Deployment Best Practices

Load testing: Test under realistic load before production
Gradual rollouts: Use blue-green or canary deployments
Performance regression testing: Automated checks for performance changes
Resource monitoring: Continuous monitoring of system resources
Regular optimization: Periodic performance reviews and tuning

Next Steps

Review Configuration Schema for advanced tuning options
Check Troubleshooting for performance issues
Learn about Production Setup for deployment strategies

Keyboard shortcuts

BWS Web Server Documentation