fortify

Performance Regression Testing

This document describes Fortify’s performance regression testing framework for tracking and validating benchmark performance over time.

Overview

Fortify includes a comprehensive performance testing framework that:

Tracks Performance: Records benchmark results with detailed metrics
Detects Regressions: Automatically identifies performance degradation
Sets Baselines: Establishes acceptable performance thresholds
Generates Reports: Creates detailed performance analysis reports
Integrates with CI/CD: Runs automated tests in GitHub Actions

Quick Start

Running Benchmarks

# Run all benchmarks
./scripts/benchmark.sh run

# Generate performance baseline from results
./scripts/benchmark.sh generate-baseline

# Check for regressions
./scripts/benchmark.sh check

# Complete workflow (run + check + compare)
./scripts/benchmark.sh all

Manual Benchmark Execution

# Run benchmarks with standard duration
go test -bench=. -benchmem -benchtime=3s ./...

# Run extended benchmarks
go test -bench=. -benchmem -benchtime=10s -count=3 ./...

# Run specific package benchmarks
go test -bench=. -benchmem ./circuitbreaker

Performance Tracking API

Setting Up a Tracker

import "github.com/felixgeelhaar/fortify/testing"

// Create performance tracker
tracker := testing.NewPerformanceTracker(".benchmark-results")

// Set custom thresholds
tracker.SetThresholds(testing.RegressionThresholds{
    TimeIncrease:  1.10, // up to 10% slower is acceptable
    AllocIncrease: 1.20, // up to 20% more allocations
    BytesIncrease: 1.15, // up to 15% more memory
})

Adding Baselines

// Manually add baseline
tracker.AddBaseline(testing.PerformanceBaseline{
    Name:        "BenchmarkCircuitBreaker",
    MaxNsPerOp:  1000,
    MaxAllocs:   5,
    MaxBytes:    512,
    Description: "Circuit breaker baseline",
})

// Generate from benchmark results (with 10% safety factor)
results := []testing.BenchmarkResult{...}
tracker.GenerateBaselineFromResults(results, 1.1)

// Save baselines to file
tracker.SaveBaselines("performance-baselines.json")

// Load baselines from file
tracker.LoadBaselines("performance-baselines.json")
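
To make the safety factor concrete, here is a minimal, self-contained sketch of how a baseline limit could be derived from one measured result. The `measurement` and `limit` types and the `withSafetyFactor` helper are hypothetical stand-ins, not Fortify's implementation, and rounding the integer limits up is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// measurement is a hypothetical stand-in for one benchmark result.
type measurement struct {
	NsPerOp     float64
	AllocsPerOp uint64
	BytesPerOp  uint64
}

// limit is a hypothetical stand-in for a stored baseline.
type limit struct {
	MaxNsPerOp float64
	MaxAllocs  uint64
	MaxBytes   uint64
}

// withSafetyFactor scales each measured value by factor (for example 1.1
// for a 10% buffer). Integer limits are rounded up so that a small buffer
// on a small count does not round back down to the measured value.
func withSafetyFactor(m measurement, factor float64) limit {
	return limit{
		MaxNsPerOp: m.NsPerOp * factor,
		MaxAllocs:  uint64(math.Ceil(float64(m.AllocsPerOp) * factor)),
		MaxBytes:   uint64(math.Ceil(float64(m.BytesPerOp) * factor)),
	}
}

func main() {
	m := measurement{NsPerOp: 950, AllocsPerOp: 4, BytesPerOp: 480}
	fmt.Printf("%+v\n", withSafetyFactor(m, 1.1))
}
```

With a factor of 1.1, a result of 4 allocs/op yields a limit of 5, so a single extra allocation stays within the baseline while two extra allocations do not.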

Checking for Regressions

// Check current results against baselines
results := []testing.BenchmarkResult{
    {
        Name:        "BenchmarkCircuitBreaker",
        NsPerOp:     950,
        AllocsPerOp: 4,
        BytesPerOp:  480,
        Timestamp:   time.Now(),
    },
}

report := tracker.CheckRegressions(results)

fmt.Printf("Total checks: %d\n", report.TotalChecks)
fmt.Printf("Passed: %d\n", report.Passed)
fmt.Printf("Failed: %d\n", report.Failed)

// Handle regressions
for _, regression := range report.Regressions {
    fmt.Printf("❌ %s: %s increased by %.2f%% (threshold: %.2f%%)\n",
        regression.BenchmarkName,
        regression.Metric,
        regression.Increase,
        regression.Threshold)
}

Saving Reports

// Save benchmark report
report := testing.BenchmarkReport{
    Timestamp: time.Now(),
    Results:   results,
    Metadata: map[string]string{
        "commit": "abc123",
        "branch": "main",
    },
}

tracker.SaveReport(report)

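// Load latest report
latest, err := tracker.LoadLatestReport()
if err != nil {
    log.Fatalf("loading latest report: %v", err)
}
fmt.Printf("Latest report has %d results\n", len(latest.Results))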

Comparing Reports

baseline := testing.BenchmarkReport{...}
current := testing.BenchmarkReport{...}

changes := testing.CompareReports(baseline, current)

for benchmark, metrics := range changes {
    timeChange := metrics["time_change"]
    fmt.Printf("%s: %.2f%% time change\n", benchmark, timeChange)
}
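
The percentage reported as a time change can be read as a relative difference between two runs. The sketch below shows that arithmetic on plain maps of benchmark name to ns/op. `percentChange` and `compareTimes` are hypothetical helpers written for illustration, and skipping benchmarks that are missing from one run or have a zero baseline is an assumption, not documented Fortify behavior.

```go
package main

import "fmt"

// percentChange returns the relative change from oldV to newV in percent.
// A positive value means the new measurement is larger (slower). ok is
// false when oldV is zero, because the change is undefined.
func percentChange(oldV, newV float64) (change float64, ok bool) {
	if oldV == 0 {
		return 0, false
	}
	return (newV - oldV) * 100 / oldV, true
}

// compareTimes maps each benchmark present in both runs to its percent
// change in ns/op.
func compareTimes(baseline, current map[string]float64) map[string]float64 {
	changes := make(map[string]float64)
	for name, oldNs := range baseline {
		newNs, found := current[name]
		if !found {
			continue // benchmark removed or renamed; nothing to compare
		}
		if change, ok := percentChange(oldNs, newNs); ok {
			changes[name] = change
		}
	}
	return changes
}

func main() {
	baseline := map[string]float64{"BenchmarkCircuitBreaker": 1000, "BenchmarkRetry": 2000}
	current := map[string]float64{"BenchmarkCircuitBreaker": 1100, "BenchmarkRetry": 1900}
	for name, change := range compareTimes(baseline, current) {
		fmt.Printf("%s: %+.2f%% time change\n", name, change)
	}
}
```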

Benchmark Structure

Benchmark Result Format

type BenchmarkResult struct {
    Name           string    // Benchmark name
    NsPerOp        float64   // Nanoseconds per operation
    AllocsPerOp    uint64    // Allocations per operation
    BytesPerOp     uint64    // Bytes allocated per operation
    Timestamp      time.Time // When benchmark was run
    GitCommit      string    // Git commit hash
    GitBranch      string    // Git branch name
    GoVersion      string    // Go version
    OS             string    // Operating system
    Arch           string    // Architecture
    CPUModel       string    // CPU model
    MemoryTotal    uint64    // Total system memory
    IterationCount int       // Number of iterations
}

Performance Baseline Format

type PerformanceBaseline struct {
    Name        string  // Benchmark name
    MaxNsPerOp  float64 // Maximum acceptable ns/op
    MaxAllocs   uint64  // Maximum acceptable allocations
    MaxBytes    uint64  // Maximum acceptable bytes
    Description string  // Baseline description
}

Regression Thresholds

Default Thresholds

Time: 10% increase (1.10x)
Allocations: 20% increase (1.20x)
Memory: 15% increase (1.15x)

Custom Thresholds

tracker.SetThresholds(testing.RegressionThresholds{
    TimeIncrease:       1.05, // 5% time increase
    AllocIncrease:      1.10, // 10% allocation increase
    BytesIncrease:      1.08, // 8% memory increase
    AbsoluteMaxNsPerOp: 10000, // Hard limit at 10µs
})
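
One way to read these settings: the relative factors compare the current value against the baseline, while the absolute limit applies regardless of the baseline. The sketch below illustrates that reading for the time metric only. The local `thresholds` type reuses two field names from above, but how Fortify actually combines the two limits is not stated here, so treat the logic as an assumption.

```go
package main

import "fmt"

// thresholds is a local stand-in that reuses two field names shown above.
type thresholds struct {
	TimeIncrease       float64 // allowed ratio of current to baseline ns/op
	AbsoluteMaxNsPerOp float64 // hard ceiling in ns/op; 0 disables it
}

// timeRegressed reports whether the current ns/op violates either limit.
func timeRegressed(baselineNs, currentNs float64, th thresholds) bool {
	if th.AbsoluteMaxNsPerOp > 0 && currentNs > th.AbsoluteMaxNsPerOp {
		return true
	}
	return baselineNs > 0 && currentNs > baselineNs*th.TimeIncrease
}

func main() {
	th := thresholds{TimeIncrease: 1.05, AbsoluteMaxNsPerOp: 10000}
	fmt.Println(timeRegressed(1000, 1040, th))   // 4% slower than baseline
	fmt.Println(timeRegressed(1000, 1100, th))   // 10% slower than baseline
	fmt.Println(timeRegressed(20000, 10500, th)) // faster than baseline, above the hard limit
}
```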

CI/CD Integration

GitHub Actions

Fortify runs automated performance tests in CI/CD. The workflow triggers on pull requests targeting main and on pushes to main:

# .github/workflows/performance.yml
name: Performance Regression Testing

on:
  pull_request:
    branches: [ main ]
  push:
    branches: [ main ]

Features

PR Checks: Runs benchmarks on every pull request
Baseline Comparison: Compares against main branch
Regression Detection: Fails CI if regressions are detected
Performance Tracking: Archives results for historical analysis
Automated Comments: Posts regression warnings on PRs

Workflow Steps

Run benchmarks on PR code
Checkout main branch
Run benchmarks on main
Compare results using benchstat
Check against baselines
Post results to PR

Local CI Simulation

# Simulate CI workflow
./scripts/benchmark.sh all

# View results
cat .benchmark-results/latest-raw.txt

Benchmark Scripts

benchmark.sh

Main benchmark automation script.

Commands

# Run benchmarks only
./scripts/benchmark.sh run

# Generate baseline from current results
./scripts/benchmark.sh generate-baseline

# Check for regressions
./scripts/benchmark.sh check

# Compare with previous run
./scripts/benchmark.sh compare

# Full workflow
./scripts/benchmark.sh all

# Show help
./scripts/benchmark.sh help

Environment Variables

# Customize benchmark duration
export BENCHMARK_TIME=5s
./scripts/benchmark.sh run

# Run multiple times for stability
export BENCHMARK_COUNT=3
./scripts/benchmark.sh run

Performance Analysis

Viewing Results

# View raw benchmark output
cat .benchmark-results/latest-raw.txt

# View parsed JSON results
cat .benchmark-results/latest-parsed.json | jq

# View baselines
cat scripts/performance-baselines.json | jq
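
The raw file holds standard `go test -bench -benchmem` output, where each result line is a name, an iteration count, and a series of value/unit pairs. As a rough illustration of how such a line maps onto structured fields, here is a small parser sketch. `parsedLine` and `parseBenchLine` are hypothetical names, and this is not the parser used by the benchmark script.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsedLine holds the values from one benchmark result line.
type parsedLine struct {
	Name        string // includes the GOMAXPROCS suffix, e.g. "-8"
	Iterations  int
	NsPerOp     float64
	BytesPerOp  uint64
	AllocsPerOp uint64
}

// parseBenchLine parses a line such as
//
//	BenchmarkCircuitBreaker-8  1263158  950.0 ns/op  480 B/op  4 allocs/op
//
// After the name and iteration count, the line is read as value/unit
// pairs. Units other than ns/op, B/op, and allocs/op are ignored.
func parseBenchLine(line string) (parsedLine, error) {
	fields := strings.Fields(line)
	if len(fields) < 4 || !strings.HasPrefix(fields[0], "Benchmark") {
		return parsedLine{}, fmt.Errorf("not a benchmark result line: %q", line)
	}
	iters, err := strconv.Atoi(fields[1])
	if err != nil {
		return parsedLine{}, fmt.Errorf("iteration count: %w", err)
	}
	p := parsedLine{Name: fields[0], Iterations: iters}
	for i := 2; i+1 < len(fields); i += 2 {
		value, unit := fields[i], fields[i+1]
		switch unit {
		case "ns/op":
			p.NsPerOp, err = strconv.ParseFloat(value, 64)
		case "B/op":
			p.BytesPerOp, err = strconv.ParseUint(value, 10, 64)
		case "allocs/op":
			p.AllocsPerOp, err = strconv.ParseUint(value, 10, 64)
		}
		if err != nil {
			return parsedLine{}, fmt.Errorf("%s: %w", unit, err)
		}
	}
	return p, nil
}

func main() {
	p, err := parseBenchLine("BenchmarkCircuitBreaker-8  1263158  950.0 ns/op  480 B/op  4 allocs/op")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", p)
}
```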

Using benchstat

Install benchstat for detailed comparison:

go install golang.org/x/perf/cmd/benchstat@latest

Compare two benchmark runs:

benchstat baseline.txt current.txt

Output example:

name                    old time/op    new time/op    delta
CircuitBreakerSuccess   850ns ± 2%     920ns ± 3%   +8.24%
RetrySuccess            2.10µs ± 1%    2.25µs ± 2%  +7.14%

Best Practices

Setting Baselines

Run Multiple Times: Average 3-5 runs for stability
Use Safety Factor: Add 10-20% buffer for variance
Update Regularly: Refresh baselines after optimizations
Document Changes: Explain baseline adjustments

Detecting Regressions

Set Appropriate Thresholds: Balance sensitivity vs noise
Consider System Variance: Account for CI/CD environment
Review Context: Not all increases are regressions
Track Trends: Look for consistent degradation
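
A single noisy run can trip a threshold, while a slow drift can stay under it for weeks. As a minimal sketch of the "track trends" idea, the hypothetical helper below flags a benchmark whose last few recorded ns/op values each exceed the one before. The name `consistentlySlower` and the strictly-increasing rule are illustrative assumptions, not part of Fortify.

```go
package main

import "fmt"

// consistentlySlower reports whether the last n values in history
// (ns/op, oldest first) are strictly increasing. It returns false when
// n is less than 2 or there are fewer than n values.
func consistentlySlower(history []float64, n int) bool {
	if n < 2 || len(history) < n {
		return false
	}
	recent := history[len(history)-n:]
	for i := 1; i < len(recent); i++ {
		if recent[i] <= recent[i-1] {
			return false
		}
	}
	return true
}

func main() {
	drifting := []float64{850, 860, 875, 890} // made-up values
	noisy := []float64{850, 900, 870, 880}    // made-up values
	fmt.Println("drifting:", consistentlySlower(drifting, 3))
	fmt.Println("noisy:", consistentlySlower(noisy, 3))
}
```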

Writing Benchmarks

Realistic Workloads: Mirror production scenarios
Isolate Code: Minimize external dependencies
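Avoid Dead-Code Elimination: Use the result of the code under test so the compiler cannot optimize it away
Use b.ResetTimer: Call it after setup so setup time is excluded from the measurement
Run Sufficient Iterations: Use a long enough -benchtime and a -count above 1 so results are statistically meaningful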

Example benchmark:

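import (
    "context"
    "testing"
    "time"

    // Import path assumed from the module path used earlier in this document.
    "github.com/felixgeelhaar/fortify/circuitbreaker"
)

func BenchmarkCircuitBreaker(b *testing.B) {
    cb := circuitbreaker.New[string](circuitbreaker.Config{
        Timeout: 100 * time.Millisecond,
    })
    ctx := context.Background()

    b.ResetTimer() // Exclude the setup above from the measurement

    for i := 0; i < b.N; i++ {
        // The result is discarded; this benchmark measures call overhead only.
        _, _ = cb.Execute(ctx, func(ctx context.Context) (string, error) {
            return "result", nil
        })
    }
}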
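
The advice about compiler optimization is easiest to see with a pure function whose result would otherwise go unused. A common Go convention is to store the result in a package-level variable. The sketch below shows that pattern with a hypothetical `sumTo` function, and it uses `testing.Benchmark` from the standard library so that it runs as an ordinary program rather than under `go test`.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is a package-level variable. Storing results here keeps the
// compiler from treating the benchmarked call as dead code.
var sink int

// sumTo is a hypothetical function under test.
func sumTo(n int) int {
	total := 0
	for i := 1; i <= n; i++ {
		total += i
	}
	return total
}

func benchmarkSumTo(b *testing.B) {
	b.ReportAllocs()
	var local int
	for i := 0; i < b.N; i++ {
		local = sumTo(1000)
	}
	sink = local // publish the result once, outside the timed loop body
}

func main() {
	// testing.Benchmark runs a benchmark function outside `go test`.
	result := testing.Benchmark(benchmarkSumTo)
	fmt.Println(result.String(), result.MemString())
}
```

In a regular `_test.go` file the same function would be named `BenchmarkSumTo` and run with `go test -bench=. -benchmem`.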

Continuous Monitoring

Historical Tracking

Results are stored in .benchmark-results/ with timestamps:

.benchmark-results/
├── benchmark-20240315-143022.json
├── benchmark-20240315-150432.json
└── latest-raw.txt
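
The file names above follow a date-time pattern that can be produced with Go's reference-time layout. The sketch below reproduces that pattern. `reportFileName` is a hypothetical helper, and whether Fortify uses local time or UTC for these names is not stated here.

```go
package main

import (
	"fmt"
	"time"
)

// exampleTime matches the first file name in the listing above.
var exampleTime = time.Date(2024, time.March, 15, 14, 30, 22, 0, time.UTC)

// reportFileName builds a name of the form benchmark-YYYYMMDD-HHMMSS.json.
// The layout string is Go's reference time (2006-01-02 15:04:05) written
// without separators.
func reportFileName(t time.Time) string {
	return "benchmark-" + t.Format("20060102-150405") + ".json"
}

func main() {
	fmt.Println(reportFileName(exampleTime))
}
```

Because the timestamp is zero-padded and ordered from year to second, sorting these names alphabetically also sorts them chronologically.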

Performance Trends

Monitor trends over time:

# View all historical results
ls -lt .benchmark-results/*.json

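# Compare specific dates (benchstat reads raw `go test -bench` text output,
# so this assumes raw .txt output was saved for each run)
benchstat .benchmark-results/benchmark-20240301-*.txt \
          .benchmark-results/benchmark-20240315-*.txt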

Alerting

GitHub Actions automatically:

Fails CI on regressions
Comments on PRs with warnings
Archives results for analysis
Tracks performance over time

Troubleshooting

High Variance

Problem: Benchmarks show inconsistent results

Solutions:

Increase benchtime (e.g., -benchtime=10s)
Run multiple times (-count=5)
Disable CPU frequency scaling
Close background applications
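
Before changing settings, it can help to quantify how noisy a benchmark is. One simple measure is the coefficient of variation (standard deviation divided by mean) across repeated runs. The sketch below computes it for a handful of ns/op values. The helper name, the sample values, and the 5% cutoff are illustrative assumptions, not Fortify defaults.

```go
package main

import (
	"fmt"
	"math"
)

// coefficientOfVariation returns the sample standard deviation divided
// by the mean. It returns 0 for fewer than two samples or a zero mean.
func coefficientOfVariation(samples []float64) float64 {
	if len(samples) < 2 {
		return 0
	}
	var sum float64
	for _, s := range samples {
		sum += s
	}
	mean := sum / float64(len(samples))
	if mean == 0 {
		return 0
	}
	var squares float64
	for _, s := range samples {
		d := s - mean
		squares += d * d
	}
	stddev := math.Sqrt(squares / float64(len(samples)-1))
	return stddev / mean
}

func main() {
	runs := []float64{850, 920, 870, 1010, 860} // ns/op from five runs (made-up values)
	cv := coefficientOfVariation(runs)
	fmt.Printf("coefficient of variation: %.1f%%\n", cv*100)
	if cv > 0.05 { // illustrative cutoff
		fmt.Println("results are noisy; consider a longer -benchtime or a higher -count")
	}
}
```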

False Positives

Problem: CI reports regressions incorrectly

Solutions:

Increase threshold tolerance
Add safety factor to baselines
Review baseline generation method
Consider system differences

Memory Allocations

Problem: Unexpected allocation increases

Solutions:

Use go test -benchmem to report allocations and bytes per operation
Capture a memory profile with -memprofile=mem.prof
Analyze the profile with go tool pprof mem.prof
Check hot paths for unintended allocations

Examples

See testing/example_test.go for complete examples:

Example_performanceTracking - Basic tracking usage
Example_performanceBaseline - Generating baselines

Related Documentation

Testing Utilities - Chaos engineering tools
Metrics - Prometheus integration
Contributing - Development guidelines
