High Performance Optimization

The grpc_graphql_gateway is designed for extreme throughput requirements, capable of handling 100,000+ requests per second (RPS) per instance. To achieve these targets, the gateway employs several advanced architectural optimizations.

Performance Targets

With High-Performance mode enabled:

  • 100K+ RPS: For cached queries serving from memory.
  • 50K+ RPS: For uncached queries performing gRPC backend calls.
  • Sub-millisecond P99: Latency for cache hits.

Key Optimizations

SIMD-Accelerated JSON Parsing

Standard JSON parsing is often the primary bottleneck in GraphQL gateways. We use simd-json, which employs SIMD (Single Instruction, Multiple Data) instructions (AVX2, SSE4.2, NEON) to parse JSON.

  • 2x – 5x faster than serde_json for typical payloads.
  • Reduced CPU cycles per request, allowing more concurrency on the same hardware.

Lock-Free Sharded Caching

Global locks cause severe contention as CPU core counts increase. Our ShardedCache implementation:

  • Splits the cache into 64 – 128 independent shards.
  • Uses lock-free reads and independent write locks per shard.
  • Eliminates the “Global Lock” bottleneck.
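The core idea can be sketched with standard-library primitives. This is a simplified stand-in, not the gateway's actual `ShardedCache` (which uses lock-free reads rather than `RwLock`s), but it shows how hashing a key to one of many independent shards confines contention to a single shard:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::RwLock;

// Simplified sketch: one RwLock per shard instead of one global lock.
pub struct ShardedCache<K, V> {
    shards: Vec<RwLock<HashMap<K, V>>>,
}

impl<K: Hash + Eq, V: Clone> ShardedCache<K, V> {
    pub fn new(num_shards: usize) -> Self {
        let shards = (0..num_shards).map(|_| RwLock::new(HashMap::new())).collect();
        Self { shards }
    }

    // Hash the key to pick a shard; writers to different shards never contend.
    fn shard(&self, key: &K) -> &RwLock<HashMap<K, V>> {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        &self.shards[(h.finish() as usize) % self.shards.len()]
    }

    pub fn get(&self, key: &K) -> Option<V> {
        self.shard(key).read().unwrap().get(key).cloned()
    }

    pub fn insert(&self, key: K, value: V) {
        self.shard(&key).write().unwrap().insert(key, value);
    }
}

fn main() {
    let cache: ShardedCache<&str, String> = ShardedCache::new(64);
    cache.insert("query:user", "cached response".to_string());
    assert_eq!(cache.get(&"query:user"), Some("cached response".to_string()));
}
```

With 64–128 shards, the probability that two concurrent writers hit the same lock is small, which is why contention stays flat as core counts grow.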

Object Pooling

Memory allocation is expensive at 100K RPS. We use high-performance object pools for request/response buffers:

  • Zero-allocation steady state for many request patterns.
  • Pre-allocated buffers are returned to a lock-free ArrayQueue for reuse.
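The acquire/release pattern looks roughly like the following. This sketch uses a `Mutex<Vec<_>>` to stay dependency-free; the gateway's actual pools are backed by a lock-free `ArrayQueue`, but the lifecycle is the same:

```rust
use std::sync::Mutex;

// Simplified buffer pool sketch: pre-allocate buffers once, then recycle
// them so the steady state performs no heap allocation.
pub struct BufferPool {
    buffers: Mutex<Vec<Vec<u8>>>,
    buf_capacity: usize,
}

impl BufferPool {
    pub fn new(pool_size: usize, buf_capacity: usize) -> Self {
        let buffers = (0..pool_size)
            .map(|_| Vec::with_capacity(buf_capacity))
            .collect();
        Self { buffers: Mutex::new(buffers), buf_capacity }
    }

    /// Take a pre-allocated buffer; fall back to allocating if the pool is empty.
    pub fn acquire(&self) -> Vec<u8> {
        self.buffers
            .lock()
            .unwrap()
            .pop()
            .unwrap_or_else(|| Vec::with_capacity(self.buf_capacity))
    }

    /// Clear the buffer (keeping its capacity) and return it for reuse.
    pub fn release(&self, mut buf: Vec<u8>) {
        buf.clear();
        self.buffers.lock().unwrap().push(buf);
    }
}

fn main() {
    let pool = BufferPool::new(4, 8192);
    let mut buf = pool.acquire();
    buf.extend_from_slice(b"response bytes");
    pool.release(buf); // capacity is retained for the next request
}
```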

Connection Pool Tuning

The gateway automatically tunes gRPC and HTTP/2 settings for maximum throughput:

  • HTTP/2 Prior Knowledge: Skips HTTP version negotiation (no HTTP/1.1 upgrade round-trip).
  • Adaptive Window Sizes: Optimizes flow control for high-bandwidth/low-latency local networks.
  • TCP NoDelay: Disables Nagle’s algorithm for immediate packet dispatch.
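For reference, these settings correspond to knobs exposed by tonic's `Endpoint` builder. The sketch below is illustrative of the kind of tuning described above, not the gateway's internal defaults:

```rust
use tonic::transport::Endpoint;

// Hedged sketch: an explicitly tuned gRPC endpoint using tonic.
fn tuned_endpoint() -> Endpoint {
    // tonic speaks HTTP/2 with prior knowledge over plaintext connections,
    // so no HTTP/1.1 upgrade negotiation occurs.
    Endpoint::from_static("http://127.0.0.1:50051")
        .tcp_nodelay(true) // disable Nagle's algorithm for immediate dispatch
        .http2_adaptive_window(true) // let flow-control windows auto-tune
}
```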

Configuration

Enable High-Performance mode in your GatewayBuilder:

use grpc_graphql_gateway::{Gateway, HighPerfConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    Gateway::builder()
        // ... standard config ...
        .with_high_performance(HighPerfConfig::ultra_fast())
        .build()?
        .serve("0.0.0.0:8888")
        .await
}

Configuration Profiles

We provide three pre-tuned profiles:

Profile          Use Case
ultra_fast()     Maximum Throughput: Optimized for 100K+ RPS.
balanced()       Balanced: Good mix of throughput and latency.
low_latency()    Low Latency: Optimized for minimal response time over raw RPS.

Benchmarking

We include a performance benchmark suite in the repository.

# Start the example server
cargo run --example greeter --release

# Run the benchmark
cargo run --bin benchmark --release -- --concurrency=200 --duration=30

For a complete automated test, use ./benchmark.sh, which handles the build and runs the benchmark across multiple profiles.