High Performance Optimization

The grpc_graphql_gateway is designed for extreme throughput requirements, capable of handling 100,000+ requests per second (RPS) per instance. To achieve these targets, the gateway employs several advanced architectural optimizations.

Performance Targets

With High-Performance mode enabled:

100K+ RPS: For cached queries serving from memory.
50K+ RPS: For uncached queries performing gRPC backend calls.
Sub-millisecond P99: Latency for cache hits.

Key Optimizations

SIMD-Accelerated JSON Parsing

Standard JSON parsing is often the primary bottleneck in GraphQL gateways. We use simd-json, which employs SIMD (Single Instruction, Multiple Data) instructions (AVX2, SSE4.2, NEON) to parse JSON.

2x – 5x faster than serde_json for typical payloads.
Reduced CPU cycles per request, allowing more concurrency on the same hardware.

Lock-Free Sharded Caching

Global locks cause severe contention as CPU core counts increase. Our ShardedCache implementation:

Splits the cache into 64 – 128 independent shards.
Uses lock-free reads and independent write locks per shard.
Eliminates the “Global Lock” bottleneck.

Object Pooling

Memory allocation is expensive at 100K RPS. We use high-performance object pools for request/response buffers:

Zero-allocation steady state for many request patterns.
Pre-allocated buffers are returned to a lock-free ArrayQueue for reuse.

Connection Pool Tuning

The gateway automatically tunes gRPC and HTTP/2 settings for maximum throughput:

HTTP/2 Prior Knowledge: Skips nested version negotiation.
Adaptive Window Sizes: Optimizes flow control for high-bandwidth/low-latency local networks.
TCP NoDelay: Disables Nagle’s algorithm for immediate packet dispatch.

Configuration

Enable High-Performance mode in your GatewayBuilder:

use grpc_graphql_gateway::{Gateway, HighPerfConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    Gateway::builder()
        // ... standard config ...
        .with_high_performance(HighPerfConfig::ultra_fast())
        .build()?
        .serve("0.0.0.0:8888")
        .await
}

Configuration Profiles

We provide three pre-tuned profiles:

Profile	Use Case
`ultra_fast()`	Maximum Throughput: Optimized for 100K+ RPS.
`balanced()`	Balanced: Good mix of throughput and latency.
`low_latency()`	Low Latency: Optimized for minimal response time over raw RPS.

Benchmarking

We include a performance benchmark suite in the repository.

# Start the example server
cargo run --example greeter --release

# Run the benchmark
cargo run --bin benchmark --release -- --concurrency=200 --duration=30

For a complete automated test, use ./benchmark.sh which handles builds and runs multiple profiles.

Keyboard shortcuts

gRPC-GraphQL Gateway