Request Collapsing

Request collapsing (also known as request deduplication) reduces the number of gRPC backend calls by coalescing identical requests that are in flight at the same time.

How It Works

When a GraphQL query contains multiple fields that call the same gRPC method with identical arguments, request collapsing makes a single gRPC call for those fields and shares its response among them:

query {
  user1: getUser(id: "1") { name }
  user2: getUser(id: "2") { name }
  user3: getUser(id: "1") { name }  # Duplicate of user1!
}

Without Request Collapsing: 3 gRPC calls are made.

With Request Collapsing: Only 2 gRPC calls are made (user1 and user3 share the same response).
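
The call counts above come from counting distinct (method, arguments) pairs. A minimal sketch of that arithmetic, using a hypothetical `distinct_calls` helper that is not part of the gateway's API:

```rust
use std::collections::HashSet;

/// Counts the backend calls needed when identical (method, argument) pairs
/// are collapsed into one. Hypothetical helper, for illustration only.
pub fn distinct_calls(fields: &[(&str, &str)]) -> usize {
    // A set keeps one entry per distinct pair, so duplicates collapse.
    let unique: HashSet<(&str, &str)> = fields.iter().copied().collect();
    unique.len()
}
```

For the query above, the three fields `("getUser", "1")`, `("getUser", "2")`, `("getUser", "1")` reduce to two distinct pairs.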

The Leader-Follower Pattern

Leader: The first request with a unique key executes the gRPC call
Followers: Subsequent identical requests wait for the leader’s result
Broadcast: When the leader completes, it broadcasts the result to all followers
Cleanup: The in-flight entry is removed after broadcasting
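
The four steps above can be sketched with standard-library channels. This is a simplified, single-threaded illustration under assumed names (`InFlight`, `Role`, `join`, `complete`); it does not reproduce the gateway's internal registry, which is not shown in this document.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Role assigned to a request when it registers under a key.
pub enum Role<V> {
    /// First request for the key: it must execute the call.
    Leader,
    /// Later identical request: it waits on this receiver for the result.
    Follower(Receiver<V>),
}

/// Hypothetical in-flight table: one entry per key, holding follower senders.
pub struct InFlight<V> {
    waiters: HashMap<String, Vec<Sender<V>>>,
}

impl<V: Clone> InFlight<V> {
    pub fn new() -> Self {
        Self { waiters: HashMap::new() }
    }

    /// Registers a request: leader if the key is new, follower otherwise.
    pub fn join(&mut self, key: &str) -> Role<V> {
        match self.waiters.get_mut(key) {
            Some(list) => {
                let (tx, rx) = channel();
                list.push(tx);
                Role::Follower(rx)
            }
            None => {
                self.waiters.insert(key.to_string(), Vec::new());
                Role::Leader
            }
        }
    }

    /// Called by the leader: removes the entry (cleanup) and broadcasts the
    /// result. Returns how many followers received it.
    pub fn complete(&mut self, key: &str, value: V) -> usize {
        let senders = self.waiters.remove(key).unwrap_or_default();
        let mut delivered = 0;
        for tx in senders {
            // A send fails only if the follower has already gone away.
            if tx.send(value.clone()).is_ok() {
                delivered += 1;
            }
        }
        delivered
    }
}
```

Because `complete` removes the entry before broadcasting, the next request with the same key becomes a new leader rather than attaching to a finished call.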

Configuration

use grpc_graphql_gateway::{Gateway, RequestCollapsingConfig};
use std::time::Duration;

let gateway = Gateway::builder()
    .with_descriptor_set_bytes(DESCRIPTORS)
    .with_request_collapsing(RequestCollapsingConfig::default())
    .add_grpc_client("service", client)
    .build()?;

Configuration Options

Option	Default	Description
`coalesce_window`	50ms	Maximum time to wait for in-flight requests
`max_waiters`	100	Maximum followers waiting for a single leader
`enabled`	true	Enable/disable collapsing
`max_cache_size`	10000	Maximum in-flight requests to track
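
To make the limits concrete, here is a sketch of how they might gate collapsing. `CollapsingOptions`, `may_follow`, and `may_track` are local, hypothetical names that only mirror the table; what the gateway actually does when a limit is reached is not stated here, so the policy below (the request simply is not collapsed) is an assumption.

```rust
use std::time::Duration;

/// Local mirror of the documented options; NOT the crate's own type.
#[derive(Clone, Debug, PartialEq)]
pub struct CollapsingOptions {
    pub coalesce_window: Duration,
    pub max_waiters: usize,
    pub enabled: bool,
    pub max_cache_size: usize,
}

impl Default for CollapsingOptions {
    /// Defaults copied from the table above.
    fn default() -> Self {
        Self {
            coalesce_window: Duration::from_millis(50),
            max_waiters: 100,
            enabled: true,
            max_cache_size: 10_000,
        }
    }
}

/// Assumed policy: a request may attach to an existing leader only while
/// collapsing is enabled and that leader has fewer than `max_waiters` followers.
pub fn may_follow(opts: &CollapsingOptions, current_waiters: usize) -> bool {
    opts.enabled && current_waiters < opts.max_waiters
}

/// Assumed policy: a new leader is tracked only while the number of
/// in-flight entries is below `max_cache_size`.
pub fn may_track(opts: &CollapsingOptions, in_flight: usize) -> bool {
    opts.enabled && in_flight < opts.max_cache_size
}
```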

Builder Pattern

let config = RequestCollapsingConfig::new()
    .coalesce_window(Duration::from_millis(100))  // Longer window
    .max_waiters(200)                              // More waiters allowed
    .max_cache_size(20000)                         // Larger cache
    .enabled(true);

Presets

Request collapsing comes with several presets for common scenarios:

Default (Balanced)

let config = RequestCollapsingConfig::default();
// coalesce_window: 50ms
// max_waiters: 100
// max_cache_size: 10000

Best for most workloads with a balance between latency and deduplication.

High Throughput

let config = RequestCollapsingConfig::high_throughput();
// coalesce_window: 100ms
// max_waiters: 500
// max_cache_size: 50000

Best for high-traffic scenarios where maximizing deduplication is more important than latency.

Low Latency

let config = RequestCollapsingConfig::low_latency();
// coalesce_window: 10ms
// max_waiters: 50
// max_cache_size: 5000

Best for latency-sensitive applications where quick responses are critical.

Disabled

let config = RequestCollapsingConfig::disabled();

Completely disables request collapsing.

Monitoring

You can monitor request collapsing effectiveness using the built-in statistics:

// Get the registry from ServeMux
if let Some(registry) = mux.request_collapsing() {
    let stats = registry.stats();
    println!("In-flight requests: {}", stats.in_flight_count);
    println!("Max cache size: {}", stats.max_cache_size);
    println!("Enabled: {}", stats.enabled);
}

Request Key Generation

Each request is identified by a SHA-256 hash of:

Service name - The gRPC service identifier
gRPC path - The method path (e.g., /greeter.Greeter/SayHello)
Request bytes - The serialized protobuf message

This ensures that only truly identical requests are collapsed.
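
The sketch below shows the shape of such a key derivation. To stay within the standard library it uses `DefaultHasher` as a stand-in, not SHA-256, so it yields a 64-bit value rather than a 256-bit digest; the function name `request_key` is also an assumption.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Derives a collapsing key from the three documented inputs.
/// Stand-in hash: `DefaultHasher` replaces SHA-256 for illustration only.
pub fn request_key(service: &str, path: &str, request_bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    // `Hash` for `str` and `[u8]` delimits each value, so ("ab", "c") and
    // ("a", "bc") do not feed the same byte stream into the hasher.
    service.hash(&mut hasher);
    path.hash(&mut hasher);
    request_bytes.hash(&mut hasher);
    hasher.finish()
}
```

Whichever hash is used, the inputs need unambiguous boundaries (a delimiter or length prefix); otherwise different (service, path) splits of the same bytes could produce the same key.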

Relationship with Other Features

Response Caching

Request collapsing and response caching work together:

Request Collapsing: Deduplicates concurrent identical requests
Response Caching: Caches completed responses for future requests

The typical flow is:

Check response cache → cache hit? Return cached response
Check in-flight requests → follower? Wait for leader
Execute gRPC call as leader
Broadcast result to followers
Cache response for future requests
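
The flow above can be traced with a small state sketch. `Pipeline`, `Step`, `begin`, and `finish` are hypothetical names; the real gateway performs these steps asynchronously, and cache TTLs are omitted here.

```rust
use std::collections::{HashMap, HashSet};

/// What a request should do next, following the flow above.
#[derive(Debug, PartialEq)]
pub enum Step {
    CacheHit(String),
    WaitForLeader,
    ExecuteAsLeader,
}

/// Hypothetical combination of a response cache and an in-flight set.
pub struct Pipeline {
    cache: HashMap<String, String>,
    in_flight: HashSet<String>,
}

impl Pipeline {
    pub fn new() -> Self {
        Self { cache: HashMap::new(), in_flight: HashSet::new() }
    }

    /// Steps 1 to 3: consult the cache, then the in-flight set, else lead.
    pub fn begin(&mut self, key: &str) -> Step {
        if let Some(cached) = self.cache.get(key) {
            return Step::CacheHit(cached.clone());
        }
        if self.in_flight.contains(key) {
            return Step::WaitForLeader;
        }
        self.in_flight.insert(key.to_string());
        Step::ExecuteAsLeader
    }

    /// Steps 4 and 5: the leader finished, so clear the in-flight entry
    /// (followers would be notified here) and cache the response.
    pub fn finish(&mut self, key: &str, response: &str) {
        self.in_flight.remove(key);
        self.cache.insert(key.to_string(), response.to_string());
    }
}
```

A first `begin("k")` yields `ExecuteAsLeader`, a second one before `finish` yields `WaitForLeader`, and any call after `finish` yields `CacheHit`.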

Circuit Breaker

Request collapsing and the circuit breaker interact as follows:

If the circuit is open, all collapsed requests fail fast together
The leader request respects circuit breaker state
Followers receive the same error as the leader
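
A sketch of these three rules, assuming a clonable error type. `CallError`, `lead`, and `broadcast` are illustrative names, not the gateway's API.

```rust
/// Hypothetical error type; it must be `Clone` so every follower gets a copy.
#[derive(Clone, Debug, PartialEq)]
pub enum CallError {
    CircuitOpen,
    Backend(String),
}

/// The leader checks the breaker before calling the backend, so an open
/// circuit fails fast without invoking `call` at all.
pub fn lead<F>(circuit_open: bool, call: F) -> Result<String, CallError>
where
    F: FnOnce() -> Result<String, CallError>,
{
    if circuit_open {
        return Err(CallError::CircuitOpen);
    }
    call()
}

/// Every follower receives a copy of the leader's outcome, success or error.
pub fn broadcast(
    result: &Result<String, CallError>,
    followers: usize,
) -> Vec<Result<String, CallError>> {
    vec![result.clone(); followers]
}
```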

Best Practices

Start with defaults: The default configuration works well for most use cases.
Monitor collapse ratio: Track what fraction of requests is being deduplicated.
- Low ratio? Requests may be too unique; consider whether collapsing adds value.
- High ratio? You’re saving significant backend load.
Tune for your workload:
- High read traffic? Use the `high_throughput()` preset.
- Real-time requirements? Use the `low_latency()` preset.
Consider request patterns:
- GraphQL queries that repeat the same call under different aliases benefit most.
- Queries where every field makes a unique request won’t see much benefit.
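
One way to track the collapse ratio is sketched below. The Monitoring snippet shows only `in_flight_count`, `max_cache_size`, and `enabled`, so the counters here are hypothetical and assumed to be maintained by your own instrumentation.

```rust
/// Hypothetical counters kept by the caller; not part of the gateway's stats.
#[derive(Clone, Copy, Debug, Default)]
pub struct CollapseCounters {
    pub total_requests: u64,
    pub backend_calls: u64,
}

impl CollapseCounters {
    /// Records one request; only leaders (non-followers) hit the backend.
    pub fn record(&mut self, was_follower: bool) {
        self.total_requests += 1;
        if !was_follower {
            self.backend_calls += 1;
        }
    }

    /// Fraction of requests served without their own backend call.
    /// Returns 0.0 when nothing has been recorded yet.
    pub fn collapse_ratio(&self) -> f64 {
        if self.total_requests == 0 {
            return 0.0;
        }
        1.0 - self.backend_calls as f64 / self.total_requests as f64
    }
}
```

For the three-field query in How It Works (two leaders, one follower), this gives a ratio of one third.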

Example: Full Configuration

use grpc_graphql_gateway::{Gateway, RequestCollapsingConfig, CacheConfig};
use std::time::Duration;

let gateway = Gateway::builder()
    .with_descriptor_set_bytes(DESCRIPTORS)
    // Enable response caching
    .with_response_cache(CacheConfig {
        max_size: 10_000,
        default_ttl: Duration::from_secs(60),
        stale_while_revalidate: Some(Duration::from_secs(30)),
        invalidate_on_mutation: true,
    })
    // Enable request collapsing
    .with_request_collapsing(
        RequestCollapsingConfig::new()
            .coalesce_window(Duration::from_millis(75))
            .max_waiters(150)
    )
    .add_grpc_client("service", client)
    .build()?;
