Autoscaling and Load Balancing

This guide covers configuring autoscaling and load balancing for the gRPC GraphQL Gateway when it is deployed with its Helm chart.

Overview

The gateway supports three types of scaling and load balancing:

Horizontal Pod Autoscaler (HPA) - Scales the number of pods based on metrics
Vertical Pod Autoscaler (VPA) - Adjusts resource requests/limits for pods
LoadBalancer - External load balancing for traffic distribution

Horizontal Pod Autoscaler (HPA)

HPA automatically scales the number of pods based on observed CPU, memory, or custom metrics.

Basic Configuration

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80
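To make the targets above concrete, the sketch below applies the replica formula documented for the Kubernetes HPA, desired = ceil(current * currentValue / targetValue), together with the default 10% tolerance band. The function name and its arguments are illustrative and not part of the chart. Treat it as a rough model: the real controller also accounts for pod readiness, missing metrics, and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas, max_replicas, tolerance=0.1):
    """Approximate the HPA replica calculation for a single metric.

    current_value and target_value use the same unit, for example
    average CPU utilization in percent (target 70 in the values above).
    """
    ratio = current_value / target_value
    # Inside the tolerance band the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # The result is always clamped to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 3 pods averaging 90% CPU against a 70% target.
    print(desired_replicas(3, 90, 70, min_replicas=3, max_replicas=10))
```

When both CPU and memory targets are set, HPA evaluates each metric separately and uses the largest resulting replica count.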

Custom Metrics

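To scale on application-level metrics instead of CPU or memory, define custom metrics. Pods-type metrics such as the one below are served through the Kubernetes custom metrics API, so the cluster needs a metrics adapter (for example, Prometheus Adapter) that exposes them: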

autoscaling:
  enabled: true
  minReplicas: 5
  maxReplicas: 50
  customMetrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
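With a Pods metric and an AverageValue target, HPA aims to keep the per-pod average at or below the target, so the replica count roughly tracks total load divided by the target. The sketch below is a capacity-planning aid under that assumption. The function names are hypothetical, and the numbers come from the values above (target 1000, 5 to 50 replicas).

```python
import math


def replicas_for_load(total_rps, target_per_pod, min_replicas, max_replicas):
    """Replicas needed to keep the per-pod average at or below the target."""
    wanted = math.ceil(total_rps / target_per_pod)
    return max(min_replicas, min(max_replicas, wanted))


def saturation_rps(target_per_pod, max_replicas):
    """Total load at which maxReplicas is reached and scaling stops."""
    return target_per_pod * max_replicas


if __name__ == "__main__":
    for rps in (2_000, 12_500, 80_000):
        print(rps, replicas_for_load(rps, 1000, 5, 50))
    print("saturates at", saturation_rps(1000, 50), "requests/s")
```

Beyond the saturation point the per-pod average rises above the target, which is a signal to raise maxReplicas or revisit the per-pod target.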

Deployment

helm install my-gateway ./grpc-graphql-gateway \
  --set autoscaling.enabled=true \
  --set autoscaling.minReplicas=3 \
  --set autoscaling.maxReplicas=10

Monitoring HPA

# Watch HPA status
kubectl get hpa -w

# Describe HPA for detailed metrics
kubectl describe hpa my-gateway

# View current metrics
kubectl top pods -l app.kubernetes.io/name=grpc-graphql-gateway

Vertical Pod Autoscaler (VPA)

VPA automatically adjusts CPU and memory requests/limits based on actual usage.

Prerequisites

Install VPA in your cluster:

git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

Configuration

verticalPodAutoscaler:
  enabled: true
  updateMode: "Auto"  # Off, Initial, Recreate, Auto
  minAllowed:
    cpu: 100m
    memory: 128Mi
  maxAllowed:
    cpu: 2000m
    memory: 2Gi
  controlledResources:
    - cpu
    - memory
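The minAllowed and maxAllowed values bound whatever VPA recommends. The sketch below illustrates that clamping with a deliberately small parser for the quantity formats used in this guide (millicores such as 100m, binary suffixes such as 128Mi and 2Gi). The helper names are illustrative, and the parser does not cover every Kubernetes quantity format.

```python
def parse_cpu_millicores(quantity):
    """Convert a CPU quantity such as '100m' or '2' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def parse_memory_bytes(quantity):
    """Convert a memory quantity such as '128Mi' or '2Gi' to bytes."""
    units = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}
    for suffix, factor in units.items():
        if quantity.endswith(suffix):
            return int(quantity[:-len(suffix)]) * factor
    return int(quantity)  # plain bytes


def clamp(value, lower, upper):
    """Bound a raw recommendation by minAllowed and maxAllowed."""
    return max(lower, min(upper, value))


if __name__ == "__main__":
    cpu_min = parse_cpu_millicores("100m")
    cpu_max = parse_cpu_millicores("2000m")
    # A raw recommendation of 50m is raised to minAllowed,
    # and one of 3000m is lowered to maxAllowed.
    print(clamp(50, cpu_min, cpu_max), clamp(3000, cpu_min, cpu_max))
```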

Update Modes

Mode	Description	Use Case
Off	Only computes recommendations; never changes pods	Safe to use with HPA
Initial	Applies recommendations at pod creation only	Good for initial sizing
Recreate	Evicts running pods so they restart with updated requests	When you want automatic updates and can tolerate restarts
Auto	Applies recommendations automatically; in most VPA releases this behaves like Recreate	Full automation

Using VPA with HPA

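⚠️ Important: VPA and HPA should not act on the same metrics (CPU/Memory). If both do, VPA changes the pod requests that HPA's utilization percentages are measured against, and the two controllers can work against each other.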

Recommended Setup:

# Use VPA in "Off" mode for recommendations
verticalPodAutoscaler:
  enabled: true
  updateMode: "Off"
  
# Use HPA for horizontal scaling
autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10

Alternative: Use VPA for CPU/Memory and HPA for custom metrics.
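The rule above can be checked before deploying. The sketch below inspects a values dictionary that uses the key names shown in this guide and reports the resources that both autoscalers would act on. The function name is hypothetical. The defaults it assumes when a key is absent (updateMode "Auto", controlledResources of cpu and memory) mirror upstream VPA behavior and may differ from the chart's own defaults.

```python
def autoscaler_conflicts(values):
    """Return the resources that HPA and VPA would both act on."""
    hpa = values.get("autoscaling", {})
    vpa = values.get("verticalPodAutoscaler", {})
    if not (hpa.get("enabled") and vpa.get("enabled")):
        return []
    # In "Off" mode VPA only recommends, so it cannot fight the HPA.
    if vpa.get("updateMode", "Auto") == "Off":
        return []

    hpa_resources = set()
    if hpa.get("targetCPUUtilizationPercentage") is not None:
        hpa_resources.add("cpu")
    if hpa.get("targetMemoryUtilizationPercentage") is not None:
        hpa_resources.add("memory")

    vpa_resources = set(vpa.get("controlledResources", ["cpu", "memory"]))
    return sorted(hpa_resources & vpa_resources)


if __name__ == "__main__":
    risky = {
        "autoscaling": {"enabled": True, "targetCPUUtilizationPercentage": 70},
        "verticalPodAutoscaler": {"enabled": True, "updateMode": "Auto"},
    }
    print(autoscaler_conflicts(risky))
```

An empty list corresponds to either of the two setups described above: VPA in "Off" mode, or HPA driven only by custom metrics.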

Viewing VPA Recommendations

# Get VPA status
kubectl describe vpa my-gateway

# View recommendations
kubectl get vpa my-gateway -o jsonpath='{.status.recommendation}'

LoadBalancer Service

A Service of type LoadBalancer exposes the gateway outside the cluster through the cloud provider's load balancer.

Basic Configuration

loadBalancer:
  enabled: true
  httpPort: 80
  grpcPort: 50051
  externalTrafficPolicy: Cluster

AWS Network Load Balancer

loadBalancer:
  enabled: true
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-internal: "false"
  externalTrafficPolicy: Local  # Preserve source IP
  loadBalancerSourceRanges:
    - "10.0.0.0/8"  # Restrict to VPC

Google Cloud Load Balancer

loadBalancer:
  enabled: true
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
    cloud.google.com/backend-config: '{"default": "backend-config"}'
  externalTrafficPolicy: Cluster

Azure Load Balancer

loadBalancer:
  enabled: true
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
  loadBalancerIP: "10.0.0.10"  # Static internal IP

External Traffic Policy

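Policy	Pros	Cons
Cluster	Even load distribution, regardless of which node receives the traffic	Loses client source IP; may add an extra hop between nodes
Local	Preserves client source IP; lower latency (no extra hop)	May cause uneven load distribution when pods are spread unevenly across nodes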
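The uneven distribution under Local can be estimated with simple arithmetic. The sketch below assumes the external load balancer splits traffic evenly across the nodes that host at least one gateway pod, and that each node then splits its share evenly among its local pods. Real load balancers and kube-proxy only approximate this, and the function and node names are illustrative.

```python
def per_pod_share(pods_per_node, policy):
    """Fraction of total traffic that each pod on a node receives."""
    serving = {node: n for node, n in pods_per_node.items() if n > 0}
    total_pods = sum(serving.values())
    if policy == "Cluster":
        # Any node may forward to any pod, so pods share traffic evenly.
        return {node: 1 / total_pods for node in serving}
    if policy == "Local":
        # Each serving node gets an equal slice, split among its own pods.
        return {node: 1 / (len(serving) * n) for node, n in serving.items()}
    raise ValueError(f"unknown externalTrafficPolicy: {policy}")


if __name__ == "__main__":
    layout = {"node-a": 1, "node-b": 3}
    print(per_pod_share(layout, "Cluster"))
    print(per_pod_share(layout, "Local"))
```

In this layout the single pod on node-a handles half of all traffic under Local, three times the share of each pod on node-b, which is why Local pairs well with anti-affinity or topology spread rules that keep pod counts per node equal.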

Complete Example

Production Deployment with All Features

# values-production.yaml
loadBalancer:
  enabled: true
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
  externalTrafficPolicy: Local
  httpPort: 80
  loadBalancerSourceRanges:
    - "0.0.0.0/0"

autoscaling:
  enabled: true
  minReplicas: 5
  maxReplicas: 50
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80

verticalPodAutoscaler:
  enabled: true
  updateMode: "Off"  # Get recommendations without conflicts
  minAllowed:
    cpu: 250m
    memory: 256Mi
  maxAllowed:
    cpu: 4000m
    memory: 4Gi

resources:
  limits:
    cpu: 2000m
    memory: 2Gi
  requests:
    cpu: 1000m
    memory: 1Gi

podDisruptionBudget:
  enabled: true
  minAvailable: 3

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
            - key: app.kubernetes.io/name
              operator: In
              values:
                - grpc-graphql-gateway
        topologyKey: kubernetes.io/hostname

Deploy:

helm install gateway ./grpc-graphql-gateway \
  -f helm/values-production.yaml \
  --namespace production \
  --create-namespace
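Two settings in the production values interact with cluster size. The PodDisruptionBudget limits how many pods can be evicted voluntarily at once, and the required anti-affinity on kubernetes.io/hostname allows at most one gateway pod per node. The sketch below works through both; the function names are illustrative and the node counts are made up.

```python
def allowed_disruptions(healthy_pods, min_available):
    """Pods that may be evicted voluntarily without violating the PDB."""
    return max(0, healthy_pods - min_available)


def pending_replicas(replicas, schedulable_nodes):
    """Replicas left Pending when only one pod fits on each node."""
    return max(0, replicas - schedulable_nodes)


if __name__ == "__main__":
    # minReplicas: 5 with minAvailable: 3 leaves room for two evictions.
    print(allowed_disruptions(5, 3))
    # maxReplicas: 50 on a hypothetical 20-node cluster.
    print(pending_replicas(50, 20))
```

If the cluster cannot grow to one node per replica, a preferred (soft) anti-affinity rule is the usual alternative to the required rule shown in the values file.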

Load Balancing Strategies

At Service Level

service:
  sessionAffinity: ClientIP  # Sticky sessions
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800  # 3 hours
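The effect of sessionAffinity: ClientIP can be pictured as a table from client IP to backend pod whose entries expire after timeoutSeconds without traffic. The sketch below is a toy model of that behavior, not kube-proxy's implementation. The class name, the round-robin choice for new clients, and the pod names are assumptions made for illustration.

```python
import itertools


class StickyRouter:
    """Toy model of ClientIP session affinity with an idle timeout."""

    def __init__(self, backends, timeout_seconds=10800):
        self._next_backend = itertools.cycle(backends)
        self._timeout = timeout_seconds
        self._sessions = {}  # client IP -> (backend, last_seen)

    def route(self, client_ip, now):
        """Pick a backend for a new connection arriving at time `now`."""
        entry = self._sessions.get(client_ip)
        if entry is not None and now - entry[1] <= self._timeout:
            backend = entry[0]  # affinity still valid: reuse the pod
        else:
            backend = next(self._next_backend)  # new or expired client
        self._sessions[client_ip] = (backend, now)
        return backend


if __name__ == "__main__":
    router = StickyRouter(["pod-0", "pod-1", "pod-2"])
    print(router.route("10.0.0.1", now=0))
    print(router.route("10.0.0.1", now=3600))  # same pod, within 3 hours
```

Affinity applies per connection, and gRPC clients typically multiplex many requests over one long-lived connection, so a single client can already stay on one pod for the life of that connection whether or not this setting is enabled.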

At Ingress Level

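ingress:
  annotations:
    # Round Robin (default)
    nginx.ingress.kubernetes.io/load-balance: "round_robin"

    # Note: current ingress-nginx releases document only "round_robin" and
    # "ewma" for this annotation; the two values below may be accepted only
    # by older controller versions. For hashing on client IP, newer releases
    # provide the nginx.ingress.kubernetes.io/upstream-hash-by annotation.

    # Least Connections
    # nginx.ingress.kubernetes.io/load-balance: "least_conn"

    # IP Hash
    # nginx.ingress.kubernetes.io/load-balance: "ip_hash"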

Monitoring and Troubleshooting

Check Load Distribution

# View pod distribution across nodes
kubectl get pods -o wide -l app.kubernetes.io/name=grpc-graphql-gateway

# Check service endpoints
kubectl get endpoints my-gateway

# Check LoadBalancer status
kubectl get svc my-gateway-lb

Monitor Autoscaling

# Watch HPA
watch kubectl get hpa

# Monitor resource usage
kubectl top pods

# Check VPA recommendations
kubectl describe vpa my-gateway

Load Testing

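# Install k6 (Homebrew shown; see the k6 docs for other platforms)
brew install k6

# Run load test (replace <loadbalancer-ip> with the Service's external IP).
# The quoted 'EOF' stops the shell from expanding anything inside the script.
k6 run --vus 100 --duration 5m - <<'EOF'
import http from 'k6/http';
import { check } from 'k6';

export default function () {
  const query = JSON.stringify({
    query: '{ __typename }'
  });

  const res = http.post('http://<loadbalancer-ip>/graphql', query, {
    headers: { 'Content-Type': 'application/json' },
  });

  // Report non-200 responses as failed checks in the k6 summary
  check(res, { 'status is 200': (r) => r.status === 200 });
}
EOF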

# Watch scaling in action
watch kubectl get pods,hpa

Best Practices

Start Conservative: Begin with moderate min/max replicas and adjust based on observed patterns
VPA + HPA: Use VPA in “Off” mode alongside HPA to get recommendations without conflicts
LoadBalancer: Use externalTrafficPolicy: Local when you need source IP preservation
PodDisruptionBudget: Always configure PDB to maintain availability during updates
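Multi-AZ: Use pod anti-affinity with topologyKey: topology.kubernetes.io/zone to spread pods across availability zones; kubernetes.io/hostname, as used in the complete example, only spreads them across nodes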
Gradual Rollouts: Test autoscaling in staging before production
Monitor Costs: Set reasonable maxReplicas to prevent runaway costs
Health Checks: Ensure liveness and readiness probes are properly configured

Federation with Autoscaling

For federated deployments, each subgraph can scale independently:

# Deploy user subgraph with autoscaling
helm install user-subgraph ./grpc-graphql-gateway \
  -f helm/values-federation-user.yaml \
  --set autoscaling.maxReplicas=20

# Deploy product subgraph with different scaling
helm install product-subgraph ./grpc-graphql-gateway \
  -f helm/values-federation-product.yaml \
  --set autoscaling.maxReplicas=30
