How many Redis nodes does Redlock require?

Redlock requires an odd number of independent Redis nodes — typically five — deployed across distinct failure domains. A majority quorum of at least three nodes must grant the lock for acquisition to succeed.

What TTL should I set for Redlock locks?

Set TTL = max_processing_time_ms + p99_network_latency_ms + 200 ms safety margin, rounded up to the nearest 100 ms. For payment transactions with a worst-case processing time of 300 ms and 50 ms p99 latency, the formula gives 550 ms, which rounds up to a 600 ms TTL as a safe starting point.

Does Redlock guarantee exactly-once processing?

Redlock provides a probabilistic mutual-exclusion guarantee, not strict linearizability. Under normal conditions it prevents concurrent duplicate execution, but clock skew or GC pauses can cause premature lock expiry. Pair it with an idempotent upsert or two-phase commit for financial correctness.

Implementing Redlock for High-Availability Deduplication


This runbook is for backend engineers who need to prevent duplicate execution of identical requests across horizontally scaled, stateless services — payment processors, webhook consumers, and API gateway ingress handlers — where a single Redis node provides insufficient fault tolerance. You should already be comfortable with distributed lock acquisition patterns and understand the consistency trade-offs covered in Distributed Coordination & Locking Strategies.

Prerequisites: five independent Redis instances (standalone, not Cluster mode), a working understanding of idempotency key generation, and a runtime in Go, Node.js, or Java.

How Redlock Quorum Acquisition Works

The diagram below shows the full lock lifecycle across five Redis nodes for a single payment request.

The key insight is the validity window calculation in step ③. Even after a quorum of three nodes has granted the lock, the effective lease is shorter than the TTL: subtract the time that elapsed while acquiring (measured from just before the first SET to just after the last reply) and a clock-drift allowance (typically 0.01 × TTL). If the remaining validity is zero or negative, the lock must be released on all nodes immediately and the request retried after a jittered delay.
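The validity calculation can be sketched in a few lines. This is a minimal illustration rather than any library's implementation: the function names are hypothetical, and some client libraries add a small fixed allowance on top of the proportional drift term, which is omitted here.

```python
def validity_ms(ttl_ms: float, elapsed_ms: float, drift_factor: float = 0.01) -> float:
    """Remaining lease after subtracting acquisition time and the drift allowance."""
    drift_ms = ttl_ms * drift_factor
    return ttl_ms - elapsed_ms - drift_ms


def must_release(ttl_ms: float, elapsed_ms: float, drift_factor: float = 0.01) -> bool:
    """True when the lock was acquired too slowly to be safely usable."""
    return validity_ms(ttl_ms, elapsed_ms, drift_factor) <= 0


# A 600 ms TTL acquired in 46 ms leaves 600 - 46 - 6 = 548 ms of validity.
remaining = validity_ms(600, 46)
```

Any work guarded by the lock should be budgeted against the remaining validity, not against the raw TTL.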

Step-by-Step Implementation

Step 1: Provision Five Independent Redis Nodes

Redlock requires five standalone Redis instances spread across distinct failure domains — separate availability zones, racks, or physical hosts. Do not use Redis Cluster mode for Redlock; clustered Redis shares coordination state and defeats the quorum independence guarantee.

Validate isolation before writing a line of application code:

# From your application host, confirm five distinct IPs respond independently
for host in redis1:6379 redis2:6379 redis3:6379 redis4:6379 redis5:6379; do
  redis-cli -u "redis://$host" PING
done
# Expected: five PONG responses; any timeout signals a provisioning gap

Step 2: Design Deterministic Idempotency Keys

Lock keys must map identically for semantically equivalent requests regardless of header ordering, whitespace, or timestamp fields. Canonicalize before hashing:

import hashlib
import json

def idempotency_key(payload: dict, client_id: str, method: str,
                    path: str, api_version: str) -> str:
    # Strip volatile fields before hashing
    stable = {k: v for k, v in payload.items()
              if k not in ("timestamp", "trace_id", "request_id")}
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    fingerprint = "|".join([canonical, client_id, method.upper(),
                             path.lower(), api_version])
    return "dedup:" + client_id + ":" + api_version + ":" + \
           hashlib.sha256(fingerprint.encode()).hexdigest()

The function above emits keys of the form dedup:{client_id}:{api_version}:{hash}. In multi-tenant deployments, namespace keys as dedup:{tenant}:{env}:{version}:{hash} to isolate collision domains per tenant and deployment environment. The SHA-256 digest is 64 hex characters (256 bits), which makes accidental collisions negligible.
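One way to build and validate the fully namespaced form is sketched below. The helper name, the allowed character sets for tenant and environment, and the `v<digits>` version format are assumptions made for illustration; adjust the pattern to your own naming rules.

```python
import hashlib
import re

# Assumed format: lowercase alphanumerics and hyphens for tenant/env, v<digits> for version.
KEY_PATTERN = re.compile(r"dedup:[a-z0-9-]+:[a-z0-9-]+:v[0-9]+:[0-9a-f]{64}")


def namespaced_key(tenant: str, env: str, version: str, digest: str) -> str:
    """Build dedup:{tenant}:{env}:{version}:{hash}, rejecting malformed parts."""
    key = f"dedup:{tenant}:{env}:{version}:{digest}"
    if not KEY_PATTERN.fullmatch(key):
        raise ValueError(f"malformed dedup key: {key!r}")
    return key


digest = hashlib.sha256(b'{"amount":100}|client-1|POST|/payments|v2').hexdigest()
key = namespaced_key("payments", "prod", "v2", digest)
```

Rejecting malformed keys before any Redis call keeps a tenant name containing a colon from silently shifting the namespace boundaries.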

Step 3: Acquire the Lock — Go

package dedup

import (
    "context"
    "time"

    "github.com/go-redsync/redsync/v4"
    redsyncredis "github.com/go-redsync/redsync/v4/redis"
    "github.com/go-redsync/redsync/v4/redis/goredis/v9"
    goredislib "github.com/redis/go-redis/v9"
)

var nodes = []string{
    "redis1:6379", "redis2:6379", "redis3:6379",
    "redis4:6379", "redis5:6379",
}

// NewRedlock builds one connection pool per independent Redis node.
func NewRedlock() *redsync.Redsync {
    var pools []redsyncredis.Pool
    for _, addr := range nodes {
        client := goredislib.NewClient(&goredislib.Options{
            Addr:         addr,
            DialTimeout:  80 * time.Millisecond,
            ReadTimeout:  80 * time.Millisecond,
            WriteTimeout: 80 * time.Millisecond,
            MinIdleConns: 5,
            PoolSize:     50,
            TLSConfig:    tlsConfig(), // helper defined elsewhere; should enforce TLS 1.2+
        })
        // The pool adapter lives in the goredis package, not in go-redis itself.
        pools = append(pools, goredis.NewPool(client))
    }
    return redsync.New(pools...)
}

func AcquireDedup(ctx context.Context, rs *redsync.Redsync,
    key string, ttl time.Duration) (*redsync.Mutex, error) {

    mu := rs.NewMutex(key,
        redsync.WithExpiry(ttl),
        redsync.WithTries(3),
        redsync.WithRetryDelay(200*time.Millisecond),
        redsync.WithDriftFactor(0.01),
        // Quorum is derived from the pool count: len(pools)/2 + 1, i.e. 3 of 5.
    )
    if err := mu.LockContext(ctx); err != nil {
        return nil, err // caller returns 409 or queues for retry
    }
    return mu, nil
}

Each go-redis client uses 80 ms dial, read, and write timeouts so a slow node fails fast rather than blocking quorum evaluation.

Step 4: Acquire the Lock — Node.js

import Redis from "ioredis";
import Redlock from "redlock"; // v5; expects ioredis-compatible clients

const clients = [
  "redis://redis1:6379", "redis://redis2:6379", "redis://redis3:6379",
  "redis://redis4:6379", "redis://redis5:6379",
].map(url => new Redis(url, { connectTimeout: 80, commandTimeout: 80 }));

export const redlock = new Redlock(clients, {
  driftFactor: 0.01,   // validity = ttl - elapsed - driftFactor * ttl
  retryCount: 3,
  retryDelay: 200,     // ms
  retryJitter: 100,    // ms; prevents thundering herd on contention
  automaticExtensionThreshold: 500, // ms before expiry to renew
});

export async function withDedup(key, ttlMs, fn) {
  // using() acquires the lock, extends it while fn runs, and releases it afterwards.
  return redlock.using([key], ttlMs, async (signal) => {
    const result = await fn(signal);
    // If an extension failed, the lock may have expired mid-flight.
    if (signal.aborted) throw signal.error;
    return result;
  });
}

The automaticExtensionThreshold setting applies to redlock.using(): the library extends the lock once 500 ms or less remain while fn() is still running, which prevents premature expiry during slow downstream calls, and it aborts the signal if an extension fails. Locks obtained with a bare redlock.acquire() are not extended automatically.

Step 5: Acquire the Lock — Java

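import java.util.concurrent.TimeUnit;

import org.redisson.Redisson;
import org.redisson.RedissonRedLock;
import org.redisson.api.RLock;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class RedlockClient {
    // One client per independent node. A single replicated-servers config would
    // treat the nodes as one replication group, not as independent quorum members.
    private final RedissonClient[] clients;

    public RedlockClient(String[] redisAddresses) {
        clients = new RedissonClient[redisAddresses.length];
        for (int i = 0; i < redisAddresses.length; i++) {
            Config config = new Config();
            config.useSingleServer()
                  .setAddress(redisAddresses[i]) // use the rediss:// scheme to enable TLS
                  .setConnectTimeout(80)
                  .setTimeout(80)
                  .setRetryAttempts(3)
                  .setRetryInterval(200)
                  .setPingConnectionInterval(1000);
            clients[i] = Redisson.create(config);
        }
    }

    public boolean tryAcquire(String key, long waitMs, long leaseSec) {
        RLock[] locks = new RLock[clients.length];
        for (int i = 0; i < clients.length; i++) {
            locks[i] = clients[i].getLock(key);
        }
        // RedissonRedLock succeeds once a majority of the per-node locks are held.
        // It is marked deprecated in recent Redisson releases; check your version.
        RLock lock = new RedissonRedLock(locks);
        try {
            // waitMs = 0 for fire-and-forget deduplication; caller handles 409
            return lock.tryLock(waitMs, leaseSec * 1000L, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}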

Step 6: Atomic Lock Release via Lua

Never use a plain DEL. The Lua script below makes the ownership check and eviction atomic, preventing a slow client from deleting a lock that a faster successor already acquired:

-- release.lua: run with `redis-cli --eval release.lua <key> , <token>`
-- (EVAL itself takes the script body, not a file name)
if redis.call("GET", KEYS[1]) == ARGV[1] then
  return redis.call("DEL", KEYS[1])
else
  return 0
end

Run this script on all five nodes regardless of which ones granted the lock. Nodes that never held the lock will return 0, which is safe to ignore.
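The release loop can be sketched as follows. It assumes each client exposes an `eval(script, numkeys, *keys_and_args)` method, as redis-py clients do; the function name and the broad exception handling are illustrative choices, not a prescribed design.

```python
RELEASE_SCRIPT = """
if redis.call("GET", KEYS[1]) == ARGV[1] then
  return redis.call("DEL", KEYS[1])
else
  return 0
end
"""


def release_everywhere(clients, key: str, token: str) -> int:
    """Run the ownership-checked delete on every node; return how many deleted the key."""
    released = 0
    for client in clients:
        try:
            released += int(client.eval(RELEASE_SCRIPT, 1, key, token))
        except Exception:
            # Unreachable or slow node: skip it and let the key expire via its TTL.
            continue
    return released
```

A return value below the quorum size is worth logging, since it suggests the lock had already expired or been taken over on some nodes before release.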

Step 7: Configure TTLs

A safe TTL formula:

TTL_ms = max_processing_time_ms + p99_network_latency_ms + 200

For a payment transaction with a worst-case end-to-end processing time of 300 ms and a 50 ms p99 Redis latency, the formula gives 300 + 50 + 200 = 550 ms; round up to the nearest 100 ms and set TTL = 600 ms. Avoid static TTLs below 300 ms, because GC pauses on JVM runtimes can silently consume 150–200 ms.
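The formula and the rounding step can be expressed as a small helper. The function and parameter names are hypothetical; the 200 ms margin and 100 ms rounding step are the values stated above.

```python
import math


def redlock_ttl_ms(max_processing_ms: int, p99_latency_ms: int,
                   safety_margin_ms: int = 200, step_ms: int = 100) -> int:
    """TTL = processing + latency + margin, rounded up to the next multiple of step_ms."""
    raw = max_processing_ms + p99_latency_ms + safety_margin_ms
    return math.ceil(raw / step_ms) * step_ms


ttl = redlock_ttl_ms(300, 50)  # 550 ms rounds up to 600 ms
```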

Verification & Testing

Simulate a duplicate request in isolation:

# Terminal 1: acquire and hold for 10 seconds
redis-cli SET "dedup:test:v1:abc123" "token-A" NX PX 10000

# Terminal 2: attempt duplicate acquisition — must fail (nil)
redis-cli SET "dedup:test:v1:abc123" "token-B" NX PX 10000
# Expected: (nil)  ← deduplication is working

# Inspect TTL remaining
redis-cli PTTL "dedup:test:v1:abc123"

Simulate quorum failure (two of five nodes down):

# Stop two Redis nodes
docker stop redis4 redis5

# Attempt lock acquisition — should succeed (3/5 quorum still met)
# ... run your application's acquire path ...

# Stop a third node — now only 2/5 reachable; acquisition must fail
docker stop redis3
# Expected: the acquire call fails with an error (Go, Node.js) or returns false (Java)
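The expected outcomes of this test can be reasoned about with an in-memory model before touching real nodes. The sketch below uses a hypothetical `FakeNode` stand-in instead of a Redis client, and it ignores TTLs and timing entirely; it only illustrates majority counting and rollback.

```python
class FakeNode:
    """In-memory stand-in for one Redis node (test double, not a real client)."""

    def __init__(self, up: bool = True):
        self.up = up
        self.data = {}

    def set_nx(self, key: str, token: str) -> bool:
        if not self.up:
            raise ConnectionError("node unreachable")
        if key in self.data:
            return False
        self.data[key] = token
        return True

    def delete_if_owner(self, key: str, token: str) -> None:
        if self.up and self.data.get(key) == token:
            del self.data[key]


def try_acquire(nodes, key: str, token: str) -> bool:
    """Grant the lock only if a strict majority of nodes accepted the key."""
    quorum = len(nodes) // 2 + 1
    granted = 0
    for node in nodes:
        try:
            if node.set_nx(key, token):
                granted += 1
        except ConnectionError:
            continue
    if granted >= quorum:
        return True
    # Quorum missed: roll back partial grants so they do not block other clients.
    for node in nodes:
        node.delete_if_owner(key, token)
    return False
```

With two of five nodes down the call returns True; with three down it returns False and leaves no partial grants behind.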

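Verify watchdog renewal: set a TTL of 2000 ms, hold the lock for 5 seconds, and confirm via PTTL on each node that the expiry resets before it reaches zero. With the Node.js settings above (automaticExtensionThreshold: 500), the reset should occur when roughly 500 ms remain, that is, about every 1500 ms. If PTTL counts down to zero before the 5-second hold completes, the watchdog is misconfigured.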

Verify atomic release: acquire with token-A, attempt the Lua release with token-B (wrong token), and confirm the key still exists:

redis-cli EVAL "if redis.call('GET',KEYS[1])==ARGV[1] then return redis.call('DEL',KEYS[1]) else return 0 end" 1 "dedup:test:v1:abc123" "token-B"
# Expected: (integer) 0  ← key untouched
redis-cli EXISTS "dedup:test:v1:abc123"
# Expected: (integer) 1

Failure Scenarios & Debugging

Failure Scenario	Remediation Steps	Observability Hooks
Quorum loss: 3 of 5 nodes unreachable; all acquisitions fail	Activate circuit breaker routing requests to a fallback idempotency cache (e.g. a PostgreSQL `idempotency_keys` table with a unique index). Return `503` with `Retry-After: 5` rather than dropping requests. Restore quorum within 60 seconds to prevent cache divergence.	`lock_quorum_failure_total` counter; alert if > 5/min. Log `nodes_reached`, `nodes_required`, `key` on every failed acquisition.
Clock drift > 50 ms between Redis nodes; locks expire before processing completes	Run `chronyd` on all Redis hosts with `maxpoll 6`; add a pre-acquisition clock-drift check using `TIME` against each node. Reject acquisition if any node’s reported time diverges > 50 ms from the client’s NTP-synchronized wall clock (a monotonic clock has an arbitrary epoch and cannot be compared with `TIME`).	`redis_clock_drift_ms` gauge per node; alert if > 50 ms. Include `node_time_delta_ms` in acquisition log.
GC pause (JVM/Go) causes watchdog thread to miss renewal window; lock expires mid-transaction	Use bounded thread pools for the watchdog (minimum 2 threads with priority `MAX-1`). Implement a dead-letter queue (DLQ) for requests whose locks expired mid-flight; replay from DLQ after idempotent state reconciliation. Log `renewal_failed: true` and gate downstream DB commits on lock ownership re-validation.	`lock_renewal_failure_total` counter; `lease_remaining_ms` histogram. Alert if `lease_remaining_ms` at renewal attempt < 100 ms.
Stale lock from crashed worker blocks legitimate retries	Set lock TTL to match the maximum realistic processing window — never open-ended. Deploy a background sweeper that runs `SCAN` every 30 seconds and reconciles locks whose associated worker PIDs are no longer alive. Log orphaned lock keys to a structured audit trail.	`orphaned_lock_count` gauge; `lock_age_ms` at sweep time. Page on-call if `orphaned_lock_count` > 0 for > 60 s.
Idempotency key collision across tenants	Namespace keys as `dedup:{tenant}:{env}:{api_version}:{sha256}`. Validate namespace prefix in the acquisition layer before any Redis call; reject malformed keys with `400 Bad Request`.	Log `key_namespace`, `tenant_id`, `api_version` on every acquisition. Alert on unexpected namespace patterns.
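The clock-drift check from the table can be sketched as two pure functions. Redis `TIME` returns a pair of seconds and microseconds (redis-py's `time()` returns the same pair as a tuple); the function names and the way the client timestamp is supplied are assumptions for illustration. The measured delta includes network latency, so treat it as an upper bound on drift.

```python
def node_time_delta_ms(redis_time: tuple, client_epoch_ms: float) -> float:
    """Difference between a node's TIME reply and the client's wall clock, in ms."""
    seconds, micros = redis_time
    node_epoch_ms = seconds * 1000 + micros / 1000
    return node_epoch_ms - client_epoch_ms


def drift_ok(deltas_ms, limit_ms: float = 50.0) -> bool:
    """True only if every node is within the allowed divergence."""
    return all(abs(delta) <= limit_ms for delta in deltas_ms)


# Hypothetical TIME replies from three nodes, compared with one client timestamp.
client_now_ms = 1_700_000_000_000
replies = [(1_700_000_000, 30_000), (1_700_000_000, 5_000), (1_699_999_999, 980_000)]
deltas = [node_time_delta_ms(reply, client_now_ms) for reply in replies]
```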

SRE / Observability Checklist

Instrument these six signals for every Redlock deployment:

lock_acquisition_latency_ms (histogram, p50/p95/p99) — alert if p95 > 50 ms. Tag by service, endpoint, api_version.
lock_quorum_success_rate (gauge, rolling 5-minute window) — alert if < 99.9%. Break down by nodes_reached to identify which node is degraded.
lock_lease_renewal_failure_total (counter) — alert if > 10/min. Include lock_id, lease_remaining_ms, error fields in the log entry.
idempotency_hit_ratio (gauge: hits / (hits + misses)) — baseline during low traffic; alert on > 2× spike, which indicates a retry storm or misconfigured client.
OpenTelemetry span: redlock.acquire → redlock.hold → redlock.renew → redlock.release — propagate lock_id, quorum_nodes, validity_ms, and idempotency_key as span attributes to correlate lock state with downstream DB commits.
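redis_memory_fragmentation_ratio per node: alert if > 1.5. Heavy churn of short-lived lock keys can fragment memory; consider enabling activedefrag and confirm TTLs are set on every key. Avoid eviction policies such as allkeys-lru on lock nodes, because evicting a live lock key silently breaks mutual exclusion; noeviction is the safer setting.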

Emit structured JSON logs at the acquisition boundary:

{
  "event": "lock_acquired",
  "lock_id": "dedup:payments:prod:v2:a3f9...",
  "quorum_nodes": 3,
  "validity_ms": 548,
  "idempotency_key": "a3f9c2...",
  "client_id": "svc-payment-worker-7d8b",
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736"
}
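A log line of this shape could be produced with a helper like the one below. The function name is hypothetical and the field set simply mirrors the sample above; wire the returned string into whatever logger your service already uses.

```python
import json


def lock_acquired_event(lock_id: str, quorum_nodes: int, validity_ms: int,
                        idempotency_key: str, client_id: str, trace_id: str) -> str:
    """Serialize one acquisition event as a single-line JSON log entry."""
    return json.dumps({
        "event": "lock_acquired",
        "lock_id": lock_id,
        "quorum_nodes": quorum_nodes,
        "validity_ms": validity_ms,
        "idempotency_key": idempotency_key,
        "client_id": client_id,
        "trace_id": trace_id,
    }, separators=(",", ":"))


line = lock_acquired_event(
    lock_id="dedup:payments:prod:v2:example",  # placeholder value
    quorum_nodes=3,
    validity_ms=548,
    idempotency_key="example-key",             # placeholder value
    client_id="svc-payment-worker-7d8b",
    trace_id="4bf92f3577b34da6a3ce929d0e0e4736",
)
```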

Distributed Lock Acquisition Patterns — parent page covering the full acquisition pattern landscape, fencing tokens, and contention handling.
Handling Stale Locks in Distributed Systems — deep dive into TTL calibration, watchdog thread design, and recovering from orphaned locks.
Mitigating Thundering Herd During Retry Storms — techniques to prevent mass concurrent lock contention when clients retry after a transient outage.
Using Redis SET NX for Distributed Request Deduplication — single-node Redis deduplication as a simpler alternative when cross-AZ quorum is not required.