What is the difference between a distributed lock and an idempotency key?

An idempotency key identifies a logical operation and lets a server return a cached response to duplicates. A distributed lock serialises concurrent access so only one thread executes the operation at a time. Both are needed: the lock prevents the race condition; the idempotency key handles the retry after the winner completes.

When should I use Redlock versus a single-instance Redis lock?

Use single-instance Redis SET NX for non-critical coordination where losing a lock during replica failover is acceptable. Use Redlock (a majority quorum across 5 or more independent Redis masters) for financial transactions or any operation where a lock granted to two holders at once (split brain) would cause double execution.
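The quorum rule can be sketched as follows. This is a minimal, single-process illustration of the Redlock acquisition steps, assuming hypothetical in-memory `FakeRedisNode` objects in place of real Redis clients and an assumed drift allowance of 1% of the TTL plus 2 ms. A production client would also put a short timeout on each node call and perform the token-checked release atomically (for example in a Lua script).

```python
import time
import uuid
from typing import Dict, List, Optional


class FakeRedisNode:
    """Hypothetical in-memory stand-in for one independent Redis master.

    Expiry is not modelled here; a real node would apply the PX TTL itself.
    """

    def __init__(self, up: bool = True) -> None:
        self.up = up
        self.data: Dict[str, str] = {}

    def set_nx_px(self, key: str, token: str, ttl_ms: int) -> bool:
        # Analogue of SET key token NX PX ttl_ms: succeeds only if the key is absent.
        if not self.up or key in self.data:
            return False
        self.data[key] = token
        return True

    def release_if_token(self, key: str, token: str) -> None:
        # Delete the lock only if we own it (Redis would do this check atomically).
        if self.up and self.data.get(key) == token:
            del self.data[key]


def redlock_acquire(nodes: List[FakeRedisNode], key: str, ttl_ms: int,
                    drift_factor: float = 0.01) -> Optional[str]:
    """Return a lock token if a majority granted the lock in time, else None."""
    token = uuid.uuid4().hex          # unique per acquisition, so release is safe
    quorum = len(nodes) // 2 + 1      # e.g. 3 of 5
    start = time.monotonic()
    granted = sum(1 for node in nodes if node.set_nx_px(key, token, ttl_ms))
    elapsed_ms = (time.monotonic() - start) * 1000
    # Validity left after subtracting acquisition time and a clock-drift allowance.
    validity_ms = ttl_ms - elapsed_ms - (ttl_ms * drift_factor + 2)
    if granted >= quorum and validity_ms > 0:
        return token
    # No quorum: undo partial grants so other workers are not blocked until TTL expiry.
    for node in nodes:
        node.release_if_token(key, token)
    return None
```

With two of five nodes down, three grants still form a majority; with three down, acquisition fails and the two partial grants are released.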

How do I prevent lock starvation under retry storms?

Apply exponential backoff with jitter before each lock acquisition attempt, set a maximum retry budget, and implement circuit breakers at the service mesh layer so a degraded lock store does not amplify load.
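A minimal sketch of the client side of this advice, assuming a caller-supplied `try_acquire` callable (hypothetical) that returns `True` when the lock is taken. It uses "full jitter" (a uniform random delay up to the exponential bound) and a fixed attempt budget; `sleep` and `rng` are injectable only so the behaviour can be tested.

```python
import random
import time
from typing import Callable


def acquire_with_backoff(
    try_acquire: Callable[[], bool],
    max_attempts: int = 5,        # retry budget
    base_s: float = 0.05,
    cap_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Try to take a lock, sleeping with full jitter between attempts."""
    for attempt in range(max_attempts):
        if try_acquire():
            return True
        if attempt < max_attempts - 1:
            # Full jitter: random delay in [0, min(cap, base * 2**attempt)).
            sleep(rng() * min(cap_s, base_s * 2 ** attempt))
    return False  # budget exhausted: fail fast instead of queuing on the lock store
```

Randomising the whole delay, rather than adding a small jitter to a fixed delay, spreads retries from many clients across the window so they do not re-collide on the lock store.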

Distributed Coordination & Locking Strategies

Distributed coordination is the engineering contract that ensures concurrent retries of the same logical operation reach the same terminal state exactly once, regardless of how many nodes race to execute it. It solves a gap that idempotency keys alone cannot close: the window between when a request is received and when the idempotency record is durably committed. Without explicit coordination, two threads can each observe a missing key, both decide to proceed, and both mutate downstream state — producing duplicates, double-charges, or split ledger entries that are expensive to reconcile.

This page is the engineering reference for the coordination layer. It maps the request lifecycle from load balancer to database, identifies every seam where concurrent execution can diverge, and links to the implementation-specific guides that cover each mechanism in depth.

Engineering Contract

The coordination layer owns one precise guarantee: serialised access to a shared resource across nodes that have no shared memory. It prevents the check-then-act race condition inherent in distributed systems — where the interval between check(key_exists) and insert(key) can be observed by a second thread on a different node.

The failure mode this layer prevents is phantom duplicate execution: a state transition that the system believes happened once but that two competing workers actually executed. In payment systems this means double settlement. In inventory systems it means overselling. In event pipelines it means duplicated downstream triggers that cascade through fan-out consumers.

The contract holds under:

Concurrent retries from the same client session (timeout + retry)
Concurrent deliveries from a message broker (at-least-once delivery semantics)
Duplicate forwarding from load balancers that resend a timed-out request to a second backend node

The contract degrades under network partition when quorum cannot be reached — see the Failure Boundary Map section for the precise degradation modes.

Conceptual Architecture: Request Lifecycle Through the Coordination Layer

The diagram below traces a single logical operation — client sends a payment request, times out, and retries — through every layer that must enforce coordination. The critical path is the serialisation gate at the coordination layer before any downstream mutation occurs.

The key insight the diagram illustrates: the coordination layer must both serialise concurrent access (so two threads cannot simultaneously observe the same key as absent) and cache the committed result (so late arrivals receive the original response without re-executing). Neither mechanism alone is sufficient.
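Both mechanisms can be seen together in a minimal single-process sketch. `InMemoryCoordinator` is a hypothetical stand-in for a lock store plus an idempotency store, and the status codes (409 while another worker holds the gate, 200 for a replayed result, 201 for first execution) are assumptions chosen for illustration.

```python
import threading
from typing import Callable, Dict, Optional, Set, Tuple


class InMemoryCoordinator:
    """Hypothetical single-process stand-in for a lock store plus idempotency store."""

    def __init__(self) -> None:
        self._guard = threading.Lock()  # makes each call atomic, like one Redis command
        self._held: Set[str] = set()
        self._records: Dict[str, Tuple[str, object]] = {}

    def try_lock(self, key: str) -> bool:
        # Analogue of SET lock:<key> <token> NX: only one caller can hold the gate.
        with self._guard:
            if key in self._held:
                return False
            self._held.add(key)
            return True

    def unlock(self, key: str) -> None:
        with self._guard:
            self._held.discard(key)

    def get(self, key: str) -> Optional[Tuple[str, object]]:
        with self._guard:
            return self._records.get(key)

    def put(self, key: str, state: str, response: object) -> None:
        with self._guard:
            self._records[key] = (state, response)


def handle(coord: InMemoryCoordinator, key: str,
           operation: Callable[[], object]) -> Tuple[int, object]:
    if not coord.try_lock(key):
        # Serialisation: another worker is inside the gate; client should back off.
        return 409, "operation in progress"
    try:
        record = coord.get(key)
        if record is not None and record[0] == "COMPLETED":
            # Caching: a late arrival gets the committed response, no re-execution.
            return 200, record[1]
        # No record, or a PENDING one left by a crashed holder: this sketch executes.
        coord.put(key, "PENDING", None)
        response = operation()  # if this raises, the record stays PENDING
        coord.put(key, "COMPLETED", response)
        return 201, response
    finally:
        coord.unlock(key)
```

Removing `try_lock` reopens the check-then-act window; removing the COMPLETED check means a retry arriving after the winner releases the lock executes the operation again.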

Failure Boundary Map

Each architectural layer introduces its own failure modes that the coordination contract must survive. The table below enumerates the layer, the failure class, and the minimum mitigation required.

Layer	Failure Mode	Coordination Impact	Minimum Mitigation
Load Balancer	Duplicate forwarding on upstream timeout	Same request hits two backend replicas simultaneously	Sticky-session affinity by `Idempotency-Key` hash or backend-level mutex
API Gateway	Retry-on-5xx before reaching idempotency store	Gateway amplifies in-flight requests during store degradation	Circuit breaker on store health; gateway must forward original key, not re-generate
Service Mesh	Retry policies re-enter the coordination gate	Envoy/Linkerd retries bypass gateway-layer deduplication	Propagate `Idempotency-Key` header through all mesh hops; enforce at each sidecar
Coordination Store (Redis/ZooKeeper)	Leader election lag or replica promotion during lock hold	With asynchronous replication the promoted primary may never have received the lock, so a second worker can acquire it while the original holder still believes it holds it	Use Redlock quorum (5 nodes) or ZooKeeper ephemeral znodes; set lease TTL ≤ 30 seconds
Idempotency Store (DB/KV)	Replication lag on replica read	`PENDING` key written to primary not yet visible on replica	Direct all idempotency reads to the primary or use quorum reads; add a 100–200 ms replication buffer before a retry is accepted
Downstream Service	Non-deterministic external calls (third-party payment gateway)	External provider generates unique transaction IDs per call	Confirm provider supports its own idempotency contract; store external transaction ID on first success
Background Workers	Orphaned `PENDING` records after worker crash	Subsequent retries see `PENDING` indefinitely	Reconciliation worker with 5-minute scan interval; force-expire records older than `max_execution_ttl`

Partial failure scenarios in detail

Lock held, downstream call fails: The coordination lock is acquired, the idempotency record is set to PENDING, and the downstream mutation fails with a 503. The lock is released, or expires after its 30-second TTL if the holder crashed. The next retry re-acquires the lock, finds the record still PENDING, and must decide whether to re-execute or report failure. The answer depends on whether the downstream service is itself idempotent (safe to call again with the same request): if it is, re-execute; if not, force the record to FAILED and require client-initiated recovery.
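The decision a retry faces after re-acquiring the lock can be written as a small pure function. This is a sketch assuming the record states named on this page (`PENDING`, `COMPLETED`, `FAILED`, or no record) and a hypothetical `downstream_idempotent` flag that the caller knows per downstream service.

```python
from enum import Enum
from typing import Optional


class Recovery(Enum):
    EXECUTE = "execute"                # no record yet: first attempt
    REPLAY = "replay"                  # COMPLETED: return the cached response
    RE_EXECUTE = "re_execute"          # PENDING and downstream is idempotent
    MARK_FAILED = "mark_failed"        # PENDING and downstream is not idempotent
    REPORT_FAILURE = "report_failure"  # FAILED: client-initiated recovery required


def recover(state: Optional[str], downstream_idempotent: bool) -> Recovery:
    """Decide what a retry that has just re-acquired the lock should do."""
    if state is None:
        return Recovery.EXECUTE
    if state == "COMPLETED":
        return Recovery.REPLAY
    if state == "FAILED":
        return Recovery.REPORT_FAILURE
    if state == "PENDING":
        # We cannot tell whether the previous holder's downstream call took effect,
        # so re-executing is only safe if that call is itself idempotent.
        return Recovery.RE_EXECUTE if downstream_idempotent else Recovery.MARK_FAILED
    raise ValueError(f"unknown record state: {state!r}")
```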

Replication split-brain: Two availability zones lose quorum connectivity, and both accept the same idempotency key for the same logical operation. When the partition heals, reconciliation must detect the conflict (two COMPLETED records for the same key) and apply a deterministic conflict-resolution rule, typically last-write-wins by wall-clock timestamp. That rule requires wall clocks synchronised to within 100 ms (NTP with bounded drift), and it only decides which record survives: the side effect already executed in the other zone still has to be compensated.
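A sketch of a deterministic last-write-wins rule, assuming a hypothetical `IdempotencyRecord` shape with a millisecond wall-clock timestamp and the writing zone's name. The zone name is an assumed tie-breaker: without one, two records with equal timestamps could resolve differently depending on which node runs the comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdempotencyRecord:
    key: str
    state: str
    written_at_ms: int  # wall-clock write time; only meaningful if clocks are synced
    zone: str           # tie-breaker so every node picks the same winner


def is_conflict(a: IdempotencyRecord, b: IdempotencyRecord) -> bool:
    # Two distinct COMPLETED records for one key mean the operation ran twice.
    return a.key == b.key and a.state == b.state == "COMPLETED" and a != b


def resolve(a: IdempotencyRecord, b: IdempotencyRecord) -> IdempotencyRecord:
    # Last-write-wins; equal timestamps fall back to zone name, so the result
    # does not depend on argument order.
    return max(a, b, key=lambda r: (r.written_at_ms, r.zone))
```

The rule only picks the surviving record; `is_conflict` returning `True` should also trigger whatever compensation the duplicated side effect requires.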

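Clock skew and TTL decay: An idempotency key with a 5-minute TTL is written on a node whose clock is 90 seconds behind, so its absolute expiry timestamp is already 90 seconds too early. A replica whose clock is 90 seconds ahead compares its own time against that timestamp, treats the key as expired after only 2 minutes of real time, and evicts it. The next retry finds no key and re-executes. Avoid comparing absolute wall-clock expiry timestamps across nodes; use monotonic logical clocks or vector timestamps for TTL enforcement in multi-node deployments.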
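The direction of the skew decides whether keys die early or linger. A minimal worked model, assuming the writer stores expiry as an absolute timestamp from its own clock and a reader evicts by comparing against its own clock (the function name is illustrative):

```python
def effective_ttl_s(ttl_s: float, writer_skew_s: float, reader_skew_s: float) -> float:
    """Real time a key survives when expiry is stored as an absolute timestamp.

    Skews are clock error relative to true time; positive means the clock runs ahead.
    Writer stores:  expires_at = (t_write + writer_skew) + ttl
    Reader evicts:  (t_now + reader_skew) >= expires_at
    =>  t_now - t_write >= ttl + writer_skew - reader_skew
    """
    return ttl_s + writer_skew_s - reader_skew_s


print(effective_ttl_s(300, -90, 90))   # writer behind, reader ahead: 120
print(effective_ttl_s(300, 90, -90))   # writer ahead, reader behind: 480
```

A 90-second skew on each side shortens a 5-minute key to 2 minutes in one direction and stretches it to 8 minutes in the other; only the first direction causes premature eviction and re-execution.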

Implementation Patterns

The coordination layer decomposes into four distinct implementation concerns. Each is covered in depth by a dedicated page in this section.

Pattern	Guarantee	Throughput Impact	Best For
Distributed Lock Acquisition Patterns	Pessimistic mutual exclusion; zero phantom duplicates while lock is held	High (head-of-line blocking under contention)	High-value single-key operations: payment authorisation, inventory reservation
Lock Timeout & Lease Management	Bounded lease TTL prevents indefinite blocking after holder crash	Medium (orphan recovery adds background I/O)	Any long-running operation where the lock holder may crash mid-execution
Preventing Race Conditions in Microservices	Optimistic concurrency via conditional writes; no blocking	Low (fails fast on conflict, retries from client)	High-throughput APIs where most requests are unique; conflicts are rare
Consensus Algorithms for Deduplication	Linearisable quorum writes; survives partial node failures	Very high (every write waits for a quorum round-trip; the cost of multi-AZ strong consistency)	Multi-AZ financial settlements; regulatory-grade audit trails

How the patterns compose

These patterns are not mutually exclusive. A production payment API typically layers them:

Gateway-level key validation rejects malformed keys, and keys reused with a different request payload, before they reach the service.
Pessimistic lock acquisition (via Redis SET NX or Redlock) serialises concurrent threads targeting the same key.
Lease-bounded TTL (via lock timeout management) ensures orphaned locks expire within 30 seconds without blocking the retry queue indefinitely.
Optimistic conditional writes (via INSERT ... ON CONFLICT DO NOTHING against a unique constraint, or a DynamoDB ConditionExpression) persist the idempotency record atomically, providing a storage-level backstop if the lock is lost.
Raft/Paxos quorum acknowledgement (via consensus-based deduplication) ensures the committed record is durable on a majority of nodes before the response is returned, so any subsequent linearisable read observes it.

Trade-off Matrix

Choosing a coordination backend is a decision with hard latency, cost, and durability implications. The matrix below reflects measured production numbers, not theoretical bounds.

Backend	Consistency Model	p50 Lock Latency	p99 Lock Latency	Multi-AZ Capable	Cost Class	Recommended Workload
Single-node Redis `SET NX EX`	Eventual (single master)	< 1 ms	3–5 ms	No (failover gap)	Low	Non-critical, high-throughput deduplication where a short failover window is tolerable
Redlock (5 × independent Redis nodes)	Quorum (majority)	2–4 ms	8–15 ms	Yes	Medium	Payment authorisation; any operation where split-brain lock grant causes data loss
ZooKeeper ephemeral znodes	Sequential (ZAB protocol)	4–8 ms	20–40 ms	Yes (3+ nodes)	Medium	Long-lived leader election; saga orchestrator coordination
PostgreSQL advisory locks	Serialisable (ACID)	1–3 ms	10–30 ms	No (single writer)	Low-Medium	Low-concurrency critical paths that already use Postgres for idempotency state
etcd (Raft)	Linearisable	2–5 ms	15–30 ms	Yes (3–5 nodes)	Medium	Kubernetes-native environments; cluster-wide singleton coordination
DynamoDB conditional write	Linearisable (single-key, within a region)	5–12 ms	20–50 ms	Yes (multi-AZ within a region; default global tables resolve cross-region conflicts last-writer-wins)	Medium-High	Serverless / globally distributed APIs; no lock server to operate

Latency budget rule of thumb: Budget the coordination round-trip as an additive overhead on every write path, and count each round-trip separately (lock acquire, record write, and lock release are often distinct calls). A payment API with a 200 ms SLA and 80 ms of business logic leaves 120 ms for coordination. Redlock at a p99 of 15 ms is safe, and Raft-based etcd at a p99 of 30 ms is safe. ZooKeeper at a p99 of 40 ms starts to squeeze the budget under concurrent load: three round-trips at that p99 consume the full 120 ms.
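The arithmetic can be made explicit. This sketch assumes three coordination round-trips per write (acquire, record write, release) and sums per-call p99 values, which is a conservative estimate because tail latencies rarely coincide on every call; the p99 figures are taken from the matrix above.

```python
def coordination_headroom_ms(sla_ms: float, business_ms: float,
                             p99_ms: float, round_trips: int) -> float:
    """Budget left after paying p99 latency on every coordination round-trip."""
    budget = sla_ms - business_ms
    return budget - p99_ms * round_trips


BACKENDS_P99_MS = {"redlock": 15, "etcd": 30, "zookeeper": 40}

for name, p99 in BACKENDS_P99_MS.items():
    print(name, coordination_headroom_ms(200, 80, p99, round_trips=3))
# redlock 75
# etcd 30
# zookeeper 0
```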

Anti-patterns & Pitfalls

1. Unbounded lock TTL (“immortal locks”)

Setting no TTL or an excessively large TTL (> 5 minutes) on a distributed lock means that a node crash mid-execution stalls all subsequent retries indefinitely. The idempotency record remains PENDING and the downstream mutation never completes. Set a lock TTL of 1.5× the p99 execution time of the operation, with an absolute ceiling of 30 seconds; an operation whose p99 exceeds 20 seconds cannot satisfy both limits and needs a renewable lease (see Lock Timeout & Lease Management) or to be split into shorter steps. Background reconciliation workers must scan for PENDING records older than this threshold and force-terminate them.
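A small helper makes the TTL rule, and its conflict with the 30-second ceiling, explicit. This is a sketch; the function name and the `needs_renewal` flag are illustrative assumptions, not part of any library.

```python
from typing import Tuple


def lock_ttl_s(p99_exec_s: float, multiplier: float = 1.5,
               ceiling_s: float = 30.0) -> Tuple[float, bool]:
    """Return (ttl_seconds, needs_renewal).

    needs_renewal is True when multiplier * p99 would exceed the ceiling: clamping
    the TTL would let the lock expire mid-execution, so such operations need a
    renewable lease (or to be split) rather than a larger TTL.
    """
    wanted = p99_exec_s * multiplier
    if wanted > ceiling_s:
        return ceiling_s, True
    return wanted, False
```

For Redis, the TTL would then be passed in milliseconds via the PX argument, e.g. `int(ttl * 1000)`.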

2. Lock-free idempotency store writes

Inserting an idempotency key without first acquiring a lock (or using an atomic conditional write) means the check-then-act race condition is unprotected. Two concurrent threads both observe a cache miss, both execute the downstream mutation, and the second insert silently overwrites the first. The deduplication record exists, but two mutations occurred. Always use SET NX (Redis), INSERT ... ON CONFLICT DO NOTHING (PostgreSQL), or ConditionExpression: "attribute_not_exists(pk)" (DynamoDB) — never a plain SET or INSERT.
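The conditional-insert pattern can be demonstrated with SQLite, which supports the same `INSERT ... ON CONFLICT DO NOTHING` form as PostgreSQL (SQLite 3.24 or later). This is a sketch; the table and column names are assumptions. The unique `PRIMARY KEY` is what makes the write atomic: the caller learns from the affected-row count whether it created the record.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The PRIMARY KEY constraint is what makes the conditional insert atomic.
    conn.execute(
        "CREATE TABLE idempotency_keys ("
        " idem_key TEXT PRIMARY KEY,"
        " state TEXT NOT NULL)"
    )
    return conn


def claim(conn: sqlite3.Connection, idem_key: str) -> bool:
    """Insert a PENDING record; return True only for the caller that created it."""
    cur = conn.execute(
        "INSERT INTO idempotency_keys (idem_key, state) VALUES (?, 'PENDING') "
        "ON CONFLICT(idem_key) DO NOTHING",
        (idem_key,),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 means the key already existed: do not execute
```

Only the caller for which `claim` returns `True` proceeds to the downstream mutation; every other caller reads the existing record instead of overwriting it.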

3. Re-generating idempotency keys on retry

A client that generates a new UUID for each retry attempt bypasses the entire coordination layer. Each retry appears as a novel request; each executes the downstream mutation. Idempotency keys must be stable across retries for the same logical operation: generated once by the client, stored client-side (e.g. in local state, the message envelope, or the job payload), and forwarded verbatim on every retry. See idempotency key generation strategies for key derivation patterns.
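A minimal client-side sketch, assuming a hypothetical `send(key, amount)` transport that raises `TimeoutError` on timeout. The key is created once per logical operation and passed verbatim on every attempt. Here it lives in a local variable, so it only survives retries within one process; retries that can outlive the process need the key persisted, e.g. in the job payload.

```python
import uuid
from typing import Callable


def pay_with_retries(send: Callable[[str, int], str], amount_cents: int,
                     max_attempts: int = 3) -> str:
    """Send one logical payment, reusing a single idempotency key on every retry."""
    idempotency_key = str(uuid.uuid4())  # generated once, before the first attempt
    last_error: Exception = TimeoutError("no attempt made")
    for _ in range(max_attempts):
        try:
            return send(idempotency_key, amount_cents)  # same key, verbatim
        except TimeoutError as exc:
            last_error = exc  # retrying is safe: the server deduplicates on the key
    raise last_error
```

Moving the `uuid.uuid4()` call inside the loop would reproduce the anti-pattern: each attempt would look like a new operation to the server.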

4. Ignoring the `PENDING` state in responses

Returning 202 Accepted, or 503 Service Unavailable with a Retry-After header, when the idempotency store contains a PENDING record is correct behaviour, but many clients treat any non-200 as a failure and retry immediately. Without exponential backoff with jitter on the client side, this collapses into a retry storm that saturates the lock store. Clients must honour Retry-After headers and implement bounded backoff before re-attempting a PENDING operation.
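One way a client can combine both rules is sketched below: compute a bounded full-jitter backoff, but never wait less than the server's `Retry-After`. The function name and defaults are assumptions. `Retry-After` may be either a number of seconds or an HTTP date; this sketch parses only the seconds form and falls back to backoff alone otherwise.

```python
import random
from typing import Callable, Optional


def retry_delay_s(attempt: int, retry_after: Optional[str],
                  base_s: float = 0.1, cap_s: float = 10.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Delay before re-polling a PENDING operation."""
    # Bounded full-jitter backoff: uniform in [0, min(cap, base * 2**attempt)).
    backoff = rng() * min(cap_s, base_s * 2 ** attempt)
    if retry_after is not None and retry_after.strip().isdigit():
        # Honour the server's hint, but keep the jitter if it is longer.
        return max(float(retry_after.strip()), backoff)
    return backoff  # no header, or HTTP-date form (not parsed in this sketch)
```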

5. Treating HTTP `PUT` as coordination-free

PUT is idempotent by specification, but idempotency-by-spec is not the same as coordination-safe in practice. Two concurrent PUT requests with conflicting payloads for the same resource both satisfy the HTTP contract individually, but the last-writer-wins outcome may violate the business invariant. Resources with business-critical state (e.g. account balances, reservation slots) require the same coordination gate as POST endpoints.

6. Skipping coordination in message-broker consumers

Message brokers (Kafka, SQS, RabbitMQ) guarantee at-least-once delivery. Consumer code that directly executes side effects without coordination assumes the broker will never redeliver — an assumption that breaks during consumer group rebalances, broker failovers, and visibility timeout expiry. Apply the same lock-acquire-check-execute-release cycle in consumer handlers as in HTTP handlers. Webhook delivery has its own page covering broker-specific patterns.

Production Readiness Checklist

Every POST and PATCH endpoint enforces an Idempotency-Key header; reject requests missing the header with 400 Bad Request and a descriptive error.code: missing_idempotency_key.
All idempotency key insertions use an atomic conditional write (SET NX, INSERT ON CONFLICT, ConditionExpression); no plain overwriting SET or INSERT exists in any code path.
Every distributed lock is acquired with a TTL ≤ 30 seconds; no lock acquisition call omits the EX / PX argument.
A background reconciliation worker runs on a ≤ 5-minute interval, scans for PENDING records older than max_execution_ttl, queries downstream systems for terminal state, and forces records to COMPLETED or FAILED.
The idempotency store replication lag is monitored; alert fires if replica lag exceeds 500 ms on any node handling coordination reads.
The lock store (Redis / ZooKeeper / etcd) has a health-check circuit breaker at the service mesh layer; requests fast-fail with 503 rather than queuing against a degraded lock store.
Idempotency logs store only hashed keys and terminal state transitions — no raw PANs, authentication tokens, or PII in the deduplication record.
Chaos engineering scenarios include: lock store leader failure mid-hold; idempotency store replica promotion; clock skew injection of ± 2 seconds; concurrent 50-thread retry storms against the same key.
Observability exports all four core metrics: idempotency.hit_rate, idempotency.lock_contention_p99, idempotency.storage_write_p99, idempotency.reconciliation_backlog_size.
SRE runbook documents manual recovery for each scenario in the Failure Boundary Map table above; runbook is reviewed quarterly and after every incident involving coordination failures.
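The header-enforcement and hashed-logging items of the checklist can be sketched as two small functions. The error body shape follows the `error.code: missing_idempotency_key` convention above; the function names and message text are assumptions, and real frameworks usually supply a case-insensitive header mapping, whereas the plain dict in the tests is case-sensitive.

```python
import hashlib
from typing import Mapping, Optional, Tuple

# Methods the checklist requires to carry an Idempotency-Key header.
KEYED_METHODS = {"POST", "PATCH"}


def check_idempotency_header(method: str,
                             headers: Mapping[str, str]) -> Optional[Tuple[int, dict]]:
    """Return (status, body) to reject the request, or None to let it through."""
    if method.upper() not in KEYED_METHODS:
        return None
    key = headers.get("Idempotency-Key", "")
    if not key.strip():
        return 400, {"error": {
            "code": "missing_idempotency_key",
            "message": "POST and PATCH requests require an Idempotency-Key header.",
        }}
    return None


def log_safe_key(key: str) -> str:
    """SHA-256 digest for logs, so the raw key never appears in the deduplication log."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()
```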

Child pages in this section:

Distributed Lock Acquisition Patterns — Pessimistic mutex strategies using Redis SET NX, Redlock, and ZooKeeper ephemeral znodes; when each is appropriate.
Lock Timeout & Lease Management — How to bound lock lifetimes, detect orphaned leases, and recover safely without risking double-execution.
Preventing Race Conditions in Microservices — Optimistic concurrency control, conditional writes, and thundering-herd mitigation for high-throughput APIs.
Consensus Algorithms for Deduplication — Raft and Paxos quorum writes for multi-AZ linearisable idempotency guarantees.

Sibling sections:

Idempotency Fundamentals & API Guarantees — The foundational contract: HTTP method semantics, key generation, retry logic, and webhook delivery guarantees.
Backend Implementation & Storage Patterns — Storage-layer implementation: Redis cache-based deduplication, database unique constraints, and transaction-scoped atomic operations.
Observability & Operations for Idempotent Systems — instrument, trace, and chaos-test the coordination layer so lock loss and stale leases surface before they cause double-execution.