Do not allow stale replicas to automatically be promoted to primary

Consider a primary shard `P` hosted on node `p` and its replica shard `Q` hosted on node `q`. If `p` is isolated from the cluster (e.g., through node failure, a flapping NIC, or an excessively long garbage collection pause), `Q` is promoted to primary and indexing operations continue on `q`; these operations are acknowledged to the requesting clients. If `q` is subsequently isolated before `p` rejoins and before a new replica is assigned to another node in the cluster, then when `p` rejoins, `P` can currently be promoted to primary again. Because `P` never received them, the indexing operations acknowledged by `q` will be lost.
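To make the sequence of events concrete, here is a minimal simulation of the scenario. It is only an illustrative sketch: `ShardCopy` and `naive_promote` are hypothetical names invented for this example and do not correspond to any Elasticsearch class or API. It assumes the promotion logic simply picks any reachable copy without checking whether that copy is stale.

```python
from dataclasses import dataclass, field


@dataclass
class ShardCopy:
    """Hypothetical stand-in for one copy of a shard on a node."""
    node: str
    ops: list = field(default_factory=list)
    reachable: bool = True


def naive_promote(copies):
    """Promote the first reachable copy, ignoring staleness (the assumed flaw)."""
    for copy in copies:
        if copy.reachable:
            return copy
    return None


# P on node p and Q on node q start out with identical contents.
P = ShardCopy("p", ops=["op1"])
Q = ShardCopy("q", ops=["op1"])
copies = [P, Q]

P.reachable = False                # p is isolated from the cluster
primary = naive_promote(copies)    # Q is promoted
primary.ops.append("op2")          # op2 is indexed on q and acknowledged

Q.reachable = False                # q is isolated before a new replica exists
P.reachable = True                 # p rejoins
primary = naive_promote(copies)    # stale P is promoted again

# Operations that were acknowledged by q but are missing from the new primary.
lost = [op for op in Q.ops if op not in primary.ops]
```

In this sketch `lost` ends up containing `op2`, the operation that was acknowledged while `Q` was primary.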

A mechanism needs to be built to prevent the automatic promotion of a stale shard in such a scenario and instead only promote a non-stale shard to primary (if a non-stale shard is available). The only scenario in which a stale shard should be promoted to primary is through manual intervention by a system operator (e.g., in cases when `q` suffers a total hardware failure).
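One possible shape for such a mechanism, sketched under stated assumptions: the cluster tracks a set of "in-sync" copies (those known to hold every acknowledged operation), removes a copy from that set when the primary continues indexing without it, and refuses automatic promotion of anything outside the set. The names `choose_primary`, `NoInSyncCopyError`, and `force_stale` are hypothetical and chosen for illustration; this is not a description of how Elasticsearch implements it.

```python
class NoInSyncCopyError(Exception):
    """Raised when no reachable copy is known to be non-stale."""


def choose_primary(reachable, in_sync, force_stale=None):
    """Pick a node to host the primary.

    reachable:   set of nodes currently in the cluster that hold a copy
    in_sync:     set of nodes whose copy has every acknowledged operation
    force_stale: node named explicitly by an operator (manual override)
    """
    if force_stale is not None:
        # Manual intervention: the operator accepts possible data loss.
        if force_stale not in reachable:
            raise ValueError(f"node {force_stale!r} is not reachable")
        return force_stale
    candidates = sorted(reachable & in_sync)
    if not candidates:
        # Leave the shard unassigned rather than promote a stale copy.
        raise NoInSyncCopyError("no non-stale copy available")
    return candidates[0]


# Replaying the scenario from the description.
in_sync = {"p", "q"}

# p is isolated: q is in sync, so Q may be promoted automatically.
first = choose_primary({"q"}, in_sync)
in_sync = {"q"}  # p misses the operations indexed from now on, so it is stale

# q is isolated and p rejoins: automatic promotion of P is refused.
try:
    choose_primary({"p"}, in_sync)
    refused = False
except NoInSyncCopyError:
    refused = True

# q suffers total hardware failure: an operator forces promotion of P.
forced = choose_primary({"p"}, in_sync, force_stale="p")
```

The sketch assumes the in-sync set survives node isolation (for example, by being stored with the cluster metadata); without that, a rejoining node could not be recognized as stale.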

Relates #10933



Do not allow stale replicas to automatically be promoted to primary #14671
