Do not allow stale replicas to automatically be promoted to primary #14671

@jasontedor

Consider a primary shard P hosted on node p and its replica shard Q hosted on node q. If p is isolated from the cluster (e.g., through node failure, a flapping NIC, or an excessively long garbage collection pause), Q can be promoted to primary and indexing operations can continue on q; these operations will be acknowledged to the requesting clients. If q is then isolated before p rejoins and before a new replica is assigned to another node in the cluster, the rejoining of p can currently lead to P, which never received those operations, being promoted to primary again. The indexing operations acknowledged by q will then be lost.
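To make the failure sequence concrete, here is a small, hypothetical simulation in Python. It is not Elasticsearch code. `ShardCopy`, `naive_promote`, and the operation names are illustrative assumptions, and the "naive" policy stands in for any promotion rule that does not check whether a copy is stale.

```python
from dataclasses import dataclass, field


@dataclass
class ShardCopy:
    node: str
    # Indexing operations that have been applied to this copy.
    ops: list = field(default_factory=list)


def naive_promote(reachable):
    # Hypothetical naive policy: promote whichever copy is reachable,
    # with no check of whether it holds every acknowledged operation.
    return reachable[0]


def replay_scenario():
    P = ShardCopy("p", ops=["op1"])
    Q = ShardCopy("q", ops=["op1"])
    acknowledged = ["op1"]

    # p is isolated: Q is promoted and keeps accepting (and acknowledging) writes.
    primary = Q
    for op in ["op2", "op3"]:
        primary.ops.append(op)
        acknowledged.append(op)

    # q is isolated before a new replica is assigned; p rejoins and is promoted.
    primary = naive_promote([P])

    # Acknowledged operations that the new primary does not have are lost.
    lost = [op for op in acknowledged if op not in primary.ops]
    return primary.node, lost
```

Calling `replay_scenario()` returns `("p", ["op2", "op3"])`: the stale copy on p becomes primary and the two writes acknowledged by q disappear.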

A mechanism needs to be built to prevent the automatic promotion of a stale shard in such a scenario, and instead to promote only a non-stale shard to primary (if a non-stale shard is available). The only scenario in which a stale shard should be promoted to primary is through manual intervention by a system operator (e.g., when q suffers a total hardware failure).
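One possible shape for such a mechanism, assumed here purely for illustration, is for the master to track a set of "in-sync" copies known to hold every acknowledged write, and to promote only from that set. The sketch below is hypothetical: `ShardRouting`, `mark_stale`, `promote`, and the `forced` override are invented names, not an existing Elasticsearch API. It assumes that the acting primary calls `mark_stale` for any copy it fails to replicate a write to, before acknowledging that write.

```python
class ShardRouting:
    """Hypothetical master-side bookkeeping for one shard's copies."""

    def __init__(self, copies):
        # Copies believed to hold every acknowledged write.
        self.in_sync = set(copies)
        self.primary = None

    def mark_stale(self, copy_id):
        # Assumed to be called by the acting primary when a write could not
        # be replicated to copy_id, before that write is acknowledged.
        self.in_sync.discard(copy_id)

    def promote(self, reachable, forced=None):
        """Choose a primary from the reachable copies, never a stale one unless forced."""
        candidates = sorted(c for c in reachable if c in self.in_sync)
        if candidates:
            self.primary = candidates[0]
        elif forced is not None and forced in reachable:
            # Manual operator override: accept that acknowledged writes may be lost.
            self.primary = forced
            self.in_sync = {forced}
        else:
            # No in-sync copy is reachable: leave the shard unassigned rather than lose data.
            self.primary = None
        return self.primary
```

Replaying the scenario from the description: `promote(["P", "Q"])` picks `"P"`; after p is isolated, `promote(["Q"])` picks `"Q"`, and Q's first write calls `mark_stale("P")`. When q is isolated and p rejoins, `promote(["P"])` returns `None`, so the shard stays unassigned until q returns or an operator calls `promote(["P"], forced="P")`.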

Relates #10933
