-
Notifications
You must be signed in to change notification settings - Fork 25.4k
Remove INDEX_REFRESH_BLOCK after index becomes searchable #120807
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove INDEX_REFRESH_BLOCK after index becomes searchable #120807
Conversation
This commit enhances the ShardStartedClusterStateTaskExecutor by introducing functionality to automatically remove the INDEX_REFRESH_BLOCK once an index becomes searchable. The change ensures search availability by checking that at least one copy of each searchable shard is available whenever an unpromotable shard is started. Once this condition is met, the INDEX_REFRESH_BLOCK is removed. Closes ES-10278
Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing) |
I'll enhance the integration tests in a separate PR. |
Hi @fcofdez, I've created a changelog YAML for you. |
…thub.com:fcofdez/elasticsearch into clear-refresh-block-once-unpromotables-available
server/src/main/java/org/elasticsearch/cluster/action/shard/ShardStateAction.java
Show resolved
Hide resolved
ClusterBlocks.Builder clusterBlocksBuilder = null; | ||
for (Index indexWithUnpromotableShardsStarted : indicesWithUnpromotableShardsStarted) { | ||
String indexName = indexWithUnpromotableShardsStarted.getName(); | ||
assert clusterState.blocks().hasIndexBlock(indexName, INDEX_REFRESH_BLOCK); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
assert clusterState.blocks().hasIndexBlock(indexName, INDEX_REFRESH_BLOCK); | |
assert clusterState.blocks().hasIndexBlock(indexName, INDEX_REFRESH_BLOCK) : indexWithUnpromotableShardsStarted; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in b766d8e
server/src/main/java/org/elasticsearch/cluster/action/shard/ShardStateAction.java
Outdated
Show resolved
Hide resolved
@@ -776,6 +789,35 @@ public ClusterState execute(BatchExecutionContext<StartedShardUpdateTask> batchE | |||
return maybeUpdatedState; | |||
} | |||
|
|||
private ClusterState maybeRemoveIndexRefreshBlocks( | |||
ClusterState clusterState, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we document that the cluster state here is already updated with the STARTED
state of search shards?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in b766d8e
server/src/main/java/org/elasticsearch/cluster/action/shard/ShardStateAction.java
Show resolved
Hide resolved
…nce-unpromotables-available
@elasticmachine update branch |
@elasticmachine update branch |
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@elasticmachine update branch |
@@ -760,7 +772,10 @@ public ClusterState execute(BatchExecutionContext<StartedShardUpdateTask> batchE | |||
maybeUpdatedState = ClusterState.builder(maybeUpdatedState).metadata(metadataBuilder).build(); | |||
} | |||
|
|||
maybeUpdatedState = maybeRemoveIndexRefreshBlocks(maybeUpdatedState, indicesWithUnpromotableShardsStarted); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with this going in for now, it is a good improvement as is.
I wonder though if we should follow-up with a change to remove the refresh block in a subsequent cluster state update. That would avoid problems with a coordinator not seeing the shard started message but an indexing node seeing the refresh block removed, which I think could cause a refresh to be ack'ed and a subsequent search not seeing the refreshed data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would avoid problems with a coordinator not seeing the shard started message but an indexing node seeing the refresh block removed, which I think could cause a refresh to be ack'ed and a subsequent search not seeing the refreshed data?
I believe that there's the same risk with the deferred removal, right? The cluster state update with the unblocked index might be applied a bit later on the search node anyway, unless you're proposing to react to the shard started events differently on the search nodes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the shard is marked as started as soon as it is ready to serve searches, so that should be fine too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the last part, I think we would let the coordinator of the search assume an empty result when refresh is blocked?
Maybe you are suggesting to only do so for unassigned shards, whereas we could send the shard search request to the initializing shard, which could then make the determination. I think that would be safe too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the first part, I would rely on cluster state updates trying hard to be applied everywhere before proceeding with the next update. This means we would be sure to see the shard started everywhere before the refresh block is removed.
Implicit in that is that I thought we would then forward search requests always when there is a started shard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could send the shard search request to the initializing shard, which could then make the determination. I think that would be safe too.
This would come with the downside of risking sending search requests to search nodes where the shard is still unassigned. I think that is possible to handle.
But it seems there are two options and the trade-off is not entirely clear.
(but as mentioned I am ok with doing what is in this PR for now and then let us tackle that other piece).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(but as mentioned I am ok with doing what is in this PR for now and then let us tackle that other piece).
Sounds good, I'll open a follow up ticket for this.
@elasticmachine update branch |
…astic#120807)" This reverts commit ae0f1a6. The refresh block would be removed in a subsequent cluster state update instead of removing it immediately after an index is ready for searches. Closes ES-10697
…astic#120807)" (elastic#121427) This reverts commit ae0f1a6. The refresh block would be removed in a subsequent cluster state update instead of removing it immediately after an index is ready for searches. Closes ES-10697
This commit enhances the ShardStartedClusterStateTaskExecutor by
introducing functionality to automatically remove the
INDEX_REFRESH_BLOCK once an index becomes searchable.
The change ensures search availability by checking that at least one
copy of each searchable shard is available whenever an unpromotable
shard is started. Once this condition is met, the INDEX_REFRESH_BLOCK
is removed.
Closes ES-10278