Skip to content

Remove INDEX_REFRESH_BLOCK after index becomes searchable #120807

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

fcofdez
Copy link
Contributor

@fcofdez fcofdez commented Jan 24, 2025

This commit enhances the ShardStartedClusterStateTaskExecutor by
introducing functionality to automatically remove the
INDEX_REFRESH_BLOCK once an index becomes searchable.

The change ensures search availability by checking that at least one
copy of each searchable shard is available whenever an unpromotable
shard is started. Once this condition is met, the INDEX_REFRESH_BLOCK
is removed.

Closes ES-10278

This commit enhances the ShardStartedClusterStateTaskExecutor
by introducing functionality to automatically remove the
INDEX_REFRESH_BLOCK once an index becomes searchable.

The change ensures search availability by checking that at
least one copy of each searchable shard is available whenever
an unpromotable shard is started.
Once this condition is met, the INDEX_REFRESH_BLOCK is removed.

Closes ES-10278
@fcofdez fcofdez added >enhancement :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. Team:Distributed Indexing Meta label for Distributed Indexing team labels Jan 24, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

@fcofdez
Copy link
Contributor Author

fcofdez commented Jan 24, 2025

I'll enhance the integration tests in a separate PR.

@elasticsearchmachine
Copy link
Collaborator

Hi @fcofdez, I've created a changelog YAML for you.

ClusterBlocks.Builder clusterBlocksBuilder = null;
for (Index indexWithUnpromotableShardsStarted : indicesWithUnpromotableShardsStarted) {
String indexName = indexWithUnpromotableShardsStarted.getName();
assert clusterState.blocks().hasIndexBlock(indexName, INDEX_REFRESH_BLOCK);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
assert clusterState.blocks().hasIndexBlock(indexName, INDEX_REFRESH_BLOCK);
assert clusterState.blocks().hasIndexBlock(indexName, INDEX_REFRESH_BLOCK) : indexWithUnpromotableShardsStarted;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in b766d8e

@@ -776,6 +789,35 @@ public ClusterState execute(BatchExecutionContext<StartedShardUpdateTask> batchE
return maybeUpdatedState;
}

private ClusterState maybeRemoveIndexRefreshBlocks(
ClusterState clusterState,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we document that the cluster state here is already updated with the STARTED state of search shards?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in b766d8e

@fcofdez fcofdez requested a review from tlrx January 27, 2025 16:14
@fcofdez
Copy link
Contributor Author

fcofdez commented Jan 27, 2025

@elasticmachine update branch

@fcofdez
Copy link
Contributor Author

fcofdez commented Jan 27, 2025

@elasticmachine update branch

@fcofdez
Copy link
Contributor Author

fcofdez commented Jan 28, 2025

@elasticmachine update branch

Copy link
Member

@tlrx tlrx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fcofdez
Copy link
Contributor Author

fcofdez commented Jan 28, 2025

@elasticmachine update branch

@@ -760,7 +772,10 @@ public ClusterState execute(BatchExecutionContext<StartedShardUpdateTask> batchE
maybeUpdatedState = ClusterState.builder(maybeUpdatedState).metadata(metadataBuilder).build();
}

maybeUpdatedState = maybeRemoveIndexRefreshBlocks(maybeUpdatedState, indicesWithUnpromotableShardsStarted);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with this going in for now, it is a good improvement as is.

I wonder though if we should follow-up with a change to remove the refresh block in a subsequent cluster state update. That would avoid problems with a coordinator not seeing the shard started message but an indexing node seeing the refresh block removed, which I think could cause a refresh to be ack'ed and a subsequent search not seeing the refreshed data?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would avoid problems with a coordinator not seeing the shard started message but an indexing node seeing the refresh block removed, which I think could cause a refresh to be ack'ed and a subsequent search not seeing the refreshed data?

I believe that there's the same risk with the deferred removal, right? The cluster state update with the unblocked index might be applied a bit later on the search node anyway, unless you're proposing to react to the shard started events differently on the search nodes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the shard is marked as started as soon as it is ready to serve searches, so that should be fine too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the last part, I think we would let the coordinator of the search assume an empty result when refresh is blocked?

Maybe you are suggesting to only do so for unassigned shards, whereas we could send the shard search request to the initializing shard, which could then make the determination. I think that would be safe too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the first part, I would rely on cluster state updates trying hard to be applied everywhere before proceeding with the next update. This means we would be sure to see the shard started everywhere before the refresh block is removed.

Implicit in that is that I thought we would then forward search requests always when there is a started shard.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could send the shard search request to the initializing shard, which could then make the determination. I think that would be safe too.

This would come with the downside of risking sending search requests to search nodes where the shard is still unassigned. I think that is possible to handle.

But it seems there are two options and the trade-off is not entirely clear.

(but as mentioned I am ok with doing what is in this PR for now and then let us tackle that other piece).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(but as mentioned I am ok with doing what is in this PR for now and then let us tackle that other piece).

Sounds good, I'll open a follow up ticket for this.

@elasticsearchmachine elasticsearchmachine added the serverless-linked Added by automation, don't add manually label Jan 28, 2025
@fcofdez
Copy link
Contributor Author

fcofdez commented Jan 29, 2025

@elasticmachine update branch

@fcofdez fcofdez merged commit ae0f1a6 into elastic:main Jan 29, 2025
16 checks passed
fcofdez added a commit to fcofdez/elasticsearch that referenced this pull request Jan 31, 2025
…astic#120807)"

This reverts commit ae0f1a6.

The refresh block would be removed in a subsequent cluster state
update instead of removing it immediately after an index is ready
for searches.

Closes ES-10697
fcofdez added a commit that referenced this pull request Feb 4, 2025
…20807)" (#121427)

This reverts commit ae0f1a6.

The refresh block would be removed in a subsequent cluster state
update instead of removing it immediately after an index is ready
for searches.

Closes ES-10697
fzowl pushed a commit to voyage-ai/elasticsearch that referenced this pull request Feb 4, 2025
…astic#120807)" (elastic#121427)

This reverts commit ae0f1a6.

The refresh block would be removed in a subsequent cluster state
update instead of removing it immediately after an index is ready
for searches.

Closes ES-10697
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. >enhancement serverless-linked Added by automation, don't add manually Team:Distributed Indexing Meta label for Distributed Indexing team v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants