-
Notifications
You must be signed in to change notification settings - Fork 25.4k
Converting an Existing Data Stream to a System DataStream is Broken #121392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pinging @elastic/es-core-infra (Team:Core/Infra) |
Hi @alexey-ivanov-es, I've created a changelog YAML for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At https://github.com/elastic/elasticsearch/pull/121392/files#diff-1123b904a4743231e7321993f73a75ae603c6658b1e6acf6bf8656b69370a410L70 we currently only check for indices that should become system indices. Should we also check for data streams there?
It seems that if a data stream exists, it always has at least one backing index. If the data stream is non-system, the backing index will also be non-system. Then, when the data stream is marked as system, the index will be picked up by the task, and both will be converted to system ones. Does this approach make sense, or should we explicitly check for data streams as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, good question. For now that is the case, but we have talked about changing this one day but there are no concrete plans.
If I am not mistaken, this is just looping over one more data set, so I think it would be better to be "thorough" and avoid surprises in the future. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have talked about changing this one day
I'll add check for data streams here as well. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM in general, just a few questions. Should the Data Store team also review this?
List<DataStream> updatedDataStreams = new ArrayList<>(); | ||
for (DataStream dataStream : currentState.getMetadata().dataStreams().values()) { | ||
if (dataStream.isSystem() == false && systemIndices.isSystemDataStream(dataStream.getName())) { | ||
DataStream updatedDataStream = dataStream.copy().setSystem(true).setHidden(true).build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious, why is there a setHidden(true)
here? Do you know if there's a reason this isn't set by default when we define a system DataStream? I'm wondering if there's ever a situation where we'd want this to be true, since the updateIndices()
function seems to test and set this separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a data stream is initially created as a non-system one, it's likely set as hidden=false
. In any case, it should become hidden=true
once it is converted to a system data stream. In the system indices upgrade process, this check is done separately, and as I understand it, this is because there is currently support for making system indices non-system again and we don't change hidden
in this case. While I'm pretty sure we don't need this functionality, it currently exists for indices.
For data streams, I don’t think we need to support converting system data streams back to non-system ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However... Let me check one thing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked, and it seems there is only one case where we might need to convert an existing system index or data stream back to a non-system one - when an upgrade to a newer version starts, fails, and the cluster rolls back to the previous version where the index/data stream was non-system.
I'm not 100% sure if this is a valid scenario, but since it is already handled for indices, I've added support for data streams as well.
if (dataStream.isSystem() == false && systemIndices.isSystemDataStream(dataStream.getName())) { | ||
DataStream updatedDataStream = dataStream.copy().setSystem(true).setHidden(true).build(); | ||
|
||
updatedDataStreams.add(updatedDataStream); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the indices version also updates any aliases to be hidden. Do we need to do that here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I verified this and data stream aliases become hidden after the data stream is marked as hidden/system. Don't see an easy way to write a test for this, unfortunately
.setSystem(false) | ||
.build(); | ||
|
||
assertSystemUpgradeAppliesHiddenSettingForDataStream(dataStream, dsIndexMetadata); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only used in one place, should this remain separate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inlined
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to check data streams as well, for completeness and to avoid any surprises in the future.
I have one more thing I would like to to discuss. If I understand correctly the code, now we check if a index is a system index or a backing index of a system data stream. But to check the backing index of a data stream we check if it starts with the .ds-<data-stream-name>
prefix. Right?
This means that a failure index will not be detected as associated with the system data stream, and a backing index that is not following this naming convention will also not be detected.
I understand it's unlikely we will encounter these cases in the wild but it is possible.
I would like to suggest the following: what if we would update only standalone backing indices in updateIndices
and then in updateDataStreams
we would loop over all the associated indices of a data stream and convert them to a system index and vice versa, if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, good question. For now that is the case, but we have talked about changing this one day but there are no concrete plans.
If I am not mistaken, this is just looping over one more data set, so I think it would be better to be "thorough" and avoid surprises in the future. What do you think?
Yes, you're right.
Good point. I need to verify this, but at first glance, it looks like an oversight in the overall system data streams implementation. This could have implications not just here but also in security (and possibly other areas) as well.
Is it possible? It looks like there are quite a few places in the code outside of
I'd prefer not to exclude indices backing system data streams from |
Yes it is, a user is able to modify the backing indices of a data stream and add or remove backing indices. It can be considered an expert move but it's possible. Furthermore, there are lifecycle steps that add prefixes to the indices so this can also have the same effect.
I think this way of thinking is error prone and is the reason we of the bug you are fixing here. What I mean is, the descriptor matches the data stream and not the backing indices themselves that's why you need to rely on a convention to determine this. While if you consider that the system resource is the data stream, and it's the responsibility of the This is more inline with how we use data streams in other places of the code as well. Of course system indices is your area of expertise so maybe there something I do not see, I am just explaining how we handle data streams in other areas of the code. |
I will make the change here, that's not a problem. The bigger issue is that many other parts of the codebase rely on system data stream backing indices name starting with
Is there a chance we could make an index name pattern that covers all these cases? Of course, it wouldn't work for indices added manually by users, but since users aren't supposed to have access to system indices anyway, it should still work |
I did a very quick code search for where is the constant
What would be the benefit of that? We could potentially cook something but it's more of hack than a solution. I checked (just a quick check) where |
The issue comes from this method, which is used in multiple places within the system indices logic. Eventually it affects this method (probably easier to fix) and this method used in security plugin (looks harder to fix, but I am new to the team, so might be completely wrong about the complexity)
I agree that it's a hack, and I don’t think system indices logic should rely on the naming of indices backing data streams. However, that is how it currently works, and this approach would allow us to implement a quick fix that's at least slightly more reliable than the current method, where the pattern is hardcoded into the system indices logic. And thank you for your help here. |
I see, we need to do a more thorough check and see what can be done, I did not go that far :)
Considering that we do not have that many system data streams, I think this buys us some time to think this through. Coming up with prefixes is not that easy. For example, I can think of the following prefixes, this list might not be exhaustive:
The last four can be combined depending on the steps that an index has been through. I propose the following:
What do you think?
Happy to, I am getting to know better this use case. Thank you for thinking along with me. |
I agree with your proposal. I'll make the changes in this PR and create a ticket to track the issue so we can prioritize and discuss the right approach. Thank you! |
// Fork to the management pool to avoid blocking the cluster applier thread unnecessarily for very large index counts | ||
// TODO: we should have a more efficient way of getting just the changed indices so that we don't have to fork here | ||
clusterService.threadPool().executor(ThreadPool.Names.MANAGEMENT).execute(new AbstractRunnable() { | ||
@Override | ||
protected void doRun() { | ||
if (triggeredVersion != triggerV) { | ||
// don't run if another newer check task was triggered already | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand it, this condition could lead to missing indices. If two cluster change events trigger the checks (state1 -> state2 -> state3), there’s a possibility that the first check would return here, and the second check wouldn’t see all indices. This happens because it compares state3 against state2, meaning any differences between state1 and state2 wouldn’t be checked.
Ticket to stop to rely on data streams naming: ES-10870 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not fully familiar with the usage of ClusterStateTaskExecutor
, though overall this looks good to me.
I have some questions, though they aren't blocking. I'll leave it to your judgement.
It would be good to get an approval from Data Management prior to merging.
Nice work, this is complex.
if (dataStream.isSystem() != shouldBeSystem) { | ||
DataStream.Builder dataStreamBuilder = dataStream.copy().setSystem(shouldBeSystem); | ||
if (shouldBeSystem) { | ||
dataStreamBuilder.setHidden(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the updateIndexIfNecessary
function updates the versions with this:
builder.settingsVersion(builder.settingsVersion() + 1);
Do we likewise need to update the generation
here? (I'm not suggesting we do, rather, wanting to be sure)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, good question, we update generations when the backing or failure indices get modified (added, deleted, etc) but I do not think the update of system property qualifies as such. @dakrone do you agree?
ClusterService clusterService = mock(ClusterService.class); | ||
MasterServiceTaskQueue<ClusterStateTaskListener> queue = mock(MasterServiceTaskQueue.class); | ||
when(clusterService.createTaskQueue(eq("system-indices-metadata-upgrade"), eq(Priority.NORMAL), any())).thenAnswer(invocation -> { | ||
executor = invocation.getArgument(2, ClusterStateTaskExecutor.class); | ||
return queue; | ||
}); | ||
doAnswer(invocation -> { | ||
task = invocation.getArgument(1, ClusterStateTaskListener.class); | ||
return null; | ||
}).when(queue).submitTask(any(), any(), any()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm rather impartial, though the general consensus here seems to be to avoid mocks whenever possible. Is it possible/practical to avoid them here? If not, no worries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are quite a few tests using mocks, so at least I'm not alone :)
I haven't heard of a consensus to avoid mocks, and I think most tests end up not using them because there are often too many dependencies interacting with each other using complex objects, which makes mock-based tests fragile and harder to maintain. That said, if there's a decision to avoid mocks, let me know. Personally, I prefer using mocks where practical rather than exposing internal implementation details.
…lastic#121392) Adds support of converting existing data stream to a system data stream as part of existing system_index_metadata_upgrade_service task
…lastic#121392) Adds support of converting existing data stream to a system data stream as part of existing system_index_metadata_upgrade_service task
…lastic#121392) Adds support of converting existing data stream to a system data stream as part of existing system_index_metadata_upgrade_service task
Adds support of converting existing data stream to a system data stream as part of existing system_index_metadata_upgrade_service task
Closes ES-10526