Skip to content

Disable logging in ClusterFormationFailureHelper on shutdown. #125244

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 20, 2025

Conversation

JeremyDahlgren
Copy link
Contributor

@JeremyDahlgren JeremyDahlgren commented Mar 19, 2025

Modifies Coordinator to enable logging in ClusterFormationFailureHelper when started and disables logging in ClusterFormationFailureHelper when stopped. The warning scheduler handling and invariant check in the Coordinator are left as is, with the logging boolean set independently, eliminating the need to hold the mutex in doStop() when Coordinator.stop() is called when the Node is shutdown.

Closes #105559.

Modifies Coordinator to enable logging in ClusterFormationFailureHelper
when started and disables logging in ClusterFormationFailureHelper when
stopped.  The warning scheduler handling and invariant check in the
Coordinator are left as is, with the logging boolean set independently,
eliminating the need to hold the mutex in doStop() when Coordinator.stop()
is called when the Node is shutdown.

Closes elastic#105559.
Comment on lines +97 to +98
public void setLoggingEnabled(boolean enabled) {
this.loggingEnabled = enabled;
Copy link
Contributor Author

@JeremyDahlgren JeremyDahlgren Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially it seemed that a

    synchronized(mutex) {
        clusterFormationFailureHelper.stop();
    }

block in Coordinator::doStop() would be sufficient, but besides needing to hold the lock when the Node is being shut down, it seems a race condition could exist where the failure helper is stopped on shutdown but another thread immediately calls becomeCandidate() and starts it back up?

Also if calls to ClusterFormationFailureHelper start() and stop() were not synchronized it looks like there is the potential for a NullPointerException in start() if the warningScheduler is set to null between the assignment of a new WarningScheduler instance and the next line that calls scheduleNextWarning (Intellij warns about this):

    public void start() {
        assert warningScheduler == null;
        warningScheduler = new WarningScheduler();
        warningScheduler.scheduleNextWarning();
    }

    public void stop() {
        warningScheduler = null;
    }

@JeremyDahlgren JeremyDahlgren added Team:Distributed Coordination Meta label for Distributed Coordination team >bug :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. labels Mar 19, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @JeremyDahlgren, I've created a changelog YAML for you.

@JeremyDahlgren JeremyDahlgren marked this pull request as ready for review March 19, 2025 18:26
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat solution, LGTM

@JeremyDahlgren JeremyDahlgren merged commit a3dc918 into elastic:main Mar 20, 2025
17 checks passed
afoucret pushed a commit to afoucret/elasticsearch that referenced this pull request Mar 21, 2025
…c#125244)

Modifies Coordinator to enable logging in ClusterFormationFailureHelper
when started and disables logging in ClusterFormationFailureHelper when
stopped.  The warning scheduler handling and invariant check in the
Coordinator are left as is, with the logging boolean set independently,
eliminating the need to hold the mutex in doStop() when Coordinator.stop()
is called when the Node is shutdown.

Closes elastic#105559.

* Update docs/changelog/125244.yaml
smalyshev pushed a commit to smalyshev/elasticsearch that referenced this pull request Mar 21, 2025
…c#125244)

Modifies Coordinator to enable logging in ClusterFormationFailureHelper
when started and disables logging in ClusterFormationFailureHelper when
stopped.  The warning scheduler handling and invariant check in the
Coordinator are left as is, with the logging boolean set independently,
eliminating the need to hold the mutex in doStop() when Coordinator.stop()
is called when the Node is shutdown.

Closes elastic#105559.

* Update docs/changelog/125244.yaml
omricohenn pushed a commit to omricohenn/elasticsearch that referenced this pull request Mar 28, 2025
…c#125244)

Modifies Coordinator to enable logging in ClusterFormationFailureHelper
when started and disables logging in ClusterFormationFailureHelper when
stopped.  The warning scheduler handling and invariant check in the
Coordinator are left as is, with the logging boolean set independently,
eliminating the need to hold the mutex in doStop() when Coordinator.stop()
is called when the Node is shutdown.

Closes elastic#105559.

* Update docs/changelog/125244.yaml
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. Team:Distributed Coordination Meta label for Distributed Coordination team v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ClusterFormationFailureHandler shouldn't log anything once the node is stopping
3 participants