
Ignored parent's readyPostcondition by dependent #2885

@ds-akloskowski


Bug Report

What did you do?

I set up a workflow like the one in the example below (simplified to show only the dependencies between resources).

@Workflow(dependents = {
    @Dependent(type = AlphaSecretDependent.class,
        name = "Alpha",
        readyPostcondition = AlphaSecretReadyCondition.class,
        useEventSourceWithName = "alpha-source"
    ),
    @Dependent(type = BetaSecretDependent.class,
        name = "Beta",
        readyPostcondition = BetaSecretReadyCondition.class,
        useEventSourceWithName = "beta-source"
    ),
    @Dependent(type = GammaJWTDependent.class,
        name = "Gamma",
        dependsOn = {"Alpha", "Beta"}
    )
})

AlphaSecretDependent and BetaSecretDependent are secrets that are created outside the operator and then used by GammaJWTDependent to produce a JWT. The ready postconditions check whether the resources meet the requirements, for example whether the expected keys exist in the secrets.

I used the workflow above in a test where BetaSecretDependent is not provided and BetaSecretReadyCondition always returns false, to check that the code that looks up the secrets works properly and that GammaJWTDependent is not reconciled.

What did you expect to see?

I expected the test to show that GammaJWTDependent was not reconciled and that no attempt was made to reconcile it.

What did you see instead? Under which circumstances?

In a few cases, seemingly non-deterministically, there was an attempt to reconcile GammaJWTDependent. It looked as if BetaSecretReadyCondition had been true (it was not) or had been ignored. I believe this is a race condition (more about it in "Possible Solution").

Environment

Kubernetes cluster type:
kube-apiserver

$ Mention java-operator-sdk version from pom.xml file
java-operator-sdk 5.1.1

$ java -version
java version "21.0.7" 2025-04-15 LTS
Java(TM) SE Runtime Environment (build 21.0.7+8-LTS-245)
Java HotSpot(TM) 64-Bit Server VM (build 21.0.7+8-LTS-245, mixed mode, sharing)

$ kubectl version
Client Version: v1.32.2
Kustomize Version: v5.5.0

Possible Solution

This is probably not a full solution, but I will share what I found. I debugged the code and found what causes the behavior. The code that checks whether a dependent with parents can be reconciled is in io.javaoperatorsdk.operator.processing.dependent.workflow.WorkflowReconcileExecutor#allParentsReconciledAndReady. It looks like this:

private boolean allParentsReconciledAndReady(DependentResourceNode<?, ?> dependentResourceNode) {
    return dependentResourceNode.getDependsOn().isEmpty()
        || dependentResourceNode.getDependsOn().stream()
            .allMatch(d -> alreadyVisited(d) && isReady(d));
}

Two requirements are checked:

  • alreadyVisited(), which checks whether the parent was reconciled (just a boolean flag).
  • isReady(), which comes from io.javaoperatorsdk.operator.processing.dependent.workflow.BaseWorkflowResult.DetailBuilder#isReady and looks like:
    public boolean isReady() {
      return readyPostconditionResult == null || readyPostconditionResult.isSuccess();
    }

So a parent counts as ready if there is no readyPostcondition result at all, or if the result is a success.

While debugging I found that BetaSecretReadyCondition was effectively skipped: readyPostconditionResult was still null while the visited flag was already true, so the check returned true for that parent.

I suspect that the race happens in io.javaoperatorsdk.operator.processing.dependent.workflow.WorkflowReconcileExecutor.NodeReconcileExecutor#doRun, which in my current version looks like this:

    protected void doRun(DependentResourceNode<R, P> dependentResourceNode) {
      final var dependentResource = dependentResourceNode.getDependentResource();
      log.debug("Reconciling for primary: {} node: {} ", primaryID, dependentResourceNode);
      ReconcileResult reconcileResult = dependentResource.reconcile(primary, context);
      final var detailBuilder = createOrGetResultFor(dependentResourceNode);
      detailBuilder.withReconcileResult(reconcileResult).markAsVisited();

      if (isConditionMet(dependentResourceNode.getReadyPostcondition(), dependentResourceNode)) {
        log.debug(
            "Setting already reconciled for: {} primaryID: {}", dependentResourceNode, primaryID);
        handleDependentsReconcile(dependentResourceNode);
      } else {
        log.debug("Setting already reconciled but not ready for: {}", dependentResourceNode);
      }
    }

The order in which the state is set is:

  1. The visited flag is set by the line detailBuilder.withReconcileResult(reconcileResult).markAsVisited();
  2. Then the readyPostcondition result is set inside isConditionMet(dependentResourceNode.getReadyPostcondition(), dependentResourceNode).

If AlphaSecretDependent finishes reconciling and tries to reconcile its children while BetaSecretDependent is between the two steps above (after the visited flag is set but before the readyPostcondition result is set), then GammaJWTDependent sees both parents as reconciled and ready.
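To make the window easier to see, here is a minimal, single-threaded sketch that replays the interleaving step by step. NodeDetail and RaceWindowDemo are hypothetical stand-ins I made up for illustration, not the actual JOSDK classes; only the shape of the isReady() and allParentsReconciledAndReady() checks is taken from the code quoted above.

```java
import java.util.List;

// Hypothetical, simplified stand-in for the per-node result detail (not the JOSDK class).
final class NodeDetail {
    private volatile boolean visited;
    private volatile Boolean readyResult; // null = ready postcondition not evaluated yet

    void markAsVisited() { visited = true; }
    void setReadyResult(boolean met) { readyResult = met; }

    boolean isVisited() { return visited; }

    // Same null-tolerant shape as the quoted isReady(): "no result" counts as ready.
    boolean isReady() { return readyResult == null || readyResult; }
}

final class RaceWindowDemo {
    // Same shape as the quoted parent check.
    static boolean allParentsReconciledAndReady(List<NodeDetail> parents) {
        return parents.isEmpty()
            || parents.stream().allMatch(p -> p.isVisited() && p.isReady());
    }

    public static void main(String[] args) {
        NodeDetail alpha = new NodeDetail();
        NodeDetail beta = new NodeDetail();

        // Alpha has finished both steps and its condition is met.
        alpha.markAsVisited();
        alpha.setReadyResult(true);

        // Beta has done step 1 only: visited, but no ready result yet.
        beta.markAsVisited();
        System.out.println("inside window: "
            + allParentsReconciledAndReady(List.of(alpha, beta))); // prints true

        // Beta completes step 2 with a failed ready postcondition.
        beta.setReadyResult(false);
        System.out.println("after window: "
            + allParentsReconciledAndReady(List.of(alpha, beta))); // prints false
    }
}
```

In the real executor the two nodes run on different threads, so whether Alpha's check lands inside the window depends on timing, which would explain why the failure only shows up occasionally.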

What could be the solution? Probably reordering the steps I described, so that the readyPostcondition result is set before the visited flag. I noticed that the version tagged 4.9.7 used the reverse order compared to the current one. This is probably not a full solution, but I hope it helps to resolve the issue.
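A rough sketch of the reordering idea, under the assumption that the parent check keeps reading the visited flag first and the readiness result second. OrderedNodeDetail, ReorderedSketch and finishNode are hypothetical names used only for this illustration; this is not a patch against the actual executor.

```java
import java.util.List;

// Hypothetical, simplified model of one workflow node's result (not the JOSDK class).
final class OrderedNodeDetail {
    private volatile Boolean readyResult; // null until the ready postcondition is evaluated
    private volatile boolean visited;

    void recordReadyResult(boolean met) { readyResult = met; }
    void markAsVisited() { visited = true; }

    boolean isVisited() { return visited; }
    boolean isReady() { return readyResult == null || readyResult; }
}

final class ReorderedSketch {
    // Reads "visited" first, then the readiness result.
    static boolean allParentsReconciledAndReady(List<OrderedNodeDetail> parents) {
        return parents.isEmpty()
            || parents.stream().allMatch(p -> p.isVisited() && p.isReady());
    }

    // Proposed order: store the readiness result first, publish "visited" last.
    static void finishNode(OrderedNodeDetail node, boolean readyConditionMet) {
        node.recordReadyResult(readyConditionMet);
        node.markAsVisited();
    }

    public static void main(String[] args) {
        OrderedNodeDetail alpha = new OrderedNodeDetail();
        OrderedNodeDetail beta = new OrderedNodeDetail();
        finishNode(alpha, true);

        // Beta is halfway: result stored, visited flag not yet set.
        beta.recordReadyResult(false);
        System.out.println("halfway: "
            + allParentsReconciledAndReady(List.of(alpha, beta))); // prints false

        beta.markAsVisited();
        System.out.println("done: "
            + allParentsReconciledAndReady(List.of(alpha, beta))); // prints false
    }
}
```

With this order, any reader that observes visited == true also observes the stored readiness result, so a parent with a failing readyPostcondition is never reported as ready. An alternative would be to treat a missing result as "not ready" whenever a readyPostcondition is configured, but I have not checked how that would interact with the rest of the workflow code.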

Additional context

This is probably related to why the test mentioned in #2775 was failing.
