Description
Bug Report
What did you do?
I set up a workflow like the example below (simplified to show only the dependencies between resources).
@Workflow(dependents = {
@Dependent(type = AlphaSecretDependent.class,
name = "Alpha",
readyPostcondition = AlphaSecretReadyCondition.class,
useEventSourceWithName = "alpha-source"
),
@Dependent(type = BetaSecretDependent.class,
name = "Beta",
readyPostcondition = BetaSecretReadyCondition.class,
useEventSourceWithName = "beta-source"
),
@Dependent(type = GammaJWTDependent.class,
name = "Gamma",
dependsOn = {"Alpha", "Beta"}
)
})
AlphaSecretDependent and BetaSecretDependent are secrets that are created outside the operator and then used by GammaJWTDependent to produce a JWT. The ready postconditions check whether the resources meet certain requirements, for example whether the expected keys exist in the secrets.
I used the workflow above in a test where BetaSecretDependent is not provided and BetaSecretReadyCondition always returns false, to check that the secret lookup code works properly and that GammaJWTDependent is not reconciled.
What did you expect to see?
I expected the test to show that GammaJWTDependent was not reconciled and that no attempt was made to reconcile it.
What did you see instead? Under which circumstances?
In a few cases, seemingly non-deterministically, there was an attempt to reconcile GammaJWTDependent. It looked as if BetaSecretReadyCondition had returned true (it had not) or had been ignored, but I believe it is actually a race condition (more about it in "Possible Solution").
Environment
Kubernetes cluster type:
kube-apiserver
$ Mention java-operator-sdk version from pom.xml file
java-operator-sdk 5.1.1
$ java -version
java version "21.0.7" 2025-04-15 LTS
Java(TM) SE Runtime Environment (build 21.0.7+8-LTS-245)
Java HotSpot(TM) 64-Bit Server VM (build 21.0.7+8-LTS-245, mixed mode, sharing)
$ kubectl version
Client Version: v1.32.2
Kustomize Version: v5.5.0
Possible Solution
This is probably not a full solution, but I will share what I found. I debugged the code and found what causes the behavior. The code that checks whether a dependent with parents can be reconciled is in io.javaoperatorsdk.operator.processing.dependent.workflow.WorkflowReconcileExecutor#allParentsReconciledAndReady. It looks like this:
private boolean allParentsReconciledAndReady(DependentResourceNode<?, ?> dependentResourceNode) {
return dependentResourceNode.getDependsOn().isEmpty()
|| dependentResourceNode.getDependsOn().stream()
.allMatch(d -> alreadyVisited(d) && isReady(d));
}
Two requirements are checked:
- alreadyVisited(), which checks whether the parent was reconciled; it is just a boolean flag.
- isReady(), which comes from io.javaoperatorsdk.operator.processing.dependent.workflow.BaseWorkflowResult.DetailBuilder#isReady and looks like this:
public boolean isReady() {
return readyPostconditionResult == null || readyPostconditionResult.isSuccess();
}
In other words, a node counts as ready when there is no readyPostcondition result at all, or when the result is a success.
While debugging I found that BetaSecretReadyCondition was effectively skipped: readyPostconditionResult was null while the visited flag was already true, so the check returned true for that parent.
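The observation above can be reproduced in isolation with a minimal sketch. The class DetailSketch and its method names are hypothetical stand-ins invented for illustration, not the SDK's classes; only the two boolean checks are copied from the snippets quoted in this report.

```java
import java.util.List;

/** Simplified, hypothetical stand-in for the per-node result detail; not the SDK class. */
final class DetailSketch {
    private boolean visited;
    private Boolean readyPostconditionResult; // null means "not evaluated yet"

    void markAsVisited() {
        visited = true;
    }

    void withReadyPostconditionResult(boolean success) {
        readyPostconditionResult = success;
    }

    boolean isVisited() {
        return visited;
    }

    // Mirrors the quoted isReady(): a missing result is treated as ready.
    boolean isReady() {
        return readyPostconditionResult == null || readyPostconditionResult;
    }

    // Mirrors the quoted allMatch(d -> alreadyVisited(d) && isReady(d)) check.
    static boolean allParentsReconciledAndReady(List<DetailSketch> parents) {
        return parents.isEmpty()
            || parents.stream().allMatch(p -> p.isVisited() && p.isReady());
    }
}
```

With one parent fully ready and a second parent that is visited but has no result yet, the check passes; once the second parent's result is stored as false, the same check fails.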
I suspect that the race happens in io.javaoperatorsdk.operator.processing.dependent.workflow.WorkflowReconcileExecutor.NodeReconcileExecutor#doRun, which in my current version looks like this:
protected void doRun(DependentResourceNode<R, P> dependentResourceNode) {
final var dependentResource = dependentResourceNode.getDependentResource();
log.debug("Reconciling for primary: {} node: {} ", primaryID, dependentResourceNode);
ReconcileResult reconcileResult = dependentResource.reconcile(primary, context);
final var detailBuilder = createOrGetResultFor(dependentResourceNode);
detailBuilder.withReconcileResult(reconcileResult).markAsVisited();
if (isConditionMet(dependentResourceNode.getReadyPostcondition(), dependentResourceNode)) {
log.debug(
"Setting already reconciled for: {} primaryID: {}", dependentResourceNode, primaryID);
handleDependentsReconcile(dependentResourceNode);
} else {
log.debug("Setting already reconciled but not ready for: {}", dependentResourceNode);
}
}
The order in which the state is set is:
- First, the visited flag is set by the line detailBuilder.withReconcileResult(reconcileResult).markAsVisited();
- Then, the readyPostcondition result is set by isConditionMet(dependentResourceNode.getReadyPostcondition(), dependentResourceNode).
If AlphaSecretDependent finishes reconciling and tries to reconcile its children while BetaSecretDependent is between the two steps above (after setting visited but before setting the readyPostcondition result), then GammaJWTDependent sees both parents as reconciled and ready.
What could the solution be? Probably reordering the steps I described, so that the result is set before the visited flag. I saw that the version tagged 4.9.7 used the reverse order compared to the current one. This is probably not a full solution, but I hope it helps to resolve the issue.
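To illustrate why the order matters, here is a small deterministic sketch that models the two writes and what another thread would observe if it looked exactly between them. OrderingSketch and its members are hypothetical names made up for this example, the fields are marked volatile as an assumption about visibility, and this is not how the SDK stores its state.

```java
/** Hypothetical model of the two writes done in doRun(); not the SDK implementation. */
final class OrderingSketch {
    private volatile boolean visited;
    private volatile Boolean readyResult; // null until the ready postcondition is evaluated

    // Same shape as the parent check: visited, and either no result or a successful one.
    boolean looksReconciledAndReady() {
        return visited && (readyResult == null || readyResult);
    }

    /**
     * Applies the two writes in the chosen order and returns what a concurrent
     * observer would see if it checked exactly between them.
     */
    static boolean observedBetweenWrites(boolean visitedFirst, boolean conditionMet) {
        OrderingSketch node = new OrderingSketch();
        if (visitedFirst) {
            node.visited = true;              // current order: flag first
            boolean seen = node.looksReconciledAndReady();
            node.readyResult = conditionMet;  // result arrives too late for the observer
            return seen;
        } else {
            node.readyResult = conditionMet;  // proposed order: result first
            boolean seen = node.looksReconciledAndReady();
            node.visited = true;
            return seen;
        }
    }
}
```

In this model, setting the flag first lets the observer see a parent with a failing postcondition as ready, while setting the result first does not. Whether reordering alone is enough in the real executor is an open question; publishing both values in one atomic step might be another option.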
Additional context
This is probably related to why the test mentioned in #2775 was failing.