-
Notifications
You must be signed in to change notification settings - Fork 224
Open
Description
This is mostly a reality check if our custom bootstrap is using JOSDK as intended or not. Depending on the feedback we can close this issue and create more specific follow-up issues if needed. I just wanted to describe the context once.
// configure operator
final var customMetrics = new CustomResourceMetrics(config, meterRegistry);
final var eventSender = new EventSender(config, client);
final var reconciler = new HiveMQPlatformReconciler(config, customResourceMetrics, client, eventSender);
final var dependentResourceFactory = new HiveMQPlatformOperatorDependentResourceFactory<>(config, eventSender);
final var metrics = MicrometerMetrics.newPerResourceCollectingMicrometerMetricsBuilder(meterRegistry).build();
final var operator = new Operator(override -> override //
.withConcurrentReconciliationThreads(config.getConcurrentReconciliationThreads())
.withConcurrentWorkflowExecutorThreads(config.getConcurrentWorkflowThreads())
.withReconciliationTerminationTimeout(config.getReconciliationTerminationTimeout())
.withCacheSyncTimeout(config.getCacheSyncTimeout())
.withKubernetesClient(client)
.withCloseClientOnStop(closeClient)
.withDependentResourceFactory(dependentResourceFactory)
.withMetrics(metrics)
.withUseSSAToPatchPrimaryResource(useSSA)
.withSSABasedCreateUpdateMatchForDependentResources(useSSA));
final var reconcilerConfig = operator.getConfigurationService().getConfigurationFor(reconciler); // (1)
final var overrideConfig = ControllerConfigurationOverrider.override(reconcilerConfig);
if (operatorNamespaces.equals(Constants.WATCH_CURRENT_NAMESPACE)) {
overrideConfig.watchingOnlyCurrentNamespace();
} else if (operatorNamespaces.equals(Constants.WATCH_ALL_NAMESPACES)) {
overrideConfig.watchingAllNamespaces();
} else {
final var namespaces = Arrays.stream(operatorNamespaces.split(",")).collect(Collectors.toSet());
overrideConfig.settingNamespaces(namespaces);
}
if (!operatorSelector.isBlank()) {
overrideConfig.withLabelSelector(operatorSelector);
}
overrideConfig.withOnAddFilter(platform -> { // (3)
customResourceMetrics.register(platform);
return true;
});
final var controller = (Controller<HiveMQPlatform>) operator.register(reconciler, overrideConfig.build());
customResourceMetrics.setCache(controller.getEventSourceManager().getControllerEventSource()); // (2)
HiveMQPlatformOperatorDependentResourceFactory implements DependentResourceFactory
(it's the only place where we really had to replace Quarkus Arc, otherwise we don't need a CDI).closeClient
istrue
in production and only set tofalse
in tests to not close an injected K8s client (that is still used in the test).useSSA
istrue
in production and only set tofalse
in tests with the K8s mockserver.
- Are we creating the configuration and overrides as intended? I tried to reverse engineer how Quarkus and the
LocallyRunOperatorExtension
are configuring JOSDK, but we still get a warning on theoperator.getConfigurationService().getConfigurationFor(reconciler)
invocation:It feels wrong to get that warning, but I found no other way to create a12:29:37.423 [main] WARN Default ConfigurationService implementation - Configuration for reconciler 'hivemq-controller' was not found. Known reconcilers: None.
ControllerConfigurationOverrider
. - How to properly implement custom metrics on all custom resources in an efficient way? For example, in
CustomResourceMetrics
we have a globalGauge
for each state of our operator state machine. To calculate the values we count all custom resources that are in that state:Thatcache.list() .map(CustomResource::getStatus) .filter(Objects::nonNull) .filter(status -> status.getState().equals(state)) .count();
cache
is theControllerEventSource
that we set viacustomResourceMetrics.setCache(controller.getEventSourceManager().getControllerEventSource())
. This feels a bit illegal but works very fine for us.
In the previous implementation we kept a shadow copy of all custom resources in a CHM, but these were the cloned instances from the reconciliation loop. This impacted the GC and needed constant updates of the cached instances to retrieve the current state. Using the original instances from the cache solved this problem very elegantly for us, including the lifecycle management. - The same problem hits us now again with a new
Gauge
for an aggregated health status metric per custom resource. We need to register thisGauge
once when the CR is added, de-register it when it's removed and in between have access to the currentCustomResource::getStatus
to get the health state. So we're back to aMap<String, GaugeHolder>
with<namespace>-<name>
as key.
For theGauge
creation I'm usingoverrideConfig.withOnAddFilter()
, which again feels very wrong, but seems to work fine. TheGauge
removal is done in theCleaner::cleanup
method of our reconciler.
Is there a better way for the lifecycle control and where to put thatGauge
? It feels like we cannot put this into the custom resource itself, because of the cloning (theGauge
needs to be a singleton) and to update the correct instance. This is how it looks like in theCustomResourceMetrics
right now:private final @NotNull Map<String, GaugeHolder> healthMetricGauges = new ConcurrentHashMap<>(); // called from the bootstrap public void register(final @NotNull HiveMQPlatform platform) { healthMetricGauges.put(getKey(platform), new GaugeHolder(platform)); } // called from Cleaner::cleanup public void deregister(final @NotNull HiveMQPlatform platform) { if (healthMetricGauges.get(getKey(platform)) instanceof GaugeHolder gaugeHolder) { meterRegistry.remove(gaugeHolder.gauge); } } // called from the main reconciler when the health state has changed public void updateHealthMetric(final @NotNull HiveMQPlatform platform) { if (healthMetricGauges.get(getKey(platform)) instanceof GaugeHolder gaugeHolder) { gaugeHolder.value.set(platform.getStatus().getHealthStatus().getMetricsValue()); } } private static @NotNull String getKey(final @NotNull HiveMQPlatform platform) { return platform.getMetadata().getNamespace() + "-" + platform.getMetadata().getName(); } private class GaugeHolder { private final @NotNull AtomicInteger value = new AtomicInteger(HealthStatus.UNKNOWN.getMetricsValue()); private final @NotNull Gauge gauge; public GaugeHolder(final @NotNull HiveMQPlatform platform) { this.gauge = Gauge.builder("hivemq.platform.health.system.current", value::get) .tag("namespace", platform.getMetadata().getNamespace()) .tag("name", platform.getMetadata().getName()) .description("Custom resource health status") .register(meterRegistry); } }
Metadata
Metadata
Assignees
Labels
No labels