Skip to content

chore: pyrscope continuous profiling for coder #43

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Aug 5, 2025
Merged

Conversation

Emyrk
Copy link
Member

@Emyrk Emyrk commented Aug 4, 2025

No description provided.

@Emyrk Emyrk changed the title chore: work on getting pyrscope continous profiling chore: pyrscope continuous profiling for coder Aug 5, 2025
@Emyrk Emyrk marked this pull request as ready for review August 5, 2025 16:29
@Emyrk Emyrk requested a review from dannykopping as a code owner August 5, 2025 16:29
Copy link
Collaborator

@dannykopping dannykopping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

# Source: coder-observability/templates/configmap-collector.yaml
kind: ConfigMap
apiVersion: v1
metadata:
name: collector-config
namespace: coder-observability
data:
config.river: "\n// Discover k8s nodes\ndiscovery.kubernetes \"nodes\" {\n role = \"node\"\n}\n\n// Discover k8s pods\ndiscovery.kubernetes \"pods\" {\n role = \"pod\"\n selectors {\n role = \"pod\"\n }\n}\n\ndiscovery.relabel \"pod_logs\" {\n targets = discovery.kubernetes.pods.targets\n \n rule {\n source_labels = [\"__meta_kubernetes_namespace\"]\n target_label = \"namespace\"\n }\n rule {\n source_labels = [\"__meta_kubernetes_pod_name\"]\n target_label = \"pod\"\n }\n // coalesce the following labels and pick the first value; we'll use this to define the \"job\" label\n rule {\n source_labels = [\"__meta_kubernetes_pod_label_app_kubernetes_io_component\", \"app\", \"__meta_kubernetes_pod_container_name\"]\n separator = \"/\"\n target_label = \"__meta_app\"\n action = \"replace\"\n regex = \"^/*([^/]+?)(?:/.*)?$\" // split by the delimiter if it exists, we only want the first one\n replacement = \"${1}\"\n }\n rule {\n source_labels = [\"__meta_kubernetes_namespace\", \"__meta_kubernetes_pod_label_app_kubernetes_io_name\", \"__meta_app\"]\n separator = \"/\"\n target_label = \"job\"\n }\n rule {\n source_labels = [\"__meta_kubernetes_pod_container_name\"]\n target_label = \"container\"\n }\n rule {\n regex = \"__meta_kubernetes_pod_label_(statefulset_kubernetes_io_pod_name|controller_revision_hash)\"\n action = \"labeldrop\"\n }\n rule {\n regex = \"pod_template_generation\"\n action = \"labeldrop\"\n }\n rule {\n source_labels = [\"__meta_kubernetes_pod_phase\"]\n regex = \"Pending|Succeeded|Failed|Completed\"\n action = \"drop\"\n }\n rule {\n source_labels = [\"__meta_kubernetes_pod_node_name\"]\n action = \"replace\"\n target_label = \"node\"\n }\n rule {\n action = \"labelmap\"\n regex = \"__meta_kubernetes_pod_annotation_prometheus_io_param_(.+)\"\n replacement = \"__param_$1\"\n }\n rule {\n source_labels = [\"__meta_kubernetes_pod_uid\", \"__meta_kubernetes_pod_container_name\"]\n separator = \"/\"\n action = \"replace\"\n replacement = \"/var/log/pods/*$1/*.log\"\n target_label = \"__path__\"\n }\n rule {\n action = \"replace\"\n source_labels = [\"__meta_kubernetes_pod_container_id\"]\n regex = \"^(\\\\w+):\\\\/\\\\/.+$\"\n replacement = \"$1\"\n target_label = \"tmp_container_runtime\"\n }\n}\n\ndiscovery.relabel \"pod_metrics\" {\n targets = discovery.kubernetes.pods.targets\n \n rule {\n source_labels = [\"__meta_kubernetes_namespace\"]\n target_label = \"namespace\"\n }\n rule {\n source_labels = [\"__meta_kubernetes_pod_name\"]\n target_label = \"pod\"\n }\n // coalesce the following labels and pick the first value; we'll use this to define the \"job\" label\n rule {\n source_labels = [\"__meta_kubernetes_pod_label_app_kubernetes_io_component\", \"app\", \"__meta_kubernetes_pod_container_name\"]\n separator = \"/\"\n target_label = \"__meta_app\"\n action = \"replace\"\n regex = \"^/*([^/]+?)(?:/.*)?$\" // split by the delimiter if it exists, we only want the first one\n replacement = \"${1}\"\n }\n rule {\n source_labels = [\"__meta_kubernetes_namespace\", \"__meta_kubernetes_pod_label_app_kubernetes_io_name\", \"__meta_app\"]\n separator = \"/\"\n target_label = \"job\"\n }\n rule {\n source_labels = [\"__meta_kubernetes_pod_container_name\"]\n target_label = \"container\"\n }\n rule {\n regex = \"__meta_kubernetes_pod_label_(statefulset_kubernetes_io_pod_name|controller_revision_hash)\"\n action = \"labeldrop\"\n }\n rule {\n regex = \"pod_template_generation\"\n action = \"labeldrop\"\n }\n rule {\n source_labels = [\"__meta_kubernetes_pod_phase\"]\n regex = \"Pending|Succeeded|Failed|Completed\"\n action = \"drop\"\n }\n rule {\n source_labels = [\"__meta_kubernetes_pod_node_name\"]\n action = \"replace\"\n target_label = \"node\"\n }\n rule {\n action = \"labelmap\"\n regex = \"__meta_kubernetes_pod_annotation_prometheus_io_param_(.+)\"\n replacement = \"__param_$1\"\n }\n // drop ports that do not expose Prometheus metrics, but might otherwise be exposed by a container which *also*\n // exposes an HTTP port which exposes metrics\n rule {\n source_labels = [\"__meta_kubernetes_pod_container_port_name\"]\n regex = \"grpc|http-(memberlist|console)\"\n action = \"drop\"\n }\n // adapted from the Prometheus helm chart\n // https://github.com/prometheus-community/helm-charts/blob/862870fc3c847e32479b509e511584d5283126a3/charts/prometheus/values.yaml#L1070\n rule {\n source_labels = [\"__meta_kubernetes_pod_annotation_prometheus_io_scrape\"]\n action = \"keep\"\n regex = \"true\"\n }\n rule {\n source_labels = [\"__meta_kubernetes_pod_annotation_prometheus_io_scheme\"]\n action = \"replace\"\n regex = \"(https?)\"\n target_label = \"__scheme__\"\n }\n rule {\n source_labels = [\"__meta_kubernetes_pod_annotation_prometheus_io_path\"]\n action = \"replace\"\n target_label = \"__metrics_path__\"\n regex = \"(.+)\"\n }\n rule {\n source_labels = [\"__meta_kubernetes_pod_annotation_prometheus_io_port\", \"__meta_kubernetes_pod_ip\"]\n action = \"replace\"\n regex = \"(\\\\d+);(([A-Fa-f0-9]{1,4}::?){1,7}[A-Fa-f0-9]{1,4})\"\n replacement = \"[$2]:$1\"\n target_label = \"__address__\"\n }\n rule {\n source_labels = [\"__meta_kubernetes_pod_annotation_prometheus_io_port\", \"__meta_kubernetes_pod_ip\"]\n action = \"replace\"\n regex = \"(\\\\d+);((([0-9]+?)(\\\\.|$)){4})\"\n replacement = \"$2:$1\"\n target_label = \"__address__\"\n }\n}\n\nlocal.file_match \"pod_logs\" {\n path_targets = discovery.relabel.pod_logs.output\n}\n\nloki.source.file \"pod_logs\" {\n targets = local.file_match.pod_logs.targets\n forward_to = [loki.process.pod_logs.receiver]\n}\n\nloki.process \"pod_logs\" {\n stage.match {\n selector = \"{tmp_container_runtime=\\\"containerd\\\"}\"\n // the cri processing stage extracts the following k/v pairs: log, stream, time, flags\n stage.cri {}\n // Set the extract flags and stream values as labels\n stage.labels {\n values = {\n flags = \"\",\n stream = \"\",\n }\n }\n }\n\n // if the label tmp_container_runtime from above is docker parse using docker\n stage.match {\n selector = \"{tmp_container_runtime=\\\"docker\\\"}\"\n // the docker processing stage extracts the following k/v pairs: log, stream, time\n stage.docker {}\n\n // Set the extract stream value as a label\n stage.labels {\n values = {\n stream = \"\",\n }\n }\n }\n\n // drop the temporary container runtime label as it is no longer needed\n stage.label_drop {\n values = [\"tmp_container_runtime\"]\n }\n\n // parse Coder logs and extract level & logger for efficient filtering\n stage.match {\n selector = \"{pod=~\\\"coder.*\\\"}\" // TODO: make configurable\n\n stage.multiline {\n firstline = \"^(?P<ts>\\\\d{4}-\\\\d{2}-\\\\d{2}\\\\s\\\\d{2}:\\\\d{2}:\\\\d{2}\\\\.\\\\d{3})\"\n max_wait_time = \"10s\"\n }\n\n stage.regex {\n expression = \"^(?P<ts>\\\\d{4}-\\\\d{2}-\\\\d{2}\\\\s\\\\d{2}:\\\\d{2}:\\\\d{2}\\\\.\\\\d{3})\\\\s\\\\[(?P<level>\\\\w+)\\\\]\\\\s\\\\s(?P<logger>[^:]+):\\\\s(?P<line>.+)\"\n }\n\n stage.timestamp {\n source = \"ts\"\n format = \"2006-01-02 15:04:05.000\"\n action_on_failure = \"fudge\" // rather have inaccurate time than drop the log line\n }\n\n stage.labels {\n values = {\n level = \"\",\n logger = \"\",\n }\n }\n }\n\n forward_to = [loki.write.loki.receiver]\n}\n\nloki.write \"loki\" {\n endpoint {\n url = \"http://loki-gateway.coder-observability.svc/loki/api/v1/push\"\n }\n}\n\nprometheus.scrape \"pods\" {\n targets = discovery.relabel.pod_metrics.output\n forward_to = [prometheus.relabel.pods.receiver]\n\n scrape_interval = \"15s\"\n scrape_timeout = \"12s\"\n}\n\n// These are metric_relabel_configs while discovery.relabel are relabel_configs.\n// See https://github.com/grafana/agent/blob/main/internal/converter/internal/prometheusconvert/prometheusconvert.go#L95-L106\nprometheus.relabel \"pods\" {\n forward_to = [prometheus.remote_write.default.receiver]\n\n // Drop kube-state-metrics' labels which clash with ours\n rule {\n source_labels = [\"__name__\", \"container\"]\n regex = \"kube_pod.+;(.+)\"\n target_label = \"container\"\n replacement = \"\"\n }\n rule {\n source_labels = [\"__name__\", \"pod\"]\n regex = \"kube_pod.+;(.+)\"\n target_label = \"pod\"\n replacement = \"\"\n }\n rule {\n source_labels = [\"__name__\", \"namespace\"]\n regex = \"kube_pod.+;(.+)\"\n target_label = \"namespace\"\n replacement = \"\"\n }\n rule {\n source_labels = [\"__name__\", \"exported_container\"]\n // don't replace an empty label\n regex = \"^kube_pod.+;(.+)$\"\n target_label = \"container\"\n replacement = \"$1\"\n }\n rule {\n source_labels = [\"__name__\", \"exported_pod\"]\n // don't replace an empty label\n regex = \"^kube_pod.+;(.+)$\"\n target_label = \"pod\"\n replacement = \"$1\"\n }\n rule {\n source_labels = [\"__name__\", \"exported_namespace\"]\n // don't replace an empty label\n regex = \"^kube_pod.+;(.+)$\"\n target_label = \"namespace\"\n replacement = \"$1\"\n }\n rule {\n regex = \"^(exported_.*|image_.*|container_id|id|uid)$\"\n action = \"labeldrop\"\n }\n}\n\ndiscovery.relabel \"cadvisor\" {\n targets = discovery.kubernetes.nodes.targets\n rule {\n replacement = \"/metrics/cadvisor\"\n target_label = \"__metrics_path__\"\n }\n}\n\nprometheus.scrape \"cadvisor\" {\n targets = discovery.relabel.cadvisor.output\n forward_to = [ prometheus.relabel.cadvisor.receiver ]\n scheme = \"https\"\n tls_config {\n insecure_skip_verify = true\n }\n bearer_token_file = \"/var/run/secrets/kubernetes.io/serviceaccount/token\"\n scrape_interval = \"15s\"\n scrape_timeout = \"12s\"\n}\n\nprometheus.relabel \"cadvisor\" {\n forward_to = [ prometheus.remote_write.default.receiver ]\n\n // Drop empty container labels, addressing https://github.com/google/cadvisor/issues/2688\n rule {\n source_labels = [\"__name__\",\"container\"]\n separator = \"@\"\n regex = \"(container_cpu_.*|container_fs_.*|container_memory_.*)@\"\n action = \"drop\"\n }\n // Drop empty image labels, addressing https://github.com/google/cadvisor/issues/2688\n rule {\n source_labels = [\"__name__\",\"image\"]\n separator = \"@\"\n regex = \"(container_cpu_.*|container_fs_.*|container_memory_.*|container_network_.*)@\"\n action = \"drop\"\n }\n // Drop irrelevant series\n rule {\n source_labels = [\"container\"]\n regex = \"^POD$\"\n action = \"drop\"\n }\n // Drop unnecessary labels\n rule {\n source_labels = [\"id\"]\n target_label = \"id\"\n replacement = \"\"\n }\n rule {\n source_labels = [\"job\"]\n target_label = \"job\"\n replacement = \"\"\n }\n rule {\n source_labels = [\"name\"]\n target_label = \"name\"\n replacement = \"\"\n }\n}\n\nprometheus.remote_write \"default\" {\n endpoint {\n url =\"http://prometheus.coder-observability.svc/api/v1/write\"\n\n // drop instance label which unnecessarily adds new series when pods are restarted, since pod IPs are dynamically assigned\n // NOTE: \"__address__\" is mapped to \"instance\", so will contain <hostname>:<port>\n write_relabel_config {\n regex = \"instance\"\n action = \"labeldrop\"\n }\n }\n}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -0,0 +1,106 @@
# Using kind to test observability
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@Emyrk Emyrk merged commit f72d9d7 into main Aug 5, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants