[Elasticsearch][Ingest pipeline] Stop truncating the elasticsearch server log messages #12813

Merged
merged 7 commits into from
Feb 18, 2025

consulthys
Contributor

@consulthys consulthys commented Feb 17, 2025

Proposed commit message

This PR fixes the ingest pipeline called logs-elasticsearch.server-<version>-pipeline-json, which parses Elasticsearch server logs.

The pipeline was inherited from the Filebeat elasticsearch module and hasn't changed in several years. The main issue is that the pipeline assumes that when the message field value starts with square brackets (e.g. [xyz] some log message), xyz is an index name. It stores xyz in the elasticsearch.index.name field and truncates the message to what comes after the square brackets (i.e. some log message). This assumption might have been true at some point in the past, but it no longer holds: the square brackets can contain anything, such as component names or class names. Truncating the message field breaks downstream processes that expect to find the full log message in that field.

For instance, when the pipeline is applied to the following log message

[co.elastic.elasticsearch.metering.sampling.SampledStorageMetricsProvider] is not ready for collect yet

the truncated message field will only contain is not ready for collect yet, which now lacks context and is unusable.
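
To make the problem concrete, here is a minimal Python sketch of the truncation described above. It is an illustration only: the regular expression and the function name old_behavior are assumptions made for this example, not the pipeline's actual pattern or processor definition, and the sketch does not try to reproduce how the old pipeline handled a pre-existing elasticsearch.index.name value.

```python
import re

# Hypothetical pattern: a leading "[...]" followed by the rest of the message.
# The real pipeline's pattern is not shown in this PR.
LEADING_BRACKETS = re.compile(r"^\[([^\]]+)\]\s*(.*)$", re.DOTALL)


def old_behavior(doc: dict) -> dict:
    """Mimic the pre-fix behavior: extract the bracket content and truncate message."""
    out = dict(doc)
    match = LEADING_BRACKETS.match(out.get("message", ""))
    if match:
        # Whatever is inside the brackets is treated as an index name.
        out["elasticsearch.index.name"] = match.group(1)
        # The message keeps only the text after the brackets; the context is lost.
        out["message"] = match.group(2)
    return out


doc = {
    "message": "[co.elastic.elasticsearch.metering.sampling.SampledStorageMetricsProvider] is not ready for collect yet"
}
print(old_behavior(doc)["message"])  # prints: is not ready for collect yet
```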

Because it is hard to identify every (internal and external) downstream process that relies on the index name being extracted from the log message, we need to keep extracting whatever is in the square brackets, but without truncating the message field. This PR proposes a non-breaking change that keeps extracting the index name (unless the field already exists in the document) while leaving the message field untouched.
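
The intended behavior can be sketched in a few lines of Python. This is a hedged illustration, not the pipeline's processor configuration: the regular expression and the function name extract_index_name are assumptions made for this example, and the document is modeled as a flat dict with dotted keys, as in the simulation requests in this PR.

```python
import re

# Hypothetical pattern: only the leading "[...]" is inspected.
LEADING_BRACKETS = re.compile(r"^\[([^\]]+)\]")


def extract_index_name(doc: dict) -> dict:
    """Copy the bracket content into elasticsearch.index.name, leaving message alone."""
    out = dict(doc)
    match = LEADING_BRACKETS.match(out.get("message", ""))
    # Only set the field when the document does not already carry one.
    if match and "elasticsearch.index.name" not in out:
        out["elasticsearch.index.name"] = match.group(1)
    return out


message = "[co.elastic.elasticsearch.metering.sampling.SampledStorageMetricsProvider] is not ready for collect yet"

fresh = extract_index_name({"message": message})
print(fresh["message"] == message)  # prints: True

existing = extract_index_name({"message": message, "elasticsearch.index.name": "index-123"})
print(existing["elasticsearch.index.name"])  # prints: index-123
```

The two calls mirror the two simulation cases in the test instructions: one document without the field and one that already has it.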

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have added an entry to my package's changelog.yml file.

How to test this PR locally

  1. Install the elasticsearch integration update
  2. Run the following ingest pipeline simulation in Dev Tools and make sure that the message field is unaltered
POST _ingest/pipeline/logs-elasticsearch.server-1.17.2-pipeline-json/_simulate
{
  "docs": [
    {
      "_source": {
        "@timestamp": "2025-01-22T15:50:04.517Z",
        "log.level": "INFO",
        "message": "[co.elastic.elasticsearch.metering.sampling.SampledStorageMetricsProvider] is not ready for collect yet",
        "ecs.version": "1.2.0",
        "service.name": "ES_ECS",
        "event.dataset": "elasticsearch.server",
        "process.thread.name": "elasticsearch[es-es-search-c886b6975-9tsbz][metering_reporter][T#1]",
        "log.logger": "co.elastic.elasticsearch.metering.usagereports.UsageReportCollector",
        "elasticsearch.cluster.uuid": "vaYFhXk-Q0WFOdSjyaylnA",
        "elasticsearch.node.id": "fnqA_AXpTFm6uZdtXDNIPA",
        "elasticsearch.node.name": "es-es-search-c886b6975-9tsbz",
        "elasticsearch.cluster.name": "es"
      }
    }
  ]
}

Response:

{
  "docs": [
    {
      "doc": {
        "_source": {
          "elasticsearch": {
            "index": {
>>>>          "name": "co.elastic.elasticsearch.metering.sampling.SampledStorageMetricsProvider"
            }
          },
          ...
>>>>      "message": "[co.elastic.elasticsearch.metering.sampling.SampledStorageMetricsProvider] is not ready for collect yet",
          ...
        }
      }
    }
  ]
}

If the elasticsearch.index.name field is already present in the log document, it will not be overwritten:

POST _ingest/pipeline/logs-elasticsearch.server-1.17.2-pipeline-json/_simulate
{
  "docs": [
    {
      "_source": {
        "@timestamp": "2025-01-22T15:50:04.517Z",
        "log.level": "INFO",
        "message": "[co.elastic.elasticsearch.metering.sampling.SampledStorageMetricsProvider] is not ready for collect yet",
        "ecs.version": "1.2.0",
        "service.name": "ES_ECS",
        "event.dataset": "elasticsearch.server",
        "process.thread.name": "elasticsearch[es-es-search-c886b6975-9tsbz][metering_reporter][T#1]",
        "log.logger": "co.elastic.elasticsearch.metering.usagereports.UsageReportCollector",
        "elasticsearch.cluster.uuid": "vaYFhXk-Q0WFOdSjyaylnA",
        "elasticsearch.node.id": "fnqA_AXpTFm6uZdtXDNIPA",
        "elasticsearch.node.name": "es-es-search-c886b6975-9tsbz",
        "elasticsearch.cluster.name": "es",
        "elasticsearch.index.name": "index-123"
      }
    }
  ]
}

Response:

{
  "docs": [
    {
      "doc": {
        "_source": {
          "elasticsearch": {
            "index": {
>>>>          "name": "index-123"
            }
          },
          ...
>>>>      "message": "[co.elastic.elasticsearch.metering.sampling.SampledStorageMetricsProvider] is not ready for collect yet",
          ...
        }
      }
    }
  ]
}
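
If you prefer to check the simulation output programmatically instead of by eye, a small helper along these lines could do it. This is a sketch under assumptions: find_problems is a hypothetical name, and the response is assumed to have already been parsed into a Python dict with the docs[].doc._source shape shown in the responses above. How the response is fetched is left out on purpose.

```python
def find_problems(response: dict, original_message: str, expected_index_name: str) -> list:
    """Return human-readable problems found in a parsed _simulate response body."""
    problems = []
    for position, entry in enumerate(response.get("docs", [])):
        source = entry.get("doc", {}).get("_source", {})
        # The message must come back exactly as it was sent.
        if source.get("message") != original_message:
            problems.append(f"doc {position}: message was altered")
        # The index name is nested under elasticsearch.index.name in the output.
        index_name = source.get("elasticsearch", {}).get("index", {}).get("name")
        if index_name != expected_index_name:
            problems.append(f"doc {position}: unexpected index name {index_name!r}")
    return problems


message = "[co.elastic.elasticsearch.metering.sampling.SampledStorageMetricsProvider] is not ready for collect yet"
# Trimmed copy of the second response above.
response = {
    "docs": [
        {
            "doc": {
                "_source": {
                    "elasticsearch": {"index": {"name": "index-123"}},
                    "message": message,
                }
            }
        }
    ]
}
print(find_problems(response, message, "index-123"))  # prints: []
```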

Related issues

Closes #12501

@consulthys consulthys added Integration:elasticsearch Elasticsearch Feature:Stack Monitoring Stack Monitoring Feature bugfix Pull request that fixes a bug issue Team:Stack Monitoring Stack Monitoring team [elastic/stack-monitoring] labels Feb 17, 2025
@consulthys consulthys self-assigned this Feb 17, 2025
@consulthys consulthys requested a review from a team as a code owner February 17, 2025 14:27
@consulthys consulthys requested a review from pickypg February 17, 2025 14:40
@elastic-vault-github-plugin-prod

🚀 Benchmarks report

To see the full report comment with /test benchmark fullreport

@elasticmachine

💚 Build Succeeded

cc @consulthys

Member

@pickypg pickypg left a comment

LGTM

@consulthys consulthys merged commit 29006f1 into main Feb 18, 2025
6 checks passed
@consulthys consulthys deleted the 12501-es-server-logs-truncation branch February 18, 2025 19:36
@elastic-vault-github-plugin-prod

Package elasticsearch - 1.17.2 containing this change is available at https://epr.elastic.co/package/elasticsearch/1.17.2/

flexitrev pushed a commit that referenced this pull request Mar 20, 2025
…rver log messages (#12813)

* Stop truncating the elasticsearch server log messages

* Add more test cases
Labels
bugfix Pull request that fixes a bug issue Feature:Stack Monitoring Stack Monitoring Feature Integration:elasticsearch Elasticsearch Team:Stack Monitoring Stack Monitoring team [elastic/stack-monitoring]
Development

Successfully merging this pull request may close these issues.

[Elasticsearch]: Ingest pipeline created to process Elasticserver logs truncates log messages
3 participants