
[AWS][Cloudfront Logs] - Implemented GROK processor based ipv6/v4 parsing in AWS Cloudfront Logs data stream #11829


Merged Nov 25, 2024

Conversation

ShourieG
Contributor

@ShourieG ShourieG commented Nov 22, 2024

Type of change

Please label this PR with one of the following labels, depending on the scope of your change:

  • Bug
  • enhancement

Proposed commit message

  • WHAT: Implemented GROK processor based ipv6/v4 parsing. Cleaned up formatting issues across all data streams via elastic-package format.

  • WHY: The regex pattern used for the ipv6 was overly complex and caused errors in Elasticsearch. The cleanup was a byproduct of running elastic-package format.

  • HOW: A new helper pipeline was added containing the new grok processor and helper scripts, which are invoked in a foreach processor from the parent pipeline. This approach was necessary because of existing limitations: a foreach supports only one sub-processor, and grok cannot append to existing lists/arrays.
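A minimal sketch of that structure (the pipeline and field names here are illustrative, not necessarily the exact ones merged in this PR):

```yaml
# Parent pipeline (sketch): split the forwarded-for list, then invoke the
# helper pipeline once per IP value via foreach.
- split:
    field: _tmp.x_forwarded_for
    separator: ','
    target_field: _tmp.split_x_forwarded_for
- foreach:
    field: _tmp.split_x_forwarded_for
    processor:
      pipeline:
        name: '{{ IngestPipeline "pipeline_ip" }}'
```

Because foreach accepts only a single sub-processor, delegating to a pipeline processor is the usual way to run several steps (grok plus cleanup scripts) per element.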

The Elasticsearch error produced is as follows:

```
Processor 'conditional' failed with message '[scripting] Regular expression considered too many characters, pattern: [((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(.(250-5){3})){3})/)|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(.(250-5){3})){3})/)|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(.(250-5){3}))){3}))/)|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(.(250-5){3}))){3}))/)|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(.(250-5){3}))){3}))/)|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(.(250-5){3}))){3}))/)|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(.(250-5){3}))){3}))/)|:)))(%.+)??$], limit factor: [6], char limit: [114], count: [115], wrapped: [2a06:98c0:3600::103], this limit can be changed by changed by the [script.painless.regex.limit-factor] setting'
```

NOTE

The latest commit - elastic-package build - cleaned up a lot of formatting issues, hence there are added changes across many data streams. I think these are good in the long run, hence I kept them in.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Screenshots

@ShourieG ShourieG requested review from a team as code owners November 22, 2024 10:54
@ShourieG ShourieG self-assigned this Nov 22, 2024
@ShourieG ShourieG added integration Label used for meta issues tracking each integration Integration:aws AWS bugfix Pull request that fixes a bug issue Team:Security-Service Integrations Security Service Integrations team [elastic/security-service-integrations] labels Nov 22, 2024
@elasticmachine

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

@ShourieG ShourieG added Team:Obs-InfraObs Observability Infrastructure Monitoring team [elastic/obs-infraobs-integrations] Team:obs-ds-hosted-services Observability Hosted Services team [elastic/obs-ds-hosted-services] labels Nov 22, 2024
Contributor

@kcreddy kcreddy left a comment


There are several IPv6 addresses that fail with the current change but are successfully matched by the previous regex.
Examples:

1::
1:2:3:4:5:6:7::
1::7:8
1:2:3:4:5::7:8
1::6:7:8
1:2:3:4::6:7:8
1::5:6:7:8
1:2:3::5:6:7:8
1::4:5:6:7:8
1:2::4:5:6:7:8
1::3:4:5:6:7:8
::2:3:4:5:6:7:8

Examples taken from here

@kcreddy
Contributor

kcreddy commented Nov 22, 2024

@ShourieG, would it be possible to replace the script processor with a foreach and grok with on_failure, using the default IPV6 pattern provided by logstash?

@ShourieG
Contributor Author

would it be possible to replace the script processor with a foreach and grok with on_failure, using the default IPV6 pattern provided by logstash?

@kcreddy, should we try it for both ipv4/v6 or only ipv6 and leave ipv4 as it is currently ?

@kcreddy
Contributor

kcreddy commented Nov 22, 2024

@kcreddy, should we try it for both ipv4/v6 or only ipv6 and leave ipv4 as it is currently ?

Makes sense to try for both and use the default IPV4 and IPV6 patterns provided. Example in use. It also helps keep the patterns in sync in case of any future changes at the source.
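A hedged sketch of what that suggestion could look like inside the helper pipeline, using the stock IPV4/IPV6 grok patterns (field names are illustrative, not necessarily what the PR merged):

```yaml
# Helper pipeline (sketch): classify the current foreach element as a valid
# IPv4/IPv6 address; anything that matches neither pattern is collected via
# on_failure as an invalid IP.
- grok:
    field: _ingest._value
    patterns:
      - '^%{IPV4:_tmp.valid_ip}$'
      - '^%{IPV6:_tmp.valid_ip}$'
    on_failure:
      - append:
          field: _tmp.invalid_ips
          value: '{{{_ingest._value}}}'
          allow_duplicates: false
```

Relying on the built-in patterns avoids maintaining a hand-written IPv6 regex and keeps behavior aligned with the upstream pattern definitions.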

@elastic-vault-github-plugin-prod

elastic-vault-github-plugin-prod bot commented Nov 22, 2024

🚀 Benchmarks report

To see the full report comment with /test benchmark fullreport

@ShourieG ShourieG changed the title [AWS][Cloudfront Logs] - Simplified the ipv6 regex pattern in Cloudfront logs data stream [AWS][Cloudfront Logs] - Implemented GROK processor based ipv6/v4 parsing in AWS Cloudfront Logs data stream Nov 22, 2024
@ShourieG
Contributor Author

@kcreddy, I have updated the logic with a relevant GROK processor, but some scripts were still required for cleanup and for handling the localhost edge case. There are some removals in the ip section based on the GROK patterns.

@ShourieG ShourieG added the enhancement New feature or request label Nov 22, 2024
```yaml
- split:
    field: _tmp.x_forwarded_for
    separator: ","
    target_field: _tmp.split_x_forwarded_for
```
Contributor


We could have the same conditional that's used in the original script processor:

```yaml
if: ctx._tmp?.x_forwarded_for != null && ctx._tmp.x_forwarded_for != '-'
```

Contributor

@kcreddy kcreddy left a comment


Just a nit, LGTM 👍🏼

Could you also add the exact error to the PR.

Comment on lines 28 to 35
```painless
// append valid IPs
if (ctx._tmp.valid_ip != null) {
    if (ctx.network.forwarded_ip == null) {
        ctx.network.forwarded_ip = [];
    }
    if (!ctx.network.forwarded_ip.contains(ctx._tmp.valid_ip)) {
        ctx.network.forwarded_ip.add(ctx._tmp.valid_ip);
    }
```
Contributor


The array-existence check could possibly be moved to the base pipeline so that it is not run for every entry.
Or the script could be changed to an append with allow_duplicates: false, which would be cleaner.
Same for invalid IPs.
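For reference, a sketch of the allow_duplicates alternative (field names follow the snippet above; this is not the code merged in the PR):

```yaml
# Replace the Painless contains/add logic with an append processor; the
# processor itself handles missing target arrays and de-duplication.
- append:
    field: network.forwarded_ip
    value: '{{{_tmp.valid_ip}}}'
    allow_duplicates: false
    if: ctx._tmp?.valid_ip != null
```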

Contributor

@efd6 efd6 left a comment


Suggested commit message body.

The regex pattern used for the ipv6 was overly complex and caused errors in
Elasticsearch. This change simplifies and fixes the grok-based IPv4/IPv6
parsing by adding a new helper pipeline invoked iteratively over ip fields
via a foreach. Also add a painless script to handle 'localhost' literal
addresses.

Also run elastic-package format.

I think that probably the format should have been in a separate PR.

```yaml
    if: ctx._tmp.invalid_ips != null
- script:
    lang: painless
    description: handle 'localhost' edge case scenario, currently not handled via grok
```
Contributor


Suggested change:

```diff
-    description: handle 'localhost' edge case scenario, currently not handled via grok
+    description: Handle 'localhost' edge case, currently not handled via grok.
```

@agithomas
Contributor

LGTM!

Just a suggestion: Can we consider adding a sample or samples to the pipeline test to capture the pattern observed in the reported issue, handling of localhost etc?

Contributor

@agithomas agithomas left a comment


Approved with comments.

@elasticmachine

💚 Build Succeeded

History

  • 💚 Build #18634 succeeded 3328d815eb6cfa58549b89b82a7206ec3069a8c9
  • 💚 Build #18613 succeeded 7d36c97b9d1ab3f47b32270cd23ad378217191fd
  • 💔 Build #18609 failed dd3e19759c44a5c9a45f17e9bf500af71c44bb78
  • 💚 Build #18603 succeeded 3cf60a7f3d4bb50b415a91fe50e1b98bbb8d6df1

cc @ShourieG


Quality Gate failed

Failed conditions
74.6% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube

@ShourieG
Contributor Author

ShourieG commented Nov 25, 2024

LGTM!

Just a suggestion: Can we consider adding a sample or samples to the pipeline test to capture the pattern observed in the reported issue, handling of localhost etc?

Current tests already contain the localhost pattern, @agithomas. The reported issue, on the other hand, concerned errors raised by the existing regex pattern when compiled in Elasticsearch.

@ShourieG ShourieG merged commit 29e226c into elastic:main Nov 25, 2024
4 of 5 checks passed
@ShourieG ShourieG deleted the bugfix/aws_cloudfront branch November 25, 2024 06:25
@elastic-vault-github-plugin-prod

Package aws - 2.32.0 containing this change is available at https://epr.elastic.co/package/aws/2.32.0/

harnish-crest-data pushed a commit to chavdaharnish/integrations that referenced this pull request Feb 4, 2025
…sing in AWS Cloudfront Logs data stream (elastic#11829)

* The regex pattern used for the ipv6 was overly complex and caused errors in Elasticsearch. This change simplifies and fixes the grok-based IPv4/IPv6 parsing by adding a new helper pipeline invoked iteratively over ip fields via a foreach. Also add a painless script to handle 'localhost' literal addresses.

* Also run elastic-package format.
harnish-crest-data pushed a commit to chavdaharnish/integrations that referenced this pull request Feb 5, 2025
…sing in AWS Cloudfront Logs data stream (elastic#11829)

* The regex pattern used for the ipv6 was overly complex and caused errors in Elasticsearch. This change simplifies and fixes the grok-based IPv4/IPv6 parsing by adding a new helper pipeline invoked iteratively over ip fields via a foreach. Also add a painless script to handle 'localhost' literal addresses.

* Also run elastic-package format.
Labels
bugfix Pull request that fixes a bug issue enhancement New feature or request Integration:aws AWS integration Label used for meta issues tracking each integration Team:obs-ds-hosted-services Observability Hosted Services team [elastic/obs-ds-hosted-services] Team:Obs-InfraObs Observability Infrastructure Monitoring team [elastic/obs-infraobs-integrations] Team:Security-Service Integrations Security Service Integrations team [elastic/security-service-integrations]

6 participants