
[AWS][Cloudfront Logs] - Implemented GROK processor based ipv6/v4 parsing in AWS Cloudfront Logs data stream #11829


Merged Nov 25, 2024

Conversation

ShourieG
Contributor

@ShourieG ShourieG commented Nov 22, 2024

Type of change

Please label this PR with one of the following labels, depending on the scope of your change:

  • Bug
  • enhancement

Proposed commit message

  • WHAT: Implemented GROK processor based ipv6/v4 parsing. Cleaned up formatting issues across all data streams via elastic-package format.

  • WHY: The regex pattern used for the ipv6 was overly complex and caused errors in Elasticsearch. The cleanup was a byproduct of running elastic-package format.

  • HOW: A new helper pipeline was added containing the new grok processor and helper scripts, which are invoked in a foreach processor from the parent pipeline. This approach was necessary because of existing limitations: a foreach supports only one sub-processor, and grok cannot append to existing lists/arrays.
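A minimal sketch of that structure (the pipeline and field names here are illustrative, not necessarily the exact ones merged in this PR):

```yaml
# Parent pipeline (sketch): split the forwarded-for list, then invoke the
# helper pipeline once per IP value via foreach.
- split:
    field: _tmp.x_forwarded_for
    separator: ','
    target_field: _tmp.split_x_forwarded_for
- foreach:
    field: _tmp.split_x_forwarded_for
    processor:
      pipeline:
        name: '{{ IngestPipeline "pipeline_ip" }}'
```

Because foreach accepts only a single sub-processor, delegating to a pipeline processor is the usual way to run several steps (grok plus cleanup scripts) per element.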

The Elasticsearch error produced is as follows:

```
Processor 'conditional' failed with message '[scripting] Regular expression considered too many characters, pattern: [((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(.(250-5){3})){3})/)|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(.(250-5){3})){3})/)|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(.(250-5){3}))){3}))/)|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(.(250-5){3}))){3}))/)|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(.(250-5){3}))){3}))/)|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(.(250-5){3}))){3}))/)|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(.(250-5){3}))){3}))/)|:)))(%.+)??$], limit factor: [6], char limit: [114], count: [115], wrapped: [2a06:98c0:3600::103], this limit can be changed by changed by the [script.painless.regex.limit-factor] setting'
```

NOTE

The latest commit - elastic-package build - cleaned up a lot of formatting issues, hence there are added changes across many data streams. I think these are good in the long run, hence I kept them in.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Screenshots

@ShourieG ShourieG requested review from a team as code owners November 22, 2024 10:54
@ShourieG ShourieG self-assigned this Nov 22, 2024
@ShourieG ShourieG added integration Label used for meta issues tracking each integration Integration:aws AWS bugfix Pull request that fixes a bug issue Team:Security-Service Integrations Security Service Integrations team [elastic/security-service-integrations] labels Nov 22, 2024
@elasticmachine

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

@ShourieG ShourieG added Team:Obs-InfraObs Observability Infrastructure Monitoring team [elastic/obs-infraobs-integrations] Team:obs-ds-hosted-services Observability Hosted Services team [elastic/obs-ds-hosted-services] labels Nov 22, 2024
Contributor

@kcreddy kcreddy left a comment


There are several IPv6 addresses that fail with the current change but are successfully matched by the previous regex.
Examples:

1::
1:2:3:4:5:6:7::
1::7:8
1:2:3:4:5::7:8
1::6:7:8
1:2:3:4::6:7:8
1::5:6:7:8
1:2:3::5:6:7:8
1::4:5:6:7:8
1:2::4:5:6:7:8
1::3:4:5:6:7:8
::2:3:4:5:6:7:8

Examples taken from here

@kcreddy
Contributor

kcreddy commented Nov 22, 2024

@ShourieG, would it be possible to replace the script processor with a foreach and grok with on_failure, using the default IPV6 pattern provided by logstash?

@ShourieG
Contributor Author

would it be possible to replace the script processor with a foreach and grok with on_failure, using the default IPV6 pattern provided by logstash?

@kcreddy, should we try it for both ipv4/v6 or only ipv6 and leave ipv4 as it is currently ?

@kcreddy
Contributor

kcreddy commented Nov 22, 2024

@kcreddy, should we try it for both ipv4/v6 or only ipv6 and leave ipv4 as it is currently ?

Makes sense to try for both and use the default IPV4 and IPV6 patterns provided. Example in use. It also helps keep the patterns in sync in case of any future changes at the source.
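A hedged sketch of what that suggestion could look like inside the helper pipeline, using the stock IPV4/IPV6 grok patterns (field names are illustrative, not necessarily what the PR merged):

```yaml
# Helper pipeline (sketch): classify the current foreach element as a valid
# IPv4/IPv6 address; anything that matches neither pattern is collected via
# on_failure as an invalid IP.
- grok:
    field: _ingest._value
    patterns:
      - '^%{IPV4:_tmp.valid_ip}$'
      - '^%{IPV6:_tmp.valid_ip}$'
    on_failure:
      - append:
          field: _tmp.invalid_ips
          value: '{{{_ingest._value}}}'
          allow_duplicates: false
```

Relying on the built-in patterns avoids maintaining a hand-written IPv6 regex and keeps behavior aligned with the upstream pattern definitions.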

@elastic-vault-github-plugin-prod

elastic-vault-github-plugin-prod bot commented Nov 22, 2024

🚀 Benchmarks report

To see the full report comment with /test benchmark fullreport

@ShourieG ShourieG changed the title [AWS][Cloudfront Logs] - Simplified the ipv6 regex pattern in Cloudfront logs data stream [AWS][Cloudfront Logs] - Implemented GROK processor based ipv6/v4 parsing in AWS Cloudfront Logs data stream Nov 22, 2024
@ShourieG
Contributor Author

@kcreddy, I have updated the logic with a relevant GROK processor, but some scripts were still required for cleanup and for handling the localhost edge case. There are some removals in the ip section based on the GROK patterns.

@ShourieG ShourieG added the enhancement New feature or request label Nov 22, 2024
```yaml
- split:
    field: _tmp.x_forwarded_for
    separator: ","
    target_field: _tmp.split_x_forwarded_for
```
Contributor


We could have the same conditional that's used in the original script processor:

```yaml
if: ctx._tmp?.x_forwarded_for != null && ctx._tmp.x_forwarded_for != '-'
```

Contributor

@kcreddy kcreddy left a comment


Just a nit, LGTM 👍🏼

Could you also add the exact error to the PR.

Comment on lines 28 to 35
```painless
// append valid IPs
if (ctx._tmp.valid_ip != null) {
    if (ctx.network.forwarded_ip == null) {
        ctx.network.forwarded_ip = [];
    }
    if (!ctx.network.forwarded_ip.contains(ctx._tmp.valid_ip)) {
        ctx.network.forwarded_ip.add(ctx._tmp.valid_ip);
    }
```
Contributor


The array-existence check could possibly be moved to the base pipeline so that it is not run for every entry.
Or the script could be changed to an append with allow_duplicates: false, which would be cleaner.
Same for invalid IPs.
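For reference, a sketch of the allow_duplicates alternative (field names follow the snippet above; this is not the code merged in the PR):

```yaml
# Replace the Painless contains/add logic with an append processor; the
# processor itself handles missing target arrays and de-duplication.
- append:
    field: network.forwarded_ip
    value: '{{{_tmp.valid_ip}}}'
    allow_duplicates: false
    if: ctx._tmp?.valid_ip != null
```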

Contributor

@efd6 efd6 left a comment


Suggested commit message body.

The regex pattern used for the ipv6 was overly complex and caused errors in
Elasticsearch. This change simplifies and fixes the grok-based IPv4/IPv6
parsing by adding a new helper pipeline invoked iteratively over ip fields
via a foreach. Also add a painless script to handle 'localhost' literal
addresses.

Also run elastic-package format.

I think that probably the format should have been in a separate PR.

```yaml
    if: ctx._tmp.invalid_ips != null
- script:
    lang: painless
    description: handle 'localhost' edge case scenario, currently not handled via grok
```
Contributor


Suggested change:

```diff
-    description: handle 'localhost' edge case scenario, currently not handled via grok
+    description: Handle 'localhost' edge case, currently not handled via grok.
```

@agithomas
Contributor

LGTM!

Just a suggestion: Can we consider adding a sample or samples to the pipeline test to capture the pattern observed in the reported issue, handling of localhost etc?

Contributor

@agithomas agithomas left a comment


Approved with comments.

@elasticmachine

💚 Build Succeeded

History

  • 💚 Build #18634 succeeded 3328d815eb6cfa58549b89b82a7206ec3069a8c9
  • 💚 Build #18613 succeeded 7d36c97b9d1ab3f47b32270cd23ad378217191fd
  • 💔 Build #18609 failed dd3e19759c44a5c9a45f17e9bf500af71c44bb78
  • 💚 Build #18603 succeeded 3cf60a7f3d4bb50b415a91fe50e1b98bbb8d6df1

cc @ShourieG


Quality Gate failed

Failed conditions
74.6% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube

@ShourieG
Contributor Author

ShourieG commented Nov 25, 2024

LGTM!

Just a suggestion: Can we consider adding a sample or samples to the pipeline test to capture the pattern observed in the reported issue, handling of localhost etc?

Current tests already contain the localhost pattern, @agithomas. The reported issue, on the other hand, concerned errors raised by the existing regex pattern when compiled in Elasticsearch.

@ShourieG ShourieG merged commit 29e226c into elastic:main Nov 25, 2024
4 of 5 checks passed
@ShourieG ShourieG deleted the bugfix/aws_cloudfront branch November 25, 2024 06:25
@elastic-vault-github-plugin-prod

Package aws - 2.32.0 containing this change is available at https://epr.elastic.co/package/aws/2.32.0/

harnish-crest-data pushed a commit to chavdaharnish/integrations that referenced this pull request Feb 4, 2025
…sing in AWS Cloudfront Logs data stream (elastic#11829)

* The regex pattern used for the ipv6 was overly complex and caused errors in Elasticsearch. This change simplifies and fixes the grok-based IPv4/IPv6 parsing by adding a new helper pipeline invoked iteratively over ip fields via a foreach. Also add a painless script to handle 'localhost' literal addresses.

* Also run elastic-package format.
harnish-crest-data pushed a commit to chavdaharnish/integrations that referenced this pull request Feb 5, 2025
…sing in AWS Cloudfront Logs data stream (elastic#11829)

* The regex pattern used for the ipv6 was overly complex and caused errors in Elasticsearch. This change simplifies and fixes the grok-based IPv4/IPv6 parsing by adding a new helper pipeline invoked iteratively over ip fields via a foreach. Also add a painless script to handle 'localhost' literal addresses.

* Also run elastic-package format.
Labels
bugfix Pull request that fixes a bug issue enhancement New feature or request Integration:aws AWS integration Label used for meta issues tracking each integration Team:obs-ds-hosted-services Observability Hosted Services team [elastic/obs-ds-hosted-services] Team:Obs-InfraObs Observability Infrastructure Monitoring team [elastic/obs-infraobs-integrations] Team:Security-Service Integrations Security Service Integrations team [elastic/security-service-integrations]

6 participants