[AWS][Cloudfront Logs] - Implemented GROK processor based ipv6/v4 parsing in AWS Cloudfront Logs data stream #11829
Conversation
Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)
There are several IPv6 addresses that fail with the current change, but are successfully matched by the previous regex.
Examples:
1::
1:2:3:4:5:6:7::
1::7:8
1:2:3:4:5::7:8
1::6:7:8
1:2:3:4::6:7:8
1::5:6:7:8
1:2:3::5:6:7:8
1::4:5:6:7:8
1:2::4:5:6:7:8
1::3:4:5:6:7:8
::2:3:4:5:6:7:8
Examples taken from here
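For reference, Elasticsearch grok ships with built-in IPV4, IPV6, and IP patterns that do match compressed forms like these. A minimal sketch of a grok processor using the built-in pattern (the field names here are illustrative, not the ones in this PR):

    - grok:
        field: _tmp.candidate_ip
        patterns:
          - '^%{IP:_tmp.valid_ip}$'
        ignore_failure: true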
@ShourieG, would it be possible to replace the script processor with a grok processor?
@kcreddy, should we try it for both IPv4/IPv6, or only IPv6 and leave IPv4 as it is currently?
@kcreddy, I have updated the logic with a relevant GROK processor, but some scripts were still required for cleanup and for handling the localhost edge case. There are some removals in the IP section based on the GROK patterns.
- split:
    field: _tmp.x_forwarded_for
    separator: ","
    target_field: _tmp.split_x_forwarded_for
We could have the same conditional that's used in the original script processor:
if: ctx._tmp?.x_forwarded_for != null && ctx._tmp.x_forwarded_for != '-'
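A sketch of the split processor above with that conditional applied:

    - split:
        field: _tmp.x_forwarded_for
        separator: ","
        target_field: _tmp.split_x_forwarded_for
        if: ctx._tmp?.x_forwarded_for != null && ctx._tmp.x_forwarded_for != '-'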
Just a nit, LGTM 👍🏼
Could you also add the exact error to the PR.
// append valid IPs
if (ctx._tmp.valid_ip != null) {
  if (ctx.network.forwarded_ip == null) {
    ctx.network.forwarded_ip = [];
  }
  if (!ctx.network.forwarded_ip.contains(ctx._tmp.valid_ip)) {
    ctx.network.forwarded_ip.add(ctx._tmp.valid_ip);
  }
}
We could possibly move the array-existence check to the base pipeline so that it is not run for every entry. Or we could change the script to an append processor with allow_duplicates: false, which would be cleaner. Same for invalid IPs.
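A minimal sketch of that alternative, assuming the value to append sits in _tmp.valid_ip as in the snippet above (the exact condition is an assumption):

    - append:
        field: network.forwarded_ip
        value: '{{{_tmp.valid_ip}}}'
        allow_duplicates: false
        if: ctx._tmp?.valid_ip != null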
Suggested commit message body.
The regex pattern used for IPv6 was overly complex and caused errors in
Elasticsearch. This change simplifies and fixes the grok-based IPv4/IPv6
parsing by adding a new helper pipeline invoked iteratively over IP fields
via a foreach. Also add a painless script to handle 'localhost' literal
addresses.
Also run elastic-package format.
I think the format changes should probably have been in a separate PR.
      if: ctx._tmp.invalid_ips != null
  - script:
      lang: painless
      description: handle 'localhost' edge case scenario, currently not handled via grok
Suggested change:
- description: handle 'localhost' edge case scenario, currently not handled via grok
+ description: Handle 'localhost' edge case, currently not handled via grok.
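For illustration, a hypothetical shape such a script could take, assuming the literal 'localhost' should be mapped to the loopback address before validation (the field name and mapping are assumptions, not the PR's actual implementation):

    - script:
        lang: painless
        description: Handle 'localhost' edge case, currently not handled via grok.
        source: |
          // Hypothetical: rewrite the literal 'localhost' to the IPv4
          // loopback address so downstream IP validation accepts it.
          if (ctx._tmp?.candidate_ip == 'localhost') {
            ctx._tmp.candidate_ip = '127.0.0.1';
          }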
LGTM! Just a suggestion: can we consider adding a sample or two to the pipeline test to capture the pattern observed in the reported issue, the handling of localhost, etc.?
Approved with comments.
💚 Build Succeeded
cc @ShourieG
The current tests already contain the localhost pattern, @agithomas. The reported issue, on the other hand, was about errors with the existing regex pattern when compiling in Elasticsearch.
Package aws - 2.32.0 containing this change is available at https://epr.elastic.co/package/aws/2.32.0/
…sing in AWS Cloudfront Logs data stream (elastic#11829) * The regex pattern used for IPv6 was overly complex and caused errors in Elasticsearch. This change simplifies and fixes the grok-based IPv4/IPv6 parsing by adding a new helper pipeline invoked iteratively over IP fields via a foreach. Also add a painless script to handle 'localhost' literal addresses. * Also run elastic-package format.
Type of change
Please label this PR with one of the following labels, depending on the scope of your change:
Proposed commit message
WHAT: Implemented GROK processor based IPv6/IPv4 parsing. Cleaned up formatting issues across all data streams via elastic-package format.
WHY: The regex pattern used for IPv6 was overly complex and caused errors in Elasticsearch. The cleanup was a byproduct of running elastic-package format.
HOW: A new helper pipeline was added containing the new grok processor and helper scripts; it is invoked from a foreach processor in the parent pipeline. This approach was needed because of existing limitations: a foreach allows only one sub-processor, and grok cannot append to existing lists/arrays.
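A sketch of that structure, using elastic-package's pipeline-reference syntax (the helper pipeline name and field names here are illustrative):

    # Parent pipeline: iterate the split list, one helper invocation per element.
    - foreach:
        field: _tmp.split_x_forwarded_for
        ignore_missing: true
        processor:
          pipeline:
            name: '{{ IngestPipeline "forwarded_ip" }}'

    # Helper pipeline: grok the current element, exposed as _ingest._value.
    - grok:
        field: _ingest._value
        patterns:
          - '^%{IP:_tmp.valid_ip}$'
        ignore_failure: true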
The Elasticsearch error produced is as follows:
NOTE
The latest commit (elastic-package build) cleaned up a lot of formatting issues, hence the added changes across many data streams. I think these are good in the long run, so I kept them in.
Checklist
I have added an entry to the changelog.yml file.
Author's Checklist
How to test this PR locally
Related issues
Screenshots