Skip to content

aws: allow user specification of fields to retain in the cloudtrail data stream #14236

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jul 4, 2025

Conversation

efd6
Copy link
Contributor

@efd6 efd6 commented Jun 17, 2025

Proposed commit message

aws: allow user specification of fields to retain in the cloudtrail data stream

Storage of the response_elements, request_parameters and additional_eventdata
is a potentially significant cost, but different users have different
requirements for their present, so there is no ideal approach. Given
that it is likely that this optimisation will be a common desire,
provide a UI option to allow users to easily configure this behaviour
without the requirement of adding processors to remove the fields in an
@custom pipeline. Note also that there is a TODO in the pipeline
addition here to move from a remove after creation model, spending
fruitless work, to a non-creation model, which would not be possible to
implement in an @custom pipeline.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Screenshots

@efd6 efd6 self-assigned this Jun 17, 2025
@efd6 efd6 added enhancement New feature or request Integration:aws AWS Team:Security-Service Integrations Security Service Integrations team [elastic/security-service-integrations] labels Jun 17, 2025
@efd6 efd6 force-pushed the 13500-cloudtrail_interim_solution branch from 7d17abf to 40b1eeb Compare June 17, 2025 06:35
@efd6 efd6 changed the title aws: allow user-specification of fields to retain in the cloudtrail data stream aws: allow user specification of fields to retain in the cloudtrail data stream Jun 17, 2025
@elastic-vault-github-plugin-prod
Copy link

elastic-vault-github-plugin-prod bot commented Jun 17, 2025

🚀 Benchmarks report

Package aws 👍(13) 💚(8) 💔(1)

Expand to view
Data stream Previous EPS New EPS Diff (%) Result
waf 6666.67 5649.72 -1016.95 (-15.25%) 💔

To see the full report comment with /test benchmark fullreport

@elasticmachine
Copy link

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

@andrewkroh andrewkroh added the Team:Obs-InfraObs Observability Infrastructure Monitoring team [elastic/obs-infraobs-integrations] label Jun 18, 2025
@efd6 efd6 requested review from romulets and strawgate June 19, 2025 01:42
@efd6 efd6 force-pushed the 13500-cloudtrail_interim_solution branch 2 times, most recently from aebfd82 to f5e0b91 Compare June 23, 2025 00:46
Comment on lines 199 to 202
Cloudtrail `response_elements`, `request_parameters` and `additional_eventdata` data can
be placed in keyword and text fields as JSON, and in flattened fields. Depending on requirements
This configuration determines which fields will be retained in the final document. The Minimal
option retains the minmal set of fields required for the Security Detection Engine rules.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Cloudtrail `response_elements`, `request_parameters` and `additional_eventdata` data can
be placed in keyword and text fields as JSON, and in flattened fields. Depending on requirements
This configuration determines which fields will be retained in the final document. The Minimal
option retains the minmal set of fields required for the Security Detection Engine rules.
Cloudtrail `response_elements`, `request_parameters` and `additional_eventdata` data can
be placed in keyword and text fields as JSON, and in flattened fields. Depending on requirements
this configuration determines which fields will be retained in the final document. The Minimal
option retains the minimal set of fields required for the Security Detection Engine rules.

- text: Flattened
value: flattened
- text: Neither
value: none
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is minimal not applicable for this input?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I just forgot to add it; it came in later.

@efd6 efd6 requested a review from kcreddy June 30, 2025 03:36
Copy link
Contributor

@kcreddy kcreddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after resolving conflict. Thanks!

efd6 added 3 commits July 2, 2025 17:28
…ata stream

Storage of the response_elements, request_parameters and additional_eventdata
is a potentially significant cost, but different users have different
requirements for their present, so there is no ideal approach. Given
that it is likely that this optimisation will be a common desire,
provide a UI option to allow users to easily configure this behaviour
without the requirement of adding processors to remove the fields in an
@Custom pipeline. Note also that there is a TODO in the pipeline
addition here to move from a remove after creation model, spending
fruitless work, to a non-creation model, which would not be possible to
implement in an @Custom pipeline.
The fields were identified by running the following shell script in the
the security_detection_engine/kibana/security_rule directory.

for f in *; do
  jq 'select(.attributes.required_fields != null)|.attributes.required_fields|.[]|select(.name != null)|select(.name|contains("cloudtrail.flattened"))|.name'<$f
done|sort|uniq

The test for this is derived from the test-copy-object-json.log test
case which includes one of the required fields and a number of other
fields under cloudtrail.flattened. So comparing the test added here to
that demonstrates whether is works.
@efd6 efd6 force-pushed the 13500-cloudtrail_interim_solution branch from c784f79 to 3253fb8 Compare July 2, 2025 07:58
Copy link
Member

@romulets romulets left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall.

Question. I'm actually not sure what is the httpjson stream. But why apply hbs changes to s3 and cloudwatch, but not to httpjson?

Comment on lines +1730 to +1739
required_flattened_fields:
- aws.cloudtrail.flattened.additional_eventdata.SSEApplied
- aws.cloudtrail.flattened.request_parameters.cidrIp
- aws.cloudtrail.flattened.request_parameters.dryRun
- aws.cloudtrail.flattened.request_parameters.fromPort
- aws.cloudtrail.flattened.request_parameters.includeDeprecated
- aws.cloudtrail.flattened.request_parameters.policyArn
- aws.cloudtrail.flattened.request_parameters.serialNumber
- aws.cloudtrail.flattened.request_parameters.withDecryption
- aws.cloudtrail.flattened.request_parameters.x-amz-server-side-encryption-customer-algorithm
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally worry about this duplicated list from the detection engine. This will be easily missed in future iterations. Do you have thoughts on process to avoid it? Or maybe can we automate fetching this list on pre-commit? Or automate tests to verify consistency?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the consequence of the technical debt that has been built up by the unconstrained use of fields in the detection rules. Fixing this would require a significant refactor and I think is outside the scope of this change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree fixing is outside. But what is the strategy to keep consistency in the list with the rules over time?

@efd6
Copy link
Contributor Author

efd6 commented Jul 2, 2025

Question. I'm actually not sure what is the httpjson stream. But why apply hbs changes to s3 and cloudwatch, but not to httpjson?

This was left over when the HTTP JSON input was removed in #13246. There is #14200 to track this.

@efd6
Copy link
Contributor Author

efd6 commented Jul 3, 2025

/test

@elasticmachine
Copy link

💚 Build Succeeded

History

cc @efd6

Copy link

@efd6 efd6 merged commit 9c3504d into elastic:main Jul 4, 2025
7 checks passed
@elastic-vault-github-plugin-prod

Package aws - 3.10.0 containing this change is available at https://epr.elastic.co/package/aws/3.10.0/

@efd6
Copy link
Contributor Author

efd6 commented Jul 6, 2025

Follow up issue: #14429

robester0403 pushed a commit to robester0403/integrations that referenced this pull request Jul 8, 2025
…ata stream (elastic#14236)

Storage of the response_elements, request_parameters and additional_eventdata
is a potentially significant cost, but different users have different
requirements for their present, so there is no ideal approach. Given
that it is likely that this optimisation will be a common desire,
provide a UI option to allow users to easily configure this behaviour
without the requirement of adding processors to remove the fields in an
@Custom pipeline. Note also that there is a TODO in the pipeline
addition here to move from a remove after creation model, spending
fruitless work, to a non-creation model, which would not be possible to
implement in an @Custom pipeline.
efd6 added a commit that referenced this pull request Jul 16, 2025
In #14236 we allowed users to select which extended fields they wanted
to retain in order to reduce storage costs in cases where they did not
what the full set of capacities that the data stream can provide. We
did not however prevent the work of collecting those unwanted fields.
This change does that, avoiding retaining fields that will ultimately
not be kept if possible.

It is unfortunate that the wide variety of fields is needed at all, but
resolving that depends on improving platform support for the diversity
of fields that the data source provides and then making more efficient
use of those improvements in the detection rules. Until then, this is
what we have.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Integration:aws AWS Team:Obs-InfraObs Observability Infrastructure Monitoring team [elastic/obs-infraobs-integrations] Team:Security-Service Integrations Security Service Integrations team [elastic/security-service-integrations]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants