Skip to content

[Cloud Security] add misconfiguration latest transform to Wiz integration #10965

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Sep 19, 2024
Merged

[Cloud Security] add misconfiguration latest transform to Wiz integration #10965

merged 6 commits into from
Sep 19, 2024

Conversation

maxcold
Copy link
Contributor

@maxcold maxcold commented Sep 2, 2024

Proposed commit message

Introducing the latest transform for Wiz Cloud Configuration Finding data to support CDR workflows as per [Guide] Adapting data stream to Cloud Security in Kibana

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.

Author's Checklist

  • [ ]

How to test this PR locally

  1. build the package elastic-package build
  2. run stack elastic-package stack up -v -d
  3. Install Wiz integration, enable Cloud Configuration Finding data stream
  4. check that the transform is created and started. It should be healthy after some time
  5. check that Wiz's findings are available in the Findings page

Related issues

Screenshots

@maxcold
Copy link
Contributor Author

maxcold commented Sep 6, 2024

/ci

@elasticmachine
Copy link

elasticmachine commented Sep 9, 2024

🚀 Benchmarks report

To see the full report comment with /test benchmark fullreport

@maxcold maxcold marked this pull request as ready for review September 12, 2024 15:41
@maxcold maxcold requested a review from a team as a code owner September 12, 2024 15:41
@maxcold maxcold requested review from CohenIdo and a team September 12, 2024 15:41
@andrewkroh andrewkroh added the Team:Security-Service Integrations Security Service Integrations team [elastic/security-service-integrations] label Sep 12, 2024
@elasticmachine
Copy link

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

@maxcold maxcold changed the title [WIP][Cloud Security] add misconfiguration latest transform to Wiz integration [Cloud Security] add misconfiguration latest transform to Wiz integration Sep 13, 2024
move_on_creation: true
latest:
unique_key:
- rule.uuid
Copy link
Contributor

@CohenIdo CohenIdo Sep 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we using rule.uuid instead of rule.id? The native Transform utilizes rule.id, and I noticed that it is also part of the ECS schema.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CohenIdo this is a valid question, the reasons I chose rule.uuid for the unique key are the following:

  1. our native integration doesn't provide rule.uuid but rule.id has the format fe083488-fa0f-5408-9624-ac27607ac2ff which is basically the uuid as per ECS schema
  2. As per ECS rule.id is a unique id in some context and this context is narrower than for rule.uuid. In the context of Cloud Security it might be that the id is unique within the benchmark, but not unique across multiple benchmarks, therefore I thought using uuid allow for better uniqueness, taking into account that our native also uses uuid, while providing it in the rule.id field
    I will reflect this in the guide, sharing that the id should be unique across benchmarks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For user convenience, I believe it's worth using the same field names across integrations.

However, if you decide to go with rule.uuid, let's at least create a ticket to migrate from rule.id to rule.uuid in the native integration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For user convenience, I believe it's worth using the same field names across integrations.

Overall I agree that consistency is better, but do you have a specific influence on the UX if we go for a different field between Wiz and CSP? I could not think of any specific downside for the users

However, if you decide to go with rule.uuid, let's at least create a ticket to migrate from rule.id to rule.uuid in the native integration.

I summarised all current mismatches between Wiz and our native CSP integration in this document https://docs.google.com/document/d/1GG3st6wR-9lNh4HjHsNlam5lo2oWceRK67-IcvyIVjk/edit , the mismatch between rule.uuid and rule.id is one of the points there. I will create GitHub issues after I gather some feedback on these points. Can you also review the document?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One scenario is where a user is investigating misconfiguration documents and needs to determine the rule ID field for each integration. Or, when a user wants to add a rule ID column to the data grid and has both integrations installed, they would have to add two columns that represent the same information.

Copy link
Contributor Author

@maxcold maxcold Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CohenIdo both use cases related to the rule.id vs rule.uuid in general, and not to which field to use in the transform as far as I understand. Whatever field we choose for the transform, these problems will stay due to our native integration not following ECS and providing uiud in rule.id and not providing rule.uuid at all. I believe we should not make changes in Wiz to make it not compliant with ECS just for it to be consistent with our native integration. I think we rather need to fix our native integration to be compliant with ECS and then the problem will be resolved properly

@maxcold maxcold requested a review from CohenIdo September 17, 2024 09:30
Copy link
Contributor

@kcreddy kcreddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The transform's manifest.yml could be defined as per this spec: https://github.com/elastic/package-spec/blob/main/spec/integration/elasticsearch/transform/manifest.spec.yml

It contains properties such as start, but more importantly destination_index_template.settings.index.sort which defines the sort order of documents within the index. According to Elasticsearch doc, we don't seem to have default field for index.sort.

By default Lucene does not apply any sort. The index.sort.* settings define which fields should be used to sort the documents inside each Segment.

So I think we have to define this manually. Example: https://github.com/elastic/integrations/blob/main/packages/ti_threatconnect/elasticsearch/transform/latest/manifest.yml

- name: event.outcome
external: ecs
- name: observer.vendor
external: ecs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are few more ECS fields being populated in the source datastream's ingest pipeline, like cloud.region, event.category, etc. Shouldn't we be defininig them as well?

@maxcold
Copy link
Contributor Author

maxcold commented Sep 18, 2024

@kcreddy thanks for the code review. Regarding the start, we decided not to specify it as true seems to be a default, meaning the transform starts right away anyway.
As for destination_index_template.settings.index.sort, it's a good point, though we don't have this setting on our native latest transform (the one transforming data coming from cloud_security_posture integration). I wonder what is the effect of this sort setting not being set as we haven't experienced any downside yet with our native integration. Do you know what exactly is the effect for the users if we leave it out? If it requires further investigation, I would leave this setting out to be consistent with our native integration

@maxcold maxcold requested a review from kcreddy September 18, 2024 21:17
Copy link

Quality Gate failed Quality Gate failed

Failed conditions
0.5% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube

@elasticmachine
Copy link

💚 Build Succeeded

History

@kcreddy
Copy link
Contributor

kcreddy commented Sep 19, 2024

@maxcold,

I wonder what is the effect of this sort setting not being set as we haven't experienced any downside yet without native integration. Do you know what exactly is the effect for the users if we leave it out? If it requires further investigation, I would leave this setting out to be consistent with our native integration

Based on the docs, I can see that when index.sort is set, the documents are stored in sorted order inside the segments within a shard. This makes searches such as Get first N documents faster. Even for such queries, note that track_total_hits needs to be false to disable the aggregation being performed on all documents.
Another benefit that docs talk about it having efficient conjunctions, but they only seem to work on low-cardinality fields.

There are also caveats mentioned regarding performance while indexing the documents with index.sort enabled:

Index sorting also has a cost in terms of indexing throughput since documents must be sorted at flush and merge time.

The default source data-stream backed indices of logs-* also don't seem to contain the setting index.sort.

I agree that we should not set this property especially since you haven't experienced any downside yet in native integration.

@maxcold maxcold merged commit d9da904 into elastic:main Sep 19, 2024
4 of 5 checks passed
@maxcold maxcold deleted the csp-add-wiz-misconfigruation-findings-transform branch September 19, 2024 11:47
@elastic-vault-github-plugin-prod

Package wiz - 2.0.0 containing this change is available at https://epr.elastic.co/search?package=wiz

harnish-crest-data pushed a commit to chavdaharnish/integrations that referenced this pull request Feb 4, 2025
…tion (elastic#10965)

* add misconfiguration latest transform to Wiz integration

* add PR link to changelog

* add ecs mapping to wiz latest cdr misconfigurations transform

* add missing ecs fields to wiz misconfiguration transform mapping
harnish-crest-data pushed a commit to chavdaharnish/integrations that referenced this pull request Feb 5, 2025
…tion (elastic#10965)

* add misconfiguration latest transform to Wiz integration

* add PR link to changelog

* add ecs mapping to wiz latest cdr misconfigurations transform

* add missing ecs fields to wiz misconfiguration transform mapping
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Integration:wiz Wiz Team:Cloud Security Cloud Security team [elastic/cloud-security-posture] Team:Security-Service Integrations Security Service Integrations team [elastic/security-service-integrations]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants