
[Cloudflare Logpush] Fix data type for http_request.bot.detection_tags field #13581


Merged — 8 commits merged into elastic:main on May 6, 2025

Conversation

moxarth-rathod
Contributor

@moxarth-rathod moxarth-rathod commented Apr 17, 2025

Proposed commit message

cloudflare_logpush: fix data type for `http_request.bot.detection_tags` field

Previously the `http_request.bot.detection_tags` field was mapped incorrectly; this is fixed
by changing the data type from `long` to `keyword`.
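In package terms, the fix is a one-line type change in the data stream's field definition. A sketch of the corrected entry (assuming a fields file such as `packages/cloudflare_logpush/data_stream/http_request/fields/fields.yml`; the exact path and description text are illustrative):

```yaml
- name: http_request.bot.detection_tags
  type: keyword
  description: Detection tags produced by Cloudflare bot management for the request.
```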

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices

How to test this PR locally

  • Clone the integrations repo.
  • Install `elastic-package` locally.
  • Start the Elastic Stack using `elastic-package`.
  • Move to the integrations/packages/cloudflare_logpush directory.
  • Run the following command to run the tests.

elastic-package test
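The steps above can be sketched as a shell session (a sketch assuming Go and Docker are available; `elastic-package` is installed from the elastic/elastic-package repository):

```shell
# Clone the integrations repo.
git clone https://github.com/elastic/integrations.git

# Install elastic-package locally.
go install github.com/elastic/elastic-package@latest

# Start a local Elastic Stack in the background.
elastic-package stack up -d

# Move to the package directory and run its tests.
cd integrations/packages/cloudflare_logpush
elastic-package test
```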

Related issues

@moxarth-rathod moxarth-rathod added Integration:cloudflare_logpush Cloudflare Logpush bugfix Pull request that fixes a bug issue Team:Security-Service Integrations Security Service Integrations team [elastic/security-service-integrations] Team:Sit-Crest Crest developers on the Security Integrations team [elastic/sit-crest-contractors] labels Apr 17, 2025
@moxarth-rathod moxarth-rathod self-assigned this Apr 17, 2025
@moxarth-rathod moxarth-rathod requested a review from a team as a code owner April 17, 2025 09:01
@elasticmachine

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

@elastic-vault-github-plugin-prod

elastic-vault-github-plugin-prod bot commented Apr 17, 2025

🚀 Benchmarks report

To see the full report comment with /test benchmark fullreport

@leandrojmp
Contributor

Shouldn't the version be increased to 1.38.0 instead of 2.0.0?

@moxarth-rathod
Contributor Author

> Shouldn't the version be increased to 1.38.0 instead of 2.0.0?

Since this is a breaking change to the current integration, we've increased the major version.

@leandrojmp
Contributor

@moxarth-rathod

I see, but is this a requirement? It could make the integration version increase too fast. I've just reported another mapping error that would lead to another breaking change, so in a couple of weeks this integration's version could jump from 1.37.2 to 3.0.0. That may give the false impression that a lot is being changed, when it is just fixing mappings.

There are cases where the PR was a breaking change but only the minor version was bumped, not the major.

@moxarth-rathod
Contributor Author

> @moxarth-rathod
>
> I see, but is this a requirement? It could make the integration version increase too fast. I've just reported another mapping error that would lead to another breaking change, so in a couple of weeks this integration's version could jump from 1.37.2 to 3.0.0. That may give the false impression that a lot is being changed, when it is just fixing mappings.
>
> There are cases where the PR was a breaking change but only the minor version was bumped, not the major.

Got it! I've modified the version to 1.38.0.
cc: @kcreddy @ShourieG

@ShourieG
Contributor

ShourieG commented Apr 22, 2025

> was mapped wrongly,

@moxarth-rathod, I don't think this should be categorized as a breaking change. This is a mapping issue and technically a bugfix. A breaking change is something where the schema or mapping rules change drastically and might even break current indexing rules; it could also be a complete rework of the underlying input, or a change that breaks existing state such that it is incompatible with the previous version. This is also not an enhancement, so we should not increment the minor version; only the patch version should be incremented for a bugfix.

@kcreddy
Contributor

kcreddy commented Apr 22, 2025

Agree with @ShourieG here. In this case the field is never indexed and is being ignored. So it wouldn't result in any mapping conflicts (long vs keyword).

@moxarth-rathod moxarth-rathod requested a review from ShourieG April 22, 2025 12:32
@leandrojmp
Contributor

leandrojmp commented Apr 22, 2025

@ShourieG @kcreddy

This change will lead to a mapping conflict in Kibana; the user may not be able to use the field until all older indices with the wrong mapping are removed from the cluster. Shouldn't this be considered a breaking change?

In this specific case it may not have much impact because this field was added recently, but with the wrong mapping.

> In this case the field is never indexed and is being ignored. So it wouldn't result in any mapping conflicts (long vs keyword).

Changing the mapping from long to keyword will mean the values are no longer ignored and get indexed, but it will still lead to conflicts in Kibana, as the field will have different mappings across different backing indices. In practice this renders the field useless until the older indices are removed or a runtime field mapping is created for the older indices, which can be expensive in terms of performance.

@ShourieG
Contributor

> @ShourieG @kcreddy
>
> This change will lead to a mapping conflict in Kibana; the user may not be able to use the field until all older indices with the wrong mapping are removed from the cluster. Shouldn't this be considered a breaking change?
>
> In this specific case it may not have much impact because this field was added recently, but with the wrong mapping.
>
> > In this case the field is never indexed and is being ignored. So it wouldn't result in any mapping conflicts (long vs keyword).
>
> Changing the mapping from long to keyword will mean the values are no longer ignored and get indexed, but it will still lead to conflicts in Kibana, as the field will have different mappings across different backing indices. In practice this renders the field useless until the older indices are removed or a runtime field mapping is created for the older indices, which can be expensive in terms of performance.

Hi @leandrojmp, could you elaborate a bit more on this mapping conflict and the performance impact? I was assuming it would not be a major issue, as this field is not used inside any dashboard/visualization and was not even indexed previously. With an index rollover and sufficient passage of time, won't this issue self-correct?

@kcreddy
Contributor

kcreddy commented Apr 23, 2025

@moxarth-rathod, it would be nice to test this scenario (whether a mapped but unindexed field can cause conflicts).

@leandrojmp, at the moment we are working on communicating breaking changes to users better through UI changes: elastic/package-spec#817.
Until then we have been handling anything potentially breaking by incrementing the major version.

@leandrojmp
Contributor

> Hi @leandrojmp, could you elaborate a bit more on this mapping conflict and the performance impact? I was assuming it would not be a major issue, as this field is not used inside any dashboard/visualization and was not even indexed previously. With an index rollover and sufficient passage of time, won't this issue self-correct?

@ShourieG sure!

The fact that the field was not indexed does not matter for the conflict; the field is explicitly mapped, so when the backing index for the data stream is created it will have the wrong mapping.

When we send a document with strings for a field that has a long type, it is ignored, but only because index.mapping.ignore_malformed was set to true by default on integration data streams not long ago; if this were still set to false, the events would be dropped.
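That behaviour can be checked with a throwaway index (a sketch; the index and field names are made up):

```
PUT test_ignore_malformed
{
  "settings": {
    "index.mapping.ignore_malformed": true
  },
  "mappings": {
    "properties": {
      "test_field": { "type": "long" }
    }
  }
}

POST test_ignore_malformed/_doc
{
  "test_field": ["likely-automated", "api"]
}
```

The document is accepted, but test_field is listed in the _ignored metadata field and is not searchable; with ignore_malformed set to false the same request would be rejected.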

If the mapping is corrected to keyword and we rollover the data stream, this will fix the mapping for new backing indices and on these backing indices the field will be indexed, but this will lead to a conflict in Kibana.

When a data view or index pattern matches multiple indices and a field has conflicting mappings on those indices, the field cannot be used in Kibana, you cannot filter on it, you cannot search on it and you cannot use it in visualizations.

There are 2 main ways to solve this:

  • Remove the indices with the wrong mapping from the cluster.
  • Fix the mapping with a runtime field on the old indices.

The performance impact comes from the runtime fields, which by definition have a query-time cost; it is one of their compromises, especially on large datasets, which is normally the case for Cloudflare logs.

So, in summary, changing the mapping and rolling over the data stream fixes the mapping issue in Elasticsearch and new indices will be able to index this field, but it will result in conflicts in Kibana, which makes the field unusable until the conflicts are solved.
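For the runtime-field option, a search-time runtime mapping can shadow the badly mapped field on the old indices. A sketch (the index pattern and tag value are illustrative):

```
GET logs-cloudflare_logpush.http_request-*/_search
{
  "runtime_mappings": {
    "cloudflare_logpush.http_request.bot.detection_tags": {
      "type": "keyword"
    }
  },
  "query": {
    "term": {
      "cloudflare_logpush.http_request.bot.detection_tags": "api"
    }
  }
}
```

The runtime definition overrides the conflicting mapped type for the duration of the search, at the cost of reading the value from _source at query time.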

> it would be nice to test this scenario (whether a mapped but unindexed field can cause conflicts).

@kcreddy

What causes the conflict in Kibana is the mapping; it doesn't matter whether there is data in the index or not. If you create two empty indices where a field has different mappings and use those indices in the same index pattern, Kibana will complain about a mapping conflict even without any data in the indices.

You can confirm this pretty easily with the following steps:

First create 2 indices where a field has a different mapping:

PUT test_index_1
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "test_field": { "type": "long" }
    }
  }
}
PUT test_index_2
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "test_field": { "type": "keyword" }
    }
  }
}

Then in Kibana create a data view for the index pattern test_index_*; you will see that even without data Kibana will complain about the conflict.

[Screenshot: Kibana data view for test_index_* flagging a mapping conflict on test_field]

@ShourieG
Contributor

ShourieG commented Apr 24, 2025

@andrewkroh, it would be awesome to have your thoughts on this. Because the particular field in question is not used in any dashboard/visualization, I feel that with an index rollover and the passage of time the conflicts should auto-correct, but I also understand the concern here. What would be the best way to move forward? One non-intrusive option is to introduce a new field with the correct data type while keeping the existing field as is, but that feels like a hack/workaround.

@leandrojmp
Contributor

> Because the particular field in question is not used in any dashboard/visualization, I feel that with an index rollover and the passage of time the conflicts should auto-correct, but I also understand the concern here.

Yeah, the passage of time and the removal of old indices from the cluster would solve the Kibana conflicts. Also, this field was added not long ago, in a commit merged on April 2nd, and since it has the wrong mapping no one is using it, so it is expected not to break anything; Kibana will just start showing a conflict for this field, and if people investigate the reason for that conflict they can easily find this issue, which has the explanation in the previous comments.

I only noticed the mistake in the mapping because I'm migrating from a custom Logstash ingestion to the native one, and some fields that we use for Security rules were missing or, in this case, had the wrong mapping.

Also, this requires extra features on the Cloudflare side, which further limits the number of people that may be affected.

@efd6
Contributor

efd6 commented Apr 28, 2025

@leandrojmp Thanks for providing this information. I can see that the introduction of the incorrect field definition falls on my shoulders in #12782; I even documented the type correctly in the owning issue. Unfortunately, this addition was in the set of changes where I was unable to obtain any test samples. Would you be in a position to donate test samples to improve the test coverage for the package?

Contributor

@ShourieG ShourieG left a comment


@moxarth-rathod, approving this since, as discussed with @leandrojmp, the real-world impact will be minimal and the issue will eventually self-correct with time. But I'd wait for @andrewkroh or @efd6 to sign off on this issue before merging.

@efd6
Contributor

efd6 commented Apr 28, 2025

I'm happy with this, but I think Andrew should have the final say.

@leandrojmp
Contributor

@efd6 you mean, some sample events from real usage? Sure!

I can get some, redact sensitive information and share here.

@efd6
Contributor

efd6 commented Apr 28, 2025

@leandrojmp That would be wonderful. Note the concern that I expressed in the PR that introduced this.

@moxarth-rathod
Contributor Author

> I can get some, redact sensitive information and share here.

@leandrojmp did you get a chance to collect some sample events?

@@ -1,4 +1,9 @@
# newer versions go on top
- version: "1.37.3"
changes:
- description: Fix field type of `http_request.bot.detection_tags` field.
Member


Personally I would like more details in this message, e.g. what type was it? What type is it now? What problem does it fix?

Something like this gives a more complete view of the problem being fixed by the update.

Fix the field type of http_request.bot.detection_tags from long to keyword, aligning it with the log source type of string array. This resolves an issue where the field was being ignored due to the "ignore_malformed" index setting, making it unusable in searches.

@moxarth-rathod moxarth-rathod requested a review from andrewkroh May 5, 2025 06:13
@elasticmachine

💚 Build Succeeded


cc @moxarth-rathod


@efd6 efd6 merged commit 7798cc0 into elastic:main May 6, 2025
7 checks passed
@elastic-vault-github-plugin-prod

Package cloudflare_logpush - 1.37.4 containing this change is available at https://epr.elastic.co/package/cloudflare_logpush/1.37.4/


Successfully merging this pull request may close these issues.

[Cloudflare Logpush]: Wrong mapping on field cloudflare_logpush.http_request.bot.detection_tags
7 participants