-
Notifications
You must be signed in to change notification settings - Fork 474
[Modsec] Fix date format & Json issues #3363
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pinging @elastic/security-external-integrations (Team:Security-External Integrations) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your contribution. I have a few comments around the date format. Also, can you please add a change log entry to packages/modsecurity/changelog.yml
and update the version in packages/modsecurity/manifest.yml
too.
packages/modsecurity/data_stream/auditlog/elasticsearch/ingest_pipeline/default.yml
Outdated
Show resolved
Hide resolved
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unclear from the PR description what this is fixing (there is no issue and no comment on what the change is intended to address).
Can you explain why the old test inputs are being completely replaced? Were they wrong in some way? If so, please explain this in the PR. If they are not explicitly incorrect, please leave them and append new cases where needed.
A number of files have had there modes changed to include the execute bit set for all users. Please make sure that regular files are ugo-x unless the need to be otherwise.
Hi @efd6, I edited the pr description. Please kindly check. I used new test inputs with multiple date formats to check all the date formats were working well and also added new test inputs for the ModSecurity with apache. |
Many of the files still show an addition of the x flags on their modes and there are still significant non-append changes to inputs. Please revert the deletions in test cases or explain why they need to be deleted. |
Thank you, it is looking good now. Please resolve the conflict. |
Hello @efd6 , the conflicts are resolved. |
/test |
Co-authored-by: Dan Kortschak <90160302+efd6@users.noreply.github.com>
Co-authored-by: Dan Kortschak <90160302+efd6@users.noreply.github.com>
Please run |
Already run |
When I run it I get this diff --git a/packages/modsecurity/docs/README.md b/packages/modsecurity/docs/README.md
index a76c4ec07..3bb671561 100644
--- a/packages/modsecurity/docs/README.md
+++ b/packages/modsecurity/docs/README.md
@@ -59,6 +59,7 @@ The `Audit Log` dataset collects Modsecurity Audit logs.
| host.os.family | OS family (such as redhat, debian, freebsd, windows). | keyword |
| host.os.kernel | Operating system kernel version as a raw string. | keyword |
| host.os.name | Operating system name, without the version. | keyword |
+| host.os.name.text | Multi-field of `host.os.name`. | text |
| host.os.platform | Operating system platform (such centos, ubuntu, windows). | keyword |
| host.os.version | Operating system version as a raw string. | keyword |
| host.type | Type of host. For Cloud providers this can be the machine type like `t2.medium`. If vm, this could be the container, for example, or other information meaningful in your environment. | keyword |
@@ -85,6 +86,7 @@ The `Audit Log` dataset collects Modsecurity Audit logs.
| source.address | Some event source addresses are defined ambiguously. The event will sometimes list an IP, a domain or a unix socket. You should always store the raw address in the `.address` field. Then it should be duplicated to `.ip` or `.domain`, depending on which one it is. | keyword |
| source.as.number | Unique number allocated to the autonomous system. The autonomous system number (ASN) uniquely identifies each network on the Internet. | long |
| source.as.organization.name | Organization name. | keyword |
+| source.as.organization.name.text | Multi-field of `source.as.organization.name`. | match_only_text |
| source.geo.city_name | City name. | keyword |
| source.geo.continent_name | Name of the continent. | keyword |
| source.geo.country_iso_code | Country ISO code. | keyword |
@@ -99,16 +101,21 @@ The `Audit Log` dataset collects Modsecurity Audit logs.
| url.extension | The field contains the file extension from the original request url, excluding the leading dot. The file extension is only set if it exists, as not every url has a file extension. The leading period must not be included. For example, the value must be "png", not ".png". Note that when the file name has multiple extensions (example.tar.gz), only the last one should be captured ("gz", not "tar.gz"). | keyword |
| url.fragment | Portion of the url after the `#`, such as "top". The `#` is not part of the fragment. | keyword |
| url.original | Unmodified original url as seen in the event source. Note that in network monitoring, the observed URL may be a full URL, whereas in access logs, the URL is often just represented as a path. This field is meant to represent the URL as it was observed, complete or not. | wildcard |
+| url.original.text | Multi-field of `url.original`. | match_only_text |
| url.path | Path of the request, such as "/search". | wildcard |
| url.port | Port of the request, such as 443. | long |
| url.query | The query field describes the query string of the request, such as "q=elasticsearch". The `?` is excluded from the query string. If a URL contains no `?`, there is no query field. If there is a `?` but no query, the query field exists with an empty string. The `exists` query can be used to differentiate between the two cases. | keyword |
| url.scheme | Scheme of the request, such as "https". Note: The `:` is not part of the scheme. | keyword |
| user.name | Short name or login of the user. | keyword |
+| user.name.text | Multi-field of `user.name`. | match_only_text |
| user_agent.device.name | Name of the device. | keyword |
| user_agent.name | Name of the user agent. | keyword |
| user_agent.original | Unparsed user_agent string. | keyword |
+| user_agent.original.text | Multi-field of `user_agent.original`. | match_only_text |
| user_agent.os.full | Operating system name, including the version or code name. | keyword |
+| user_agent.os.full.text | Multi-field of `user_agent.os.full`. | match_only_text |
| user_agent.os.name | Operating system name, without the version. | keyword |
+| user_agent.os.name.text | Multi-field of `user_agent.os.name`. | match_only_text |
| user_agent.os.version | Operating system version as a raw string. | keyword |
| user_agent.version | Version of the user agent. | keyword |
|
- foreach: | ||
field: modsec.audit.messages | ||
ignore_missing: true | ||
processor: | ||
grok: | ||
field: '_ingest._value' | ||
ignore_failure: true | ||
patterns: | ||
- '%{GREEDYDATA} \[file "%{GREEDYDATA:modsec.audit.details.file}"\] \[line "%{GREEDYDATA:_temps.line_number}"\] \[id "%{GREEDYDATA:modsec.audit.details.rule_id}"\] \[msg "%{GREEDYDATA:modsec.audit.details.msg}"\] \[data "%{GREEDYDATA:modsec.audit.details.match}"\] \[severity "%{GREEDYDATA:modsec.audit.details.severity}"\] \[ver "%{DATA:modsec.audit.details.version}"\] %{GREEDYDATA:_temps.tags}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't going to do what we want. I would like to think about this some more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is currently feasible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we should defer some of the changes here until they are more able to be handled. The changes that I'm suggesting don't lock us into anything in the future and retain the data in a reasonably searchable format.
packages/modsecurity/data_stream/auditlog/elasticsearch/ingest_pipeline/nginx-modsec.yml
Outdated
Show resolved
Hide resolved
packages/modsecurity/data_stream/auditlog/elasticsearch/ingest_pipeline/apache-modsec.yml
Outdated
Show resolved
Hide resolved
Thanks for your suggestion <3. I have updated it. Please take a look. |
/test |
Sorry there was a version bump. Can you resolve the conflicts? Looks good to me though after that. |
Is it necessary to set "ecs.version" to "8.3.0" due to the conflict in this file |
Yes, if you resolve the conflicts using the current tip, you will have |
/test |
This needs to have |
I already run that command before pushing. What can be the problem? |
When I run that locally, I get this diff diff --git a/packages/modsecurity/docs/README.md b/packages/modsecurity/docs/README.md
index 77024f1bb..38af18462 100644
--- a/packages/modsecurity/docs/README.md
+++ b/packages/modsecurity/docs/README.md
@@ -59,6 +59,7 @@ The `Audit Log` dataset collects Modsecurity Audit logs.
| host.os.family | OS family (such as redhat, debian, freebsd, windows). | keyword |
| host.os.kernel | Operating system kernel version as a raw string. | keyword |
| host.os.name | Operating system name, without the version. | keyword |
+| host.os.name.text | Multi-field of `host.os.name`. | text |
| host.os.platform | Operating system platform (such centos, ubuntu, windows). | keyword |
| host.os.version | Operating system version as a raw string. | keyword |
| host.type | Type of host. For Cloud providers this can be the machine type like `t2.medium`. If vm, this could be the container, for example, or other information meaningful in your environment. | keyword |
@@ -80,6 +81,7 @@ The `Audit Log` dataset collects Modsecurity Audit logs.
| source.address | Some event source addresses are defined ambiguously. The event will sometimes list an IP, a domain or a unix socket. You should always store the raw address in the `.address` field. Then it should be duplicated to `.ip` or `.domain`, depending on which one it is. | keyword |
| source.as.number | Unique number allocated to the autonomous system. The autonomous system number (ASN) uniquely identifies each network on the Internet. | long |
| source.as.organization.name | Organization name. | keyword |
+| source.as.organization.name.text | Multi-field of `source.as.organization.name`. | match_only_text |
| source.geo.city_name | City name. | keyword |
| source.geo.continent_name | Name of the continent. | keyword |
| source.geo.country_iso_code | Country ISO code. | keyword |
@@ -94,16 +96,21 @@ The `Audit Log` dataset collects Modsecurity Audit logs.
| url.extension | The field contains the file extension from the original request url, excluding the leading dot. The file extension is only set if it exists, as not every url has a file extension. The leading period must not be included. For example, the value must be "png", not ".png". Note that when the file name has multiple extensions (example.tar.gz), only the last one should be captured ("gz", not "tar.gz"). | keyword |
| url.fragment | Portion of the url after the `#`, such as "top". The `#` is not part of the fragment. | keyword |
| url.original | Unmodified original url as seen in the event source. Note that in network monitoring, the observed URL may be a full URL, whereas in access logs, the URL is often just represented as a path. This field is meant to represent the URL as it was observed, complete or not. | wildcard |
+| url.original.text | Multi-field of `url.original`. | match_only_text |
| url.path | Path of the request, such as "/search". | wildcard |
| url.port | Port of the request, such as 443. | long |
| url.query | The query field describes the query string of the request, such as "q=elasticsearch". The `?` is excluded from the query string. If a URL contains no `?`, there is no query field. If there is a `?` but no query, the query field exists with an empty string. The `exists` query can be used to differentiate between the two cases. | keyword |
| url.scheme | Scheme of the request, such as "https". Note: The `:` is not part of the scheme. | keyword |
| user.name | Short name or login of the user. | keyword |
+| user.name.text | Multi-field of `user.name`. | match_only_text |
| user_agent.device.name | Name of the device. | keyword |
| user_agent.name | Name of the user agent. | keyword |
| user_agent.original | Unparsed user_agent string. | keyword |
+| user_agent.original.text | Multi-field of `user_agent.original`. | match_only_text |
| user_agent.os.full | Operating system name, including the version or code name. | keyword |
+| user_agent.os.full.text | Multi-field of `user_agent.os.full`. | match_only_text |
| user_agent.os.name | Operating system name, without the version. | keyword |
+| user_agent.os.name.text | Multi-field of `user_agent.os.name`. | match_only_text |
| user_agent.os.version | Operating system version as a raw string. | keyword |
| user_agent.version | Version of the user agent. | keyword |
|
/test |
@emnp Would you please fix the conflicts and update the manifest version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
What does this PR do?
In the previously released version, some logs are unparsed because of date format and some are because of JSON duplicate keys. I fixed that first and then found one issue #2859 opened.
The previous version was for ModSecurity with Nginx and according to this issue #2859, I also updated for the ModSecurity with apache logs.
Checklist
changelog.yml
file.How to test this PR locally
Related issues
Screenshots