Elastic Defend - Causing Performance issue on Endpoints (Workplace)

Hello Everyone,

We’re currently facing performance issues in our developer environments after deploying Elastic Agents.

Environment Setup:

  • Elasticsearch: On-prem deployment, version 8.18.1
  • Elastic Agent: ~15,000 agents deployed
  • Integrations: System, Windows, and Elastic Defend (only log collection is enabled; all protections like malware, ransomware, etc., are disabled)
  • EDR: Sophos Central
  • Endpoints: Windows machines running Creo, Autodesk, and other heavy engineering tools

Problem Statement:
Even with protections disabled in Elastic Defend and exclusions added in both Sophos and Elastic, we still observe significant performance degradation on systems running resource-intensive applications (like Creo and Autodesk).

What we’ve tried so far:

  • Added Sophos binaries as trusted apps in Elastic
  • Added Creo, hmpalert.dll, and other known binaries/paths as trusted applications in Elastic
  • Configured event filters to exclude noisy events like process.executable for known safe processes
  • Confirmed that Sophos exclusions are in place (folders, file hashes, processes, etc.)

Issue:
Despite these configurations, the Elastic Endpoint still appears to scan or hook into these trusted applications. This causes noticeable lag during CAD operations and code builds on the developer machines.

Questions:

  1. Are there additional configurations needed to fully suppress scanning or interference by Elastic Endpoint on trusted applications?
  2. Is there a recommended strategy for co-existence of Elastic Defend and Sophos EDR in developer-heavy environments?
  3. Is there a way to validate if a trusted application entry is actively being honored, or if something else is still inspecting the executable?
  4. Would disabling certain Elastic endpoint modules (like diagnostics, process, etc.) improve performance further?

Any insights or suggestions from the community are greatly appreciated.

This can get a bit complicated to troubleshoot but hopefully I can give you a few pointers that might help out.

First, event filters are not likely to have a significant impact on performance. They filter the events sent out by Endpoint, but do not significantly change how those events are processed and evaluated on the host. (See the "Event filters" page in the Elastic Docs. The docs are technically for version 9, but nothing has changed for event filters.)

Second, it might help to use the Endpoint's top command. If you run it while you are experiencing the slowdown you want to resolve, it will show you where Endpoint knows it is spending time, broken down by feature and by the process generating the events.

My initial thought is that something you've configured an event filter for likely needs to be converted to be a trusted app instead. If something you have an event filter for is showing up as causing a lot of processing in top, that is likely where the problem is. Can you see if that helps you get things tuned to an acceptable level?

From my own experience, I’d recommend double-checking whether all relevant binaries are fully covered under the Trusted Applications configuration. In a similar case, we observed that using file hash-based trust alone didn’t always prevent Elastic from interacting with processes—especially after application updates or when dynamically spawned processes were involved. Switching to the “Signature” method helped reduce friction significantly.
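To see why a hash-based entry can stop matching after an update while a signature-based one keeps working, it can help to look at both identities of a binary side by side. The following is a minimal sketch, not an Elastic tool: `Get-TrustIdentity` is a hypothetical helper name, and the current PowerShell host executable is used only as a stand-in for the CAD or build binary you actually care about. `Get-AuthenticodeSignature` is only available on Windows.

```powershell
# Hypothetical helper: report the SHA-256 hash and the Authenticode signer of a binary.
function Get-TrustIdentity {
    param([Parameter(Mandatory)] [string]$Path)

    $sig = Get-AuthenticodeSignature -FilePath $Path

    # Unsigned binaries have no signer certificate.
    $signer = $null
    if ($sig.SignerCertificate) { $signer = $sig.SignerCertificate.Subject }

    [pscustomobject]@{
        Path            = $Path
        Sha256          = (Get-FileHash -LiteralPath $Path -Algorithm SHA256).Hash
        SignatureStatus = $sig.Status
        Signer          = $signer
    }
}

# Stand-in target: the running PowerShell host. Replace with the binary you trusted.
Get-TrustIdentity -Path (Get-Process -Id $PID).Path
```

Running this before and after an application update should show a different Sha256 value but, for a vendor-signed binary, normally the same signer. That would explain why hash entries need to be refreshed after every update.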

That said, even with a trusted application configured, it's not entirely clear to what extent Elastic Defend disengages. According to Elastic’s own documentation:

“Trusted applications... don’t monitor the application for threats, nor do they generate alerts, even if it behaves like malware. They might improve performance, but still generate process events for visualizations or internal use.”

You mention you “still observe significant performance degradation” on systems running resource-intensive applications. Could you elaborate on what “significant” means in your case? For example:

  • What are the average CPU and memory usage levels of the Elastic Agent, Elastic Endpoint, and Beats processes?
  • How does this compare to the resource usage of Sophos?
  • Do you observe any impact on the performance of the monitored applications themselves, such as CAD tools or compilers?
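One rough way to put numbers on these questions is to sample the CPU time of the relevant processes over a fixed window while the slowdown is happening. This is only a sketch: `Get-CpuSeconds` is a hypothetical helper, and the process names in `$names` are assumptions that should be checked against what `Get-Process` shows on your hosts (add the Sophos process names you see there for comparison).

```powershell
# Process names are assumptions; adjust them to match your endpoints.
$names = 'elastic-endpoint', 'elastic-agent', 'filebeat', 'metricbeat'
$windowSeconds = 10
$cores = [Environment]::ProcessorCount

# Hypothetical helper: total CPU seconds consumed so far, summed per process name.
function Get-CpuSeconds {
    param([string[]]$Name)
    $totals = @{}
    foreach ($p in Get-Process -Name $Name -ErrorAction SilentlyContinue) {
        # CPU time may be unreadable for protected processes without elevation.
        if ($null -ne $p.TotalProcessorTime) {
            $totals[$p.ProcessName] = [double]$totals[$p.ProcessName] + $p.TotalProcessorTime.TotalSeconds
        }
    }
    return $totals
}

$before = Get-CpuSeconds -Name $names
Start-Sleep -Seconds $windowSeconds
$after = Get-CpuSeconds -Name $names

foreach ($name in $after.Keys) {
    $delta = $after[$name] - [double]$before[$name]
    [pscustomobject]@{
        Process    = $name
        # Percentage of the whole machine (all cores) used during the window.
        CpuPercent = [math]::Round(100 * $delta / ($windowSeconds * $cores), 1)
    }
}
```

Running it once while the system is idle and once during a CAD operation or build gives a simple before/after comparison.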

In my own testing with a different commercial EDR and Elastic Defend side by side, performance was mostly acceptable, though we did see isolated cases where Elastic-related processes consumed up to 15% CPU, which is not negligible, especially on developer workstations.

I’ve also come across suggestions to disable process events in the Defend policy to reduce system load. While this may yield short-term performance improvements, it comes at a high cost: it creates critical blind spots in your telemetry. Disabling process event collection undermines visibility into parent and child process chains, script execution, and lateral movement, which are core behaviors that many detections rely on. In essence, you are trading security depth for performance. That said, if your environment has compensating controls and you have accepted the residual risk, it might be worth trialing a configuration with process events disabled on specific endpoints.

Also, when you create a diagnostics archive, the user-artifacts folder in the zip contains the endpoint-trustlist-windows-v1 file. The following creates the archive and extracts it:

# Run from the directory that contains elastic-endpoint.exe.
.\elastic-endpoint.exe diagnostics --log stdout --log-level debug
Start-Sleep -Seconds 2

# Pick up the newest diagnostics archive and move it to the home directory.
$zip = Get-ChildItem "$env:WINDIR\TEMP\endpoint-diagnostics-*.zip" |
    Sort-Object LastWriteTime -Descending |
    Select-Object -First 1
$ts = Get-Date -Format "yyyyMMdd-HHmmss"
$destZip = Join-Path $HOME $zip.Name
Move-Item $zip.FullName $destZip -Force

# Extract it to a timestamped folder.
Expand-Archive -Path $destZip -DestinationPath "$HOME\Diagnostics-$ts" -Force

Then adjust the path below (replace <timestamp> with the folder name created above) and run:

Add-Type -AssemblyName System.IO.Compression.FileSystem

$encodedFile = "$HOME\Diagnostics-<timestamp>\user-artifacts\endpoint-trustlist-windows-v1"
$decodedFile = "$HOME\trustlist-windows-decoded.json"
$inputStream = New-Object System.IO.FileStream($encodedFile, [System.IO.FileMode]::Open, [System.IO.FileAccess]::Read)
# Skip the first two bytes (presumably a zlib header) so that DeflateStream sees raw deflate data.
$null = $inputStream.ReadByte()
$null = $inputStream.ReadByte()
$deflateStream = New-Object System.IO.Compression.DeflateStream($inputStream, [System.IO.Compression.CompressionMode]::Decompress)
$memoryStream = New-Object System.IO.MemoryStream
$buffer = New-Object byte[] 4096
# Decompress in 4 KB chunks until the stream is exhausted.
while (($read = $deflateStream.Read($buffer, 0, $buffer.Length)) -gt 0) {
    $memoryStream.Write($buffer, 0, $read)
}
$deflateStream.Close()
$inputStream.Close()
# Write the decoded JSON to disk and open it for inspection.
[System.IO.File]::WriteAllBytes($decodedFile, $memoryStream.ToArray())
notepad $decodedFile

You can then see the content of user-artifacts\endpoint-trustlist-windows-v1 in Notepad and check whether the trusted applications you configured are there. The script was provided by ChatGPT and worked on my test system. Please test it on a non-production system first. (There might be a better way to verify the existence of the correct trusted applications.)
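Instead of reading the decoded file by eye, the check can be scripted as a plain text search. This is a minimal sketch: `Find-TrustlistTerm` is a hypothetical helper, and it makes no assumption about the JSON schema of the trust list. It only reports whether each search term appears anywhere in the file.

```powershell
# Hypothetical helper: case-insensitive text search in the decoded trust list.
function Find-TrustlistTerm {
    param(
        [Parameter(Mandatory)] [string]$Path,
        [Parameter(Mandatory)] [string[]]$Term
    )
    # The [string] cast turns an empty file into '' instead of $null.
    $text = [string](Get-Content -LiteralPath $Path -Raw)
    foreach ($t in $Term) {
        [pscustomobject]@{
            Term  = $t
            Found = $text.IndexOf($t, [StringComparison]::OrdinalIgnoreCase) -ge 0
        }
    }
}
```

For example, `Find-TrustlistTerm -Path "$HOME\trustlist-windows-decoded.json" -Term 'hmpalert.dll', 'creo'` lists which of those terms occur. Searching for file names rather than full paths is safer, because JSON escapes backslashes. For hash-based or signature-based entries, search for the hash value or the signer name instead.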