ximgproc: optimize guidedfilter function for ARM64 using NEON intrinsics #3979

pratham-mcw · 2025-07-29T07:36:49Z

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch
This PR introduces an ARM64-specific optimization for the add_mul function in edgeaware_filters_common.cpp using NEON intrinsics.
The optimization is applied only when CV_NEON is defined and the runtime NEON check (cv::checkHardwareSupport(CV_CPU_NEON)) passes.
The SIMD implementation leverages NEON instructions (vld1q_f32, vmulq_f32, vaddq_f32, vst1q_f32) to accelerate the fused multiply-add operation on 4-element float vectors.
This brings parity with existing x64 SIMD optimizations using SSE1.
The addition of the ARM64 NEON optimization for the add_mul function in edgeaware_filters_common.cpp has led to performance improvements in some tests of GuidedFilter function.

ximgproc: optimize add_mul using NEON intrinsics for ARM64

07d31f6

Provide feedback