ximgproc: optimize guidedfilter function for ARM64 using NEON intrinsics #3979

Open
wants to merge 1 commit into
base: 4.x

pratham-mcw

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.

  • To the best of my knowledge, the proposed patch is not based on code under GPL or another license that is incompatible with OpenCV.

  • The PR is proposed to the proper branch

  • This PR introduces an ARM64-specific optimization for the add_mul function in edgeaware_filters_common.cpp using NEON intrinsics.

  • The optimization is applied only when CV_NEON is defined and the runtime NEON check (cv::checkHardwareSupport(CV_CPU_NEON)) passes.

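  • The SIMD implementation uses NEON intrinsics (vld1q_f32, vmulq_f32, vaddq_f32, vst1q_f32) to accelerate the multiply-add operation, processing 4 floats per iteration. Because the multiply and the add are separate instructions, the operation is not a single fused multiply-add.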

  • This brings the ARM64 path to parity with the existing x64 SIMD optimization, which uses SSE.

  • With the NEON path for add_mul in edgeaware_filters_common.cpp, some GuidedFilter performance tests run faster.
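
For illustration, the kind of loop described above could look roughly like the sketch below. It is not the code from this PR. It assumes that add_mul computes dst[j] += src1[j] * src2[j] over w floats. The name add_mul_sketch and the SKETCH_HAS_NEON macro are hypothetical. To stay self-contained, the sketch uses only a compile-time check and leaves out the runtime cv::checkHardwareSupport(CV_CPU_NEON) guard mentioned in the description.

```cpp
#include <cassert>

// Compile-time NEON detection for this sketch only. Some toolchains may not
// define __ARM_NEON even on ARM64; in that case the scalar loop below is used.
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define SKETCH_HAS_NEON 1
#else
#define SKETCH_HAS_NEON 0
#endif

// Hypothetical stand-in for add_mul (assumed semantics):
// dst[j] += src1[j] * src2[j] for j in [0, w).
void add_mul_sketch(float* dst, const float* src1, const float* src2, int w)
{
    assert(w >= 0);
    int j = 0;
#if SKETCH_HAS_NEON
    // Vector body: handle 4 floats per iteration.
    for (; j + 4 <= w; j += 4)
    {
        float32x4_t a = vld1q_f32(src1 + j);   // load 4 floats from src1
        float32x4_t b = vld1q_f32(src2 + j);   // load 4 floats from src2
        float32x4_t c = vld1q_f32(dst + j);    // load the 4 accumulators
        c = vaddq_f32(c, vmulq_f32(a, b));     // multiply, then add (not fused)
        vst1q_f32(dst + j, c);                 // store the 4 results
    }
#endif
    // Scalar tail: the remaining 0..3 elements, or everything without NEON.
    for (; j < w; ++j)
        dst[j] += src1[j] * src2[j];
}
```

The scalar tail loop keeps the result correct when w is not a multiple of 4, and it doubles as the fallback on builds without NEON.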

[Image attached in the original PR; contents not recoverable]
