Compiler optimizations for improving performance of Harris Corner Detection Algorithm on multicore/SIMD CPUs
The objective of the assignment is to optimize and tune the Harris corner detection algorithm for performance using locality, SIMD, and multicore parallelism transformations. We tune the …
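For readers new to the algorithm, a minimal, unoptimized C sketch of the Harris corner response follows. It is not the assignment's reference code. The function name `harris_naive` is hypothetical, and the following choices are assumptions for illustration:

- 3x3 Sobel gradients scaled by 1/12.
- A 3x3 box window for the structure tensor M.
- k = 0.04 in R = det(M) - k * trace(M)^2.

Each stage writes a full-size intermediate image. Locality transformations such as fusion and tiling aim to remove exactly this memory traffic.

```c
#include <stdlib.h>

/* Assumed Harris sensitivity constant; 0.04 is a common textbook choice. */
#define HARRIS_K 0.04f

/* Row-major pixel access for an image of width w. */
#define AT(img, r, c, w) ((img)[(size_t)(r) * (size_t)(w) + (size_t)(c)])

/* Naive multi-pass Harris response (hypothetical reference-style code).
 * Every stage writes a full rows x cols intermediate array, so the image
 * streams through the cache several times. Pixels within 2 px of the
 * border are set to 0. Returns 0 on success, -1 on allocation failure. */
int harris_naive(const float *in, float *out, int rows, int cols) {
    size_t n = (size_t)rows * (size_t)cols;
    float *ix = calloc(n, sizeof *ix), *iy = calloc(n, sizeof *iy);
    float *ixx = calloc(n, sizeof *ixx), *iyy = calloc(n, sizeof *iyy);
    float *ixy = calloc(n, sizeof *ixy);
    int status = -1;
    if (!ix || !iy || !ixx || !iyy || !ixy) goto done;

    /* Stage 1: 3x3 Sobel gradients, scaled by 1/12 (assumed normalization). */
    for (int r = 1; r < rows - 1; r++)
        for (int c = 1; c < cols - 1; c++) {
            AT(ix, r, c, cols) = (AT(in, r - 1, c + 1, cols) + 2.0f * AT(in, r, c + 1, cols)
                                + AT(in, r + 1, c + 1, cols) - AT(in, r - 1, c - 1, cols)
                                - 2.0f * AT(in, r, c - 1, cols) - AT(in, r + 1, c - 1, cols)) / 12.0f;
            AT(iy, r, c, cols) = (AT(in, r + 1, c - 1, cols) + 2.0f * AT(in, r + 1, c, cols)
                                + AT(in, r + 1, c + 1, cols) - AT(in, r - 1, c - 1, cols)
                                - 2.0f * AT(in, r - 1, c, cols) - AT(in, r - 1, c + 1, cols)) / 12.0f;
        }

    /* Stage 2: pointwise gradient products (structure-tensor entries). */
    for (size_t i = 0; i < n; i++) {
        ixx[i] = ix[i] * ix[i];
        iyy[i] = iy[i] * iy[i];
        ixy[i] = ix[i] * iy[i];
    }

    /* Stage 3: 3x3 box sums and response R = det(M) - k * trace(M)^2. */
    for (size_t i = 0; i < n; i++) out[i] = 0.0f;
    for (int r = 2; r < rows - 2; r++)
        for (int c = 2; c < cols - 2; c++) {
            float sxx = 0.0f, syy = 0.0f, sxy = 0.0f;
            for (int dr = -1; dr <= 1; dr++)
                for (int dc = -1; dc <= 1; dc++) {
                    sxx += AT(ixx, r + dr, c + dc, cols);
                    syy += AT(iyy, r + dr, c + dc, cols);
                    sxy += AT(ixy, r + dr, c + dc, cols);
                }
            float det = sxx * syy - sxy * sxy, tr = sxx + syy;
            AT(out, r, c, cols) = det - HARRIS_K * tr * tr;
        }
    status = 0;
done:
    free(ix); free(iy); free(ixx); free(iyy); free(ixy);
    return status;
}
```

The response sign is what a detector thresholds on:

- Positive values mark corners.
- Negative values mark edges.
- Values near zero mark flat regions.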
Using suitable compiler flags, transformations, and optimizations, we obtain the following speedups on 4 cores:

- With the GCC 4.9.2 compiler: 11.5X over the unparallelized reference implementation and 13.5X over OpenCV.
- With the ICC 15.0 compiler: 11.3X over the unparallelized reference implementation and 14.6X over OpenCV.

All experiments were performed on an Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz [Haswell μarch, 4 cores, 64 KB L1 private / 256 KB L2 private / 8 MB L3 shared cache].
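The locality, SIMD, and multicore transformations named above can be combined roughly as in the C sketch below. It is an assumption-laden illustration, not the project's code, which lives in the linked repository. `harris_tiled` and its `tile_rows` parameter are hypothetical. The sketch assumes 3x3 Sobel gradients scaled by 1/12, a 3x3 window, and k = 0.04.

The three transformations appear as follows:

- **Locality:** output rows are split into strips. Each strip fuses the gradient and product stages into small per-strip buffers, recomputing a one-row halo, so intermediates stay in cache.
- **SIMD:** the unit-stride inner loops are written so that GCC or ICC can auto-vectorize them.
- **Multicore:** `#pragma omp parallel for` spreads the strips across cores.

```c
#include <stdlib.h>

/* Assumed Harris sensitivity constant; 0.04 is a common textbook choice. */
#define HARRIS_K 0.04f

/* Fused, strip-tiled, OpenMP-parallel Harris response (illustrative sketch).
 * Output rows [2, rows-2) are split into strips of tile_rows rows. The 3x3
 * window for strip [r0, r1) reads gradient products on rows [r0-1, r1+1),
 * so each strip recomputes one halo row above and below instead of sharing
 * full-size intermediate images. Pixels within 2 px of the border are 0.
 * Returns 0 on success, -1 on bad arguments or allocation failure. */
int harris_tiled(const float *in, float *out, int rows, int cols, int tile_rows) {
    if (rows < 0 || cols < 0 || tile_rows < 1) return -1;
    for (size_t i = 0; i < (size_t)rows * (size_t)cols; i++) out[i] = 0.0f;
    if (rows < 5 || cols < 5) return 0;

    const int first = 2, last = rows - 2;          /* output rows [first, last) */
    const int ntiles = (last - first + tile_rows - 1) / tile_rows;
    int failed = 0;

    #pragma omp parallel for schedule(static) reduction(|:failed)
    for (int t = 0; t < ntiles; t++) {
        int r0 = first + t * tile_rows;
        int r1 = (r0 + tile_rows < last) ? r0 + tile_rows : last;
        int h = r1 - r0 + 2;                       /* strip rows incl. halo */
        /* Per-strip buffers: only h rows, so they can stay cache-resident. */
        float *pxx = malloc((size_t)h * cols * sizeof *pxx);
        float *pyy = malloc((size_t)h * cols * sizeof *pyy);
        float *pxy = malloc((size_t)h * cols * sizeof *pxy);
        if (!pxx || !pyy || !pxy) {
            failed |= 1;
            free(pxx); free(pyy); free(pxy);
            continue;
        }

        /* Fused gradient + product stage: Ix and Iy are never stored. */
        for (int i = 0; i < h; i++) {
            int r = r0 - 1 + i;                    /* image row of buffer row i */
            const float *up = in + (size_t)(r - 1) * cols;
            const float *mid = in + (size_t)r * cols;
            const float *dn = in + (size_t)(r + 1) * cols;
            float *xx = pxx + (size_t)i * cols;
            float *yy = pyy + (size_t)i * cols;
            float *xy = pxy + (size_t)i * cols;
            /* Unit-stride inner loop: a candidate for auto-vectorization. */
            #pragma omp simd
            for (int c = 1; c < cols - 1; c++) {
                float gx = (up[c + 1] + 2.0f * mid[c + 1] + dn[c + 1]
                          - up[c - 1] - 2.0f * mid[c - 1] - dn[c - 1]) / 12.0f;
                float gy = (dn[c - 1] + 2.0f * dn[c] + dn[c + 1]
                          - up[c - 1] - 2.0f * up[c] - up[c + 1]) / 12.0f;
                xx[c] = gx * gx;
                yy[c] = gy * gy;
                xy[c] = gx * gy;
            }
        }

        /* 3x3 box sums and response R = det(M) - k * trace(M)^2. */
        for (int r = r0; r < r1; r++) {
            int i = r - r0 + 1;                    /* buffer row of image row r */
            float *o = out + (size_t)r * cols;
            for (int c = 2; c < cols - 2; c++) {
                float sxx = 0.0f, syy = 0.0f, sxy = 0.0f;
                for (int di = -1; di <= 1; di++)
                    for (int dc = -1; dc <= 1; dc++) {
                        size_t k = (size_t)(i + di) * cols + (size_t)(c + dc);
                        sxx += pxx[k];
                        syy += pyy[k];
                        sxy += pxy[k];
                    }
                float det = sxx * syy - sxy * sxy, tr = sxx + syy;
                o[c] = det - HARRIS_K * tr * tr;
            }
        }
        free(pxx); free(pyy); free(pxy);
    }
    return failed ? -1 : 0;
}
```

With OpenMP enabled (e.g. `-fopenmp` with GCC), strips run in parallel. Without it, the pragmas are ignored and the loop runs serially with the same results. The tile height trades halo recomputation (two extra rows per strip) against buffer size. A reasonable heuristic is to keep the three per-strip buffers within the 256 KB private L2.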
A detailed comparison of ICC vs. GCC performance, including the speedups from vectorization and parallelism, is available in the report below.
The complete code for the project is available on GitHub: https://github.com/adarshpatil/e0255-opt-asst
Speedup and execution time (in ms) by vectorization and locality transforms

| | OpenCV | Reference | Optimized | Speedup by locality transforms |
|---|---|---|---|---|
| No vectorization | 3515.29 | 3767.32 | 2442.4 | 1.54x |
| Vectorization | 3566.35 | 3035.41 | 930.90 | 3.26x |
| Speedup by vectorization | - | 1.24x | 2.62x | |
Speedup and execution time (in ms) using ICC 15.0

| | OpenCV | Reference | Optimized | Speedup w.r.t. Reference |
|---|---|---|---|---|
| 1 core | 3567.95 | 2755.83 | 904.61 | 3.04x |
| 2 cores | - | 1617.88 | 355.724 | 4.54x |
| 4 cores | - | 1444.89 | 243.19 | 5.94x |
| Speedup by parallelism (4 cores vs. 1) | - | 1.90x | 3.72x | |
Speedup and execution time (in ms) using GCC 4.9.2

| | OpenCV | Reference | Optimized | Speedup w.r.t. Reference |
|---|---|---|---|---|
| 1 core | 3566.35 | 3035.41 | 930.90 | 3.26x |
| 2 cores | - | 1990.6 | 422.54 | 4.71x |
| 4 cores | - | 1940.92 | 264.73 | 7.34x |
| Speedup by parallelism (4 cores vs. 1) | - | 1.56x | 3.52x | |
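Every speedup entry in these tables is a ratio of execution times, speedup = t_baseline / t_optimized, so each entry can be rechecked from the measured milliseconds. The minimal C helpers below do that. The parallel-efficiency helper (speedup divided by core count) computes an extra metric for illustration; the report does not tabulate it.

```c
/* Speedup of an optimized run over a baseline: t_baseline / t_optimized. */
double speedup(double t_baseline_ms, double t_optimized_ms) {
    return t_baseline_ms / t_optimized_ms;
}

/* Parallel efficiency: n-core speedup over the 1-core run, divided by n.
 * 1.0 would mean perfect linear scaling. (Not a metric from the tables.) */
double parallel_efficiency(double t_1core_ms, double t_ncore_ms, int cores) {
    return speedup(t_1core_ms, t_ncore_ms) / (double)cores;
}
```

Some example values:

- `speedup(3035.41, 264.73)` is about 11.47, the 11.5X GCC headline figure over the reference.
- `speedup(3566.35, 264.73)` is about 13.47, the 13.5X figure over OpenCV.
- `parallel_efficiency(930.90, 264.73, 4)` is about 0.88 for the optimized GCC build.
- `parallel_efficiency(3035.41, 1940.92, 4)` is about 0.39 for the GCC reference build.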
- Compiler Optimizations
- SIMD Vectorization
- AVX / SSE2
- Parallelization / OpenMP
- Image/Video processing algo
- Course Project