Mirror of https://github.com/zebrajr/opencv.git (synced 2026-01-15 12:15:17 +00:00)
Commit 8efc0fd47be47730a2c13c571b2ee450a9b74c80
dnn: add SVE optimized fastGEMM1T function and SVE dispatch #28055

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch

**Description**

This PR adds an SVE-vectorized `fastGemm1T` for the AArch64 architecture. The kernel is called by the recurrent layers and the fully connected layer, and it is selected through OpenCV's SVE dispatch mechanism.

**ARM Compatibility:** Modified the build scripts and configuration files to ensure compatibility with ARM processors.

**Checklist**

The code changes have been tested on ARM devices (Graviton3).

**Modifications**

- Implemented the `fastGemm1T` kernel in SVE using a vector-length-agnostic approach.
- Added flags and checks so that the recurrent layer and the fully connected layer call the SVE kernel.
- Changed `CMakeLists.txt` to dispatch the SVE kernel.
- Added SVE as an OpenCV dispatch optimization so that the SVE kernel can be built and used.

Following the OpenCV build optimization options (https://github.com/opencv/opencv/wiki/CPU-optimizations-build-options), configure the build with:

```sh
cmake \
  -DCPU_BASELINE=NEON \
  -DCPU_DISPATCH=SVE \
  <path/to/opencv>
```

**Performance Improvement**

The proposed optimizations speed up the LSTM layer by 1.15x to 1.38x relative to the NEON build, and most fully connected configurations by 1.02x to 1.18x; the two `[5, 512, 384, 0], 1024` fully connected cases are slightly slower (0.93x and 0.99x).
| Name of Test | dnn_neon | dnn_sve | Speedup, dnn_neon / dnn_sve (x-factor) |
| -- | -- | -- | -- |
| lstm::Layer_LSTM::BATCH=1, IN=64, HIDDEN=192, TS=100 | 2.878 | 2.326 | 1.24 |
| lstm::Layer_LSTM::BATCH=1, IN=192, HIDDEN=192, TS=100 | 4.162 | 3.08 | 1.35 |
| lstm::Layer_LSTM::BATCH=1, IN=192, HIDDEN=512, TS=100 | 18.627 | 16.152 | 1.15 |
| lstm::Layer_LSTM::BATCH=1, IN=1024, HIDDEN=192, TS=100 | 10.98 | 7.976 | 1.38 |
| lstm::Layer_LSTM::BATCH=64, IN=64, HIDDEN=192, TS=2 | 4.41 | 3.459 | 1.27 |
| lstm::Layer_LSTM::BATCH=64, IN=192, HIDDEN=192, TS=2 | 6.567 | 4.807 | 1.37 |
| lstm::Layer_LSTM::BATCH=64, IN=192, HIDDEN=512, TS=2 | 28.471 | 22.909 | 1.24 |
| lstm::Layer_LSTM::BATCH=64, IN=1024, HIDDEN=192, TS=2 | 15.491 | 12.537 | 1.24 |
| lstm::Layer_LSTM::BATCH=128, IN=64, HIDDEN=192, TS=2 | 8.848 | 6.821 | 1.3 |
| lstm::Layer_LSTM::BATCH=128, IN=192, HIDDEN=192, TS=2 | 12.969 | 9.522 | 1.36 |
| lstm::Layer_LSTM::BATCH=128, IN=192, HIDDEN=512, TS=2 | 55.52 | 45.746 | 1.21 |
| lstm::Layer_LSTM::BATCH=128, IN=1024, HIDDEN=192, TS=2 | 31.226 | 26.132 | 1.19 |

| Name of Test | dnn_neon | dnn_sve | Speedup, dnn_neon / dnn_sve (x-factor) |
| -- | -- | -- | -- |
| fc::Layer_FullyConnected::([5, 16, 512, 128], 256, false, OCV/CPU) | 5.086 | 4.483 | 1.13 |
| fc::Layer_FullyConnected::([5, 16, 512, 128], 256, true, OCV/CPU) | 8.512 | 8.347 | 1.02 |
| fc::Layer_FullyConnected::([5, 16, 512, 128], 512, false, OCV/CPU) | 9.467 | 8.965 | 1.06 |
| fc::Layer_FullyConnected::([5, 16, 512, 128], 512, true, OCV/CPU) | 14.855 | 13.527 | 1.1 |
| fc::Layer_FullyConnected::([5, 16, 512, 128], 1024, false, OCV/CPU) | 18.821 | 18.023 | 1.04 |
| fc::Layer_FullyConnected::([5, 16, 512, 128], 1024, true, OCV/CPU) | 27.558 | 24.966 | 1.1 |
| fc::Layer_FullyConnected::([5, 512, 384, 0], 256, false, OCV/CPU) | 0.924 | 0.804 | 1.15 |
| fc::Layer_FullyConnected::([5, 512, 384, 0], 256, true, OCV/CPU) | 1.259 | 1.126 | 1.12 |
| fc::Layer_FullyConnected::([5, 512, 384, 0], 512, false, OCV/CPU) | 1.957 | 1.655 | 1.18 |
| fc::Layer_FullyConnected::([5, 512, 384, 0], 512, true, OCV/CPU) | 2.831 | 2.775 | 1.02 |
| fc::Layer_FullyConnected::([5, 512, 384, 0], 1024, false, OCV/CPU) | 5.92 | 6.379 | 0.93 |
| fc::Layer_FullyConnected::([5, 512, 384, 0], 1024, true, OCV/CPU) | 8.924 | 8.993 | 0.99 |
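The x-factor column in the benchmark tables is the NEON time divided by the SVE time, so values above 1.0 mean the SVE build is faster. The sketch below recomputes it for the LSTM rows and, as an illustrative summary that the PR itself does not report, combines the per-row ratios with a geometric mean; the helper names `speedup` and `geomean_speedup` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// LSTM timings copied from the benchmark table (dnn_neon and dnn_sve columns).
const double kNeon[] = {2.878, 4.162, 18.627, 10.98, 4.41,  6.567,
                        28.471, 15.491, 8.848, 12.969, 55.52, 31.226};
const double kSve[]  = {2.326, 3.08, 16.152, 7.976, 3.459, 4.807,
                        22.909, 12.537, 6.821, 9.522, 45.746, 26.132};
const std::size_t kRows = sizeof(kNeon) / sizeof(kNeon[0]);

// x-factor as used in the table: baseline (NEON) time / SVE time.
double speedup(double neon_time, double sve_time) {
  assert(sve_time > 0.0);
  return neon_time / sve_time;
}

// Geometric mean of the per-row ratios: averages in log space, which is
// the usual way to summarize speedup ratios (hypothetical summary only).
double geomean_speedup(const double* neon, const double* sve, std::size_t n) {
  assert(n > 0);
  double log_sum = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    log_sum += std::log(speedup(neon[i], sve[i]));
  }
  return std::exp(log_sum / static_cast<double>(n));
}
```

A geometric mean always lies between the smallest and largest per-row ratio, so it cannot be skewed by one configuration the way an arithmetic mean of ratios can.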
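The Modifications list mentions a vector-length-agnostic SVE kernel for `fastGemm1T`. The following is a minimal sketch of that technique, not the PR's code: it assumes the kernel computes a bias-added matrix-vector product, `dst[i] = bias[i] + dot(weights row i, vec)`, over `nvecs` rows of `vecsize` floats stored `wstep` floats apart, and `gemv_rows_vla` is a hypothetical name. It uses the Arm ACLE SVE intrinsics from `<arm_sve.h>` when the compiler targets SVE (`__ARM_FEATURE_SVE`, for example with `-march=armv8-a+sve`) and falls back to a scalar loop otherwise.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

// Hypothetical helper (not OpenCV's fastGemm1T):
// dst[i] = bias[i] + sum_k weights[i * wstep + k] * vec[k]; bias may be null.
void gemv_rows_vla(const float* vec, const float* weights, std::size_t wstep,
                   const float* bias, float* dst, int nvecs, int vecsize) {
  assert(vecsize >= 0 && wstep >= static_cast<std::size_t>(vecsize));
  for (int i = 0; i < nvecs; ++i) {
    const float* w = weights + static_cast<std::size_t>(i) * wstep;
#if defined(__ARM_FEATURE_SVE)
    // svcntw() is the number of 32-bit lanes on the running CPU
    // (4 for 128-bit vectors, 8 for 256-bit), so one binary adapts
    // to any SVE vector length.
    svfloat32_t acc = svdup_n_f32(0.0f);
    for (int k = 0; k < vecsize; k += static_cast<int>(svcntw())) {
      // Lanes with k + lane >= vecsize are inactive, so the tail
      // needs no separate scalar loop.
      svbool_t pg = svwhilelt_b32(k, vecsize);
      svfloat32_t wv = svld1_f32(pg, w + k);
      svfloat32_t xv = svld1_f32(pg, vec + k);
      acc = svmla_f32_m(pg, acc, wv, xv);  // acc += wv * xv on active lanes
    }
    float sum = svaddv_f32(svptrue_b32(), acc);  // horizontal add of all lanes
#else
    // Scalar fallback when the compiler is not targeting SVE.
    float sum = 0.0f;
    for (int k = 0; k < vecsize; ++k) sum += w[k] * vec[k];
#endif
    dst[i] = sum + (bias ? bias[i] : 0.0f);
  }
}
```

This only covers the compile-time side; choosing between the NEON baseline and the SVE build on the running CPU is what `CPU_DISPATCH` handles in OpenCV. Because the SVE path accumulates per lane and reduces at the end, its results can differ from the scalar loop in the last bits of floating-point precision.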
OpenCV: Open Source Computer Vision Library
Resources
- Homepage: https://opencv.org
- Courses: https://opencv.org/courses
- Docs: https://docs.opencv.org/4.x/
- Q&A forum: https://forum.opencv.org
- previous forum (read only): http://answers.opencv.org
- Issue tracking: https://github.com/opencv/opencv/issues
- Additional OpenCV functionality: https://github.com/opencv/opencv_contrib
- Donate to OpenCV: https://opencv.org/support/
Contributing
Please read the contribution guidelines before starting work on a pull request.
Summary of the guidelines:
- One pull request per issue;
- Choose the right base branch;
- Include tests and documentation;
- Clean up "oops" commits before submitting;
- Follow the coding style guide.
Additional Resources
- Submit your OpenCV-based project for inclusion in Community Friday on opencv.org
- Subscribe to the OpenCV YouTube Channel featuring OpenCV Live, an hour-long streaming show
- Follow OpenCV on LinkedIn for daily posts showing the state-of-the-art in computer vision & AI
- Apply to be an OpenCV Volunteer to help organize events and online campaigns as well as amplify them
- Follow OpenCV on Mastodon in the Fediverse
- Follow OpenCV on Twitter
- OpenCV.ai: Computer Vision and AI development services from the OpenCV team.
Languages
- C++: 87.5%
- C: 3.1%
- Python: 3%
- CMake: 2%
- Java: 1.5%
- Other: 2.7%