AOCL 2.2 changes - Majorly include LAPACK 3.9.0 support #36
Open
rsanagap wants to merge 1485 commits into flame:master from
Signed-off-by: samahmad <Sameer.Ahmad@amd.com> AMD-Internal: CPUPL-5556 Change-Id: I774e1a39b12feb6aef65d964704017aea7a45579
Change-Id: I74067c461c68cff55c6b5f02a9080e3df9352b9e
The -fopenmp compiler flag was missing. Added the flag to both the automake and CMake configure files. CPUPL-5677 Signed-off-by: Sridhar Govindaswamy <Sridhar.Govindaswamy@amd.com> Change-Id: I444abc4ce35dbbce1b45bdde27faf2a5f3f250f2
1. Enabled multithreading in DGETRF for inputs with M and N > 160.
2. Default and maximum thread count set to 64.
3. 32 threads gave the best benchmark results, so the thread count is set to 32 for the input range 160 to 6500.
AMD Internal : [CPUPL-4978] Change-Id: I7c68982bfe295e95563f40d77547ef101fbd97e2
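The thread-count rules above can be sketched as a small selection function. This is a minimal sketch and not the DGETRF code. The name `getrf_select_threads` is hypothetical. It also assumes two readings of the message: "M and N > 160" means both dimensions must exceed 160, and the 32-thread range is tested against the larger dimension.

```c
#include <assert.h>

/* Thresholds taken from the commit message; macro names are hypothetical. */
#define GETRF_MT_MIN_DIM      160   /* multithreading only when m, n > 160 */
#define GETRF_MT_MID_MAX_DIM  6500  /* 32 threads benchmarked best up to here */
#define GETRF_THREADS_MID     32
#define GETRF_THREADS_MAX     64    /* default and maximum thread count */

/* Hypothetical helper: pick a thread count for an m x n DGETRF call. */
static int getrf_select_threads(int m, int n)
{
    assert(m >= 0 && n >= 0);

    /* Small problems stay single-threaded. */
    if (m <= GETRF_MT_MIN_DIM || n <= GETRF_MT_MIN_DIM)
        return 1;

    /* Assumption: the 160..6500 range is checked against the larger dimension. */
    int max_dim = m > n ? m : n;
    if (max_dim <= GETRF_MT_MID_MAX_DIM)
        return GETRF_THREADS_MID;

    return GETRF_THREADS_MAX;
}
```

A real caller would still cap the result by the threads actually available, for example `omp_get_max_threads()`.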
AOCL-LAPACK Version upgraded from 4.2.1 to 5.0.1 Signed-off-by: tprnaidu <tprnaidu@amd.com> Change-Id: I066ae2d790f55b135ac00b868ef9acd03ba1762d
Gains of up to 15% were observed for the following block size/input size combinations on the lib-genoa-05 machine:
1. For the float datatype, the block sizes that work well are:
a) 60 for sizes greater than 3000 x 3000
b) 48 for sizes greater than 1000 x 1000
c) 24 for all other cases
2. For the double datatype, the block sizes that work well are:
a) 64 for sizes greater than 2000 x 2000
b) 60 for sizes greater than 1000 x 1000
c) 32 for all other cases
3. For the complex datatype, the block sizes that work well are:
a) 64 for sizes greater than 600 x 600
b) 48 for sizes greater than 100 x 100
c) 32 for all other cases
4. For the double complex datatype, the block sizes that work well are:
a) 64 for sizes greater than 400 x 400
b) 48 for sizes greater than 100 x 100
c) 32 for all other cases
Signed-off-by: Vibhav Gupta <Vibhav.Gupta@amd.com> AMD-Internal: CPUPL-5876 Change-Id: Ide6acd9d41faafac6a84bdd18d38849789636fbe
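The block-size table above maps naturally onto a per-datatype lookup. The commit message does not name the API being tuned, so everything below is hypothetical: `select_block_size`, the `fla_dtype` enum and the table layout. The sketch also assumes "size greater than N x N" means both m and n exceed N.

```c
#include <assert.h>

/* Datatypes covered by the tuning table (names are illustrative). */
typedef enum { DT_FLOAT, DT_DOUBLE, DT_COMPLEX, DT_DCOMPLEX } fla_dtype;

/* One row per datatype: two size thresholds and the block size that
 * benchmarked well above each, plus the default for all other cases. */
typedef struct {
    int big_thresh, big_nb;  /* size > big_thresh x big_thresh */
    int mid_thresh, mid_nb;  /* size > mid_thresh x mid_thresh */
    int default_nb;          /* all other cases                */
} nb_rule;

static const nb_rule nb_table[] = {
    [DT_FLOAT]    = { 3000, 60, 1000, 48, 24 },
    [DT_DOUBLE]   = { 2000, 64, 1000, 60, 32 },
    [DT_COMPLEX]  = {  600, 64,  100, 48, 32 },
    [DT_DCOMPLEX] = {  400, 64,  100, 48, 32 },
};

/* Hypothetical helper: block size for an m x n problem of type dt. */
static int select_block_size(fla_dtype dt, int m, int n)
{
    assert((int)dt >= 0 && (int)dt <= (int)DT_DCOMPLEX);
    assert(m >= 0 && n >= 0);

    const nb_rule *r = &nb_table[dt];
    int min_dim = m < n ? m : n;  /* both dimensions must exceed the threshold */

    if (min_dim > r->big_thresh) return r->big_nb;
    if (min_dim > r->mid_thresh) return r->mid_nb;
    return r->default_nb;
}
```

Keeping the thresholds in one table makes it easy to retune them per machine without touching the selection logic.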
… multi-thread is enabled CPUPL-5677 Signed-off-by: Sridhar Govindaswamy <Sridhar.Govindaswamy@amd.com> Change-Id: Iaa1b79ca8e7e7c15df31b02e7fef450eb530a4e7
Added LAPACKE interface testing support in the main test suite for GBTRF and GBTRS APIs. Signed-off-by: Sridhar Govindaswamy <Sridhar.Govindaswamy@amd.com> Change-Id: I97b7aaf8b697a1d1779bbebb69dc7cec9348d1f8
Corrected the condition checks for invalid input parameters in the sgesdd_fla_check and dgesdd_fla_check APIs. Modified the condition to check ldvt based on jobvt instead of jobu. Modified the jobz comparison logic to be case-insensitive. AMD-Internal: CPUPL-5889 Signed-off-by: dnikku <Deepika.Nikku@amd.com> Change-Id: Ib3fed47fbd05163aa2b496cec6cd59cd79156880
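The sketch below shows a case-insensitive JOBZ check together with an LDVT check. It follows the argument checks of reference LAPACK DGESDD, which reports INFO = -1 for a bad JOBZ and INFO = -10 for a bad LDVT. It is not the sgesdd_fla_check/dgesdd_fla_check code. The function name is hypothetical, and only these two checks are shown.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical subset of DGESDD-style argument checking.
 * Returns 0 if valid, otherwise the negative INFO value. */
static int gesdd_check_jobz_ldvt(char jobz, int m, int n, int ldvt)
{
    assert(m >= 0 && n >= 0);
    int minmn = m < n ? m : n;

    /* Normalize case once so 'a' and 'A' are treated the same. */
    char z = (char)toupper((unsigned char)jobz);
    int wntqa = (z == 'A'), wntqs = (z == 'S');
    int wntqo = (z == 'O'), wntqn = (z == 'N');

    if (!(wntqa || wntqs || wntqo || wntqn))
        return -1;

    /* The LDVT requirement depends on JOBZ (and on m >= n for JOBZ = 'O'). */
    if (ldvt < 1 ||
        (wntqa && ldvt < n) ||
        (wntqs && ldvt < minmn) ||
        (wntqo && m >= n && ldvt < n))
        return -10;

    return 0;
}
```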
Update License text with latest Third Party Notices Change-Id: I79589b0821cab802157406415d49d898a7a83d2f
This reverts commit 70132a4. Change-Id: I1002f00295f77caf2097ff67bf1b51889067bf96
NOTICES file with Third Party Licence information Change-Id: I8b91edd3481bef3e5378a2c91e67c1b4eb81d1b1
The AOCL-Utils APIs for getting ISA information changed in the 4.2 release, and the ones used in libflame have been deprecated. Use the latest AOCL-Utils API. AMD Internal : [CPUPL-5906] Change-Id: I84d576f9749a399aea23d96b5d2d636497bed540
Previously, these APIs were just wrappers around sgelss/dgelss. The gelsd APIs provide significant performance gains over the gelss APIs. Signed-off-by: samahmad <Sameer.Ahmad@amd.com> AMD-Internal: CPUPL-5766 Change-Id: I867c107816fcd50fb16a890ca5947c6b9ff80e3d
Corrected an error in the calculation of the length parameter when handling values less than safe-min in the norm calculation. Also fixed an issue in the usage of lda in a macro nested inside another macro. Signed-off-by: Vasanthakumar R <varajago@amd.com> AMD-Internal: SWLCSG-3226, SWLCSG-3217 Change-Id: Ic75a10aa7020f043e71f1501bb4219f4498be901
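The commit does not describe the lda bug in detail. One common way such a bug arises in nested macros is an unparenthesized argument, so that an expression passed as lda binds with the wrong precedence. The sketch below illustrates that pitfall with hypothetical macros `COL`/`ELEM`. These are not the libflame macros.

```c
#include <assert.h>

/* Unsafe: arguments are pasted without parentheses, so lda = m + 1
 * expands to j * m + 1 instead of j * (m + 1). */
#define COL_UNSAFE(a, j, lda)     (a + j * lda)
#define ELEM_UNSAFE(a, i, j, lda) (COL_UNSAFE(a, j, lda)[i])

/* Safe: every argument is parenthesized at each level of nesting. */
#define COL(a, j, lda)     ((a) + (j) * (lda))
#define ELEM(a, i, j, lda) (COL((a), (j), (lda))[(i)])

/* Sum column j of a column-major m x n matrix with leading dimension lda. */
static double column_sum(const double *a, int m, int j, int lda)
{
    assert(lda >= m);
    double s = 0.0;
    for (int i = 0; i < m; i++)
        s += ELEM(a, i, j, lda);
    return s;
}
```

With `buf[k] = k` and `m = 3`, `ELEM(buf, 0, 2, m + 1)` reads `buf[8]`. `ELEM_UNSAFE(buf, 0, 2, m + 1)` silently reads `buf[7]` instead.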
1. Optimized the LAPACK_GETRI_SMALL_D_3x3 kernel by reducing repeated internal memory loads/stores. AMD Internal : [CPUPL-5865] Change-Id: Ib4e33cccdc4a78820014a676bcb9808f4c685797
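GETRI forms inv(A) from the LU factors, and one of its steps is inverting the triangular factor U. The sketch below applies the load/store idea to that step only. Each entry of a 3x3 upper-triangular U is loaded into a local once, and each result is stored once. `invert_upper_3x3` is a hypothetical helper, not the LAPACK_GETRI_SMALL_D_3x3 kernel itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: invert a 3x3 upper-triangular U in place
 * (column-major, leading dimension ldu). Returns 0 on success, or k
 * (1-based) if U(k,k) is exactly zero, following LAPACK's INFO style. */
static int invert_upper_3x3(double *u, int ldu)
{
    assert(u != NULL && ldu >= 3);

    /* Load the six upper-triangular entries once. */
    const double u00 = u[0], u01 = u[ldu], u02 = u[2 * ldu];
    const double u11 = u[1 + ldu], u12 = u[1 + 2 * ldu];
    const double u22 = u[2 + 2 * ldu];

    if (u00 == 0.0) return 1;
    if (u11 == 0.0) return 2;
    if (u22 == 0.0) return 3;

    /* Back-substitution for X = inv(U), kept entirely in locals. */
    const double x00 = 1.0 / u00, x11 = 1.0 / u11, x22 = 1.0 / u22;
    const double x01 = -u01 * x00 * x11;
    const double x12 = -u12 * x11 * x22;
    const double x02 = -(u01 * x12 + u02 * x22) * x00;

    /* Store each result exactly once; the strictly lower part is untouched. */
    u[0] = x00; u[ldu] = x01; u[2 * ldu] = x02;
    u[1 + ldu] = x11; u[1 + 2 * ldu] = x12;
    u[2 + 2 * ldu] = x22;
    return 0;
}
```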
… major Fixed the init_matrix_from_file() API to read the matrix from the input file in column-major format. Added a fix for the gtsv AOCC issue. AMD-Internal: CPUPL-5922 Signed-off-by: dnikku <Deepika.Nikku@amd.com> Change-Id: I159583837028dd11ee1dbc844b0d495d0c1be1fc
Added initialization of the first column/row of U. A previous optimization that removed this initialization caused test failures in a few performance cases. Corrected the sign changes applied to Vt in 2x2 cases. Signed-off-by: Vasanthakumar R <varajago@amd.com> AMD-Internal: CPUPL-6074 Change-Id: I19b7044029b4580013ea7decb58c3500097d0f88
- Overflow/underflow tests for sygvd/hegvd
- Memory leak fixes for sygvd test cases
- Enabling lapacke interfaces for sygvd/hegvd
Change-Id: I79ec9c009e6ba52df17bc6247bb726e60193d5ed Signed-off-by: samahmad <Sameer.Ahmad@amd.com> AMD-Internal: CPUPL-6037
… cases Fixed the copy-matrix sizes in validate_gesdd/gesvd to avoid out-of-bounds memory access when testing corner cases: for gesdd, jobz = O with m >= n and ldu = 1, or m < n and ldvt = 1; for gesvd, jobu/jobvt = O with ldu/ldvt = 1. Fixed gtsv test2 in validate_gtsv(): the residual is scaled down by a factor of 10 so that it falls within the expected threshold range, since the input matrix Xact is randomly generated. AMD-Internal: CPUPL-5926 Signed-off-by: dnikku <Deepika.Nikku@amd.com> Change-Id: Ia99bb6d81b76de394265ffded0069fb440de979f
Details: datatype alignment changes for the structures used in the test suite. Signed-off-by: ksaithar <katteboina.saitharun@amd.com> Change-Id: I08ce86f5d642189b6f9142c74af41a633415b1f3
Components added: 1. Test run/validation 2. Negative test cases 3. Extreme test cases 4. Overflow/Underflow tests 5. Lapacke test Signed-off-by: Vibhav Gupta <Vibhav.Gupta@amd.com> AMD-Internal: CPUPL-5903 Change-Id: I38fa28ac0216740e0669e41509ca7870fd3adab8
1. Moved the block size computation to a separate function for each of the four datatypes.
2. Optimal block sizes for the various input sizes change as OMP_NUM_THREADS is varied, so the input-size-based optimal block sizes are set only when OMP_NUM_THREADS=1.
3. Small sizes take the unoptimized path, because the optimized path regresses there due to the overhead of OpenMP calls.
Gains for single-threaded runs: up to 15% on Genoa and 28% on Turin.
Signed-off-by: Vibhav Gupta <Vibhav.Gupta@amd.com> AMD-Internal: CPUPL-5876 Change-Id: I8fdeccdf0debdacec3913f8192711d86e9d62314
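Points 2 and 3 can be sketched as a path-selection function. The commit does not give the cutoff for "small sizes", so `small_cutoff` is a hypothetical parameter. The enum and function names are also hypothetical. Reading the thread count from the OMP_NUM_THREADS string is only one option. An OpenMP build would more likely call `omp_get_max_threads()`.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

typedef enum {
    PATH_UNOPTIMIZED,    /* small sizes: avoid OpenMP call overhead */
    PATH_OPT_DEFAULT_NB, /* optimized path, default block size      */
    PATH_OPT_TUNED_NB    /* optimized path, size-tuned block size   */
} block_path;

/* Parse an OMP_NUM_THREADS value; returns 0 if unset or invalid.
 * For a nested list such as "4,2", only the first level is used. */
static int parse_omp_num_threads(const char *s)
{
    if (s == NULL || *s == '\0')
        return 0;
    char *end;
    long v = strtol(s, &end, 10);
    if (end == s || v < 1)
        return 0;
    if (v > INT_MAX)
        v = INT_MAX;
    return (int)v;
}

/* Tuned block sizes are used only for single-threaded runs. */
static block_path select_block_path(int n, int nthreads, int small_cutoff)
{
    assert(n >= 0 && small_cutoff >= 0);
    if (n <= small_cutoff)
        return PATH_UNOPTIMIZED;
    if (nthreads == 1)
        return PATH_OPT_TUNED_NB;
    return PATH_OPT_DEFAULT_NB;
}
```

Typical use: `select_block_path(n, parse_omp_num_threads(getenv("OMP_NUM_THREADS")), cutoff)`.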
Port Netlib Lapack 3.12 FORTRAN code to C files for double precision APIs Signed-off-by: Venkatesha <vprasada@amd.com> AMD-Internal: [CPUPL-5708] Change-Id: If2ddc85a9ad0818c96155945340b9cea23b40c8e
- Added AVX2, AVX-512 and parallel versions of sgetrf Change-Id: I724cc5c9bf98f42014bcaf680a2fa7373195f10d Signed-off-by: samahmad <Sameer.Ahmad@amd.com> AMD-Internal: CPUPL-6060
The following features are implemented in this commit:
1. The library and include paths for aocl-utils and blis are inferred automatically while building libflame if pkg-config files for these libraries are available. This only works on Linux for now.
2. Various CMake configure/build/install/test/workflow presets.
3. CMake presets for Windows (MSVC and Ninja). Test presets do not work yet.
4. Minimum CMake version upgraded to 3.26.0.
Preset names follow the convention: <os>-<make/ninja>-<compiler>-<st/mt>-<lp/ilp>-<static/shared>-<isa-mode>-<other optional commands>
Usage:
$ cmake --build --list-presets
-- Without aocl-utils pkgconfig file
$ cmake --preset {chosen-preset} -DLIBAOCLUTILS_INCLUDE_PATH={aoclutils header path} -DLIBAOCLUTILS_LIBRARY_PATH={aoclutils library path}
-- With aocl-utils pkgconfig file
$ cmake --preset {chosen-preset}
$ cmake --build --preset {chosen-preset}
Build and test workflow
-- If aocl-utils and blis pkg-config files are available
$ cmake --workflow --preset {chosen-preset}
More info in BUILD.md
Change-Id: I8bc54b3eabed9a18c305e911df9aa76d8ff746d0
Signed-off-by: samahmad <Sameer.Ahmad@amd.com>
AMD-Internal: CPUPL-5862
Port the double precision Fortran files newly added in Netlib LAPACK 3.12 to C. Files added: dgedmd.c, dgedmdq.c, dgeqp3rk.c, dlaqp2rk.c, dlaqp3rk.c. Netlib tests for LAPACK 3.12 included. Signed-off-by: Venkatesha <vprasada@amd.com> AMD-Internal: [CPUPL-5708] Change-Id: I60b5c47505162882a19f2086e4842c858e0586e8
a. Added separate invoke functions in CPP for each API b. Added support for cmake and make c. Added support for --interface in cmd line d. Resolved warnings in existing interface header file e. Resolved errors/warnings in Windows f. Added ENABLE_CPP_TEST flag in cmake, make files to enable/disable CPP test interface. g. Updated Readme as per latest changes. h. Added CPP changes for 25+ test APIs. Change-Id: I6b77d24e204833134401c69813a3a1672de02c18
Port Netlib Lapack 3.12 FORTRAN code to C files for double precision complex APIs Note: Retained lapack-3.11 zlaqr5.c, to overcome netlib test failures. Signed-off-by: Venkatesha <vprasada@amd.com> AMD-Internal: [CPUPL-6150] Change-Id: I40df2b270d82159cd0bc16a0158951139054b90a
Signed-off-by: Vibhav Gupta <Vibhav.Gupta@amd.com> AMD-Internal: CPUPL-6155 Change-Id: Icbd84fdfa434875bf3bfd2072b09d8e77c326702
Handled a case where invalid arguments to dlacpy could cause segmentation faults AMD-Internal: CPUPL-7560
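A guarded copy with DLACPY semantics can be sketched as follows. This is a hypothetical illustration of the kind of argument validation the fix implies. The commit does not say which checks were added. Reference DLACPY has no INFO argument and performs no checks, so the error-return convention here is also an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical guarded copy with DLACPY semantics (column-major):
 * uplo 'U'/'u' copies the upper triangle or trapezoid, 'L'/'l' the lower,
 * any other value the full m x n matrix. Returns 0 on success or -k if
 * the k-th argument is invalid (assumed convention). */
static int guarded_dlacpy(char uplo, int m, int n,
                          const double *a, int lda, double *b, int ldb)
{
    const int ld_min = m > 1 ? m : 1;
    const int nonempty = (m > 0 && n > 0);

    if (m < 0) return -2;
    if (n < 0) return -3;
    if (nonempty && a == NULL) return -4;
    if (lda < ld_min) return -5;
    if (nonempty && b == NULL) return -6;
    if (ldb < ld_min) return -7;
    assert(lda >= 1 && ldb >= 1);  /* guaranteed by the checks above */

    for (int j = 0; j < n; j++) {
        int i_start = 0, i_end = m;           /* full column by default */
        if (uplo == 'U' || uplo == 'u')
            i_end = (j + 1 < m) ? j + 1 : m;  /* rows 0 .. min(j, m-1) */
        else if (uplo == 'L' || uplo == 'l')
            i_start = j;                      /* rows j .. m-1 */
        for (int i = i_start; i < i_end; i++)
            b[i + (size_t)j * ldb] = a[i + (size_t)j * lda];
    }
    return 0;
}
```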
-> Updated BUILD.md to be consistent with user guide. -> Removed ENABLE_EMBED_AOCLUTILS related information from BUILD.md. AMD-Internal: [CPUPL-7491]
DGESDD was reusing the threshold defined for SVD (DGESVD) for its small-size paths. It now uses a DGESDD-specific value, making it independent of SVD. The threshold macro names were changed to include the full API name. AMD Internal: CPUPL-7572
…eration and skipping validation (#136) * AOCL LAPACK Testsuite: Implemented test mode support. Implemented the --test-mode flag in place of --random_init; it enables random input generation and/or skips output validation. AMD-Internal: CPUPL-7435 Signed-off-by: dnikku <Deepika.Nikku@amd.com>
* Add ai-pr-platform-app.yml
* Update ai-pr-platform-app.yml
* Update ai-pr-platform-app.yml
* Update ai-pr-platform-app.yml
* Create ai_coverity.json
  Add AI Coverity Fix - Historical Analysis related changes
* Update ai-pr-platform-app.yml
* Update ai-pr-platform-app.yml
* Update ai-pr-platform-app.yml
  Removed email
* Update ai_coverity.json
* Update ai-pr-platform-app.yml
* Delete .github/psdb-jenkins-trigger.yml
* Delete .github/recipients.yaml
* Delete .github/workflows/ai-code-review-trigger.yml
* Delete .github/self_enablement_config.yaml
---------
Co-authored-by: Tyagi, Shubham <Shubham.Tyagi@amd.com>
Co-authored-by: Prabhu, Anantha <Anantha.Prabhu@amd.com>
…ode instead of FLA code (#162) The underlying APIs used by DGELSS (DGEBRD, DGEBD2 and DORGBR) were using FLA code paths. Using the LAPACK code path was found to significantly improve performance for small matrices, so a threshold was added to switch between the LAPACK and FLA code paths.
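A size threshold that switches between two code paths can be expressed with function pointers. The commit does not give the threshold value or the internal signatures. The typedef, the stub functions and `dispatch_by_size` below are all hypothetical, and the stubs only return tags so the chosen path is visible.

```c
#include <assert.h>
#include <stddef.h>

/* Common signature for both code paths (simplified, hypothetical). */
typedef int (*bidiag_fn)(int m, int n, double *a, int lda);

/* Placeholder implementations: real code would call the LAPACK-style
 * and FLA-based routines. They return tags so the choice is visible. */
static int lapack_path_stub(int m, int n, double *a, int lda)
{
    (void)m; (void)n; (void)a; (void)lda;
    return 1;
}

static int fla_path_stub(int m, int n, double *a, int lda)
{
    (void)m; (void)n; (void)a; (void)lda;
    return 2;
}

typedef struct {
    bidiag_fn small_path;  /* used when min(m, n) < threshold */
    bidiag_fn large_path;  /* used otherwise                  */
    int threshold;         /* hypothetical crossover size     */
} size_dispatch;

static int dispatch_by_size(const size_dispatch *d, int m, int n,
                            double *a, int lda)
{
    assert(d != NULL && d->small_path != NULL && d->large_path != NULL);
    assert(m >= 0 && n >= 0);
    int min_dim = m < n ? m : n;
    bidiag_fn f = (min_dim < d->threshold) ? d->small_path : d->large_path;
    return f(m, n, a, lda);
}
```

Keeping the threshold in a struct rather than a compile-time constant allows it to be tuned per datatype or per machine.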
* Addition of GEQPF in GEQP3 files
* Corrected RWORK allocation for CPP
* Removed strcasecmp()
* Correct workspace allocation for complex in LAPACKE mode
* Added same_string function in test_common
* Restored comments
* Removed redundant comments + Removed unused lwork parameter from GEQPF
* Removed static variable 'geqpf_mode' to avoid race conditions
Root cause: the residual value is NaN/Inf because the norm value is 0, so the test case fails. Solution: check that the norm value is not 0 before computing the residual. Added the FLA_COMPUTE_RESIDUAL macro for residual computation. AMD-Internal: CPUPL-7592
Used existing AVX2 code for medium sizes. AMD-Internal: CPUPL-7516
* AOCL LAPACK Test suite: Addition of API-wise CTests. Revamped the test suite to generate API-wise CTests for faster Jenkins execution. One CTest is added for each available code path. GETRF and SYEV are included in this commit.
* AOCL-LAPACK: Build warnings fixing
* Update src/base/flamec/main/FLA_Obj.c
  Co-authored-by: ai-pr-platform[bot] <217061646+ai-pr-platform[bot]@users.noreply.github.com>
* Update src/map/lapack2flamec/f2c/c/dlaswp.c
  Co-authored-by: ai-pr-platform[bot] <217061646+ai-pr-platform[bot]@users.noreply.github.com>
* Update test/legacyflame/src/test_apcaqutinc.c
  Co-authored-by: ai-pr-platform[bot] <217061646+ai-pr-platform[bot]@users.noreply.github.com>
* Update test/legacyflame/src/test_trinv.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update test/legacyflame/src/test_lyap.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update test/legacyflame/src/test_apcaqutinc.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update test/legacyflame/src/test_trinv.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update test/legacyflame/src/test_lyap.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update test/legacyflame/src/test_apcaqutinc.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update test/legacyflame/src/test_apqutinc.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update src/base/flamec/main/FLA_Obj.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* AOCL-LAPACK: Fix review comments
* Update src/map/lapack2flamec/f2c/c/dlaswp.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* AOCL-LAPACK: Fixing review comments
* Update test/legacyflame/src/test_apcaqutinc.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update src/base/flamec/main/FLA_Obj.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update test/legacyflame/src/test_apcaqutinc.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* AOCL-LAPACK: Addressed review suggestions
* Update src/base/flamec/main/FLA_Obj.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update src/base/flamec/main/FLA_Obj.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update src/base/flamec/main/FLA_Obj.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* AOCL-LAPACK: Error Fixes
* Update test/legacyflame/src/test_caqrutinc.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update test/legacyflame/src/test_apcaqutinc.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* AOCL-LAPACK: Addressed review comments
* Reverted type casting from uintptr_t to unsigned long
* Update CMakeLists.txt
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: tprnaidu <tprnaidu@amd.com>
Co-authored-by: ai-pr-platform[bot] <217061646+ai-pr-platform[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Updated License and TPN files according to latest requirements.
* Resolving merge conflicts
* replaced GEMM switches with fla_invoke_gemm
* BDSQR: Added conformance tests
* BDSQR: Added Overflow/Underflow tests
* BDSQR: Clang-format
* U and VT are orthogonal, D and E are derived from GEBRD. Validation fixed
* Used build_bidiagonal_matrix function from test_common
* Used build_bidiagonal_matrix function from test_common
* Added BRT support
* Fixed BDSQR BRT Tests + Corrected GEJSV api config file test parameters
* Resolving conflicts
* resolving conflicts
* used fla_mem_alloc in test_gejsv and addressed negative test failing in BDSQR
* Removed trailing whitespaces.
* U and VT from GEBRD/ORGBR + Added C matrix validations.
* Used GETRI for C matrix validation + SVD reconstruction test
* Fixed conformance tests and validation
* Addressed segfault for lapacke_row
* Removed redundant variables.
* Correct matrix dimension allocation.
* Reconstruction U and VT tests
* Fixed zero residual issue
* Fixed compute_matrix_norm call
* Added compute_matrix_inverse to test_common
* Removed reset_matrix calls
* Fixed memory leaks
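The BDSQR validation above relies on U and VT being orthogonal. A common way to check this in LAPACK-style test suites is the scaled residual ||Q^T Q - I|| / (n * eps). The sketch below uses the max-abs norm and a hypothetical function name. It is not the test-suite code, and the norm used there may differ.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical check: ||Q^T Q - I||_max / (n * eps) for an m x n
 * column-major Q with leading dimension ldq (m >= n). Values of order 1
 * to 10 indicate orthonormal columns up to rounding error. */
static double orthogonality_residual(const double *q, int m, int n, int ldq)
{
    assert(q != 0 && m >= n && n > 0 && ldq >= m);
    double max_err = 0.0;
    for (int j = 0; j < n; j++) {
        for (int k = 0; k < n; k++) {
            double dot = 0.0;  /* (Q^T Q)(j, k): column j dot column k */
            for (int i = 0; i < m; i++)
                dot += q[i + j * ldq] * q[i + k * ldq];
            double err = dot - (j == k ? 1.0 : 0.0);
            if (err < 0.0)
                err = -err;
            if (err > max_err)
                max_err = err;
        }
    }
    return max_err / ((double)n * DBL_EPSILON);
}
```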
Addressed the alignment of key x86-optimized functions called by DGELS. This improves performance for small DGELS problems.
Also removed warnings in the CPP interface for deprecated APIs. Root cause: the residual value is NaN/Inf because the norm value is 0, so the test case fails. Solution: added a check that the norm value is greater than safe_min before computing the residual. Added common eps (E, P) and safe-min values for the residual computations of all APIs. AMD-Internal: CPUPL-7742
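Below is a minimal sketch of the safe-min guard, assuming the usual LAPACK-style scaled residual ||E|| / (n * ||A|| * eps). Here eps is DBL_EPSILON, the machine precision that LAPACK's dlamch('P') returns. safe_min is DBL_MIN, which is what dlamch('S') returns for IEEE double. The function name and the fallback used when the norm is at or below safe_min are assumptions. They are not the FLA_COMPUTE_RESIDUAL definition.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical residual helper.
 * norm_diff: norm of the error term (e.g. ||A - Q*R||)
 * norm_a:    norm of the reference matrix (e.g. ||A||)
 * n:         problem dimension used for scaling */
static double scaled_residual(double norm_diff, double norm_a, int n)
{
    assert(n > 0 && norm_diff >= 0.0 && norm_a >= 0.0);

    const double eps = DBL_EPSILON;  /* machine precision, dlamch('P') */
    const double safe_min = DBL_MIN; /* safe minimum, dlamch('S')      */

    if (norm_a > safe_min)
        return norm_diff / ((double)n * norm_a * eps);

    /* norm_a is zero or denormal: dividing by it could give NaN or Inf,
     * so fall back to an absolute residual scaled by n * eps (assumption). */
    return norm_diff / ((double)n * eps);
}
```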
Fixed a build error when libflame is built for AVX2 with the ENABLE_AOCL_BLAS option. Fix: used the BLIS_KERNELS_ZEN4 macro to run direct BLIS kernels for Zen4 and above. Change-Id: I19e4cea12becd9daac7dc85f4ce3c0ebb5d366c9 AMD-Internal: CPUPL-7475
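One way a build error like this arises is that an AVX2-only build still references Zen4 kernel symbols. Guarding those references at compile time avoids it. In the sketch below, the kernel function names and the "arch_id >= 4 means Zen4 or newer" convention are hypothetical. Only the BLIS_KERNELS_ZEN4 macro comes from the commit message. The runtime arch check is still needed because a Zen4-enabled build may run on older hardware.

```c
#include <assert.h>

/* Hypothetical fallback kernel, always available. Returns a tag. */
static int fallback_kernel_avx2(void) { return 2; }

#if defined(BLIS_KERNELS_ZEN4)
/* Hypothetical direct kernel, only declared when BLIS built Zen4 kernels. */
int direct_kernel_zen4(void);
#endif

/* arch_id >= 4 stands for "Zen4 or newer" in this sketch. */
static int run_kernel(int arch_id)
{
    assert(arch_id >= 0);
#if defined(BLIS_KERNELS_ZEN4)
    if (arch_id >= 4)
        return direct_kernel_zen4();  /* referenced only when compiled in */
#else
    (void)arch_id;                    /* Zen4 path not built */
#endif
    return fallback_kernel_avx2();
}
```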
* AOCL-LAPACK: Fix for GELSS validation failure for Rank < N -> Updated validation formula for rank < n case. -> Updated fla_invoke_gemm function to have alpha and beta parameters. AMD-Internal: [CPUPL-7571]
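Alpha and beta parameters let a single GEMM call form a residual such as R := B - A * X (alpha = -1, beta = 1) without a separate subtraction. The sketch below is a plain reference GEMM with those parameters. `ref_dgemm` is a hypothetical stand-in, because the actual fla_invoke_gemm signature is not shown in the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Reference C := alpha * A * B + beta * C, column-major, no transposes.
 * A is m x k, B is k x n, C is m x n. */
static void ref_dgemm(int m, int n, int k, double alpha,
                      const double *a, int lda, const double *b, int ldb,
                      double beta, double *c, int ldc)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    assert(lda >= (m > 1 ? m : 1) && ldb >= (k > 1 ? k : 1) &&
           ldc >= (m > 1 ? m : 1));

    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            double acc = 0.0;
            for (int p = 0; p < k; p++)
                acc += a[i + (size_t)p * lda] * b[p + (size_t)j * ldb];
            /* With beta == 0, C is not read on input (BLAS convention). */
            double cij = (beta == 0.0) ? 0.0 : beta * c[i + (size_t)j * ldc];
            c[i + (size_t)j * ldc] = alpha * acc + cij;
        }
    }
}
```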
…… (#220) AOCL-LAPACK: Fix for reproducibility failures observed in the POTRF, POTF2, POTRI, POTRS and GEBRD APIs -> aocl_fla_init() was not invoked before checking the minimum arch ID; added this call to the POTRF and POTRI code. -> Fix for GEBRD reproducibility failures AMD-Internal: [CPUPL-7493] Co-authored-by: Reshi Krish <reshikrish.thangajawahar@amd.com>
Updated version string in the so_version file and wherever applicable.
We have removed support for autoconf-based builds, so this file is being removed.
Co-authored-by: tprnaidu <tprnaidu@amd.com>
Field,
Please review and merge them.
Thanks