AOCL 2.2 changes - Majorly include LAPACK 3.9.0 support by rsanagap · Pull Request #36 · flame/libflame

rsanagap · 2020-07-05T17:32:25Z

Field,

Please review and merge them.

Thanks

Signed-off-by: samahmad <Sameer.Ahmad@amd.com> AMD-Internal: CPUPL-5556 Change-Id: I774e1a39b12feb6aef65d964704017aea7a45579

Change-Id: I74067c461c68cff55c6b5f02a9080e3df9352b9e

-fopenmp was missing in compiler flag. Included the flag in both automake and cmake configure file CPUPL-5677 Signed-off-by: Sridhar Govindaswamy <Sridhar.Govindaswamy@amd.com> Change-Id: I444abc4ce35dbbce1b45bdde27faf2a5f3f250f2

1. Enable multithreading in DGETRF for input range M and N > 160.
2. Default and max thread count set to 64.
3. 32 threads gives the best benchmarking results, so the thread count is set to 32 for the input range 160 to 6500.
AMD Internal : [CPUPL-4978] Change-Id: I7c68982bfe295e95563f40d77547ef101fbd97e2
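The thread-count rules in this commit message can be read as a small decision function. The C sketch below is only an illustration: the helper name `getrf_thread_count`, the constant names, and the boundary handling (both M and N must exceed 160, and 6500 is treated as inclusive) are assumptions, not code from the commit.

```c
/* Illustrative sketch only; names and boundary handling are assumptions,
   not taken from the libflame sources. */
enum {
    GETRF_MT_MIN_DIM    = 160,  /* multithreading applies for M and N above this */
    GETRF_TUNED_MAX_DIM = 6500, /* upper end of the range tuned to 32 threads */
    GETRF_TUNED_THREADS = 32,
    GETRF_MAX_THREADS   = 64
};

/* Return the thread count to request for an m x n factorization, capped by
   the number of threads the runtime makes available. */
int getrf_thread_count(int m, int n, int available)
{
    int wanted;

    if (available < 1)
        available = 1;
    if (m <= GETRF_MT_MIN_DIM || n <= GETRF_MT_MIN_DIM)
        return 1; /* small inputs stay single threaded */

    wanted = (m <= GETRF_TUNED_MAX_DIM && n <= GETRF_TUNED_MAX_DIM)
                 ? GETRF_TUNED_THREADS
                 : GETRF_MAX_THREADS;
    return wanted < available ? wanted : available;
}
```

With 64 threads available, a 1000 x 1000 input would request 32 threads under these assumptions, and an 8000 x 8000 input would request 64.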

AOCL-LAPACK Version upgraded from 4.2.1 to 5.0.1 Signed-off-by: tprnaidu <tprnaidu@amd.com> Change-Id: I066ae2d790f55b135ac00b868ef9acd03ba1762d

Gains of up to 15% observed for the following block size/input size combinations on the lib-genoa-05 machine:
1. For float datatype, block sizes that work well are:
   a) 60 for sizes greater than 3000 x 3000
   b) 48 for sizes greater than 1000 x 1000
   c) 24 for all other cases
2. For double datatype, block sizes that work well are:
   a) 64 for sizes greater than 2000 x 2000
   b) 60 for sizes greater than 1000 x 1000
   c) 32 for all other cases
3. For complex datatype, block sizes that work well are:
   a) 64 for sizes greater than 600 x 600
   b) 48 for sizes greater than 100 x 100
   c) 32 for all other cases
4. For double complex datatype, block sizes that work well are:
   a) 64 for sizes greater than 400 x 400
   b) 48 for sizes greater than 100 x 100
   c) 32 for all other cases
Signed-off-by: Vibhav Gupta <Vibhav.Gupta@amd.com> AMD-Internal: CPUPL-5876 Change-Id: Ide6acd9d41faafac6a84bdd18d38849789636fbe
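The block-size table in this commit message maps naturally onto a lookup keyed by datatype. The sketch below is a hedged illustration: the type and function names (`datatype_t`, `nb_rule_t`, `pick_block_size`) are hypothetical, and it assumes that "greater than K x K" means both dimensions exceed K. The commit message does not say how non-square inputs are classified.

```c
/* Illustrative sketch only; all identifiers here are hypothetical. */
typedef enum { DT_FLOAT, DT_DOUBLE, DT_COMPLEX, DT_DOUBLE_COMPLEX } datatype_t;

typedef struct {
    int large_dim;  /* sizes above this use large_nb */
    int large_nb;
    int mid_dim;    /* sizes above this (but not above large_dim) use mid_nb */
    int mid_nb;
    int default_nb; /* everything else */
} nb_rule_t;

/* Values copied from the commit message, one row per datatype. */
static const nb_rule_t nb_rules[] = {
    [DT_FLOAT]          = {3000, 60, 1000, 48, 24},
    [DT_DOUBLE]         = {2000, 64, 1000, 60, 32},
    [DT_COMPLEX]        = { 600, 64,  100, 48, 32},
    [DT_DOUBLE_COMPLEX] = { 400, 64,  100, 48, 32},
};

int pick_block_size(datatype_t dt, int m, int n)
{
    const nb_rule_t *rule = &nb_rules[dt];
    /* Assumption: classify by the smaller dimension, so a size counts as
       "greater than K x K" only when both m and n exceed K. */
    int dim = m < n ? m : n;

    if (dim > rule->large_dim)
        return rule->large_nb;
    if (dim > rule->mid_dim)
        return rule->mid_nb;
    return rule->default_nb;
}
```

Keeping the thresholds in a table rather than in nested conditionals makes it easier to retune a single datatype without touching the selection logic.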

… multi-thread is enabled CPUPL-5677 Signed-off-by: Sridhar Govindaswamy <Sridhar.Govindaswamy@amd.com> Change-Id: Iaa1b79ca8e7e7c15df31b02e7fef450eb530a4e7

Added LAPACKE interface testing support in the main test suite for GBTRF and GBTRS APIs. Signed-off-by: Sridhar Govindaswamy <Sridhar.Govindaswamy@amd.com> Change-Id: I97b7aaf8b697a1d1779bbebb69dc7cec9348d1f8

Correction in condition checks for invalid input params in sgesdd_fla_check, dgesdd_fla_check APIs. Modified condition to check ldvt based on jobvt instead of jobu. Modified jobz comparison logic to check regardless of case. AMD-Internal: CPUPL-5889 Signed-off-by: dnikku <Deepika.Nikku@amd.com> Change-Id: Ib3fed47fbd05163aa2b496cec6cd59cd79156880

Update License text with latest Third Party Notices Change-Id: I79589b0821cab802157406415d49d898a7a83d2f

This reverts commit 70132a4. Change-Id: I1002f00295f77caf2097ff67bf1b51889067bf96

NOTICES file with Third Party Licence information Change-Id: I8b91edd3481bef3e5378a2c91e67c1b4eb81d1b1

APIs from AOCL-Utils to get ISA information have changed in 4.2 release and the ones used in libflame have been deprecated. Use the latest AOCL-Utils API. AMD Internal : [CPUPL-5906] Change-Id: I84d576f9749a399aea23d96b5d2d636497bed540

Earlier these APIs were just a wrapper around sgelss/dgelss. gelsd APIs provide significant performance gains over gelss APIs. Signed-off-by: samahmad <Sameer.Ahmad@amd.com> AMD-Internal: CPUPL-5766 Change-Id: I867c107816fcd50fb16a890ca5947c6b9ff80e3d

Corrected an error in the calculation of the length parameter when handling values less than safe-min in the norm calculation. Also fixed an issue in the usage of lda in a macro nested inside another macro. Signed-off-by: Vasanthakumar R <varajago@amd.com> AMD-Internal: SWLCSG-3226, SWLCSG-3217 Change-Id: Ic75a10aa7020f043e71f1501bb4219f4498be901

1. Optimized the LAPACK_GETRI_SMALL_D_3x3 kernel by reducing the internal repetitive memory load/stores AMD Internal : [CPUPL-5865] Change-Id: Ib4e33cccdc4a78820014a676bcb9808f4c685797

… major Fixed init_matrix_from_file() API to read matrix from input file in column major format Added fix for gtsv AOCC issue. AMD-Internal: CPUPL-5922 Signed-off-by: dnikku <Deepika.Nikku@amd.com> Change-Id: I159583837028dd11ee1dbc844b0d495d0c1be1fc

Initialization of the first column/row of U added. The previous optimization that removed the initialization caused test failures in a few performance cases. Corrected the modification of signs of Vt in 2x2 cases. Signed-off-by: Vasanthakumar R <varajago@amd.com> AMD-Internal: CPUPL-6074 Change-Id: I19b7044029b4580013ea7decb58c3500097d0f88

- Overflow/underflow tests for sygvd/hegvd
- Memory leak fixes for sygvd test cases
- Enabling lapacke interfaces for sygvd/hegvd
Change-Id: I79ec9c009e6ba52df17bc6247bb726e60193d5ed Signed-off-by: samahmad <Sameer.Ahmad@amd.com> AMD-Internal: CPUPL-6037

… cases Fixed the copy matrix sizes in validate_gesdd/gesvd to avoid out-of-bounds memory access while testing corner cases: for gesdd, the jobz = O cases with m >= n, ldu = 1 and with m < n, ldvt = 1; for gesvd, the jobu/vt = O, ldu/ldvt = 1 cases. Fixed gtsv test2 under validate_gtsv(): the residual is scaled down by a factor of 10 so that it falls within the expected threshold range, since the input matrix Xact is randomly generated. AMD-Internal: CPUPL-5926 Signed-off-by: dnikku <Deepika.Nikku@amd.com> Change-Id: Ia99bb6d81b76de394265ffded0069fb440de979f

details: Datatype alignment changes for the structures used in test suite Signed-off-by: ksaithar <katteboina.saitharun@amd.com> Change-Id: I08ce86f5d642189b6f9142c74af41a633415b1f3

Components added:
1. Test run/validation
2. Negative test cases
3. Extreme test cases
4. Overflow/Underflow tests
5. Lapacke test
Signed-off-by: Vibhav Gupta <Vibhav.Gupta@amd.com> AMD-Internal: CPUPL-5903 Change-Id: I38fa28ac0216740e0669e41509ca7870fd3adab8

1. Move block size computation to a separate function for each of the 4 types.
2. Optimal block sizes for various input sizes vary as OMP_NUM_THREADS is varied. Set optimal block sizes based on input size ranges only when OMP_NUM_THREADS=1.
3. For small sizes, take the un-optimized path, because the optimized path shows regressions due to the overhead of OpenMP calls.
Gains obtained for single-threaded runs: up to 15% on genoa and 28% on turin.
Signed-off-by: Vibhav Gupta <Vibhav.Gupta@amd.com> AMD-Internal: CPUPL-5876 Change-Id: I8fdeccdf0debdacec3913f8192711d86e9d62314
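Point 2 of this commit message gates the tuned block sizes on `OMP_NUM_THREADS=1`. A minimal sketch of such a gate follows. It assumes the setting is read from the environment string, and the helper names `is_single_threaded` and `choose_block_size` are hypothetical. The actual change may query the OpenMP runtime instead, and the small-size fallback from point 3 is not modeled here.

```c
#include <stdlib.h>

/* Hypothetical helper: returns 1 when the given OMP_NUM_THREADS value
   requests exactly one thread, 0 otherwise (including unset or invalid). */
int is_single_threaded(const char *omp_num_threads)
{
    char *end;
    long value;

    if (omp_num_threads == NULL || *omp_num_threads == '\0')
        return 0;
    value = strtol(omp_num_threads, &end, 10);
    if (end == omp_num_threads)
        return 0; /* no digits were parsed */
    /* OMP_NUM_THREADS may be a comma-separated list for nested levels;
       only the first entry is considered in this sketch. */
    return value == 1;
}

/* Use the size-tuned block size only for single-threaded runs; otherwise
   keep the caller's default. */
int choose_block_size(const char *omp_num_threads, int tuned_nb, int default_nb)
{
    return is_single_threaded(omp_num_threads) ? tuned_nb : default_nb;
}
```

A caller could pass `getenv("OMP_NUM_THREADS")` as the first argument. Taking the string as a parameter keeps the helper easy to test without changing the process environment.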

Port Netlib Lapack 3.12 FORTRAN code to C files for double precision APIs Signed-off-by: Venkatesha <vprasada@amd.com> AMD-Internal: [CPUPL-5708] Change-Id: If2ddc85a9ad0818c96155945340b9cea23b40c8e

- Added avx2, avx512 and parallel version for sgetrf Change-Id: I724cc5c9bf98f42014bcaf680a2fa7373195f10d Signed-off-by: samahmad <Sameer.Ahmad@amd.com> AMD-Internal: CPUPL-6060

Following features are implemented in this commit:
1. Library path and include path for aocl-utils and blis can be automatically inferred while building libflame if pkg-config files for these libraries are available. Only works on Linux for now.
2. Various cmake configure/build/install/test/workflow presets.
3. Cmake presets for Windows (msvc and ninja). As of now test presets do not work!
4. Minimum cmake version upgraded to 3.26.0
Preset names follow the convention: <os>-<make/ninja>-<compiler>-<st/mt>-<lp/ilp>-<static/shared>-<isa-mode>-<other optional commands>
Usage:
$ cmake --build --list-presets
-- Without aocl-utils pkgconfig file
$ cmake --preset {chosen-preset} -DLIBAOCLUTILS_INCLUDE_PATH={aoclutils header path} -DLIBAOCLUTILS_LIBRARY_PATH={aoclutils library path}
-- With aocl-utils pkgconfig file
$ cmake --preset {chosen-preset}
$ cmake --build --preset {chosen-preset}
Build and test workflow
-- If aocl-utils and blis pkg-config files are available
$ cmake --workflow --preset {chosen-preset}
More info in BUILD.md
Change-Id: I8bc54b3eabed9a18c305e911df9aa76d8ff746d0 Signed-off-by: samahmad <Sameer.Ahmad@amd.com> AMD-Internal: CPUPL-5862

Port Netlib Lapack-3.12 newly added double precision fortran files to c Files added : dgedmd.c, dgedmdq.c, dgeqp3rk.c, dlaqp2rk.c, dlaqp3rk.c. Netlib test for lapack-3.12 included. Signed-off-by: Venkatesha <vprasada@amd.com> AMD-Internal: [CPUPL-5708] Change-Id: I60b5c47505162882a19f2086e4842c858e0586e8

a. Added separate invoke functions in CPP for each API
b. Added support for cmake and make
c. Added support for --interface in cmd line
d. Resolved warnings in existing interface header file
e. Resolved errors/warnings in Windows
f. Added ENABLE_CPP_TEST flag in cmake, make files to enable/disable CPP test interface.
g. Updated Readme as per latest changes.
h. Added CPP changes for 25+ test APIs.
Change-Id: I6b77d24e204833134401c69813a3a1672de02c18

Port Netlib Lapack 3.12 FORTRAN code to C files for double precision complex APIs Note: Retained lapack-3.11 zlaqr5.c, to overcome netlib test failures. Signed-off-by: Venkatesha <vprasada@amd.com> AMD-Internal: [CPUPL-6150] Change-Id: I40df2b270d82159cd0bc16a0158951139054b90a

Signed-off-by: Vibhav Gupta <Vibhav.Gupta@amd.com> AMD-Internal: CPUPL-6155 Change-Id: Icbd84fdfa434875bf3bfd2072b09d8e77c326702

Handled a case where invalid arguments to dlacpy could cause segmentation faults AMD-Internal: CPUPL-7560
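The commit message does not say which arguments are rejected or how. As a hedged illustration of the general idea, the sketch below validates dimensions, leading dimensions, and pointers before copying a full column-major matrix. The function `lacpy_full_checked` and its return codes are hypothetical and are not the libflame or LAPACK interface.

```c
#include <stddef.h>

/* Hypothetical checked copy of an m x n column-major matrix A into B.
   Returns 0 on success (including the empty case) and -1 when the
   arguments are rejected, instead of reading or writing out of bounds. */
int lacpy_full_checked(int m, int n, const double *a, int lda,
                       double *b, int ldb)
{
    int i, j;
    int min_ld = m > 1 ? m : 1; /* leading dimensions must be >= max(1, m) */

    if (m < 0 || n < 0)
        return -1;
    if (m == 0 || n == 0)
        return 0; /* nothing to copy */
    if (a == NULL || b == NULL || lda < min_ld || ldb < min_ld)
        return -1;

    for (j = 0; j < n; ++j)
        for (i = 0; i < m; ++i)
            b[(size_t)j * ldb + i] = a[(size_t)j * lda + i];
    return 0;
}
```

Checking the arguments before the loop means an invalid call returns early without touching memory.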

-> Updated BUILD.md to be consistent with user guide. -> Removed ENABLE_EMBED_AOCLUTILS related information from BUILD.md. AMD-Internal: [CPUPL-7491]

DGESDD was reusing threshold from SVD for small size paths. It is changed to SDD specific value to make it independent of SVD. Macro name for thresholds changed to include full API name. AMD Internal: CPUPL-7572

…eration and skipping validation (#136) * AOCL LAPACK Testsuite: Implemented test mode support. Implemented the --test-mode flag in place of --random_init; it enables random input generation and/or skips output validation. AMD-Internal: CPUPL-7435 Signed-off-by: dnikku <Deepika.Nikku@amd.com>

* Add ai-pr-platform-app.yml
* Update ai-pr-platform-app.yml
* Update ai-pr-platform-app.yml
* Update ai-pr-platform-app.yml
* Create ai_coverity.json
  Add AI Coverity Fix - Historical Analysis related changes
* Update ai-pr-platform-app.yml
* Update ai-pr-platform-app.yml
* Update ai-pr-platform-app.yml
  Removed email
* Update ai_coverity.json
* Update ai-pr-platform-app.yml
* Delete .github/psdb-jenkins-trigger.yml
* Delete .github/recipients.yaml
* Delete .github/workflows/ai-code-review-trigger.yml
* Delete .github/self_enablement_config.yaml
---------
Co-authored-by: Tyagi, Shubham <Shubham.Tyagi@amd.com>
Co-authored-by: Prabhu, Anantha <Anantha.Prabhu@amd.com>

…ode instead of FLA code (#162) The underlying APIs for DGELSS - DGEBRD, DGEBD2 and DORGBR were utilizing FLA codepaths. It was found that utilizing lapack codepath provides a significant boost to performance of small matrices. A threshold has been added to switch between lapack and FLA codepaths for optimal performance.

* Addition of GEQPF in GEQP3 files
* Corrected RWORK allocation for CPP
* Removed strcasecmp()
* Correct workspace allocation for complex in LAPACKE mode
* Added same_string function in test_common
* Restored comments
* Removed redundant comments + Removed unused lwork parameter from GEQPF
* Removed static variable 'geqpf_mode' to avoid race conditions

Root cause: Residual value is NAN/INF as norm value is 0 and test case fails. Solution: Check norm value is not 0 before residual calculation. Added FLA_COMPUTE_RESIDUAL macro for residual computation. AMD-Internal: CPUPL-7592

Used existing AVX2 code for medium sizes. AMD-Internal: CPUPL-7516

* AOCL LAPACK Test suite: Addition of API wise ctests Revamped the test suite to generate API-wise CTests for faster Jenkins execution. One ctest for each of the available code paths is added. GETRF , SYEV are included in this commit.

* AOCL-LAPACK: Build warnings fixing
* Update src/base/flamec/main/FLA_Obj.c
  Co-authored-by: ai-pr-platform[bot] <217061646+ai-pr-platform[bot]@users.noreply.github.com>
* Update src/map/lapack2flamec/f2c/c/dlaswp.c
  Co-authored-by: ai-pr-platform[bot] <217061646+ai-pr-platform[bot]@users.noreply.github.com>
* Update test/legacyflame/src/test_apcaqutinc.c
  Co-authored-by: ai-pr-platform[bot] <217061646+ai-pr-platform[bot]@users.noreply.github.com>
* Update test/legacyflame/src/test_trinv.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update test/legacyflame/src/test_lyap.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update test/legacyflame/src/test_apcaqutinc.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update test/legacyflame/src/test_trinv.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update test/legacyflame/src/test_lyap.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update test/legacyflame/src/test_apcaqutinc.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update test/legacyflame/src/test_apqutinc.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update src/base/flamec/main/FLA_Obj.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* AOCL-LAPACK: Fix review comments
* Update src/map/lapack2flamec/f2c/c/dlaswp.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* AOCL-LAPACK: Fixing review comments
* Update test/legacyflame/src/test_apcaqutinc.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update src/base/flamec/main/FLA_Obj.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update test/legacyflame/src/test_apcaqutinc.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* AOCL-LAPACK: Addressed review suggestions
* Update src/base/flamec/main/FLA_Obj.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update src/base/flamec/main/FLA_Obj.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update src/base/flamec/main/FLA_Obj.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* AOCL-LAPACK: Error Fixes
* Update test/legacyflame/src/test_caqrutinc.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update test/legacyflame/src/test_apcaqutinc.c
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* AOCL-LAPACK: Addressed review comments
* Reverted type casting from uintptr_t to unsigned long
* Update CMakeLists.txt
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: tprnaidu <tprnaidu@amd.com>
Co-authored-by: ai-pr-platform[bot] <217061646+ai-pr-platform[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Updated License and TPN files according to latest requirements.

* Resolving merge conflicts
* replaced GEMM switches with fla_invoke_gemm
* BDSQR: Added conformance tests
* BDSQR: Added Overflow/Underflow tests
* BDSQR: Clang-format
* U and VT are orthogonal, D and E are derived from GEBRD. Validation fixed
* Used build_bidiagonal_matrix function from test_common
* Used build_bidiagonal_matrix function from test_common
* Added BRT support
* Fixed BDSQR BRT Tests + Corrected GEJSV api config file test parameters
* Resolving conflicts
* resolving conflicts
* used fla_mem_alloc in test_gejsv and addressed negative test failing in BDSQR
* Removed trailing whitespaces.
* U and VT from GEBRD/ORGBR + Added C matrix validations.
* Used GETRI for C matrix validation + SVD reconstruction test
* Fixed conformance tests and validation
* Addressed segfault for lapacke_row
* Removed redundant variables.
* Correct matrix dimension allocation.
* Reconstruction U and VT tests
* Fixed zero residual issue
* Fixed compute_matrix_norm call
* Added compute_matrix_inverse to test_common
* Removed reset_matrix calls
* Fixed memory leaks

Address alignment of key x86 optimized functions called for DGELS. This improves the performance of small size problems for dgels.

Also removed warnings in CPP interface for deprecated APIs. Root cause: Residual value is NAN/INF as norm value is 0 and test case fails. Solution: Added prevention check, if norm value > safe_min before residual calculation. Added common eps (E, P), safe min to be used to compute residual changes of all APIs. AMD-Internal: CPUPL-7742
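The root cause described here is a division by a zero norm, which produces NaN or Inf. The sketch below illustrates a guard of the kind the message describes (norm > safe_min). It is not the test suite's actual macro: the function name `guarded_residual`, the residual formula `diff_norm / (n * ref_norm * eps)`, the fallback used when the norm is too small, and the use of `DBL_EPSILON` and `DBL_MIN` as stand-ins for eps and safe min are all assumptions.

```c
#include <float.h>

/* Hypothetical residual helper. diff_norm is the norm of (computed - expected),
   ref_norm is the norm of the reference matrix, n is the problem size. */
double guarded_residual(double diff_norm, double ref_norm, int n)
{
    const double eps = DBL_EPSILON;  /* stand-in for machine epsilon */
    const double safe_min = DBL_MIN; /* stand-in for the safe minimum */
    double scale = (double)(n > 0 ? n : 1) * eps;

    /* Divide by the reference norm only when it is safely non-zero. */
    if (ref_norm > safe_min)
        return diff_norm / (scale * ref_norm);

    /* Assumption: fall back to the unnormalized form so the result stays
       finite instead of becoming NaN or Inf. */
    return diff_norm / scale;
}
```

Under these assumptions, an all-zero input (both norms 0) yields a residual of 0 rather than NaN, so the threshold comparison in the test remains meaningful.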

Fixed build error when libflame is built for AVX2 with ENABLE_AOCL_BLAS option. Fix: Used macro BLIS_KERNELS_ZEN4 to run direct BLIS kernels for zen4 and above. Change-Id: I19e4cea12becd9daac7dc85f4ce3c0ebb5d366c9 AMD-Internal: CPUPL-7475

* AOCL-LAPACK: Fix for GELSS validation failure for Rank < N -> Updated validation formula for rank < n case. -> Updated fla_invoke_gemm function to have alpha and beta parameters. AMD-Internal: [CPUPL-7571]

…… (#220) AOCL-LAPACK: Fix for Reproducibility failures observed on POTRF, POTF2, POTRI, POTRS, GEBRD APIs -> aocl_fla_init() was not invoked before checking the minimum arch id, added this call for potrf and potri code. -> Fix for GEBRD reproducibility failures AMD-Internal: [CPUPL-7493] Co-authored-by: Reshi Krish <reshikrish.thangajawahar@amd.com>

Updated version string in the so_version file and wherever applicable.

We have removed support for auto-conf tools based build. Hence removing this file.

Co-authored-by: tprnaidu <tprnaidu@amd.com>

samahmad and others added 30 commits October 9, 2024 02:48

Accuracy tests implementation for sygvd/hegvd APIs

599802f

Signed-off-by: samahmad <Sameer.Ahmad@amd.com> AMD-Internal: CPUPL-5556 Change-Id: I774e1a39b12feb6aef65d964704017aea7a45579

Update the LICENSE text

cae8ea2

Change-Id: I74067c461c68cff55c6b5f02a9080e3df9352b9e

Added -fopenmp flag in compiler flag

1ed94b9

-fopenmp was missing in compiler flag. Included the flag in both automake and cmake configure file CPUPL-5677 Signed-off-by: Sridhar Govindaswamy <Sridhar.Govindaswamy@amd.com> Change-Id: I444abc4ce35dbbce1b45bdde27faf2a5f3f250f2

AOCL-LAPACK: Version upgrade to 5.0.1

3b0f1f7

AOCL-LAPACK Version upgraded from 4.2.1 to 5.0.1 Signed-off-by: tprnaidu <tprnaidu@amd.com> Change-Id: I066ae2d790f55b135ac00b868ef9acd03ba1762d

Updated configure file to include -fopenmp in compiler flag only when…

b2a305a

… multi-thread is enabled CPUPL-5677 Signed-off-by: Sridhar Govindaswamy <Sridhar.Govindaswamy@amd.com> Change-Id: Iaa1b79ca8e7e7c15df31b02e7fef450eb530a4e7

AOCL LAPACK Testsuite: LAPACKE interface support for GBTRF and GBTRS API

0fd8373

Added LAPACKE interface testing support in the main test suite for GBTRF and GBTRS APIs. Signed-off-by: Sridhar Govindaswamy <Sridhar.Govindaswamy@amd.com> Change-Id: I97b7aaf8b697a1d1779bbebb69dc7cec9348d1f8

Update License text

70132a4

Update License text with latest Third Party Notices Change-Id: I79589b0821cab802157406415d49d898a7a83d2f

Revert "Update License text"

4d3bc15

This reverts commit 70132a4. Change-Id: I1002f00295f77caf2097ff67bf1b51889067bf96

Add NOTICES file

99958ef

NOTICES file with Third Party Licence information Change-Id: I8b91edd3481bef3e5378a2c91e67c1b4eb81d1b1

Performance improvement in dgetri 3x3 input size

f94fd32

1. Optimized the LAPACK_GETRI_SMALL_D_3x3 kernel by reducing the internal repetitive memory load/stores AMD Internal : [CPUPL-5865] Change-Id: Ib4e33cccdc4a78820014a676bcb9808f4c685797

AOCL LAPACK Test suite: Optimizing Memory Usage in structures

5065142

details: Datatype alignment changes for the structures used in test suite Signed-off-by: ksaithar <katteboina.saitharun@amd.com> Change-Id: I08ce86f5d642189b6f9142c74af41a633415b1f3

AOCL-LAPACK: Upgrade to LAPACK-3.12 Part 1 - Double Precision

f8de6a3

Port Netlib Lapack 3.12 FORTRAN code to C files for double precision APIs Signed-off-by: Venkatesha <vprasada@amd.com> AMD-Internal: [CPUPL-5708] Change-Id: If2ddc85a9ad0818c96155945340b9cea23b40c8e

Optimisation porting to sgetrf

0aee1c5

- Added avx2, avx512 and parallel version for sgetrf Change-Id: I724cc5c9bf98f42014bcaf680a2fa7373195f10d Signed-off-by: samahmad <Sameer.Ahmad@amd.com> AMD-Internal: CPUPL-6060

AOCL-LAPACK Test-Suite: Add sytrf_rook test case

7fe4e04

Signed-off-by: Vibhav Gupta <Vibhav.Gupta@amd.com> AMD-Internal: CPUPL-6155 Change-Id: Icbd84fdfa434875bf3bfd2072b09d8e77c326702

Ahmad, Sameer and others added 30 commits October 31, 2025 12:18

AOCL LAPACK: DLACPY invalid size handling (#163)

2e9d475

Handled a case where invalid arguments to dlacpy could cause segmentation faults AMD-Internal: CPUPL-7560

AOCL LAPACK: DLACPY invalid size handling (#164)

626d268

Handled a case where invalid arguments to dlacpy could cause segmentation faults AMD-Internal: CPUPL-7560

AOCL-LAPACK: BUILD.md update

0b3f94b

-> Updated BUILD.md to be consistent with user guide. -> Removed ENABLE_EMBED_AOCLUTILS related information from BUILD.md. AMD-Internal: [CPUPL-7491]

AOCL-LAPACK: BUILD.md update

1f1a0e0

-> Updated BUILD.md to be consistent with user guide. -> Removed ENABLE_EMBED_AOCLUTILS related information from BUILD.md. AMD-Internal: [CPUPL-7491]

Small size threshold update for DGESDD

9b28d8c

DGESDD was reusing threshold from SVD for small size paths. It is changed to SDD specific value to make it independent of SVD. Macro name for thresholds changed to include full API name. AMD Internal: CPUPL-7572

Small size threshold update for DGESDD

83ba6ab

DGESDD was reusing threshold from SVD for small size paths. It is changed to SDD specific value to make it independent of SVD. Macro name for thresholds changed to include full API name. AMD Internal: CPUPL-7572

AI Based Coverity (#181)

6956f1a

Fix to resolve failures with GEEV, GEEVX tests. (#176)

99a7a34

Root cause: Residual value is NAN/INF as norm value is 0 and test case fails. Solution: Check norm value is not 0 before residual calculation. Added FLA_COMPUTE_RESIDUAL macro for residual computation. AMD-Internal: CPUPL-7592

DPOTRI optimization for medium sizes up to 65 (#175)

ae26923

Used existing AVX2 code for medium sizes. AMD-Internal: CPUPL-7516

Remove unused configure_tidsp file (#184)

4aa04f5

Updates to LICENSE and TPN

ae8f43e

Updated License and TPN files according to latest requirements.

Merging changes from AMD libFLAME internal repo

ddf0dc7

Fix for DGELS Regression

69cb2b6

Address alignment of key x86 optimized functions called for DGELS. This improves the performance of small size problems for dgels.

AOCL-LAPACK: Build Error Fix for AVX2 (#213)

365a579

Fixed build error when libflame is built for AVX2 with ENABLE_AOCL_BLAS option. Fix: Used macro BLIS_KERNELS_ZEN4 to run direct BLIS kernels for zen4 and above. Change-Id: I19e4cea12becd9daac7dc85f4ce3c0ebb5d366c9 AMD-Internal: CPUPL-7475

AOCL-LAPACK: Fix for GELSS validation failure for Rank < N (#167)

cc113f7

* AOCL-LAPACK: Fix for GELSS validation failure for Rank < N -> Updated validation formula for rank < n case. -> Updated fla_invoke_gemm function to have alpha and beta parameters. AMD-Internal: [CPUPL-7571]

zen5 model update

7db13e8

Version String Update to 5.2

ffd88d9

Updated version string in the so_version file and wherever applicable.

Remove unused configure_tidsp file

badddac

We have removed support for auto-conf tools based build. Hence removing this file.

AOCL-LAPACK: Version string update as 5.2.2 (#234)

7315582

Co-authored-by: tprnaidu <tprnaidu@amd.com>

Updated LICENSE and NOTICES for 5.2.2 release

55e0dd7

Merging changes from internal repo

ed29a1c

rsanagap wants to merge 1485 commits into flame:master from amd:master


9 participants
