Introduction
RaijinCL is an OpenCL library for matrix operations. GPU architectures vary widely, so it is difficult to provide a single kernel implementation that works well everywhere. Therefore, RaijinCL is an autotuning library: instead of providing a single optimized implementation of each kernel, it generates many different kernel variants, tests them on the user's machine, and records the best-performing one.
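To make the idea concrete, here is a minimal sketch of a generic autotuning loop. It is only an illustration of the approach, not RaijinCL's actual API: the tuning parameters in KernelConfig and the timeKernel callback are hypothetical stand-ins for "generate an OpenCL kernel variant, run it on the target device, and measure its runtime".

```cpp
#include <vector>
#include <functional>
#include <limits>

// Hypothetical tuning parameters for a generated kernel variant.
struct KernelConfig {
    int tileM, tileN, tileK;   // e.g. work-group tile sizes
    int vectorWidth;           // e.g. vector width used in the kernel body
};

// Enumerate candidate configurations, benchmark each on the user's device,
// and keep the fastest one. timeKernel returns the measured runtime in seconds.
KernelConfig autotune(const std::vector<KernelConfig>& candidates,
                      const std::function<double(const KernelConfig&)>& timeKernel) {
    KernelConfig best{};
    double bestTime = std::numeric_limits<double>::max();
    for (const auto& cfg : candidates) {
        double t = timeKernel(cfg);   // compile and time this variant
        if (t < bestTime) {
            bestTime = t;
            best = cfg;               // remember the fastest variant so far
        }
    }
    return best;   // the winning configuration is what gets recorded for reuse
}
```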
Initial results are very encouraging. For example, RaijinCL is competitive with the GEMM routines in AMD's OpenCL BLAS on AMD GPUs, and with CUBLAS on the Nvidia GPUs that I tested.
A detailed description and results can be found in this technical report.
Author Info
The library was written by Rahul Garg as part of his ongoing PhD thesis at McGill University on compiling array-based languages to GPUs. The work was done under the supervision of Prof. Laurie Hendren in the SABLE research group.
You can send me an email at firstname.lastname, with the domain being mail.mcgill.ca.
Download
You can clone the repository from Bitbucket. The library is now considered stable for use.
However, an experimental extension to the library (related to the use of FMA instructions) that improves performance on some GPUs is available by private request.
Device profiles, which contain pretuned information for specific devices, are also available upon request for some devices such as the AMD Radeon 7970, Nvidia Tesla C2050, and Intel HD 4000.
Device profiles will be made public very soon.
Supported platforms
RaijinCL is written in C++. It is currently supported on Linux and is usually tested against the AMD and Nvidia OpenCL SDKs. If you would like my help porting it to different hardware or a different OS, contact me.
Status
- Provides basic SGEMM, DGEMM, CGEMM and ZGEMM implementations with good performance.
- Matrix transpose kernels
- Matrix-vector multiply
- Provides initial support for sum and product reduction routines for float, double and complex types. This is quite general: reductions can be performed along any one axis of a multi-dimensional strided matrix. The common one-dimensional reduction is a supported special case (see the sketch after this list).
- Element-wise unary operations (such as sin, cos, and exponentiation) on multidimensional matrices.
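As a clarification of what reduction along one axis of a strided matrix means, here is a small CPU reference in plain C++. It is not the library's API; the function name, the row-major layout, and the choice of reducing along axis 1 are illustrative assumptions.

```cpp
#include <vector>
#include <cstddef>

// Sum a (rows x cols) row-major matrix along axis 1, i.e. collapse each row
// to a single value. rowStride is the distance between rows, in elements,
// so the matrix may be a strided view into a larger buffer.
std::vector<float> sumAlongAxis1(const std::vector<float>& data,
                                 std::size_t rows, std::size_t cols,
                                 std::size_t rowStride) {
    std::vector<float> out(rows, 0.0f);
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            out[i] += data[i * rowStride + j];   // accumulate across the reduced axis
    return out;
}
// With a single row (or a single axis), this degenerates to the familiar
// one-dimensional sum reduction mentioned above.
```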
Long term plans
The long-term plan is to also provide support for some matrix decomposition routines. However, no timeline can be guaranteed.
License
Licensed under the Apache License, Version 2.0. Have fun!