Easy Python/Numpy CUDA/CUBLAS Integration

CUDA is Nvidia’s C-like API for non-graphics number crunching on their GeForce 8xxx-series and newer video cards. For certain operations, it is amazingly fast. Unfortunately, it is extremely painful to use, especially when compared to Numpy, Python’s wonderful scientific computing package.

So, to marry the two, I wrote some wrapper code for myself. It’s pretty much only good for one thing: multiplying large matrices together really fast. But it’s really good at that, and it’s really easy to use. For example:

import numpy
from pycublas import CUBLASMatrix
A = CUBLASMatrix( numpy.mat([[1,2,3],[4,5,6]],numpy.float32) )
B = CUBLASMatrix( numpy.mat([[2,3],[4,5],[6,7]],numpy.float32) )
C = A*B
print C.np_mat()
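
As a sanity check, the same product can be computed on the CPU with plain NumPy and compared against whatever the card returns. This is only a reference calculation, not part of pycublas. It uses numpy.array rather than numpy.mat so that it runs on current NumPy releases.

```python
import numpy as np

# Same operands as the example above: a 2x3 matrix times a 3x2 matrix.
A = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)
B = np.array([[2, 3], [4, 5], [6, 7]], dtype=np.float32)

# CPU reference product; the result is 2x2.
expected = A.dot(B)
print(expected)
# [[28. 34.]
#  [64. 79.]]
```

Note that the operands are not square: only the inner dimensions (3 and 3) have to agree.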

All CUBLAS alloc and free calls are mapped to the CUBLASMatrix object’s life in Python, so you don’t have to worry about memory management (other than filling up the card, of course).
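
The general pattern of tying an allocation to an object’s lifetime can be sketched as follows. This is not the pycublas source: FakeDevice and DeviceBuffer are hypothetical names, and the "device" only tracks handles in a Python set so that the sketch runs without a GPU. It also relies on CPython freeing an object as soon as its last reference goes away.

```python
import gc


class FakeDevice(object):
    """Hypothetical stand-in for the card: it only tracks live handles."""

    def __init__(self):
        self.live = set()
        self._next_handle = 1

    def alloc(self, nbytes):
        handle = self._next_handle
        self._next_handle += 1
        self.live.add(handle)
        return handle

    def free(self, handle):
        self.live.discard(handle)


class DeviceBuffer(object):
    """Allocates in __init__ and frees in __del__ (hypothetical wrapper)."""

    def __init__(self, device, nbytes):
        self._device = device
        self._handle = device.alloc(nbytes)

    def __del__(self):
        self._device.free(self._handle)


device = FakeDevice()
buf = DeviceBuffer(device, 4 * 2 * 3)  # room for a 2x3 float32 matrix
live_while_referenced = len(device.live)

del buf        # drop the last reference
gc.collect()   # not needed on CPython, harmless elsewhere
live_after_del = len(device.live)

print(live_while_referenced, live_after_del)
```

The caller never frees anything explicitly; dropping the Python object is what releases the buffer.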

Here are some performance numbers, given as speedup over numpy (memory transfer times included):
(4160×4160)*(4160×4160) = 43.0X
(4096×4096)*(4096×4096) = 34.0X
(3900×3900)*(3900×3900) = 47.3X
(2048×2048)*(2048×2048) = 28.2X
(1024×1024)*(1024×1024) = 58.8X
(512×512)*(512×512) = 24.1X
(256×256)*(256×256) = 6.3X
(128×128)*(128×128) = 1.1X
CPU: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz stepping 06
GPU: nVidia Corporation GeForce 8800 GT (rev a2)
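
The benchmark script itself is not included in this post. A minimal sketch of how the CPU side of such a comparison could be timed is below; time_numpy_matmul is a hypothetical helper, and the GPU side would presumably be timed the same way around the CUBLASMatrix construction, the multiplication, and the np_mat() call so that transfers are counted.

```python
import timeit

import numpy as np


def time_numpy_matmul(n, repeats=3):
    """Return the best wall-clock time (seconds) for an n x n float32 product."""
    rng = np.random.RandomState(0)
    a = rng.rand(n, n).astype(np.float32)
    b = rng.rand(n, n).astype(np.float32)
    best = float("inf")
    for _ in range(repeats):
        start = timeit.default_timer()
        a.dot(b)
        best = min(best, timeit.default_timer() - start)
    return best


for n in (128, 256, 512):
    print("%dx%d: %.6f s" % (n, n, time_numpy_matmul(n)))
```

Taking the best of a few repeats reduces noise from other processes; the speedup is then the CPU time divided by the GPU time for the same size.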

Note: This version only supports float32.
Note: CUBLAS limits matrix dims to (65536×65536).
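
Given those two limits, it can help to check a matrix before handing it over. The helper below is a hypothetical sketch, not part of pycublas: it converts the input to float32 (which silently rounds float64 values) and rejects anything that is not 2-D or that exceeds the dimension limit quoted above.

```python
import numpy as np

MAX_DIM = 65536  # per-dimension limit quoted in the note above


def prepare_for_cublas(m):
    """Return m as a 2-D float32 array, or raise ValueError (hypothetical helper)."""
    arr = np.asarray(m, dtype=np.float32)
    if arr.ndim != 2:
        raise ValueError("expected a 2-D matrix, got %d dimension(s)" % arr.ndim)
    rows, cols = arr.shape
    if rows > MAX_DIM or cols > MAX_DIM:
        raise ValueError("matrix is %dx%d, limit is %d per dimension"
                         % (rows, cols, MAX_DIM))
    return arr


checked = prepare_for_cublas([[1.0, 2.0], [3.0, 4.0]])
print(checked.dtype, checked.shape)
```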

Source code available here: pycublas.py (rename download to pycublas.py to use)

12 Responses to “Easy Python/Numpy CUDA/CUBLAS Integration”

  1. Ahmed Fasih Says:

    This is great! Any advice on how appropriate reusing your efforts for CUFFT would be?

  2. derek Says:

    i don’t know much about FFTs, but it would seem that CUFFT doesn’t provide that much of a performance increase. at least according to http://www.science.uwaterloo.ca/~hmerz/CUDA_benchFFT/.

    but if you do want to try and use it, the numpy -> graphics card code in the __init__() function should help you in using the APIs. just need to load the CUFFT library and create ctypes bindings to it.
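
A rough sketch of the "load the library and create ctypes bindings" step might look like this. load_shared_library is a hypothetical helper, and the cufftPlan1d signature is written from memory of cufft.h, so check it against the header on your system before relying on it.

```python
import ctypes
import ctypes.util


def load_shared_library(short_name):
    """Return a ctypes.CDLL for short_name, or None if it cannot be loaded."""
    path = ctypes.util.find_library(short_name)
    if path is None:
        return None
    try:
        return ctypes.CDLL(path)
    except OSError:
        return None


libcufft = load_shared_library("cufft")

if libcufft is None:
    print("CUFFT not found on this machine")
else:
    # Assumed signature: cufftPlan1d(cufftHandle *plan, int nx,
    #                                cufftType type, int batch)
    libcufft.cufftPlan1d.argtypes = [ctypes.POINTER(ctypes.c_int),
                                     ctypes.c_int, ctypes.c_int, ctypes.c_int]
    libcufft.cufftPlan1d.restype = ctypes.c_int
    print("CUFFT loaded")
```

Declaring argtypes and restype for each function up front lets ctypes catch wrong argument types instead of passing garbage to the library.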

  3. Bob Says:

    Thanks, appreciate this! Think you need to change 190-191 from “if” to “elif”, for non-linux machines.

  4. Ashleigh Says:

    First of all, thanks for sharing this. That said, I am having some trouble because I have a 64-bit Mac and I think that my CUDA install expects 64-bit. Do you know how hard it would be to make your current code 64-bit compatible? I’m planning to tinker with it myself and see if I can get it working, but if you could offer any advice I would really appreciate it.

    >>> A = numpy.mat([[1,2,3],[4,5,6]], numpy.float32)
    >>> B = numpy.mat([[2,3],[4,5],[6,7]], numpy.float32)
    >>> C = (CUBLASMatrix(A)*CUBLASMatrix(B)).np_mat()
    >>> C
    Warning: invalid value encountered in reduce
    Warning: invalid value encountered in reduce
    array([[ nan, nan],
    [ nan, 79. ]], dtype=float32)
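
One common source of 64-bit trouble in ctypes wrappers is pointer truncation: unless restype and argtypes are declared, ctypes treats return values and plain Python integers as C int, which is 32 bits on mainstream platforms. Whether that is what happens in pycublas is not confirmed here; the sketch below only demonstrates the effect, using a made-up address.

```python
import ctypes

pointer_bytes = ctypes.sizeof(ctypes.c_void_p)
int_bytes = ctypes.sizeof(ctypes.c_int)

# Hypothetical device address that does not fit in 32 bits.
fake_address = 0x100000010

# ctypes integer types do no overflow checking, so the high bits are dropped.
as_c_int = ctypes.c_int(fake_address).value

# A pointer-sized type keeps the full value on a 64-bit build.
as_void_p = ctypes.c_void_p(fake_address).value

print("pointer: %d bytes, int: %d bytes" % (pointer_bytes, int_bytes))
print("as c_int:    0x%x" % as_c_int)
print("as c_void_p: 0x%x" % as_void_p)
```

If this is the cause, the usual remedy is to declare pointer arguments and return values as ctypes.c_void_p (or a POINTER type) on every bound function.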

  5. arief nur andono Says:

    Why does pycublaspy1.txt no longer exist on this website? Where should I download it?

  6. derek Says:

    sorry, link fixed!

  7. sean Says:

    I ran into the same bug as Bob (change 190-191 from “if” to “elif”).

  8. Boom Says:

    Hi, can you post the code for CPU and GPU? I would like to test it on my Supermicro server with a Tesla card in it.

  9. Jordan Says:

    I’ve been trying to get your code to work but keep running into a problem.

    When i try and execute your source code, i get the following error (and i can’t use “import pycublas”, but i’m guessing it is related to this error):

    Traceback (most recent call last):
      File "C:\Users\jordan\Desktop\pycublas.py", line 192, in <module>
        else: libcublas = ctypes.cdll.LoadLibrary('libcublas.so')
      File "C:\Python27\lib\ctypes\__init__.py", line 443, in LoadLibrary
        return self._dlltype(name)
      File "C:\Python27\lib\ctypes\__init__.py", line 365, in __init__
        self._handle = _dlopen(self._name, mode)
    WindowsError: [Error 126] The specified module could not be found

    Can you give me any insight as to what might be going wrong?
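
This traceback is consistent with the "if" versus "elif" bug reported in comments 3 and 7: line 192 is the Linux branch, yet it runs on Windows. The actual lines 190-191 are not shown on this page, so the sketch below is a guess at their shape. The Windows and Mac library names are assumptions; only 'libcublas.so' comes from the traceback.

```python
def buggy_choice(platform):
    """Return every library name the if/if/else chain would try to load."""
    tried = []
    if platform == "win32":
        tried.append("cublas.dll")        # assumed Windows name
    if platform == "darwin":              # second "if" starts a new chain
        tried.append("libcublas.dylib")   # assumed Mac name
    else:
        tried.append("libcublas.so")      # also reached on Windows
    return tried


def fixed_choice(platform):
    """Return the single library name picked by an if/elif/else chain."""
    if platform == "win32":
        return "cublas.dll"
    elif platform == "darwin":
        return "libcublas.dylib"
    else:
        return "libcublas.so"


print(buggy_choice("win32"))  # ['cublas.dll', 'libcublas.so']
print(fixed_choice("win32"))  # cublas.dll
```

With two separate "if" statements, the "else" belongs only to the second one, so a Windows machine falls through to the .so load and fails with Error 126.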

  10. TK Says:


    Excellent work. How easy would it be to do something similar for CUSPARSE?

  11. derek Says:

    i hadn’t seen CUSPARSE, thank you! prob. pretty easy. this was a pretty tiny ctypes wrapper.

  12. Psksvp Says:

    Can it do non square matrix multiplication?

<Kered.org>   © Copyright 2000-2005 by Derek Anderson