Easy Python/Numpy CUDA/CUBLAS Integration
CUDA is Nvidia’s C-like API for non-graphics number crunching on their GeForce 8xxx-series and later video cards. For certain operations, it is amazingly fast. Unfortunately, it is painful in the extreme to use, especially when compared to Numpy, Python’s wonderful scientific computing package.
So, to marry the two, I wrote myself some wrapper code. It’s pretty much only good for one thing: multiplying large matrices together really fast. But it’s really good at it, and it’s really easy to use. For example:
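import numpy
from pycublas import CUBLASMatrix
# Wrapping a float32 numpy matrix copies it onto the card.
A = CUBLASMatrix( numpy.mat([[1,2,3],[4,5,6]],numpy.float32) )
B = CUBLASMatrix( numpy.mat([[2,3],[4,5],[6,7]],numpy.float32) )
# The multiply runs on the GPU through CUBLAS.
C = A*B
# np_mat() returns the result as a numpy matrix: [[28, 34], [64, 79]]
print(C.np_mat())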
All CUBLAS alloc and free calls are mapped to the CUBLASMatrix object’s life in Python, so you don’t have to worry about memory management (other than filling up the card, of course).
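The idea of tying device memory to an object's lifetime can be sketched roughly as follows. This is not the code from pycublas.py: the DeviceBuffer class and the fake_alloc/fake_free helpers are hypothetical stand-ins so the sketch runs without a GPU, and a real wrapper would call the CUBLAS allocation routines in their place.

```python
import itertools


class DeviceBuffer(object):
    """Owns one device allocation for as long as the Python object lives."""

    def __init__(self, nbytes, alloc, free):
        # alloc/free are hypothetical stand-ins for the real CUBLAS calls.
        self._free = free
        self.handle = None  # set first so __del__ is safe if alloc raises
        self.handle = alloc(nbytes)

    def release(self):
        # Idempotent: a second call does nothing.
        if self.handle is not None:
            self._free(self.handle)
            self.handle = None

    def __del__(self):
        self.release()


# Fake allocator that only tracks which handles are still live.
_next_handle = itertools.count(1)
live_handles = set()


def fake_alloc(nbytes):
    handle = next(_next_handle)
    live_handles.add(handle)
    return handle


def fake_free(handle):
    live_handles.remove(handle)


buf = DeviceBuffer(1024, fake_alloc, fake_free)
print(len(live_handles))  # one allocation is live
del buf                   # CPython's reference counting runs __del__ here
print(len(live_handles))  # the allocation has been freed
```

Relying on __del__ is simple, but the timing of the free depends on the interpreter's garbage collection, so an explicit release() is worth keeping for large matrices.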
Here are some performance numbers, given as speedup over numpy (includes memory transfer times):
(4160×4160)*(4160×4160) = 43.0X
(4096×4096)*(4096×4096) = 34.0X
(3900×3900)*(3900×3900) = 47.3X
(2048×2048)*(2048×2048) = 28.2X
(1024×1024)*(1024×1024) = 58.8X
(512×512)*(512×512) = 24.1X
(256×256)*(256×256) = 6.3X
(128×128)*(128×128) = 1.1X
CPU: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz stepping 06
GPU: nVidia Corporation GeForce 8800 GT (rev a2)
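A speedup figure like the ones above could be measured along these lines. This is only a sketch, not necessarily how the numbers in the post were produced; best_time and speedup are hypothetical helper names, and the GPU callable should cover upload, multiply and download so that transfer time is included.

```python
import time


def best_time(fn, repeats=3):
    """Return the fastest wall-clock time of `repeats` calls to fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline, candidate, repeats=3):
    """How many times faster `candidate` runs than `baseline`."""
    return best_time(baseline, repeats) / best_time(candidate, repeats)


# Hypothetical usage with numpy matrices a and b (needs a CUDA card):
#   cpu = lambda: a * b
#   gpu = lambda: (CUBLASMatrix(a) * CUBLASMatrix(b)).np_mat()
#   print(speedup(cpu, gpu))
```

Taking the best of several runs reduces noise from other processes, at the cost of hiding one-time start-up overhead.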
Note: This version only supports float32.
Note: CUBLAS limits matrix dims to (65536×65536).
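Given these two limits, it can help to validate operands before sending them to the card. The helper below is a hypothetical sketch, not part of pycublas.py: it checks only shapes, and it assumes the 65536 limit is inclusive. Converting the data itself is a separate step, for example with numpy's astype(numpy.float32).

```python
MAX_DIM = 65536  # limit quoted in the note above; assumed inclusive here


def check_product_shapes(a_shape, b_shape, max_dim=MAX_DIM):
    """Return the shape of A*B, or raise ValueError if it cannot be computed."""
    (a_rows, a_cols), (b_rows, b_cols) = a_shape, b_shape
    if a_cols != b_rows:
        raise ValueError("inner dimensions differ: %d vs %d" % (a_cols, b_rows))
    for dim in (a_rows, a_cols, b_cols):
        if not 1 <= dim <= max_dim:
            raise ValueError("dimension %d outside 1..%d" % (dim, max_dim))
    return (a_rows, b_cols)


print(check_product_shapes((2, 3), (3, 2)))
```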
Source code available here: pycublas.py (rename download to pycublas.py to use)

April 13th, 2009 at 5:24 pm
This is great! Any advice on how appropriate reusing your efforts for CUFFT would be?
April 14th, 2009 at 12:40 pm
i don’t know much about FFTs, but it would seem that CUFFT doesn’t provide that much of a performance increase. at least according to http://www.science.uwaterloo.ca/~hmerz/CUDA_benchFFT/.
but if you do want to try and use it, the numpy -> graphics card code in the __init__() function should help you in using the APIs. just need to load the CUFFT library and create ctypes bindings to it.
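As a rough illustration of "load the library and create ctypes bindings", here is a minimal sketch for CUFFT. The load_cufft helper is a hypothetical name, the library lookup relies on the system's normal search path, and only two CUFFT entry points are declared. It returns None when CUFFT is not installed, so it runs on machines without CUDA.

```python
import ctypes
import ctypes.util


def load_cufft():
    """Return a ctypes handle to CUFFT, or None if it cannot be loaded."""
    path = ctypes.util.find_library("cufft")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    # cufftResult cufftPlan1d(cufftHandle *plan, int nx, cufftType type, int batch)
    # cufftHandle is a 32-bit integer type, modelled here as c_int.
    lib.cufftPlan1d.argtypes = [ctypes.POINTER(ctypes.c_int), ctypes.c_int,
                                ctypes.c_int, ctypes.c_int]
    lib.cufftPlan1d.restype = ctypes.c_int
    # cufftResult cufftDestroy(cufftHandle plan)
    lib.cufftDestroy.argtypes = [ctypes.c_int]
    lib.cufftDestroy.restype = ctypes.c_int
    return lib


cufft = load_cufft()
print("CUFFT available" if cufft is not None else "CUFFT not found")
```

Declaring argtypes and restype up front lets ctypes check and convert arguments instead of guessing, which matters most for pointer arguments.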
April 16th, 2009 at 9:05 pm
Thanks, appreciate this! Think you need to change 190-191 from “if” to “elif”, for non-linux machines.
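A sketch of the kind of platform dispatch this fix describes, assuming lines 190-191 choose the library by platform (the file itself would need to confirm that). The function name and the Windows and Mac file names are assumptions; only libcublas.so appears in this thread, in the Windows traceback further down.

```python
import sys


def cublas_library_name(platform=None):
    """Pick a CUBLAS shared-library file name for the given platform."""
    if platform is None:
        platform = sys.platform
    # One if/elif/else chain means exactly one branch runs. With two
    # separate `if` statements, a trailing `else` pairs only with the
    # second one, so a Windows match is overwritten by the Linux fallback.
    if platform.startswith("win"):
        return "cublas.dll"       # assumed name; varies by CUDA version
    elif platform == "darwin":
        return "libcublas.dylib"  # assumed name
    else:
        return "libcublas.so"     # name seen in the traceback in this thread


print(cublas_library_name())
```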
June 9th, 2009 at 2:11 am
First of all, thanks for sharing this. That said, I am having some trouble because I have a 64-bit Mac and I think* that my CUDA install expects 64-bit. Do you know how hard it would be to make your current code 64-bit compatible? I’m planning to tinker with it myself and see if I can get it working, but if you could offer any advice I would really appreciate it.
Cheers!
*
>>> A = numpy.mat([[1,2,3],[4,5,6]], numpy.float32)
>>> B = numpy.mat([[2,3],[4,5],[6,7]], numpy.float32)
>>> C = (CUBLASMatrix(A)*CUBLASMatrix(B)).np_mat()
>>> C
Warning: invalid value encountered in reduce
Warning: invalid value encountered in reduce
array([[ nan,  nan],
       [ nan,  79.]], dtype=float32)
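One possible cause of trouble on 64-bit systems, offered as a guess rather than a diagnosis of the output above, is a device pointer being passed through a 32-bit integer type in the ctypes bindings. The sketch below uses a made-up address to show the truncation; in a wrapper, the usual remedy is to hold device pointers in ctypes.c_void_p and declare argtypes for every call that takes one.

```python
import ctypes

# A hypothetical 64-bit device address, as a 64-bit CUDA build might return.
address = 0x00000002DEADBEE0

# Squeezing it through a 32-bit C integer keeps only the low 32 bits.
truncated = ctypes.c_uint32(address).value

# c_void_p is as wide as a native pointer, so on a 64-bit Python the
# full address survives the round trip.
as_pointer = ctypes.c_void_p(address).value

print(hex(truncated), ctypes.sizeof(ctypes.c_void_p))
```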
February 25th, 2010 at 10:36 pm
Why does pycublaspy1.txt no longer exist on this website? Where should I download it?
March 4th, 2010 at 9:24 pm
sorry, link fixed!
March 15th, 2011 at 7:07 pm
I ran into the same bug as Bob (change 190-191 from “if” to “elif”).
June 3rd, 2011 at 9:34 am
Hi, can you post the code for CPU and GPU? I would like to test it on my Supermicro server with a Tesla card in it.
August 6th, 2012 at 11:22 am
I’ve been trying to get your code to work but keep running into a problem.
When i try and execute your source code, i get the following error (and i can’t use “import pycublas”, but i’m guessing it is related to this error):
Traceback (most recent call last):
  File "C:\Users\jordan\Desktop\pycublas.py", line 192, in <module>
    else: libcublas = ctypes.cdll.LoadLibrary('libcublas.so')
  File "C:\Python27\lib\ctypes\__init__.py", line 443, in LoadLibrary
    return self._dlltype(name)
  File "C:\Python27\lib\ctypes\__init__.py", line 365, in __init__
    self._handle = _dlopen(self._name, mode)
WindowsError: [Error 126] The specified module could not be found
Can you give me any insight as to what might be going wrong?
November 22nd, 2012 at 11:00 am
Hi
Excellent work. How easy would it be to do something similar for CUSPARSE?
April 13th, 2013 at 8:14 pm
i hadn’t seen CUSPARSE, thank you! prob. pretty easy. this was a pretty tiny ctypes wrapper.