Easy Python/Numpy CUDA/CUBLAS Integration
CUDA is Nvidia’s C-like API for non-graphics number crunching on their 8xxx-series and newer video cards. For certain operations, it is amazingly fast. Unfortunately, it is extremely painful to use, especially when compared to Numpy, Python’s wonderful scientific computing package.
So, to marry the two, I wrote myself some wrapper code. It’s pretty much only good for one thing: multiplying large matrices together really fast. But it’s really good at it, and it’s really easy to use. For example:
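import numpy
from pycublas import CUBLASMatrix

# Wrap two float32 numpy matrices; the wrapper copies them to the card.
A = CUBLASMatrix( numpy.mat([[1,2,3],[4,5,6]],numpy.float32) )
B = CUBLASMatrix( numpy.mat([[2,3],[4,5],[6,7]],numpy.float32) )
# Multiply with CUBLAS.
C = A*B
# Fetch the result as a numpy matrix and print it.
print(C.np_mat())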
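As a sanity check, the same product can be computed on the CPU with plain numpy, and whatever the card returns should match it. This is a minimal sketch that uses numpy.array and numpy.dot instead of the wrapper, so it runs without a CUDA card:

```python
import numpy

# Same operands as above, kept on the CPU as ordinary float32 arrays.
A = numpy.array([[1, 2, 3], [4, 5, 6]], dtype=numpy.float32)
B = numpy.array([[2, 3], [4, 5], [6, 7]], dtype=numpy.float32)

# CPU reference result; the CUBLASMatrix product is expected to agree with it.
expected = numpy.dot(A, B)
print(expected)
```

Worked by hand, the product is [[28, 34], [64, 79]].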
All CUBLAS alloc and free calls are mapped to the CUBLASMatrix object’s lifetime in Python, so you don’t have to worry about memory management (other than filling up the card, of course).
Here are some performance numbers (memory transfer times included):
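The idea of tying device memory to an object's lifetime can be sketched without a GPU. The class below is not the code in pycublas.py. DeviceBuffer, fake_alloc and fake_free are hypothetical names that stand in for the real CUBLAS alloc and free calls.

```python
import itertools


class DeviceBuffer(object):
    """Owns one allocation: acquired in __init__, released in __del__."""

    def __init__(self, nbytes, alloc, free):
        self._free = free
        self._handle = alloc(nbytes)   # stand-in for the CUBLAS alloc call

    def __del__(self):
        # Runs when the last reference goes away (immediately in CPython).
        if getattr(self, "_handle", None) is not None:
            self._free(self._handle)   # stand-in for the CUBLAS free call
            self._handle = None


# Hypothetical allocator that only records which handles are live.
live = set()
_next_handle = itertools.count(1)


def fake_alloc(nbytes):
    handle = next(_next_handle)
    live.add(handle)
    return handle


def fake_free(handle):
    live.discard(handle)


buf = DeviceBuffer(1024, fake_alloc, fake_free)
print(len(live))   # one allocation is live
del buf
print(len(live))   # the allocation was released along with the object
```

Relying on __del__ is simple, but the release time then depends on the interpreter's garbage collection, which is why the "filling up the card" caveat still applies.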
(4160×4160)*(4160×4160) = 43.0X faster than numpy
(4096×4096)*(4096×4096) = 34.0X
(3900×3900)*(3900×3900) = 47.3X
(2048×2048)*(2048×2048) = 28.2X
(1024×1024)*(1024×1024) = 58.8X
(512×512)*(512×512) = 24.1X
(256×256)*(256×256) = 6.3X
(128×128)*(128×128) = 1.1X
CPU: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz stepping 06
GPU: nVidia Corporation GeForce 8800 GT (rev a2)
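Ratios like these come from timing the same product both ways. The benchmark script itself is not shown, so the following is only a sketch of one way to measure a speedup. best_time and speedup are hypothetical helper names, the code assumes Python 3 for time.perf_counter, and the commented lines show how the helpers might be applied to numpy and CUBLASMatrix.

```python
import time


def best_time(fn, repeats=3):
    """Return the fastest wall-clock time of fn() over a few runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline, candidate, repeats=3):
    """How many times faster candidate() is than baseline()."""
    return best_time(baseline, repeats) / best_time(candidate, repeats)


# Intended use (needs numpy, pycublas and a CUDA card; a and b are float32):
#   cpu = lambda: numpy.dot(a, b)
#   gpu = lambda: (CUBLASMatrix(a) * CUBLASMatrix(b)).np_mat()  # with transfers
#   print("%.1fX" % speedup(cpu, gpu))
```

Taking the best of several runs reduces noise from other processes. Building the CUBLASMatrix objects inside the timed call keeps the memory transfers in the measurement.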
Note: This version only supports float32.
Note: CUBLAS limits matrix dims to (65536×65536).
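Both limits can be checked on the numpy side before anything is sent to the card. The helper below is a sketch and not part of pycublas.py: as_cublas_input and MAX_DIM are hypothetical names, and the 65536 bound is simply the figure quoted in the note above.

```python
import numpy

MAX_DIM = 65536  # per-dimension limit quoted in the note above


def as_cublas_input(a):
    """Return a as a 2-D float32 array, or raise if it does not fit."""
    m = numpy.asarray(a, dtype=numpy.float32)  # other dtypes are converted
    if m.ndim != 2:
        raise ValueError("expected a 2-D matrix, got %d dimensions" % m.ndim)
    rows, cols = m.shape
    if rows > MAX_DIM or cols > MAX_DIM:
        raise ValueError("matrix %dx%d exceeds %d per dimension"
                         % (rows, cols, MAX_DIM))
    return m


m = as_cublas_input([[1.5, 2.5], [3.5, 4.5]])
print(m.dtype)
```

Converting float64 input to float32 loses precision, so results may differ slightly from a float64 numpy product.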
Source code available here: pycublas.py (rename download to pycublas.py to use)

April 13th, 2009 at 5:24 pm
This is great! Any advice on how appropriate reusing your efforts for CUFFT would be?
April 14th, 2009 at 12:40 pm
I don’t know much about FFTs, but it would seem that CUFFT doesn’t provide that much of a performance increase, at least according to http://www.science.uwaterloo.ca/~hmerz/CUDA_benchFFT/.
But if you do want to try it, the numpy -> graphics card code in the __init__() function should help you with the APIs. You just need to load the CUFFT library and create ctypes bindings to it.
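A rough sketch of that first step, loading the library and declaring a couple of bindings with ctypes, might look like this. load_cufft is a hypothetical helper, and the two prototypes are assumptions taken from cufft.h that should be checked against the installed CUDA version. On a machine without CUDA the function returns None.

```python
import ctypes
import ctypes.util


def load_cufft():
    """Try to load the CUFFT shared library; return None if it is absent."""
    path = ctypes.util.find_library("cufft")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    # Assumed prototype:
    # cufftResult cufftPlan1d(cufftHandle *plan, int nx, cufftType type, int batch);
    lib.cufftPlan1d.restype = ctypes.c_int
    lib.cufftPlan1d.argtypes = [ctypes.POINTER(ctypes.c_int), ctypes.c_int,
                                ctypes.c_int, ctypes.c_int]
    # Assumed prototype:
    # cufftResult cufftDestroy(cufftHandle plan);
    lib.cufftDestroy.restype = ctypes.c_int
    lib.cufftDestroy.argtypes = [ctypes.c_int]
    return lib


cufft = load_cufft()
print("CUFFT available" if cufft is not None else "CUFFT not found")
```

Declaring restype and argtypes up front lets ctypes catch argument mistakes before they reach the library.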
April 16th, 2009 at 9:05 pm
Thanks, appreciate this! I think you need to change lines 190-191 from “if” to “elif” for non-Linux machines.
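The line numbers refer to pycublas.py, which is not reproduced here, so the following only illustrates the general pitfall. With two independent if statements followed by an else, the else belongs to the second if alone, so a value chosen in the first branch gets overwritten. The function names and library file names below are assumptions for illustration.

```python
def cublas_library_name(platform):
    """Pick a shared-library file name for a sys.platform-style string."""
    if platform.startswith("win"):
        name = "cublas.dll"
    elif platform == "darwin":      # elif keeps the three cases exclusive
        name = "libcublas.dylib"
    else:
        name = "libcublas.so"       # Linux and other Unix-likes
    return name


def buggy_library_name(platform):
    """Same logic with a plain second `if`: the else also fires on Windows."""
    if platform.startswith("win"):
        name = "cublas.dll"
    if platform == "darwin":
        name = "libcublas.dylib"
    else:
        name = "libcublas.so"       # overwrites the Windows choice
    return name


print(cublas_library_name("win32"))   # cublas.dll
print(buggy_library_name("win32"))    # libcublas.so
```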
June 9th, 2009 at 2:11 am
First of all, thanks for sharing this. That said, I am having some trouble: I have a 64-bit Mac and I think* that my CUDA install expects 64-bit. Do you know how hard it would be to make your current code 64-bit compatible? I’m planning to tinker with it myself and see if I can get it working, but if you could offer any advice I would really appreciate it.
Cheers!
*
>>> A = numpy.mat([[1,2,3],[4,5,6]], numpy.float32)
>>> B = numpy.mat([[2,3],[4,5],[6,7]], numpy.float32)
>>> C = (CUBLASMatrix(A)*CUBLASMatrix(B)).np_mat()
>>> C
Warning: invalid value encountered in reduce
Warning: invalid value encountered in reduce
array([[ nan, nan],
       [ nan, 79. ]], dtype=float32)
February 25th, 2010 at 10:36 pm
Why does pycublaspy1.txt no longer exist on this website? Where should I download it?
March 4th, 2010 at 9:24 pm
Sorry, link fixed!