http://www.parallelpython.com/
Parallel Python
Posted: Sep 11, 2007 8:24 AM
Summary
My previous post led me to this library, which appears to solve the
coarse-grained parallelism problem quite elegantly.
You can find the library at the link above. It was written by Vitalii
Vanovschi, a Russian chemist now doing graduate work at USC. He appears to
have created the library to serve his own computational needs, and designed
it to be simple enough that his colleagues could use it.
Parallel Python is based on a functional model; you submit a function to a
"job server" and then later fetch the result. It uses processes (just like
I requested in my previous post) and IPC (InterProcess Communication) to
execute the function, so there is no shared memory and thus no side
effects.
The pp module will automatically figure out the number of processors
available and by default create as many worker processes as there are
processors.
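For instance (a minimal sketch, assuming I'm reading the pp API right:
get_ncpus() and set_ncpus() let you inspect or override the detected count):

import pp

job_server = pp.Server()       # detects the processor count automatically
print(job_server.get_ncpus())  # number of worker processes pp created
job_server.set_ncpus(1)        # or force a specific number of workers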
You provide the function, a tuple of that function's arguments, and tuples
listing the dependent functions and modules that your function uses. You can
also provide a callback function that is called when your function completes.
Here's what the syntax looks like:
import pp

job_server = pp.Server()  # uses the number of processors in the system
# Each submit() returns a callable that will deliver the job's result:
f1 = job_server.submit(func1, args1, depfuncs1, modules1)
f2 = job_server.submit(func1, args2, depfuncs1, modules1)
f3 = job_server.submit(func2, args3, depfuncs2, modules2)
# Retrieve results; each call blocks until its job has finished:
r1 = f1()
r2 = f2()
r3 = f3()
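To make that concrete, here's a small self-contained sketch (the function
and workload are mine, not from the pp docs): the modules tuple names the
modules pp imports inside the worker process, and, if I read the API right,
the callback keyword takes a function to be called with the result:

import pp

def sum_sqrt(n):
    # Runs in a worker process; math is available there because it is
    # named in the modules tuple below.
    return sum([math.sqrt(x) for x in range(1, n + 1)])

def on_done(result):
    print("job finished: %s" % result)

job_server = pp.Server()
# submit(func, args, depfuncs, modules, callback=...); depfuncs is empty
# here since sum_sqrt calls no other user-defined functions.
job = job_server.submit(sum_sqrt, (100000,), (), ("math",), callback=on_done)
print(job())  # blocks until the worker finishes, then returns the result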
What's even more interesting is that Vitalii has already solved the scaling
problem. If you want to use a network of machines to solve your problem,
the change is relatively minor. You start an instance of the Parallel
Python server on each node machine:
node-1> ./ppserver.py
node-2> ./ppserver.py
node-3> ./ppserver.py
Then create the Server() by handing it the names of the nodes in the cluster:
import pp
ppservers=("node-1", "node-2", "node-3")
job_server = pp.Server(ppservers=ppservers)
Submitting jobs and getting results works the same as before, so switching
from multiple cores to a cluster of computers is virtually effortless. Notice
that the library transparently handles the problem of distributing your code
to the remote machines. It wasn't clear to me whether ppserver.py
automatically makes use of multiple cores on the node machines, but you
would think so.
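As a sketch of a complete cluster run (assuming the three nodes above are
reachable on ppserver.py's default port; the double() function is my own
toy example), pp's print_stats() then shows how the work was spread around:

import pp

def double(x):
    return 2 * x

# The same Server as before, now aware of the remote nodes:
ppservers = ("node-1", "node-2", "node-3")
job_server = pp.Server(ppservers=ppservers)

jobs = [job_server.submit(double, (i,)) for i in range(8)]
print([job() for job in jobs])  # [0, 2, 4, 6, 8, 10, 12, 14]
job_server.print_stats()        # shows how many jobs each node executed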
This library allows you to stay within Python for everything you're doing,
although you can do further optimization by writing time-critical sections
in C, C++, Fortran, etc., and effortlessly and efficiently linking to them
using Python 2.5's ctypes.
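Here's a minimal ctypes sketch, assuming the standard C math library is
available under its usual name on your platform:

import ctypes
import ctypes.util

# Load the C math library and declare sqrt's signature so ctypes converts
# the argument and return value correctly.
libm = ctypes.CDLL(ctypes.util.find_library("m"))
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]
print(libm.sqrt(2.0))  # 1.4142135623730951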
This is an exciting development for anyone doing parallel processing work,
and something I want to explore further once my dual-core machine comes
online.