Quantcast
Viewing all articles
Browse latest Browse all 16

Map/Reduce in Python

My interest in Grid Computing over the last weeks begins to show. After reading the Google MapReduce paper, I tried my fingers on a client side toy problem.

For formatting purposes, I was interested in the size of the longest string in a sequence. There are lots of ways to do this, but I wanted to try Python’s map and reduce functions.

This is my sample input sequence:

  values = ('abc', 'xy', 'abcdef', 'xyz')

The first step is to transform the values sequence into a new list which contains the lengths of each string:

  mapped = map(lambda x: len(x), values)

After that, the mapped list is reduced to a single value:

  max_len = reduce(lambda x, y: max(x, y), mapped)

Both user-defined map and reduce functions are associative and commutative, so this task can be parallelized easily. In fact, I could in theory hand each map operation off to a different node in a grid. Because of the max function’s properties, the reduce could be executed in multiple steps on a grid, too.

BTW: While map/reduce was a nice experiment, I used a different implementation in the end, using list comprehension and the powerful max() function:

  max( [len(x) for x in values] )

If I had been interested in the longest string itself, I would have used the following (python-2.5 only):

  max(values, key=lambda x: len(x))

Image may be NSFW.
Clik here to view.
Image may be NSFW.
Clik here to view.
Image may be NSFW.
Clik here to view.
Image may be NSFW.
Clik here to view.

Viewing all articles
Browse latest Browse all 16

Trending Articles