This example shows how to use the map_reduce() method to perform map/reduce-style aggregations on your data.
To start, we’ll insert some sample data to run map/reduce queries against:
>>> from pymongo import Connection
>>> db = Connection().map_reduce_example
>>> db.things.insert({"x": 1, "tags": ["dog", "cat"]})
ObjectId('...')
>>> db.things.insert({"x": 2, "tags": ["cat"]})
ObjectId('...')
>>> db.things.insert({"x": 3, "tags": ["mouse", "cat", "dog"]})
ObjectId('...')
>>> db.things.insert({"x": 4, "tags": []})
ObjectId('...')
Now we’ll define our map and reduce functions. Here we perform the same operation as in the MongoDB Map/Reduce documentation: counting the number of occurrences of each tag in the tags array, across the entire collection.
Our map function just emits a single (key, 1) pair for each tag in the array:
>>> from bson.code import Code
>>> map = Code("function () {"
... " this.tags.forEach(function(z) {"
... " emit(z, 1);"
... " });"
... "}")
The reduce function sums over all of the emitted values for a given key:
>>> reduce = Code("function (key, values) {"
... " var total = 0;"
... " for (var i = 0; i < values.length; i++) {"
... " total += values[i];"
... " }"
... " return total;"
... "}")
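As a sanity check on what these two functions compute, here is a minimal pure-Python sketch of the same logic run against the sample documents. It does not touch MongoDB; the names map_tags, reduce_counts and local_map_reduce are hypothetical helpers for illustration, not part of PyMongo, and the real engine may group and reduce values in several passes rather than once.

```python
from collections import defaultdict

# The sample documents inserted above (without their generated _id values).
things = [
    {"x": 1, "tags": ["dog", "cat"]},
    {"x": 2, "tags": ["cat"]},
    {"x": 3, "tags": ["mouse", "cat", "dog"]},
    {"x": 4, "tags": []},
]


def map_tags(doc):
    """Hypothetical mirror of the JavaScript map: yield (tag, 1) per tag."""
    for tag in doc["tags"]:
        yield tag, 1


def reduce_counts(key, values):
    """Hypothetical mirror of the JavaScript reduce: sum the values."""
    total = 0
    for value in values:
        total += value
    return total


def local_map_reduce(docs, map_fn, reduce_fn):
    """Group every emitted value by key, then reduce each group once."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return dict((key, reduce_fn(key, values))
                for key, values in grouped.items())


tag_counts = local_map_reduce(things, map_tags, reduce_counts)
print(sorted(tag_counts.items()))
# [('cat', 3), ('dog', 2), ('mouse', 1)]
```

The document with an empty tags array emits nothing, so it contributes no key to the result.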
Note
We can’t just return values.length as the reduce function might be called iteratively on the results of other reduce steps.
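The point of the note can be illustrated in plain Python. The sketch below assumes, purely for illustration, that the three values emitted for "cat" are reduced in two steps: the first two values together, then that partial result with the remaining value. The helper names are hypothetical, and the actual batching is up to the server.

```python
def reduce_sum(key, values):
    """Sums the values, like the reduce function above."""
    return sum(values)


def reduce_len(key, values):
    """Counts the values, like returning values.length would."""
    return len(values)


def reduce_in_two_steps(reduce_fn, key, values):
    """Reduce the first two values, then re-reduce with the rest."""
    partial = reduce_fn(key, values[:2])
    return reduce_fn(key, [partial] + values[2:])


emitted = [1, 1, 1]  # three emits for the key "cat"

sum_result = reduce_in_two_steps(reduce_sum, "cat", emitted)  # 3, correct
len_result = reduce_in_two_steps(reduce_len, "cat", emitted)  # 2, wrong
```

Summing gives the same answer however the values are batched, while counting list entries loses the information carried by the partial result.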
Finally, we call map_reduce() and iterate over the result collection:
>>> result = db.things.map_reduce(map, reduce, "myresults")
>>> for doc in result.find():
... print doc
...
{u'_id': u'cat', u'value': 3.0}
{u'_id': u'dog', u'value': 2.0}
{u'_id': u'mouse', u'value': 1.0}
PyMongo’s API supports all of the features of MongoDB’s map/reduce engine. One interesting feature is the ability to get more detailed results when desired, by passing full_response=True to map_reduce(). This returns the full response to the map/reduce command, rather than just the result collection:
>>> db.things.map_reduce(map, reduce, "myresults", full_response=True)
{u'counts': {u'input': 4, u'reduce': 2, u'emit': 6, u'output': 3}, u'timeMillis': ..., u'ok': ..., u'result': u'...'}
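The input, emit and output figures in counts can be reproduced by hand from the sample documents. The following is a rough pure-Python sketch, not a description of how the server computes them. The reduce figure depends on how the server batches its reduce calls; the sketch only counts the keys that received more than one value, which happens to match here.

```python
from collections import defaultdict

# The sample documents inserted above (without their generated _id values).
things = [
    {"x": 1, "tags": ["dog", "cat"]},
    {"x": 2, "tags": ["cat"]},
    {"x": 3, "tags": ["mouse", "cat", "dog"]},
    {"x": 4, "tags": []},
]

# Collect the value 1 under each tag, as the map function would emit it.
values_by_key = defaultdict(list)
for doc in things:
    for tag in doc["tags"]:
        values_by_key[tag].append(1)

input_count = len(things)                                      # 4 documents
emit_count = sum(len(v) for v in values_by_key.values())       # 6 emits
output_count = len(values_by_key)                              # 3 distinct tags
multi_value_keys = sum(1 for v in values_by_key.values() if len(v) > 1)  # 2
```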
All of the optional map/reduce parameters are also supported; simply pass them as keyword arguments. In this example we use the query parameter to limit the documents that will be mapped over:
>>> result = db.things.map_reduce(map, reduce, "myresults", query={"x": {"$lt": 3}})
>>> for doc in result.find():
... print doc
...
{u'_id': u'cat', u'value': 2.0}
{u'_id': u'dog', u'value': 1.0}
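To see why only two tags remain, the effect of the query can be approximated in plain Python: filter the documents first, then count tags among the survivors. This sketch swaps in collections.Counter for the map and reduce functions and a list comprehension for the {"x": {"$lt": 3}} query; it is an illustration only, not how PyMongo or the server evaluates the query.

```python
from collections import Counter

# The sample documents inserted above (without their generated _id values).
things = [
    {"x": 1, "tags": ["dog", "cat"]},
    {"x": 2, "tags": ["cat"]},
    {"x": 3, "tags": ["mouse", "cat", "dog"]},
    {"x": 4, "tags": []},
]

# Stand-in for query={"x": {"$lt": 3}}: keep only documents with x < 3.
selected = [doc for doc in things if doc["x"] < 3]

# Count each tag across the selected documents only.
filtered_counts = Counter(tag for doc in selected for tag in doc["tags"])
print(sorted(filtered_counts.items()))
# [('cat', 2), ('dog', 1)]
```

The "mouse" tag appears only in the document with x equal to 3, so it is filtered out before the map step ever sees it.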
With MongoDB 1.8.0 or newer you can pass a SON object as the out parameter to store the result collection in a different database:
>>> from bson.son import SON
>>> db.things.map_reduce(map, reduce, out=SON([("replace", "results"), ("db", "outdb")]), full_response=True)
{u'counts': {u'input': 4, u'reduce': 2, u'emit': 6, u'output': 3}, u'timeMillis': ..., u'ok': ..., u'result': {u'db': ..., u'collection': ...}}
See also
The full list of options for MongoDB’s map/reduce engine