MongoDB + Java (MapReduce)

MapReduce can be a tricky concept to grasp. Here we use map-reduce to build our own version of count(), i.e. a function that returns the number of documents in a collection.

Objective:

  • Implement the map-reduce concept to count the documents in a collection

Environment:

  • Windows
  • Eclipse or Maven
  • MongoDB

Libraries:

<dependency>
    <groupId>org.mongodb</groupId>
    <artifactId>mongo-java-driver</artifactId>
    <version>2.10.1</version>
    <scope>compile</scope>
</dependency>

( 1 ) Java Project (Eclipse or Maven)

( 2 ) MongoDB Document

  • Documents are stored in the person collection (inserted using Main.java).
  • Each document has the following keys:
  1. _id: document id
  2. name: string
  3. age: int
  4. join: date
  5. friends: array of strings
  6. address: sub-document

{ "_id" : { "$oid" : "514a23c70ce40031c3afc52b"} , "name" : "John" , "age" : 20 , "join" : { "$date" : "2013-03-20T21:01:59.050Z"} , "friends" : [ "Robin" , "Lora" , "Lyla" , "John"] , "address" : { "country" : "US" , "state" : "NY" , "city" : "Buffalo"}}
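To make the field types concrete, here is a minimal sketch of the same document modelled with plain Java collections. It does not use the MongoDB driver; the class name PersonDocumentSketch and the method samplePerson are made up for illustration, and the choice of Map, List and Date is only an assumption about how the types line up.

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the tutorial's project
class PersonDocumentSketch {

    static Map<String, Object> samplePerson() {
        // address: sub-document, modelled as a nested map
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("country", "US");
        address.put("state", "NY");
        address.put("city", "Buffalo");

        // _id is left out here; an _id is generated on insert when it is missing
        Map<String, Object> person = new LinkedHashMap<>();
        person.put("name", "John");                                                 // string
        person.put("age", 20);                                                      // int
        person.put("join", Date.from(Instant.parse("2013-03-20T21:01:59.050Z")));   // date
        person.put("friends", Arrays.asList("Robin", "Lora", "Lyla", "John"));      // array of strings
        person.put("address", address);
        return person;
    }

    public static void main(String[] args) {
        System.out.println(samplePerson().keySet()); // prints [name, age, join, friends, address]
    }
}
```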

( 3 ) MapReduce

MapReduce needs two functions to be defined:

  • The map function processes each document in the collection and emits zero or more key/value pairs.
  • e.g. in this example the map function emits one pair (key "a", value {count:1}) for each document.
> map
function () {
    emit("a", {count:1});
}
  • The reduce function takes the values emitted by the map function and reduces them to a single value for each key.
  • The key is defined in the map function; in this example the key is "a" (the Java code in step 4 uses "size" instead, any constant key works).
> reduce
function (key, values) {
    // sum the counts emitted for this key
    var total = 0;
    for (var i = 0; i < values.length; i++) {
        total += values[i].count;
    }
    return {count:total};
}
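To see why these two functions add up to a count, here is a small plain-Java sketch that imitates the two phases in memory. It is only an illustration and does not talk to MongoDB; the class MapReduceCountSketch and its methods are hypothetical names, and the documents are represented simply by how many there are.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory imitation of the map and reduce phases
class MapReduceCountSketch {

    // Map phase: every document emits the pair ("a", 1),
    // mirroring emit("a", {count:1}). Values are grouped by key.
    static Map<String, List<Integer>> mapPhase(int documentCount) {
        Map<String, List<Integer>> emitted = new LinkedHashMap<>();
        for (int i = 0; i < documentCount; i++) {
            emitted.computeIfAbsent("a", k -> new ArrayList<>()).add(1);
        }
        return emitted;
    }

    // Reduce phase: collapse all values of one key into a single total.
    static int reduce(String key, List<Integer> counts) {
        int total = 0;
        for (int count : counts) {
            total += count;
        }
        return total;
    }

    // Run both phases and return one reduced value per key.
    static Map<String, Integer> mapReduce(int documentCount) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> e : mapPhase(documentCount).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(mapReduce(3)); // prints {a=3}
    }
}
```

Because every document emits the same key, all the values land in one group and the reduced total equals the number of documents.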

( 4 ) Java Code

package com.hmkcode.mongodb;

import java.net.UnknownHostException;

import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MapReduceCommand;
import com.mongodb.MapReduceOutput;
import com.mongodb.MongoClient;

/**
 * Java + MongoDB MapReduce  count()
 * 
 */
public class MapReduce {

  public static void main(String[] args) {

	// Connect to MongoDB
	MongoClient mongo = null;
	try {
		mongo = new MongoClient("localhost", 27017);
	} catch (UnknownHostException e) {
		// Without a connection nothing below can run, so stop here
		// instead of continuing with mongo == null
		e.printStackTrace();
		return;
	}

	// get database
	// if the database doesn't exist, MongoDB will create it for you
	DB db = mongo.getDB("test");

	// get collection
	// if the collection doesn't exist, MongoDB will create it for you
	DBCollection collection = db.getCollection("person");

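	// map: emit the same key for every document, with a count of 1
	String map = "function () { " +
			" emit('size', {count:1}); " +
			"}";

	// reduce: sum the counts emitted for a key
	String reduce = "function (key, values) { " +
			" var total = 0; " +
			" for (var i = 0; i < values.length; i++) { " +
			"   total += values[i].count; " +
			" } " +
			" return {count:total}; " +
			"}";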

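	// INLINE returns the result directly instead of writing it to an output
	// collection (hence the first null); the last null means no query filter
	MapReduceCommand cmd = new MapReduceCommand(collection, map, reduce,
				     null, MapReduceCommand.OutputType.INLINE, null);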

	MapReduceOutput out = collection.mapReduce(cmd);

	for (DBObject o : out.results()) {
		System.out.println(o.toString());
	}
	System.out.println("Done");

  }
}
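One detail about the reduce function used above: it returns a value of the same shape ({count: ...}) as the values emitted by map. This matters because MongoDB may call reduce more than once for the same key, feeding earlier results back in as input. The plain-Java sketch below (no driver involved; ReReduceSketch is a made-up name) imitates that by reducing in two batches and then reducing the partial results.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of why reduce output must be usable as reduce input
class ReReduceSketch {

    // Same logic as the JavaScript reduce: sum the counts.
    // Input and output are both counts, so a result can be fed back in.
    static int reduce(List<Integer> counts) {
        int total = 0;
        for (int count : counts) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        // Five documents, each emitting a count of 1
        List<Integer> emitted = Arrays.asList(1, 1, 1, 1, 1);

        // Reduce everything in a single call
        int allAtOnce = reduce(emitted);

        // Reduce in two batches, then reduce the partial results again
        int firstBatch = reduce(emitted.subList(0, 2));
        int secondBatch = reduce(emitted.subList(2, 5));
        int inBatches = reduce(Arrays.asList(firstBatch, secondBatch));

        System.out.println(allAtOnce + " " + inBatches); // prints 5 5
    }
}
```

Had reduce returned values.length instead of the sum of the counts, the batched run would give 2 rather than 5, which is why the sum is the safer choice.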

( 5 ) Run

Source Code github
