
Create Custom Writable Key/value Type In Python For Hadoop Map Reduce?

I have worked on Hadoop MR for quite some time, and I have created and used custom (extension) Writable classes, including MapWritable. Now I am required to translate the same MR that

Solution 1:

In Pydoop, explicit support for custom Hadoop types is still a work in progress. In other words, right now we're not making things easy for the user, but it can be done with a bit of work. A couple of pointers:

  • Pydoop already includes custom Java code, which is installed automatically with the Python package as pydoop.jar. We pass this extra jar to Hadoop as needed. To add more Java code, place the source in src/ and list it in JavaLib.java_files in setup.py.

  • On the Python side, you need deserializers for the new types. See for instance LongWritableDeserializer in pydoop.mapreduce.pipes.
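To give a rough idea of what a Python-side deserializer has to do, here is a minimal sketch that uses only the standard library and does not call Pydoop's own API. It relies on the fact that Java's DataOutput writes fixed-width values in big-endian order, so a LongWritable is 8 bytes. The custom record layout (a long, an int and a double, written in that order by a hypothetical Java write() method) and all function names are assumptions made up for illustration. How such functions would be wired into Pydoop is not shown here.

```python
import struct

# Java's DataOutput.writeLong emits 8 bytes, big-endian, signed.
_LONG = struct.Struct(">q")

# Hypothetical custom Writable whose Java write() method does:
#   out.writeLong(id); out.writeInt(count); out.writeDouble(score);
# ">" means big-endian with no padding, so the record is 8 + 4 + 8 = 20 bytes.
_RECORD = struct.Struct(">qid")


def deserialize_long(raw):
    """Decode the bytes of a serialized LongWritable into a Python int."""
    return _LONG.unpack(raw)[0]


def deserialize_record(raw):
    """Decode the hypothetical (long, int, double) record into a dict."""
    record_id, count, score = _RECORD.unpack(raw)
    return {"id": record_id, "count": count, "score": score}


def serialize_record(record_id, count, score):
    """Inverse of deserialize_record, handy for emitting values or testing."""
    return _RECORD.pack(record_id, count, score)
```

Variable-length fields such as Text need extra care, since Hadoop prefixes them with a variable-length integer instead of a fixed-width one.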

Hope this helps.

