
Hadoop: How To Include Third Party Library In Python MapReduce

I am writing a MapReduce job in Python and want to use some third-party libraries such as chardet. I know that we can use the option -libjars=... to include them for Java MapReduce. But how to i

Solution 1:

The problem was solved with zipimport.

I zipped chardet into a file named module.mod and used it like this:

import zipimport
# module.mod is the zip archive containing the chardet package
importer = zipimport.zipimporter('module.mod')
chardet = importer.load_module('chardet')
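The answer does not show how module.mod was created. Below is a minimal sketch of one way to build it, assuming a pure-Python package directory (such as a local copy of chardet) is available on the machine that submits the job. The helper name build_module_archive is hypothetical and not part of the original answer.

```python
import os
import zipfile


def build_module_archive(package_dir, archive_path="module.mod"):
    """Zip a pure-Python package so that zipimport can load it.

    package_dir is the path to the package directory itself; entries are
    stored as "<package>/<file>" so the package name is the top-level folder.
    """
    package_dir = os.path.abspath(package_dir)
    parent = os.path.dirname(package_dir)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(package_dir):
            for name in sorted(files):
                if name.endswith(".py"):
                    full_path = os.path.join(root, name)
                    # Archive paths are relative to the parent directory,
                    # which keeps "chardet/..." as the prefix inside the zip.
                    archive.write(full_path, os.path.relpath(full_path, parent))
    return archive_path
```

Calling build_module_archive with the path to the installed chardet directory (wherever it lives on your system) should produce a module.mod in the current directory. Only .py files are included, since zipimport cannot load compiled extension modules from an archive.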

Add -file module.mod to the hadoop streaming command so that the archive is shipped to the working directory of each task.

Now chardet can be used in the script.
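To show how this fits together, here is a rough sketch of a streaming mapper that loads chardet from module.mod and emits the detected encoding of each input line. The function names load_chardet, map_lines and main, and the "encoding, tab, 1" output format, are assumptions made for illustration only. Note that load_module is what the answer uses; it is deprecated in recent Python 3 releases, so this may need adjusting depending on the interpreter on the cluster.

```python
import sys
import zipimport


def load_chardet(archive="module.mod"):
    """Load chardet from the zip archive in the task's working directory."""
    importer = zipimport.zipimporter(archive)
    return importer.load_module("chardet")


def map_lines(lines, detect):
    """Yield "encoding<TAB>1" for each non-empty input line given as bytes."""
    for raw in lines:
        raw = raw.rstrip(b"\r\n")
        if not raw:
            continue
        # chardet.detect returns a dict; its "encoding" value may be None.
        encoding = detect(raw).get("encoding") or "unknown"
        yield "%s\t1" % encoding


def main():
    chardet = load_chardet()
    # Read bytes rather than text, because the encoding is what is unknown.
    stdin = getattr(sys.stdin, "buffer", sys.stdin)
    for record in map_lines(stdin, chardet.detect):
        print(record)
```

In an actual mapper script, main() would be called at the bottom of the file. Passing the detector into map_lines keeps the mapping logic testable on a machine where module.mod is not present.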

More details are given in: How can I include a python package with Hadoop streaming job?

