How To Run A Mrjob In A Local Hadoop Cluster With Hadoop Streaming?
I'm currently taking a Big Data Class, and one of my projects is to run my Mapper/Reducer on a Hadoop Cluster which is set up locally. I've been using Python along with the MRJob l
Solution 1:
It seems the issue was in the etc/hadoop/hadoop-env.sh script file.
The JAVA_HOME environment variable was configured as:
export JAVA_HOME=$(JAVA_HOME)
So I went ahead and changed it to the following:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk
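Before re-running the job, it can help to confirm that the path really is a Java installation. Below is a minimal sketch, not part of the original fix; the helper name check_java_home is hypothetical, and it assumes a usable JAVA_HOME contains an executable bin/java.

```python
import os


def check_java_home(java_home):
    """Return True if java_home contains an executable bin/java.

    Hypothetical helper for illustration; it only inspects the filesystem
    and does not start the JVM.
    """
    if not java_home:
        return False
    java_bin = os.path.join(java_home, "bin", "java")
    return os.path.isfile(java_bin) and os.access(java_bin, os.X_OK)


if __name__ == "__main__":
    # Read the value from the current environment; hadoop-env.sh may still
    # override it when Hadoop itself starts.
    java_home = os.environ.get("JAVA_HOME", "")
    if check_java_home(java_home):
        print("JAVA_HOME looks usable:", java_home)
    else:
        print("JAVA_HOME is unset or has no executable bin/java:", repr(java_home))
```

On my setup I would expect this to pass for /usr/lib/jvm/java-8-openjdk; the right path differs between distributions.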
I then ran the following command again:
python hrc_discover.py /hdfs/user/user/HRCmail/* -r hadoop --hadoop-bin /usr/bin/hadoop > /hdfs/user/user/output
This time MRJob picked up the JAVA_HOME setting and produced the following output:
No configs found; falling back on auto-configuration
Using Hadoop version 2.7.3
Looking for Hadoop streaming jar in /home/hadoop/contrib...
Looking for Hadoop streaming jar in /usr/lib/hadoop-mapreduce...
Hadoop streaming jar not found. Use --hadoop-streaming-jar
Creating temp directory /tmp/hrc_discover.user.20170306.022649.449218
Copying local files to hdfs:///user/user/tmp/mrjob/hrc_discover.user.20170306.022649.449218/files/...
..
To fix the issue with the Hadoop streaming jar, I added the following switch to the command:
--hadoop-streaming-jar /usr/lib/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar
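If the jar sits somewhere else on your installation, a recursive search can locate it. This is a rough sketch under my own assumptions: the function name find_streaming_jars is hypothetical, and the default search root /usr/lib/hadoop is simply taken from the path above.

```python
import glob
import os


def find_streaming_jars(hadoop_home):
    """Return a sorted list of hadoop-streaming*.jar files under hadoop_home.

    Hypothetical helper for illustration; it searches all subdirectories.
    """
    pattern = os.path.join(hadoop_home, "**", "hadoop-streaming*.jar")
    return sorted(glob.glob(pattern, recursive=True))


if __name__ == "__main__":
    # Assumption: fall back to /usr/lib/hadoop when HADOOP_HOME is not set.
    hadoop_home = os.environ.get("HADOOP_HOME", "/usr/lib/hadoop")
    jars = find_streaming_jars(hadoop_home)
    if jars:
        for jar in jars:
            print(jar)
    else:
        print("No streaming jar found under", hadoop_home)
```

Any path it prints can be passed to --hadoop-streaming-jar.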
The full command looked like the following:
python hrc_discover.py /hdfs/user/user/HRCmail/* -r hadoop --hadoop-streaming-jar /usr/lib/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar --hadoop-bin /usr/bin/hadoop > /hdfs/user/user/output
This produced the following output:
No configs found; falling back on auto-configuration
Using Hadoop version 2.7.3
Creating temp directory /tmp/hrc_discover.user.20170306.022649.449218
Copying local files to hdfs:///user/user/tmp/mrjob/hrc_discover.user.20170306.022649.449218/files/...
It seems the issue has been resolved and Hadoop should process my job.