Hadoop MapReduce Demo

Versions:
  • Hadoop 3.1.1 
  • Java 10
Set the following environment variables:
  • JAVA_HOME 
  • HADOOP_HOME
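
On Ubuntu, the two variables could be set along these lines. This is only a sketch: both install paths are hypothetical examples and need to be replaced with the locations on your machine, and adding the bin and sbin folders to PATH is an optional convenience that the steps below do not require.

```shell
# Hypothetical install locations; replace them with the paths on your machine.
export JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/java-10-openjdk-amd64}"
export HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop-3.1.1}"

# Optional: put the hadoop/hdfs launchers and the start/stop scripts on the PATH.
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
```

Placing these lines in ~/.bashrc keeps them available in new terminals.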

For Windows

Download the Hadoop 3.1.1 binaries for Windows from https://github.com/s911415/apache-hadoop-3.1.0-winutils. Extract them into HADOOP_HOME\bin and make sure to overwrite the existing files.

For Ubuntu

Enable passphraseless SSH to localhost:

$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys

The following instructions will install Hadoop in Pseudo-Distributed Operation mode.

1.) Create the following folders:
HADOOP_HOME/tmp HADOOP_HOME/tmp/dfs/data HADOOP_HOME/tmp/dfs/name 

2.) Set the following properties in core-site.xml and hdfs-site.xml (replace HADOOP_HOME with the actual path of your installation):

core-site.xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9001</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>HADOOP_HOME/tmp</value>
</property>

hdfs-site.xml
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///HADOOP_HOME/tmp/dfs/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///HADOOP_HOME/tmp/dfs/data</value>
</property>
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>

3.) Run hadoop namenode -format. Don't forget the file:/// prefix in hdfs-site.xml on Windows. Otherwise, the format will fail.

4.) Run HADOOP_HOME/sbin/start-dfs.sh (start-dfs.cmd on Windows).

5.) If all goes well, you can check the log in the console for the web port. In my case it's http://localhost:9870.


6.) You can now upload any file through the URL from #5.
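
For Ubuntu, steps 1, 3 and 4 above could be scripted roughly as follows. This is a sketch, not a definitive setup script: the fallback value for HADOOP_HOME is a hypothetical path, and the script assumes the start script shipped in sbin is start-dfs.sh. It skips the Hadoop commands when no install is found, so only the folders are created in that case.

```shell
#!/usr/bin/env bash
# Sketch of steps 1, 3 and 4 on Ubuntu.
# The fallback install path is hypothetical; set HADOOP_HOME to your own.
HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop-3.1.1}"

# Step 1: create the temp, data node and name node folders.
mkdir -p "$HADOOP_HOME/tmp/dfs/data" "$HADOOP_HOME/tmp/dfs/name"

# Steps 3 and 4 need a real Hadoop install, so skip them if it is missing.
if [ -x "$HADOOP_HOME/bin/hadoop" ]; then
  "$HADOOP_HOME/bin/hadoop" namenode -format   # step 3: format the name node
  "$HADOOP_HOME/sbin/start-dfs.sh"             # step 4: start the HDFS daemons
else
  echo "Hadoop not found under $HADOOP_HOME; only the folders were created."
fi
```

Formatting the name node erases existing HDFS metadata, so run step 3 only on a fresh setup.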



Now let's try to create a project that will test our Hadoop setup, or download an already existing one. For instance, this project: https://www.guru99.com/create-your-first-hadoop-program.html. It comes with a nice explanation, so let's try it. I've repackaged it into a Maven (pom) project and uploaded it to GitHub at https://github.com/czetsuya/Hadoop-MapReduce.
  1. Clone the repository. 
  2. Open the HDFS URL from #5 above, and create an input and an output folder.
  3. In the input folder, upload the file SalesJan2009 from the project's root folder. 
  4. Run hadoop jar hadoop-mapreduce-0.0.1-SNAPSHOT.jar /input /output. 
  5. Check the output from the URL and download the resulting file.
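
Steps 2 to 5 can also be done from the command line instead of the web UI. The sketch below uses the standard hdfs dfs subcommands; SALES_FILE and JOB_JAR are assumed names (the .csv extension and the target/ location in particular are guesses), so adjust them to your checkout. Hadoop's FileOutputFormat normally refuses to write into an existing output directory, so this sketch only creates /input; if the project expects /output to exist, create it as well. The run helper is a hypothetical convenience that prints the commands instead of executing them when no hdfs client is installed.

```shell
#!/usr/bin/env bash
# Sketch of steps 2-5 using the HDFS command line instead of the web UI.
# SALES_FILE and JOB_JAR are assumed names; adjust them to your checkout.
SALES_FILE="${SALES_FILE:-SalesJan2009.csv}"
JOB_JAR="${JOB_JAR:-target/hadoop-mapreduce-0.0.1-SNAPSHOT.jar}"

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

upload_and_run() {
  run hdfs dfs -mkdir -p /input                # create the input folder
  run hdfs dfs -put -f "$SALES_FILE" /input    # upload the sales file
  run hadoop jar "$JOB_JAR" /input /output     # submit the MapReduce job
  run hdfs dfs -ls /output                     # list the result files
}

# Only talk to the cluster when the hdfs client is actually installed.
if command -v hdfs >/dev/null 2>&1; then
  upload_and_run
else
  DRY_RUN=1 upload_and_run
fi
```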

To run Hadoop in standalone mode, download and unpack it as is. Go to our project's folder, build it using Maven, then run the Hadoop command below:
>$HADOOP_HOME/bin/hadoop jar target/hadoop-mapreduce-0.0.1-SNAPSHOT.jar input output

input - a directory that should contain the CSV file
output - a directory that will be created after launch. The output file will be saved here.
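
Put together, a standalone run from the project folder might look like the sketch below. It assumes the project builds with mvn clean package and that the sales file is named SalesJan2009.csv (SALES_FILE is an assumed name). It also removes any previous output directory first, since the job is expected to create it.

```shell
#!/usr/bin/env bash
# Sketch of a standalone run from the project folder.
# SALES_FILE is an assumed file name; adjust it to the file in the repository.
SALES_FILE="${SALES_FILE:-SalesJan2009.csv}"

mkdir -p input                                   # local input directory
[ -f "$SALES_FILE" ] && cp "$SALES_FILE" input/  # copy the CSV if it is present
rm -rf output                                    # the job creates this itself

# Build and run only when Maven and Hadoop are available.
if command -v mvn >/dev/null 2>&1 && [ -x "${HADOOP_HOME:-}/bin/hadoop" ]; then
  mvn clean package
  "$HADOOP_HOME/bin/hadoop" jar target/hadoop-mapreduce-0.0.1-SNAPSHOT.jar input output
  ls output                                      # show the result files
else
  echo "Maven or Hadoop not found; prepared the input directory only."
fi
```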

Common causes of problems:

  • Improperly configured core-site.xml or hdfs-site.xml settings related to the data node and name node
  • File / folder permissions
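
A few quick checks for both problem areas could be sketched as follows. The fallback HADOOP_HOME path and the check_dir helper are hypothetical; hdfs getconf and the JDK's jps tool are only called when they are found on the PATH.

```shell
#!/usr/bin/env bash
# Sketch of a few checks for the two problem areas listed above.
HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop-3.1.1}"   # hypothetical fallback path

# Report whether a directory exists and is writable by the current user.
check_dir() {
  if [ -d "$1" ] && [ -w "$1" ]; then echo "ok: $1"; else echo "problem: $1"; fi
}

check_dir "$HADOOP_HOME/tmp/dfs/data"
check_dir "$HADOOP_HOME/tmp/dfs/name"

# With a working install, print the configured file system URI.
if command -v hdfs >/dev/null 2>&1; then
  hdfs getconf -confKey fs.defaultFS   # should match the value in core-site.xml
fi

# List running Java processes; NameNode and DataNode should appear after start-dfs.
if command -v jps >/dev/null 2>&1; then
  jps
fi
```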

References

  • https://www.guru99.com/create-your-first-hadoop-program.html
  • https://github.com/czetsuya/Hadoop-MapReduce
  • https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html#Standalone_Operation