Simple Example to Read and Write files from Hadoop DFS
Reading from and writing to Hadoop DFS is no different from how it is done with other file systems. The example HadoopDFSFileReadWrite.java reads a file from HDFS and writes it to another file on HDFS (copy command).
Hadoop FileSystem API describes the methods available to user. Let us walk through the code to understand how it is done.
Before we Begin
Any Java program can talk to HDFS, provided that program is set up right
- The classpath contains the Hadoop JAR files and its client-side dependencies. (we are being vague here as those dependencies varies from version to version).
- The hadoop configuration files on the classpath
- Log4J on the classpath along with a log4.properties resource, or commons-logging preconfigured to use a different logging framework.
Step By Step
Create a FileSystem instance by passing a new Configuration object. Please note that the following example code assumes that the Configuration object will automatically load the hadoop-default.xml and hadoop-site.xml configuration files. You may need to explicitly add these resource paths if you are not running inside of the Hadoop runtime environment.
Validate the input/output paths before reading/writing.
Open inFile for reading.
Open outFile for writing.
Read from input stream and write to output stream until EOF.
Close the streams when done.