libhdfs is a JNI-based C API for Hadoop's Distributed File System (DFS).
It provides a simple subset of C APIs for manipulating DFS files and the filesystem. libhdfs is available for download as part of Hadoop itself, and its source lives in the src/c++/libhdfs directory of the Hadoop source tree.
Table Of Contents
1. Overview
2. Setup
3. APIs
Overview
libhdfs is a simple JNI-based C API for accessing and manipulating Hadoop's DFS from native code. It offers a subset of the functionality of the DFS Java API, which is documented in the Hadoop javadocs.
Setup
First set up Hadoop's DFS itself, following the Hadoop setup documentation. Once you have a working setup, change into the src/c++/libhdfs directory and use the Makefile there to build libhdfs. Once libhdfs builds successfully, you can link it into your programs. Because libhdfs is JNI based, those programs also need the JVM library (libjvm) at link and run time, and the Hadoop jars must be on the CLASSPATH when they run.
APIs
This section describes the APIs libhdfs provides for manipulating the DFS. They fall into two groups: APIs that manipulate the filesystem itself and APIs that manipulate individual files. See the doxygen documentation for details of individual APIs.
libhdfs provides APIs both for generic manipulation of the filesystem (creating directories, copying/moving files, etc.) and for some DFS-specific functionality (getting information on file replication, etc.).
At startup, use the hdfsConnect API to connect to the DFS before performing any operations on files or on the filesystem; the corresponding hdfsDisconnect API performs a clean teardown of the connection.
Other filesystem-level APIs include:
- hdfsCopy (also across filesystems)
- hdfsMove (also across filesystems)
libhdfs also provides APIs for manipulating directories on the DFS:
- hdfsListDirectory / hdfsGetPathInfo / hdfsFreeFileInfo
APIs for querying the filesystem for various properties:
- hdfsGetUsed / hdfsGetCapacity
libhdfs provides POSIX-like APIs for manipulating individual files (creating, reading/writing, querying, etc.):
- hdfsOpenFile / hdfsCloseFile
- hdfsRead / hdfsWrite
- hdfsTell / hdfsSeek
Using libhdfs in Threaded Applications
libhdfs can be used in threaded applications built on POSIX threads (pthreads). However, to interact correctly with JNI's global and local references, the user must explicitly call the hdfsConvertToGlobalRef / hdfsDeleteGlobalRef APIs.
The libhdfs test cases provide good examples of how to use libhdfs.
Please email us at email@example.com with any questions or suggestions. Use Jira (component: hdfs) to report bugs.
Thank you for your interest in Hadoop and libhdfs!