Hadoop Distributed File System (HDFS) APIs in Perl, Python, Ruby, and PHP

The Hadoop Distributed File System is written in Java. An application that wants to store data in or fetch data from HDFS can use the Java API. This means that applications not written in Java cannot access HDFS in an elegant manner.

Thrift is a software framework for scalable cross-language services development. It combines a powerful software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, and Ruby.

This project exposes HDFS APIs using the Thrift software stack. This allows applications written in a myriad of languages to access HDFS elegantly.

The Application Programming Interface (API)

The HDFS API that is exposed through Thrift can be found in hdfs/src/contrib/thriftfs/if/hadoopfs.thrift. The Perl, Python, Ruby, and PHP APIs can be found in the src/contrib/thriftfs/gen-* directories.
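As a rough illustration, a Python client could connect to a running Thrift server along the following lines. This is a minimal sketch, not the project's own client code. It assumes that the Apache Thrift Python runtime is installed, that the generated Python stubs live in a directory named gen-py, and that the generated package and service module are named hadoopfs and ThriftHadoopFileSystem. The helper name open_hdfs_client is hypothetical. Check the names against hadoopfs.thrift and the gen-* directories before relying on them.

```python
import sys


def open_hdfs_client(host, port, gen_py_dir="gen-py"):
    """Connect to a running Thrift server and return (client, transport).

    Assumptions (not confirmed by this page): the generated Python package
    is called `hadoopfs`, its service module is `ThriftHadoopFileSystem`,
    and the stubs are located in `gen_py_dir`.
    """
    if gen_py_dir not in sys.path:
        sys.path.append(gen_py_dir)

    # Imports are deferred so that this module still loads when the Thrift
    # runtime or the generated stubs are not available.
    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from hadoopfs import ThriftHadoopFileSystem  # assumed module name

    socket = TSocket.TSocket(host, port)
    transport = TTransport.TBufferedTransport(socket)
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    client = ThriftHadoopFileSystem.Client(protocol)
    transport.open()
    return client, transport
```

The returned client exposes whatever calls hadoopfs.thrift declares. Call transport.close() when finished.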


The compilation process creates a server, org.apache.hadoop.thriftfs.HadoopThriftServer, that implements the Thrift interface defined in if/hadoopfs.thrift.

The Thrift compiler is used to generate API stubs in Python, PHP, Ruby, Cocoa, etc. The generated code is checked into the gen-* directories. The generated Java API is checked into lib/hadoopthriftapi.jar.

There is a sample Python script in the scripts directory. When invoked, this script starts a HadoopThriftServer in the background and then communicates with HDFS using the API. The script is for demonstration purposes only.
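The "start a server in the background, then talk to it" pattern can be sketched in Python as shown here. This is not the sample script itself. The helper names start_in_background and stop are hypothetical. Waiting for the first line of output as a readiness signal is an assumption made for illustration, and the command that actually launches HadoopThriftServer is not given on this page, so a stand-in child process is used instead.

```python
import subprocess
import sys


def start_in_background(command):
    """Launch `command` as a child process; return it with its first stdout line."""
    process = subprocess.Popen(command, stdout=subprocess.PIPE, text=True)
    # Blocks until the child prints one line (or exits). Treating that line
    # as a readiness signal is an assumption, not documented server behavior.
    first_line = process.stdout.readline().strip()
    return process, first_line


def stop(process):
    """Terminate the child process and wait for it to exit."""
    process.terminate()
    process.wait()
    process.stdout.close()


if __name__ == "__main__":
    # Stand-in for the real server launch command, which is not shown here.
    child, line = start_in_background([sys.executable, "-c", "print('ready')"])
    print("child reported:", line)
    stop(child)
```

A real script would replace the stand-in command with the server launch command and then open a Thrift connection once the server is ready.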
