Code Block
# Clone the hawq repository if you haven't already done so
git clone
# Head to PXF code
cd incubator-hawq/pxf
# Compile PXF and run its unit tests
make unittest


Init/Start/Stop PXF


Code Block
# Deploy PXF
$PXF_HOME/bin/pxf init
# If you get the error "WARNING: instance already exists in ...", remove the pxf-service directory under $PXF_HOME and rerun init
# Create PXF Log Dir
mkdir $PXF_HOME/logs
# Start PXF
$PXF_HOME/bin/pxf start
# Check Status
$PXF_HOME/bin/pxf status
# You can also confirm that the service is running by requesting the PXF protocol version
curl "localhost:51200/pxf/ProtocolVersion"
# To stop PXF, run: $PXF_HOME/bin/pxf stop
## Note: If you see a failure 
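
Because pxf start may return before the service is ready to answer requests, a script that starts PXF may want to poll the ProtocolVersion endpoint before continuing. The sketch below is one way to do that; the function name wait_for_pxf, the one-second polling interval and the 60-second default timeout are illustrative assumptions, not part of PXF.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll the PXF ProtocolVersion endpoint until it answers
# or (roughly) TIMEOUT_SECONDS attempts have failed.
# Usage: wait_for_pxf [HOST] [PORT] [TIMEOUT_SECONDS]
wait_for_pxf() {
  local host="${1:-localhost}"
  local port="${2:-51200}"
  local timeout="${3:-60}"
  local waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # -s: no progress output; -f: fail on HTTP error codes; --max-time: cap each attempt.
    if curl -sf --max-time 2 "http://${host}:${port}/pxf/ProtocolVersion" > /dev/null; then
      echo "PXF is responding on ${host}:${port}"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "PXF did not respond on ${host}:${port} within about ${timeout}s" >&2
  return 1
}
```

For example, run wait_for_pxf localhost 51200 60 right after $PXF_HOME/bin/pxf start, and only continue with the HAWQ queries if it returns success.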

Test PXF

The steps below demonstrate accessing an HDFS file from HAWQ.

Code Block
# Create an HDFS directory for PXF example data files
$HADOOP_HOME/bin/hadoop fs -mkdir -p /data/pxf_examples
# Create a delimited plain text data file named pxf_hdfs_simple.txt:
echo 'Prague,Jan,101,4875.33' > /tmp/pxf_hdfs_simple.txt
echo 'Rome,Mar,87,1557.39' >> /tmp/pxf_hdfs_simple.txt
echo 'Bangalore,May,317,8936.99' >> /tmp/pxf_hdfs_simple.txt
echo 'Beijing,Jul,411,11600.67' >> /tmp/pxf_hdfs_simple.txt

# Add the data file to HDFS:
$HADOOP_HOME/bin/hadoop fs -put /tmp/pxf_hdfs_simple.txt /data/pxf_examples/

# Display the contents of the pxf_hdfs_simple.txt file stored in HDFS:
$HADOOP_HOME/bin/hadoop fs -cat /data/pxf_examples/pxf_hdfs_simple.txt
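
The HAWQ table defined in the next step maps each line of the file to four columns (location, month, num_orders, total_sales), so a malformed line would typically surface only when the table is queried. A small pre-upload check, such as the hypothetical check_fields helper below, can catch that earlier. It only counts comma-separated fields; it does not validate column types.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report lines whose comma-separated field count differs
# from EXPECTED_FIELDS. Returns 0 if every line matches, 1 otherwise.
# Usage: check_fields FILE EXPECTED_FIELDS
check_fields() {
  local file="$1"
  local expected="$2"
  # Print each offending line with its line number; "bad" stays 0 if none are found.
  awk -F',' -v n="$expected" '
    NF != n { printf "line %d has %d fields: %s\n", NR, NF, $0; bad = 1 }
    END { exit bad }
  ' "$file"
}
```

For example: check_fields /tmp/pxf_hdfs_simple.txt 4 && $HADOOP_HOME/bin/hadoop fs -put /tmp/pxf_hdfs_simple.txt /data/pxf_examples/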

You can now access the HDFS file from HAWQ using the HdfsTextSimple profile, as shown below.

Code Block
postgres=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple(location text, month text, num_orders int, total_sales float8)
            LOCATION ('pxf://localhost:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
          FORMAT 'TEXT' (delimiter=E',');
postgres=# SELECT * FROM pxf_hdfs_textsimple;          

   location    | month | num_orders | total_sales
---------------+-------+------------+-------------
 Prague        | Jan   |        101 |     4875.33
 Rome          | Mar   |         87 |     1557.39
 Bangalore     | May   |        317 |     8936.99
 Beijing       | Jul   |        411 |    11600.67
(4 rows)
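
The LOCATION clause above follows the pattern pxf://HOST:PORT/PATH?PROFILE=NAME, where 51200 is the PXF port used throughout this page and HdfsTextSimple is the profile. When scripting table creation for several files, a helper along the lines of the hypothetical pxf_location function below can assemble that URI; it is a sketch and does not cover any additional PXF options.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a PXF LOCATION URI of the form
#   pxf://HOST:PORT/PATH?PROFILE=NAME
# Usage: pxf_location HOST PORT HDFS_PATH PROFILE
pxf_location() {
  local host="$1" port="$2" path="$3" profile="$4"
  # Strip a leading slash so the URI never contains "//" after the port.
  path="${path#/}"
  printf 'pxf://%s:%s/%s?PROFILE=%s\n' "$host" "$port" "$path" "$profile"
}

# Example: reproduce the LOCATION value used in the CREATE EXTERNAL TABLE statement above.
pxf_location localhost 51200 /data/pxf_examples/pxf_hdfs_simple.txt HdfsTextSimple
# prints pxf://localhost:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple
```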

The steps below demonstrate accessing a Hive table from HAWQ.
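
The Hive example loads /tmp/pxf_hive_datafile.txt, which the steps on this page do not create. Assuming the same rows as the HDFS example are acceptable (they are copied from pxf_hdfs_simple.txt, not from an official sample file), you can create it as shown below. Hive's default text field delimiter is not a comma, so the Hive table must declare ',' as its field delimiter for these rows to load into separate columns.

```shell
#!/usr/bin/env bash
# Assumption: reuse the four rows from pxf_hdfs_simple.txt as the Hive sample data.
# Fields are comma-separated, so the Hive table needs a ',' field delimiter.
cat > /tmp/pxf_hive_datafile.txt <<'EOF'
Prague,Jan,101,4875.33
Rome,Mar,87,1557.39
Bangalore,May,317,8936.99
Beijing,Jul,411,11600.67
EOF
```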

Code Block
# Create a Hive table to expose our sample data set.
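# The data file is comma-delimited, so declare ',' as the field delimiter.
hive> CREATE TABLE sales_info (location string, month string,
        number_of_orders int, total_sales double)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        STORED AS textfile;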

# Load the pxf_hive_datafile.txt sample data file into the sales_info table you just created:
hive> LOAD DATA LOCAL INPATH '/tmp/pxf_hive_datafile.txt'
        INTO TABLE sales_info;

# Perform a query from hive on sales_info to verify that the data was loaded successfully:
hive> SELECT * FROM sales_info;

# Query the Hive table from HAWQ through HCatalog
postgres=# SELECT * FROM hcatalog.default.sales_info;

   location    | month | number_of_orders | total_sales
---------------+-------+------------------+-------------
 Prague        | Jan   |              101 |     4875.33
 Rome          | Mar   |               87 |     1557.39
 Bangalore     | May   |              317 |     8936.99

Build PXF for other databases

PXF can be deployed to different environments and used with different databases, so it is convenient to build PXF with defaults tailored to a specific database, such as the default PXF user and the default log and run directories.

Build profiles for all supported databases are stored in incubator-hawq/pxf/gradle/profiles. By default, the HAWQ profile is used.

To build the PXF bundle for GPDB:

Code Block
make install DATABASE=gpdb
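
Before building for another database, you may want to confirm that a matching profile exists. The sketch below assumes that each supported database has an entry named after it under gradle/profiles (relative to incubator-hawq/pxf) and that this name is the value passed as DATABASE to make; the helper name has_pxf_profile is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a build profile named DB exists.
# Assumption: profiles live in gradle/profiles (relative to incubator-hawq/pxf)
# and are named after the DATABASE value given to make; adjust the path if needed.
# Usage: has_pxf_profile DB [PROFILES_DIR]
has_pxf_profile() {
  local db="$1"
  local profiles_dir="${2:-gradle/profiles}"
  [ -e "${profiles_dir}/${db}" ]
}

# Example (run from incubator-hawq/pxf):
#   has_pxf_profile gpdb && make install DATABASE=gpdb
```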