Compile & Test PXF

Code Block
languagebash
# Clone the hawq repository if you haven't already done so
git clone https://git-wip-us.apache.org/repos/asf/incubator-hawq.git
 
# Head to PXF code
cd incubator-hawq/pxf
 
# Compile & Test PXF
make
 
# Run just the unit tests
make unittest



Init/Start/Stop PXF

 

Code Block
languagebash
# Initialize PXF
$PXF_HOME/bin/pxf init
# If you see the error "WARNING: instance already exists in ...", remove the existing pxf-service instance directory and rerun init
 
# Create PXF Log Dir
mkdir $PXF_HOME/logs
 
# Start PXF
$PXF_HOME/bin/pxf start
 
# Check Status
$PXF_HOME/bin/pxf status
# You can also verify that the service is running by requesting the protocol version
curl "localhost:51200/pxf/ProtocolVersion"
 
# Stop PXF
$PXF_HOME/bin/pxf stop
 
## Note: If a step above fails, check the PXF logs under $PXF_HOME/logs
 

Test PXF

Below are steps that demonstrate accessing an HDFS file from HAWQ.

Code Block
languagebash
# Create an HDFS directory for PXF example data files
$HADOOP_HOME/bin/hadoop fs -mkdir -p /data/pxf_examples
 
# Create a delimited plain text data file named pxf_hdfs_simple.txt:
echo 'Prague,Jan,101,4875.33' > /tmp/pxf_hdfs_simple.txt
echo 'Rome,Mar,87,1557.39' >> /tmp/pxf_hdfs_simple.txt
echo 'Bangalore,May,317,8936.99' >> /tmp/pxf_hdfs_simple.txt
echo 'Beijing,Jul,411,11600.67' >> /tmp/pxf_hdfs_simple.txt

# Add the data file to HDFS:
$HADOOP_HOME/bin/hadoop fs -put /tmp/pxf_hdfs_simple.txt /data/pxf_examples/

# Display the contents of the pxf_hdfs_simple.txt file stored in HDFS:
$HADOOP_HOME/bin/hadoop fs -cat /data/pxf_examples/pxf_hdfs_simple.txt
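# Expected output (the four rows written above):
# Prague,Jan,101,4875.33
# Rome,Mar,87,1557.39
# Bangalore,May,317,8936.99
# Beijing,Jul,411,11600.67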

Now you can access the HDFS file from HAWQ using the HdfsTextSimple profile, as shown below.

Code Block
languagesql
postgres=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple(location text, month text, num_orders int, total_sales float8)
           LOCATION ('pxf://localhost:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
           FORMAT 'TEXT' (delimiter=E',');
postgres=# SELECT * FROM pxf_hdfs_textsimple;

   location    | month | num_orders | total_sales 
---------------+-------+------------+-------------
 Prague        | Jan   |        101 |     4875.33
 Rome          | Mar   |         87 |     1557.39
 Bangalore     | May   |        317 |     8936.99
 Beijing       | Jul   |        411 |    11600.67
(4 rows)
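
Since the external table behaves like any other readable HAWQ table, you can filter and sort it with ordinary SQL. A minimal follow-up sketch using only the columns defined above:

Code Block
languagesql
postgres=# SELECT location, total_sales FROM pxf_hdfs_textsimple
             WHERE num_orders > 100 ORDER BY total_sales;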

Below are steps that demonstrate accessing a Hive table from HAWQ.

Code Block
languagebash
# Create a Hive table to expose our sample data set.
hive> CREATE TABLE sales_info (location string, month string,
        number_of_orders int, total_sales double)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        STORED AS textfile;
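
# The LOAD below expects /tmp/pxf_hive_datafile.txt, a comma-delimited sample
# data file. If you do not have one, one option (an assumption, not part of
# the original steps) is to reuse the HDFS sample data created earlier:
cp /tmp/pxf_hdfs_simple.txt /tmp/pxf_hive_datafile.txt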

# Load the pxf_hive_datafile.txt sample data file into the sales_info table you just created:
hive> LOAD DATA LOCAL INPATH '/tmp/pxf_hive_datafile.txt'
        INTO TABLE sales_info;

# Query sales_info from Hive to verify that the data was loaded successfully:
hive> SELECT * FROM sales_info;

# Query the Hive table from HAWQ through its HCatalog integration
postgres=# SELECT * FROM hcatalog.default.sales_info;

   location    | month | number_of_orders | total_sales
---------------+-------+------------------+-------------
 Prague        | Jan   |              101 |     4875.33
 Rome          | Mar   |               87 |     1557.39
 Bangalore     | May   |              317 |     8936.99
 ...
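
Alternatively, you can define an explicit PXF external table over the Hive table instead of querying it through hcatalog. A minimal sketch using the generic Hive profile (the host and port are assumed to match the PXF instance above):

Code Block
languagesql
postgres=# CREATE EXTERNAL TABLE salesinfo_hiveprofile(location text, month text,
             number_of_orders int, total_sales float8)
           LOCATION ('pxf://localhost:51200/default.sales_info?PROFILE=Hive')
           FORMAT 'custom' (formatter='pxfwritable_import');
postgres=# SELECT * FROM salesinfo_hiveprofile;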

Build PXF for other databases

PXF can be deployed to different environments and for different databases, so it is convenient to tailor the PXF build with specific default configuration parameters, such as the default PXF user and the default log and run directories.

Profiles for all supported databases are stored in incubator-hawq/pxf/gradle/profiles. By default, the HAWQ profile is used.

To build the PXF bundle for GPDB:

Code Block
languagebash
make install DATABASE=gpdb
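
To see which profiles are available, list the profiles directory; the file names there are expected to correspond to valid DATABASE values:

Code Block
languagebash
ls incubator-hawq/pxf/gradle/profiles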