Troubleshooting PXF

    Enabling more verbose logging may aid PXF troubleshooting efforts.

    PXF provides two categories of message logging: service-level and database-level.

    PXF utilizes log4j for service-level logging. PXF-service-related log messages are captured in a log file specified by PXF’s log4j properties file, /etc/pxf/conf/pxf-log4j.properties. The default PXF logging configuration will write INFO and more severe level logs to /var/log/pxf/pxf-service.log.
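    The default configuration described above typically amounts to a log4j 1.x properties file along the following lines. This sketch is illustrative only: the appender name, rotation settings, and layout are assumptions, so check the shipped pxf-log4j.properties for the actual values.

        # Illustrative log4j 1.x configuration (appender name, sizes, and layout are assumed)
        log4j.rootLogger=INFO, ROLLINGFILE
        log4j.appender.ROLLINGFILE=org.apache.log4j.RollingFileAppender
        log4j.appender.ROLLINGFILE.File=/var/log/pxf/pxf-service.log
        log4j.appender.ROLLINGFILE.MaxFileSize=10MB
        log4j.appender.ROLLINGFILE.MaxBackupIndex=2
        log4j.appender.ROLLINGFILE.layout=org.apache.log4j.PatternLayout
        log4j.appender.ROLLINGFILE.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} - %m%n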

    PXF provides more detailed logging when the DEBUG level is enabled. To configure PXF DEBUG logging, uncomment the following line in pxf-log4j.properties:
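    In Apache HAWQ-based PXF installations this is typically the package-level logger entry shown below; the exact logger (package) name can differ between PXF versions, so verify it against your pxf-log4j.properties before uncommenting:

        #log4j.logger.org.apache.hawq.pxf=DEBUG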

    and restart the PXF service:

        $ sudo service pxf-service restart

    With DEBUG level logging enabled, perform your PXF operations; for example, create and query an external table that references a Hive table:

        $ psql
        gpadmin=# CREATE EXTERNAL TABLE hivetest(id int, newid int)
                    LOCATION ('pxf://namenode:51200/pxf_hive1?PROFILE=Hive')
                    FORMAT 'CUSTOM' (formatter='pxfwritable_import');
        gpadmin=# SELECT * FROM hivetest;

    Examine/collect the log messages from pxf-service.log.

    Note: DEBUG logging is verbose and has a performance impact. Remember to turn off PXF service DEBUG logging after you have collected the desired information.

    Database-level logging may provide insight into internal PXF service operations. Additionally, when you access Hive tables using hcatalog or the Hive* profiles, log messages identify the underlying Hive* profile(s) employed to access the data.

    Enable HAWQ and PXF debug message logging during operations on PXF external tables by setting the client_min_messages server configuration parameter to DEBUG2 in your psql session.

        gpadmin=# SET client_min_messages=DEBUG2;
        gpadmin=# SELECT * FROM hivetest;
        ...
        DEBUG2: churl http header: cell #19: X-GP-URL-HOST: localhost
        DEBUG2: churl http header: cell #20: X-GP-URL-PORT: 51200
        DEBUG2: churl http header: cell #21: X-GP-DATA-DIR: pxf_hive1
        DEBUG2: churl http header: cell #22: X-GP-profile: Hive
        DEBUG2: churl http header: cell #23: X-GP-URI: pxf://namenode:51200/pxf_hive1?profile=Hive
        DEBUG2: pxf: set_current_fragment_headers: using profile: Hive
        ...

    Examine/collect the log messages from stdout.

    Note: DEBUG2 database session logging has a performance impact. Remember to turn off DEBUG2 logging after you have collected the desired information.

        gpadmin=# SET client_min_messages=NOTICE;

    The Java heap size can be a limiting factor in PXF’s ability to serve many concurrent requests or to run queries against large tables.

    You may run into situations where a query will hang or fail with an Out of Memory exception (OOM). This typically occurs when many threads are reading different data fragments from an external table and insufficient heap space exists to open all fragments at the same time. To avert or remedy this situation, Pivotal recommends first increasing the Java maximum heap size or decreasing the Tomcat maximum number of threads, depending upon what works best for your system configuration.

    Note: The configuration changes described in this topic require modifying config files on each PXF node in your HAWQ cluster. After performing the updates, be sure to verify that the configuration on all PXF nodes is the same.

    You will need to re-apply these configuration changes after any PXF version upgrades.

    Perform the following steps to increase the PXF agent heap size in your HAWQ deployment. You must perform the configuration changes on each PXF node in your HAWQ cluster.

    1. Open /var/pxf/pxf-service/bin/setenv.sh in a text editor.
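       For example (any text editor works; vi is shown to match the server.xml step later in this topic):

         root@pxf-node$ vi /var/pxf/pxf-service/bin/setenv.sh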

    2. Update the -Xmx option to the desired value in the JVM_OPTS setting:
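       For example, to raise the maximum heap to 1GB you might set the line as shown below. The exact JVM_OPTS contents are an assumption; your setenv.sh may include additional options (such as -Xss) that you should preserve, changing only the -Xmx value:

         # Illustrative setting: only the -Xmx value is being changed; -Xss256K is an assumed existing option.
         JVM_OPTS="-Xmx1024M -Xss256K"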

    3. Restart PXF:

        1. If you use Ambari to manage your cluster, restart the PXF service via the Ambari console.
        2. If you do not use Ambari, restart the PXF service from the command line on each node:

             root@pxf-node$ service pxf-service restart

    If increasing the maximum heap size is not suitable for your HAWQ cluster, try decreasing the number of concurrent working threads configured for the underlying Tomcat web application. A decrease in the number of running threads will prevent any PXF node from exhausting its memory, while ensuring that current queries run to completion (albeit a bit slower). As Tomcat’s default behavior is to queue requests until a thread is free, decreasing this value will not result in denied requests.

    The Tomcat default maximum number of threads is 300. Pivotal recommends decreasing the maximum number of threads to under 6. (If you plan to run large workloads on a large number of files using a Hive profile, Pivotal recommends you pick an even lower value.)

    Perform the following steps to decrease the maximum number of Tomcat threads in your HAWQ PXF deployment. You must perform the configuration changes on each PXF node in your HAWQ cluster.

    1. Open the /var/pxf/pxf-service/conf/server.xml file in a text editor.

         root@pxf-node$ vi /var/pxf/pxf-service/conf/server.xml

    2. Update the Catalina Executor block to identify the desired maxThreads value:

         <Executor maxThreads="2"
                   minSpareThreads="50"
                   name="tomcatThreadPool"/>

    3. Restart PXF:

        1. If you use Ambari to manage your cluster, restart the PXF service via the Ambari console.
        2. If you do not use Ambari, restart the PXF service from the command line on each node:

             root@pxf-node$ service pxf-service restart