Troubleshooting PXF
Enabling more verbose logging may aid PXF troubleshooting efforts. PXF provides two categories of message logging: service-level and database-level.

PXF utilizes log4j for service-level logging. PXF-service-related log messages are captured in a log file specified by PXF's log4j properties file, /etc/pxf/conf/pxf-log4j.properties. The default PXF logging configuration writes INFO and more severe level logs to /var/log/pxf/pxf-service.log.
PXF provides more detailed logging when the DEBUG level is enabled. To configure PXF DEBUG logging, uncomment the DEBUG logger line in pxf-log4j.properties,
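The exact logger line varies by PXF version; in Apache HAWQ distributions it typically looks like the following (treat the logger name as an assumption to verify against your installation's pxf-log4j.properties):

```properties
# Uncomment to enable DEBUG-level logging for the PXF service.
# Logger name shown is the Apache HAWQ PXF package; verify it in your file.
log4j.logger.org.apache.hawq.pxf=DEBUG
```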
and restart the PXF service:
$ sudo service pxf-service restart
$ psql
gpadmin=# CREATE EXTERNAL TABLE hivetest(id int, newid int)
            LOCATION ('pxf://namenode:51200/pxf_hive1?PROFILE=Hive')
            FORMAT 'CUSTOM' (formatter='pxfwritable_import');
gpadmin=# SELECT * FROM hivetest;
Examine/collect the log messages from pxf-service.log.
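To pull out just the DEBUG-level messages, you can filter the log with grep. The sketch below writes a small sample file with hypothetical (not actual PXF) log lines so it is runnable anywhere; on a PXF node you would run the same grep against /var/log/pxf/pxf-service.log:

```shell
# Create a sample log file; these lines are hypothetical, not real PXF output.
cat > /tmp/sample-pxf-service.log <<'EOF'
INFO  service startup complete
DEBUG reading fragment metadata
DEBUG opening fragment stream
EOF

# On a PXF node, replace the sample path with /var/log/pxf/pxf-service.log.
grep 'DEBUG' /tmp/sample-pxf-service.log
```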
Note: DEBUG logging is verbose and has a performance impact. Remember to turn off PXF service DEBUG logging after you have collected the desired information.
Database-level logging may provide insight into internal PXF service operations. Additionally, when you access Hive tables using hcatalog or the Hive* profiles, log messages identify the underlying Hive* profile(s) employed to access the data.
Enable HAWQ and PXF debug message logging during operations on PXF external tables by setting the client_min_messages server configuration parameter to DEBUG2 in your psql session.
gpadmin=# SET client_min_messages=DEBUG2;
gpadmin=# SELECT * FROM hivetest;
...
DEBUG2: churl http header: cell #19: X-GP-URL-HOST: localhost
DEBUG2: churl http header: cell #20: X-GP-URL-PORT: 51200
DEBUG2: churl http header: cell #21: X-GP-DATA-DIR: pxf_hive1
DEBUG2: churl http header: cell #22: X-GP-profile: Hive
DEBUG2: churl http header: cell #23: X-GP-URI: pxf://namenode:51200/pxf_hive1?profile=Hive
DEBUG2: pxf: set_current_fragment_headers: using profile: Hive
...
Examine/collect the log messages from stdout.
Note: DEBUG2 database session logging has a performance impact. Remember to turn off DEBUG2 logging after you have collected the desired information.

gpadmin=# SET client_min_messages=NOTICE;
The Java heap size can be a limiting factor in PXF’s ability to serve many concurrent requests or to run queries against large tables.
You may run into situations where a query will hang or fail with an Out of Memory exception (OOM). This typically occurs when many threads are reading different data fragments from an external table and insufficient heap space exists to open all fragments at the same time. To avert or remedy this situation, Pivotal recommends first increasing the Java maximum heap size or decreasing the Tomcat maximum number of threads, depending upon what works best for your system configuration.
Note: The configuration changes described in this topic require modifying config files on each PXF node in your HAWQ cluster. After performing the updates, be sure to verify that the configuration on all PXF nodes is the same.
You will need to re-apply these configuration changes after any PXF version upgrades.
Perform the following steps to increase the PXF agent heap size in your HAWQ deployment. You must perform the configuration changes on each PXF node in your HAWQ cluster.
1. Open /var/pxf/pxf-service/bin/setenv.sh in a text editor.
2. Update the -Xmx option to the desired value in the JVM_OPTS setting.
3. Restart PXF:
- If you use Ambari to manage your cluster, restart the PXF service via the Ambari console.
- If you do not use Ambari, restart the PXF service from the command line on each node:
root@pxf-node$ service pxf-service restart
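For reference, the JVM_OPTS line you edit in step 2 resembles the following; the -Xmx value shown here is an illustrative assumption, not a recommendation, and any other flags already present in your setenv.sh should be preserved:

```shell
# From /var/pxf/pxf-service/bin/setenv.sh
# -Xmx2048M is an example value only; size it for your nodes and workload.
JVM_OPTS="-Xmx2048M"
```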
If increasing the maximum heap size is not suitable for your HAWQ cluster, try decreasing the number of concurrent working threads configured for the underlying Tomcat web application. A decrease in the number of running threads will prevent any PXF node from exhausting its memory, while ensuring that current queries run to completion (albeit a bit slower). As Tomcat’s default behavior is to queue requests until a thread is free, decreasing this value will not result in denied requests.
The Tomcat default maximum number of threads is 300. Pivotal recommends decreasing the maximum number of threads to under 6. (If you plan to run large workloads on a large number of files using a Hive profile, Pivotal recommends you pick an even lower value.)
Perform the following steps to decrease the maximum number of Tomcat threads in your HAWQ PXF deployment. You must perform the configuration changes on each PXF node in your HAWQ cluster.
1. Open the /var/pxf/pxf-service/conf/server.xml file in a text editor:

root@pxf-node$ vi /var/pxf/pxf-service/conf/server.xml

2. Update the Catalina Executor block to identify the desired maxThreads value:

<Executor maxThreads="2"
          minSpareThreads="50"
          name="tomcatThreadPool"
3. Restart PXF:
- If you use Ambari to manage your cluster, restart the PXF service via the Ambari console.
- If you do not use Ambari, restart the PXF service from the command line on each node:

root@pxf-node$ service pxf-service restart