Lesson 1 - Runtime Environment

    • Install a HAWQ commercial product distribution or HAWQ sandbox virtual machine or docker environment, or build and install HAWQ from source. Ensure that your HAWQ installation is configured appropriately.

    • Make note of the HAWQ master node hostname or IP address.

    • The HAWQ administrative user is named gpadmin. This is the user account from which you will administer your HAWQ cluster. To perform the exercises in this tutorial, you must:

      • Obtain the gpadmin user credentials.
      • Ensure that your HAWQ runtime environment is configured such that the HAWQ admin user gpadmin can run commands to access the HDFS Hadoop system accounts (hdfs, hadoop) via sudo without having to provide a password.
      • Obtain the Ambari UI user name and password (optional, if Ambari is installed in your HAWQ deployment). The default Ambari user name and password are both admin.
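
    The passwordless sudo prerequisite can be checked before starting the exercises. The following is a minimal sketch, meant to be run as gpadmin; the function name check_passwordless_sudo is hypothetical and not part of HAWQ. It relies on sudo's -n (non-interactive) option, which fails instead of prompting when a password would be required.

```shell
# Sketch: report whether the current user can run a command as the given
# account through sudo without being prompted for a password.
# check_passwordless_sudo is a hypothetical helper, not a HAWQ utility.
check_passwordless_sudo() {
    local account="$1"
    # -n makes sudo fail rather than prompt; `true` is a harmless command.
    if sudo -n -u "$account" true 2>/dev/null; then
        echo "$account: passwordless sudo OK"
    else
        echo "$account: passwordless sudo NOT available"
        return 1
    fi
}

# Accounts named in the prerequisites above.
for account in hdfs hadoop; do
    check_passwordless_sudo "$account" || true
done
```

    A "NOT available" result means either that sudo would prompt for a password or that the account does not exist on this host.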

    HAWQ installs a script that you can use to set up your HAWQ cluster environment. The greenplum_path.sh script, located in your HAWQ root install directory, sets $PATH and other environment variables to find HAWQ files. Most importantly, greenplum_path.sh sets the $GPHOME environment variable to point to the root directory of the HAWQ installation. If you installed HAWQ from a product distribution or are running a HAWQ sandbox environment, the HAWQ root is typically /usr/local/hawq. If you built HAWQ from source or downloaded the tarball, your $GPHOME may differ.
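
    If you are unsure where your HAWQ root is, a small search over candidate directories can help. This is only a sketch: find_hawq_root is a hypothetical helper, and the only candidate listed is the typical /usr/local/hawq location named above. Add your own build or tarball directory as extra arguments.

```shell
# Sketch: print the first candidate directory that contains greenplum_path.sh.
# find_hawq_root is a hypothetical helper, not a HAWQ utility.
find_hawq_root() {
    local dir
    for dir in "$@"; do
        if [ -f "$dir/greenplum_path.sh" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

# /usr/local/hawq is the typical root for product distributions and sandboxes.
if hawq_root="$(find_hawq_root /usr/local/hawq)"; then
    echo "Found greenplum_path.sh in $hawq_root"
else
    echo "greenplum_path.sh not found; check your install location"
fi
```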

    Perform the following steps to set up your HAWQ runtime environment:

    1. Log in to the HAWQ master node using the gpadmin user credentials; you may not need to provide a password:

        $ ssh gpadmin@<master>

    2. Set up your HAWQ operating environment by sourcing the greenplum_path.sh file. If you built HAWQ from source or downloaded the tarball, substitute the path to the installed or extracted greenplum_path.sh file (for example, /opt/hawq-2.1.0.0/greenplum_path.sh):

        gpadmin@master$ source /usr/local/hawq/greenplum_path.sh

      Sourcing greenplum_path.sh sets:

      • $GPHOME
      • $PATH to include the HAWQ $GPHOME/bin/ directory
      • $LD_LIBRARY_PATH to include the HAWQ libraries in $GPHOME/lib/

      For example:

        gpadmin@master$ echo $GPHOME
        /usr/local/hawq/.
        gpadmin@master$ echo $PATH
        /usr/local/hawq/./bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/gpadmin/bin
        gpadmin@master$ echo $LD_LIBRARY_PATH
        /usr/local/hawq/./lib

      Note: You must source greenplum_path.sh before invoking any HAWQ commands.
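
      To confirm that the current shell has the environment described in this step, a check along the following lines may be useful. It is a sketch only: hawq_env_ok is a hypothetical helper, and it assumes that $PATH and $LD_LIBRARY_PATH contain exactly $GPHOME/bin and $GPHOME/lib as entries, which matches the sample output above but may not hold for every installation.

```shell
# Sketch: check that the variables set by greenplum_path.sh are in place.
# hawq_env_ok is a hypothetical helper, not a HAWQ utility.
hawq_env_ok() {
    [ -n "${GPHOME:-}" ] || { echo "GPHOME is not set"; return 1; }
    # Wrap each list in colons so that every entry can be matched as ":entry:".
    case ":${PATH}:" in
        *":${GPHOME}/bin:"*) ;;
        *) echo "PATH does not include ${GPHOME}/bin"; return 1 ;;
    esac
    case ":${LD_LIBRARY_PATH:-}:" in
        *":${GPHOME}/lib:"*) ;;
        *) echo "LD_LIBRARY_PATH does not include ${GPHOME}/lib"; return 1 ;;
    esac
    echo "HAWQ environment OK (GPHOME=${GPHOME})"
}

hawq_env_ok || echo "Source greenplum_path.sh before running HAWQ commands"
```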

    3. Examine your HAWQ installation:

        gpadmin@master$ ls $GPHOME
        bin docs etc greenplum_path.sh include lib sbin share

      The HAWQ command line utilities are located in $GPHOME/bin. $GPHOME/lib includes HAWQ and PostgreSQL libraries.
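
      The same inspection can be scripted. The sketch below compares a HAWQ root against the entry names from the sample listing above; missing_hawq_entries is a hypothetical helper, and the expected names are taken only from that sample, so other HAWQ versions or builds may legitimately differ.

```shell
# Sketch: print each expected entry that is absent under the given HAWQ root.
# The entry names come from the sample `ls $GPHOME` listing; this helper is
# hypothetical and not part of HAWQ.
missing_hawq_entries() {
    local root="$1" entry
    for entry in bin docs etc greenplum_path.sh include lib sbin share; do
        [ -e "$root/$entry" ] || echo "$entry"
    done
}

missing="$(missing_hawq_entries "${GPHOME:-/usr/local/hawq}")"
if [ -z "$missing" ]; then
    echo "All expected entries are present"
else
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing entries:" $missing
fi
```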

    4. View the current state of your HAWQ cluster and, if it is not already running, start it. The procedure differs depending on whether you manage your cluster from the command line or with Ambari. This tutorial introduces both, but the lessons focus on command-line instructions because not every HAWQ deployment uses Ambari.

      Command Line:

      If your cluster is not running, start it:

        gpadmin@master$ hawq start cluster
        20170411:15:54:47:357122 hawq_start:master:gpadmin-[INFO]:-Prepare to do 'hawq start'
        20170411:15:54:47:357122 hawq_start:master:gpadmin-[INFO]:-You can find log in:
        20170411:15:54:47:357122 hawq_start:master:gpadmin-[INFO]:-/home/gpadmin/hawqAdminLogs/hawq_start_20170411.log
        20170411:15:54:47:357122 hawq_start:master:gpadmin-[INFO]:-GPHOME is set to:
        20170411:15:54:47:357122 hawq_start:master:gpadmin-[INFO]:-/usr/local/hawq/.
        20170411:15:54:47:357122 hawq_start:master:gpadmin-[INFO]:-Start hawq with args: ['start', 'cluster']
        20170411:15:54:47:357122 hawq_start:master:gpadmin-[INFO]:-Gathering information and validating the environment...
        20170411:15:54:47:357122 hawq_start:master:gpadmin-[INFO]:-Start all the nodes in hawq cluster
        20170411:15:54:47:357122 hawq_start:master:gpadmin-[INFO]:-Starting master node 'master'
        20170411:15:54:47:357122 hawq_start:master:gpadmin-[INFO]:-Start master service
        20170411:15:54:48:357122 hawq_start:master:gpadmin-[INFO]:-Master started successfully
        20170411:15:54:48:357122 hawq_start:master:gpadmin-[INFO]:-Start all the segments in hawq cluster
        20170411:15:54:48:357122 hawq_start:master:gpadmin-[INFO]:-Start segments in list: ['segment']
        20170411:15:54:48:357122 hawq_start:master:gpadmin-[INFO]:-Start segment service
        20170411:15:54:48:357122 hawq_start:master:gpadmin-[INFO]:-Total segment number is: 1
        20170411:15:54:53:357122 hawq_start:master:gpadmin-[INFO]:-1 of 1 segments start successfully
        20170411:15:54:53:357122 hawq_start:master:gpadmin-[INFO]:-Segments started successfully
        20170411:15:54:53:357122 hawq_start:master:gpadmin-[INFO]:-HAWQ cluster started successfully

      Get the status of your cluster:

        gpadmin@master$ hawq state
        20170411:16:39:18:370305 hawq_state:master:gpadmin-[INFO]:-- HAWQ instance status summary
        20170411:16:39:18:370305 hawq_state:master:gpadmin-[INFO]:------------------------------------------------------
        20170411:16:39:18:370305 hawq_state:master:gpadmin-[INFO]:-- Master instance = Active
        20170411:16:39:18:370305 hawq_state:master:gpadmin-[INFO]:-- No Standby master defined
        20170411:16:39:18:370305 hawq_state:master:gpadmin-[INFO]:-- Total segment instance count from config file = 1
        20170411:16:39:18:370305 hawq_state:master:gpadmin-[INFO]:------------------------------------------------------
        20170411:16:39:18:370305 hawq_state:master:gpadmin-[INFO]:-- Segment Status
        20170411:16:39:18:370305 hawq_state:master:gpadmin-[INFO]:------------------------------------------------------
        20170411:16:39:18:370305 hawq_state:master:gpadmin-[INFO]:-- Total segments count from catalog = 1
        20170411:16:39:18:370305 hawq_state:master:gpadmin-[INFO]:-- Total segment valid (at master) = 1
        20170411:16:39:18:370305 hawq_state:master:gpadmin-[INFO]:-- Total segment failures (at master) = 0
        20170411:16:39:18:370305 hawq_state:master:gpadmin-[INFO]:-- Total number of postmaster.pid files missing = 0
        20170411:16:39:18:370305 hawq_state:master:gpadmin-[INFO]:-- Total number of postmaster.pid files found = 1
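
      The two commands above can be combined into a "start only if needed" check. The following is a sketch under stated assumptions: master_is_active and segment_failures are hypothetical helpers, and they rely on the exact wording of the "Master instance = Active" and "Total segment failures (at master)" lines in the sample hawq state output shown here, which other HAWQ versions may phrase differently.

```shell
# Sketch: succeed when `hawq state` output on stdin reports an active master.
# Assumes the "Master instance = Active" line from the sample output.
master_is_active() {
    grep -q -- "Master instance = Active"
}

# Sketch: extract the segment failure count from `hawq state` output on stdin.
segment_failures() {
    sed -n 's/.*Total segment failures (at master) = \([0-9][0-9]*\).*/\1/p'
}

# Only act when the hawq utility is on $PATH (greenplum_path.sh was sourced).
if command -v hawq >/dev/null 2>&1; then
    state="$(hawq state 2>&1)"
    if printf '%s\n' "$state" | master_is_active; then
        echo "Master is active; segment failures: $(printf '%s\n' "$state" | segment_failures)"
    else
        hawq start cluster
    fi
else
    echo "hawq not found on PATH; source greenplum_path.sh first"
fi
```

      Parsing log-style output is fragile, so treat this as a convenience for the tutorial environment rather than a monitoring mechanism.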

      Ambari:

      If your deployment includes an Ambari server, perform the following steps to start and view the current state of your HAWQ cluster.

      1. Start the Ambari management console by entering the Ambari server URL in a supported browser window. Ambari listens on port 8080 by default:

        <ambari-server-node>:8080

      2. Log in with the Ambari credentials (default admin:admin) and view the Ambari dashboard.

        The Ambari dashboard provides an at-a-glance status of the health of your HAWQ cluster. A list of each running service and its status is provided in the left panel. The main display area includes a set of configurable tiles providing specific information about your cluster, including HAWQ segment status, HDFS disk usage, and resource manager metrics.

      3. Navigate to the HAWQ service listed in the left panel. If the service is not running (that is, no green checkmark appears to the left of the service name), start your HAWQ cluster by clicking the HAWQ service name and then selecting the Start operation from the Service Actions menu button.

      4. Log out of the Ambari console by clicking the admin button and selecting the Sign out drop-down menu item.

    Your HAWQ cluster is now running. For additional information:

    • HAWQ Files and Directories identifies HAWQ files and directories and their install locations.
    • The HAWQ documentation provides an overview of the components comprising a HAWQ cluster, including the users (administrative and operating), deployment systems (HAWQ master, standby, and segments), databases, and data sources.

    Lesson 2: Cluster Administration