With this connector, you can:

    • Load a single TsFile, from either the local file system or HDFS, into Hive
    • Load all files in a specific directory, from either the local file system or HDFS, into Hive
    • Query a TsFile through HQL
    • Note that write operations are not yet supported by hive-connector, so INSERT statements in HQL are not allowed when operating on TsFiles through Hive

    System Requirements

    TsFile data type    Hive field type
    BOOLEAN             Boolean
    INT32               INT
    INT64               BIGINT
    FLOAT               Float
    DOUBLE              Double
    TEXT                STRING

    Add Dependency For Hive

    To use hive-connector in Hive, add the hive-connector jar to Hive's classpath.

    After downloading the IoTDB source code, you can run mvn clean package -pl hive-connector -am -Dmaven.test.skip=true -P get-jar-with-dependencies to get a hive-connector-X.X.X-jar-with-dependencies.jar.
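
    Once the jar is built, it can be registered in the Hive CLI before creating any TsFile-backed tables. A minimal sketch, assuming the jar sits at a hypothetical local path:

    hive> add jar /path/to/hive-connector-X.X.X-jar-with-dependencies.jar;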

    To create a TsFile-backed table, specify the serde as org.apache.iotdb.hive.TsFileSerDe, the inputformat as org.apache.iotdb.hive.TSFHiveInputFormat, and the outputformat as org.apache.iotdb.hive.TSFHiveOutputFormat.

    Also provide a schema which only contains two fields, time_stamp and sensor_id, for the table. time_stamp is the time value of the time series, and sensor_id is the sensor name to extract from the TsFile into Hive, such as sensor_1. The name of the table can be any valid table name in Hive.

    Also, a location must be provided for hive-connector to pull the most current data for the table.

    The location should be a specific directory on your local file system, or on HDFS if you have set up Hadoop. If it is on your local file system, the location should look like file:///data/data/sequence/root.baic2.WWS.leftfrontdoor/
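
    If the directory lives on HDFS instead, the location is given as an HDFS URI; the namenode host and port below are hypothetical:

    hdfs://namenode:9000/data/data/sequence/root.baic2.WWS.leftfrontdoor/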

    For example:

    CREATE EXTERNAL TABLE IF NOT EXISTS only_sensor_1(
      time_stamp TIMESTAMP,
      sensor_1 BIGINT)
    ROW FORMAT SERDE 'org.apache.iotdb.hive.TsFileSerDe'
    STORED AS
      INPUTFORMAT 'org.apache.iotdb.hive.TSFHiveInputFormat'
      OUTPUTFORMAT 'org.apache.iotdb.hive.TSFHiveOutputFormat'
    LOCATION '/data/data/sequence/root.baic2.WWS.leftfrontdoor/'
    TBLPROPERTIES ('device_id'='root.baic2.WWS.leftfrontdoor.plc1');

    In this example, the data of root.baic2.WWS.leftfrontdoor.plc1.sensor_1 is pulled from the directory /data/data/sequence/root.baic2.WWS.leftfrontdoor/. You can inspect the resulting table definition as shown below.
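
    A quick way to confirm the definition is Hive's describe command. The exact output formatting depends on your Hive version, but the two columns should appear roughly as follows:

    hive> describe only_sensor_1;
    OK
    time_stamp          timestamp           from deserializer
    sensor_1            bigint              from deserializer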

    At this point, the TsFile-backed table can be worked with in Hive like any other table.

    Query from TsFile-backed Hive tables

    Before running any queries, set hive.input.format in Hive by executing the following command.

    hive> set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

    Select Clause Example
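
    A plain select against the example table might look like the following; the rows returned depend on the data in your TsFiles:

    hive> select * from only_sensor_1 limit 10;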

    Aggregate Clause Example

    hive> select count(*) from only_sensor_1;
    Query ID = jackietien_20191016202416_d1e3e233-d367-4453-b39a-2aac9327a3b6
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Job running in-process (local Hadoop)
    2019-10-16 20:24:18,305 Stage-1 map = 0%, reduce = 0%
    2019-10-16 20:24:27,443 Stage-1 map = 100%, reduce = 100%
    Ended Job = job_local867757288_0002
    MapReduce Jobs Launched:
    Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS
    Total MapReduce CPU Time Spent: 0 msec
    OK
    Time taken: 11.334 seconds, Fetched: 1 row(s)
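
    Because the table was created as EXTERNAL, dropping it removes only the table metadata from Hive; the underlying TsFiles stay in place. A minimal sketch of cleaning up the example table:

    hive> drop table if exists only_sensor_1;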