TsFile-Hive-Connector User Guide
- TsFile-Hive-Connector User Guide
- About TsFile-Hive-Connector
- System Requirements
- Data Type Correspondence
- Add Dependency For Hive
- Creating Tsfile-backed Hive tables
- Querying from Tsfile-backed Hive tables
- Select Clause Example
- What’s Next
About TsFile-Hive-Connector
With this connector, you can:
- Load a single TsFile, from either the local file system or HDFS, into Hive.
- Load all files in a specific directory, from either the local file system or HDFS, into Hive.
- Query the TsFile through HQL.

Write operations are not yet supported by hive-connector, so insert statements in HQL are not allowed when operating on a TsFile through Hive.
Data Type Correspondence
TsFile data type | Hive field type |
---|---|
BOOLEAN | BOOLEAN |
INT32 | INT |
INT64 | BIGINT |
FLOAT | FLOAT |
DOUBLE | DOUBLE |
TEXT | STRING |
Add Dependency For Hive
To use hive-connector in Hive, add the hive-connector jar to Hive.
After downloading the IoTDB source code from https://github.com/apache/incubator-iotdb, build the connector with:
mvn clean package -pl hive-connector -am -Dmaven.test.skip=true
This produces hive-connector-X.X.X-jar-with-dependencies.jar.
Then, in Hive, run the command add jar XXX to add the dependency, where XXX is the path to that jar.
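A minimal sketch of this step is shown below. The path is a placeholder rather than a real location, and X.X.X stands for whatever version the Maven build produced; the list jars statement is an optional check that the jar is registered in the current session.

```sql
-- Placeholder path: substitute the actual location and version of the built jar.
add jar /path/to/hive-connector-X.X.X-jar-with-dependencies.jar;

-- Optional: list the jars registered in this Hive session.
list jars;
```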
Creating Tsfile-backed Hive tables
To create a TsFile-backed table, specify org.apache.iotdb.hive.TsFileSerDe as the SerDe, org.apache.iotdb.hive.TSFHiveInputFormat as the input format, and org.apache.iotdb.hive.TSFHiveOutputFormat as the output format.
Also provide a schema for the table that contains only two fields: time_stamp and sensor_id. time_stamp is the time value of the time series, and sensor_id is the name of the sensor you want to extract from the TsFile into Hive (sensor_1 in the example below). The table name can be any valid Hive table name.
Also provide a location from which hive-connector will pull the most current data for the table.
The location must be a specific directory, either on your local file system or, if you have set up Hadoop, on HDFS. If it is on your local file system, the location should look like file:///data/data/sequence/root.baic2.WWS.leftfrontdoor/
Finally, set device_id in TBLPROPERTIES to the name of the device you want to analyze.
For example:
CREATE EXTERNAL TABLE IF NOT EXISTS only_sensor_1(
time_stamp TIMESTAMP,
sensor_1 BIGINT)
ROW FORMAT SERDE 'org.apache.iotdb.hive.TsFileSerDe'
STORED AS
INPUTFORMAT 'org.apache.iotdb.hive.TSFHiveInputFormat'
OUTPUTFORMAT 'org.apache.iotdb.hive.TSFHiveOutputFormat'
LOCATION '/data/data/sequence/root.baic2.WWS.leftfrontdoor/'
TBLPROPERTIES ('device_id'='root.baic2.WWS.leftfrontdoor.plc1');
At this point, the Tsfile-backed table can be worked with in Hive like any other table.
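As a variation, the sketch below declares a table for a hypothetical second sensor, sensor_2, assumed here to be stored as FLOAT in the TsFile and therefore declared as FLOAT in Hive, following the type correspondence table. It also writes the location with the explicit file:/// scheme described earlier. The table name only_sensor_2 and the sensor itself are assumptions made for illustration and are not part of the sample data.

```sql
-- Hypothetical example: sensor_2 and its FLOAT type are assumed, not taken from real data.
-- TsFile FLOAT corresponds to Hive FLOAT; the local directory uses an explicit file:/// scheme.
CREATE EXTERNAL TABLE IF NOT EXISTS only_sensor_2(
  time_stamp TIMESTAMP,
  sensor_2 FLOAT)
ROW FORMAT SERDE 'org.apache.iotdb.hive.TsFileSerDe'
STORED AS
  INPUTFORMAT 'org.apache.iotdb.hive.TSFHiveInputFormat'
  OUTPUTFORMAT 'org.apache.iotdb.hive.TSFHiveOutputFormat'
LOCATION 'file:///data/data/sequence/root.baic2.WWS.leftfrontdoor/'
TBLPROPERTIES ('device_id'='root.baic2.WWS.leftfrontdoor.plc1');

-- Show the declared columns and their Hive types.
DESCRIBE only_sensor_2;
```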
Querying from Tsfile-backed Hive tables
Before running any queries, set hive.input.format in Hive by executing the following command:
hive> set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
We now have an external table named only_sensor_1 in Hive and can analyze it with any HQL query operation.
For example:
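Select Clause Example
A plain select against the table might look like the sketch below. It assumes the only_sensor_1 table defined earlier; the row limit and the filter threshold are arbitrary illustrative values, and no output is shown because it depends on the data in your TsFile.

```sql
-- Return a handful of rows; the limit of 10 is an arbitrary choice.
select * from only_sensor_1 limit 10;

-- Project the two declared columns and filter on the sensor value;
-- the threshold 1000000 is illustrative only.
select time_stamp, sensor_1 from only_sensor_1 where sensor_1 > 1000000;
```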
Aggregate Clause Example
hive> select count(*) from only_sensor_1;
Query ID = jackietien_20191016202416_d1e3e233-d367-4453-b39a-2aac9327a3b6
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2019-10-16 20:24:18,305 Stage-1 map = 0%, reduce = 0%
2019-10-16 20:24:27,443 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local867757288_0002
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 11.334 seconds, Fetched: 1 row(s)
What’s Next
Only read operations are currently supported. Writing tables to TsFiles is under development.