There are two ways to use TsFile in your own project.

    • Using as jars:

      • Compile the source code and build the jars.

        All the jars can then be found in the folder named target. Import target/tsfile-0.9.3-jar-with-dependencies.jar into your project.

    • Using as a maven dependency:

      Compile the source code and deploy it to your local repository in three steps:

      • Get the source code

        git clone https://github.com/apache/incubator-iotdb.git

      • Compile the source code and deploy

        cd tsfile/
        mvn clean install -Dmaven.test.skip=true

      • Add the dependency to your project:

        <dependency>
          <groupId>org.apache.iotdb</groupId>
          <artifactId>tsfile</artifactId>
          <version>0.9.3</version>
        </dependency>

      Or, you can download the dependencies from the official Maven repository:

      • First, find your maven `settings.xml` on path: `${username}\.m2\settings.xml`, and add this `<profile>` to `<profiles>`:

        ```
        <profile>
          <id>allow-snapshots</id>
          <activation><activeByDefault>true</activeByDefault></activation>
          <repositories>
            <repository>
              <id>apache.snapshots</id>
              <name>Apache Development Snapshot Repository</name>
              <url>https://repository.apache.org/content/repositories/snapshots/</url>
              <releases>
                <enabled>false</enabled>
              </releases>
              <snapshots>
                <enabled>true</enabled>
              </snapshots>
            </repository>
          </repositories>
        </profile>
        ```

      • Then add the dependency to your project:

        ```
        <dependency>
          <groupId>org.apache.iotdb</groupId>
          <artifactId>tsfile</artifactId>
          <version>0.9.3</version>
        </dependency>
        ```

    This section demonstrates the detailed usage of TsFile.

    Time-series Data

    A time-series is considered a sequence of quadruples. A quadruple is defined as (device, measurement, time, value).

    • measurement: A physical or formal measurement that a time-series records, e.g., the temperature of a city, the sales number of some goods or the speed of a train at different times. As a traditional sensor (like a thermometer) also takes a single measurement and produces a time-series, we will use measurement and sensor interchangeably below.

    • device: An entity that takes several measurements (producing multiple time-series), e.g., a running train monitors its speed, oil meter, mileage and current number of passengers, each of which is recorded as a time-series.

    Table 1 illustrates a set of time-series data. The set shown in the following table contains one device named “device_1” with three measurements named “sensor_1”, “sensor_2” and “sensor_3”.

    A set of time-series data

    One Line of Data: In many industrial applications, a device normally contains more than one sensor, and these sensors may have values at the same timestamp. Such a group of values is called one line of data.

    Formally, one line of data consists of a device_id, a timestamp that gives the number of milliseconds since January 1, 1970, 00:00:00, and several data pairs, each composed of a measurement_id and the corresponding value. All data pairs in one line belong to this device_id and have the same timestamp. If one of the measurements does not have a value at the timestamp, use a space instead (actually, TsFile does not store null values). The format is as follows:

    device_id, timestamp, <measurement_id, value>...

    An example is shown below. In this example, the data types of the two measurements are INT32 and FLOAT respectively.

    device_1, 1490860659000, m1, 10, m2, 12.12
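
To make the format concrete, here is a minimal sketch in plain Java that splits such a line into its device_id, timestamp and data pairs. It does not use the TsFile API: the class name OneLineOfData, its fields and the comma-based splitting are assumptions made only for illustration, and values are kept as raw text rather than typed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of TsFile:
// parses "device_id, timestamp, <measurement_id, value>...".
class OneLineOfData {
  final String deviceId;
  final long timestamp;
  // measurement_id -> raw value text, in the order the pairs appear in the line
  final Map<String, String> values = new LinkedHashMap<>();

  OneLineOfData(String line) {
    String[] parts = line.trim().split("\\s*,\\s*");
    // device_id and timestamp are mandatory; the rest must be complete pairs
    if (parts.length < 2 || parts.length % 2 != 0) {
      throw new IllegalArgumentException("malformed line: " + line);
    }
    deviceId = parts[0];
    timestamp = Long.parseLong(parts[1]);
    for (int i = 2; i < parts.length; i += 2) {
      values.put(parts[i], parts[i + 1]);
    }
  }

  public static void main(String[] args) {
    OneLineOfData row = new OneLineOfData("device_1, 1490860659000, m1, 10, m2, 12.12");
    System.out.println(row.deviceId + " @ " + row.timestamp + " -> " + row.values);
  }
}
```

Keeping the pairs in a map keyed by measurement_id mirrors the rule that all pairs in one line share the same device and timestamp.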

    Writing TsFile

    Generate a TsFile

    A TsFile can be generated by following the steps below. The complete code is given in the section “Example for writing a TsFile”.

    • First, construct a TsFileWriter instance.

      Here are the available constructors:

      • Without pre-defined schema

        public TsFileWriter(File file) throws IOException

      • With pre-defined schema

        public TsFileWriter(File file, Schema schema) throws IOException

      • With a TsFileOutput, for example when using the HDFS file system. TsFileOutput can be an instance of class HDFSOutput.

        public TsFileWriter(TsFileOutput output, Schema schema) throws IOException

      If you want to set some TsFile configuration on your own, you can use the config parameter. For example:

      TSFileConfig conf = new TSFileConfig();
      conf.setTSFileStorageFs("HDFS");
      TsFileWriter tsFileWriter = new TsFileWriter(file, schema, conf);

      In this example, data files will be stored in HDFS instead of the local file system. If you’d like to store data files in the local file system, you can use conf.setTSFileStorageFs("LOCAL"), which is also the default config.

      You can also configure the IP and port of your HDFS with conf.setHdfsIp(...) and conf.setHdfsPort(...). The default IP is localhost and the default port is 9000.

      Parameters:

      • file : The TsFile to write

      • schema : The file schema, which is introduced in the next part.

      • config : The config of TsFile.

    • Second, add measurements.

      The class Schema contains a map whose key is the name of a measurement and whose value is the schema of that measurement.

      Here are the interfaces:

      // Create an empty Schema or one from an existing map
      public Schema()
      public Schema(Map<String, MeasurementSchema> measurements)

      // Use these two methods to add measurements
      public void registerMeasurement(MeasurementSchema descriptor)
      public void registerMeasurements(Map<String, MeasurementSchema> measurements)

      // Some useful getters and checkers
      public TSDataType getMeasurementDataType(String measurementId)
      public MeasurementSchema getMeasurementSchema(String measurementId)
      public Map<String, MeasurementSchema> getAllMeasurementSchema()
      public boolean hasMeasurement(String measurementId)

      You can always use the following method of the TsFileWriter class to add additional measurements:

      public void addMeasurement(MeasurementSchema measurementSchema) throws WriteProcessException

      The class MeasurementSchema contains the information of one measurement. There are several constructors:

      public MeasurementSchema(String measurementId, TSDataType type, TSEncoding encoding)
      public MeasurementSchema(String measurementId, TSDataType type, TSEncoding encoding, CompressionType compressionType)
      public MeasurementSchema(String measurementId, TSDataType type, TSEncoding encoding, CompressionType compressionType,
          Map<String, String> props)

      Parameters:

      • measurementId: The name of this measurement, typically the name of the sensor.

      • type: The data type. Six types are currently supported: BOOLEAN, INT32, INT64, FLOAT, DOUBLE and TEXT.

      • encoding: The data encoding. See Chapter 2-3.

      • compressionType: The data compression. UNCOMPRESSED and SNAPPY are currently supported.

      • props: Properties for special data types, such as max_point_number for FLOAT and DOUBLE, or max_string_length for TEXT. Pass them as string pairs in a map, such as (“max_point_number”, “3”).

    • Third, insert and write data continually.

      Use this constructor to create a new TSRecord (a timestamp and device pair).

      public TSRecord(long timestamp, String deviceId)

      Then create a DataPoint (a measurement and value pair), and use the addTuple method to add the DataPoint to the correct TSRecord.

      Use this method to write the record:

      public void write(TSRecord record) throws IOException, WriteProcessException

    • Finally, call close to finish the writing process.

      public void close() throws IOException

    Example for writing a TsFile

    You should install TsFile to your local maven repository first.

    mvn clean install -pl tsfile -am -DskipTests

    If you have non-aligned time series data (e.g. not all sensors contain values), you can write a TsFile by constructing TSRecord objects.

    A more thorough example can be found at /example/tsfile/src/main/java/org/apache/iotdb/tsfile/TsFileWriteWithTSRecord.java

    package org.apache.iotdb.tsfile;

    import java.io.File;
    import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType;
    import org.apache.iotdb.tsfile.file.metadata.enums.TSEncoding;
    import org.apache.iotdb.tsfile.write.TsFileWriter;
    import org.apache.iotdb.tsfile.write.record.TSRecord;
    import org.apache.iotdb.tsfile.write.record.datapoint.DataPoint;
    import org.apache.iotdb.tsfile.write.record.datapoint.LongDataPoint;
    import org.apache.iotdb.tsfile.write.schema.MeasurementSchema;

    /**
     * An example of writing data to TsFile
     * It uses the interface:
     * public void addMeasurement(MeasurementSchema MeasurementSchema) throws WriteProcessException
     */
    public class TsFileWriteWithTSRecord {

      public static void main(String[] args) {
        try {
          String path = "test.tsfile";
          File f = new File(path);
          // start from a fresh file
          if (f.exists()) {
            f.delete();
          }
          TsFileWriter tsFileWriter = new TsFileWriter(f);

          // add measurements into file schema
          tsFileWriter
              .addMeasurement(new MeasurementSchema("sensor_1", TSDataType.INT64, TSEncoding.RLE));
          tsFileWriter
              .addMeasurement(new MeasurementSchema("sensor_2", TSDataType.INT64, TSEncoding.RLE));
          tsFileWriter
              .addMeasurement(new MeasurementSchema("sensor_3", TSDataType.INT64, TSEncoding.RLE));

          // construct TSRecord (timestamp 1, device "device_1")
          TSRecord tsRecord = new TSRecord(1, "device_1");
          DataPoint dPoint1 = new LongDataPoint("sensor_1", 1);
          DataPoint dPoint2 = new LongDataPoint("sensor_2", 2);
          DataPoint dPoint3 = new LongDataPoint("sensor_3", 3);
          tsRecord.addTuple(dPoint1);
          tsRecord.addTuple(dPoint2);
          tsRecord.addTuple(dPoint3);

          // write TSRecord
          tsFileWriter.write(tsRecord);

          // close TsFile
          tsFileWriter.close();
        } catch (Throwable e) {
          e.printStackTrace();
          System.out.println(e.getMessage());
        }
      }
    }

    If you have aligned time series data, you can write a TsFile by constructing a RowBatch.

    A more thorough example can be found at /example/tsfile/src/main/java/org/apache/iotdb/tsfile/TsFileWriteWithRowBatch.java

    package org.apache.iotdb.tsfile;

    import java.io.File;
    import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType;
    import org.apache.iotdb.tsfile.file.metadata.enums.TSEncoding;
    import org.apache.iotdb.tsfile.write.TsFileWriter;
    import org.apache.iotdb.tsfile.write.schema.Schema;
    import org.apache.iotdb.tsfile.write.schema.MeasurementSchema;
    import org.apache.iotdb.tsfile.write.record.RowBatch;

    /**
     * An example of writing data with RowBatch to TsFile
     */
    public class TsFileWriteWithRowBatch {

      public static void main(String[] args) {
        try {
          String path = "test.tsfile";
          File f = new File(path);
          if (f.exists()) {
            f.delete();
          }

          Schema schema = new Schema();

          // the number of rows to include in the row batch
          int rowNum = 1000000;
          // the number of values to include in the row batch
          int sensorNum = 10;

          // add measurements into file schema (all with INT64 data type)
          for (int i = 0; i < sensorNum; i++) {
            schema.registerMeasurement(
                new MeasurementSchema("sensor_" + (i + 1), TSDataType.INT64, TSEncoding.TS_2DIFF));
          }

          // add measurements into TSFileWriter
          TsFileWriter tsFileWriter = new TsFileWriter(f, schema);

          // construct the row batch
          RowBatch rowBatch = schema.createRowBatch("device_1");

          long[] timestamps = rowBatch.timestamps;
          Object[] values = rowBatch.values;

          long timestamp = 1;
          long value = 1000000L;

          for (int r = 0; r < rowNum; r++, value++) {
            int row = rowBatch.batchSize++;
            timestamps[row] = timestamp++;
            for (int i = 0; i < sensorNum; i++) {
              long[] sensor = (long[]) values[i];
              sensor[row] = value;
            }
            // write RowBatch to TsFile once the batch is full
            if (rowBatch.batchSize == rowBatch.getMaxBatchSize()) {
              tsFileWriter.write(rowBatch);
              rowBatch.reset();
            }
          }

          // write the remaining rows of the last, partially filled RowBatch
          if (rowBatch.batchSize != 0) {
            tsFileWriter.write(rowBatch);
            rowBatch.reset();
          }

          // close TsFile
          tsFileWriter.close();
        } catch (Throwable e) {
          e.printStackTrace();
          System.out.println(e.getMessage());
        }
      }
    }

    Interface for Reading TsFile

    Before the Start

    The set of time-series data from the section “Time-series Data” is used here as a concrete example. The set shown in the following table contains one device named “device_1” with three measurements named “sensor_1”, “sensor_2” and “sensor_3”. The measurements have been simplified for illustration and contain only 4 time-value pairs each.

    A set of time-series data

    Definition of Path

    A path is a dot-separated string which uniquely identifies a time-series in TsFile, e.g., “root.area_1.device_1.sensor_1”. The last section “sensor_1” is called the measurementId, while the remaining part “root.area_1.device_1” is called the deviceId. As mentioned above, the same measurement in different devices has the same data type and encoding, and devices are also unique.

    In the read interfaces, the parameter paths indicates the measurements to be selected.

    A Path instance can be easily constructed through the class Path. For example:

    Path p = new Path("device_1.sensor_1");

    We will pass an ArrayList of paths to the final query call to support multiple paths.

    List<Path> paths = new ArrayList<Path>();
    paths.add(new Path("device_1.sensor_1"));
    paths.add(new Path("device_1.sensor_3"));
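
The split of a path into deviceId and measurementId described above can be sketched in plain Java as a cut at the last dot. This is only an illustration: PathParts is a hypothetical class, not the TsFile Path class, and the real class may handle edge cases differently.

```java
// Hypothetical helper, not the TsFile Path class: splits a path at its last dot.
class PathParts {
  final String deviceId;
  final String measurementId;

  PathParts(String fullPath) {
    int lastDot = fullPath.lastIndexOf('.');
    // both sides of the last dot must be non-empty
    if (lastDot <= 0 || lastDot == fullPath.length() - 1) {
      throw new IllegalArgumentException("expected <deviceId>.<measurementId>: " + fullPath);
    }
    deviceId = fullPath.substring(0, lastDot);
    measurementId = fullPath.substring(lastDot + 1);
  }

  public static void main(String[] args) {
    PathParts p = new PathParts("root.area_1.device_1.sensor_1");
    System.out.println(p.deviceId + " | " + p.measurementId);
  }
}
```

With this rule, “device_1.sensor_1” yields the deviceId “device_1” and the measurementId “sensor_1”.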

    Definition of Filter

    Usage Scenario

    Filter is used in TsFile reading process to select data satisfying one or more given condition(s).

    IExpression

    The IExpression is a filter expression interface and it will be passed to our final query call. We create one or more filter expressions and may use binary filter operators to link them to our final expression.

    • Create a Filter Expression

      • TimeFilter: A filter for time in time-series data.

        IExpression timeFilterExpr = new GlobalTimeExpression(TimeFilter);

        Use the static methods of TimeFilter (such as eq, lt, ltEq, gtEq and notEq, which appear in the examples in this section) to get a TimeFilter object. The value passed to them is a long variable.

      • ValueFilter: A filter for value in time-series data.

        IExpression valueFilterExpr = new SingleSeriesExpression(Path, ValueFilter);

        ValueFilter is used in the same way as TimeFilter. Just make sure that the type of the value is equal to the type of the measurement (defined in the path).

    • Binary Filter Operators

      Binary filter operators can be used to link two single expressions.

      • BinaryExpression.and(Expression, Expression): Chooses the values that satisfy both expressions.
      • BinaryExpression.or(Expression, Expression): Chooses the values that satisfy at least one expression.

    Filter Expression Examples

    • TimeFilterExpression Examples

      IExpression timeFilterExpr = new GlobalTimeExpression(TimeFilter.eq(15)); // series time = 15

      IExpression timeFilterExpr = new GlobalTimeExpression(TimeFilter.ltEq(15)); // series time <= 15

      IExpression timeFilterExpr = new GlobalTimeExpression(TimeFilter.lt(15)); // series time < 15

      IExpression timeFilterExpr = new GlobalTimeExpression(TimeFilter.notEq(15)); // series time != 15

      IExpression timeFilterExpr = BinaryExpression.and(new GlobalTimeExpression(TimeFilter.gtEq(15L)),
          new GlobalTimeExpression(TimeFilter.lt(25L))); // 15 <= series time < 25

      IExpression timeFilterExpr = BinaryExpression.or(new GlobalTimeExpression(TimeFilter.gtEq(15L)),
          new GlobalTimeExpression(TimeFilter.lt(25L))); // series time >= 15 or series time < 25
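
The difference between the and and or combinations in the last two examples can be checked with a small model in plain Java. The sketch below uses java.util.function.LongPredicate instead of the TsFile filter classes; the class TimeFilterSemantics and its helper methods are hypothetical and only mimic the comparisons named in the comments above.

```java
import java.util.function.LongPredicate;

// Plain-Java model of the time filter examples; it does not call the TsFile API.
class TimeFilterSemantics {
  static LongPredicate gtEq(long bound) {
    return t -> t >= bound;
  }

  static LongPredicate lt(long bound) {
    return t -> t < bound;
  }

  // 15 <= series time < 25
  static final LongPredicate AND_RANGE = gtEq(15L).and(lt(25L));
  // series time >= 15 or series time < 25
  static final LongPredicate OR_RANGE = gtEq(15L).or(lt(25L));

  public static void main(String[] args) {
    for (long t : new long[] {10L, 15L, 24L, 25L, 30L}) {
      System.out.println(t + " and=" + AND_RANGE.test(t) + " or=" + OR_RANGE.test(t));
    }
  }
}
```

In this model the and combination selects only timestamps from 15 up to, but not including, 25, while the or combination accepts every timestamp, because any value is either at least 15 or less than 25.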

    Read Interface

    First, we open the TsFile and get a ReadOnlyTsFile instance from a file path string (path).

    TsFileSequenceReader reader = new TsFileSequenceReader(path);
    ReadOnlyTsFile readTsFile = new ReadOnlyTsFile(reader);

    Next, we prepare the path array and the query expression, then get the final QueryExpression object through this interface:

    QueryExpression queryExpression = QueryExpression.create(paths, statement);

    The ReadOnlyTsFile class has two query methods to perform a query.

    • Method 1

      public QueryDataSet query(QueryExpression queryExpression) throws IOException

    • Method 2

      public QueryDataSet query(QueryExpression queryExpression, long partitionStartOffset, long partitionEndOffset) throws IOException

      This method is designed for advanced applications such as the TsFile-Spark Connector.

      • params : For method 2, two additional parameters are added to support partial query:

        • partitionStartOffset: start offset for a TsFile
        • partitionEndOffset: end offset for a TsFile

        What is Partial Query?

        In some distributed file systems (e.g. HDFS), a file is split into several parts which are called “Blocks” and stored on different nodes. Executing a query in parallel on each of the nodes involved is more efficient, so Partial Query is needed. Partial Query only selects the results stored in the part of a TsFile delimited by QueryConstant.PARTITION_START_OFFSET and QueryConstant.PARTITION_END_OFFSET.
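
As an illustration of where such offsets could come from, the following plain-Java sketch cuts a file of a given length into consecutive block-sized ranges. The class PartitionOffsets is hypothetical, and both the block size and the treatment of the end offset as exclusive are assumptions; how TsFile interprets the two boundaries is not specified in this section.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: cuts a file of fileLength bytes into consecutive offset ranges.
class PartitionOffsets {

  // Each element is {startOffset, endOffset}; treating the end as exclusive is an assumption.
  static List<long[]> split(long fileLength, long blockSize) {
    if (fileLength < 0 || blockSize <= 0) {
      throw new IllegalArgumentException("fileLength must be >= 0 and blockSize > 0");
    }
    List<long[]> ranges = new ArrayList<>();
    for (long start = 0; start < fileLength; start += blockSize) {
      // the last range is shortened so that it ends at the end of the file
      ranges.add(new long[] {start, Math.min(start + blockSize, fileLength)});
    }
    return ranges;
  }

  public static void main(String[] args) {
    for (long[] range : split(250L, 100L)) {
      System.out.println(range[0] + " .. " + range[1]);
    }
  }
}
```

Each pair could then be passed as partitionStartOffset and partitionEndOffset to the second query method, one call per block.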

    QueryDataSet Interface

    The query performed above returns a QueryDataSet object.

    Here are the methods most useful to users.

    • boolean hasNext();

      Returns true if this data set still has elements.

    • List<Path> getPaths();

      Gets the paths in this data set.

    • List<TSDataType> getDataTypes();

      Gets the data types. The class TSDataType is an enum class whose value is one of the following:

      BOOLEAN,
      INT32,
      INT64,
      FLOAT,
      DOUBLE,
      TEXT;

    • RowRecord next() throws IOException;

      Gets the next record.

      The class RowRecord consists of a long timestamp and a List<Field> holding the data of the different sensors. We can use two getter methods to get them.

      long getTimestamp();
      List<Field> getFields();

      To get data from one Field, use these methods:

      TSDataType getDataType();
      Object getObjectValue();

    Example for reading an existing TsFile

    You should install TsFile to your local maven repository first.

    A more thorough example with a query statement can be found at /example/tsfile/src/main/java/org/apache/iotdb/tsfile/TsFileRead.java

    package org.apache.iotdb.tsfile;

    import java.io.IOException;
    import java.util.ArrayList;
    import org.apache.iotdb.tsfile.read.ReadOnlyTsFile;
    import org.apache.iotdb.tsfile.read.TsFileSequenceReader;
    import org.apache.iotdb.tsfile.read.common.Path;
    import org.apache.iotdb.tsfile.read.expression.IExpression;
    import org.apache.iotdb.tsfile.read.expression.QueryExpression;
    import org.apache.iotdb.tsfile.read.expression.impl.BinaryExpression;
    import org.apache.iotdb.tsfile.read.expression.impl.GlobalTimeExpression;
    import org.apache.iotdb.tsfile.read.expression.impl.SingleSeriesExpression;
    import org.apache.iotdb.tsfile.read.filter.TimeFilter;
    import org.apache.iotdb.tsfile.read.filter.ValueFilter;
    import org.apache.iotdb.tsfile.read.query.dataset.QueryDataSet;

    /**
     * The class is to show how to read TsFile file named "test.tsfile".
     * The TsFile file "test.tsfile" is generated by one of the writing examples,
     * e.g. TsFileWriteWithTSRecord. Run it first to generate test.tsfile.
     */
    public class TsFileRead {

      private static void queryAndPrint(ArrayList<Path> paths, ReadOnlyTsFile readTsFile, IExpression statement)
          throws IOException {
        QueryExpression queryExpression = QueryExpression.create(paths, statement);
        QueryDataSet queryDataSet = readTsFile.query(queryExpression);
        // print every returned row
        while (queryDataSet.hasNext()) {
          System.out.println(queryDataSet.next());
        }
        System.out.println("------------");
      }

      public static void main(String[] args) throws IOException {
        // file path
        String path = "test.tsfile";

        // create reader and get the readTsFile interface
        TsFileSequenceReader reader = new TsFileSequenceReader(path);
        ReadOnlyTsFile readTsFile = new ReadOnlyTsFile(reader);

        // use these paths (all sensors) for all the queries
        ArrayList<Path> paths = new ArrayList<>();
        paths.add(new Path("device_1.sensor_1"));
        paths.add(new Path("device_1.sensor_2"));
        paths.add(new Path("device_1.sensor_3"));

        // no query statement
        queryAndPrint(paths, readTsFile, null);

        // close the reader when you are done
        reader.close();
      }
    }

    To change the TsFile configuration, get the TSFileConfig instance and call the setter you need (setXXX below is a placeholder):

    TSFileConfig config = TSFileDescriptor.getInstance().getConfig();
    config.setXXX();

    Configuration

    You can control the false positive rate of the bloom filter by changing bloomFilterErrorRate in TSFileConfig.

    # The acceptable error rate of bloom filter, should be in [0.01, 0.1], default is 0.05
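
To give a feeling for what the error rate costs in space, the sketch below applies the textbook sizing formula for a Bloom filter with an optimal number of hash functions: bits per element = -ln(p) / (ln 2)^2 and hash count = -log2(p). Whether TsFile sizes its bloom filter exactly this way is an assumption, and the class BloomFilterSizing is hypothetical.

```java
// Textbook Bloom filter sizing; not taken from the TsFile implementation.
class BloomFilterSizing {

  // Bits needed per stored element for a target false positive rate p (0 < p < 1).
  static double bitsPerElement(double errorRate) {
    if (errorRate <= 0.0 || errorRate >= 1.0) {
      throw new IllegalArgumentException("errorRate must be in (0, 1)");
    }
    double ln2 = Math.log(2.0);
    return -Math.log(errorRate) / (ln2 * ln2);
  }

  // Optimal number of hash functions, rounded to the nearest integer and at least 1.
  static long optimalHashCount(double errorRate) {
    return Math.max(1L, Math.round(bitsPerElement(errorRate) * Math.log(2.0)));
  }

  public static void main(String[] args) {
    // the two bounds of the documented range and the default value
    for (double p : new double[] {0.01, 0.05, 0.1}) {
      System.out.printf("p=%.2f bits/element=%.2f hashes=%d%n",
          p, bitsPerElement(p), optimalHashCount(p));
    }
  }
}
```

Under this formula, a lower error rate needs more bits per element, so the setting is a trade-off between filter size and the number of unnecessary metadata reads.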