Using M3DB as a general-purpose time series database

Data Model

M3DB’s data model allows multiple namespaces, each of which can be configured and tuned independently.

Each namespace can also be configured with its own schema (see “Schema Modeling” section below).

Within a namespace, each time series is uniquely identified by an ID which can be any valid string / byte array. In addition, tags can be attached to any series which makes the series queryable using the inverted index.

M3DB’s inverted index supports term (exact match) and regular expression queries over all tag values, and individual tag queries can be arbitrarily combined using AND, OR, and NOT operators.

For example, imagine an application that tracks a fleet of vehicles. One potential structure for the time series could be as follows:

This would allow users to issue queries that answer questions like:

“What time series IDs exist for any vehicle type operating in San Francisco?”
“What time series IDs exist for scooters that are NOT operating in Chicago?”
“What time series IDs exist where the ‘version’ tag matches the regular expression 0_1_[12]?”

TODO(rartoul): Discuss the ability to perform limited amounts of aggregation queries here as well.

TODO(rartoul): Discuss ID / tags mutability.

Each time series in M3DB stores data as a stream of datapoints in the form of <timestamp, value> tuples. Timestamp resolution can be as granular as individual nanoseconds.

Schema Modeling

Every M3DB namespace can be configured with a Protobuf-defined schema that every value in the time series must conform to.

For example, continuing with the vehicle fleet tracking example introduced earlier, a schema might look as follows:

While M3DB strives to support the entire proto3 language spec, only the following features are currently supported:

Scalar values
Nested messages
Map fields
Reserved fields

The following features are currently not supported:

Any fields
Options of any type
Custom field types

Compression

While M3DB supports schemas that contain nested messages, repeated fields, and map fields, it can currently only compress top-level scalar fields effectively. For example, M3DB can compress every field in the following schema:

syntax = "proto3";
message VehicleLocation {
  double latitude = 1;
  double fuel_percent = 3;
  string status = 4;
}

However, it will not apply any form of compression to the attributes field in this schema:

While the latter schema is valid, the attributes field will not be compressed; users should weigh the tradeoffs between a more expressive schema and better compression for each use case.

For more details on the compression scheme and its limitations, review the documentation for the compression scheme.

M3DB setup

For more advanced setups, it’s best to follow the guides on configuring an M3DB cluster, including on Kubernetes. However, this tutorial will walk you through configuring a single-node setup locally for development.

Next, run the following command to start the M3DB container:

Breaking that down:

All the -p flags expose the necessary ports.
The -v $(pwd)/m3db_data:/var/lib/m3db section creates a bind mount that enables M3DB to persist data between container restarts.
The -v <PATH_TO_YAML_CONFIG_FILE>:/etc/m3dbnode/m3dbnode.yml section mounts the specified configuration file in the container, which allows configuration changes to be applied by restarting the container (rather than rebuilding it). The example configuration file can be used as a good starting point. It configures the database to have the Protobuf feature enabled and expects one namespace with the name default and a Protobuf message name of VehicleLocation for the schema. You’ll need to update that portion of the config if you intend to use a different schema than the example one used throughout this document. Note that hard-coding paths to the schema should only be done for local development and testing. For production use cases, M3DB supports storing the current schema in etcd so that it can be updated dynamically. TODO(rartoul): Document how to do that as well as what kind of schema changes are safe / backwards compatible.
The remaining -v section mounts the Protobuf file containing the schema in the container. As with the configuration file, this allows the schema to be changed by restarting the container instead of rebuilding it. You can use the example schema as a starting point. It is also the same example schema that is used by the sample Go program discussed below in the “Clients” section. Also see the bullet point above about not hard-coding schema files in production.

Once the M3DB container has started, issue the following curl command to create the default namespace:

curl -X POST http://localhost:7201/api/v1/database/create -d '{
  "type": "local",
  "namespaceName": "default",
  "retentionTime": "4h"
}'

Note that the retentionTime is set artificially low to conserve resources.

Once a namespace has finished bootstrapping, you must mark it as ready before it can receive traffic by using the namespace ready endpoint of the HTTP API. A successful request returns a response like the following:

{
  "ready": true
}

At this point the namespace should be ready to serve writes and reads.

Clients

Note: M3DB only has a Go client; this is unlikely to change in the future because the client is “fat” and contains a substantial amount of logic that would be difficult to port to other languages.

Users interested in interacting with M3DB directly from Go applications can reference the sample Go program to get an understanding of how to interact with M3DB in Go. Note that the sample program uses the same default namespace and schema used throughout this document, so it can be run directly against an M3DB Docker container set up using the “M3DB setup” instructions above.

M3DB will eventually support other languages by exposing an M3Coordinator endpoint that will allow users to write to and read from M3DB directly using gRPC/JSON.