Ingestion troubleshooting FAQ

We recommend using batch ingestion methods for historical data in production.

Batch Ingestion

If you are trying to batch load historical data but no events are being loaded, make sure the interval of your ingestion spec actually encapsulates the interval of your data. Events outside this interval are dropped.
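For example, in a native batch ingestion spec, the intervals field of the granularitySpec controls which events are kept. A minimal sketch, where the datasource name, timestamp column, and interval are hypothetical:

    {
      "type": "index_parallel",
      "spec": {
        "dataSchema": {
          "dataSource": "my_datasource",
          "timestampSpec": { "column": "timestamp", "format": "iso" },
          "granularitySpec": {
            "segmentGranularity": "day",
            "queryGranularity": "none",
            "intervals": ["2023-01-01/2023-02-01"]
          }
        }
      }
    }

With this spec, any event whose timestamp falls outside 2023-01-01/2023-02-01 is silently dropped.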

Druid can ingest JSON, CSV, TSV, and other delimited data out of the box. Druid supports single-value string dimensions and multi-value string dimensions (an array of strings), as well as long, float, and double numeric columns.
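As a sketch, these choices are made in the inputFormat and dimensionsSpec sections of an ingestion spec (the paths and field names below are hypothetical):

    "ioConfig": {
      "type": "index_parallel",
      "inputSource": { "type": "local", "baseDir": "/data", "filter": "*.json" },
      "inputFormat": { "type": "json" }
    },
    "dimensionsSpec": {
      "dimensions": [
        "country",
        "tags",
        { "type": "long", "name": "responseCode" },
        { "type": "double", "name": "score" }
      ]
    }

Here country and tags are string dimensions (tags becomes a multi-value dimension if the input contains arrays of strings), while responseCode and score are typed numeric columns.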

Not all of my events were ingested

Druid will reject events outside of a window period. The best way to see whether events are being rejected is to check the Druid ingestion metrics, such as ingest/events/thrownAway.

If the number of ingested events seems correct, make sure your query is correctly formed. If you included a count aggregator in your ingestion spec, you will need to query for the results of this aggregate with a longSum aggregator. Issuing a query with a count aggregator will count the number of Druid rows, which, because of rollup, can be smaller than the number of ingested events.
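For instance, assuming an ingestion spec that defined a count metric like this hypothetical one:

    "metricsSpec": [ { "type": "count", "name": "count" } ]

you would sum that column at query time with a longSum aggregator; a query-time count aggregator instead counts post-rollup Druid rows. A sketch of a query showing both (datasource name and interval are hypothetical):

    {
      "queryType": "timeseries",
      "dataSource": "my_datasource",
      "intervals": ["2023-01-01/2023-02-01"],
      "granularity": "all",
      "aggregations": [
        { "type": "longSum", "name": "ingestedEvents", "fieldName": "count" },
        { "type": "count", "name": "druidRows" }
      ]
    }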

Where do my Druid segments end up after ingestion?

Depending on what druid.storage.type is set to, Druid will upload segments to deep storage. Local disk is the default deep storage.
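As a sketch, the relevant common runtime properties look like this (the directory and bucket names are hypothetical; S3 also requires loading the druid-s3-extensions extension):

    # Local disk (the default)
    druid.storage.type=local
    druid.storage.storageDirectory=var/druid/segments

    # Or, for example, S3:
    # druid.storage.type=s3
    # druid.storage.bucket=my-druid-bucket
    # druid.storage.baseKey=druid/segments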

Common reasons that segment hand-off fails are as follows:

  1. Druid is unable to write to the metadata storage. Make sure your metadata storage configuration is correct (see the sketch after this list).

  2. Historical processes are out of capacity and cannot download any more segments. You’ll see exceptions in the Coordinator logs if this occurs, and the Coordinator console will show that the Historicals are near capacity.

  3. Segments are corrupt and cannot be downloaded. You’ll see exceptions in your Historical processes if this occurs.

  4. Deep storage is improperly configured. Make sure that your segment actually exists in deep storage and that the Coordinator logs have no errors.
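For reason 1, a minimal sketch of a metadata storage configuration (the hostname and credentials are hypothetical placeholders; PostgreSQL also requires the postgresql-metadata-storage extension):

    druid.metadata.storage.type=postgresql
    druid.metadata.storage.connector.connectURI=jdbc:postgresql://db.example.com:5432/druid
    druid.metadata.storage.connector.user=druid
    druid.metadata.storage.connector.password=changeme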

How do I get HDFS to work?

Make sure to include the druid-hdfs-storage extension and all of the Hadoop configuration and dependencies (which can be obtained by running the hadoop classpath command on a machine where Hadoop has been set up) on the classpath. Also, provide the necessary HDFS settings as described in the deep storage documentation.
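A minimal sketch of the HDFS-related properties (the NameNode address and path are hypothetical):

    druid.extensions.loadList=["druid-hdfs-storage"]
    druid.storage.type=hdfs
    druid.storage.storageDirectory=hdfs://namenode.example.com:9000/druid/segments

In addition, place your Hadoop configuration files (core-site.xml and hdfs-site.xml) on the Druid classpath.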

I don’t see my Druid segments on my Historical processes

You can use a segment metadata query to see the dimensions and metrics that have been created for your datasource. Make sure that the names of the aggregators you use in your query match one of these metrics, and that the query interval you specify matches a valid time range where data exists.
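A minimal segment metadata query, POSTed to the Broker's /druid/v2/ endpoint (the datasource name and interval are hypothetical):

    {
      "queryType": "segmentMetadata",
      "dataSource": "my_datasource",
      "intervals": ["2023-01-01/2023-02-01"]
    }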

How can I reindex existing data in Druid with schema changes?

You can use the DruidInputSource with the Parallel task to ingest existing Druid segments using a new schema and change the name, dimensions, metrics, rollup, etc. of the segments. See the DruidInputSource documentation for more details. Or, if you use Hadoop-based ingestion, you can use the “dataSource” input spec to do reindexing.
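A sketch of such a reindexing spec, assuming a hypothetical source datasource my_datasource whose reindexed data is written to my_datasource_v2 with a new dimension list:

    {
      "type": "index_parallel",
      "spec": {
        "ioConfig": {
          "type": "index_parallel",
          "inputSource": {
            "type": "druid",
            "dataSource": "my_datasource",
            "interval": "2023-01-01/2023-02-01"
          }
        },
        "dataSchema": {
          "dataSource": "my_datasource_v2",
          "timestampSpec": { "column": "__time", "format": "millis" },
          "dimensionsSpec": { "dimensions": ["country", "tags"] },
          "granularitySpec": {
            "segmentGranularity": "day",
            "queryGranularity": "none",
            "intervals": ["2023-01-01/2023-02-01"]
          }
        }
      }
    }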

See the Update existing data section of the data management page for more details.

How can I change the granularity of existing data in Druid?

In many situations you may want to lower the granularity of older data: for example, data older than one month keeps only hour-level granularity, while newer data keeps minute-level granularity. This use case is the same as reindexing.

To do this, use the DruidInputSource and run a Parallel task. The DruidInputSource allows you to read existing segments from Druid, aggregate them, and feed them back into Druid. It also allows you to filter the data in those segments while feeding it back in, so if there are rows you want to delete, you can simply filter them out during re-ingestion. Typically this runs as a batch job that, say, feeds in one chunk of data each day and aggregates it (see the sketch below). Or, if you use Hadoop-based ingestion, you can use the “dataSource” input spec to do reindexing.
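As a sketch, the relevant parts of such a spec can coarsen the query granularity and filter out unwanted rows at the same time (the dimension name and values are hypothetical; rows matching the filter are kept, so the not-filter drops the spam rows):

    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "druid",
        "dataSource": "my_datasource",
        "interval": "2023-01-01/2023-02-01",
        "filter": {
          "type": "not",
          "field": { "type": "selector", "dimension": "isSpam", "value": "true" }
        }
      }
    },
    "granularitySpec": {
      "segmentGranularity": "month",
      "queryGranularity": "hour",
      "rollup": true,
      "intervals": ["2023-01-01/2023-02-01"]
    }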

See the Update existing data section of the data management page for more details.

More information

Getting data into Druid can definitely be difficult for first-time users. Please don’t hesitate to ask questions in our IRC channel or on our Google Groups page.