7.17. Pinot Connector

    To configure the Pinot connector, create a catalog properties file in named, for example, pinot.properties, to mount the Pinot connector as the pinot catalog. Create the file with the following contents, replacing the connection properties as appropriate for your setup:

    Where the pinot.controller-urls property allows you to specify a comma separated list of the pinot controller host/port pairs.

    You can have as many catalogs as you need, so if you have additional Pinot clusters, simply add another properties file to etc/catalog with a different name (making sure it ends in .properties). For example, if you name the property file sales.properties, Presto will create a catalog named sales using the configured connector.

    In general Pinot schema to Presto schema mapping are pretty straight forward. By default, the data type mapping follows the table below.

    Since Pinot defines each field as dimension, metric or time(date_time) field, it’s possible to infer Presto data type DATE and TIMESTAMP:

    • A Pinot TIME field with timeGranularity { "TimeFormat":"EPOCH", "TimeUnit":"DAYS", "TimeUnitSize": 1 } could be map to a DATE type.
    • A Pinot TIME field with timeGranularity { "TimeFormat":"EPOCH", "TimeUnit":"MILLISECONDS", "TimeUnitSize": 1 } could be map to a TIMESTAMP type.
    • A Pinot DATE_TIME field with format 1:DAYS:EPOCH could be map to a DATE type.
    • A Pinot DATE_TIME field with format 1:MILLISECONDS:EPOCH could be map to a TIMESTAMP type.
    • pinot.infer-date-type-in-schema: This config is false by default. Setting it to true will infer a Pinot TIME/DATE_TIME field to DATE in Presto if possible.
    • pinot.infer-timestamp-type-in-schema: This config is false by default. Setting it to true will infer a Pinot TIME/DATE_TIME field to TIMESTAMP in Presto if possible.

    Below is an example with config: pinot.infer-timestamp-type-in-schema=true.

    Sample Pinot Schema:

    1. {
    2. "dimensionFieldSpecs": [
    3. {
    4. "name": "venue_name",
    5. "dataType": "STRING"
    6. },
    7. {
    8. "name": "event_name",
    9. "dataType": "STRING"
    10. {
    11. "name": "event_id",
    12. "dataType": "STRING"
    13. },
    14. {
    15. "name": "event_time",
    16. "dataType": "LONG"
    17. },
    18. {
    19. "name": "group_city",
    20. "dataType": "STRING"
    21. },
    22. {
    23. "name": "group_country",
    24. "dataType": "STRING"
    25. },
    26. {
    27. "name": "group_id",
    28. "dataType": "LONG"
    29. },
    30. {
    31. }
    32. ],
    33. "metricFieldSpecs": [
    34. {
    35. "name": "rsvp_count",
    36. "dataType": "INT"
    37. }
    38. ],
    39. "timeFieldSpec": {
    40. "incomingGranularitySpec": {
    41. "name": "mtime",
    42. "dataType": "LONG",
    43. "timeType": "MILLISECONDS"
    44. }
    45. }
    46. }

    Sample Presto Schema:

    The Pinot catalog exposes all pinot tables inside a flat schema. The schema name is immaterial when querying but running SHOW SCHEMAS, will show just one schema entry of default.

    The name of the pinot catalog is the catalog file you created above without the .properties extension.

    1. SHOW TABLES from mypinotcluster.default

    OR:

    Both of these commands will list all the tables in your pinot cluster. This is because Pinot does not have a notion of schemas.

    Consider you have a table called clicks in the mypinotcluster. You can see a list of the columns in the clicks table using either of the following:

    1. DESCRIBE mypinotcluster.dontcare.clicks;
    2. SHOW COLUMNS FROM mypinotcluster.dontcare.clicks;

    Finally, you can access the clicks table:

    The connector tries to push the maximal subquery inferred from the presto query into pinot. It can push down everything Pinot supports including aggregations, group by, all UDFs etc. It generates the correct Pinot PQL keeping Pinot’s quirks in mind.

    There are a few configurations that control this behavior:

    • pinot.prefer-broker-queries: This config is true by default. Setting it to false will also create parallel plans for aggregation and limit queries.
    • pinot.non-aggregate-limit-for-broker-queries: To prevent overwhelming the broker, the connector only allows querying the pinot broker for short queries. We define a query to be either an aggregation (or group-by) query or a query with a limit less than the value configured for pinot.non-aggregate-limit-for-broker-queries. The default value for this limit is 25K rows.