12.2. Connectors

    Instances of your connector splits.

    The method indicates the node affinity of a split. It has three options:

    HARD_AFFINITY: Split is NOT remotely accessible and has to be on specific nodes

    SOFT_AFFINITY: Split is remotely accessible, but the scheduler should prefer the nodes the split provides

    NO_PREFERENCE: Split is remotely accessible and can be on any node

    The method provides a list of preferred nodes for the scheduler to pick from.

    If the strategy is HARD_AFFINITY, the scheduler will respect the preference. If the strategy is SOFT_AFFINITY, the scheduler will prioritize the provided nodes, but there is no guarantee that it will pick them if they are busy. An empty list indicates no preference.
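    The interaction between the affinity strategy and the preferred node list can be sketched as below. This is an illustration only, not Presto's scheduler: the enum AffinitySketch, the class NodeChoiceSketch, and the method candidates are hypothetical names, nodes are plain strings, and the sketch assumes that an empty preferred list means no preference under every strategy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the three strategy values named above; not Presto's own type.
enum AffinitySketch { HARD_AFFINITY, SOFT_AFFINITY, NO_PREFERENCE }

final class NodeChoiceSketch {
    private NodeChoiceSketch() {}

    /**
     * Returns node names in the order a scheduler might try them for one split.
     *
     * @param preferred the split's preferred nodes (empty means no preference)
     * @param allNodes  every worker node known to the scheduler
     * @param busy      nodes that currently have no spare capacity
     */
    static List<String> candidates(AffinitySketch strategy,
                                   List<String> preferred,
                                   List<String> allNodes,
                                   Set<String> busy) {
        List<String> result = new ArrayList<>();
        if (!preferred.isEmpty()) {
            if (strategy == AffinitySketch.HARD_AFFINITY) {
                // Hard affinity: only the listed nodes are acceptable, busy or not.
                result.addAll(preferred);
                return result;
            }
            if (strategy == AffinitySketch.SOFT_AFFINITY) {
                // Soft affinity: try preferred nodes first, skipping busy ones.
                for (String node : preferred) {
                    if (!busy.contains(node)) {
                        result.add(node);
                    }
                }
            }
        }
        // Soft affinity fallback and NO_PREFERENCE: any idle node will do.
        for (String node : allNodes) {
            if (!busy.contains(node) && !result.contains(node)) {
                result.add(node);
            }
        }
        return result;
    }
}
```

    With SOFT_AFFINITY, a busy preferred node simply drops out of the candidate list and the split can land elsewhere, whereas HARD_AFFINITY never falls back to other nodes.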

    ConnectorFactory

    • ConnectorSplitManager
    • ConnectorHandleResolver

    The connector metadata interface has many important methods that allow Presto to list schemas, tables, and columns, and to retrieve other metadata about a particular data source.

    This interface is too big to list in this documentation, but if you are interested in seeing strategies for implementing these methods, look at an existing implementation such as the Cassandra connector. If your underlying data source supports schemas, tables, and columns, this interface should be straightforward to implement. If you are attempting to adapt something that is not a relational database (as the Example HTTP connector does), you may need to get creative about how you map your data source to Presto’s schema, table, and column concepts.

    The split manager partitions the data for a table into the individual chunks that Presto will distribute to workers for processing. For example, the Hive connector lists the files for each Hive partition and creates one or more splits per file. For data sources that don’t have partitioned data, a good strategy is to simply return a single split for the entire table. This is the strategy employed by the Example HTTP connector.
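    The two strategies described above can be contrasted in a small, self-contained sketch. It does not use the Presto SPI: SplitPlanningSketch, its nested Split class, and both methods are hypothetical, and the per-file variant is simplified to exactly one split per file, whereas a real connector may create several.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class SplitPlanningSketch {
    private SplitPlanningSketch() {}

    // Hypothetical split: a table name plus a file path (null means "the whole table").
    static final class Split {
        final String table;
        final String file;

        Split(String table, String file) {
            this.table = table;
            this.file = file;
        }
    }

    // Partitioned source: walk each partition's files and emit one split per file.
    static List<Split> splitsPerFile(String table, Map<String, List<String>> filesByPartition) {
        List<Split> splits = new ArrayList<>();
        for (List<String> files : filesByPartition.values()) {
            for (String file : files) {
                splits.add(new Split(table, file));
            }
        }
        return splits;
    }

    // Unpartitioned source: a single split that covers the entire table.
    static List<Split> singleSplit(String table) {
        return Collections.singletonList(new Split(table, null));
    }
}
```

    The per-file variant lets many workers read in parallel, while the single-split variant sends the whole table to one worker, which is usually acceptable only for small or indivisible sources.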