Catalogs (Beta)
One of the most crucial aspects of data processing is managing metadata. It may be transient metadata, like temporary tables or UDFs registered against the table environment, or permanent metadata, like that in a Hive Metastore. Catalogs provide a unified API for managing metadata and making it accessible from the Table API and SQL queries.
Flink sessions always have a built-in catalog named default_catalog, which has a built-in database named default_database. All temporary metadata, such as tables defined using TableEnvironment#registerTable, is registered to this catalog.
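Because temporary objects land in the built-in catalog and database, they are also addressable under fully qualified names. A small sketch in the SQL CLI (the view name is illustrative, and assumes a Flink version that supports CREATE TEMPORARY VIEW):

```sql
-- A temporary view is registered in the built-in catalog and database,
-- so it can also be referenced under its fully qualified name.
Flink SQL> CREATE TEMPORARY VIEW my_view AS SELECT 1;
Flink SQL> SELECT * FROM default_catalog.default_database.my_view;
```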
HiveCatalog
Warning: The Hive Metastore stores all meta-object names in lower case. This is unlike GenericInMemoryCatalog, which is case-sensitive.
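The case-sensitivity difference matters when mixed-case names are used. A hedged sketch of the effect (catalog and table names are illustrative, and the connector properties are elided):

```sql
-- With a HiveCatalog, the Metastore lower-cases object names on creation,
-- so a mixed-case name must later be referenced in lower case.
Flink SQL> USE CATALOG myhive;
Flink SQL> CREATE TABLE MyTable (id INT) WITH ('connector' = '...');
Flink SQL> SELECT * FROM mytable;  -- stored and resolved as 'mytable'
```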
Catalogs are pluggable, and users can develop custom catalogs by implementing the Catalog interface. To use custom catalogs in the SQL CLI, users should develop both a catalog and its corresponding catalog factory by implementing the CatalogFactory interface.
Catalog API
Registering a Catalog
Users can register additional catalogs into an existing Flink session.
All catalogs defined using YAML must provide a type property that specifies the type of catalog. The following types are supported out of the box: generic_in_memory (for GenericInMemoryCatalog) and hive (for HiveCatalog).
catalogs:
  - name: myCatalog
    type: hive
    hive-conf-dir: ...
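In recent Flink versions a catalog can also be created directly from SQL, which avoids the YAML file entirely. A sketch assuming the Hive dependencies are on the classpath (the catalog name and hive-conf-dir path are illustrative):

```sql
Flink SQL> CREATE CATALOG myCatalog WITH (
  'type' = 'hive',
  'hive-conf-dir' = '/opt/hive-conf'
);
```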
The current catalog and database can be changed programmatically or from the SQL CLI:
tableEnv.useCatalog("myCatalog");
Flink SQL> USE CATALOG myCatalog;
Flink SQL> USE myDB;
Metadata from catalogs other than the current catalog is accessible by providing fully qualified names in the form catalog.database.object.
Flink SQL> SELECT * FROM not_the_current_catalog.not_the_current_db.my_table;
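Conversely, once the current catalog and database are set, objects in them can be referenced without qualification. A small sketch (catalog, database, and table names are illustrative):

```sql
Flink SQL> USE CATALOG myCatalog;
Flink SQL> USE myDB;
Flink SQL> SELECT * FROM my_table;  -- resolves to myCatalog.myDB.my_table
```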
List Available Catalogs
tableEnv.listCatalogs();
Flink SQL> show catalogs;
List Available Databases
tableEnv.listDatabases();
Flink SQL> show databases;
List Available Tables
tableEnv.listTables();
Flink SQL> show tables;