5.1. Alluxio Cache Service
Presto can use Alluxio as a distributed caching file system on top of persistent storage, including file systems such as HDFS and object stores such as AWS S3, Google Cloud Storage, and Azure Blob Storage. Users may either preload data into Alluxio with the Alluxio command-line tools before running Presto queries, or rely on Alluxio to transparently cache the most recently or most frequently accessed data based on the data access pattern.
Using Alluxio Structured Data Service
In addition to caching data as a file system, Alluxio can also present data abstracted as tables via the Alluxio Structured Data Service. The Alluxio catalog is the main component responsible for managing structured data metadata and for caching that information from the underlying table metastore (such as Hive Metastore). After an existing table metastore is attached to the Alluxio catalog, the catalog caches the table metadata from the underlying metastore and serves that information to Presto. When Presto asks the Alluxio catalog for table metadata, the catalog automatically returns the Alluxio locations of the files, which removes the need to modify any locations in the existing Hive Metastore. As a result, when Presto uses the Alluxio catalog, the table metadata is cached in the catalog and the file contents are cached by Alluxio's file system caching.
Once the table metastore is attached, configure a Presto catalog to connect to the Alluxio catalog:
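A minimal sketch of such a catalog file follows. It assumes the file is saved as etc/catalog/alluxio.properties (the file name is an illustrative choice) and that the Hive connector's Alluxio metastore properties are available in the Presto version in use. HOSTNAME:PORT is a placeholder for the Alluxio master address, not a real value.

```properties
# etc/catalog/alluxio.properties (hypothetical file name; the file name determines the catalog name)
connector.name=hive-hadoop2
# Read table metadata from the Alluxio catalog instead of directly from the Hive Metastore
hive.metastore=alluxio
# Placeholder: replace with the host and RPC port of the Alluxio master
hive.metastore.alluxio.master.address=HOSTNAME:PORT
```

With a catalog file named this way, tables would typically be addressed in queries as alluxio.<schema>.<table>, where the schema corresponds to a database attached to the Alluxio catalog.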