Model Metadata Storage

    SQLFlow models can be saved in MySQL, Hive, OSS and other places. The model may be trained locally, side by side with the step docker image, or remotely from it (as a job on a third-party platform). As a result, we currently do not have a unified way to store the model. For the default submitter, which may use MySQL or Hive as its data source, we pack the model together with some metadata into a zipped file and store that file in a database. In this case only one piece of metadata is saved: the TrainSelect SQL statement. For the PAI submitter, we store the model to OSS with more metadata, such as the Estimator and the FeatureColumnNames.
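
    The two cases can be sketched roughly as below. The field names (original_sql, estimator, feature_column_names) are illustrative stand-ins for the metadata described above, not the exact keys SQLFlow uses.

    ```python
    # Default submitter: the model dir is zipped and written into the database;
    # the only metadata kept today is the TrainSelect SQL statement.
    default_submitter_meta = {
        "original_sql": "SELECT * FROM iris.train TO TRAIN DNNClassifier ...",
    }

    # PAI submitter: the model is written to OSS together with richer metadata.
    pai_submitter_meta = {
        "original_sql": "SELECT * FROM iris.train TO TRAIN DNNClassifier ...",
        "estimator": "DNNClassifier",
        "feature_column_names": ["sepal_length", "sepal_width",
                                 "petal_length", "petal_width"],
    }
    ```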

    First, we no longer save the model metadata from the step Go code, because the real training work may run remotely from that image. Instead, we move the saving work into the Python code that does the actual training. A file named model_meta.json is dedicated to storing the metadata. Basically, we can serialize all fields of the Train IR to this file. Additionally, the evaluation result will be stored if it exists.
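
    A minimal sketch of this saving step on the Python side, assuming the Train IR fields are already available as a dict (the function name and field layout are hypothetical):

    ```python
    import json
    import os

    def save_model_metadata(model_dir, train_ir_fields, evaluation=None):
        """Write model_meta.json into the trained model directory.

        train_ir_fields is assumed to be a dict holding the serialized fields
        of the Train IR (e.g. the original SQL statement, estimator name,
        feature column names); evaluation is an optional dict of metrics.
        """
        meta = dict(train_ir_fields)
        if evaluation is not None:
            meta["evaluation"] = evaluation
        with open(os.path.join(model_dir, "model_meta.json"), "w") as f:
            json.dump(meta, f, indent=2)
    ```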

    When releasing a trained model to the Model Zoo, we can dump the zipped model dir to the local file system and then extract the metadata from it.
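
    A small sketch of this extraction step, assuming the zipped model dir contains the model_meta.json described above (paths and helper names are illustrative):

    ```python
    import json
    import zipfile

    def load_model_metadata(zipped_model_path):
        """Read model_meta.json out of a dumped, zipped model directory."""
        with zipfile.ZipFile(zipped_model_path) as zf:
            with zf.open("model_meta.json") as f:
                return json.load(f)

    # Example usage: print the metadata of a locally dumped model.
    meta = load_model_metadata("/tmp/my_model.zip")
    print(json.dumps(meta, indent=2))
    ```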

    First, we implement this feature for the default submitter, which stores the model in data storage such as MySQL or Hive. Then we implement the feature on OSS storage, which is not really a database.
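
    For the OSS case, which stores objects rather than database rows, uploading the zipped model dir (including model_meta.json) could look roughly like the following sketch using the oss2 SDK; the endpoint, bucket name, and object key are hypothetical, and the actual SQLFlow code may differ.

    ```python
    import oss2

    def upload_model_to_oss(zipped_model_path, oss_key,
                            access_key_id, access_key_secret,
                            endpoint="https://oss-cn-hangzhou.aliyuncs.com",
                            bucket_name="sqlflow-models"):
        """Upload a zipped model dir as a single OSS object."""
        auth = oss2.Auth(access_key_id, access_key_secret)
        bucket = oss2.Bucket(auth, endpoint, bucket_name)
        bucket.put_object_from_file(oss_key, zipped_model_path)
    ```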