Model Metadata Storage
SQLFlow models can be saved in MySQL, Hive, OSS, and other places. The model may be trained locally inside the step's Docker image, or remotely from it (as a job on a third-party platform). As a result, we currently have no unified way to store the model. For the default submitter, which may use MySQL or Hive as its data source, we store the model together with some metadata in a zipped file, and finally store that file in the database. In this case, only one piece of metadata is saved: the `TrainSelect` SQL statement. For the PAI submitter, we store the model on OSS with more metadata, such as the `Estimator` and the `FeatureColumnNames`.
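The default submitter's flow described above can be sketched as follows. This is a minimal illustration, not SQLFlow's actual code: the helper name `pack_model` and the in-archive file name `train_select.sql` are assumptions made for the example.

```python
import io
import zipfile


def pack_model(model_files, train_select):
    """Zip the trained model files together with the only metadata the
    default submitter currently saves: the TrainSelect SQL statement.

    model_files: mapping of file name -> bytes content (hypothetical layout).
    Returns the zip archive as bytes, ready to be stored as a blob
    in a database such as MySQL or Hive.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, content in model_files.items():
            zf.writestr(name, content)
        # The TrainSelect statement is the single piece of metadata saved.
        zf.writestr("train_select.sql", train_select)
    return buf.getvalue()


blob = pack_model({"model.bin": b"\x00\x01"}, "SELECT * FROM train_table")
```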
First, we no longer save the model metadata in the step's Go code, because the actual training may run remotely from that image. Instead, we move the saving work into the Python code that performs the training. A file named `model_meta.json` is dedicated to storing the metadata. Basically, we serialize all fields of the `Train` IR into this file; additionally, the evaluation result is stored if it exists.
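The saving step can be sketched as below. Treating the `Train` IR as a plain dict is a simplifying assumption for the example; the field values shown are illustrative, not real output.

```python
import json


def save_model_meta(train_ir_fields, evaluation_result=None,
                    path="model_meta.json"):
    """Serialize all fields of the Train IR (represented here as a dict,
    a simplifying assumption) into model_meta.json. The evaluation
    result is attached only when one exists."""
    meta = dict(train_ir_fields)
    if evaluation_result is not None:
        meta["evaluation"] = evaluation_result
    with open(path, "w") as f:
        json.dump(meta, f, indent=2)


# Illustrative usage with made-up field values.
save_model_meta(
    {
        "Estimator": "DNNClassifier",
        "FeatureColumnNames": ["sepal_length", "sepal_width"],
        "TrainSelect": "SELECT * FROM iris.train",
    },
    evaluation_result={"accuracy": 0.95},
)
```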
When releasing a trained model to Model Zoo, we can dump the zipped model directory to the local file system, then extract the metadata using the command:
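As an illustration of the extraction step, reading the metadata back out of a dumped model archive could look like the sketch below; it assumes `model_meta.json` sits at the root of the zip, which is an assumption about the layout, not a documented fact.

```python
import json
import zipfile


def read_model_meta(zip_file):
    """Extract and parse model_meta.json from a dumped model zip.

    zip_file may be a path or a file-like object. Assumes the metadata
    file is stored at the archive root (hypothetical layout).
    """
    with zipfile.ZipFile(zip_file) as zf:
        return json.loads(zf.read("model_meta.json").decode("utf-8"))
```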
First, we implement this feature for the default submitter, which stores the model in a data store such as MySQL or Hive. Then we implement it for OSS storage, which is not really a database.