Deploy Transformer with InferenceService

    The KServe.KFModel base class mainly defines three handlers: preprocess, predict and postprocess. These handlers are executed in sequence, and the output of preprocess is passed to predict as its input. When predictor_host is passed, the predict handler by default makes an HTTP call to the predictor URL and gets back a response, which is then passed to the postprocess handler. KServe automatically fills in predictor_host for the Transformer and handles the call to the Predictor. For a gRPC predictor, you currently need to override the predict handler to make the gRPC call.
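    The handler sequence can be sketched in plain Python. This is only an illustration under stated assumptions: SketchModel is a hypothetical stand-in, not the KServe base class, and send is an injected callable that replaces the real HTTP client so the example runs without a cluster. The host name predictor.example.local is made up; the URL shape follows the KServe V1 predict path /v1/models/<name>:predict.

```python
import json
from typing import Callable, Dict


class SketchModel:
    """Hypothetical stand-in that mimics the handler chain; not the KServe class."""

    def __init__(self, name: str, predictor_host: str,
                 send: Callable[[str, bytes], bytes]):
        self.name = name
        self.predictor_host = predictor_host
        self.send = send  # injected transport, stands in for the HTTP client

    def preprocess(self, request: Dict) -> Dict:
        return request  # default: pass the request through unchanged

    def predict(self, request: Dict) -> Dict:
        # Default behaviour: forward the request to the predictor URL.
        url = f"http://{self.predictor_host}/v1/models/{self.name}:predict"
        body = json.dumps(request).encode("utf-8")
        return json.loads(self.send(url, body))

    def postprocess(self, response: Dict) -> Dict:
        return response  # default: pass the response through unchanged

    def handle(self, request: Dict) -> Dict:
        # Handlers run in sequence; each output feeds the next handler.
        return self.postprocess(self.predict(self.preprocess(request)))


def fake_predictor(url: str, body: bytes) -> bytes:
    # Pretend predictor: one dummy prediction per instance, echoing the URL.
    instances = json.loads(body)["instances"]
    reply = {"predictions": [0 for _ in instances], "url": url}
    return json.dumps(reply).encode("utf-8")


model = SketchModel("mnist", "predictor.example.local", fake_predictor)
result = model.handle({"instances": [[1, 2], [3, 4]]})
print(result["predictions"])
```

    Injecting the transport keeps the chain testable; a gRPC predictor would correspond to replacing the body of predict rather than the other two handlers.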

    To implement a Transformer, derive from the base KFModel class and override the preprocess and postprocess handlers with your own transformation logic.
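    As a rough sketch of that pattern, the class below overrides only preprocess and postprocess. BaseModel is a hypothetical stand-in for the KServe base class so that the snippet runs on its own; the pixel scaling and the arg-max labelling are illustrative choices, not the transformation used by the linked example.

```python
from typing import Dict, List


class BaseModel:
    """Hypothetical stand-in for the KServe base class, so the sketch runs alone."""

    def __init__(self, name: str):
        self.name = name


class ImageTransformer(BaseModel):
    """Illustrative transformer: scales pixels before predict, labels scores after."""

    def __init__(self, name: str, predictor_host: str):
        super().__init__(name)
        self.predictor_host = predictor_host

    def preprocess(self, inputs: Dict) -> Dict:
        # Scale raw 0-255 pixel values into the 0.0-1.0 range.
        scaled = [[pixel / 255.0 for pixel in image]
                  for image in inputs["instances"]]
        return {"instances": scaled}

    def postprocess(self, inputs: Dict) -> Dict:
        # Replace each score vector with the index of its highest score.
        labels: List[int] = [scores.index(max(scores))
                             for scores in inputs["predictions"]]
        return {"predictions": labels}


transformer = ImageTransformer("mnist", predictor_host="predictor.example.local")
print(transformer.preprocess({"instances": [[0, 255, 51]]}))
print(transformer.postprocess({"predictions": [[0.1, 0.7, 0.2]]}))
```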

    Please see the code example here

    docker build -t {username}/image-transformer:latest -f transformer.Dockerfile .
    docker push {username}/image-transformer:latest

    By default, InferenceService uses TorchServe to serve PyTorch models, and the models are loaded from a model repository in the KServe example GCS bucket according to the model repository layout. The model repository contains an MNIST model, but you can store more than one model there. In the Transformer image you can create a single transformer class for all the models in the repository if they can share the same transformer, or maintain a map from model name to transformer class so that KServe knows which transformer to use for the corresponding model.
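    The map from model name to transformer class could look roughly like the sketch below. All class names, the TRANSFORMERS dictionary, the transformer_for helper and the second model name cifar10 are hypothetical and serve only to illustrate the lookup; the classes are plain Python objects rather than KServe models.

```python
from typing import Dict, Type


class PassThroughTransformer:
    """Shared default transformer: leaves the request unchanged."""

    def __init__(self, name: str):
        self.name = name

    def preprocess(self, inputs: Dict) -> Dict:
        return inputs


class MnistTransformer(PassThroughTransformer):
    """Model-specific transformer: scales 0-255 pixel values to 0.0-1.0."""

    def preprocess(self, inputs: Dict) -> Dict:
        scaled = [[pixel / 255.0 for pixel in image]
                  for image in inputs["instances"]]
        return {"instances": scaled}


# Map from model name to the transformer class that should handle it.
TRANSFORMERS: Dict[str, Type[PassThroughTransformer]] = {
    "mnist": MnistTransformer,
}


def transformer_for(model_name: str) -> PassThroughTransformer:
    # Models without a dedicated entry fall back to the shared transformer.
    transformer_class = TRANSFORMERS.get(model_name, PassThroughTransformer)
    return transformer_class(model_name)


print(type(transformer_for("mnist")).__name__)
print(type(transformer_for("cifar10")).__name__)
```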

    Note

    The STORAGE_URI environment variable is a built-in variable that injects the storage initializer into a custom container, just as the StorageURI field does for prepackaged predictors. The downloaded artifacts are stored under /mnt/models.
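    Inside a custom container, a quick way to check what the storage initializer downloaded is to list the files below /mnt/models. The helper below is a minimal sketch; list_artifacts and the demo file name model-store/mnist.mar are hypothetical, and the demo runs against a temporary directory so that it also works outside a cluster.

```python
import os
import tempfile
from typing import List

MODEL_DIR = "/mnt/models"  # download location named in the note above


def list_artifacts(model_dir: str = MODEL_DIR) -> List[str]:
    """Return the relative paths of all files below model_dir, sorted."""
    found = []
    for root, _dirs, files in os.walk(model_dir):
        for file_name in files:
            full_path = os.path.join(root, file_name)
            found.append(os.path.relpath(full_path, model_dir))
    return sorted(found)


# Demo against a temporary directory with a made-up artifact layout.
with tempfile.TemporaryDirectory() as demo_dir:
    os.makedirs(os.path.join(demo_dir, "model-store"))
    open(os.path.join(demo_dir, "model-store", "mnist.mar"), "w").close()
    demo_listing = list_artifacts(demo_dir)

print(demo_listing)
```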

    kubectl apply -f transformer.yaml

    Expected Output

    The first step is to determine the ingress IP and port and set INGRESS_HOST and INGRESS_PORT.

    # SERVICE_NAME must already hold the name of the deployed InferenceService
    MODEL_NAME=mnist
    INPUT_PATH=@./input.json
    SERVICE_HOSTNAME=$(kubectl get inferenceservice $SERVICE_NAME -o jsonpath='{.status.url}' | cut -d "/" -f 3)
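    For readers who prefer a script, the sketch below mirrors the hostname extraction and shows how the variables could be combined into a prediction request. It assumes the usual KServe pattern of sending the request to the ingress address with the service hostname in the Host header; the sample status URL, localhost, 8080 and both helper names are hypothetical, and the request is only built, not sent.

```python
import urllib.request
from urllib.parse import urlparse


def service_hostname(status_url: str) -> str:
    # Same result as `cut -d "/" -f 3` on a URL of the form scheme://host/path.
    return urlparse(status_url).netloc


def build_predict_request(ingress_host: str, ingress_port: str, hostname: str,
                          model_name: str, payload: bytes) -> urllib.request.Request:
    # The request targets the ingress; the Host header selects the service.
    url = f"http://{ingress_host}:{ingress_port}/v1/models/{model_name}:predict"
    headers = {"Host": hostname, "Content-Type": "application/json"}
    return urllib.request.Request(url, data=payload, headers=headers, method="POST")


# Hypothetical values standing in for the shell variables above.
hostname = service_hostname("http://my-service.default.example.com")
request = build_predict_request("localhost", "8080", hostname, "mnist",
                                b'{"instances": []}')
print(request.full_url)
print(request.get_header("Host"))
```

    Passing the built request to urllib.request.urlopen would send it; that step is left out here because it needs a reachable ingress.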

    Expected Output