TensorFlow Serving
To deploy a model we create the following resources, as illustrated below:
- A Deployment to deploy the model using TF Serving
- A Kubernetes Service to create an endpoint for the deployment
- An Istio VirtualService to route traffic to the model and expose it through the Istio gateway
- An Istio DestinationRule to do traffic splitting
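For reference, here is a minimal sketch of the Deployment and Service (not the exact manifest from the original example); the image tag, ports, resource limits, and the mnist model path are assumptions you should adjust for your own model:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mnist
  labels:
    app: mnist
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mnist
  template:
    metadata:
      labels:
        app: mnist
    spec:
      containers:
      - name: mnist
        image: tensorflow/serving:1.11.1
        args:
        - --port=9000
        - --rest_api_port=8500
        - --model_name=mnist
        - --model_base_path=gs://kubeflow-examples-data/mnist
        ports:
        - containerPort: 9000   # gRPC
        - containerPort: 8500   # REST
        resources:
          limits:
            cpu: "4"
            memory: 4Gi
          requests:
            cpu: "1"
            memory: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: mnist-service
  labels:
    app: mnist
spec:
  type: ClusterIP
  selector:
    app: mnist
  ports:
  - name: grpc-tf-serving
    port: 9000
    targetPort: 9000
  - name: http-tf-serving
    port: 8500
    targetPort: 8500
```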
Referring to the above example, you can customize your deployment by changing the following configurations in the YAML file:
- In the Deployment resource, the model path argument points to the model. Change the value to point to your own model.
- GPU: if you want to use a GPU, add `nvidia.com/gpu: 1` to the container resources and use a GPU image, for example `tensorflow/serving:1.11.1-gpu`:
```yaml
resources:
  limits:
    cpu: "4"
    memory: 4Gi
    nvidia.com/gpu: 1
```
- The VirtualService and DestinationRule resources are for routing. With the example above, the model is accessible through the Istio gateway under your Kubeflow deployment hostname (HOSTNAME). To change the path, edit the `http.match.uri` of the VirtualService (see the sketch below).
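A minimal sketch of the routing resources, assuming a Kubeflow Istio gateway named `kubeflow-gateway`, a hypothetical external path `/tfserving/models/mnist`, and the `mnist-service` Service from the sketch above:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: mnist-service
spec:
  hosts:
  - "*"
  gateways:
  - kubeflow-gateway                      # assumed Istio gateway name
  http:
  - match:
    - uri:
        prefix: /tfserving/models/mnist   # hypothetical external path
    rewrite:
      uri: /v1/models/mnist               # TF Serving REST API path
    route:
    - destination:
        host: mnist-service
        port:
          number: 8500
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: mnist-service
spec:
  host: mnist-service
```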
Google Cloud
Change the deployment spec as follows:
The changes are:
- environment variable `GOOGLE_APPLICATION_CREDENTIALS`
- volume `gcp-credentials`
- volumeMount `gcp-credentials`
We need a service account that can access the model. If you are using Kubeflow's click-to-deploy app, there should already be a secret, `user-gcp-sa`, in the cluster.
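A minimal sketch of those changes in the Deployment spec, assuming the secret is `user-gcp-sa` and that the key file inside the secret is named `user-gcp-sa.json` (adjust to your secret's contents):

```yaml
spec:
  template:
    spec:
      containers:
      - name: mnist
        image: tensorflow/serving:1.11.1
        # ... args and ports as in the sketch above ...
        env:
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /secret/gcp-credentials/user-gcp-sa.json
        volumeMounts:
        - name: gcp-credentials
          mountPath: /secret/gcp-credentials
          readOnly: true
      volumes:
      - name: gcp-credentials
        secret:
          secretName: user-gcp-sa
```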
The model at gs://kubeflow-examples-data/mnist is publicly accessible. However, if your environment doesn't have Google Cloud credentials set up, TF Serving will not be able to read the model; see this issue for an example. To set up the Google Cloud credentials, you should either have the environment variable `GOOGLE_APPLICATION_CREDENTIALS` pointing to the credential file, or run `gcloud auth login`. See the Google Cloud authentication documentation for more detail.
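For example (the key file path below is just a placeholder):

```bash
# Point Application Default Credentials at a service account key file...
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json

# ...or log in with your own Google account
gcloud auth login
```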
S3
To use S3, you first need to create a secret that contains the access credentials. Use base64 to encode your credentials, and see the Kubernetes guide to creating a secret manually for details:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: secretname
type: Opaque
data:
  AWS_ACCESS_KEY_ID: bmljZSB0cnk6KQ==
  AWS_SECRET_ACCESS_KEY: YnV0IHlvdSBkaWRuJ3QgZ2V0IG15IHNlY3JldCE=
```
Then use the following manifest as an example:
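The original manifest is not reproduced here; below is a minimal sketch of a TF Serving Deployment reading from S3, assuming the secret above is named `secretname` and using a hypothetical model path `s3://mybucket/mnist`. The S3-related environment variables are the ones TF Serving's S3 filesystem support reads; adjust region and endpoint for your bucket:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mnist-s3
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mnist-s3
  template:
    metadata:
      labels:
        app: mnist-s3
    spec:
      containers:
      - name: mnist-s3
        image: tensorflow/serving:1.11.1
        args:
        - --port=9000
        - --rest_api_port=8500
        - --model_name=mnist
        - --model_base_path=s3://mybucket/mnist   # hypothetical bucket/path
        env:
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: secretname
              key: AWS_ACCESS_KEY_ID
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: secretname
              key: AWS_SECRET_ACCESS_KEY
        - name: AWS_REGION
          value: us-west-1
        - name: S3_ENDPOINT
          value: s3.us-west-1.amazonaws.com
        - name: S3_USE_HTTPS
          value: "true"
        - name: S3_VERIFY_SSL
          value: "true"
        ports:
        - containerPort: 9000
        - containerPort: 8500
```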
Sending prediction requests
If the service type is LoadBalancer, it will have its own accessible external IP. Get the external IP with:
```bash
kubectl get svc mnist-service
```
and then send the request.
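For example, assuming the HTTP port 8500 and model name mnist from the sketches above, you can query the model status over the TF Serving REST API:

```bash
# Look up the external IP assigned by the load balancer
EXTERNAL_IP=$(kubectl get svc mnist-service \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

# Query the model status over the TF Serving REST API
curl http://${EXTERNAL_IP}:8500/v1/models/mnist
```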
If the service type is ClusterIP, you can access the model through the ingress. The endpoint is protected, and only clients with the right credentials can access it. The steps below show how to programmatically authenticate a service account to access an IAP-protected endpoint.
- Save the OAuth client ID that you used when deploying Kubeflow as `IAP_CLIENT_ID`.
- Create a service account:
```bash
gcloud iam service-accounts create --project=$PROJECT $SERVICE_ACCOUNT
```
- Grant the service account access to IAP-enabled resources:
```bash
gcloud projects add-iam-policy-binding $PROJECT \
  --role roles/iap.httpsResourceAccessor \
  --member serviceAccount:${SERVICE_ACCOUNT}@${PROJECT}.iam.gserviceaccount.com
```
- Download the service account key:
```bash
gcloud iam service-accounts keys create ${KEY_FILE} \
  --iam-account ${SERVICE_ACCOUNT}@${PROJECT}.iam.gserviceaccount.com
```
- Export the environment variable `GOOGLE_APPLICATION_CREDENTIALS` to point to the key file of the service account.

Finally, you can send the request with a Python script.
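The script itself is not reproduced here; the following is a minimal sketch using the google-auth and requests libraries, assuming the hypothetical path `/tfserving/models/mnist` from earlier and a placeholder MNIST-shaped payload:

```python
# Sketch: authenticate a service account to an IAP-protected TF Serving endpoint.
# Assumes GOOGLE_APPLICATION_CREDENTIALS points to the service account key and
# IAP_CLIENT_ID holds the OAuth client ID used for IAP.
import os

import requests
import google.auth.transport.requests
from google.oauth2 import service_account

IAP_CLIENT_ID = os.environ["IAP_CLIENT_ID"]
# Replace HOSTNAME with your Kubeflow deployment hostname; the path is hypothetical.
URL = "https://HOSTNAME/tfserving/models/mnist:predict"

# Obtain an OpenID Connect token for the IAP OAuth client.
credentials = service_account.IDTokenCredentials.from_service_account_file(
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"],
    target_audience=IAP_CLIENT_ID,
)
credentials.refresh(google.auth.transport.requests.Request())

# Send the prediction request with the OIDC token in the Authorization header.
payload = {"instances": [[0.0] * 784]}  # placeholder MNIST-shaped input
resp = requests.post(
    URL,
    headers={"Authorization": "Bearer {}".format(credentials.token)},
    json=payload,
)
print(resp.status_code, resp.text)
```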
See the guide to logging and monitoring for instructions on getting logs and metrics using Stackdriver.