TensorFlow Serving

Out of date

This guide contains outdated information pertaining to Kubeflow 1.0. This guide needs to be updated for Kubeflow 1.1.

This Kubeflow component has stable status.

To deploy a model, we create the following resources, as illustrated below:

  • A Deployment to serve the model using TF Serving
  • A Kubernetes Service to create an endpoint for the Deployment
  • An Istio VirtualService to route traffic to the model and expose it through the Istio gateway
  • An Istio DestinationRule for traffic splitting

Referring to the above example, you can customize your deployment by changing the following configurations in the YAML file:

  • In the Deployment resource, the --model_base_path argument points to the model. Change the value to your own model.

  • The example contains three configurations for Google Cloud Storage (GCS) access: volumes (secret user-gcp-sa), volumeMounts, and env (GOOGLE_APPLICATION_CREDENTIALS). If your model is not on GCS (e.g. it uses S3 on AWS), see the S3 section below for how to set up access.

  • GPU: if you want to use a GPU, add nvidia.com/gpu: 1 to the container resources and use a GPU image, for example tensorflow/serving:1.11.1-gpu.

    ```yaml
    resources:
      limits:
        cpu: "4"
        memory: 4Gi
        nvidia.com/gpu: 1
    ```
  • The VirtualService and DestinationRule resources are for routing. With the example above, the model is accessible at HOSTNAME/tfserving/models/mnist (where HOSTNAME is your Kubeflow deployment hostname). To change the path, edit the http.match.uri field of the VirtualService.
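As a sketch, a VirtualService for the routing described above might look like the following. The gateway name (kubeflow-gateway), service name (mnist-service), and rewrite target are illustrative assumptions, not values taken from this guide:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: mnist
spec:
  hosts:
  - "*"
  gateways:
  - kubeflow-gateway        # assumed gateway name
  http:
  - match:
    - uri:
        prefix: /tfserving/models/mnist
    rewrite:
      uri: /v1/models/mnist  # TF Serving's REST path for the model
    route:
    - destination:
        host: mnist-service  # assumed service name
        port:
          number: 8500
```

Changing http.match.uri here is what changes the externally visible path of the model.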

Depending on where the model file is located, set the correct parameters.

Change the deployment spec as follows:

```yaml
spec:
  selector:
    matchLabels:
      app: mnist
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "true"
      labels:
        app: mnist
        version: v1
    spec:
      containers:
      - args:
        - --port=9000
        - --rest_api_port=8500
        - --model_base_path=gs://kubeflow-examples-data/mnist
        command:
        - /usr/bin/tensorflow_model_server
        env:
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /secret/gcp-credentials/user-gcp-sa.json
        image: tensorflow/serving:1.11.1-gpu
        imagePullPolicy: IfNotPresent
        livenessProbe:
          initialDelaySeconds: 30
          periodSeconds: 30
          tcpSocket:
            port: 9000
        name: mnist
        ports:
        - containerPort: 9000
        - containerPort: 8500
        resources:
          limits:
            cpu: "4"
            memory: 4Gi
            nvidia.com/gpu: 1
          requests:
            cpu: "1"
            memory: 1Gi
        volumeMounts:
        - mountPath: /var/config/
          name: config-volume
        - mountPath: /secret/gcp-credentials
          name: gcp-credentials
      volumes:
      - configMap:
          name: mnist-v1-config
        name: config-volume
      - name: gcp-credentials
        secret:
          secretName: user-gcp-sa
```

The changes are:

  • environment variable GOOGLE_APPLICATION_CREDENTIALS
  • volume gcp-credentials
  • volumeMount gcp-credentials

We need a service account that can access the model. If you are using Kubeflow’s click-to-deploy app, there should already be a secret, user-gcp-sa, in the cluster.

The model at gs://kubeflow-examples-data/mnist is publicly accessible. However, if your environment doesn’t have Google Cloud credentials set up, TF Serving will not be able to read the model. See this issue for an example. To set up the Google Cloud credentials, either point the GOOGLE_APPLICATION_CREDENTIALS environment variable at the credential file, or run gcloud auth login.

S3

To use S3, first you need to create a secret that will contain the access credentials. Use base64 to encode your credentials, and see the Kubernetes guide to creating a secret manually for details.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: secretname
data:
  AWS_ACCESS_KEY_ID: bmljZSB0cnk6KQ==
  AWS_SECRET_ACCESS_KEY: YnV0IHlvdSBkaWRuJ3QgZ2V0IG15IHNlY3JldCE=
```

Then use the following manifest as an example:
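A minimal sketch of the relevant container fragment is shown below, assuming the secret above. TF Serving reads S3 through environment variables; the region, endpoint, and bucket here are illustrative placeholders, not values from this guide:

```yaml
# Fragment of the TF Serving container spec for S3 access (illustrative).
args:
- --port=9000
- --rest_api_port=8500
- --model_base_path=s3://your-bucket/mnist   # assumed bucket/path
env:
- name: AWS_ACCESS_KEY_ID
  valueFrom:
    secretKeyRef:
      name: secretname
      key: AWS_ACCESS_KEY_ID
- name: AWS_SECRET_ACCESS_KEY
  valueFrom:
    secretKeyRef:
      name: secretname
      key: AWS_SECRET_ACCESS_KEY
- name: AWS_REGION            # assumed region
  value: us-west-1
- name: S3_ENDPOINT           # assumed endpoint for that region
  value: s3.us-west-1.amazonaws.com
- name: S3_USE_HTTPS
  value: "1"
```

Pulling the keys from the secret via secretKeyRef avoids placing credentials in the manifest itself.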

If the service type is LoadBalancer, it will have its own accessible external IP. Get the external IP with:

```shell
kubectl get svc mnist-service
```

Then send the request:

```shell
curl -X POST -d @input.json http://EXTERNAL_IP:8500/v1/models/mnist:predict
```
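The mnist-service referenced above could be defined along these lines. This is a sketch only; the labels and ports are assumed from the deployment example earlier in this guide:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mnist-service
spec:
  type: LoadBalancer       # gives the service an external IP
  selector:
    app: mnist             # matches the deployment's pod label
  ports:
  - name: grpc
    port: 9000
    targetPort: 9000
  - name: http
    port: 8500             # TF Serving's REST API port
    targetPort: 8500
```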
  1. Save the client ID that you used to deploy Kubeflow as IAP_CLIENT_ID.
  2. Create a service account

    ```shell
    gcloud iam service-accounts create --project=$PROJECT $SERVICE_ACCOUNT
    ```
  3. Grant the service account access to IAP-enabled resources:

  4. Download the service account key:

    ```shell
    gcloud iam service-accounts keys create ${KEY_FILE} \
      --iam-account ${SERVICE_ACCOUNT}@${PROJECT}.iam.gserviceaccount.com
    ```
  5. Export the environment variable GOOGLE_APPLICATION_CREDENTIALS to point to the key file of the service account.

Finally, you can send the request with an input file using this Python script.


    See the guide to logging and monitoring for instructions on getting logs and metrics using Stackdriver.