Authenticating Pipelines to GCP
This page describes how Kubeflow Pipelines authenticates to GCP. The options listed below have different tradeoffs; choose the one that fits your use case.
- Configuring your cluster to access Google Cloud using the Compute Engine default service account with the “cloud-platform” scope is easier to set up than the other options. However, this approach grants excessive permissions, so it is not suitable if you need workload permission separation.
- Workload Identity takes more effort to set up, but allows fine-grained permission control. It is recommended for production use cases.
- Google service account keys stored as Kubernetes secrets is the legacy approach and is no longer recommended on GKE. However, it is the only one of these options that works when your cluster is not a GKE cluster, for example an on-prem cluster.
The Installation Options for Kubeflow Pipelines guide describes several ways to install Kubeflow Pipelines. Authentication support and cluster setup instructions vary depending on the method you used to install Kubeflow Pipelines.
- For Kubeflow Pipelines standalone, you can compare and choose from all three options.
- For full Kubeflow starting from Kubeflow 1.1, Workload Identity is the recommended and default option.
- For AI Platform Pipelines, Compute Engine default service account is the only supported option.
Using the Compute Engine default service account is good for trying out Kubeflow Pipelines, because it is easy to set up.
However, it does not support permission separation for workloads in the cluster: any workload in the cluster can call any Google Cloud API within the chosen scope.
NOTE: Using pipelines with the Compute Engine default service account is not supported in a full Kubeflow deployment.
By default, your GKE nodes use the Compute Engine default service account. If you allowed the “cloud-platform” scope when creating the cluster, Kubeflow Pipelines can authenticate to GCP and manage resources in your project without further configuration.
Use one of the following options to create a GKE cluster that uses the Compute Engine default service account:
- If you followed the instructions in Setting up AI Platform Pipelines and checked “Allow access to the following Cloud APIs”, your cluster already uses the Compute Engine default service account.
- In the Google Cloud Console UI, you can enable it in Create a Kubernetes cluster -> default-pool -> Security -> Access scopes -> Allow full access to all Cloud APIs.
- Using the gcloud CLI, you can enable it with --scopes cloud-platform.
Refer to the gcloud documentation for other available options.
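The following is a sketch of the gcloud option. The Python helper below assembles the gcloud container clusters create command with --scopes cloud-platform but does not run it. The function name, cluster name and zone are hypothetical placeholders.

```python
import shlex


def cloud_platform_cluster_cmd(cluster_name: str, zone: str) -> list[str]:
    """Assemble (but do not run) a gcloud command that creates a GKE cluster
    whose nodes use the Compute Engine default service account with the
    cloud-platform scope."""
    return [
        "gcloud", "container", "clusters", "create", cluster_name,
        "--zone", zone,
        # Access to all Cloud APIs, limited only by the IAM roles granted to
        # the node service account.
        "--scopes", "cloud-platform",
    ]


if __name__ == "__main__":
    # Hypothetical values; replace with your own cluster name and zone.
    print(shlex.join(cloud_platform_cluster_cmd("kfp-cluster", "us-central1-a")))
```

Building the argument list separately from running it makes the command easy to review before you pass it to subprocess.run or paste it into a shell.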
Pipelines don’t need any specific changes to authenticate to GCP; they use the default service account transparently.
However, you must update existing pipelines that use the use_gcp_secret KFP SDK operator: remove the use_gcp_secret usage to let your pipeline authenticate to Google Cloud using the default service account.
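Workload Identity is recommended for production use cases. Refer to the GKE Workload Identity documentation for:
- Instructions to enable it on your cluster.
- Whether its limitations affect your adoption.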
Terminology
This document distinguishes between Kubernetes service accounts (KSAs) and Google service accounts (GSAs). KSAs are Kubernetes resources, while GSAs are specific to Google Cloud. Other documentation usually refers to both of them simply as “service accounts”.
Authoring pipelines to use Workload Identity
Pipelines don’t need any specific changes to authenticate to Google Cloud. With Workload Identity, pipelines run as the Google service account that is bound to the KSA.
However, existing pipelines that use use_gcp_secret must remove the use_gcp_secret usage to use the bound GSA. You can also continue to use use_gcp_secret in a cluster with Workload Identity enabled; for those workloads, use_gcp_secret takes precedence.
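The precedence follows from how Google Cloud client libraries pick credentials. The sketch below is a simplified model of the Application Default Credentials (ADC) lookup order. It covers Linux paths only and is not library code, and the function name is hypothetical. It assumes that use_gcp_secret works by setting GOOGLE_APPLICATION_CREDENTIALS in the step's container, as in kfp 1.x.

```python
import os
from pathlib import Path
from typing import Mapping


def adc_credential_source(env: Mapping[str, str], home: Path) -> str:
    """Simplified model of the order in which Google Cloud client libraries
    resolve Application Default Credentials (ADC) on Linux."""
    # 1. GOOGLE_APPLICATION_CREDENTIALS is checked first. use_gcp_secret
    #    (assumed kfp 1.x behavior) sets it to the mounted key file's path.
    key_file = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if key_file:
        return f"key file {key_file}"
    # 2. User credentials written by `gcloud auth application-default login`.
    user_creds = home / ".config" / "gcloud" / "application_default_credentials.json"
    if user_creds.is_file():
        return f"gcloud user credentials {user_creds}"
    # 3. Fall back to the metadata server. With Workload Identity enabled,
    #    GKE serves tokens for the GSA bound to the pod's KSA.
    return "metadata server"


if __name__ == "__main__":
    print(adc_credential_source(os.environ, Path.home()))
```

Because the environment variable is consulted before the metadata server, a step decorated with use_gcp_secret authenticates with the key file even though Workload Identity is available.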
Cluster setup to use Workload Identity for Full Kubeflow
Starting from Kubeflow 1.1, Kubeflow Pipelines supports multi-user isolation. Therefore, pipeline runs are executed in user namespaces using the default-editor KSA. The KSA is auto-bound to the GSA specified in the user profile, which defaults to a shared GSA ${KFNAME}-user@${PROJECT}.iam.gserviceaccount.com.
If you want to bind the default-editor KSA with a different GSA for a specific namespace, refer to the In-cluster authentication to Google Cloud guide.
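The default GSA naming convention can be written as a small helper. The sketch below also builds the IAM member string that Workload Identity uses to identify a KSA, which has the form serviceAccount:PROJECT.svc.id.goog[NAMESPACE/KSA]. A GSA grants roles/iam.workloadIdentityUser to that member. In full Kubeflow this binding is set up for you, so the helper is only illustrative. The function names, deployment name, project and namespace are hypothetical.

```python
def default_user_gsa(kf_name: str, project: str) -> str:
    """Shared GSA that Kubeflow 1.1 user profiles default to."""
    return f"{kf_name}-user@{project}.iam.gserviceaccount.com"


def workload_identity_member(project: str, namespace: str, ksa: str) -> str:
    """IAM member string that identifies a KSA under Workload Identity."""
    return f"serviceAccount:{project}.svc.id.goog[{namespace}/{ksa}]"


if __name__ == "__main__":
    # Hypothetical deployment name, project, and user namespace.
    print(default_user_gsa("my-kubeflow", "my-project"))
    print(workload_identity_member("my-project", "alice", "default-editor"))
```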
Additionally, the Kubeflow Pipelines UI, visualization, and TensorBoard server instances are deployed in your user namespace using the default-editor KSA. Therefore, they can fetch artifacts in Google Cloud Storage using the permissions of the same GSA you configured for this namespace.
Cluster setup to use Workload Identity for Pipelines Standalone
1. Create your cluster with Workload Identity enabled
In the Google Cloud Console UI, you can enable Workload Identity in Create a Kubernetes cluster -> Security -> Enable Workload Identity.
Using the gcloud CLI, you can enable it when you create the cluster.
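The following is a minimal sketch of the gcloud route. It assumes the --workload-pool flag of gcloud container clusters create; check gcloud container clusters create --help for your gcloud version. The helper only builds the command. Its name and the example values are hypothetical.

```python
import shlex


def workload_identity_cluster_cmd(cluster_name: str, project_id: str, zone: str) -> list[str]:
    """Assemble (but do not run) a gcloud command that creates a GKE cluster
    with Workload Identity enabled."""
    return [
        "gcloud", "container", "clusters", "create", cluster_name,
        "--project", project_id,
        "--zone", zone,
        # GKE uses the workload pool named PROJECT_ID.svc.id.goog.
        f"--workload-pool={project_id}.svc.id.goog",
    ]


if __name__ == "__main__":
    # Hypothetical cluster, project, and zone.
    print(shlex.join(workload_identity_cluster_cmd("kfp-cluster", "my-project", "us-central1-a")))
```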
2. Deploy Kubeflow Pipelines
3. Bind Workload Identities for KSAs used by Kubeflow Pipelines
The following helper bash scripts bind Workload Identities for KSAs used by Kubeflow Pipelines:
- gcp-workload-identity-setup.sh helps you create GSAs and bind them to KSAs used by pipeline workloads. This script provides an interactive command line dialog with explanatory messages.
- Alternatively, a companion script provides minimal utility bash functions that let you customize your setup. These minimal utilities are easy to read and to use programmatically.
For example, to get a default setup, you can run gcp-workload-identity-setup.sh.
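You may prefer to bind KSAs by hand rather than use the helper scripts. In that case, the generic GKE Workload Identity steps look roughly like the commands assembled below. This is a sketch of the standard GKE procedure, not the contents of gcp-workload-identity-setup.sh. The function name, the GSA name and the kubeflow namespace are assumptions. Repeat the steps for each KSA you want to bind, such as pipeline-runner.

```python
import shlex


def bind_ksa_to_gsa_cmds(project_id: str, namespace: str, ksa: str, gsa_name: str) -> list[list[str]]:
    """Assemble (but do not run) the commands that create a GSA and bind a
    KSA to it with Workload Identity."""
    gsa_email = f"{gsa_name}@{project_id}.iam.gserviceaccount.com"
    member = f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa}]"
    return [
        # 1. Create the Google service account.
        ["gcloud", "iam", "service-accounts", "create", gsa_name, "--project", project_id],
        # 2. Allow the KSA to act as the GSA.
        ["gcloud", "iam", "service-accounts", "add-iam-policy-binding", gsa_email,
         "--project", project_id,
         "--role", "roles/iam.workloadIdentityUser", "--member", member],
        # 3. Annotate the KSA so GKE knows which GSA it maps to.
        ["kubectl", "annotate", "serviceaccount", ksa, "--namespace", namespace,
         "--overwrite", f"iam.gke.io/gcp-service-account={gsa_email}"],
    ]


if __name__ == "__main__":
    # Hypothetical project and GSA name; "kubeflow" is the assumed KFP namespace.
    for cmd in bind_ksa_to_gsa_cmds("my-project", "kubeflow", "pipeline-runner", "kfp-pipelines"):
        print(shlex.join(cmd))
```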
4. Configure IAM permissions of used GSAs
If you used gcp-workload-identity-setup.sh to bind Workload Identities for your cluster, you can simply add the following IAM bindings:
- Give GSA <cluster-name>-kfp-system@<project-id>.iam.gserviceaccount.com the Storage Object Viewer role to let the UI load data from GCS in the same project.
- Give the GSA your pipelines run as any permissions your pipelines need. For quick tryouts, you can give it the Project Editor role for all permissions.
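Both bindings are project-level IAM grants. roles/storage.objectViewer and roles/editor are the IAM role IDs for Storage Object Viewer and Project Editor. The sketch below assembles the corresponding gcloud commands. The function name, project, cluster and pipelines GSA names are hypothetical.

```python
import shlex


def grant_role_cmd(project_id: str, gsa_email: str, role: str) -> list[str]:
    """Assemble (but do not run) a project-level IAM grant for a GSA."""
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        "--member", f"serviceAccount:{gsa_email}",
        "--role", role,
    ]


if __name__ == "__main__":
    project, cluster = "my-project", "my-cluster"  # hypothetical values
    system_gsa = f"{cluster}-kfp-system@{project}.iam.gserviceaccount.com"
    # Hypothetical: the GSA bound to the pipeline-runner KSA.
    pipelines_gsa = f"my-pipelines-gsa@{project}.iam.gserviceaccount.com"
    # Storage Object Viewer lets the UI read artifacts from GCS in this project.
    print(shlex.join(grant_role_cmd(project, system_gsa, "roles/storage.objectViewer")))
    # Project Editor grants broad permissions; suitable for quick tryouts only.
    print(shlex.join(grant_role_cmd(project, pipelines_gsa, "roles/editor")))
```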
If you configured bindings yourself, these are the GCP permission requirements for the KFP KSAs:
- Pipelines use the pipeline-runner KSA. Configure the IAM permissions of the GSA bound to this KSA to allow pipelines to use GCP APIs.
- The Pipelines UI uses the ml-pipeline-ui KSA, and the Pipelines Visualization Server uses the ml-pipeline-visualizationserver KSA. If you need to view artifacts and visualizations stored in Google Cloud Storage (GCS) from the Pipelines UI, add the Storage Object Viewer permission (or the minimal required permission) to their bound GSAs.
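These requirements amount to a small table mapping each KSA to the roles it needs. The sketch below encodes that table as a Python dict and reports which required roles are still missing. It does not show how to collect the granted roles, for example by parsing gcloud projects get-iam-policy output. The names are hypothetical.

```python
# Project roles the GSA bound to each KFP KSA needs, following the list above.
# roles/storage.objectViewer stands in for "Storage Object Viewer"; a role
# with only the minimal required permission would also satisfy the text.
REQUIRED_ROLES: dict[str, set[str]] = {
    "ml-pipeline-ui": {"roles/storage.objectViewer"},
    "ml-pipeline-visualizationserver": {"roles/storage.objectViewer"},
    # Depends on what your pipelines call, so nothing is fixed here.
    "pipeline-runner": set(),
}


def missing_roles(granted: dict[str, set[str]],
                  required: dict[str, set[str]] = REQUIRED_ROLES) -> dict[str, set[str]]:
    """Return, per KSA, the required roles its bound GSA does not have yet."""
    gaps: dict[str, set[str]] = {}
    for ksa, needed in required.items():
        missing = needed - granted.get(ksa, set())
        if missing:
            gaps[ksa] = missing
    return gaps


if __name__ == "__main__":
    granted = {"ml-pipeline-ui": {"roles/storage.objectViewer"}}
    print(missing_roles(granted))
    # {'ml-pipeline-visualizationserver': {'roles/storage.objectViewer'}}
```

To check pipeline-specific needs, pass a copy of REQUIRED_ROLES with the roles your pipelines use filled in for pipeline-runner.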
Workload Identity is recommended for easier and more secure management, but you can also choose to use GSA keys.
Authoring pipelines to use GSA keys
Each pipeline step describes a container that runs independently. If you want to grant a single step access to one of your service accounts, you can use kfp.gcp.use_gcp_secret(). Examples of how to use this function can be found in the Kubeflow examples repository.
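Conceptually, use_gcp_secret mounts the user-gcp-sa secret into the step's container and points GOOGLE_APPLICATION_CREDENTIALS at the key file, as in kfp 1.x (verify this against your SDK version). The sketch below applies that idea to a plain-dict pod spec so it runs without the KFP SDK. It is not the SDK's implementation. The function name, mount path and key file name are assumptions modeled on kfp 1.x defaults. In a real pipeline, call kfp.gcp.use_gcp_secret() instead.

```python
import copy

# Assumed defaults, modeled on kfp 1.x use_gcp_secret(); check your SDK version.
DEFAULT_SECRET_NAME = "user-gcp-sa"
DEFAULT_MOUNT_PATH = "/secret/gcp-credentials"


def mount_gcp_secret(pod_spec: dict,
                     secret_name: str = DEFAULT_SECRET_NAME,
                     mount_path: str = DEFAULT_MOUNT_PATH) -> dict:
    """Return a copy of a Kubernetes pod spec (as a plain dict) in which every
    container reads a GSA key from a mounted secret."""
    spec = copy.deepcopy(pod_spec)  # leave the caller's spec untouched
    volume_name = secret_name
    spec.setdefault("volumes", []).append(
        {"name": volume_name, "secret": {"secretName": secret_name}})
    key_path = f"{mount_path}/{secret_name}.json"
    for container in spec.get("containers", []):
        container.setdefault("volumeMounts", []).append(
            {"name": volume_name, "mountPath": mount_path})
        # Google Cloud client libraries read the key file named by this variable.
        container.setdefault("env", []).append(
            {"name": "GOOGLE_APPLICATION_CREDENTIALS", "value": key_path})
    return spec


if __name__ == "__main__":
    step = {"containers": [{"name": "main", "image": "python:3.9"}]}
    print(mount_gcp_secret(step)["containers"][0]["env"])
```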
Cluster setup to use use_gcp_secret for Full Kubeflow
Starting from Kubeflow 1.1, the user-gcp-sa secret is no longer deployed for you. We recommend using Workload Identity instead.
For Kubeflow 1.0 or earlier, you don’t need to do anything: the full Kubeflow deployment already deploys the user-gcp-sa secret for you.
Cluster setup to use use_gcp_secret for Pipelines Standalone
Pipelines Standalone requires you to set up the user-gcp-sa secret used by use_gcp_secret manually.
Instructions to set up the secret:
First, download a key for the GCE VM service account (refer to the GCP documentation for more information).
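The two steps, creating a key and then storing it as the user-gcp-sa secret, can be sketched as the commands assembled below. The data key user-gcp-sa.json is an assumption. It was chosen to match the file name that use_gcp_secret expects by default in kfp 1.x. The kubeflow namespace, the project number and the function name are also assumptions.

```python
import shlex


def user_gcp_sa_cmds(gsa_email: str, namespace: str,
                     key_file: str = "application_default_credentials.json") -> list[list[str]]:
    """Assemble (but do not run) the commands that create a GSA key and store
    it as the user-gcp-sa Kubernetes secret."""
    return [
        # 1. Create and download a JSON key for the service account.
        ["gcloud", "iam", "service-accounts", "keys", "create", key_file,
         "--iam-account", gsa_email],
        # 2. Store the key in a generic secret under the data key user-gcp-sa.json.
        ["kubectl", "create", "secret", "generic", "user-gcp-sa",
         "--namespace", namespace,
         f"--from-file=user-gcp-sa.json={key_file}"],
    ]


if __name__ == "__main__":
    # Hypothetical project number; the Compute Engine default service account
    # is named PROJECT_NUMBER-compute@developer.gserviceaccount.com.
    gsa = "123456789012-compute@developer.gserviceaccount.com"
    for cmd in user_gcp_sa_cmds(gsa, namespace="kubeflow"):
        print(shlex.join(cmd))
```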
Last modified 20.04.2021: