Component
A pipeline component is a self-contained set of code that performs one step in the ML workflow (pipeline), such as data preprocessing, data transformation, model training, and so on. A component is analogous to a function, in that it has a name, parameters, return values, and a body.
The code for each component includes the following:
- Client code: The code that talks to endpoints to submit jobs. For example, code to talk to the Google Dataproc API to submit a Spark job.
- Runtime code: The code that does the actual work of the step and typically runs in the cluster. For example, the Spark code that transforms the data.
Note the naming convention for client code and runtime code. For a task named `mytask`:
- The program contains the client code.
- The `mytask` directory contains all the runtime code.
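Following that convention, a hypothetical directory layout for a task named `mytask` might look like this (the `mytask.py` filename is an assumption for illustration):

```
mytask.py    # client code: builds and submits the job
mytask/      # runtime code that performs the actual step
    ...
```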
A component specification in YAML format describes the component for the Kubeflow Pipelines system. A component definition has the following parts:
- Interface: input/output specifications (name, type, description, default value, etc.).
- Implementation: A specification of how to run the component given a set of argument values for the component's inputs. The implementation section also describes how to get the output values from the component once the component has finished running.
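As an illustration, a minimal component specification with both parts might look like the following sketch (the component name, container image, command, and input/output names are assumptions for this example):

```yaml
name: Preprocess data
description: Reads raw data and writes preprocessed data.
inputs:
- {name: input_path, type: String, description: Path to the raw data.}
outputs:
- {name: output_path, type: String, description: Path for the preprocessed data.}
implementation:
  container:
    image: gcr.io/my-project/preprocess:latest
    command: [python3, /app/preprocess.py]
    args:
    - --input
    - {inputValue: input_path}
    - --output
    - {outputPath: output_path}
```

The `inputValue` and `outputPath` placeholders tell the Kubeflow Pipelines system how to substitute concrete argument values and output locations into the container's command line at run time.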
For the complete definition of a component, see the component specification.
Each component in a pipeline executes independently. The components do not run in the same process and cannot directly share in-memory data. You must serialize (to strings or files) all the data pieces that you pass between the components so that the data can travel over the distributed network. You must then deserialize the data for use in the downstream component.
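A minimal sketch of that pattern in plain Python (the component and data names here are hypothetical, not part of the Kubeflow API): the upstream step serializes its in-memory result to a JSON string, and the downstream step deserializes it before use.

```python
import json

def preprocess() -> str:
    """Upstream component: serialize in-memory results to a string
    so they can cross process and network boundaries."""
    stats = {"rows": 1000, "mean": 3.5}
    return json.dumps(stats)

def train(stats_json: str) -> float:
    """Downstream component: deserialize the string back into an
    in-memory structure before using it."""
    stats = json.loads(stats_json)
    return stats["mean"] * 2

serialized = preprocess()   # e.g. '{"rows": 1000, "mean": 3.5}'
print(train(serialized))    # 7.0
```

For larger artifacts (datasets, models), the same idea applies with files instead of strings: the upstream component writes to a path, and the downstream component reads from it.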
- Read an overview of Kubeflow Pipelines.
- Follow the pipelines quickstart guide to deploy Kubeflow and run a sample pipeline directly from the Kubeflow Pipelines UI.
- Build a reusable component for sharing in multiple pipelines.
Last modified 21.06.2019