Component

    A pipeline component is a self-contained set of code that performs one step in the ML workflow (pipeline), such as data preprocessing, data transformation, or model training. A component is analogous to a function in that it has a name, parameters, return values, and a body.
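
    The function analogy can be made concrete with a minimal sketch in plain Python. The name `normalize_rows`, its parameters, and its behavior are hypothetical, chosen only for illustration; this is not the API of any pipeline SDK.

```python
from typing import List


def normalize_rows(rows: List[List[float]], scale: float = 1.0) -> List[List[float]]:
    """Hypothetical preprocessing step: rescale each row so its values sum to `scale`."""
    normalized = []
    for row in rows:
        total = sum(row)
        if total == 0:
            # Leave all-zero rows unchanged to avoid dividing by zero.
            normalized.append(list(row))
        else:
            normalized.append([scale * value / total for value in row])
    return normalized
```

    Here the name is `normalize_rows`, the parameters are `rows` and `scale`, the return value is the list of rescaled rows, and the body is the loop. A component has the same four parts, but it runs as a separate program rather than as an in-process call.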

    The code for each component includes the following:

    • Runtime code: The code that does the actual job and usually runs in the cluster. For example, Spark code that transforms raw data into preprocessed data.

    Note the naming convention for client code and runtime code. For a task named “mytask”:

    • The program contains the client code.
    • The mytask directory contains all the runtime code.

    The definition of a component includes the following parts:

    • Metadata: name, description, etc.
    • Implementation: A specification of how to run the component given a set of argument values for the component’s inputs. The implementation section also describes how to get the output values from the component once the component has finished running.
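
    The relationship between the metadata and implementation parts can be sketched as follows. This is an illustration under stated assumptions, not the real specification schema: the key names (`metadata`, `implementation`, `command`), the `{"input": ...}` and `{"output": ...}` placeholders, the program name `main.py`, and the helper `resolve_command` are all hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical component description. The key names and placeholder format
# are assumptions made for this sketch, not the real specification schema.
COMPONENT: Dict[str, Any] = {
    "metadata": {"name": "mytask", "description": "Example preprocessing step"},
    "implementation": {
        "command": [
            "python", "main.py",
            "--input-path", {"input": "input_path"},
            "--output-path", {"output": "output_path"},
        ],
    },
}


def resolve_command(
    component: Dict[str, Any],
    arguments: Dict[str, Any],
    output_paths: Dict[str, str],
) -> List[str]:
    """Turn the command template into a concrete command line.

    Input placeholders are replaced by the supplied argument values; output
    placeholders are replaced by paths where the program should write results.
    """
    resolved: List[str] = []
    for part in component["implementation"]["command"]:
        if isinstance(part, str):
            resolved.append(part)
        elif "input" in part:
            resolved.append(str(arguments[part["input"]]))
        elif "output" in part:
            resolved.append(output_paths[part["output"]])
        else:
            raise ValueError(f"Unknown placeholder: {part!r}")
    return resolved
```

    In this sketch, the caller chooses the output paths, so once the program has finished it can read the output values back from those same paths. That is one plausible way to "get the output values"; the component specification defines the actual mechanism.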

    For the complete definition of a component, see the component specification.

    You must package your component as a container image. Components represent a specific program or entry point inside a container.
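
    As a rough sketch of what "a specific program or entry point" might look like, the following Python program takes its input and output locations as command-line arguments. The flag names `--input-path` and `--output-path` and the line-counting logic are hypothetical, chosen only to keep the example small.

```python
import argparse
from typing import List, Optional


def main(argv: Optional[List[str]] = None) -> int:
    """Hypothetical entry point: count non-empty input lines and write the count."""
    parser = argparse.ArgumentParser(description="Hypothetical component entry point")
    parser.add_argument("--input-path", required=True, help="File to read")
    parser.add_argument("--output-path", required=True, help="File to write the result to")
    # With argv=None, argparse falls back to the process arguments (sys.argv[1:]).
    args = parser.parse_args(argv)

    with open(args.input_path, encoding="utf-8") as source:
        lines = [line.strip() for line in source if line.strip()]

    # Write the output value to a file so the caller can collect it afterwards.
    with open(args.output_path, "w", encoding="utf-8") as sink:
        sink.write(str(len(lines)))
    return 0

# When shipped as a script, the file would typically end by calling
# main() and exiting with its return code.
```

    A container image built around this file would run it as its entry point and pass the two flags on the command line. Accepting `argv` as a parameter also lets the function be exercised directly, for example `main(["--input-path", "in.txt", "--output-path", "out.txt"])`.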