# Predict on a Spark MLlib model with a PMML InferenceService

1. Get the JPMML-SparkML JAR

Launch `pyspark` with `--jars` to specify the location of the JPMML-SparkML uber-JAR:

```shell
pyspark --jars ./jpmml-sparkml-executable-1.6.3.jar
```

```python
from pyspark.ml import Pipeline
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.feature import RFormula
from pyspark2pmml import PMMLBuilder

# Build and fit a simple decision-tree pipeline on the Iris dataset
df = spark.read.csv("Iris.csv", header=True, inferSchema=True)
formula = RFormula(formula="Species ~ .")
classifier = DecisionTreeClassifier()
pipeline = Pipeline(stages=[formula, classifier])
pipelineModel = pipeline.fit(df)

# Export the fitted pipeline to a PMML file
pmmlBuilder = PMMLBuilder(sc, df, pipelineModel)
pmmlBuilder.buildFile("DecisionTreeIris.pmml")
```

Upload the `DecisionTreeIris.pmml` file to a GCS bucket. Note that the PMMLServer expects the model file to be named `model.pmml`.
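One way to do the upload, assuming you have the `gsutil` CLI configured and write access to a bucket (the bucket path here is illustrative), renaming the file to `model.pmml` in the process:

```shell
# Copy the exported PMML file to GCS under the required name model.pmml
gsutil cp DecisionTreeIris.pmml gs://kfserving-examples/models/sparkpmml/model.pmml
```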

Create the InferenceService with a `pmml` predictor, setting `storageUri` to the bucket location you uploaded the model to:

```yaml
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "spark-pmml"
spec:
  predictor:
    pmml:
      storageUri: "gs://kfserving-examples/models/sparkpmml"
```

```shell
kubectl apply -f spark_pmml.yaml
```

    Expected Output
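For a service named `spark-pmml`, `kubectl apply` confirms creation in the standard form:

```
inferenceservice.serving.kserve.io/spark-pmml created
```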

Wait for the InferenceService to be ready:

```shell
kubectl wait --for=condition=Ready inferenceservice spark-pmml
inferenceservice.serving.kserve.io/spark-pmml condition met
```
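The contents of `pmml-input.json` are not shown in this guide. A minimal payload following the KServe V1 inference protocol would carry one instance with the four Iris feature values; the values and their ordering below are illustrative assumptions, not taken from the original example:

```python
import json

# Hypothetical request body: one instance with four Iris features
# (assumed order: sepal length, sepal width, petal length, petal width)
payload = {"instances": [[5.1, 3.5, 1.4, 0.2]]}

# Write the payload to the file referenced by INPUT_PATH below
with open("pmml-input.json", "w") as f:
    json.dump(payload, f)
```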
Then set up the variables for the prediction request:

```shell
MODEL_NAME=spark-pmml
INPUT_PATH=@./pmml-input.json
SERVICE_HOSTNAME=$(kubectl get inferenceservice spark-pmml -o jsonpath='{.status.url}' | cut -d "/" -f 3)
```
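With these variables set, the request itself is a standard KServe V1 predict call over the cluster ingress. `INGRESS_HOST` and `INGRESS_PORT` are assumed to come from your ingress gateway setup and are not defined above:

```shell
# Send the JSON payload to the V1 predict endpoint of the spark-pmml model
curl -v \
  -H "Host: ${SERVICE_HOSTNAME}" \
  -H "Content-Type: application/json" \
  -d ${INPUT_PATH} \
  "http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/${MODEL_NAME}:predict"
```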

    Expected Output