# Predict on a Spark MLlib model with a PMML InferenceService

1. Get the JPMML-SparkML JAR

Launch `pyspark` with `--jars` to specify the location of the JPMML-SparkML uber-JAR:

```shell
pyspark --jars ./jpmml-sparkml-executable-1.6.3.jar
```

```python
from pyspark.ml import Pipeline
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.feature import RFormula
from pyspark2pmml import PMMLBuilder

# Build and fit a simple decision-tree pipeline on the Iris dataset
df = spark.read.csv("Iris.csv", header=True, inferSchema=True)
formula = RFormula(formula="Species ~ .")
classifier = DecisionTreeClassifier()
pipeline = Pipeline(stages=[formula, classifier])
pipelineModel = pipeline.fit(df)

# Export the fitted pipeline to a PMML file
pmmlBuilder = PMMLBuilder(sc, df, pipelineModel)
pmmlBuilder.buildFile("DecisionTreeIris.pmml")
```

Upload the `DecisionTreeIris.pmml` file to a GCS bucket. Note that the PMMLServer expects the model file to be named `model.pmml`.
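One way to do the upload, assuming you have the `gsutil` CLI configured and write access to a bucket (the bucket path here is illustrative), renaming the file to `model.pmml` in the process:

```shell
# Copy the exported PMML file to GCS under the required name model.pmml
gsutil cp DecisionTreeIris.pmml gs://kfserving-examples/models/sparkpmml/model.pmml
```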

Create the InferenceService with a `pmml` predictor, setting `storageUri` to the bucket location you uploaded the model to:

```yaml
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "spark-pmml"
spec:
  predictor:
    pmml:
      storageUri: "gs://kfserving-examples/models/sparkpmml"
```

```shell
kubectl apply -f spark_pmml.yaml
```

    Expected Output
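For a service named `spark-pmml`, `kubectl apply` confirms creation in the standard form:

```
inferenceservice.serving.kserve.io/spark-pmml created
```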

Wait for the InferenceService to be ready:

```shell
kubectl wait --for=condition=Ready inferenceservice spark-pmml
inferenceservice.serving.kserve.io/spark-pmml condition met
```
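The contents of `pmml-input.json` are not shown in this guide. A minimal payload following the KServe V1 inference protocol would carry one instance with the four Iris feature values; the values and their ordering below are illustrative assumptions, not taken from the original example:

```python
import json

# Hypothetical request body: one instance with four Iris features
# (assumed order: sepal length, sepal width, petal length, petal width)
payload = {"instances": [[5.1, 3.5, 1.4, 0.2]]}

# Write the payload to the file referenced by INPUT_PATH below
with open("pmml-input.json", "w") as f:
    json.dump(payload, f)
```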
Then set up the variables for the prediction request:

```shell
MODEL_NAME=spark-pmml
INPUT_PATH=@./pmml-input.json
SERVICE_HOSTNAME=$(kubectl get inferenceservice spark-pmml -o jsonpath='{.status.url}' | cut -d "/" -f 3)
```
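With these variables set, the request itself is a standard KServe V1 predict call over the cluster ingress. `INGRESS_HOST` and `INGRESS_PORT` are assumed to come from your ingress gateway setup and are not defined above:

```shell
# Send the JSON payload to the V1 predict endpoint of the spark-pmml model
curl -v \
  -H "Host: ${SERVICE_HOSTNAME}" \
  -H "Content-Type: application/json" \
  -d ${INPUT_PATH} \
  "http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/${MODEL_NAME}:predict"
```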

    Expected Output