Predict on a Spark MLlib model with a PMML InferenceService
Launch pyspark with the --jars option to specify the location of the JPMML-SparkML uber-JAR:
pyspark --jars ./jpmml-sparkml-executable-1.6.3.jar
from pyspark.ml import Pipeline
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.feature import RFormula

# Load the Iris dataset and train a decision tree pipeline
df = spark.read.csv("Iris.csv", header=True, inferSchema=True)
formula = RFormula(formula="Species ~ .")
classifier = DecisionTreeClassifier()
pipeline = Pipeline(stages=[formula, classifier])
pipelineModel = pipeline.fit(df)

# Export the fitted pipeline to a PMML file
from pyspark2pmml import PMMLBuilder
pmmlBuilder = PMMLBuilder(sc, df, pipelineModel)
pmmlBuilder.buildFile("DecisionTreeIris.pmml")
Upload the DecisionTreeIris.pmml file to a GCS bucket. Note that the PMMLServer expects the model file to be named model.pmml, so rename the file when uploading it, as in the sketch below.
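A minimal upload sketch, assuming the gsutil CLI is configured and gs://<your-bucket> is a placeholder for your own bucket (the manifest below uses the public gs://kfserving-examples/models/sparkpmml location instead):

# copy the exported model into the bucket under the expected name
gsutil cp DecisionTreeIris.pmml gs://<your-bucket>/sparkpmml/model.pmml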
Create the InferenceService with the pmml predictor and set the storageUri to the bucket location you uploaded to, saving the manifest as spark_pmml.yaml:
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "spark-pmml"
spec:
  predictor:
    pmml:
      storageUri: gs://kfserving-examples/models/sparkpmml
kubectl apply -f spark_pmml.yaml
Expected Output
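inferenceservice.serving.kserve.io/spark-pmml created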
Wait for the InferenceService to be ready:
kubectl wait --for=condition=Ready inferenceservice spark-pmml
inferenceservice.serving.kserve.io/spark-pmml condition met
MODEL_NAME=spark-pmml
INPUT_PATH=@./pmml-input.json
SERVICE_HOSTNAME=$(kubectl get inferenceservice spark-pmml -o jsonpath='{.status.url}' | cut -d "/" -f 3)
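To send a prediction request, a minimal sketch follows; the pmml-input.json payload is an assumed example of four Iris feature values, and INGRESS_HOST / INGRESS_PORT are assumed to already point at your cluster's ingress gateway as described in the KServe ingress setup:

# assumed example contents of ./pmml-input.json: {"instances": [[5.1, 3.5, 1.4, 0.2]]}
# INGRESS_HOST and INGRESS_PORT are assumptions: set them to your cluster's ingress address and port
curl -v -H "Host: ${SERVICE_HOSTNAME}" "http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/${MODEL_NAME}:predict" -d ${INPUT_PATH}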
Expected Output