Model Inference Platform on Kubernetes for Trusted AI
- KServe is a standard Model Inference Platform on Kubernetes, built for highly scalable use cases.
- Supports modern serverless inference workloads with autoscaling, including scale-to-zero on GPU.
- Provides high scalability, density packing, and intelligent routing using ModelMesh.
- Simple and pluggable production serving, including prediction, pre/post-processing, monitoring, and explainability.
- Advanced deployments with canary rollouts, experiments, ensembles, and transformers.
Provides serverless deployment of single-model inference on CPU/GPU for common ML frameworks such as Scikit-Learn, XGBoost, and TensorFlow, as well as a pluggable custom model runtime.
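As a minimal sketch, a serverless single-model deployment is declared with one InferenceService resource; the service name and storage URI below are placeholders, and `minReplicas: 0` is what enables scale-to-zero in the serverless deployment mode:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris                # illustrative name
spec:
  predictor:
    minReplicas: 0                  # allow the predictor pods to scale down to zero when idle
    model:
      modelFormat:
        name: sklearn               # KServe selects a matching serving runtime for the format
      storageUri: gs://example-bucket/models/sklearn/iris   # placeholder model location
```

Once applied with `kubectl apply -f`, KServe exposes an HTTP endpoint for `:predict` requests and scales the predictor pods with request load.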
ModelMesh
ModelMesh is designed for high-scale, high-density, and frequently-changing model use cases. ModelMesh intelligently loads and unloads AI models to and from memory, striking a trade-off between responsiveness to users and computational footprint.
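As a sketch of what this looks like declaratively, assuming ModelMesh Serving is installed, an InferenceService can be routed to the shared ModelMesh pool via an annotation; the name and storage path are placeholders:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-mnist
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh   # serve from the shared ModelMesh pod pool
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: s3://example-bucket/sklearn/mnist-svm.joblib   # placeholder model location
```

ModelMesh then decides which runtime pods hold the model in memory at any given moment, loading and evicting models based on usage.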
Model Explainability
Provides ML model inspection and interpretation; KServe integrates Alibi and AI Explainability 360 to help explain predictions and gauge the confidence of those predictions.
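As a sketch of the declarative form this can take, earlier KServe releases let an Alibi explainer be attached alongside the predictor; the names and paths are placeholders, and newer releases may instead expect the explainer to run as a custom container:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: income-classifier
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://example-bucket/models/income          # placeholder model location
  explainer:
    alibi:
      type: AnchorTabular                                    # Alibi "anchor" explanations for tabular data
      storageUri: gs://example-bucket/explainers/income      # placeholder fitted explainer artifact
```

With an explainer in place, the service answers `:explain` requests alongside the usual `:predict` endpoint.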
Model Monitoring
Enables payload logging and outlier, adversarial, and drift detection; KServe integrates Alibi Detect and AI Fairness 360 to help monitor ML models in production.
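Payload logging is declared on the component being monitored. In this minimal sketch, the logger forwards each request/response pair as a CloudEvent to a sink service; the sink URL points at a hypothetical message-dumper that a drift detector could subscribe to:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris-monitored
spec:
  predictor:
    logger:
      mode: all                                             # log requests and responses (also: request, response)
      url: http://message-dumper.default.svc.cluster.local  # hypothetical CloudEvents sink
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://example-bucket/models/sklearn/iris   # placeholder model location
```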
Advanced Deployments
Supports canary rollouts, model experiments/ensembles, and feature transformers, as well as custom pre/post-processing.
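As a sketch of a canary rollout, updating the model location while setting `canaryTrafficPercent` splits traffic between the previous revision and the new one; the percentage and paths below are illustrative:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    canaryTrafficPercent: 10    # 10% of traffic to the new revision, 90% stays on the last ready one
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://example-bucket/models/sklearn/iris-v2   # updated model triggers a new revision
```

Promoting the canary is then just a matter of raising the percentage, or removing the field entirely, once the new revision looks healthy.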