Deploy MLflow Model to Kubernetes
Using MLServer as the Inference Server
By default, MLflow deployment uses Flask, a widely used WSGI web application framework for Python, to serve the inference endpoint. However, Flask is primarily designed for lightweight applications and may not be suitable for production use cases at scale. To address this gap, MLflow integrates with MLServer as an alternative deployment option. MLServer is used as the core Python inference server in Kubernetes-native frameworks such as Seldon Core and KServe (formerly known as KFServing), so you can take advantage of the scalability and reliability of Kubernetes to serve your model at scale. See Serving Framework for a detailed comparison between Flask and MLServer, and why MLServer is a better choice for production ML use cases.
Building a Docker Image for MLflow Model
The essential step in deploying an MLflow model to Kubernetes is building a Docker image that contains the MLflow model and the inference server. This can be done via the build-docker CLI command or Python API.
mlflow models build-docker -m runs:/<run_id>/model -n <image_name> --enable-mlserver
If you want to use the bare-bones Flask server instead of MLServer, remove the --enable-mlserver flag. For other options, see the build-docker command documentation.
import mlflow

mlflow.models.build_docker(
    model_uri=f"runs:/{run_id}/model",  # run_id of the MLflow run that logged the model
    name="<image_name>",
    enable_mlserver=True,
)
If you want to use the bare-bones Flask server instead of MLServer, remove enable_mlserver=True. For other options, see the mlflow.models.build_docker function documentation.
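Before deploying to a cluster, it is worth sanity-checking the image locally. Below is a minimal sketch, assuming the image was built as above: MLflow serving images listen on port 8080, and both the Flask server and MLServer (via its MLflow runtime) expose the /invocations scoring endpoint. The exact payload shape depends on your model's signature; the inputs shown here are placeholders.

docker run -p 8080:8080 <image_name>

curl -X POST http://localhost:8080/invocations \
  -H "Content-Type: application/json" \
  -d '{"inputs": [[1.0, 2.0, 3.0]]}'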
Important
Since MLflow 2.10.1, the Docker image spec has been changed to reduce the image size and improve performance.
Most notably, Java is no longer installed in the image, except for Java-dependent model flavors such as spark.
If you need Java for other flavors, e.g. a custom Python model that uses SparkML, please specify the --install-java flag to enforce Java installation.
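For example, to build an MLServer-enabled image with Java installed, add the flag to the same command shown above:

mlflow models build-docker -m runs:/<run_id>/model -n <image_name> --enable-mlserver --install-java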
Deployment Steps
Please refer to the following partner documentation for deploying MLflow Models to Kubernetes using MLServer. You can also follow the tutorial below to learn the end-to-end process, including environment setup, model training, and deployment.
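For orientation, the sketch below shows what a minimal KServe InferenceService manifest for an image built as above might look like. It assumes the image has been pushed to a registry your cluster can pull from; the resource name, namespace, and image reference are placeholders, and the partner documentation remains the authoritative reference.

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: mlflow-model # placeholder name
  namespace: mlflow-kserve-test # placeholder namespace
spec:
  predictor:
    containers:
      - name: mlflow-model
        image: <registry>/<image_name> # image built with build-docker and pushed to your registry
        ports:
          - containerPort: 8080 # MLflow serving images listen on 8080
            protocol: TCP
        env:
          - name: PROTOCOL
            value: "v2" # serve the V2 inference protocol via MLServer

Applying the manifest with kubectl apply -f creates the inference service once the image is reachable from the cluster.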