MLflow Projects
MLflow Projects provide a standard format for packaging and sharing reproducible data science code. Based on simple conventions, Projects enable seamless collaboration and automated execution across different environments and platforms.
Quick Start
Running Your First Project
Execute any Git repository or local directory as an MLflow Project:
# Run a project from GitHub
mlflow run https://github.com/mlflow/mlflow-example.git -P alpha=0.5
# Run a local project
mlflow run . -P data_file=data.csv -P regularization=0.1
# Run with specific entry point
mlflow run . -e validate -P data_file=data.csv
# Run projects programmatically
import mlflow
# Execute remote project
result = mlflow.run(
"https://github.com/mlflow/mlflow-example.git",
parameters={"alpha": 0.5, "l1_ratio": 0.01},
experiment_name="elasticnet_experiment",
)
# Execute local project
result = mlflow.run(
".", entry_point="train", parameters={"epochs": 100}, synchronous=True
)
Any directory with an MLproject file or containing .py/.sh files can be run as an MLflow Project. No complex setup required!
Core Concepts
Project Components
Every MLflow Project consists of three key elements:
Project Name
A human-readable identifier for your project, typically defined in the MLproject file.
Entry Points
Commands that can be executed within the project. Entry points define:
- Parameters - Inputs with types and default values
- Commands - What gets executed when the entry point runs
- Environment - The execution context and dependencies
Environment
The software environment containing all dependencies needed to run the project. MLflow supports multiple environment types:
Environment | Use Case | Dependencies |
---|---|---|
Virtualenv (Recommended) | Python packages from PyPI | python_env.yaml |
Conda | Python + native libraries | conda.yaml |
Docker | Complex dependencies, non-Python | Dockerfile |
System | Use current environment | None |
Project Structure & Configuration
Convention-Based Projects
Projects without an MLproject file use these conventions:
my-project/
├── train.py # Executable entry point
├── validate.sh # Shell script entry point
├── conda.yaml # Optional: Conda environment
├── python_env.yaml # Optional: Python environment
└── data/ # Project data and assets
Default Behavior:
- Name: Directory name
- Entry Points: Any .py or .sh file
- Environment: Conda environment from conda.yaml, or a Python-only environment
- Parameters: Passed via command line as --key value (see the sketch below)
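For example, a convention-based project can be launched by using the script name itself as the entry point. A minimal sketch (assuming a hypothetical train.py that accepts a --data_file flag):
import mlflow

# Convention-based project: no MLproject file, so the script name is the entry point
# and each parameter is forwarded to it as --key value.
result = mlflow.run(
    ".",
    entry_point="train.py",  # any .py or .sh file in the directory
    parameters={"data_file": "data.csv"},  # forwarded as --data_file data.csv
)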
MLproject File Configuration
For advanced control, create an MLproject file:
name: My ML Project
# Environment specification (choose one)
python_env: python_env.yaml
# conda_env: conda.yaml
# docker_env:
# image: python:3.9
entry_points:
  main:
    parameters:
      data_file: path
      regularization: {type: float, default: 0.1}
      max_epochs: {type: int, default: 100}
    command: "python train.py --reg {regularization} --epochs {max_epochs} {data_file}"
  validate:
    parameters:
      model_path: path
      test_data: path
    command: "python validate.py {model_path} {test_data}"
  hyperparameter_search:
    parameters:
      search_space: uri
      n_trials: {type: int, default: 50}
    command: "python hyperparam_search.py --trials {n_trials} --config {search_space}"
Parameter Types
MLflow supports the following parameter types with automatic validation and transformation:
Type | Description | Example | Special Handling |
---|---|---|---|
string | Text data | "hello world" | None |
float | Decimal numbers | 0.1 , 3.14 | Validation |
int | Whole numbers | 42 , 100 | Validation |
path | Local file paths | data.csv , s3://bucket/file | Downloads remote URIs to local files |
uri | Any URI | s3://bucket/ , ./local/path | Converts relative paths to absolute |
path parameters automatically download remote files (S3, GCS, etc.) to local storage before execution. Use uri for applications that can read directly from remote storage.
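For example, giving a path-typed parameter a remote object-store URI causes MLflow to download the file before the command runs. A minimal sketch (the bucket name is illustrative):
import mlflow

# data_file is declared as type "path" in the MLproject above, so MLflow downloads
# the S3 object to local storage and substitutes the local path into the command.
# A uri-typed parameter would be passed through unchanged.
result = mlflow.run(
    ".",
    parameters={"data_file": "s3://my-bucket/training/data.csv"},
)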
Environment Management
Python Virtual Environments (Recommended)
Create a python_env.yaml file for pure Python dependencies:
# python_env.yaml
python: "3.9.16"
# Optional: build dependencies
build_dependencies:
- pip
- setuptools
- wheel==0.37.1
# Runtime dependencies
dependencies:
- mlflow>=2.0.0
- scikit-learn==1.2.0
- pandas>=1.5.0
- numpy>=1.21.0
# MLproject
name: Python Project
python_env: python_env.yaml
entry_points:
  main:
    command: "python train.py"
Conda Environments
For projects requiring native libraries or complex dependencies:
# conda.yaml
name: ml-project
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.9
  - cudnn=8.2.1  # CUDA libraries
  - scikit-learn
  - pip
  - pip:
      - mlflow>=2.0.0
      - tensorflow==2.10.0
# MLproject
name: Deep Learning Project
conda_env: conda.yaml
entry_points:
  train:
    parameters:
      gpu_count: {type: int, default: 1}
    command: "python train_model.py --gpus {gpu_count}"
By using Conda, you agree to Anaconda's Terms of Service.
Docker Environments
For maximum reproducibility and complex system dependencies:
# Dockerfile
FROM python:3.9-slim
RUN apt-get update && apt-get install -y \
build-essential \
git \
&& rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip install -r requirements.txt
WORKDIR /mlflow/projects/code
# MLproject
name: Containerized Project
docker_env:
  image: my-ml-image:latest
  volumes: ["/host/data:/container/data"]
  environment:
    - ["CUDA_VISIBLE_DEVICES", "0,1"]
    - "AWS_PROFILE"  # Copy from host
entry_points:
  train:
    command: "python distributed_training.py"
Advanced Docker Options:
docker_env:
  image: 012345678910.dkr.ecr.us-west-2.amazonaws.com/ml-training:v1.0
  volumes:
    - "/local/data:/data"
    - "/tmp:/tmp"
  environment:
    - ["MODEL_REGISTRY", "s3://my-bucket/models"]
    - ["EXPERIMENT_NAME", "production-training"]
    - "MLFLOW_TRACKING_URI"  # Copy from host
Environment Manager Selection
Control which environment manager to use:
# Force virtualenv (ignores conda.yaml)
mlflow run . --env-manager virtualenv
# Use local environment (no isolation)
mlflow run . --env-manager local
# Use conda (default if conda.yaml present)
mlflow run . --env-manager conda
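The same selection is available from the Python API via the env_manager argument (a minimal sketch):
import mlflow

# Equivalent to `mlflow run . --env-manager virtualenv`; accepted values are
# "local", "virtualenv", and "conda".
result = mlflow.run(".", env_manager="virtualenv")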
Execution & Deployment
Local Execution
# Basic execution
mlflow run .
# With parameters
mlflow run . -P lr=0.01 -P batch_size=32
# Specific entry point
mlflow run . -e hyperparameter_search -P n_trials=100
# Custom environment
mlflow run . --env-manager virtualenv
Remote Execution
Databricks Platform
# Run on Databricks cluster
mlflow run . --backend databricks --backend-config cluster-config.json
// cluster-config.json
{
"cluster_spec": {
"new_cluster": {
"node_type_id": "i3.xlarge",
"num_workers": 2,
"spark_version": "11.3.x-scala2.12"
}
},
"run_name": "distributed-training"
}
Kubernetes Clusters
# Run on Kubernetes
mlflow run . --backend kubernetes --backend-config k8s-config.json
// k8s-config.json
{
"kube-context": "my-cluster",
"repository-uri": "gcr.io/my-project/ml-training",
"kube-job-template-path": "k8s-job-template.yaml"
}
# k8s-job-template.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: "{replaced-with-project-name}"
  namespace: mlflow
spec:
  ttlSecondsAfterFinished: 3600
  backoffLimit: 2
  template:
    spec:
      containers:
        - name: "{replaced-with-project-name}"
          image: "{replaced-with-image-uri}"
          command: ["{replaced-with-entry-point-command}"]
          resources:
            requests:
              memory: "2Gi"
              cpu: "1000m"
            limits:
              memory: "4Gi"
              cpu: "2000m"
          env:
            - name: MLFLOW_TRACKING_URI
              value: "https://my-mlflow-server.com"
      restartPolicy: Never
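Kubernetes runs can also be submitted from Python by pointing backend_config at the JSON file shown above. A minimal sketch (assumes the tracking server is reachable from the cluster):
import mlflow

# Submit the project to the Kubernetes backend described in k8s-config.json.
# MLflow builds a Docker image, pushes it to repository-uri, and creates a Job
# from the template; running asynchronously returns a handle immediately.
submitted = mlflow.run(
    ".",
    backend="kubernetes",
    backend_config="k8s-config.json",
    synchronous=False,
)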
Python API
import mlflow
from mlflow.projects import run
# Synchronous execution
result = run(
uri="https://github.com/mlflow/mlflow-example.git",
entry_point="main",
parameters={"alpha": 0.5},
backend="local",
synchronous=True,
)
# Asynchronous execution
submitted_run = run(
uri=".",
entry_point="train",
parameters={"epochs": 100},
backend="databricks",
backend_config="cluster-config.json",
synchronous=False,
)
# Monitor progress
if submitted_run.wait():
    print("Training completed successfully!")
    run_data = mlflow.get_run(submitted_run.run_id)
    print(f"Final accuracy: {run_data.data.metrics['accuracy']}")