Version Tracking Data Model
MLflow's version tracking data model provides a structured approach to managing and analyzing different versions of your GenAI applications across their entire lifecycle. By organizing version metadata within MLflow's core entities, you can systematically track performance, debug regressions, and validate deployments across development, staging, and production environments.
Overview
Version tracking in MLflow integrates seamlessly with the core data model through strategic use of tags and metadata. This approach enables comprehensive version management while maintaining the flexibility to adapt to your specific deployment and development workflows.
Core Entities for Version Tracking
Experiment: The Version Container
An Experiment serves as the root container for all versions of your GenAI application. Within a single experiment, you can track multiple application versions, environments, and deployment states while maintaining a unified view of your application's evolution.
Key characteristics:
- Single namespace: One experiment contains all versions of your application
- Cross-version analysis: Compare performance across different versions within the same container
- Historical continuity: Maintain complete version history in one location
- Unified metadata: Consistent tagging and organization across all versions
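As a minimal sketch of this single-container pattern, the example below logs traces from two hypothetical versions of an application into one experiment and then searches that experiment for cross-version analysis. It assumes MLflow's tracing APIs (`mlflow.set_experiment`, `@mlflow.trace`, `mlflow.update_current_trace`, `mlflow.search_traces`); the experiment name, `answer` function, and version strings are illustrative, and the tag filter syntax may vary by MLflow version.

```python
import mlflow

# One experiment is the single container for every version of the application.
mlflow.set_experiment("customer-support-agent")  # hypothetical experiment name


@mlflow.trace
def answer(question: str, app_version: str) -> str:
    # Tag the active trace with the version that produced it.
    mlflow.update_current_trace(tags={"app_version": app_version})
    return f"(v{app_version}) answer to: {question}"  # placeholder application logic


# Traces from different versions accumulate in the same experiment.
answer("How do I reset my password?", app_version="1.1.0")
answer("How do I reset my password?", app_version="1.2.0")

# Cross-version analysis: search the one container, optionally filtering by version tag.
all_traces = mlflow.search_traces()
v2_traces = mlflow.search_traces(filter_string="tags.app_version = '1.2.0'")
print(len(all_traces), len(v2_traces))
```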
Traces: Version-Aware Execution Records
Each Trace represents a single execution of your application and carries version-specific metadata through tags. This enables granular tracking of how different versions perform in various contexts.
Version metadata is captured on traces as tags, which fall into three groups:

| Tag Type | Purpose | Examples |
|---|---|---|
| Automatic | Metadata populated by MLflow | `mlflow.source.git.commit`, `mlflow.source.name` |
| Standard | Reserved keys with specific meanings | `mlflow.trace.session`, `mlflow.trace.user` |
| Custom | Application-specific context | `app_version`, `environment`, `deployment_id` |
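The sketch below illustrates standard and custom tags being attached to a trace; automatic tags such as `mlflow.source.git.commit` are populated by MLflow itself when source context is available. The `handle_request` function and all tag values are hypothetical.

```python
import mlflow


@mlflow.trace
def handle_request(message: str) -> str:
    mlflow.update_current_trace(
        tags={
            # Standard tags: reserved keys with specific meanings in MLflow.
            "mlflow.trace.session": "session-4f2a",  # hypothetical session id
            "mlflow.trace.user": "user-123",         # hypothetical user id
            # Custom tags: application-specific version context.
            "app_version": "1.2.0",
            "environment": "staging",
            "deployment_id": "deploy-2024-06-01",    # hypothetical deployment id
        }
    )
    return f"response to: {message}"  # placeholder application logic


handle_request("What is my order status?")
```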
Assessments: Version-Specific Quality Judgments
Assessments enable version-specific quality analysis by attaching evaluations to traces. This creates a foundation for comparing quality metrics across different versions and deployment contexts.
Assessment types for version tracking:
- Performance Feedback: Latency, throughput, resource usage
- Quality Feedback: Relevance, accuracy, helpfulness scores
- User Experience: Satisfaction ratings, usability metrics
- Regression Testing: Expected outputs for version validation
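A minimal sketch of attaching version-specific assessments to a trace, assuming MLflow 3's assessment APIs (`mlflow.log_feedback`, `mlflow.log_expectation`) and the `mlflow.get_last_active_trace_id` helper; check their availability and signatures in your MLflow version. The assessment names, scores, and expected response are illustrative.

```python
import mlflow


@mlflow.trace
def answer(question: str) -> str:
    mlflow.update_current_trace(tags={"app_version": "1.2.0"})
    return "You can reset your password from the account settings page."


answer("How do I reset my password?")
trace_id = mlflow.get_last_active_trace_id()  # assumes this helper exists in your MLflow version

# Quality feedback: a relevance score for this version's output.
mlflow.log_feedback(
    trace_id=trace_id,
    name="relevance",
    value=0.9,
    rationale="Directly answers the password-reset question.",
)

# Regression testing: record the expected output for later version validation.
mlflow.log_expectation(
    trace_id=trace_id,
    name="expected_response",
    value="Reset the password from account settings.",
)
```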
Scorers: Automated Version Analysis
Scorers provide automated evaluation functions that can detect version-specific performance patterns, regressions, and improvements. They transform raw trace data into actionable version insights.
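A sketch of a custom scorer, assuming MLflow 3's GenAI scorer decorator (`mlflow.genai.scorers.scorer`); the keyword-coverage heuristic is a hypothetical, application-specific check rather than a built-in metric.

```python
from mlflow.genai.scorers import scorer


# Turns raw outputs into a 0-1 quality signal: the fraction of expected
# keywords that appear in the application's response.
@scorer
def keyword_coverage(outputs, expectations):
    expected = (expectations or {}).get("keywords", [])
    if not expected:
        return 1.0
    hits = sum(1 for kw in expected if kw.lower() in str(outputs).lower())
    return hits / len(expected)
```

A scorer like this can then be supplied to an evaluation run so that every version is judged by the same function.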
Evaluation Datasets: Version Testing Collections
Evaluation Datasets support systematic version testing by providing curated collections of inputs and expected outputs. These datasets enable consistent comparison across versions and deployment validation.
Dataset organization for version management:
- Regression Testing: Core functionality validation across versions
- Performance Benchmarking: Standardized performance measurement
- Feature Validation: New capability testing and verification
- Environment Testing: Deployment-specific scenario validation
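As an illustration, a regression-testing dataset can be as simple as a curated list of inputs and expectations. The record layout below (one `inputs` dict and one `expectations` dict per row) is an assumed in-memory form; the questions and keywords are hypothetical.

```python
# Curated inputs plus expectations that every version must continue to satisfy.
regression_dataset = [
    {
        "inputs": {"question": "How do I reset my password?"},
        "expectations": {"keywords": ["password", "settings"]},
    },
    {
        "inputs": {"question": "What is your refund policy?"},
        "expectations": {"keywords": ["refund", "30 days"]},
    },
]
```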
Evaluation Runs: Version Comparison Engine
Evaluation Runs orchestrate systematic version comparisons by running different application versions against the same datasets and collecting scored results for analysis.
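A sketch of such a comparison, assuming MLflow 3's GenAI evaluation API (`mlflow.genai.evaluate`) calls the prediction function with the fields of each record's `inputs` dict as keyword arguments; the two `answer_*` functions, the scorer, and the dataset are hypothetical stand-ins for real application versions and test data.

```python
import mlflow
from mlflow.genai.scorers import scorer


# Two hypothetical application versions under comparison.
def answer_v1(question: str) -> str:
    return "Please contact support for help with your account."


def answer_v2(question: str) -> str:
    return "You can reset your password from the account settings page."


@scorer
def mentions_password(outputs):
    return "password" in str(outputs).lower()


dataset = [{"inputs": {"question": "How do I reset my password?"}}]

# One evaluation run per version, against the same dataset and scorer,
# so the scored results are directly comparable across versions.
for predict_fn in (answer_v1, answer_v2):
    mlflow.genai.evaluate(data=dataset, predict_fn=predict_fn, scorers=[mentions_password])
```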