Optimize Prompts

The simple way to continuously improve your AI agents and prompts.

MLflow's prompt optimization lets you systematically enhance your AI applications with minimal code changes. Whether you're building with LangChain, OpenAI Agent, CrewAI, or your own custom implementation, MLflow provides a universal path from initial prototyping to steady improvement.

Minimal rewrites, no lock-in, just better prompts.

MLflow supports multiple optimization algorithms to improve your prompts:

  • GEPA (GepaPromptOptimizer): Iteratively refines prompts using LLM-driven reflection and automated feedback, achieving systematic improvements through trial-and-error learning.
  • Metaprompting (MetaPromptOptimizer): Restructures prompts to be more systematic and effective, working in both zero-shot mode (without training data) and few-shot mode (learning from your examples).

See Choosing Your Optimizer for guidance on which optimizer to use for your specific needs.

Why Use MLflow Prompt Optimization?
  • Zero Framework Lock-in: Works with ANY agent framework—LangChain, OpenAI Agent, CrewAI, or custom solutions
  • Minimal Code Changes: Add a few lines to start optimizing; no architectural rewrites needed
  • Data-Driven Improvement: Automatically learn from your evaluation data and custom metrics
  • Multi-Prompt Optimization: Jointly optimize multiple prompts for complex agent workflows
  • Granular Control: Optimize single prompts or entire multi-prompt workflows—you decide what to improve
  • Production-Ready: Built-in version control and registry for seamless deployment
  • Extensible: Bring your own optimization algorithms with simple base class extension
Version Requirements

The optimize_prompts API requires MLflow >= 3.5.0.
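
If you are not sure which version is installed, a quick check (illustrative; the packaging library ships as an MLflow dependency) looks like this:

python
from packaging.version import Version

import mlflow

# optimize_prompts is available from MLflow 3.5.0 onward
assert Version(mlflow.__version__) >= Version("3.5.0"), "Please upgrade MLflow"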

Quick Start​

Here's a realistic example of optimizing a prompt for medical paper section classification:

python
import mlflow
import openai
from mlflow.genai.optimize import GepaPromptOptimizer
from mlflow.genai.scorers import Correctness

# Register initial prompt for classifying medical paper sections
prompt = mlflow.genai.register_prompt(
    name="medical_section_classifier",
    template="Classify this medical research paper sentence into one of these sections: CONCLUSIONS, RESULTS, METHODS, OBJECTIVE, BACKGROUND.\n\nSentence: {{sentence}}",
)


# Define your prediction function
def predict_fn(sentence: str) -> str:
    prompt = mlflow.genai.load_prompt("prompts:/medical_section_classifier/1")
    completion = openai.OpenAI().chat.completions.create(
        model="gpt-5-nano",
        # load prompt template using PromptVersion.format()
        messages=[{"role": "user", "content": prompt.format(sentence=sentence)}],
    )
    return completion.choices[0].message.content


# Training data with medical paper sentences and ground truth labels
# fmt: off
raw_data = [
("The emergence of HIV as a chronic condition means that people living with HIV are required to take more responsibility for the self-management of their condition , including making physical , emotional and social adjustments .", "BACKGROUND"),
("This paper describes the design and evaluation of Positive Outlook , an online program aiming to enhance the self-management skills of gay men living with HIV .", "BACKGROUND"),
("This study is designed as a randomised controlled trial in which men living with HIV in Australia will be assigned to either an intervention group or usual care control group .", "METHODS"),
("The intervention group will participate in the online group program ` Positive Outlook ' .", "METHODS"),
("The program is based on self-efficacy theory and uses a self-management approach to enhance skills , confidence and abilities to manage the psychosocial issues associated with HIV in daily life .", "METHODS"),
("Participants will access the program for a minimum of 90 minutes per week over seven weeks .", "METHODS"),
("Primary outcomes are domain specific self-efficacy , HIV related quality of life , and outcomes of health education .", "METHODS"),
("Secondary outcomes include : depression , anxiety and stress ; general health and quality of life ; adjustment to HIV ; and social support .", "METHODS"),
("Data collection will take place at baseline , completion of the intervention ( or eight weeks post randomisation ) and at 12 week follow-up .", "METHODS"),
("Results of the Positive Outlook study will provide information regarding the effectiveness of online group programs improving health related outcomes for men living with HIV .", "CONCLUSIONS"),
("The aim of this study was to evaluate the efficacy , safety and complications of orbital steroid injection versus oral steroid therapy in the management of thyroid-related ophthalmopathy .", "OBJECTIVE"),
("A total of 29 patients suffering from thyroid ophthalmopathy were included in this study .", "METHODS"),
("Patients were randomized into two groups : group I included 15 patients treated with oral prednisolone and group II included 14 patients treated with peribulbar triamcinolone orbital injection .", "METHODS"),
("Both groups showed improvement in symptoms and in clinical evidence of inflammation with improvement of eye movement and proptosis in most cases .", "RESULTS"),
("Mean exophthalmometry value before treatment was 22.6 1.98 mm that decreased to 18.6 0.996 mm in group I , compared with 23 1.86 mm that decreased to 19.08 1.16 mm in group II .", "RESULTS"),
("There was no change in the best-corrected visual acuity in both groups .", "RESULTS"),
("There was an increase in body weight , blood sugar , blood pressure and gastritis in group I in 66.7 % , 33.3 % , 50 % and 75 % , respectively , compared with 0 % , 0 % , 8.3 % and 8.3 % in group II .", "RESULTS"),
("Orbital steroid injection for thyroid-related ophthalmopathy is effective and safe .", "CONCLUSIONS"),
("It eliminates the adverse reactions associated with oral corticosteroid use .", "CONCLUSIONS"),
("The aim of this prospective randomized study was to examine whether active counseling and more liberal oral fluid intake decrease postoperative pain , nausea and vomiting in pediatric ambulatory tonsillectomy .", "OBJECTIVE"),
]
# fmt: on

# Format dataset for optimization
dataset = [
    {
        "inputs": {"sentence": sentence},
        "expectations": {"expected_response": label},
    }
    for sentence, label in raw_data
]

# Optimize the prompt
result = mlflow.genai.optimize_prompts(
    predict_fn=predict_fn,
    train_data=dataset,
    prompt_uris=[prompt.uri],
    optimizer=GepaPromptOptimizer(
        reflection_model="openai:/gpt-5", max_metric_calls=300
    ),
    scorers=[Correctness(model="openai:/gpt-5-mini")],
)

# Use the optimized prompt
optimized_prompt = result.optimized_prompts[0]
print(f"Optimized template: {optimized_prompt.template}")

The API will automatically improve the prompt to better classify medical paper sections by learning from the training examples.
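
Once optimization finishes, you can point your application at the newly registered version. A minimal sketch reusing the Quick Start pattern (optimized_prompt.uri always points at the new version, whatever its version number is):

python
# Sketch: serve the optimized prompt by loading it via its URI
def predict_fn_optimized(sentence: str) -> str:
    prompt = mlflow.genai.load_prompt(optimized_prompt.uri)
    completion = openai.OpenAI().chat.completions.create(
        model="gpt-5-nano",
        messages=[{"role": "user", "content": prompt.format(sentence=sentence)}],
    )
    return completion.choices[0].message.content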

Choosing Your Optimizer​

MLflow currently supports two optimization algorithms: GEPA and Metaprompting. Each uses different strategies to improve your prompts.

GEPA (Genetic-Pareto)​

GEPA is a prompt optimization technique that uses natural language reflection to iteratively improve LLM performance through trial-and-error learning. It's particularly effective at extracting rich learning signals from system behavior by analyzing failures in natural language.

Key Features:

  • Natural Language Reflection: Leverages interpretable language to extract learning signals from execution traces, reasoning chains, and tool interactions
  • High Efficiency: Achieves superior results with dramatically fewer iterations (up to 35x fewer rollouts compared to traditional methods like GRPO)
  • Pareto Synthesis: Maintains a pool of candidate prompts and selects promising past candidates from the Pareto frontier to mutate and improve
  • Strong Performance: Demonstrates reliable gains on a wide range of tasks, such as context compression and Q&A agents

Best For:

  • Tasks where you have clear evaluation metrics and a dataset of decent size (e.g., 100+ records)
  • Tasks where quality is critical to your system (e.g., medical or financial agents), so that the higher optimization cost and longer prompts produced by GEPA are worthwhile
Reduce the Cost of GEPA Optimization

The cost of GEPA optimization is driven largely by the reflection model you use and the maximum number of metric calls you allow. You can reduce the cost by choosing a cheaper reflection model or lowering the metric-call budget, as sketched below.
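
For example, a lower-budget configuration might look like this (the model name and budget below are illustrative placeholders; pick values that fit your task):

python
from mlflow.genai.optimize import GepaPromptOptimizer

# Illustrative lower-cost GEPA configuration
budget_optimizer = GepaPromptOptimizer(
    reflection_model="openai:/gpt-5-mini",  # cheaper reflection model
    max_metric_calls=100,  # smaller evaluation budget
)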

Learn More: GEPA Research Paper | GEPA GitHub Repository

Metaprompting​

Metaprompting is a prompt optimization technique that uses a metaprompt (a prompt that instructs a language model on how to rewrite prompts) to restructure your prompts to be more systematic and effective. It operates in two modes:

Zero-Shot Mode:

  • Analyzes your initial prompt and restructures it to follow best practices
  • Makes prompts more systematic without requiring training data
  • Quick to run and requires no examples

Few-Shot Mode:

  • Evaluates the initial prompt on your training data to understand task-specific patterns
  • Leverages the evaluation results along with general best practices to restructure the prompt to be more systematic and effective

Key Features:

  • Fast Optimization: Requires only one evaluation round in few-shot mode and a single language-model call in zero-shot mode
  • Minimal Data Requirement: Works well with zero or just a few examples (fewer than 10)
  • Systematic Improvement: Restructures prompts to follow clear patterns and best practices
  • Data-Aware: In few-shot mode, learns from your specific data to tailor improvements
  • Custom Guidelines: You can provide custom guidelines to the optimizer to tailor the optimization to your specific needs

Best For: Tasks where you want quick improvements based on prompt engineering best practices, or when you have limited training data but want to leverage it for targeted improvements.

Usage Examples:

python
from mlflow.genai.optimize import MetaPromptOptimizer

# Zero-shot mode: No training data or scorers required
# The optimizer automatically uses zero-shot mode when train_data is empty and scorers is empty
results = mlflow.genai.optimize_prompts(
    predict_fn=predict_fn,
    train_data=[],
    prompt_uris=[prompt.uri],
    optimizer=MetaPromptOptimizer(
        reflection_model="openai:/gpt-5",
        guidelines="This prompt is used in a finance agent to project tax situations.",
    ),
    scorers=[],
)

# Few-shot mode: Learn from training data
# The optimizer automatically uses few-shot mode when train_data and scorers are provided
results = mlflow.genai.optimize_prompts(
    predict_fn=predict_fn,
    train_data=dataset,
    prompt_uris=[prompt.uri],
    optimizer=MetaPromptOptimizer(
        reflection_model="openai:/gpt-5",
        guidelines="This prompt is used in a finance agent to project tax situations.",
    ),
    scorers=[Correctness(model="openai:/gpt-5-mini")],
)

Comparison Summary​

| Feature | GEPA | Metaprompting (Zero-Shot) | Metaprompting (Few-Shot) |
| --- | --- | --- | --- |
| Requires Training Data | Yes | No | Yes |
| Optimization Speed | Moderate | Fast | Fast-Moderate |
| Learning Approach | Iterative trial-and-error with reflection | Systematic restructuring | Data-driven restructuring |
| Best Use Case | Complex tasks with clear metrics | Quick improvements without data | Targeted improvements with limited data |

Choose the optimizer that best fits your task requirements, available data, and optimization budget.

Example: Simple Prompt → Optimized Prompt​

Before Optimization:

text
Classify this medical research paper sentence
into one of these sections: CONCLUSIONS, RESULTS,
METHODS, OBJECTIVE, BACKGROUND.

Sentence: {{sentence}}

After Optimization:

text
You are a single-sentence classifier for medical research abstracts. For each input sentence, decide which abstract section it belongs to and output exactly one label in UPPERCASE with no extra words, punctuation, or explanation.

Allowed labels: CONCLUSIONS, RESULTS, METHODS, OBJECTIVE, BACKGROUND

Input format:
- The prompt will be:
"Classify this medical research paper sentence into one of these sections: CONCLUSIONS, RESULTS, METHODS, OBJECTIVE, BACKGROUND.

Sentence: {{sentence}}"

Core rules:
- Use only the information in the single sentence.
- Classify by the sentence's function: context-setting vs aim vs procedure vs findings vs interpretation.
- Return exactly one uppercase label from the allowed set.

Decision guide and lexical cues:

1) RESULTS
- Reports observed findings/outcomes tied to data.
- Common cues: past-tense result verbs and outcome terms: "showed," "was/were associated with," "increased/decreased," "improved," "reduced," "significant," "p < …," "odds ratio," "risk ratio," "95% CI," percentages, rates, counts or numbers tied to effects/adverse events.
- If it explicitly states changes, associations, statistical significance, or quantified outcomes, choose RESULTS.

2) CONCLUSIONS
- Interpretation, implications, recommendations, or high-level takeaways.
- Common cues: "In conclusion," "These findings suggest/indicate," "We conclude," statements about practice/policy/clinical implications, benefit–risk judgments, feasibility statements.
- Sentences that forecast the significance/utility of the study's results ("Results will provide insight/information," "Findings will inform/guide practice") are CONCLUSIONS.
- Tie-break with RESULTS: If a sentence describes an outcome as a general claim without specific observed data/statistics, prefer CONCLUSIONS over RESULTS.

3) METHODS
- How the study was conducted: design, participants, interventions/programs, measurements/outcomes lists, timelines, procedures, or analyses.
- Common cues: design terms ("randomized," "double-blind," "cross-sectional," "cohort," "case-control"), "participants," "n =," inclusion/exclusion criteria, instruments/scales, dosing/protocols, schedules/timelines, statistical tests/analysis plans ("multivariate regression," "Kaplan–Meier," "ANOVA," "we will compare"), trial registration, ethics approval.
- Measurement/outcome lists are METHODS (e.g., "Secondary outcomes include: …"; "Primary outcome was …").
- Numbers specifying sample size (e.g., "n = 200") → METHODS; numbers tied to effects → RESULTS.
- Program/intervention descriptions, components, theoretical basis, and mechanisms are METHODS, even if written in present tense and even if they contain purpose phrases. Examples: "The program is based on self-efficacy theory…," "The intervention uses a self-management approach to enhance skills…," "The device is designed to…"
- Important: An infinitive "to [verb] …" inside a program/intervention description (e.g., "uses X to improve Y") is METHODS, not OBJECTIVE, because it describes how the intervention works, not the study's aim.

4) OBJECTIVE
- The aim/purpose/hypothesis of the study.
- Common cues: "Objective(s):" "Aim/Purpose was," "We aimed/sought/intended to," "We hypothesized that …"
- Infinitive purpose phrases indicating the study's aim without procedures or results: "To determine/evaluate/assess/investigate whether …" → OBJECTIVE.
- Phrases like "The aim of this study was to evaluate the efficacy/safety of X vs Y …" → OBJECTIVE.
- If "We evaluated/assessed …" is clearly used as a purpose statement (not describing methods or results), label OBJECTIVE.

5) BACKGROUND
- Context, rationale, prior knowledge, unmet need; introduces topic without specific aims, procedures, or results.
- Common cues: burden/prevalence statements, "X is common," "X remains poorly understood," prior work summaries, general descriptions.
- If a sentence merely states that a paper describes/reports a program/design/evaluation without concrete procedures/analyses, label as BACKGROUND.

Important tie-break rules:
- RESULTS vs CONCLUSIONS: Observed data/findings → RESULTS; interpretation/generalization/recommendation → CONCLUSIONS.
- OBJECTIVE vs METHODS: Purpose/aim of the study → OBJECTIVE; concrete design/intervention details/measurements/analysis steps → METHODS.
- BACKGROUND vs OBJECTIVE: Context/motivation without an explicit study aim → BACKGROUND.
- BACKGROUND vs METHODS: General description without concrete procedures/analyses → BACKGROUND.
- The word "Results" at the start does not guarantee RESULTS; e.g., "Results will provide information …" → CONCLUSIONS.

Output constraint:
- Return exactly one uppercase label: CONCLUSIONS, RESULTS, METHODS, OBJECTIVE, or BACKGROUND. No extra text or punctuation.

Components​

The mlflow.genai.optimize_prompts() API requires the following components:

| Component | Description |
| --- | --- |
| Target Prompt URIs | List of prompt URIs to optimize (e.g., ["prompts:/qa/1"]) |
| Predict Function | A callable that takes inputs as keyword arguments and returns outputs. Must load templates from MLflow prompt versions (e.g., call PromptVersion.format()). |
| Training Data | Dataset with inputs (dict) and expectations (expected results). Supports pandas DataFrame, list of dicts, or MLflow EvaluationDataset. |
| Optimizer | Prompt optimizer instance (e.g., GepaPromptOptimizer or MetaPromptOptimizer). See Choosing Your Optimizer for guidance. |

1. Target Prompt URIs​

Specify which prompts to optimize using their URIs from MLflow Prompt Registry:

python
prompt_uris = [
    "prompts:/qa/1",  # Specific version
    "prompts:/instruction@latest",  # Latest version
]

You can reference prompts by:

  • Specific version: "prompts:/qa/1" - Optimize a particular version
  • Latest version: "prompts:/qa@latest" - Optimize the most recent version
  • Alias: "prompts:/qa@champion" - Optimize a version with a specific alias

2. Predict Function​

Your predict_fn must load the target prompt from the MLflow registry and format it with the inputs at call time, following this pattern:

python
def predict_fn(question: str) -> str:
    # Load prompt from registry
    prompt = mlflow.genai.load_prompt("prompts:/qa/1")

    # Format the prompt with input variables
    formatted_prompt = prompt.format(question=question)

    # Call your LLM
    response = your_llm_call(formatted_prompt)

    return response

3. Training Data​

Provide a dataset with inputs and expectations. Both columns should have dictionary values. The inputs values are passed to the predict function as keyword arguments. Please refer to Built-in Judges for the expected format required by each built-in scorer.

python
# List of dictionaries - Example: Medical paper classification
dataset = [
    {
        "inputs": {
            "sentence": "The emergence of HIV as a chronic condition means that people living with HIV are required to take more responsibility..."
        },
        "expectations": {"expected_response": "BACKGROUND"},
    },
    {
        "inputs": {
            "sentence": "This study is designed as a randomised controlled trial in which men living with HIV..."
        },
        "expectations": {"expected_response": "METHODS"},
    },
    {
        "inputs": {
            "sentence": "Both groups showed improvement in symptoms and in clinical evidence of inflammation..."
        },
        "expectations": {"expected_response": "RESULTS"},
    },
    {
        "inputs": {
            "sentence": "Orbital steroid injection for thyroid-related ophthalmopathy is effective and safe."
        },
        "expectations": {"expected_response": "CONCLUSIONS"},
    },
    {
        "inputs": {
            "sentence": "The aim of this study was to evaluate the efficacy, safety and complications..."
        },
        "expectations": {"expected_response": "OBJECTIVE"},
    },
]

# Or pandas DataFrame
import pandas as pd

dataset = pd.DataFrame(
    {
        "inputs": [
            {"sentence": "The emergence of HIV as a chronic condition..."},
            {"sentence": "This study is designed as a randomised controlled trial..."},
            {"sentence": "Both groups showed improvement in symptoms..."},
        ],
        "expectations": [
            {"expected_response": "BACKGROUND"},
            {"expected_response": "METHODS"},
            {"expected_response": "RESULTS"},
        ],
    }
)

4. Optimizer​

Create an optimizer instance for the optimization algorithm. MLflow supports GepaPromptOptimizer and MetaPromptOptimizer. See Choosing Your Optimizer for detailed guidance on which optimizer to use.

python
from mlflow.genai.optimize import GepaPromptOptimizer, MetaPromptOptimizer

# Option 1: GEPA optimizer
optimizer = GepaPromptOptimizer(
    reflection_model="openai:/gpt-5",  # Powerful model for optimization
    max_metric_calls=100,
    display_progress_bar=False,
)

# Option 2: Metaprompting optimizer
# Note: Zero-shot vs few-shot is determined by whether you provide
# scorers and train_data to optimize_prompts()
optimizer = MetaPromptOptimizer(
    reflection_model="openai:/gpt-5",
    guidelines="Optional custom guidelines for optimization",
)

Advanced Usage​

Works with Any Agent Framework​

MLflow's optimization is framework-agnostic—it works seamlessly with LangChain, LangGraph, OpenAI Agent, Pydantic AI, CrewAI, AutoGen, or any custom framework. No need to rewrite your existing agents or switch frameworks.
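
As a rough sketch of this pattern (assuming the langchain-openai package; any framework works the same way), the only requirement is that predict_fn loads its prompt from the MLflow registry at call time:

python
import mlflow
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-5-mini")


def predict_fn(question: str) -> str:
    # Load the registered template so the optimizer can rewrite it between runs
    prompt = mlflow.genai.load_prompt("prompts:/qa@latest")
    response = llm.invoke(prompt.format(question=question))
    return response.content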

See these framework-specific guides for detailed examples:

  • LangChain
  • LangGraph
  • OpenAI Agent
  • Pydantic AI

Using Custom Scorers​

Define custom evaluation metrics to guide optimization:

python
from typing import Any
from mlflow.genai.scorers import scorer


@scorer
def accuracy_scorer(outputs: Any, expectations: dict[str, Any]):
    """Check if output matches expected value."""
    expected = expectations["expected_response"]
    return 1.0 if outputs.lower() == expected.lower() else 0.0


@scorer
def brevity_scorer(outputs: Any):
    """Prefer shorter outputs (max 50 chars)."""
    return min(1.0, 50 / max(len(outputs), 1))


# Combine scorers with a weighted objective
def weighted_objective(scores: dict[str, Any]):
    return 0.7 * scores["accuracy_scorer"] + 0.3 * scores["brevity_scorer"]


# Use custom scorers
result = mlflow.genai.optimize_prompts(
    predict_fn=predict_fn,
    train_data=dataset,
    prompt_uris=[prompt.uri],
    optimizer=GepaPromptOptimizer(reflection_model="openai:/gpt-5"),
    scorers=[accuracy_scorer, brevity_scorer],
    aggregation=weighted_objective,
)

Custom Optimization Algorithm​

Implement your own optimizer by extending BasePromptOptimizer:

python
from mlflow.genai.optimize import BasePromptOptimizer, PromptOptimizerOutput
from mlflow.genai.scorers import Correctness


class MyCustomOptimizer(BasePromptOptimizer):
    def __init__(self, model_name: str):
        self.model_name = model_name

    def optimize(self, eval_fn, train_data, target_prompts, enable_tracking):
        # Your custom optimization logic
        optimized_prompts = {}
        for prompt_name, prompt_template in target_prompts.items():
            # Implement your algorithm
            optimized_prompts[prompt_name] = your_optimization_algorithm(
                prompt_template, train_data, self.model_name
            )

        return PromptOptimizerOutput(optimized_prompts=optimized_prompts)


# Use custom optimizer
result = mlflow.genai.optimize_prompts(
    predict_fn=predict_fn,
    train_data=dataset,
    prompt_uris=[prompt.uri],
    optimizer=MyCustomOptimizer(model_name="openai:/gpt-5"),
    scorers=[Correctness(model="openai:/gpt-5")],
)

Multi-Prompt Optimization​

Optimize multiple prompts together:

python
import mlflow
import openai
from mlflow.genai.optimize import GepaPromptOptimizer
from mlflow.genai.scorers import Correctness

# Register multiple prompts
plan_prompt = mlflow.genai.register_prompt(
    name="plan",
    template="Make a plan to answer {{question}}.",
)
answer_prompt = mlflow.genai.register_prompt(
    name="answer",
    template="Answer {{question}} following the plan: {{plan}}",
)


def predict_fn(question: str) -> str:
    plan_prompt = mlflow.genai.load_prompt("prompts:/plan/1")
    completion = openai.OpenAI().chat.completions.create(
        model="gpt-5",  # strong model
        messages=[{"role": "user", "content": plan_prompt.format(question=question)}],
    )
    plan = completion.choices[0].message.content

    answer_prompt = mlflow.genai.load_prompt("prompts:/answer/1")
    completion = openai.OpenAI().chat.completions.create(
        model="gpt-5-mini",  # cost efficient model
        messages=[
            {
                "role": "user",
                "content": answer_prompt.format(question=question, plan=plan),
            }
        ],
    )
    return completion.choices[0].message.content


# Optimize both
result = mlflow.genai.optimize_prompts(
    predict_fn=predict_fn,
    train_data=dataset,
    prompt_uris=[plan_prompt.uri, answer_prompt.uri],
    optimizer=GepaPromptOptimizer(reflection_model="openai:/gpt-5"),
    scorers=[Correctness(model="openai:/gpt-5")],
)

# Access optimized prompts
optimized_plan = result.optimized_prompts[0]
optimized_answer = result.optimized_prompts[1]

Result Object​

The API returns a PromptOptimizationResult object:

python
result = mlflow.genai.optimize_prompts(...)

# Access optimized prompts
for prompt in result.optimized_prompts:
print(f"Name: {prompt.name}")
print(f"Version: {prompt.version}")
print(f"Template: {prompt.template}")
print(f"URI: {prompt.uri}")

# Check optimizer used
print(f"Optimizer: {result.optimizer_name}")

# View evaluation scores (if available)
print(f"Initial score: {result.initial_eval_score}")
print(f"Final score: {result.final_eval_score}")

Common Use Cases​

Improving Accuracy​

Optimize prompts to produce more accurate outputs:

python
from mlflow.genai.optimize import GepaPromptOptimizer
from mlflow.genai.scorers import Correctness


result = mlflow.genai.optimize_prompts(
    predict_fn=predict_fn,
    train_data=dataset,
    prompt_uris=[prompt.uri],
    optimizer=GepaPromptOptimizer(reflection_model="openai:/gpt-5"),
    scorers=[Correctness(model="openai:/gpt-5")],
)

Optimizing for Safeness​

Ensure outputs are safe:

python
from mlflow.genai.optimize import GepaPromptOptimizer
from mlflow.genai.scorers import Safety


result = mlflow.genai.optimize_prompts(
    predict_fn=predict_fn,
    train_data=dataset,
    prompt_uris=[prompt.uri],
    optimizer=GepaPromptOptimizer(reflection_model="openai:/gpt-5"),
    scorers=[Safety(model="openai:/gpt-5")],
)

Model Switching and Migration​

When switching between different language models (e.g., migrating from gpt-5 to gpt-5-mini for cost reduction), you may need to rewrite your prompts to maintain output quality with the new model. The mlflow.genai.optimize_prompts() API can help adapt prompts automatically using your existing application outputs as training data.
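
As a rough sketch (the production_logs source and its field names are placeholders), you can build training data from your existing application's outputs and let the optimizer adapt the prompt to the new model:

python
# Existing gpt-5 outputs serve as expectations; predict_fn runs the cheaper target model
migration_data = [
    {
        "inputs": {"question": record["question"]},
        "expectations": {"expected_response": record["gpt5_output"]},
    }
    for record in production_logs  # your logged application outputs
]

result = mlflow.genai.optimize_prompts(
    predict_fn=predict_fn,  # predict_fn should call the new, cheaper model
    train_data=migration_data,
    prompt_uris=[prompt.uri],
    optimizer=GepaPromptOptimizer(reflection_model="openai:/gpt-5"),
    scorers=[Correctness(model="openai:/gpt-5-mini")],
)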

See the Auto-rewrite Prompts for New Models guide for a complete model migration workflow.

Troubleshooting​

Issue: Optimization Takes Too Long​

Solution: Reduce the dataset size or the optimizer budget:

python
# Use fewer examples
small_dataset = dataset[:20]

# Use faster model for optimization
optimizer = GepaPromptOptimizer(
    reflection_model="openai:/gpt-5-mini", max_metric_calls=100
)

Issue: No Improvement Observed​

Solution: Check your evaluation metrics and increase dataset diversity:

  • Ensure scorers accurately measure what you care about (see the sanity-check sketch below)
  • Increase training data size and diversity
  • Try different optimizer configurations
  • Verify the output format matches your expectations
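
For example, a quick sanity check (illustrative) is to evaluate the current prompt on a handful of records before optimizing and confirm the scores match your intuition:

python
import mlflow
from mlflow.genai.scorers import Correctness

# Evaluate the unoptimized prompt on a few records to validate the scorer
eval_result = mlflow.genai.evaluate(
    data=dataset[:5],
    predict_fn=predict_fn,
    scorers=[Correctness(model="openai:/gpt-5-mini")],
)
print(eval_result.metrics)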

Issue: Prompts Not Being Used​

Solution: Ensure predict_fn calls PromptVersion.format() during execution:

python
# ✅ Correct - loads from registry
def predict_fn(question: str):
    prompt = mlflow.genai.load_prompt("prompts:/qa@latest")
    return llm_call(prompt.format(question=question))


# ❌ Incorrect - hardcoded prompt
def predict_fn(question: str):
    return llm_call(f"Answer: {question}")

See Also​