mlflow.evaluation

class mlflow.evaluation.Assessment(name: str, source: Optional[mlflow.entities.assessment_source.AssessmentSource] = None, value: Optional[Union[bool, float, str]] = None, rationale: Optional[str] = None, metadata: Optional[dict] = None, error_code: Optional[str] = None, error_message: Optional[str] = None)

Note

Experimental: This class may change or be removed in a future release without warning.

Assessment data associated with an evaluation result.

An assessment is an enriched output of an evaluation that provides additional context, such as the rationale, source, and metadata for the evaluation result.

Example:

from mlflow.evaluation import Assessment

assessment = Assessment(
    name="answer_correctness",
    value=0.5,
    rationale="The answer is partially correct.",
)

property error_code

The error code.

property error_message

The error message.

classmethod from_dictionary(assessment_dict: dict) → mlflow.evaluation.assessment.Assessment

Create an Assessment object from a dictionary.

Parameters

assessment_dict (dict) – Dictionary containing assessment information.

Returns

The Assessment object created from the dictionary.

Return type

Assessment

property metadata

The metadata associated with the assessment.

property name

The name of the assessment.

property rationale

The rationale / justification for the assessment.

property source

The source of the assessment.

to_dictionary() → dict

Convert the Assessment object to a dictionary.

property value

The assessment value.
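
Both to_dictionary() and from_dictionary() are documented above; a round trip through them reproduces the assessment. A minimal sketch (the field values are illustrative):

from mlflow.evaluation import Assessment

assessment = Assessment(
    name="answer_correctness",
    value=0.5,
    rationale="The answer is partially correct.",
)

# Serialize to a plain dictionary, e.g. for persistence as JSON.
assessment_dict = assessment.to_dictionary()

# Rebuild an equivalent Assessment from the dictionary.
restored = Assessment.from_dictionary(assessment_dict)
assert restored.name == assessment.name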

class mlflow.evaluation.AssessmentSource(source_type: str, source_id: str, metadata: Optional[dict] = None)

Note

Experimental: This class may change or be removed in a future release without warning.

Source of an assessment, such as a human annotator or an LLM acting as a judge (e.g., GPT-4).

classmethod from_dictionary(source_dict: dict) → mlflow.entities.assessment_source.AssessmentSource

Create an AssessmentSource object from a dictionary.

Parameters

source_dict (dict) – Dictionary containing assessment source information.

Returns

The AssessmentSource object created from the dictionary.

Return type

AssessmentSource

property metadata

The additional metadata about the source.

property source_id

The identifier for the source.

property source_type

The type of the assessment source.

to_dictionary() → dict

Convert the AssessmentSource object to a dictionary.

class mlflow.evaluation.AssessmentSourceType(source_type: str)

Note

Experimental: This class may change or be removed in a future release without warning.

AI_JUDGE = 'AI_JUDGE'
CODE = 'CODE'
HUMAN = 'HUMAN'
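
These constants can be passed as the source_type when constructing an AssessmentSource. A minimal sketch (the identifiers and metadata are illustrative):

from mlflow.evaluation import AssessmentSource, AssessmentSourceType

# A human reviewer, identified here by an e-mail address.
human_source = AssessmentSource(
    source_type=AssessmentSourceType.HUMAN,
    source_id="reviewer@example.com",
)

# An LLM judge; metadata is a free-form dictionary.
judge_source = AssessmentSource(
    source_type=AssessmentSourceType.AI_JUDGE,
    source_id="gpt-4",
    metadata={"temperature": 0.0},
)
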
class mlflow.evaluation.Evaluation(inputs: dict, outputs: Optional[dict] = None, inputs_id: Optional[str] = None, request_id: Optional[str] = None, targets: Optional[dict] = None, error_code: Optional[str] = None, error_message: Optional[str] = None, assessments: Optional[list] = None, metrics: Optional[Union[dict, list]] = None, tags: Optional[dict] = None)

Note

Experimental: This class may change or be removed in a future release without warning.

Evaluation result data.
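
An Evaluation bundles the inputs and outputs of a single evaluated example together with optional assessments, metrics, and tags, per the signature above. A minimal sketch (the field contents are illustrative):

from mlflow.evaluation import Assessment, Evaluation

evaluation = Evaluation(
    inputs={"question": "What is MLflow?"},
    outputs={"answer": "MLflow is a platform for the ML lifecycle."},
    assessments=[
        Assessment(name="answer_correctness", value=1.0),
    ],
)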

property assessments

The evaluation assessments.

property error_code

The evaluation error code.

property error_message

The evaluation error message.

classmethod from_dictionary(evaluation_dict: dict)

Create an Evaluation object from a dictionary.

Parameters

evaluation_dict (dict) – Dictionary containing evaluation information.

Returns

The Evaluation object created from the dictionary.

Return type

Evaluation

property inputs

The evaluation inputs.

property inputs_id

The evaluation inputs ID.

property metrics

The evaluation metrics.

property outputs

The evaluation outputs.

property request_id

The evaluation request ID.

property tags

The evaluation tags.

property targets

The evaluation targets.

to_dictionary() → dict

Convert the Evaluation object to a dictionary.

Returns

The Evaluation object represented as a dictionary.

Return type

dict

mlflow.evaluation.log_evaluations(*, evaluations: list, run_id: Optional[str] = None) → list

Note

Experimental: This function may change or be removed in a future release without warning.

Logs one or more evaluations to an MLflow Run.

Parameters
  • evaluations (List[Evaluation]) – List of one or more MLflow Evaluation objects.

  • run_id (Optional[str]) – ID of the MLflow Run to which the evaluations are logged. If unspecified, the current active run is used, or a new run is started.

Returns

The logged Evaluation objects.

Return type

List[EvaluationEntity]
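
A minimal usage sketch, logging a single evaluation to an explicitly started run (the inputs and outputs are illustrative):

import mlflow
from mlflow.evaluation import Evaluation, log_evaluations

evaluation = Evaluation(
    inputs={"question": "What is MLflow?"},
    outputs={"answer": "MLflow is a platform for the ML lifecycle."},
)

with mlflow.start_run() as run:
    logged = log_evaluations(evaluations=[evaluation], run_id=run.info.run_id)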