mlflow.evaluation

class mlflow.evaluation.Assessment(name: str, source: Optional[mlflow.entities.assessment_source.AssessmentSource] = None, value: Optional[Union[bool, float, str]] = None, rationale: Optional[str] = None, metadata: Optional[dict] = None, error_code: Optional[str] = None, error_message: Optional[str] = None)

Note

Experimental: This class may change or be removed in a future release without warning.

Assessment data associated with an evaluation result.

An assessment is an enriched output of an evaluation that provides additional context, such as the rationale, source, and metadata for the evaluation result.

Example:

from mlflow.evaluation import Assessment

assessment = Assessment(
    name="answer_correctness",
    value=0.5,
    rationale="The answer is partially correct.",
)

property error_code

The error code.

property error_message

The error message.

classmethod from_dictionary(assessment_dict: dict) → mlflow.evaluation.assessment.Assessment

Create an Assessment object from a dictionary.

Parameters

assessment_dict (dict) – Dictionary containing assessment information.

Returns

The Assessment object created from the dictionary.

Return type

Assessment

property metadata

The metadata associated with the assessment.

property name

The name of the assessment.

property rationale

The rationale / justification for the assessment.

property source

The source of the assessment.

to_dictionary() → dict

Convert the Assessment object to a dictionary.

property value

The assessment value.
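
Both to_dictionary() and from_dictionary() are documented above; a round trip through them reproduces the assessment. A minimal sketch (the field values are illustrative):

from mlflow.evaluation import Assessment

assessment = Assessment(
    name="answer_correctness",
    value=0.5,
    rationale="The answer is partially correct.",
)

# Serialize to a plain dictionary, e.g. for persistence as JSON.
assessment_dict = assessment.to_dictionary()

# Rebuild an equivalent Assessment from the dictionary.
restored = Assessment.from_dictionary(assessment_dict)
assert restored.name == assessment.name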

class mlflow.evaluation.AssessmentSource(source_type: str, source_id: str, metadata: Optional[dict] = None)

Note

Experimental: This class may change or be removed in a future release without warning.

Source of an assessment, such as a human annotator or an LLM acting as a judge (e.g., GPT-4).

classmethod from_dictionary(source_dict: dict) → mlflow.entities.assessment_source.AssessmentSource

Create an AssessmentSource object from a dictionary.

Parameters

source_dict (dict) – Dictionary containing assessment source information.

Returns

The AssessmentSource object created from the dictionary.

Return type

AssessmentSource

property metadata

The additional metadata about the source.

property source_id

The identifier for the source.

property source_type

The type of the assessment source.

to_dictionary() → dict

Convert the AssessmentSource object to a dictionary.

class mlflow.evaluation.AssessmentSourceType(source_type: str)

Note

Experimental: This class may change or be removed in a future release without warning.

AI_JUDGE = 'AI_JUDGE'
CODE = 'CODE'
HUMAN = 'HUMAN'
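
These constants can be passed as the source_type when constructing an AssessmentSource. A minimal sketch (the identifiers and metadata are illustrative):

from mlflow.evaluation import AssessmentSource, AssessmentSourceType

# A human reviewer, identified here by an e-mail address.
human_source = AssessmentSource(
    source_type=AssessmentSourceType.HUMAN,
    source_id="reviewer@example.com",
)

# An LLM judge; metadata is a free-form dictionary.
judge_source = AssessmentSource(
    source_type=AssessmentSourceType.AI_JUDGE,
    source_id="gpt-4",
    metadata={"temperature": 0.0},
)
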
class mlflow.evaluation.Evaluation(inputs: dict, outputs: Optional[dict] = None, inputs_id: Optional[str] = None, request_id: Optional[str] = None, targets: Optional[dict] = None, error_code: Optional[str] = None, error_message: Optional[str] = None, assessments: Optional[list] = None, metrics: Optional[Union[dict, list]] = None, tags: Optional[dict] = None)

Note

Experimental: This class may change or be removed in a future release without warning.

Evaluation result data.
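
An Evaluation bundles the inputs and outputs of a single evaluated example together with optional assessments, metrics, and tags, per the signature above. A minimal sketch (the field contents are illustrative):

from mlflow.evaluation import Assessment, Evaluation

evaluation = Evaluation(
    inputs={"question": "What is MLflow?"},
    outputs={"answer": "MLflow is a platform for the ML lifecycle."},
    assessments=[
        Assessment(name="answer_correctness", value=1.0),
    ],
)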

property assessments

The evaluation assessments.

property error_code

The evaluation error code.

property error_message

The evaluation error message.

classmethod from_dictionary(evaluation_dict: dict)

Create an Evaluation object from a dictionary.

Parameters

evaluation_dict (dict) – Dictionary containing evaluation information.

Returns

The Evaluation object created from the dictionary.

Return type

Evaluation

property inputs

The evaluation inputs.

property inputs_id

The evaluation inputs ID.

property metrics

The evaluation metrics.

property outputs

The evaluation outputs.

property request_id

The evaluation request ID.

property tags

The evaluation tags.

property targets

The evaluation targets.

to_dictionary() → dict

Convert the Evaluation object to a dictionary.

Returns

The Evaluation object represented as a dictionary.

Return type

dict

mlflow.evaluation.log_evaluations(*, evaluations: list, run_id: Optional[str] = None) → list

Note

Experimental: This function may change or be removed in a future release without warning.

Logs one or more evaluations to an MLflow Run.

Parameters
  • evaluations (List[Evaluation]) – List of one or more MLflow Evaluation objects.

  • run_id (Optional[str]) – ID of the MLflow Run to which the evaluations are logged. If unspecified, the current active run is used, or a new run is started.

Returns

The logged Evaluation objects.

Return type

List[EvaluationEntity]
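
A minimal usage sketch, logging a single evaluation to an explicitly started run (the inputs and outputs are illustrative):

import mlflow
from mlflow.evaluation import Evaluation, log_evaluations

evaluation = Evaluation(
    inputs={"question": "What is MLflow?"},
    outputs={"answer": "MLflow is a platform for the ML lifecycle."},
)

with mlflow.start_run() as run:
    logged = log_evaluations(evaluations=[evaluation], run_id=run.info.run_id)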