mlflow.deployments

Exposes functionality for deploying MLflow models to custom serving tools.

Note: model deployment to AWS SageMaker can currently be performed via the mlflow.sagemaker module. Model deployment to Azure can be performed by using the azureml library.

MLflow does not currently provide built-in support for any other deployment targets, but support for custom targets can be installed via third-party plugins. See a list of known plugins here.

This page largely focuses on the user-facing deployment APIs. For instructions on implementing your own plugin for deployment to a custom serving tool, see plugin docs.

class mlflow.deployments.BaseDeploymentClient(target_uri)[source]

Base class exposing Python model deployment APIs.

Plugin implementors should define target-specific deployment logic via a subclass of BaseDeploymentClient within the plugin module, and customize method docstrings with target-specific information.

Note

Subclasses should raise mlflow.exceptions.MlflowException in error cases (e.g. on failure to deploy a model).

abstract create_deployment(name, model_uri, flavor=None, config=None, endpoint=None)[source]

Deploy a model to the specified target. By default, this method should block until deployment completes (i.e. until it’s possible to perform inference with the deployment). In the case of conflicts (e.g. if it’s not possible to create the specified deployment due to a conflict with an existing deployment), raises a mlflow.exceptions.MlflowException or an HTTPError for remote deployments. See target-specific plugin documentation for additional detail on support for asynchronous deployment and other configuration.

Parameters
  • name – Unique name to use for deployment. If another deployment exists with the same name, raises a mlflow.exceptions.MlflowException

  • model_uri – URI of model to deploy

  • flavor – (optional) Model flavor to deploy. If unspecified, a default flavor will be chosen.

  • config – (optional) Dict containing updated target-specific configuration for the deployment

  • endpoint – (optional) Endpoint to create the deployment under. May not be supported by all targets

Returns

Dict corresponding to created deployment, which must contain the ‘name’ key.
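
The contract described above (unique names, an error on conflict, and a returned dict containing the ‘name’ key) can be sketched with an in-memory stand-in. This is only an illustration: InMemoryClient and DeploymentError are hypothetical names that do not exist in MLflow. A real plugin would subclass mlflow.deployments.BaseDeploymentClient and raise mlflow.exceptions.MlflowException instead.

```python
class DeploymentError(Exception):
    """Hypothetical stand-in for mlflow.exceptions.MlflowException."""


class InMemoryClient:
    """Hypothetical stand-in that mirrors the create_deployment contract."""

    def __init__(self, target_uri):
        self.target_uri = target_uri
        self._deployments = {}  # deployment name -> deployment dict

    def create_deployment(self, name, model_uri, flavor=None, config=None, endpoint=None):
        # Names must be unique: a duplicate is reported as an error.
        if name in self._deployments:
            raise DeploymentError(f"Deployment '{name}' already exists")
        deployment = {
            "name": name,  # the returned dict must contain the 'name' key
            "model_uri": model_uri,
            "flavor": flavor,
            "config": dict(config or {}),
        }
        self._deployments[name] = deployment
        return deployment


client = InMemoryClient("in-memory")
created = client.create_deployment("spamDetector", "runs:/someRunId/myModel")

# A second deployment with the same name is rejected.
try:
    client.create_deployment("spamDetector", "runs:/someRunId/myModel")
    conflict_raised = False
except DeploymentError:
    conflict_raised = True
```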

create_endpoint(name, config=None)[source]

Create an endpoint with the specified target. By default, this method should block until creation completes (i.e. until it’s possible to create a deployment within the endpoint). In the case of conflicts (e.g. if it’s not possible to create the specified endpoint due to conflict with an existing endpoint), raises a mlflow.exceptions.MlflowException or an HTTPError for remote deployments. See target-specific plugin documentation for additional detail on support for asynchronous creation and other configuration.

Parameters
  • name – Unique name to use for endpoint. If another endpoint exists with the same name, raises a mlflow.exceptions.MlflowException.

  • config – (optional) Dict containing target-specific configuration for the endpoint.

Returns

Dict corresponding to created endpoint, which must contain the ‘name’ key.

abstract delete_deployment(name, config=None, endpoint=None)[source]

Delete the deployment with name name from the specified target.

Deletion should be idempotent (i.e. deletion should not fail if retried on a non-existent deployment).

Parameters
  • name – Name of deployment to delete

  • config – (optional) dict containing updated target-specific configuration for the deployment

  • endpoint – (optional) Endpoint containing the deployment to delete. May not be supported by all targets

Returns

None
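
Idempotent deletion, as required above, can be sketched as follows. InMemoryStore is a hypothetical stand-in, not an MLflow class. The only point illustrated is that deleting a missing deployment is a no-op rather than an error.

```python
class InMemoryStore:
    """Hypothetical stand-in holding deployments in a dict."""

    def __init__(self):
        self._deployments = {"spamDetector": {"name": "spamDetector"}}

    def names(self):
        return sorted(self._deployments)

    def delete_deployment(self, name, config=None, endpoint=None):
        # dict.pop with a default never raises, so a retry is harmless.
        self._deployments.pop(name, None)


store = InMemoryStore()
first = store.delete_deployment("spamDetector")
second = store.delete_deployment("spamDetector")  # retry on a missing deployment
```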

delete_endpoint(endpoint)[source]

Delete the endpoint from the specified target. Deletion should be idempotent (i.e. deletion should not fail if retried on a non-existent endpoint).

Parameters

endpoint – Name of endpoint to delete

Returns

None

explain(deployment_name=None, df=None, endpoint=None)[source]

Generate explanations of model predictions on the specified input pandas DataFrame df for the deployed model. Explanation output formats vary by deployment target, and can include details like feature importance for understanding/debugging predictions.

Parameters
  • deployment_name – Name of deployment to predict against

  • df – Pandas DataFrame to use for explaining feature importance in model prediction

  • endpoint – Endpoint to predict against. May not be supported by all targets

Returns

A JSON-able object (pandas DataFrame, NumPy array, or dictionary). Raises an exception if the deployment target’s class does not implement this method.

abstract get_deployment(name, endpoint=None)[source]

Returns a dictionary describing the specified deployment, throwing either a mlflow.exceptions.MlflowException or an HTTPError for remote deployments if no deployment exists with the provided ID. The dict is guaranteed to contain a ‘name’ key containing the deployment name. The other fields of the returned dictionary and their types may vary across deployment targets.

Parameters
  • name – ID of deployment to fetch.

  • endpoint – (optional) Endpoint containing the deployment to get. May not be supported by all targets.

Returns

A dict corresponding to the retrieved deployment. The dict is guaranteed to contain a ‘name’ key corresponding to the deployment name. The other fields of the returned dictionary and their types may vary across targets.

get_endpoint(endpoint)[source]

Returns a dictionary describing the specified endpoint, throwing a mlflow.exceptions.MlflowException or an HTTPError for remote deployments if no endpoint exists with the provided name. The dict is guaranteed to contain a ‘name’ key containing the endpoint name. The other fields of the returned dictionary and their types may vary across targets.

Parameters

endpoint – Name of endpoint to fetch

Returns

A dict corresponding to the retrieved endpoint. The dict is guaranteed to contain a ‘name’ key corresponding to the endpoint name. The other fields of the returned dictionary and their types may vary across targets.

abstract list_deployments(endpoint=None)[source]

List deployments.

This method is expected to return an unpaginated list of all deployments (an alternative would be to return a dict with a ‘deployments’ field containing the actual deployments, with plugins able to specify other fields, e.g. a next_page_token field, in the returned dictionary for pagination, and to accept a pagination_args argument to this method for passing pagination-related args).

Parameters

endpoint – (optional) List deployments in the specified endpoint. May not be supported by all targets

Returns

A list of dicts corresponding to deployments. Each dict is guaranteed to contain a ‘name’ key containing the deployment name. The other fields of the returned dictionary and their types may vary across deployment targets.
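
The paginated alternative mentioned in the description above could look roughly like this. It is a sketch only: list_deployments_page, page_size, and the use of a string offset as the page token are assumptions made for illustration and are not part of the MLflow API.

```python
def list_deployments_page(deployments, pagination_args=None, page_size=2):
    """Return one page of deployments plus a token for the next page, if any."""
    args = pagination_args or {}
    start = int(args.get("page_token", 0))  # the token is just an offset in this sketch
    result = {"deployments": deployments[start:start + page_size]}
    next_start = start + page_size
    if next_start < len(deployments):
        result["next_page_token"] = str(next_start)
    return result


all_deployments = [{"name": f"model-{i}"} for i in range(5)]

# Follow next_page_token until it is absent.
collected = []
args = None
while True:
    response = list_deployments_page(all_deployments, pagination_args=args)
    collected.extend(response["deployments"])
    token = response.get("next_page_token")
    if token is None:
        break
    args = {"page_token": token}
```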

list_endpoints()[source]

List endpoints in the specified target. This method is expected to return an unpaginated list of all endpoints (an alternative would be to return a dict with an ‘endpoints’ field containing the actual endpoints, with plugins able to specify other fields, e.g. a next_page_token field, in the returned dictionary for pagination, and to accept a pagination_args argument to this method for passing pagination-related args).

Returns

A list of dicts corresponding to endpoints. Each dict is guaranteed to contain a ‘name’ key containing the endpoint name. The other fields of the returned dictionary and their types may vary across targets.

abstract predict(deployment_name=None, inputs=None, endpoint=None)[source]

Compute predictions on inputs using the specified deployment or model endpoint.

Note that the input/output types of this method match those of mlflow pyfunc predict.

Parameters
  • deployment_name – Name of deployment to predict against.

  • inputs – Input data (or arguments) to pass to the deployment or model endpoint for inference.

  • endpoint – Endpoint to predict against. May not be supported by all targets.

Returns

A mlflow.deployments.PredictionsResponse instance representing the predictions and associated Model Server response metadata.

predict_stream(deployment_name=None, inputs=None, endpoint=None)[source]

Submit a query to a configured provider endpoint and get a streaming response.

Parameters
  • deployment_name – Name of deployment to predict against.

  • inputs – The inputs to the query, as a dictionary.

  • endpoint – The name of the endpoint to query.

Returns

An iterator of dictionaries containing the response from the endpoint.

abstract update_deployment(name, model_uri=None, flavor=None, config=None, endpoint=None)[source]

Update the deployment with the specified name. You can update the URI of the model, the flavor of the deployed model (in which case the model URI must also be specified), and/or any target-specific attributes of the deployment (via config). By default, this method should block until deployment completes (i.e. until it’s possible to perform inference with the updated deployment). See target-specific plugin documentation for additional detail on support for asynchronous deployment and other configuration.

Parameters
  • name – Unique name of deployment to update.

  • model_uri – URI of a new model to deploy.

  • flavor – (optional) new model flavor to use for deployment. If provided, model_uri must also be specified. If flavor is unspecified but model_uri is specified, a default flavor will be chosen and the deployment will be updated using that flavor.

  • config – (optional) dict containing updated target-specific configuration for the deployment.

  • endpoint – (optional) Endpoint containing the deployment to update. May not be supported by all targets.

Returns

None
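
The argument rule stated above (a new flavor requires a new model URI) can be checked before any target-specific work is done. The helper below is a hypothetical sketch, and ValueError stands in for the mlflow.exceptions.MlflowException that a real plugin would raise.

```python
def validate_update_args(model_uri=None, flavor=None, config=None):
    """Check the documented rule: flavor may only be given together with model_uri."""
    if flavor is not None and model_uri is None:
        raise ValueError("flavor was provided without model_uri")
    return {"model_uri": model_uri, "flavor": flavor, "config": config}


# Updating only the target-specific config is allowed.
config_only = validate_update_args(config={"replicas": 2})

# Updating the flavor alone is rejected.
try:
    validate_update_args(flavor="python_function")
    flavor_only_rejected = False
except ValueError:
    flavor_only_rejected = True
```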

update_endpoint(endpoint, config=None)[source]

Update the endpoint with the specified name. You can update any target-specific attributes of the endpoint (via config). By default, this method should block until the update completes (i.e. until it’s possible to create a deployment within the endpoint). See target-specific plugin documentation for additional detail on support for asynchronous update and other configuration.

Parameters
  • endpoint – Unique name of endpoint to update

  • config – (optional) dict containing target-specific configuration for the endpoint

Returns

None

class mlflow.deployments.DatabricksDeploymentClient(target_uri)[source]

Note

Experimental: This class may change or be removed in a future release without warning.

Client for interacting with Databricks serving endpoints.

Example:

First, set up credentials for authentication:

export DATABRICKS_HOST=...
export DATABRICKS_TOKEN=...

See also

See https://docs.databricks.com/en/dev-tools/auth.html for other authentication methods.

Then, create a deployment client and use it to interact with Databricks serving endpoints:

from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")
endpoints = client.list_endpoints()
assert endpoints == [
    {
        "name": "chat",
        "creator": "alice@company.com",
        "creation_timestamp": 0,
        "last_updated_timestamp": 0,
        "state": {...},
        "config": {...},
        "tags": [...],
        "id": "88fd3f75a0d24b0380ddc40484d7a31b",
    },
]
create_deployment(name, model_uri, flavor=None, config=None, endpoint=None)[source]

Warning

This method is not implemented for DatabricksDeploymentClient.

create_endpoint(name=None, config=None, route_optimized=False)[source]

Note

Experimental: This function may change or be removed in a future release without warning.

Create a new serving endpoint with the provided name and configuration.

See https://docs.databricks.com/api/workspace/servingendpoints/create for request/response schema.

Parameters
  • name

    The name of the serving endpoint to create.

    Warning

    Deprecated. Include name in config instead.

  • config – A dictionary containing either the full API request payload or the configuration of the serving endpoint to create.

  • route_optimized

    A boolean that defines whether the Databricks serving endpoint is optimized for routing traffic. Only used in the deprecated approach.

    Warning

    Deprecated. Include route_optimized in config instead.

Returns

A DatabricksEndpoint object containing the request response.

Example:

from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")
endpoint = client.create_endpoint(
    config={
        "name": "test",
        "config": {
            "served_entities": [
                {
                    "external_model": {
                        "name": "gpt-4",
                        "provider": "openai",
                        "task": "llm/v1/chat",
                        "openai_config": {
                            "openai_api_key": "{{secrets/scope/key}}",
                        },
                    },
                }
            ],
            "route_optimized": True,
        },
    },
)
assert endpoint == {
    "name": "test",
    "creator": "alice@company.com",
    "creation_timestamp": 0,
    "last_updated_timestamp": 0,
    "state": {...},
    "config": {...},
    "tags": [...],
    "id": "88fd3f75a0d24b0380ddc40484d7a31b",
    "permission_level": "CAN_MANAGE",
    "route_optimized": False,
    "task": "llm/v1/chat",
    "endpoint_type": "EXTERNAL_MODEL",
    "creator_display_name": "Alice",
    "creator_kind": "User",
}
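
A payload shaped like the one in the example above can be assembled with a small helper. This is a sketch: openai_chat_payload is a hypothetical function, and it only covers the single OpenAI external-model case shown in the example.

```python
def openai_chat_payload(endpoint_name, model_name, secret_ref):
    """Build a create_endpoint payload for one OpenAI chat model (hypothetical helper)."""
    return {
        "name": endpoint_name,
        "config": {
            "served_entities": [
                {
                    "external_model": {
                        "name": model_name,
                        "provider": "openai",
                        "task": "llm/v1/chat",
                        # Secret reference in the same form as the example above.
                        "openai_config": {"openai_api_key": secret_ref},
                    },
                }
            ],
        },
    }


payload = openai_chat_payload("test", "gpt-4", "{{secrets/scope/key}}")
```

The resulting dictionary would then be passed as client.create_endpoint(config=payload).
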
delete_deployment(name, config=None, endpoint=None)[source]

Warning

This method is not implemented for DatabricksDeploymentClient.

delete_endpoint(endpoint)[source]

Note

Experimental: This function may change or be removed in a future release without warning.

Delete a specified serving endpoint. See https://docs.databricks.com/api/workspace/servingendpoints/delete for request/response schema.

Parameters

endpoint – The name of the serving endpoint to delete.

Returns

A DatabricksEndpoint object containing the request response.

Example:

from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")
client.delete_endpoint(endpoint="chat")
get_deployment(name, endpoint=None)[source]

Warning

This method is not implemented for DatabricksDeploymentClient.

get_endpoint(endpoint)[source]

Note

Experimental: This function may change or be removed in a future release without warning.

Get a specified serving endpoint. See https://docs.databricks.com/api/workspace/servingendpoints/get for request/response schema.

Parameters

endpoint – The name of the serving endpoint to get.

Returns

A DatabricksEndpoint object containing the request response.

Example:

from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")
endpoint = client.get_endpoint(endpoint="chat")
assert endpoint == {
    "name": "chat",
    "creator": "alice@company.com",
    "creation_timestamp": 0,
    "last_updated_timestamp": 0,
    "state": {...},
    "config": {...},
    "tags": [...],
    "id": "88fd3f75a0d24b0380ddc40484d7a31b",
}
list_deployments(endpoint=None)[source]

Warning

This method is not implemented for DatabricksDeploymentClient.

list_endpoints()[source]

Note

Experimental: This function may change or be removed in a future release without warning.

Retrieve all serving endpoints.

See https://docs.databricks.com/api/workspace/servingendpoints/list for request/response schema.

Returns

A list of DatabricksEndpoint objects containing the request response.

Example:

from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")
endpoints = client.list_endpoints()
assert endpoints == [
    {
        "name": "chat",
        "creator": "alice@company.com",
        "creation_timestamp": 0,
        "last_updated_timestamp": 0,
        "state": {...},
        "config": {...},
        "tags": [...],
        "id": "88fd3f75a0d24b0380ddc40484d7a31b",
    },
]
predict(deployment_name=None, inputs=None, endpoint=None)[source]

Note

Experimental: This function may change or be removed in a future release without warning.

Query a serving endpoint with the provided model inputs. See https://docs.databricks.com/api/workspace/servingendpoints/query for request/response schema.

Parameters
  • deployment_name – Unused.

  • inputs – A dictionary containing the model inputs to query.

  • endpoint – The name of the serving endpoint to query.

Returns

A DatabricksEndpoint object containing the query response.

Example:

from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")
response = client.predict(
    endpoint="chat",
    inputs={
        "messages": [
            {"role": "user", "content": "Hello!"},
        ],
    },
)
assert response == {
    "id": "chatcmpl-8OLm5kfqBAJD8CpsMANESWKpLSLXY",
    "object": "chat.completion",
    "created": 1700814265,
    "model": "gpt-4-0613",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! How can I assist you today?",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {
        "prompt_tokens": 9,
        "completion_tokens": 9,
        "total_tokens": 18,
    },
}
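
Because the response behaves like a dictionary, the assistant text can be pulled out with ordinary dictionary access. The helper below is a sketch that assumes the chat-completion shape shown in the example above. first_message_content is a hypothetical name, and the sample dictionary is a trimmed copy of that example.

```python
def first_message_content(response):
    """Return the content of the first choice, or None if there are no choices."""
    choices = response.get("choices") or []
    if not choices:
        return None
    return choices[0].get("message", {}).get("content")


sample_response = {
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! How can I assist you today?",
            },
            "finish_reason": "stop",
        }
    ],
}
text = first_message_content(sample_response)
```
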
predict_stream(deployment_name=None, inputs=None, endpoint=None) → Iterator[dict][source]

Note

Experimental: This function may change or be removed in a future release without warning.

Submit a query to a configured provider endpoint and get a streaming response.

Parameters
  • deployment_name – Unused.

  • inputs – The inputs to the query, as a dictionary.

  • endpoint – The name of the endpoint to query.

Returns

An iterator of dictionaries containing the response from the endpoint.

Example:

from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")
chunk_iter = client.predict_stream(
    endpoint="databricks-llama-2-70b-chat",
    inputs={
        "messages": [{"role": "user", "content": "Hello!"}],
        "temperature": 0.0,
        "n": 1,
        "max_tokens": 500,
    },
)
for chunk in chunk_iter:
    print(chunk)
    # Example:
    # {
    #     "id": "82a834f5-089d-4fc0-ad6c-db5c7d6a6129",
    #     "object": "chat.completion.chunk",
    #     "created": 1712133837,
    #     "model": "llama-2-70b-chat-030424",
    #     "choices": [
    #         {
    #             "index": 0, "delta": {"role": "assistant", "content": "Hello"},
    #             "finish_reason": None,
    #         }
    #     ],
    #     "usage": {"prompt_tokens": 11, "completion_tokens": 1, "total_tokens": 12},
    # }
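
To rebuild the full message from a stream, the delta content of each chunk can be concatenated. The sketch below runs on hand-written chunks instead of a live endpoint. collect_stream_text is a hypothetical helper, and the shape of the second and third chunks is an assumption modeled on the example chunk above.

```python
def collect_stream_text(chunks):
    """Join the delta content of every choice in every chunk."""
    parts = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


# Stand-in for the iterator returned by predict_stream.
fake_chunks = iter(
    [
        {"choices": [{"index": 0, "delta": {"role": "assistant", "content": "Hello"}, "finish_reason": None}]},
        {"choices": [{"index": 0, "delta": {"content": " there"}, "finish_reason": None}]},
        {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
    ]
)
full_text = collect_stream_text(fake_chunks)
```
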
update_deployment(name, model_uri=None, flavor=None, config=None, endpoint=None)[source]

Warning

This method is not implemented for DatabricksDeploymentClient.

update_endpoint(endpoint, config=None)[source]

Warning

mlflow.deployments.databricks.DatabricksDeploymentClient.update_endpoint is deprecated. This method will be removed in a future release. Use update_endpoint_config, update_endpoint_tags, update_endpoint_rate_limits, or update_endpoint_ai_gateway instead.

Update a specified serving endpoint with the provided configuration. See https://docs.databricks.com/api/workspace/servingendpoints/updateconfig for request/response schema.

Parameters
  • endpoint – The name of the serving endpoint to update.

  • config – A dictionary containing the configuration of the serving endpoint to update.

Returns

A DatabricksEndpoint object containing the request response.

Example:

from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")
endpoint = client.update_endpoint(
    endpoint="chat",
    config={
        "served_entities": [
            {
                "name": "test",
                "external_model": {
                    "name": "gpt-4",
                    "provider": "openai",
                    "task": "llm/v1/chat",
                    "openai_config": {
                        "openai_api_key": "{{secrets/scope/key}}",
                    },
                },
            }
        ],
    },
)
assert endpoint == {
    "name": "chat",
    "creator": "alice@company.com",
    "creation_timestamp": 0,
    "last_updated_timestamp": 0,
    "state": {...},
    "config": {...},
    "tags": [...],
    "id": "88fd3f75a0d24b0380ddc40484d7a31b",
}

rate_limits = client.update_endpoint(
    endpoint="chat",
    config={
        "rate_limits": [
            {
                "key": "user",
                "renewal_period": "minute",
                "calls": 10,
            }
        ],
    },
)
assert rate_limits == {
    "rate_limits": [
        {
            "key": "user",
            "renewal_period": "minute",
            "calls": 10,
        }
    ],
}
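
When migrating away from the deprecated update_endpoint, the replacement method can often be guessed from the keys of the config dictionary. The mapping below is an assumption inferred from the example payloads on this page, not documented MLflow behavior, and replacement_method_for is a hypothetical helper.

```python
# Keys taken from the example payloads on this page; the mapping itself is an assumption.
AI_GATEWAY_KEYS = {"usage_tracking_config", "inference_table_config"}


def replacement_method_for(config):
    """Suggest which non-deprecated method fits an update_endpoint config dict."""
    keys = set(config)
    if not keys:
        raise ValueError("config is empty")
    if keys == {"rate_limits"}:
        return "update_endpoint_rate_limits"
    if keys == {"add_tags"}:
        return "update_endpoint_tags"
    if keys <= AI_GATEWAY_KEYS:
        return "update_endpoint_ai_gateway"
    # Everything else (for example "served_entities") is treated as endpoint config.
    return "update_endpoint_config"


suggestion = replacement_method_for(
    {"rate_limits": [{"key": "user", "renewal_period": "minute", "calls": 10}]}
)
```

All four replacement methods take the same (endpoint, config) arguments, so the suggested name could be looked up on the client with getattr. Review each case by hand before relying on the guess.
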
update_endpoint_ai_gateway(endpoint, config)[source]

Note

Experimental: This function may change or be removed in a future release without warning.

Update the AI Gateway configuration of a specified serving endpoint.

Parameters
  • endpoint (str) – The name of the serving endpoint to update.

  • config (dict) – A dictionary containing the AI Gateway configuration to update.

Returns

A dictionary containing the updated AI Gateway configuration.

Return type

dict

Example:

from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")
name = "test"

gateway_config = {
    "usage_tracking_config": {"enabled": True},
    "inference_table_config": {
        "enabled": True,
        "catalog_name": "my_catalog",
        "schema_name": "my_schema",
    },
}

updated_gateway = client.update_endpoint_ai_gateway(
    endpoint=name, config=gateway_config
)
assert updated_gateway == {
    "usage_tracking_config": {"enabled": True},
    "inference_table_config": {
        "catalog_name": "my_catalog",
        "schema_name": "my_schema",
        "table_name_prefix": "test",
        "enabled": True,
    },
}
update_endpoint_config(endpoint, config)[source]

Note

Experimental: This function may change or be removed in a future release without warning.

Update the configuration of a specified serving endpoint. See https://docs.databricks.com/api/workspace/servingendpoints/updateconfig for request/response schema.

Parameters
  • endpoint – The name of the serving endpoint to update.

  • config – A dictionary containing the configuration of the serving endpoint to update.

Returns

A DatabricksEndpoint object containing the request response.

Example:

from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")
updated_endpoint = client.update_endpoint_config(
    endpoint="test",
    config={
        "served_entities": [
            {
                "name": "gpt-4o-mini",
                "external_model": {
                    "name": "gpt-4o-mini",
                    "provider": "openai",
                    "task": "llm/v1/chat",
                    "openai_config": {
                        "openai_api_key": "{{secrets/scope/key}}",
                    },
                },
            }
        ]
    },
)
assert updated_endpoint == {
    "name": "test",
    "creator": "alice@company.com",
    "creation_timestamp": 1729527763000,
    "last_updated_timestamp": 1729530896000,
    "state": {"ready": "READY", "config_update": "NOT_UPDATING"},
    "config": {...},
    "id": "44b258fb39804564b37603d8d14b853e",
    "permission_level": "CAN_MANAGE",
    "route_optimized": False,
    "task": "llm/v1/chat",
    "endpoint_type": "EXTERNAL_MODEL",
    "creator_display_name": "Alice",
    "creator_kind": "User",
}
update_endpoint_rate_limits(endpoint, config)[source]

Note

Experimental: This function may change or be removed in a future release without warning.

Update the rate limits of a specified serving endpoint. See https://docs.databricks.com/api/workspace/servingendpoints/put for request/response schema.

Parameters
  • endpoint – The name of the serving endpoint to update.

  • config – A dictionary containing the updated rate limit configuration.

Returns

A DatabricksEndpoint object containing the updated rate limits.

Example:

from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")
name = "databricks-dbrx-instruct"
rate_limits = {
    "rate_limits": [{"calls": 10, "key": "endpoint", "renewal_period": "minute"}]
}
updated_rate_limits = client.update_endpoint_rate_limits(
    endpoint=name, config=rate_limits
)
assert updated_rate_limits == {
    "rate_limits": [{"calls": 10, "key": "endpoint", "renewal_period": "minute"}]
}
update_endpoint_tags(endpoint, config)[source]

Note

Experimental: This function may change or be removed in a future release without warning.

Update the tags of a specified serving endpoint. See https://docs.databricks.com/api/workspace/servingendpoints/patch for request/response schema.

Parameters
  • endpoint – The name of the serving endpoint to update.

  • config – A dictionary containing tags to add and/or remove.

Returns

A DatabricksEndpoint object containing the request response.

Example:

from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")
updated_tags = client.update_endpoint_tags(
    endpoint="test", config={"add_tags": [{"key": "project", "value": "test"}]}
)
assert updated_tags == {"tags": [{"key": "project", "value": "test"}]}
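
The add_tags payload used above can be generated from a plain mapping. add_tags_config is a hypothetical helper. It only covers adding tags, since that is the only form shown in the example.

```python
def add_tags_config(tags):
    """Convert {"key": "value"} pairs into the add_tags payload shown above."""
    return {"add_tags": [{"key": key, "value": value} for key, value in tags.items()]}


tags_config = add_tags_config({"project": "test"})
```
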
class mlflow.deployments.DatabricksEndpoint[source]

A dictionary-like object representing a Databricks serving endpoint.

endpoint = DatabricksEndpoint(
    {
        "name": "chat",
        "creator": "alice@company.com",
        "creation_timestamp": 0,
        "last_updated_timestamp": 0,
        "state": {...},
        "config": {...},
        "tags": [...],
        "id": "88fd3f75a0d24b0380ddc40484d7a31b",
    }
)
assert endpoint.name == "chat"
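
The dictionary-like behavior with attribute access can be imitated by a small dict subclass. This is a minimal stand-in for illustration, not the MLflow implementation, and AttrDictSketch is a hypothetical name.

```python
class AttrDictSketch(dict):
    """A dict whose keys can also be read as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict methods
        # such as keys() and items() keep working as usual.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


sketch_endpoint = AttrDictSketch({"name": "chat", "creation_timestamp": 0})
```

One consequence of this design is that a key which shares its name with a dict method (for example "items") is reachable only through subscript access.
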
class mlflow.deployments.MlflowDeploymentClient(target_uri)[source]

Note

Experimental: This class may change or be removed in a future release without warning.

Client for interacting with the MLflow AI Gateway.

Example:

First, start the MLflow AI Gateway:

mlflow gateway start --config-path path/to/config.yaml

Then, create a client and use it to interact with the server:

from mlflow.deployments import get_deploy_client

client = get_deploy_client("http://localhost:5000")
endpoints = client.list_endpoints()
assert [e.dict() for e in endpoints] == [
    {
        "name": "chat",
        "endpoint_type": "llm/v1/chat",
        "model": {"name": "gpt-4o-mini", "provider": "openai"},
        "endpoint_url": "http://localhost:5000/gateway/chat/invocations",
    },
]
create_deployment(name, model_uri, flavor=None, config=None, endpoint=None)[source]

Warning

This method is not implemented for MlflowDeploymentClient.

create_endpoint(name, config=None)[source]

Warning

This method is not implemented for MlflowDeploymentClient.

delete_deployment(name, config=None, endpoint=None)[source]

Warning

This method is not implemented for MlflowDeploymentClient.

delete_endpoint(endpoint)[source]

Warning

This method is not implemented for MlflowDeploymentClient.

get_deployment(name, endpoint=None)[source]

Warning

This method is not implemented for MlflowDeploymentClient.

get_endpoint(endpoint) → Endpoint[source]

Note

Experimental: This function may change or be removed in a future release without warning.

Gets a specified endpoint configured for the MLflow AI Gateway.

Parameters

endpoint – The name of the endpoint to retrieve.

Returns

An Endpoint object representing the endpoint.

Example:

from mlflow.deployments import get_deploy_client

client = get_deploy_client("http://localhost:5000")
endpoint = client.get_endpoint(endpoint="chat")
assert endpoint.dict() == {
    "name": "chat",
    "endpoint_type": "llm/v1/chat",
    "model": {"name": "gpt-4o-mini", "provider": "openai"},
    "endpoint_url": "http://localhost:5000/gateway/chat/invocations",
}
list_deployments(endpoint=None)[source]

Warning

This method is not implemented for MlflowDeploymentClient.

list_endpoints() → list[Endpoint][source]

Note

Experimental: This function may change or be removed in a future release without warning.

List endpoints configured for the MLflow AI Gateway.

Returns

A list of Endpoint objects.

Example:

from mlflow.deployments import get_deploy_client

client = get_deploy_client("http://localhost:5000")

endpoints = client.list_endpoints()
assert [e.dict() for e in endpoints] == [
    {
        "name": "chat",
        "endpoint_type": "llm/v1/chat",
        "model": {"name": "gpt-4o-mini", "provider": "openai"},
        "endpoint_url": "http://localhost:5000/gateway/chat/invocations",
    },
]
predict(deployment_name=None, inputs=None, endpoint=None) → dict[source]

Note

Experimental: This function may change or be removed in a future release without warning.

Submit a query to a configured provider endpoint.

Parameters
  • deployment_name – Unused.

  • inputs – The inputs to the query, as a dictionary.

  • endpoint – The name of the endpoint to query.

Returns

A dictionary containing the response from the endpoint.

Example:

from mlflow.deployments import get_deploy_client

client = get_deploy_client("http://localhost:5000")

response = client.predict(
    endpoint="chat",
    inputs={"messages": [{"role": "user", "content": "Hello"}]},
)
assert response == {
    "id": "chatcmpl-8OLoQuaeJSLybq3NBoe0w5eyqjGb9",
    "object": "chat.completion",
    "created": 1700814410,
    "model": "gpt-4o-mini",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! How can I assist you today?",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {
        "prompt_tokens": 9,
        "completion_tokens": 9,
        "total_tokens": 18,
    },
}

Additional parameters that are valid for a given provider and endpoint configuration can be included with the request as shown below, using an OpenAI completions endpoint request as an example:

from mlflow.deployments import get_deploy_client

client = get_deploy_client("http://localhost:5000")
client.predict(
    endpoint="completions",
    inputs={
        "prompt": "Hello!",
        "temperature": 0.3,
        "max_tokens": 500,
    },
)
update_deployment(name, model_uri=None, flavor=None, config=None, endpoint=None)[source]

Warning

This method is not implemented for MlflowDeploymentClient.

update_endpoint(endpoint, config=None)[source]

Warning

This method is not implemented for MlflowDeploymentClient.

class mlflow.deployments.OpenAIDeploymentClient(target_uri)[source]

Client for interacting with OpenAI endpoints.

Example:

First, set up credentials for authentication:

export OPENAI_API_KEY=...

See also

See https://mlflow.org/docs/latest/python_api/openai/index.html for other authentication methods.

Then, create a deployment client and use it to interact with OpenAI endpoints:

from mlflow.deployments import get_deploy_client

client = get_deploy_client("openai")
client.predict(
    endpoint="gpt-4o-mini",
    inputs={
        "messages": [
            {"role": "user", "content": "Hello!"},
        ],
    },
)
create_deployment(name, model_uri, flavor=None, config=None, endpoint=None)[source]

Warning

This method is not implemented for OpenAIDeploymentClient.

create_endpoint(name, config=None)[source]

Warning

This method is not implemented for OpenAIDeploymentClient.

delete_deployment(name, config=None, endpoint=None)[source]

Warning

This method is not implemented for OpenAIDeploymentClient.

delete_endpoint(endpoint)[source]

Warning

This method is not implemented for OpenAIDeploymentClient.

get_deployment(name, endpoint=None)[source]

Warning

This method is not implemented for OpenAIDeploymentClient.

get_endpoint(endpoint)[source]

Get information about a specific model.

list_deployments(endpoint=None)[source]

Warning

This method is not implemented for OpenAIDeploymentClient.

list_endpoints()[source]

List the currently available models.

predict(deployment_name=None, inputs=None, endpoint=None)[source]

Query an OpenAI endpoint. See https://platform.openai.com/docs/api-reference for more information.

Parameters
  • deployment_name – Unused.

  • inputs – A dictionary containing the model inputs to query.

  • endpoint – The name of the endpoint to query.

Returns

A dictionary containing the model outputs.
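For chat-style endpoints, the returned dictionary usually follows the OpenAI chat completions response layout. The sketch below pulls the first reply out of such a dictionary. Note that `first_reply_text` is a hypothetical helper and `sample_response` is hand-written illustrative data, not real output from `predict`; the exact keys depend on the endpoint being queried.

```python
from typing import Any, Dict


def first_reply_text(response: Dict[str, Any]) -> str:
    """Return the text of the first choice in a chat-completions-style response."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hand-written example shaped like a chat completions response (assumed layout).
sample_response = {
    "object": "chat.completion",
    "model": "gpt-4o-mini",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello! How can I help?"},
            "finish_reason": "stop",
        }
    ],
}

print(first_reply_text(sample_response))
```

Raising on an empty `choices` list keeps a malformed or filtered response from surfacing later as an opaque `IndexError`.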

update_deployment(name, model_uri=None, flavor=None, config=None, endpoint=None)[source]

Warning

This method is not implemented for OpenAIDeploymentClient.

update_endpoint(endpoint, config=None)[source]

Warning

This method is not implemented for OpenAIDeploymentClient.

mlflow.deployments.get_deploy_client(target_uri=None)[source]

Returns a subclass of mlflow.deployments.BaseDeploymentClient exposing standard APIs for deploying models to the specified target. See available deployment APIs by calling help() on the returned object or viewing docs for mlflow.deployments.BaseDeploymentClient. You can also run mlflow deployments help -t <target-uri> via the CLI for more details on target-specific configuration options.

Parameters

target_uri – Optional URI of target to deploy to. If no target URI is provided, MLflow will attempt to use the deployments target returned by get_deployments_target() or set in the MLFLOW_DEPLOYMENTS_TARGET environment variable.
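The fallback for a missing target URI can be pictured with a small stand-in that uses only the standard library. This is a sketch of the lookup order, not MLflow's implementation: `resolve_target` and `_configured_target` are hypothetical names, and the precedence between a target set in code and the environment variable is an assumption.

```python
import os
from typing import Optional

# Hypothetical stand-in for the value stored by set_deployments_target().
_configured_target: Optional[str] = None


def resolve_target(target_uri: Optional[str] = None) -> str:
    """Pick a target: explicit argument, then configured value, then env var."""
    if target_uri:
        return target_uri
    if _configured_target is not None:
        return _configured_target
    env_target = os.environ.get("MLFLOW_DEPLOYMENTS_TARGET")
    if env_target:
        return env_target
    # get_deployments_target() is documented to raise MlflowException when no
    # target is set; a builtin exception keeps this sketch dependency-free.
    raise LookupError("No deployments target is set")
```

With this ordering, an explicit argument such as `resolve_target("openai")` always wins, and the environment variable is consulted only when nothing else is available.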

Example
from mlflow.deployments import get_deploy_client
import pandas as pd

client = get_deploy_client("redisai")
# Deploy the model stored at artifact path 'myModel' under run with ID 'someRunId'. The
# model artifacts are fetched from the current tracking server and then used for deployment.
client.create_deployment("spamDetector", "runs:/someRunId/myModel")
# Load a CSV of emails and score it against our deployment
emails_df = pd.read_csv("...")
prediction_df = client.predict("spamDetector", emails_df)
# List all deployments, get details of our particular deployment
print(client.list_deployments())
print(client.get_deployment("spamDetector"))
# Update our deployment to serve a different model
client.update_deployment("spamDetector", "runs:/anotherRunId/myModel")
# Delete our deployment
client.delete_deployment("spamDetector")
mlflow.deployments.get_deployments_target() → str[source]

Returns the currently set MLflow deployments target, if one is set. If the deployments target has not been set by using set_deployments_target, an MlflowException is raised.

mlflow.deployments.run_local(target, name, model_uri, flavor=None, config=None)[source]

Deploys the specified model locally, for testing. Note that models deployed locally cannot be managed by other deployment APIs (e.g. update_deployment, delete_deployment, etc.).

Parameters
  • target – Target to deploy to.

  • name – Name to use for deployment

  • model_uri – URI of model to deploy

  • flavor – (optional) Model flavor to deploy. If unspecified, a default flavor will be chosen.

  • config – (optional) Dict containing updated target-specific configuration for the deployment

Returns

None

mlflow.deployments.set_deployments_target(target: str)[source]

Sets the target deployment client for MLflow deployments.

Parameters

target – The full URI of a running MLflow AI Gateway or, if running on Databricks, "databricks".

class mlflow.deployments.PredictionsResponse[source]

Represents the predictions and metadata returned in response to a scoring request, such as a REST API request sent to the /invocations endpoint of an MLflow Model Server.

get_predictions(predictions_format='dataframe', dtype=None)[source]

Get the predictions returned from the MLflow Model Server in the specified format.

Parameters
  • predictions_format – The format in which to return the predictions. Either "dataframe" or "ndarray".

  • dtype – The NumPy datatype to which to coerce the predictions. Only used when the "ndarray" predictions_format is specified.

Raises

Exception – If the predictions cannot be represented in the specified format.

Returns

The predictions, represented in the specified format.

to_json(path=None)[source]

Get the JSON representation of the MLflow Predictions Response.

Parameters

path – If specified, the JSON representation is written to this file path.

Returns

If path is unspecified, the JSON representation of the MLflow Predictions Response. Else, None.
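The return contract of to_json (a string when no path is given, None when the JSON is written to a file) can be sketched with the standard library alone. `SketchResponse` is a hypothetical stand-in that assumes the response behaves like a dictionary with a `predictions` key; it is not the MLflow class.

```python
import json
import os
import tempfile
from typing import Optional


class SketchResponse(dict):
    """Hypothetical dict-like stand-in for a predictions response."""

    def to_json(self, path: Optional[str] = None) -> Optional[str]:
        if path is None:
            # No path: hand the JSON text back to the caller.
            return json.dumps(self)
        # Path given: write the JSON to the file and return None.
        with open(path, "w") as f:
            json.dump(self, f)
        return None


response = SketchResponse(predictions=[0.1, 0.9])  # assumed payload shape
as_text = response.to_json()

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "predictions.json")
    result = response.to_json(path=out_path)
    with open(out_path) as f:
        round_trip = json.load(f)
```

Callers that pass a path should therefore not rely on the return value, since it is None in that case.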