TensorFlow within MLflow
This guide walks you through using TensorFlow with MLflow: how to track your TensorFlow experiments and how to log your TensorFlow models to MLflow.
Autologging TensorFlow Experiments
Attention
Autologging is only supported when you train the model with the Keras model.fit() API. Additionally, only TensorFlow >= 2.3.0 is supported. If you are using an older version of TensorFlow, or TensorFlow without Keras, please use manual logging.
MLflow can automatically log metrics and parameters from your TensorFlow training. To enable autologging, call mlflow.tensorflow.autolog() or mlflow.autolog() before training.
import mlflow
import numpy as np
import tensorflow as tf
from tensorflow import keras

mlflow.tensorflow.autolog()

# Prepare data for a 2-class classification.
data = np.random.uniform(size=[20, 28, 28, 3])
label = np.random.randint(2, size=20)

model = keras.Sequential(
    [
        keras.Input([28, 28, 3]),
        keras.layers.Conv2D(8, 2),
        keras.layers.MaxPool2D(2),
        keras.layers.Flatten(),
        keras.layers.Dense(2),
        keras.layers.Softmax(),
    ]
)

model.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(),
    optimizer=keras.optimizers.Adam(0.001),
    metrics=[keras.metrics.SparseCategoricalAccuracy()],
)

with mlflow.start_run():
    model.fit(data, label, batch_size=5, epochs=2)
What is Logged by Autologging?
By default, autologging logs the following to MLflow:
- The model summary as returned by model.summary().
- Training hyperparameters, e.g., batch size and epochs.
- Optimizer configs, e.g., optimizer name and learning rate.
- Dataset information.
- Training and validation metrics, including loss and any metrics specified in model.compile().
- The saved model after training completes, in the TF SavedModel format (compiled graph).
You can customize autologging behavior by passing arguments to mlflow.tensorflow.autolog(). For example, if you don't want to log the dataset information, you can run mlflow.tensorflow.autolog(log_datasets=False). Please refer to the API documentation for mlflow.tensorflow.autolog() for the full list of customization options.
Understanding Autologging
MLflow autologs TensorFlow by monkey patching Keras to register a custom callback. In brief, MLflow attaches a callback to the Keras model that works like any other Keras callback. When training starts, the callback logs the training parameters (including epochs, batch_size, and learning_rate) and model information such as the model summary. During training, the callback is triggered every every_n_iter epochs to log the training metrics, and after training finishes, the trained model is saved to MLflow.
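The mechanism described above can be illustrated with a toy sketch in plain Python. This is not MLflow's actual implementation: `ToyModel`, `RecordingCallback`, and `patch_fit` are hypothetical names invented for this illustration, and the "loss" values are fabricated. It only shows how patching `fit()` lets a callback run even though the caller never passes it.

```python
import functools


class ToyModel:
    """Hypothetical stand-in for a Keras model; not a real Keras class."""

    def fit(self, data, epochs=1, batch_size=32, callbacks=None):
        callbacks = callbacks or []
        params = {"epochs": epochs, "batch_size": batch_size}
        for cb in callbacks:
            cb.on_train_begin(params)
        for epoch in range(epochs):
            # Fake metric that shrinks each epoch, standing in for a real loss.
            logs = {"loss": 1.0 / (epoch + 1)}
            for cb in callbacks:
                cb.on_epoch_end(epoch, logs)
        for cb in callbacks:
            cb.on_train_end()


class RecordingCallback:
    """Hypothetical callback that records what a tracking callback would log."""

    def __init__(self):
        self.params = {}
        self.metrics = []
        self.model_saved = False

    def on_train_begin(self, params):
        self.params = dict(params)

    def on_epoch_end(self, epoch, logs):
        self.metrics.append((epoch, dict(logs)))

    def on_train_end(self):
        self.model_saved = True


def patch_fit(model_cls, callback):
    """Monkey patch model_cls.fit so every call also receives `callback`."""
    original_fit = model_cls.fit

    @functools.wraps(original_fit)
    def patched_fit(self, *args, callbacks=None, **kwargs):
        callbacks = list(callbacks or []) + [callback]
        return original_fit(self, *args, callbacks=callbacks, **kwargs)

    model_cls.fit = patched_fit


recorder = RecordingCallback()
patch_fit(ToyModel, recorder)

# The caller never mentions the callback, yet it still runs.
ToyModel().fit(data=[1, 2, 3], epochs=2, batch_size=5)
print(recorder.params)   # {'epochs': 2, 'batch_size': 5}
print(recorder.metrics)  # [(0, {'loss': 1.0}), (1, {'loss': 0.5})]
```

The three hooks mirror the three stages in the description: parameters at training start, metrics per epoch, and a final save step when training ends.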
Logging to MLflow with Keras Callback
As discussed in the previous section, MLflow autologging for TensorFlow is built on a Keras callback. If you wish to log additional information that the default callback does not capture, you can write your own callback to log custom information.
Using the Predefined Callback
MLflow offers a predefined callback, mlflow.tensorflow.MlflowCallback, that you can use or extend to log information to MLflow. The callback provides the same functionality as autologging and suits users who want finer control over the experiment. You use mlflow.tensorflow.MlflowCallback the same way as any other Keras callback:
with mlflow.start_run():
    model.fit(
        data,
        label,
        batch_size=5,
        epochs=2,
        callbacks=[mlflow.tensorflow.MlflowCallback()],
    )
You can change the logging frequency of mlflow.tensorflow.MlflowCallback by setting log_every_epoch and log_every_n_steps. By default, metrics are logged once per epoch. Please refer to the API documentation for more details.
Customizing MLflow Logging
You can also write your own callback to log information to MLflow. To do that, define a class that subclasses keras.callbacks.Callback, which provides hooks at various stages of training and validation. For example, on_epoch_end is called at the end of each epoch and on_train_end is called when training finishes. You can then pass the callback to model.fit(). Here is a simple example that logs the training metrics in log scale:
import math

import mlflow
from tensorflow import keras


class MlflowCallback(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        for k, v in logs.items():
            # math.log() is undefined for values <= 0, so skip those metrics.
            if v > 0:
                mlflow.log_metric(f"log_{k}", math.log(v), step=epoch)
At the end of each epoch, the logs object contains the loss and the metrics defined in model.compile(). For full documentation of the Keras callback API, please read keras.callbacks.Callback.
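To make the shape of the logs object concrete, here is a small sketch of the log-scale transformation on its own, without Keras or MLflow. The helper name `to_log_scale` and the metric values are made up for illustration. One edge case worth noting is that math.log() raises an error for zero or negative values, which a metric such as accuracy can take, so the sketch skips them.

```python
import math


def to_log_scale(logs):
    """Return {"log_<name>": ln(value)} for every strictly positive metric.

    `to_log_scale` is a hypothetical helper, not part of MLflow or Keras.
    Non-positive values are skipped because math.log() raises ValueError on them.
    """
    return {
        f"log_{name}": math.log(value)
        for name, value in (logs or {}).items()
        if value > 0
    }


# Made-up example of what `logs` might hold at the end of an epoch.
example_logs = {"loss": 1.0, "sparse_categorical_accuracy": 0.0}
print(to_log_scale(example_logs))  # {'log_loss': 0.0}
```

Inside a callback, each key and value of the returned dictionary could be passed to mlflow.log_metric() together with step=epoch.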
Saving Your TensorFlow Model to MLflow
If you have turned on autologging, your TensorFlow model is saved automatically after training finishes. If you prefer to save your model explicitly, you can instead call mlflow.tensorflow.log_model() manually. After saving, you can load the model back using mlflow.tensorflow.load_model(). The loaded model can be used for inference by calling its predict() method.
import mlflow
import tensorflow as tf
from tensorflow import keras

model = keras.Sequential(
    [
        keras.Input([28, 28, 3]),
        keras.layers.Conv2D(8, 2),
        keras.layers.MaxPool2D(2),
        keras.layers.Flatten(),
        keras.layers.Dense(2),
        keras.layers.Softmax(),
    ]
)

save_path = "model"
with mlflow.start_run() as run:
    mlflow.tensorflow.log_model(model, save_path)

# Load back the model.
loaded_model = mlflow.tensorflow.load_model(f"runs:/{run.info.run_id}/{save_path}")
print(loaded_model.predict(tf.random.uniform([1, 28, 28, 3])))
Diving into Saving
Under the hood, saving converts the TensorFlow model into a pyfunc model, which is MLflow's generic model type, and that pyfunc model is what gets saved to MLflow. You don't need to learn the basics of pyfunc models to use the TensorFlow flavor, but if you are interested, please refer to MLflow pyfunc model.
Saving Format
By default, MLflow saves your TensorFlow model in the TF SavedModel format (compiled graph), which is suitable for deployment across platforms. You can also save your model in other formats, namely h5 and keras, by setting the keras_model_kwargs parameter in mlflow.tensorflow.log_model(). For example, to save your model in h5 format (a single HDF5 file rather than a compiled graph), you can run:
import mlflow
import tensorflow as tf
from tensorflow import keras

model = keras.Sequential(
    [
        keras.Input([28, 28, 3]),
        keras.layers.Conv2D(8, 2),
        keras.layers.MaxPool2D(2),
        keras.layers.Flatten(),
        keras.layers.Dense(2),
        keras.layers.Softmax(),
    ]
)

save_path = "model"
with mlflow.start_run() as run:
    mlflow.tensorflow.log_model(
        model, save_path, keras_model_kwargs={"save_format": "h5"}
    )

# Load back the model.
loaded_model = mlflow.tensorflow.load_model(f"runs:/{run.info.run_id}/{save_path}")
print(loaded_model.predict(tf.random.uniform([1, 28, 28, 3])))
For the differences between the formats, please refer to the TensorFlow Save and Load Guide. Please note that if you want to deploy your model, you need to save it in the TF SavedModel format.
Model Signature
A model signature is a description of a model's input and output. If you have enabled autologging and provided a dataset, the signature is inferred automatically from the dataset. Otherwise, you need to provide a signature yourself for the signature information to be viewable in the MLflow UI.
To manually set the signature for your model, pass a signature parameter to mlflow.tensorflow.log_model(). You need to define the input schema by specifying the dtype and shape of the input tensors and wrapping them with mlflow.types.TensorSpec(). For example:
import mlflow
import numpy as np
import tensorflow as tf
from tensorflow import keras
from mlflow.models import ModelSignature
from mlflow.types import Schema, TensorSpec

model = keras.Sequential([
    keras.Input([28, 28, 3]),
    keras.layers.Conv2D(8, 2),
    keras.layers.MaxPool2D(2),
    keras.layers.Flatten(),
    keras.layers.Dense(2),
    keras.layers.Softmax(),
])

# -1 marks the variable-size batch dimension.
input_schema = Schema(
    [
        TensorSpec(np.dtype(np.float32), (-1, 28, 28, 3), "input"),
    ]
)
signature = ModelSignature(inputs=input_schema)

save_path = "model"
with mlflow.start_run() as run:
    mlflow.tensorflow.log_model(model, save_path, signature=signature)

# Load back the model.
loaded_model = mlflow.tensorflow.load_model(f"runs:/{run.info.run_id}/{save_path}")
print(loaded_model.predict(tf.random.uniform([1, 28, 28, 3])))
Please note that a model signature is not required for loading a model. You can still load the model and perform inference if you know the input format. However, it is good practice to include the signature so that the model is easier to understand.