Introduction to Translation with Transformers and MLflow
In this tutorial, we delve into the world of language translation by leveraging the power of Transformers and MLflow. This guide is crafted for practitioners with a grasp of machine learning concepts who seek to streamline their translation model workflows. We will showcase the use of MLflow to log, manage, and serve a cutting-edge translation model: google/flan-t5-base from the 🤗 Hugging Face library.
Learning Objectives
Throughout this tutorial, you will:
- Construct a translation pipeline using flan-t5-base from the Transformers library.
- Log the translation model and its configurations using MLflow.
- Determine the input and output signature of the translation model automatically.
- Retrieve a logged translation model from MLflow for direct interaction.
- Emulate the deployment of the translation model using MLflow's pyfunc model flavor for language translation tasks.
By the conclusion of this tutorial, you will have a thorough understanding of how to manage and deploy translation models with MLflow, enhancing your machine learning operations for language processing.
Why was this model chosen?
The flan-t5-base offers a few benefits:
- Size: it is relatively small while delivering comparatively strong performance.
- Enhanced Language Coverage: expanding on the original T5 model, flan-t5 supports a much broader set of languages.
Setting Up the Translation Environment
Begin by setting up the essential components for translation tasks using the google/flan-t5-base model.
Importing Libraries
We import the transformers library for access to the translation model and tokenizer. Additionally, mlflow is included for model tracking and management, creating a comprehensive environment for our translation tasks.
Initializing the Model
The google/flan-t5-base model, known for its translation effectiveness, is loaded from the Hugging Face repository. This pre-trained model is a key component of our setup.
Setting Up the Tokenizer
We initialize the tokenizer corresponding to our model. The tokenizer plays a critical role in processing text input, making it understandable for the model.
Creating the Pipeline
A translation pipeline for English to French is created. This pipeline streamlines the process, allowing us to focus on inputting text and receiving translations without managing model and tokenizer interactions directly.
[1]:
# Disable tokenizers warnings when constructing pipelines
%env TOKENIZERS_PARALLELISM=false
import warnings
# Disable a few less-than-useful UserWarnings from setuptools and pydantic
warnings.filterwarnings("ignore", category=UserWarning)
env: TOKENIZERS_PARALLELISM=false
[2]:
import transformers
import mlflow
model_architecture = "google/flan-t5-base"

translation_pipeline = transformers.pipeline(
    task="translation_en_to_fr",
    model=transformers.T5ForConditionalGeneration.from_pretrained(
        model_architecture, max_length=1000
    ),
    tokenizer=transformers.T5TokenizerFast.from_pretrained(
        model_architecture, return_tensors="pt"
    ),
)
Testing the Translation Pipeline
We perform a preliminary check on our translation pipeline to ensure its proper functioning before logging it with MLflow.
Model Verification
A test translation allows us to verify that the model accurately translates text, in this case from English to French, ensuring the model’s basic functionality.
Error Prevention
Identifying potential issues before model logging helps prevent future errors during deployment or inference, saving time and resources.
Resource Management
Testing minimizes wasteful use of resources, particularly important given the large size of these models and the resources needed to save and load them.
Pipeline Validation
This step confirms that both the model and tokenizer in the pipeline are correctly configured and capable of processing the input as expected.
[3]:
# Evaluate the pipeline on a sample sentence prior to logging
translation_pipeline(
    "translate English to French: I enjoyed my slow saunter along the Champs-Élysées."
)
[3]:
[{'translation_text': "J'ai apprécié mon sajour lente sur les Champs-Élysées."}]
Evaluating the Translation Results
Upon running our initial translation through the pipeline, we observed that the output, while generally accurate, left room for improvement.
The initial translation output was:
`[{'translation_text': "J'ai apprécié mon sajour lente sur les Champs-Élysées."}]`
This translation captures the essence of the original English sentence but shows minor grammatical errors and word choice issues. For instance, a more refined translation might be:
`"J'ai apprécié ma lente promenade le long des Champs-Élysées."`
This version corrects the gender agreement ("ma lente promenade"), replaces the non-word "sajour" with a natural word choice, and renders "along" more idiomatically as "le long des". These subtle nuances enhance the translation quality significantly. The base model's performance is encouraging, indicating the potential for more precise translations with further fine-tuning and context. MLflow's tracking and management capabilities will be instrumental in monitoring the iterative improvements of such models.
In summary, while the pursuit of perfection in machine translation is ongoing, the initial results are a promising step towards achieving natural and accurate translations.
Setting Model Parameters and Inferring Signature
We establish crucial model parameters and infer the signature to ensure consistency and reliability in our model’s deployment.
Defining Model Parameters
Setting key parameters like max_length is vital for controlling model behavior during inference. For example, a max_length of 1000 ensures the model handles longer sentences effectively, which is crucial for maintaining context in translations.
Importance of Inferring Signature
The signature, defining the model’s input and output schema, is critical for MLflow’s understanding of the expected data structures. By inferring this signature, we document the types and structures of data that the model works with, enhancing its reliability and portability.
Benefits of This Process
Enhanced Portability: Properly defined parameters and signatures make the model more adaptable to different environments.
Error Reduction: This step minimizes the risk of encountering schema-related errors during deployment.
Clear Documentation: It serves as a clear guide for developers and users, simplifying the model’s integration into applications.
By establishing these parameters and signature, we lay a robust foundation for our model’s subsequent tracking, management, and serving via MLflow.
[4]:
# Define the parameters that we are permitting to be used at inference time, along with
# their default values if not overridden
model_params = {"max_length": 1000}

# Generate the model signature by providing an input, the expected output, and
# (optionally) parameters available for overriding at inference time
signature = mlflow.models.infer_signature(
    "This is a sample input sentence.",
    mlflow.transformers.generate_signature_output(
        translation_pipeline, "This is another sample."
    ),
    params=model_params,
)
Reviewing the Model Signature
After configuring the translation model and inferring its signature, it’s crucial to review the signature to confirm it matches our model’s input and output structures.
The model signature serves as a blueprint for MLflow to interact with the model, encompassing:
- Inputs: the expected input types, such as a string for the text to be translated.
- Outputs: the output data types, which in our case is a string representing the translated text.
- Parameters: additional configurable settings such as max_length, determining the maximum length of the translation output.
Reviewing the signature by displaying the signature object allows us to validate the data formats and ensure that our model will function as expected when deployed. This step is vital for consistent model performance and avoiding errors in a production environment. Furthermore, the inclusion of parameters in the signature with default values ensures that any modifications during inference are deliberate and well-documented, contributing to the model's predictability and transparency.
[5]:
# Visualize the model signature
signature
[5]:
inputs:
[string]
outputs:
[string]
params:
['max_length': long (default: 1000)]
Creating an experiment
We create a new MLflow Experiment so that the run logging our model has its own contextually relevant entry rather than being recorded under the default experiment.
[6]:
# If you are running this tutorial in local mode, leave the next line commented out.
# Otherwise, uncomment the following line and set your tracking uri to your local or remote tracking server.
# mlflow.set_tracking_uri("http://127.0.0.1:8080")
mlflow.set_experiment("Translation")
[6]:
<Experiment: artifact_location='file:///Users/benjamin.wilson/repos/mlflow-fork/mlflow/docs/source/llms/transformers/tutorials/translation/mlruns/996217394074032926', creation_time=1701286351921, experiment_id='996217394074032926', last_update_time=1701286351921, lifecycle_stage='active', name='Translation', tags={}>
Logging the Model with MLflow
We are now set to log our translation model with MLflow, ensuring its trackability and version control.
Starting an MLflow Run
We initiate the logging process by starting an MLflow run. This encapsulates all the model information, including artifacts and parameters, within a unique run ID.
Using mlflow.transformers.log_model
This function is integral to logging our model in MLflow. It records various aspects of the model:
- Model Pipeline: the complete translation model pipeline, encompassing the model and tokenizer.
- Artifact Path: the directory path in the MLflow run where the model artifacts are stored.
- Model Signature: the pre-defined signature indicating the model's expected input and output formats.
- Model Parameters: key parameters of the model, such as max_length, providing insights into model behavior.
Outcome of Model Logging
After logging, we obtain the model_info object, which encompasses all the essential metadata about the logged model, such as its storage location. This metadata is vital for future deployment and performance analysis.
[7]:
with mlflow.start_run():
    model_info = mlflow.transformers.log_model(
        transformers_model=translation_pipeline,
        artifact_path="french_translator",
        signature=signature,
        model_params=model_params,
    )
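The model_info object returned by the logging call can be inspected directly. As a minimal sketch (assuming the logging cell above has run), the following prints the two fields relied on later in this tutorial:

# A quick sketch: inspect the metadata returned from logging. model_uri is what we pass
# to the load_model calls below; run_id identifies the MLflow run that owns the artifacts.
print(model_info.model_uri)
print(model_info.run_id)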
Inspecting the Loaded Model Components
After loading the model from MLflow, we delve into its individual components to verify their setup and functionality.
Component Breakdown
The loaded model comprises several key components, each playing a crucial role in its operation:
- Task: defines the model's specific use case, confirming its suitability for the intended task.
- Device Map: details the hardware configuration, important for performance optimization.
- Model Instance: the core T5ForConditionalGeneration model, central to the translation process.
- Tokenizer: the T5TokenizerFast, responsible for processing text inputs into a format understandable by the model.
- Framework: indicates the underlying deep learning framework, essential for compatibility considerations.
Ensuring Component Integrity and Functionality
Inspecting these components ensures that:
- The model aligns with our task requirements.
- Hardware resources are optimally utilized.
- Text inputs are correctly preprocessed for model consumption.
- The model's compatibility with the selected deep learning framework is confirmed.
This verification step is vital for the successful application of the model in practical scenarios, reinforcing the robustness and flexibility of MLflow.
[8]:
# Load our saved model as a dictionary of components, comprising the model itself,
# the tokenizer, and any other components that were saved
translation_components = mlflow.transformers.load_model(
    model_info.model_uri, return_type="components"
)

# Show the components that made up our saved pipeline and what type each one is
for key, value in translation_components.items():
    print(f"{key} -> {type(value).__name__}")
2023/11/30 12:00:44 INFO mlflow.transformers: 'runs:/2357c12ca17a4f328b2f72cbb7d70343/french_translator' resolved as 'file:///Users/benjamin.wilson/repos/mlflow-fork/mlflow/docs/source/llms/transformers/tutorials/translation/mlruns/996217394074032926/2357c12ca17a4f328b2f72cbb7d70343/artifacts/french_translator'
task -> str
device_map -> str
model -> T5ForConditionalGeneration
tokenizer -> T5TokenizerFast
framework -> str
Understanding Model Flavors in MLflow
The model_info.flavors attribute in MLflow provides insights into the model's capabilities and deployment requirements across various platforms.
Flavors in MLflow represent different ways the model can be utilized and deployed. Key aspects include:
- Python Function Flavor: indicates the model's compatibility as a generic Python function, including model binary, loader module, Python version, and environment specifications.
- Transformers Flavor: tailored for models from the Hugging Face Transformers library, covering the transformers version, code dependencies, task, instance type, source model name, pipeline model type, framework, tokenizer type, components, and model binary.
This information guides how to interact with the model within MLflow, ensuring proper deployment with the right environment and dependencies, whether for inference or further model refinement.
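To make the python_function flavor concrete, here is a minimal sketch (assuming model_info from the logging step is still in scope) that loads the logged model as a generic pyfunc and runs inference, overriding the max_length parameter we declared in the signature:

# Load the logged model via the generic pyfunc flavor; no transformers-specific APIs needed
pyfunc_translator = mlflow.pyfunc.load_model(model_info.model_uri)

# Params declared in the signature (here, max_length) may be overridden at inference time
print(
    pyfunc_translator.predict(
        "translate English to French: MLflow makes model management simple.",
        params={"max_length": 500},
    )
)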
[9]:
# Show the flavors that were saved with our model to gain an understanding of what is recorded when saving a transformers pipeline
model_info.flavors
[9]:
{'python_function': {'model_binary': 'model',
'loader_module': 'mlflow.transformers',
'python_version': '3.8.13',
'env': {'conda': 'conda.yaml', 'virtualenv': 'python_env.yaml'}},
'transformers': {'transformers_version': '4.34.1',
'code': None,
'task': 'translation_en_to_fr',
'instance_type': 'TranslationPipeline',
'source_model_name': 'google/flan-t5-base',
'pipeline_model_type': 'T5ForConditionalGeneration',
'framework': 'pt',
'tokenizer_type': 'T5TokenizerFast',
'components': ['tokenizer'],
'model_binary': 'model'}}
Evaluating the Translation Output
After testing our pipeline with a challenging sentence, we assess the translation’s accuracy.
Assessing Translation Nuances
The model impressively interprets “Nice” correctly as a city name, rather than an adjective. This shows its ability to discern context and proper nouns. Furthermore, it cleverly substitutes the English play on words with the French adjective “bien,” maintaining the sentence’s intended sentiment.
Contextual Understanding
This translation exemplifies the model’s strength in understanding context and language subtleties, which is essential for practical applications where precision and contextual accuracy are key.
The Importance of Rigorous Testing
Testing with linguistically complex sentences is vital. It ensures the model can handle various linguistic challenges, an important aspect of deploying models in real-world scenarios.
[10]:
# Load our saved model as a transformers pipeline and validate the performance for a simple translation task
translation_pipeline = mlflow.transformers.load_model(model_info.model_uri)
response = translation_pipeline("I have heard that Nice is nice this time of year.")
print(response)
2023/11/30 12:00:45 INFO mlflow.transformers: 'runs:/2357c12ca17a4f328b2f72cbb7d70343/french_translator' resolved as 'file:///Users/benjamin.wilson/repos/mlflow-fork/mlflow/docs/source/llms/transformers/tutorials/translation/mlruns/996217394074032926/2357c12ca17a4f328b2f72cbb7d70343/artifacts/french_translator'
[{'translation_text': "J'ai entendu que Nice est bien cette période de l'année."}]
Assessing the Reconstructed Pipeline’s Translation
We now evaluate the performance of a pipeline reconstructed from loaded components.
Reconstruction and Testing
Using the dictionary of loaded components, we successfully reconstruct a new translation pipeline. This step is essential to confirm that our logged and retrieved components function cohesively when reassembled.
Translation Quality
The reconstructed pipeline adeptly translates English into French, maintaining both the syntactic accuracy and semantic coherence of the original sentence. This reflects the Transformers library’s ability to simplify the utilization of complex deep learning models.
Verifying Model Integrity
This test is key in verifying that the saved model and its components are not only retrievable but also function effectively post-deployment. It ensures the continued integrity and performance of the model in practical applications.
[11]:
# Verify that the components that we loaded can be constructed into a pipeline manually
reconstructed_pipeline = transformers.pipeline(**translation_components)

reconstructed_response = reconstructed_pipeline(
    "transformers makes using Deep Learning models easy and fun!"
)
print(reconstructed_response)
[{'translation_text': "transformers simplifie l'utilisation des modèles de l'apprentissage profonde!"}]
Direct Utilization of Model Components
Explore the granular control over individual model components for custom translation processes.
Component Interaction
Interacting with the model’s individual components offers a deeper level of customization. This approach is particularly beneficial when integrating the model into larger systems or when specific manipulations of inputs and outputs are required.
Insight into Model Structure
Examining the keys of the translation_components dictionary reveals the structure and components of our model. This includes the task, device mapping, core model, tokenizer, and framework information, each crucial for the translation process.
Benefits of Component-Level Control
Utilizing components individually allows for precise adjustments and custom integrations. It’s an effective way to tailor the translation process to specific needs, ensuring more control over the model’s behavior and output.
[12]:
# View the components that were saved with our model
translation_components.keys()
[12]:
dict_keys(['task', 'device_map', 'model', 'tokenizer', 'framework'])
Advanced Usage: Direct Interaction with Model Components
Direct interaction with a model’s components offers flexibility and control for advanced use cases in translation.
Using the model and tokenizer components directly, as opposed to the higher-level pipeline, allows for:
- Customization of the tokenization process.
- Specific tensor handling, including device specification (CPU, GPU, MPS, etc.).
- Generation and adjustment of predictions on the fly.
- Decoding outputs with options for post-processing.
This approach provides granular control, enabling interventions in the model’s operations, such as dynamic input adjustments or output post-processing. However, it also increases complexity, requiring a deeper understanding of the model and tokenizer and the management of more code. Opting for direct interaction over the pipeline means balancing ease of use against the level of control required for your application. It’s a critical decision, especially for advanced scenarios demanding precise manipulation of the translation process.
[13]:
# Access the individual components from the components dictionary
tokenizer = translation_components["tokenizer"]
model = translation_components["model"]
query = "Translate to French: Liberty, equality, fraternity, or death."
# This notebook was run on a Mac laptop, so we'll send the input tensor to the "mps" device.
# If you're running this on a different system, ensure that you're sending the tensor to the
# appropriate device so that the model is able to read it from memory.
inputs = tokenizer.encode(query, return_tensors="pt").to("mps")
outputs = model.generate(inputs).to("mps")
result = tokenizer.decode(outputs[0])
# Since we're not using a pipeline here, we need to modify the output slightly to get only the translated text.
print(result.replace("<pad> ", "\n").replace("</s>", ""))
La liberté, l'égalité, la fraternité ou la mort.
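As an aside, the string replacement above can be avoided: the tokenizer's decode method accepts the standard skip_special_tokens option, which drops tokens such as <pad> and </s> during decoding. A minimal sketch, reusing the outputs tensor from the cell above:

# Decode again, letting the tokenizer strip special tokens instead of post-processing the string
clean_result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(clean_result)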
Tutorial Recap
This tutorial provided insights into combining MLflow with advanced language translation models, emphasizing streamlined model management and deployment.
We explored several key aspects:
- Setting up and testing a translation pipeline using Transformers.
- Logging the model and its configurations to MLflow for effective versioning and tracking.
- Inferring and examining the model's signature to ensure input and output consistency.
- Interacting with logged model components for enhanced flexibility in deployment.
- Discussing the nuances of language translation and the role of context in achieving accurate results.
The Power of MLflow and Model Metadata
MLflow’s integration proved instrumental in managing and deploying the translation model. The tutorial highlighted how MLflow’s metadata, including the model’s signature and flavors, aids in consistent and reliable deployment, catering to production needs.
Reflection on the Translation Output
The final translation output, while not exact, captured the essence of the iconic French motto, highlighting the model's effectiveness and the importance of contextual understanding in translations. Further exploration of the cultural significance of the phrase can be found on its Wikipedia page.