MLflow Go
The MLflow project is a cornerstone of the machine learning community. It is highly flexible and makes machine learning workflows more reproducible, manageable and collaborative.
A key part of MLflow is experiment tracking, the system for logging and querying experiments. Developers can log parameters, metrics, inputs, and artifacts to keep track of different experiment runs, store them into a database, and then compare results.
This experiment tracking in MLflow is written in Python and has performance limitations when dealing with huge volumes of data. At G-Research, 10TB amount of data in 2 months amount of time is not uncommon. In these cases, our researchers can become frustrated with just how long it takes the system to execute operations.
To address this situation, we took apart the component parts of MLflow, and soon saw where the system could be improved – its backend. Writing to the Python-scripted database just wasn’t going to cut it for the sheer number of operations we had to perform. We started to imagine swapping out the experiment tracking with another backend. This train of thought evolved into the project called FastTrackML.
Fasttrack ML
Fasttrack ML was a huge success for G-Research, as its implementation in Go was significantly faster and able to deal with the high volumes of data. Go was chosen for this project because its concurrency model is based on lightweight goroutines and channels, which make it extremely efficient at handling multiple simultaneous tasks (e.g., logging experiment data, handling user requests). Go is a compiled language, and produces native machine code, resulting in faster execution compared to Python, which is interpreted. Go is statically typed which leads to a more memory efficient usage compared to Python's dynamic typing.
The success of FastTrackML, to a certain degree, unlocked the usage of MLflow within G-Research. It allowed quant researchers to process their datasets and experiment tracking in reasonable timeframes.
Back to MLflow
Then the Open Source team at G-Research looked at incorporating the lessons learned from FastTrackML into MLflow itself - the mlflow-go project is a first step in that direction. It is a Python package that is meant to be a drop-in replacement for MLflow's experiment tracking server. Once the mlflow-go package is proven technology, we hope it could be absorbed by MLflow.
Getting started with mlflow-go is as simple as installing MLflow, pip install mlflow-go-backend and then swapping mlflow for mlflow-go when running it from the command line. This will spin up a Go webserver with the exact API as the mlflow REST API.
At the time of this writing, users are required to pass a database connection string as '--backend-store-uri' argument, because the Go implementation is currently database-only. As the intention is to use mlflow-go for its performance, it makes sense to use an actual production worthy database and not the filesystem.
Closing thoughts
Although the backend we’ve built in MLflow-go is more performant for certain big data applications, it doesn’t necessarily mean it will replace the existing Python implementation. As the MLflow maintainers within Databricks are not specifically Golang developers, MLflow-go has to develop its own user base and established community of active contributors before it can become widely adopted.
We found that G-Research really liked MLflow but needed it to be more performant, notably for large datasets. We anticipate that other organizations, especially those with the massive data-processing needs, have run into similar issues. We are keen to learn if our efforts have benefited others, so please let us know how you’re using MLflow and if FastTrackML is faster for you on GitHub or Slack.
