In the world of Machine Learning, crafting the perfect solution is only half the battle. The true challenges often emerge during deployment and scaling. My recent project led me to MLFlow—a free & powerful tool designed to optimise the MLOps process.
Add to that Docker: a platform that runs applications in isolated ‘containers’, which was a prerequisite of my project.
Join me as I walk you through the architecture I’ve proposed, for better or worse 🙂
System Architecture
The system comprises three major components. Here’s a quick breakdown:
- MLFlow Server: A centralised model repository; this server ensures that ML models are stored, versioned, and easily accessible to the other components [named: server_mlflow]
- API: Powered by FastAPI, this component handles model serving. It allows manually reloading the model, fetches fresh training data from the database, and acts as a bridge between our frontend and backend operations [named: api]
- Training Container: As the name suggests, this container is responsible for training the ML model. Once the training step is completed, the model is uploaded to the MLFlow server [named: mlops]
Setting Things Up
Before delving into the actual deployment, there are a couple of prerequisites:
- Ensure you have docker and docker-compose installed
- The Docker server needs to be up and running before initialising the containers
The code can be found in the GitHub repo
Conda configs
- The ./docker/ directory holds the essential Dockerfiles and a docker-compose.yml for container orchestration
- Configurations for the virtual environments used inside the containers are found in ./docker/conda-cfg
- Every container uses its own config file: each environment-<NAME>.yml file has a corresponding conda-lock-<NAME>.yml file. The run_all.sh script creates these lock files, and Docker copies them into the containers during initialisation
- Unless your environment changes, these configurations remain maintenance-free
Installation of Docker Containers
- Preparing the Environment: After pulling the code from the repository, navigate to the ./docker directory. This is where the magic begins
- Deploying the MLflow Server & API Containers: From the ./docker directory, use the command docker compose up --build to build the primary containers. Once the containers are up and active, you can connect to the MLflow server at http://<IP>:5001 and the API at http://<IP>:5002. A successful connection to the MLflow UI or a message stating “Welcome to the API service!” from the API server confirms their operational status.
- Launching the MLOps Container: This container is at the heart of model training. Running the command docker compose up mlops --build from the ./docker directory will kickstart the process. The code does not contain any files to initiate the training process, as I kept it simple to showcase the Docker + MLFlow + Conda solution
# From within ./docker directory
docker compose up --build
docker compose up mlops --build
The docker-compose.yml with the declaration of the three containers:
version: "3.9"
services:
  server_mlflow:
    platform: linux/amd64
    build:
      context: ..
      dockerfile: docker/Dockerfile.server
    container_name: server_mlflow
    ports:
      - "5001:5001"
    volumes:
      # - ../mlartifacts:/mlartifacts
      - ../db:/db
      - ../src:/src
    networks:
      - A
  api:
    platform: linux/amd64
    depends_on:
      - server_mlflow
    build:
      context: ..
      dockerfile: docker/Dockerfile.api
    container_name: api
    ports:
      - "5002:5002"
    volumes:
      - ../src/api:/src/api
    networks:
      - A
  mlops:
    platform: linux/amd64
    depends_on:
      - server_mlflow
    build:
      context: ..
      dockerfile: docker/Dockerfile.mlops
    profiles: ['mlops']
    container_name: mlops
    volumes:
      - ../src/ml:/src/ml
    networks:
      - A
networks:
  A:
    driver: bridge
Container [api] depends on container [server_mlflow], so they start together. Container [mlops] runs occasionally and on demand, so we start it separately by defining a separate profile with profiles: ['mlops'].
Docker config files – Dockerfile for MLFlow server
FROM continuumio/miniconda3:23.3.1-0
RUN echo "**** Server-Repo ****"
## Install jemalloc
#RUN apt-get update && apt-get install -y libjemalloc-dev
## Set LD_PRELOAD to use jemalloc
#ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so
ENV TZ=Europe/Warsaw
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
RUN apt-get update && apt-get -y upgrade
#RUN pip install --upgrade pip
RUN pip install conda-lock
COPY docker/conda-cfg/conda-lock-server.yml /tmp/conda-lock.yml
RUN conda-lock install --name c_env /tmp/conda-lock.yml && \
conda clean -afy && \
echo "source activate c_env" >> ~/.bashrc
SHELL ["/bin/bash", "--login", "-c"]
RUN find /opt/conda/ -follow -type f -name '*.a' -delete && \
find /opt/conda/ -follow -type f -name '*.pyc' -delete && \
find /opt/conda/ -follow -type f -name '*.js.map' -delete
WORKDIR /src
ENV PATH /opt/conda/envs/c_env/bin:$PATH
ENTRYPOINT ["mlflow", "server", \
"--backend-store-uri", "sqlite:////db/mlruns.db", \
"-p", "5001", "--host", "0.0.0.0"]
The most important part is the ENTRYPOINT: each container has a different command to start its service. Also, a different conda-lock-<NAME>.yml file is copied in to initialise each container.
The --backend-store-uri option declares the location of the database holding experiments and stats.
When the API is started inside a container and calls MLFlow, which also runs inside a container, we address the MLFlow server by its container name: server_uri = 'http://server_mlflow:5001'.
If, however, you start the MLFlow server inside Docker but run the API from the command line on the host, the MLFlow server has to be addressed by IP rather than by container name.
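As a minimal sketch of the two cases (assuming an MLflow 2.x Python client; the host IP is a placeholder):

import mlflow
from mlflow.tracking import MlflowClient

# Inside the api container: reach the server by its container name
mlflow.set_tracking_uri("http://server_mlflow:5001")

# From the host command line instead: reach the server by host IP (placeholder)
# mlflow.set_tracking_uri("http://<IP>:5001")

# Quick connectivity check: list the experiments known to the tracking server
client = MlflowClient()
print(client.search_experiments())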
Difference in Dockerfile for API
ENTRYPOINT ["gunicorn", "--chdir", "api", "app:app", \
"-k", "uvicorn.workers.UvicornWorker",\
"-b", "0.0.0.0:5002", \
"--timeout", "120"]
Difference in Dockerfile for MLOps
CMD ["python", "main.py"]
I have created a separate post documenting the API code included in the repository, which can be found under /src/api
The Drawbacks of Local Storage in Containers
As you may have noticed, I did not define where MLFlow stores artifacts (models), allowing it to use the default location.
So, what’s the challenge with such a local storage setup, especially when using MLFlow with Docker?
Contained Model Storage: One of the primary drawbacks is that the models end up being stored inside the Docker container. While Docker does provide the flexibility to store files on the host server via mounted volumes, MLFlow seems to bypass this feature here. As a result, the mount point that would normally allow such external storage goes unused, making the model store accessible exclusively from within the container.
For my personal tests I successfully used AWS storage. However, my project did not allow for external storage, hence the models are stored inside the container.
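If you want to verify where a run's artifacts actually end up, one quick check (assuming the MLflow Python client) is to ask the tracking server for the active run's artifact URI:

import mlflow

mlflow.set_tracking_uri("http://server_mlflow:5001")

with mlflow.start_run():
    # Unless an external artifact store is configured, this URI resolves to
    # storage that lives on the MLFlow server side, i.e. inside its container
    print("artifact_uri:", mlflow.get_artifact_uri())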
In Retrospect
This project was my first in which I combined MLFlow and Docker. Prior to this, my experiences were rooted in more straightforward projects, with my training primarily centered around basic API and model-serving frameworks. As I continue to evolve and learn, I may revisit and update this post, sharing refinements and enhancements to the solution.