In the world of Machine Learning, crafting the perfect solution is only half the battle. The true challenges often emerge during deployment and scaling. My recent project led me to MLFlow—a free & powerful tool designed to optimise the MLOps process.
Add to that Docker: a platform that runs applications in isolated ‘containers’, which was a prerequisite of my project.
Join me as I walk you through the architecture I’ve proposed, for better or worse 🙂
System Architecture
The system comprises three major components. Here’s a quick breakdown:
- MLFlow Server: A centralised model repository; this server ensures that ML models are stored, versioned, and easily accessible to the other components [named: server_mlflow]
- API: Powered by FastAPI, this component handles model serving. It allows manually reloading the model, fetches fresh training data from the database, and acts as a bridge between our frontend and backend operations [named: api]
- Training Container: As the name suggests, this container is responsible for training the ML model. Once the training step is completed, the model is uploaded to the MLFlow server [named: mlops]
Setting Things Up
Before delving into the actual deployment, there are a couple of prerequisites:
- Ensure you have docker and docker-compose installed
- The Docker server needs to be up and running before initialising the containers
The code can be found in the GitHub repo
Conda configs
- The ./docker/ directory holds the essential Dockerfiles and a docker-compose.yml for container orchestration
- Configurations for the virtual environments used inside the containers are found in ./docker/conda-cfg
- Every container uses its own config file: each environment-<NAME>.yml file has a corresponding conda-lock-<NAME>.yml file. The run_all.sh script creates these lock files, and Docker copies them into the containers during initialisation
- Unless your environment changes, these configurations remain maintenance-free
Installation of Docker Containers
- Preparing the Environment: After pulling the code from the repository, navigate to the ./docker directory. This is where the magic begins
- Deploying the MLflow Server & API Containers: From the ./docker directory, use the command docker compose up --build to build the primary containers. Once the containers are up and active, you can connect to the MLflow server at http://<IP>:5001 and the API at http://<IP>:5002. A successful connection to the MLflow UI or a message stating “Welcome to the API service!” from the API server confirms their operational status.
- Launching the MLOps Container: This container is at the heart of model training. Running the command docker compose up mlops --build from the ./docker directory will kickstart the process. The code does not contain any files to initiate the training process, as I kept it simple to showcase the Docker + MLFlow + Conda solution
# From within ./docker directory
docker compose up --build
docker compose up mlops --build
The docker-compose.yml with the declaration of the three containers:
version: "3.9"
services:
  server_mlflow:
    platform: linux/amd64
    build:
      context: ..
      dockerfile: docker/Dockerfile.server
    container_name: server_mlflow
    ports:
      - "5001:5001"
    volumes:
      # - ../mlartifacts:/mlartifacts
      - ../db:/db
      - ../src:/src
    networks:
      - A
  api:
    platform: linux/amd64
    depends_on:
      - server_mlflow
    build:
      context: ..
      dockerfile: docker/Dockerfile.api
    container_name: api
    ports:
      - "5002:5002"
    volumes:
      - ../src/api:/src/api
    networks:
      - A
  mlops:
    platform: linux/amd64
    depends_on:
      - server_mlflow
    build:
      context: ..
      dockerfile: docker/Dockerfile.mlops
    profiles: ['mlops']
    container_name: mlops
    volumes:
      - ../src/ml:/src/ml
    networks:
      - A
networks:
  A:
    driver: bridge
Container [api] depends on container [server_mlflow], so they start together. Container [mlops] runs occasionally and on demand, so we start it separately by defining a separate profile with profiles: ['mlops'].
Docker config files – Dockerfile for MLFlow server
FROM continuumio/miniconda3:23.3.1-0
RUN echo "**** Server-Repo ****"
## Install jemalloc
#RUN apt-get update && apt-get install -y libjemalloc-dev
## Set LD_PRELOAD to use jemalloc
#ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so
ENV TZ=Europe/Warsaw
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
RUN apt-get update && apt-get -y upgrade
#RUN pip install --upgrade pip
RUN pip install conda-lock
COPY docker/conda-cfg/conda-lock-server.yml /tmp/conda-lock.yml
RUN conda-lock install --name c_env /tmp/conda-lock.yml && \
conda clean -afy && \
echo "source activate c_env" >> ~/.bashrc
SHELL ["/bin/bash", "--login", "-c"]
RUN find /opt/conda/ -follow -type f -name '*.a' -delete && \
find /opt/conda/ -follow -type f -name '*.pyc' -delete && \
find /opt/conda/ -follow -type f -name '*.js.map' -delete
WORKDIR /src
ENV PATH /opt/conda/envs/c_env/bin:$PATH
ENTRYPOINT ["mlflow", "server", \
"--backend-store-uri", "sqlite:////db/mlruns.db", \
"-p", "5001", "--host", "0.0.0.0"]
The most important part is the ENTRYPOINT: each container has a different command to start its service. Also, a different conda-lock-<NAME>.yml file is copied in to initialise each container.
The --backend-store-uri option declares the location of the database holding experiments and stats.
When the API is started inside a container and calls MLFlow, which also runs inside a container, we address the MLFlow server by its container name: server_uri = 'http://server_mlflow:5001'.
If, however, you start the MLFlow server inside Docker but run the API from the command line on the host, the MLFlow server has to be addressed by IP rather than by container name.
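As a minimal sketch of the two cases (assuming an MLflow 2.x Python client; the host IP is a placeholder):

import mlflow
from mlflow.tracking import MlflowClient

# Inside the api container: reach the server by its container name
mlflow.set_tracking_uri("http://server_mlflow:5001")

# From the host command line instead: reach the server by host IP (placeholder)
# mlflow.set_tracking_uri("http://<IP>:5001")

# Quick connectivity check: list the experiments known to the tracking server
client = MlflowClient()
print(client.search_experiments())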
Difference in Dockerfile for API
ENTRYPOINT ["gunicorn", "--chdir", "api", "app:app", \
"-k", "uvicorn.workers.UvicornWorker",\
"-b", "0.0.0.0:5002", \
"--timeout", "120"]
Difference in Dockerfile for MLOps
CMD ["python", "main.py"]
I have created a separate post documenting the API code included in the repository, which can be found under /src/api
The Drawbacks of Local Storage in Containers
As you may have noticed, I did not define where MLFlow stores artifacts (models), allowing it to use the default location.
So, what’s the challenge with such a local storage setup, especially when using MLFlow with Docker?
Contained Model Storage: One of the primary drawbacks is that the models end up being stored inside the Docker container. While Docker does provide the flexibility to store files on the host server via mounted volumes, MLFlow seems to bypass this feature here. As a result, the mount point that would normally allow such external storage goes unused, making the model store accessible exclusively from within the container.
For my personal tests I successfully used AWS storage. However, my project did not allow for external storage, hence the models are stored inside the container.
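If you want to verify where a run's artifacts actually end up, one quick check (assuming the MLflow Python client) is to ask the tracking server for the active run's artifact URI:

import mlflow

mlflow.set_tracking_uri("http://server_mlflow:5001")

with mlflow.start_run():
    # Unless an external artifact store is configured, this URI resolves to
    # storage that lives on the MLFlow server side, i.e. inside its container
    print("artifact_uri:", mlflow.get_artifact_uri())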
In Retrospect
This project was my first in which I combined MLFlow and Docker. Prior to this, my experiences were rooted in more straightforward projects, with my training primarily centered around basic API and model-serving frameworks. As I continue to evolve and learn, I may revisit and update this post, sharing refinements and enhancements to the solution.