Reinforcement Learning for Optimizing Compute Clusters

Main Contributions

  • Theoretical generalization of Q-learning to multi-objective environments
  • Implementation of a proof-of-concept on synthetic examples
  • Dashboard visualization of performance over time

Abstract

Complex systems rely on many decision-making algorithms and hand-crafted heuristics, each exposing hyperparameters that must be tuned. Developers are often called upon to tune these hyperparameters by hand, using their best judgment. But systems drift over time and configurations become obsolete. Maintaining such systems is expensive, scales poorly, and calls for what is known as autonomic computing: the ability of systems to tune themselves without outside intervention.
In this thesis, we explore the use of reinforcement learning (RL) as an autonomic computing solution for complex systems. We use RL as a zeroth-order optimizer for systems abstracted as black boxes. Implementing RL agents is a two-step process: tuning the underlying model and training the agent on the task. When properly executed, the RL agents successfully enhance a wide range of heuristics and even create new ones from scratch. Moreover, we show that decision tree approximators can be used on top of Q-learning to handle large state spaces or to improve the agents' ability to generalize to unseen samples. Several adjustments were made to the usual RL training algorithms to accommodate decision tree ensembles in an online learning setting. The latter part of the thesis builds a faithful simulation of stream processing, an instance of a complex system. The simulation follows sound concurrency principles and decouples the environment from the agent through a message-passing library. Within the simulation, the pipelines are optimized by a series of agents, this time subject to a multi-objective reward function, with throughput and latency chosen as the competing metrics. Using this formulation, we introduce the multi-dimensional equivalent of Q-learning, namely ◦Q-learning. We then study the behavior of the agents in the dynamic environment and rank them by performance. The agents converge quickly, adapt to environment drift, and remain robust to outside disturbances.
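As a rough illustration of the approximation idea (not the thesis implementation), the sketch below pairs Q-learning with a decision tree ensemble that is periodically refit on a replay buffer, since trees cannot be updated incrementally; the environment interface, feature encoding, and hyperparameters are all assumptions.

```python
# Minimal sketch: Q-learning with a decision tree ensemble as the Q-function
# approximator, refit periodically on a replay buffer (fitted-Q style).
# Illustrative only; states, actions, and hyperparameters are assumed.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

class TreeQAgent:
    def __init__(self, n_actions, gamma=0.95, epsilon=0.1, refit_every=200):
        self.n_actions = n_actions
        self.gamma = gamma
        self.epsilon = epsilon
        self.refit_every = refit_every
        self.model = None        # tree ensemble approximating Q(s, a)
        self.buffer = []         # replay buffer of (s, a, r, s') transitions

    def _q_values(self, state):
        # Predict Q(s, a) for every action; zeros before the first fit.
        if self.model is None:
            return np.zeros(self.n_actions)
        X = np.array([np.append(state, a) for a in range(self.n_actions)])
        return self.model.predict(X)

    def act(self, state):
        # Epsilon-greedy action selection.
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.n_actions)
        return int(np.argmax(self._q_values(state)))

    def observe(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))
        # Trees cannot be updated online, so refit on the whole buffer periodically.
        if len(self.buffer) % self.refit_every == 0:
            self._refit()

    def _refit(self):
        X, y = [], []
        for s, a, r, s_next in self.buffer:
            target = r + self.gamma * np.max(self._q_values(s_next))
            X.append(np.append(s, a))
            y.append(target)
        self.model = ExtraTreesRegressor(n_estimators=50).fit(np.array(X), np.array(y))
```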

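For the multi-objective setting, the sketch below shows tabular Q-learning with a vector-valued reward (throughput and negated latency), where actions are ranked through a linear scalarization. The weights and update rule are generic assumptions for illustration, not the thesis's ◦Q-learning definition.

```python
# Minimal sketch of tabular Q-learning with a vector-valued reward
# (throughput, -latency), ranked by a linear scalarization. Generic
# multi-objective variant; action set and weights are assumed.
import numpy as np
from collections import defaultdict

N_ACTIONS = 4                    # e.g. scale up/down, rebalance, no-op (assumed)
WEIGHTS = np.array([1.0, 0.5])   # throughput/latency trade-off (assumed)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

# Q[state] holds one 2-dimensional value vector per action (one entry per objective).
Q = defaultdict(lambda: np.zeros((N_ACTIONS, 2)))

def act(state):
    # Epsilon-greedy over the scalarized action values.
    if np.random.rand() < EPSILON:
        return np.random.randint(N_ACTIONS)
    return int(np.argmax(Q[state] @ WEIGHTS))

def update(state, action, reward_vec, next_state):
    # reward_vec = (throughput, -latency); the TD target stays a vector,
    # and the bootstrap action is chosen with the same scalarization.
    best_next = int(np.argmax(Q[next_state] @ WEIGHTS))
    target = np.asarray(reward_vec) + GAMMA * Q[next_state][best_next]
    Q[state][action] += ALPHA * (target - Q[state][action])
```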