Daniel Cortild: “Do what you truly enjoy”

Daniel Cortild investigates why classic learning algorithms like Stochastic Gradient Descent truly work. His fresh theoretical insights help explain what practitioners have observed for years.

What makes your research relevant?
That’s a broad question, but I think there are two sides to it. On the one hand, learning algorithms such as Stochastic Gradient Descent (SGD) are everywhere now – they’re essential for modern machine learning. On the other hand, understanding why and when they work is still surprisingly incomplete.
My research contributes to that theoretical understanding. Many algorithms perform well in practice, but the mathematical guarantees behind them are often missing. If we can prove under which conditions they really converge or remain stable, we can identify which methods are reliable, and where we might still be missing something. So in a way, the theory helps guide the practice.

You felt there was a gap in existing research?
Yes. Most analyses of SGD make very strong assumptions about the variance of the stochastic gradients – assumptions that are almost never satisfied in real-world machine learning. So we wanted to see what happens if you remove those assumptions entirely. The gap was precisely there: between what works in theory under idealized conditions, and what actually happens in practice.

And you worked on this together with a PhD student?
Yes, exactly. During my Master’s I collaborated closely with a PhD student who was studying a similar problem. We worked side by side and exchanged ideas constantly. It was a very natural and productive collaboration.

What challenges did you encounter while trying to remove those assumptions?
It was extremely challenging. Many of the standard mathematical tools used to analyse SGD rely directly on those strong assumptions. Once you remove them, a lot of familiar inequalities and techniques simply don’t apply anymore.
The difficulty was figuring out which tools we still had access to, and how far we could push them. Eventually we realised that by using these remaining tools very carefully, we could still prove new, tight bounds for SGD. They’re not identical to the classical results, but they come surprisingly close — showing that meaningful theoretical guarantees are possible even without the usual assumptions.

How can your results be applied?
SGD itself is not new – it’s been around since the 1950s – but it remains the foundation of most modern machine learning. Our results don’t change how people implement SGD, but they explain why certain things happen when we use it. That deeper understanding can help in designing new, more complex algorithms inspired by SGD, for which theoretical guarantees are still lacking.
In addition, our analysis shows that the algorithm can actually work well with larger step sizes than previously thought, which could make training faster in practice. So it’s a small but concrete improvement.
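To make the role of the step size concrete: a minimal sketch of plain SGD on a toy least-squares problem, in Python. This is only an illustration of the algorithm being discussed, not the analysis from the interview; the problem, function names, and step size here are invented for the example.

```python
import random

def sgd(grad_sample, w0, step_size, n_steps, data, rng):
    """Plain SGD: at each step, estimate the gradient from one random sample."""
    w = w0
    for _ in range(n_steps):
        x, y = data[rng.randrange(len(data))]
        w = w - step_size * grad_sample(w, x, y)
    return w

# Toy problem: fit y = 2x by minimizing the squared error (w*x - y)^2.
# The gradient of a single sample's loss with respect to w is 2*(w*x - y)*x.
def grad(w, x, y):
    return 2 * (w * x - y) * x

rng = random.Random(0)
data = [(x, 2.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]
w_hat = sgd(grad, w0=0.0, step_size=0.05, n_steps=500, data=data, rng=rng)
# w_hat ends up close to the true slope 2.0
```

The step size controls the trade-off the interview alludes to: larger steps mean faster progress per iteration, but classical analyses only guarantee stability for conservative choices. Proving that larger step sizes are still safe is what can translate into faster training.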

Where does your fascination for mathematics and optimization come from?
I’ve always been drawn to mathematics because it allows you to build entire worlds from a few basic rules. You start with a handful of assumptions, and from there you explore what must logically follow — it’s a beautiful, self-contained system.
Optimization attracted me in particular because it sits between theory and application. It’s very abstract, but it directly connects to real problems — from electricity networks to machine learning. I like that balance: working on something that’s fundamentally mathematical, yet useful in practice.

Would you ever want to move toward more applied work?
Maybe later. For now I’m mainly focused on theory, but I do like the idea that the mathematics I work on has applications. Knowing that my results might one day improve real systems makes the theoretical work more meaningful.

What does the future hold?
My Master’s thesis has been turned into a research paper, which feels like closing a chapter. But the methods we developed can also be applied in other scenarios, so the collaboration with my co-author continues in some form.
At Oxford I’m now starting a PhD in optimization, focusing on the average-case complexity analysis of deterministic algorithms — in other words, understanding why certain algorithms perform better in practice than their theoretical guarantees predict. That’s something I’m really curious about.

And after Oxford?
I’m keeping an open mind. I’d love to stay close to research, whether in academia or in a research-oriented company. What matters to me is working on problems that are intellectually challenging and have a clear purpose. I’m less drawn to the start-up world — it’s usually more about rapid development than deep research — but who knows what the future brings.

Finally, do you have any advice for younger students?
Yes: do what you truly enjoy. Don’t choose your studies only for job prospects — there will always be jobs. What matters most is that you enjoy what you’re doing every day. I’ve followed that philosophy myself, and it has made my studies and research very rewarding.