The overall goal is to bring students up to speed on the most prominent approaches in the theory of deep learning. The course starts with background material on statistical learning theory. Next, it focuses on the theory of neural networks, specifically on their approximation, optimization, and generalization properties. Deep learning is characterized by phenomena, such as benign overfitting and double descent, that seem to defy traditional statistical theory; thus, one major goal of the course will be to provide a precise, mathematical characterization of such phenomena. Particular emphasis will be placed on gradient descent methods and their implicit bias, which will be analyzed in two regimes: the mean-field regime and the Neural Tangent Kernel (NTK) regime.
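
To give a flavor of the NTK regime mentioned above, here is its standard definition (a well-known formula, added for illustration and not part of the original listing): for a sufficiently wide network f(x; θ), training stays close to the first-order Taylor expansion around the initialization θ_0, so gradient descent effectively performs kernel regression with the Neural Tangent Kernel Θ:

\[
f(x;\theta) \approx f(x;\theta_0) + \nabla_\theta f(x;\theta_0)^\top (\theta - \theta_0),
\qquad
\Theta(x, x') = \big\langle \nabla_\theta f(x;\theta_0),\, \nabla_\theta f(x';\theta_0) \big\rangle .
\]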

Target group: Interns, PhD students of any year, anyone who is interested.

Prerequisites: Strong background in probability and linear algebra.

Evaluation: None

Teaching format: None

ECTS: 3

Year: 2024

Track segment(s): Elective

Teacher(s): Marco Mondelli

Teaching assistant(s): Simone Bombari