Compulsory courses, M2 First semester, courses
Computational science
Statistical learning and modeling of complex systems
Summary of the course:
Complex systems with a large number of degrees of freedom can display a surprisingly rich set of metastable structures, interconnected by time evolution over a hierarchy of timescales. Despite the variety of transformation phenomena observed in physics, chemistry, and biology (including phase transitions, reactions, and protein folding), statistical mechanics and machine learning now provide very general frameworks to study them all. This course introduces computational methods for the analysis, modeling, and simulation of complex systems. At the same time, modern neural networks are themselves complex systems made of many interacting degrees of freedom, exhibiting collective phenomena and phase transitions in how they learn, represent, and generate data. The same statistical-mechanics tools used to describe physical and biological systems help illuminate the inner workings of these models.
The first part focuses on statistical inference and machine learning, with emphasis on Bayesian inference, information theory, high-dimensional statistics, maximum-entropy models, energy-based models, regression, representation learning, autoencoders, variational autoencoders, diffusion models, and transformers. Several of these models, notably energy-based models such as Boltzmann machines, are inspired by models from statistical mechanics, and the course examines the learning process itself through the lens of phase transitions, as in the retarded learning transition in PCA. The lectures are complemented by practical sessions in Python, including applications to generative modeling and dimensionality reduction.
The second part of the course develops simulation algorithms to generate atomic trajectories of complex systems, together with data-driven approaches to learn stochastic models from these trajectories. Topics include molecular dynamics and Monte Carlo simulations, kinetic theories, order parameters, free-energy landscapes, Langevin models, and master equations. Practical computer sessions will allow students to apply the theoretical methods to realistic trajectories of systems formed by thousands of atoms, investigating phenomena such as protein folding and crystal nucleation.
| Jorge Fernandez-de-Cossio-Diaz | Fabio Pietrucci |
|---|---|
| IPhT, CEA | Sorbonne Université |

First part: Statistical inference and machine learning
Lectures:
• Information theory: Entropy, KL divergence, Mutual information
• High-dimensional statistics, asymptotic inference. Retarded learning phase transition in PCA.
• Maximum-entropy principle. Energy-based models: Boltzmann machines.
• Restricted Boltzmann machines. Representation learning. Relation between RBMs, Hopfield models, and PCA.
• Linear regression, regularization, bias-variance decomposition.
• Autoencoders. How autoencoders implement PCA. Variational autoencoders.
• Modern generative architectures: diffusion models and transformers.
Practical sessions:
• Intro to PyTorch, Autoencoders, Variational Autoencoders, Ensemble methods, Dimensionality reduction, Clustering, Classifiers.
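The information-theoretic quantities named in the first lecture topic (entropy, KL divergence, mutual information) have compact definitions for discrete distributions. The following is a minimal sketch in plain Python, not taken from the course material; the function names and the toy distributions are illustrative assumptions.

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum_i p_i ln p_i, in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    """KL divergence D(p||q) = sum_i p_i ln(p_i / q_i), in nats.
    Assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_information(joint):
    """I(X;Y) = D(p(x,y) || p(x) p(y)) for a joint table joint[x][y]."""
    px = [sum(row) for row in joint]        # marginal of X
    py = [sum(col) for col in zip(*joint)]  # marginal of Y
    return sum(
        pxy * math.log(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )

# Hypothetical toy distributions, chosen only for illustration.
fair = [0.5, 0.5]
biased = [0.9, 0.1]
independent = [[0.25, 0.25], [0.25, 0.25]]  # X and Y independent
correlated = [[0.5, 0.0], [0.0, 0.5]]       # Y is a copy of X

print("H(fair)           =", entropy(fair))
print("D(biased || fair) =", kl_divergence(biased, fair))
print("I(independent)    =", mutual_information(independent))
print("I(correlated)     =", mutual_information(correlated))
```

For the two limiting cases, the mutual information should vanish for the independent table and equal the entropy of one variable (ln 2 nats) for the perfectly correlated one.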
Second part: Simulation methods and dynamical models
Lectures:
• Trajectories of complex systems (biomolecules, nanostructure complexes, solutions): local equilibrium, hierarchy of timescales, barriers.
• Simulation algorithms: molecular dynamics, Metropolis Monte Carlo, enhanced sampling.
• Projection on order parameters: metastability, relaxation as entropy production, correlation functions, free-energy landscapes, kinetic rate theories.
• Analytical forms and intuitive meaning of order parameters in different fields.
• Accuracy of Langevin models of projected trajectories: the meaning of friction and noise, Markovian and overdamped approximations, reading noise in the original trajectory.
• Parametrization of Langevin models: Kramers-Moyal vs. likelihood maximization; which type of data, and how much, do we need?
• Machine-learning the optimal order parameter: committor function, kinetic variational principle, entropic variational principle.
• Parametrization of master equations: partitioning configurations via binning or clustering, meaning of the main eigenvalues and eigenvectors of the transition matrix, network of kinetic rates connecting metastable states.
• Introduction to a set of articles for the bibliographic project.
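To make the link between Metropolis Monte Carlo and free-energy landscapes in the lecture topics above concrete, the sketch below samples a one-dimensional double-well potential and estimates the free-energy profile F(x) = -k_B T ln P(x) from a histogram. It is an illustrative toy, not course code: the potential, the step size, the bin width, and all function names are assumptions.

```python
import math
import random
from collections import Counter

def potential(x):
    """Toy double-well potential U(x) = (x^2 - 1)^2, minima at x = -1 and +1."""
    return (x * x - 1.0) ** 2

def metropolis(n_steps, beta=1.0, step=0.5, x0=1.0, seed=0):
    """Metropolis sampling of exp(-beta * U(x)) with uniform trial moves."""
    rng = random.Random(seed)
    x, u = x0, potential(x0)
    samples, accepted = [], 0
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)
        u_new = potential(x_new)
        # Accept with probability min(1, exp(-beta * (U_new - U_old))).
        if u_new <= u or rng.random() < math.exp(-beta * (u_new - u)):
            x, u = x_new, u_new
            accepted += 1
        samples.append(x)  # a rejected move repeats the current state
    return samples, accepted / n_steps

def free_energy_profile(samples, beta=1.0, width=0.2):
    """Histogram estimate of F = -(1/beta) ln P, shifted so that min F = 0.
    Returns {bin index i: F}, where bin i is centered at x = i * width."""
    counts = Counter(round(x / width) for x in samples)
    f = {i: -math.log(c) / beta for i, c in counts.items()}
    f_min = min(f.values())
    return {i: f[i] - f_min for i in sorted(f)}

samples, acceptance = metropolis(400_000)
profile = free_energy_profile(samples)
# Bins 5 and -5 are centered on the minima x = +1 and -1; bin 0 on the barrier top.
barrier = profile[0] - 0.5 * (profile[5] + profile[-5])
print(f"acceptance = {acceptance:.2f}, barrier = {barrier:.2f} k_B T")
```

With beta = 1 the potential-energy barrier U(0) - U(1) equals 1 k_B T, so the estimated free-energy barrier should come out close to 1, up to statistical and binning errors.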
Practical sessions:
• Association and dissociation of complexes in solution, dynamics of homopolymer chains, folding of the Trp-cage protein, crystal nucleation from the liquid, chemical reactions in solution.
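The parametrization of a Langevin model from a trajectory, discussed in the lectures above, can be illustrated on synthetic data where the answer is known. The sketch below generates an overdamped Langevin trajectory in a harmonic well with the Euler-Maruyama scheme, then recovers the drift and the diffusion coefficient from the first two Kramers-Moyal moments of the increments. The harmonic potential, the linear-drift fit, and all names and parameter values are assumptions made for the example; real projected trajectories generally require position-dependent estimates.

```python
import math
import random

def simulate_overdamped(n_steps, dt, k=1.0, D=1.0, x0=0.0, seed=0):
    """Euler-Maruyama integration of dx = -D*k*x dt + sqrt(2 D dt) * xi,
    i.e. overdamped Langevin dynamics in U(x) = k x^2 / 2 with k_B T = 1."""
    rng = random.Random(seed)
    x = x0
    traj = [x]
    noise_amp = math.sqrt(2.0 * D * dt)
    for _ in range(n_steps):
        x += -D * k * x * dt + noise_amp * rng.gauss(0.0, 1.0)
        traj.append(x)
    return traj

def kramers_moyal_linear(traj, dt):
    """Estimate a linear drift a(x) = slope * x and a constant diffusion
    coefficient D from the first two moments of the increments."""
    xs = traj[:-1]
    dxs = [x1 - x0 for x0, x1 in zip(traj[:-1], traj[1:])]
    # First moment: least-squares fit of dx/dt against x, through the origin.
    slope = sum(x * dx for x, dx in zip(xs, dxs)) / (dt * sum(x * x for x in xs))
    # Second moment: <dx^2> / (2 dt); the drift adds a small O(dt) bias.
    D_est = sum(dx * dx for dx in dxs) / (2.0 * dt * len(dxs))
    return slope, D_est

dt = 0.01
traj = simulate_overdamped(n_steps=200_000, dt=dt)
slope, D_est = kramers_moyal_linear(traj, dt)
print(f"drift slope = {slope:.3f} (input -1.0), D = {D_est:.3f} (input 1.0)")
```

The second moment converges much faster than the first, which hints at why the amount of data needed is itself a question when parametrizing such models.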