Introduction for Non-Experts


On this page, you can find my understanding of the role applied mathematics plays in the sciences as well as some of the specific projects I have worked on. I have tried to keep jargon and mathematical prerequisites to a minimum, but in some paragraphs it may be helpful to remember a little high school calculus.

The general idea

By training, I am an applied analyst, which means that I use the tools of calculus (integrals, derivatives, limits) on problems from other scientific fields. Sometimes, probability theory features, too. In my case, most projects have been motivated by machine learning, materials science, or biology. I also have an interest in applications in physics, engineering, chemistry, and the social sciences.

What can applied analysis do?

Applied mathematicians work on problems similar to those of scientists in other fields, but usually from a different perspective. Mathematicians are trained in numerical methods and precise argumentation. They are not usually experts in the topic under investigation, but they can apply an arsenal of abstract techniques in many different situations. Often, their training includes some background in science, computer science, or economics.

A mathematician's training can be helpful in the sciences in several ways:

  • Coming up with new mathematical models. A mathematical model is useful for predicting outcomes, understanding the effect of individual variables, and setting up simulations when experiments are hard to do or expensive. Without a mathematical model, quantitative predictions are impossible. A good model can influence design decisions, for example for artificial neural networks in computer science or for virtually anything engineers test in simulations.

  • Simplifying models. Models with many parameters are complicated. Sometimes, when a parameter is very large or very small, it may be possible to eliminate it from the original model, or it turns out that a certain variable does not have a major impact on a process. For example, we may treat a thin sheet as two-dimensional or describe a fluid by a few variables (velocity, pressure, viscosity) rather than by the positions of all its molecules.

  • Understanding the properties of mathematical models. Many situations are described by different models - on different scales (quantum, molecular, continuum), at different temperatures, or simply because different scientists came up with different models. It can be hard to decide which model to use, and their predictions may vary. To decide which model is most suitable, a solid mathematical understanding of the model and the effects it incorporates may be crucial. Unlike a simulation, a mathematical proof cannot be argued with.

  • Understanding how to use mathematical models. Complex mathematical models can be hard to use in practice. Choosing efficient and stable methods of computation, so that the result of a simulation can be trusted, is not easy - overly optimistic choices lead to the breakdown of the program (bad) or return inaccurate results without warning (worse), while overly conservative choices are computationally expensive or even infeasible (a small example after this list illustrates the stability issue). Clever algorithms can improve performance drastically without any loss of accuracy, and knowing how large computational errors may be is necessary to interpret results.

  • Applying mathematical models. Most applied mathematical research focuses on developing new tools, understanding techniques and objects more deeply, or adapting tools to different situations. Outside of that immediate context, an applied mathematician may use known techniques in real-life applications to implement simulations, analyze data, or conduct statistical analyses.
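To illustrate the stability point concretely, here is a minimal sketch (in Python, with illustrative numbers, not an example from my own work): the simplest scheme for solving the equation y' = -a·y decays nicely for small time steps, but blows up if the step size is chosen too optimistically.

```python
# Explicit Euler scheme for y'(t) = -a*y(t); the exact solution
# y(t) = y0 * exp(-a*t) decays to zero. The scheme is stable only
# if the time step satisfies dt < 2/a.

def euler(a, y0, dt, steps):
    y = y0
    for _ in range(steps):
        y = y + dt * (-a * y)  # one explicit Euler step
    return y

a, y0 = 10.0, 1.0
print(euler(a, y0, dt=0.05, steps=40))  # dt < 2/a: decays towards 0
print(euler(a, y0, dt=0.30, steps=40))  # dt > 2/a: explodes to ~1e12
```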

An applied mathematician is a team player who takes inspiration from other sciences and returns useful results to researchers in many fields.

Why analysis?

Sir Isaac Newton developed calculus (independently of Leibniz) to describe and solve problems in science, specifically physics. Many natural phenomena are well described by balancing rates of change, which are expressed mathematically as derivatives. The classical example would be balancing acceleration (the time derivative of velocity) and velocity (the time derivative of position) with an external force in a suitable way. A (much) more complicated example is balancing the reaction rates of chemical compounds (the time derivative of concentration) and their diffusion rates in space (a sort of second spatial derivative).
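In symbols (standard textbook forms, not equations from my own papers), the two examples above read:

```latex
% Balancing acceleration and velocity against an external force, as in a
% damped mechanical system: x = position, m = mass, c = damping.
m\,\ddot{x}(t) + c\,\dot{x}(t) = F(t)

% Reaction-diffusion: the concentration u changes in time through chemical
% reactions (rate R) and diffusion in space (the Laplacian \Delta,
% a sort of second spatial derivative).
\partial_t u(x,t) = D\,\Delta u(x,t) + R\bigl(u(x,t)\bigr)
```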

In other applications, we want to find the minimum of a function (for example, the parameters that minimize an error function in machine learning). Finding minima is closely related to first and second derivatives, which is exploited by gradient descent-based algorithms.
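As a minimal sketch of the idea (in Python, with an illustrative one-dimensional function, not a formula from any of my papers): gradient descent repeatedly steps in the direction in which the function decreases, which is read off from the first derivative.

```python
# Gradient descent on f(x) = (x - 3)**2, which has its minimum at x = 3.
# The derivative f'(x) = 2*(x - 3) points "uphill", so we step against it.

def grad(x):
    return 2.0 * (x - 3.0)

x = 0.0    # initial guess
lr = 0.1   # learning rate (step size)
for _ in range(100):
    x = x - lr * grad(x)  # one gradient descent step

print(x)  # close to 3.0, the minimizer
```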

For me personally, analysis was one of my favorite undergraduate courses. It has many beautiful connections to almost all areas of mathematics and science.

Foundations of Deep Learning

Deep Learning refers to machine learning using deep neural networks. Artificial neural network models were proposed as a computational tool as early as the 1950s, but due to limitations in computing power and early setbacks with shallow neural networks, they did not receive much attention for a time, despite occasional bursts of activity. Breakthroughs in the 2010s, both in novel architectures and in computing on graphics cards for faster performance, revitalized the field.

Modern neural networks commonly have millions of parameters, and cutting-edge models in tasks like natural language processing can have billions (or even trillions) of parameters. Tuning these parameters, or 'training the network' in the parlance of deep learning, requires massive data sets and enormous computational resources.

In the current age of supply chain disruptions and increasing energy costs, understanding the foundations of deep learning is more important than ever. Only if we truly understand models can we improve them in order to reduce training times and reduce costs, both financial and environmental.

A graphical representation of some topics I am interested in and their connections can be found here.

Themes in my research

There have been a few distinct themes in my research. My current focus is on the foundations of deep learning and its use in data science.

  • Artificial neural networks. In classical applications of mathematics in the sciences, we build a mathematical model from physical principles. In the field of machine learning, on the other hand, we provide a computer with vast amounts of data and let it build its own model. Since the computer comes up with the model, this is sometimes referred to as 'artificial intelligence'. Machine learning is particularly useful when the objective is hard to capture precisely, for example recognizing whether a picture contains a bird. We may not be able to write down a precise formula for this, but given enough images, a computer may be able to recognize recurring patterns.

A typical challenge in machine learning is the high dimensionality of the data - for example a color image with 256x256 pixels has 3x256x256 = 196,608 dimensions, where we should think of dimension as the number of independent variables, not dimensions of physical space.
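The count is easy to check in code (a hypothetical numpy snippet, just to make the bookkeeping explicit):

```python
import numpy as np

# A color image: 256 x 256 pixels, each with 3 color channels (RGB).
image = np.zeros((256, 256, 3))

# Viewed as a vector of independent variables, its "dimension" is:
print(image.size)  # 196608 = 3 * 256 * 256
```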

As a starting point, we give the computer a function of a specific form (a 'model') in which it only optimizes ('trains') certain parameters. Artificial neural networks are a specific structure for building such a function, inspired by the architecture of the human brain. Neural networks have a layered structure, which can help decompose one complicated operation into several simpler steps (see the sketch below). Modern networks can have hundreds or even thousands of layers ('deep learning') and trillions of parameters. They have proven very successful in applications and deal particularly well with high dimensions, but there are many open questions about why they work so well, and how to make them work even better.
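Here is a minimal sketch of that layered structure (plain numpy with made-up toy sizes; real networks are vastly larger and more elaborate): each layer applies a simple affine map followed by an elementary nonlinearity, and composing many such layers yields a complicated function.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)  # a very simple, popular nonlinearity

# Toy layer sizes: input dimension 4, two hidden layers, one output.
# (For a flattened 256x256 color image, the input size would be 196608.)
sizes = [4, 8, 8, 1]
weights = [rng.normal(0.0, 0.5, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

def network(x):
    # Each hidden layer: an affine map (W @ x + b) followed by relu.
    for W, b in zip(weights[:-1], biases[:-1]):
        x = relu(W @ x + b)
    return weights[-1] @ x + biases[-1]  # linear output layer

x = rng.normal(size=4)  # a toy input
print(network(x))       # a single number, e.g. a score for 'contains a bird'
```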

In my research, I combine tools from mathematical analysis with simulations to investigate these questions further and to demonstrate effects in practice.

  • Lipid bilayers. Lipid bilayers are thin biological membranes. They are highly dynamic and best thought of as liquid, which means they can flow to accommodate different shapes. Due to this, their elastic behavior is unusual as there is no stretching, only bending.

  • Crystal dislocations. We think of crystals as highly ordered materials in which atoms sit at fixed places in a periodic lattice. Real crystals, on the other hand, have defects, which have an immense impact on their mechanical and electric properties. Defects occur naturally when a crystal forms, for example because the material it forms from is not pure, or because it starts solidifying at two places at once: the lattice looks like a nice grid around each place, but where the grids meet, they form an angle.

In particular, I am interested in a type of grid defect that lives along curves in three dimensions, so-called crystal dislocations. The motion of dislocations through a crystalline material is the main mechanism behind its plastic deformation: As we bend a material, it will at first snap back into its original position when we remove the force (elastic behavior). As we bend it more, it will eventually react differently and not return to the original shape (plastic behavior). In the elastic regime, the crystal grid stays intact, while in the plastic regime, we change which atoms are neighbors in the grid and distort the underlying material. This is most easily accomplished by shifting dislocations through the crystal.

  • Phase-field models. Some of my work has been on a specific type of mathematical model called phase-field models. Here, instead of looking at a lower-dimensional object (a curve or a surface in space), we give it a small positive thickness. Phase-field models are often more stable and easier to work with, but we pay for that convenience by having to run our computations in one dimension higher. However, in realistic models where bulk effects (i.e. processes inside a cell) are coupled to surface effects (exchange across and diffusion on a membrane), this is the case anyway.
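To give a taste of what such a model looks like (a standard Modica-Mortola-type energy, not necessarily the exact functional used in my work): instead of a sharp interface, one works with a function u that transitions smoothly from -1 to 1 across a thin layer whose width is of order ε.

```latex
% A prototypical phase-field energy with a small thickness parameter
% \varepsilon and a double-well potential W(u) = (1 - u^2)^2 / 4:
E_\varepsilon(u) = \int \frac{\varepsilon}{2}\,|\nabla u|^2
                 + \frac{1}{\varepsilon}\,W(u)\;\mathrm{d}x

% The optimal transition profile across the interface is
% u(x) = \tanh\bigl( d(x) / (\sqrt{2}\,\varepsilon) \bigr),
% where d(x) is the signed distance to the sharp interface.
```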

A few words on my publications

This is a selection of a few of my projects.

  • Neural networks are very expressive function models. In practice, this means that they usually have the capacity to memorize a data set rather than uncover a meaningful structure in it. In Part I and Part II of an article on Stochastic gradient descent with noise of machine learning type, I study training algorithms, analyzing how they find parameters that fit the training data, and which parameters they choose from all those that perform well on known data. This is crucial when we rely purely on the training procedure to find a neural network which understands the pattern in the data rather than memorizing a few known data samples. In Optimal bump functions for shallow ReLU networks: Weight decay, depth separation and the curse of dimensionality I show how neural networks interpolate between known data samples if they are given an explicit preference towards networks which have low complexity in a specific sense.

  • It has been understood for almost thirty years that neural networks with even one hidden layer can approximate basically any reasonable function very well. However, we show in Kolmogorov Width Decay and Poor Approximators in Machine Learning that some parameters must be huge to approximate even many fairly nice functions. When this is the case, we demonstrate in Can Shallow Neural Networks Beat the Curse of Dimensionality? that it can take a very long time to properly train these models in high dimension. However, in On the Convergence of Gradient Descent Training for Two-layer ReLU-networks in the Mean Field Regime I discuss a condition under which training eventually converges (even if slowly).

  • More generally, we show in Kolmogorov Width Decay and Poor Approximators in Machine Learning that machine learning models which can be learned using few data points (a desirable statistical property) are not able to solve some learning problems well in high dimension (i.e. there is a trade-off between the learnability and the expressiveness of a model). Even very deep neural networks are unable to overcome this restriction in high dimension.

  • At the moment, there is little theoretical justification for how to choose the hyperparameters (e.g. the number of layers and neurons per layer) of a neural network. In Representation formulas and pointwise properties for Barron functions we describe all functions which can be expressed by two-layer networks of infinite width, but with coefficient bounds which are favorable from the perspective of statistical learning theory. In particular, we give an easy-to-check criterion for recognizing that increasing the number of neurons in the hidden layer does not help and that a deeper network is required.

  • In A priori estimates for classification problems using neural networks, we show that we can guarantee certain error bounds for classification problems under some conditions (e.g. when deciding whether a 256x256 image contains a bird). As one would imagine: when a problem is easier to solve, the error is smaller. The analysis also tells us something about how to proceed in practice to achieve good performance.

To see how these articles fit into the larger framework of deep learning, see the following graphical representation.

A brief summary of my articles on deep learning. For co-authors and links, please see the linked articles above. The full-size image is available here.

Much of my work before researching artificial neural networks was motivated by geometric problems arising in materials science.

  • The main article of my PhD thesis Phase Field Models for Thin Elastic Structures with Topological Constraint concerns phase-field simulations for lipid bilayers. While such simulations are notoriously hard to perform if we treat the surface as two-dimensional, the phase-field approach has the drawback that one membrane can break up into multiple pieces. In this article, we develop a method which avoids this issue efficiently and cheaply. Some relevant mathematical theory for this purpose is developed in Uniform Regularity and Convergence of Phase Fields for Willmore's Energy and the implementation is described in Keeping it together.

  • In Approximation of the relaxed perimeter functional under a connectedness constraint by phase-fields we take the method we previously developed to keep lipid bilayers connected and apply it in a different situation: a problem in image segmentation. More precisely, we work on identifying an object in an image which may be partially obscured and appear to have two or more separate parts.

  • In Confined elasticae and the buckling of cylindrical shells I consider the following problem: Take two pipes, one inside the other. If the outer one contracts enough (because of temperature, or in a biological system), the inner one has two choices: to compress and remain tangent to the outer one, or to buckle away. I find a precise criterion in terms of material parameters for whether compression or buckling is favorable.

  • In The Effect of Forest Dislocations on the Evolution of a Phase-Field Model for Crystal Dislocations we show that if we take a certain engineering model for crystal dislocations (link), which looks at dislocations in a single plane of the crystal grid but accounts for dislocations penetrating that plane (e.g. orthogonally), we see evidence of plastic rather than elastic behavior. More precisely, the presence of the 'forest dislocations' makes it harder to increase slip, but does not make it easier to decrease it. Physically, this may be linked to a phenomenon known as 'cross hardening', where deforming a crystal in one direction (generating forest dislocations) makes it harder to deform it in the orthogonal direction.

  • In a more recent project, The motion of curved dislocations in three dimensions: Simplified linearized elasticity, we look at fully three-dimensional dislocations (in a simplified model) and attempt to understand their motion (in the absence of interactions with other dislocations). Essentially, we conclude that their limiting behavior should be such that they become shorter in the most efficient way.