What are artificial neural networks?

The class of artificial neural networks is a collection of parametrized functions. In the simplest case, such a function is given by

f(x) = (A^(L) o sigma o A^(L-1) o sigma o ... o sigma o A^(1))(x)

where x in R^d is the input and the A^(l) : R^(d_(l-1)) -> R^(d_l), for l = 1, ..., L, are affine linear maps. Here d_0 = d is the input dimension and d_L is the output dimension.

  • The parameters of the function are the entries of the affine linear maps A^(l). Writing A^(l)(z) = W^(l) z + b^(l), we refer to the entries of the matrix W^(l) as weights (the linear part) and to the entries of the vector b^(l) as biases (the affine shift).

  • The function sigma : R -> R is called the activation function of the neural network. By an abuse of notation, we also write sigma : R^k -> R^k for its componentwise application, sigma(z_1, ..., z_k) = ( sigma(z_1), ..., sigma(z_k) ).

  • The architecture of a neural network refers to the choice of network type (here, a standard feedforward network), the depth L, the widths (the intermediate dimensions d_l), and the activation function, i.e. everything that specifies the network other than the values of the weights and biases.
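
The definitions above can be made concrete with a small sketch in plain Python. It is only an illustration under stated assumptions: the helper names (init_params, affine, forward, count_params), the choice sigma = tanh, the Gaussian initialization of the weights, and the example widths are all hypothetical and not taken from the text. The list widths = [d_0, ..., d_L] plays the role of the architecture, while params holds the weights and biases.

```python
import math
import random


def init_params(widths, seed=0):
    """Draw weights and biases for layer widths [d_0, d_1, ..., d_L].

    Assumption: standard Gaussian weights and zero biases, chosen only
    so that the example has concrete numbers to work with.
    """
    rng = random.Random(seed)
    params = []
    for d_in, d_out in zip(widths[:-1], widths[1:]):
        W = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_out)]
        b = [0.0] * d_out
        params.append((W, b))
    return params


def affine(W, b, z):
    """Apply the affine linear map z -> W z + b."""
    return [sum(w * zj for w, zj in zip(row, z)) + bi for row, bi in zip(W, b)]


def forward(params, x, sigma=math.tanh):
    """Evaluate f(x) = (A^(L) o sigma o ... o sigma o A^(1))(x)."""
    z = list(x)
    for W, b in params[:-1]:
        # Affine map followed by the componentwise activation.
        z = [sigma(v) for v in affine(W, b, z)]
    W, b = params[-1]
    return affine(W, b, z)  # no activation after the last affine map


def count_params(widths):
    """Number of weights plus biases determined by the widths alone."""
    return sum(d_out * d_in + d_out
               for d_in, d_out in zip(widths[:-1], widths[1:]))


widths = [3, 4, 4, 2]  # d_0 = 3, d_1 = d_2 = 4, d_L = 2, so the depth is L = 3
params = init_params(widths)
y = forward(params, [0.5, -1.0, 2.0])
print(len(y), count_params(widths))  # prints "2 46"
```

Changing widths or sigma changes the architecture; changing the entries of params while keeping their shapes changes only the weights and biases. The parameter count 46 = (3*4 + 4) + (4*4 + 4) + (4*2 + 2) depends on the architecture alone.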