Mixture Density Networks (MDNs) are a type of neural network suited to data with a lot of noise, or to problems where the relationship between features and labels is one-to-many (a single input can correspond to several plausible outputs). A traditional regression network trained with mean squared error produces a single point estimate, which approximates the mean of the conditional distribution P(Y|X). An MDN instead predicts the parameters of a Gaussian mixture model representing that distribution: a mixing weight, a mean, and a standard deviation for each component. This allows MDNs to handle cases where the conditional distribution is multimodal, where the mean can fall between the modes in a region of low probability. In this example, we will explore how to build and train an MDN from scratch using the Flux and Distributions packages.
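To make the idea concrete before turning to Flux, here is a minimal, language-agnostic sketch in Python/NumPy (not Flux or Distributions code). It assumes a scalar target, a network whose last layer emits 3k raw values per sample for k mixture components, and a column layout of (mixing logits, means, log standard deviations); `mdn_params` and `mdn_nll` are hypothetical helper names. The first function constrains the raw outputs into valid mixture parameters, and the second computes the mixture negative log-likelihood that an MDN minimizes during training.

```python
import numpy as np

def mdn_params(raw, k):
    """Split raw outputs of shape (batch, 3k) into mixture parameters.

    Assumed column layout (hypothetical): k mixing logits, then k means,
    then k log standard deviations.
    """
    logits, mu, log_sigma = raw[:, :k], raw[:, k:2 * k], raw[:, 2 * k:3 * k]
    # Softmax, shifted by the row max for numerical stability, so the
    # mixing weights are positive and sum to 1 for every sample.
    shifted = logits - logits.max(axis=1, keepdims=True)
    pi = np.exp(shifted)
    pi /= pi.sum(axis=1, keepdims=True)
    # Exponentiate so every standard deviation is strictly positive.
    sigma = np.exp(log_sigma)
    return pi, mu, sigma

def mdn_nll(pi, mu, sigma, y):
    """Mean negative log-likelihood of scalar targets y (shape (batch,))."""
    y = y[:, None]  # broadcast each target against the k components
    log_normal = (-0.5 * ((y - mu) / sigma) ** 2
                  - np.log(sigma) - 0.5 * np.log(2 * np.pi))
    log_terms = np.log(pi) + log_normal
    # log-sum-exp over components: log sum_j pi_j * N(y; mu_j, sigma_j)
    m = log_terms.max(axis=1, keepdims=True)
    log_lik = m[:, 0] + np.log(np.exp(log_terms - m).sum(axis=1))
    return -log_lik.mean()

# Toy one-to-many data: the same input maps to -1 or +1 equally often.
y = np.array([-1.0, 1.0, -1.0, 1.0])
point_estimate = y.mean()  # 0.0, the MSE-optimal prediction, between both modes

# Two components centred on the modes (weights 0.5/0.5, sigma 0.1).
raw_bi = np.tile([0.0, 0.0, -1.0, 1.0, np.log(0.1), np.log(0.1)], (4, 1))
nll_bi = mdn_nll(*mdn_params(raw_bi, k=2), y)

# A single Gaussian at the mean (mu 0, sigma 1) for comparison.
raw_single = np.tile([0.0, 0.0, 0.0], (4, 1))
nll_single = mdn_nll(*mdn_params(raw_single, k=1), y)
```

On this toy data the two-component mixture assigns a much lower negative log-likelihood (about -0.69) than a single Gaussian centred on the mean (about 1.42), which is the behaviour that motivates MDNs. A Flux version would need the same two pieces, a constrained output head and a mixture likelihood used as the loss, though the exact layer and loss code there may differ from this sketch.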