How could machine learning restore the original quality of compressed audio?

There’s a strong connection between machine learning and compression. It’s most obvious when you examine autoencoders: deep neural networks whose input and output layers are the same size, but whose hidden layers shrink as you approach the center layer, called the bottleneck. The network is trained so that the output matches the input as closely as possible, despite all of the data being forced through the narrow bottleneck layer.
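To make the "hourglass" shape concrete, here's a minimal sketch of an (untrained) autoencoder forward pass in plain numpy. The layer sizes are illustrative, not from any real model, and the weights are random; the point is only that the data is squeezed through the narrowest layer and comes out the same size it went in.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative layer widths: same size at input and output,
# smallest at the bottleneck in the middle (all values hypothetical).
layer_sizes = [256, 128, 32, 128, 256]

# Random (untrained) weight matrices between adjacent layers.
weights = [rng.standard_normal((m, n)) * 0.1
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x):
    """Pass x through every layer; also return the bottleneck activations."""
    activations = x
    bottleneck = None
    for i, w in enumerate(weights):
        activations = np.tanh(activations @ w)
        if layer_sizes[i + 1] == min(layer_sizes):  # the narrowest layer
            bottleneck = activations
    return activations, bottleneck

x = rng.standard_normal(256)
out, code = forward(x)
print(out.shape, code.shape)  # output matches input size; code is smaller
```

Training would then adjust those weight matrices so `out` reproduces `x`, which is exactly what forces the bottleneck to carry a compact summary of the input.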

If you were to read off the values of the nodes at the bottleneck layer, you’d have what’s called an “embedding” of the original data. That term refers to the reduced dimensionality of the data: 44,100 samples of audio (one second of CD-quality sound) is a 44,100-dimensional vector, so if the bottleneck layer of an autoencoding network has only 4,000 nodes, the embedding would be a 4,000-dimensional vector, corresponding to a compression ratio of about 11:1. If such an autoencoder were successfully trained to reproduce the input with acceptable accuracy, it would represent an 11:1 compression algorithm.
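The arithmetic behind that ratio is just the input dimension divided by the bottleneck width (the 4,000-node bottleneck is the hypothetical one from the paragraph above):

```python
# One second of CD-quality mono audio at 44.1 kHz is 44,100 samples.
samples = 44_100
bottleneck_width = 4_000  # hypothetical bottleneck size from the text

ratio = samples / bottleneck_width
print(f"compression ratio ~= {ratio:.2f}:1")  # ~= 11.03:1
```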

In short, compression and decompression are tasks that neural networks can be very good at. Because of this, they can also do well at the related task of cleaning up poorly-compressed audio. You’d do this with a related model, the denoising autoencoder.

To train one of those, rather than using the clean audio as input and the same clean audio as the target, you would pass audio with compression artifacts as input, but still use the clean audio as the target. It couldn’t perform magic—information that is lost is lost, and the laws of information theory still apply—but you could expect the resulting model to make educated guesses at how to fill in any missing information required to approximate high-quality audio.
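Here's a toy sketch of that training setup in numpy, under some loud assumptions: the "clean" audio is synthetic sinusoids, the "compression artifacts" are simulated with additive noise (a real pipeline would run the clean audio through an actual lossy codec), and the model is a small linear autoencoder trained by hand-written gradient descent rather than a deep network. The key detail to notice is in the loss: the noisy audio is the input, but the clean audio is the target.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "clean" signals: sinusoids of dimension d (all sizes illustrative).
d, n, k = 64, 200, 16  # signal length, number of examples, bottleneck width
t = np.linspace(0, 1, d)
clean = np.stack([np.sin(2 * np.pi * f * t) for f in rng.uniform(1, 5, n)])

# Stand-in for compression artifacts: additive noise. A real experiment
# would generate this by encoding/decoding with a lossy codec instead.
noisy = clean + 0.3 * rng.standard_normal(clean.shape)

# A linear denoising autoencoder: encode d -> k, decode k -> d.
W1 = rng.standard_normal((d, k)) * 0.1
W2 = rng.standard_normal((k, d)) * 0.1

def loss(W1, W2):
    recon = noisy @ W1 @ W2               # input is the NOISY audio...
    return np.mean((recon - clean) ** 2)  # ...but the target is the CLEAN audio

lr = 0.1
loss_before = loss(W1, W2)
for _ in range(300):
    h = noisy @ W1                        # bottleneck activations
    err = 2 * (h @ W2 - clean) / clean.size
    W1 -= lr * (noisy.T @ (err @ W2.T))   # gradient w.r.t. encoder weights
    W2 -= lr * (h.T @ err)                # gradient w.r.t. decoder weights
loss_after = loss(W1, W2)
print(f"reconstruction loss: {loss_before:.4f} -> {loss_after:.4f}")
```

The same input/target asymmetry carries over unchanged to a deep network trained on real codec output; only the model and the data get bigger.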

Most of the visual aids I could find online demonstrating the effectiveness and usage of denoising autoencoders do so with images (such as this video), but check that out anyway. The technique extends to audio as well. :)