A generative model for raw audio.
• Autoregressive generation: each output sample is predicted from the samples generated so far.
• Built from a stack of convolutional layers with residual and skip connections.
• The network takes the raw audio waveform as input; after processing it, the output is the next waveform sample.
• The model is unconditioned, i.e. it is not fed the structure of the speech, so it does not generate meaningful audio.
• Experiment: trained on recordings of human speech, the output sounded human-like but was really just babbling and humming, with pauses at random times.
• After being fed a lot more data, it produced high-quality voice and sentences that make sense, but it is very expensive to train.
• Generation is also slow: it took 4 minutes to generate one second of audio.
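The architecture above can be sketched in a few lines of numpy. This is a toy illustration, not the actual model: the layer count, 2-tap dilated filters, tanh activations, and random untrained weights are all assumptions made for the example. It shows the two properties the notes mention: a convolutional stack with residual and skip connections, and an autoregressive loop that feeds each generated sample back in.

```python
import numpy as np

# Toy, untrained sketch -- hyperparameters are illustrative assumptions.
rng = np.random.default_rng(0)

N_LAYERS = 4
DILATIONS = [2 ** i for i in range(N_LAYERS)]                # 1, 2, 4, 8
W_FILT = [rng.normal(scale=0.5, size=2) for _ in DILATIONS]  # 2-tap filters
W_SKIP = rng.normal(scale=0.5, size=N_LAYERS)

def causal_conv(x, w, dilation):
    """Dilated causal conv: y[t] = w[0]*x[t-d] + w[1]*x[t] (zero-padded past).

    Output at time t never looks at future samples."""
    n = len(x)
    pad = np.concatenate([np.zeros(dilation), x])
    return w[0] * pad[:n] + w[1] * pad[dilation:dilation + n]

def forward(x):
    """Stack of conv layers with residual and skip connections.

    Returns one next-sample prediction per time step; position t depends
    only on x[:t+1], which is what makes autoregressive generation work."""
    skip_sum = np.zeros_like(x)
    h = x
    for w, d, ws in zip(W_FILT, DILATIONS, W_SKIP):
        out = np.tanh(causal_conv(h, w, d))
        skip_sum = skip_sum + ws * out   # skip connection to the output
        h = h + out                      # residual connection to next layer
    return skip_sum

def generate(n_samples):
    """Autoregressive sampling: feed each new sample back as input."""
    samples = [0.5]                      # arbitrary seed sample
    while len(samples) < n_samples:
        pred = forward(np.array(samples))[-1]   # re-run net for ONE sample
        samples.append(float(np.tanh(pred)))
    return np.array(samples)
```

Note how `generate` re-runs the entire network once per output sample; that one-sample-at-a-time loop is exactly why naive sampling is so slow (the 4 minutes per second of audio mentioned above).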