Designing a generic audio API

Digital audio is omnipresent in our lives, from your cellphone’s ringtone to streaming music on Spotify. The main audio formats, created in the early 90s, have stood the test of time and survived successive technology advancements. In the first part of this talk, we will look at the anatomy of these popular formats, why they were designed that way, and what makes them durable. Then we will talk about the challenges and lessons learned while designing and implementing a generic Go API for digital audio. This talk is designed for curious attendees of all levels. The focus on audio formats is mainly an excuse for us to reason about API design in an unfamiliar, unbiased context.

Matt Aimonetti

November 05, 2016

Transcript

  1. MUSIC MADE BETTER. DESIGNING A GENERIC AUDIO API. MATT AIMONETTI @mattetti
  2. Francesc Campoy Flores stuck in Chicago on Nov 3rd 2016

  3. CTO / CO-FOUNDER
  4. Music creation: Bedroom producers are the new rockstars. Remixing & collaboration is trending. And it’s not just electronic music.
  5. Audio consumption

  6. Designing an API, what’s a design? Design (Oxford dictionary): To form a plan to arrange or conceive in the mind... For later execution
  7. Audio Signal Basics

  8. Sampling continuous signal discrete signal

  9. Sampling: amount of red dots per sec = sample rate; resolution of each sample = bit depth
  10. Sampling: sampled the analog signal 44100 times in a second; each sample is on a scale from -32,768 through 32,767. CD resolution: 44,100 Hz, 16 bit
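To make the two numbers on this slide concrete, here is a minimal Go sketch that samples a sine wave 44,100 times per second and quantizes each value to a signed 16-bit integer. The function name `sampleSine` and the 440 Hz test tone are illustrative assumptions, not part of the talk's code.

```go
package main

import (
	"fmt"
	"math"
)

// sampleRate is the CD sample rate: samples taken per second.
const sampleRate = 44100

// sampleSine (hypothetical helper) samples a sine wave of the given
// frequency in Hz n times and quantizes each value to a signed
// 16-bit integer, the CD bit depth.
func sampleSine(freq float64, n int) []int16 {
	maxAmp := float64(math.MaxInt16) // 32767
	out := make([]int16, n)
	for i := range out {
		t := float64(i) / sampleRate // time of sample i, in seconds
		out[i] = int16(math.Round(maxAmp * math.Sin(2*math.Pi*freq*t)))
	}
	return out
}

func main() {
	samples := sampleSine(440, sampleRate) // one second of a 440 Hz tone
	fmt.Println("samples per second:", len(samples))
	fmt.Println("16-bit range:", math.MinInt16, "through", math.MaxInt16)
	fmt.Println("first samples:", samples[:4])
}
```

One second of audio at this resolution is therefore 44,100 integers per channel, each between -32,768 and 32,767.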
  11. Nyquist–Shannon sampling theorem Harry Nyquist Claude Shannon

  12. Sampling Humans can hear frequencies from 20Hz to 20kHz
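The link between the sampling theorem and the hearing range can be checked with a little arithmetic: a sample rate can only represent frequencies below half its value. The sketch below assumes that reading of the theorem; `nyquist` and `canCapture` are hypothetical helper names.

```go
package main

import "fmt"

// nyquist returns the upper frequency limit, in Hz, for a given
// sample rate: half the sample rate.
func nyquist(sampleRate float64) float64 {
	return sampleRate / 2
}

// canCapture reports whether a signal with frequencies up to maxFreq
// can be sampled at sampleRate without aliasing.
func canCapture(maxFreq, sampleRate float64) bool {
	return maxFreq < nyquist(sampleRate)
}

func main() {
	const hearingLimit = 20000.0 // upper bound of human hearing, in Hz
	for _, rate := range []float64{8000, 44100, 96000} {
		fmt.Printf("%6.0f Hz -> Nyquist %6.0f Hz, covers hearing range: %t\n",
			rate, nyquist(rate), canCapture(hearingLimit, rate))
	}
}
```

At 44,100 Hz the limit is 22,050 Hz, just above the 20 kHz ceiling of human hearing.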

  13. Digital storage (sound.wav): 162 (16 bit integer); bytes: A2 00; base 2: 10100010 00000000 (little endian)
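The byte layout on this slide can be reproduced with Go's standard `encoding/binary` package. This is a small sketch; `encodeSample` and `decodeSample` are hypothetical names chosen for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeSample returns the two little-endian bytes of a 16-bit sample.
func encodeSample(v int16) []byte {
	b := make([]byte, 2)
	binary.LittleEndian.PutUint16(b, uint16(v))
	return b
}

// decodeSample reads a 16-bit little-endian sample back from two bytes.
func decodeSample(b []byte) int16 {
	return int16(binary.LittleEndian.Uint16(b))
}

func main() {
	b := encodeSample(162)
	fmt.Printf("bytes:  % X\n", b) // A2 00
	fmt.Printf("base 2: %08b %08b\n", b[0], b[1])
	fmt.Println("decoded:", decodeSample(b))
}
```

The least significant byte (A2) comes first, which is what "little endian" means here.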
  14. audio file reader-writer

  15. Design is a process
    • Goal: consume audio stream
    • Desiderata:
        • unified API
        • fast / low memory usage
        • extensible
  16. Digital storage (sound.wav)

    xs := []int{162, 49, -23, 0, 99, 9241, 0, 0, 32400}

    // effect averages each sample with its right-hand neighbor, in place.
    func effect(xs []int) {
        for i := 0; i < len(xs)-1; i++ {
            xs[i] = (xs[i] + xs[i+1]) / 2
        }
    }
  17. WAV format (sound.wav): data properties chunk, file properties chunk, PCM data chunk
  18. NumChannels: audio frame = point-in-time value across all channels
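As a rough illustration of the frame concept, the sketch below groups a flat slice of samples into frames, one value per channel for each point in time. It assumes the samples are interleaved (as PCM data in a WAV file is); the `frames` helper is hypothetical.

```go
package main

import "fmt"

// frames (hypothetical helper) groups interleaved samples into frames:
// one value per channel for each point in time. Trailing samples that
// do not fill a complete frame are dropped.
func frames(samples []int, numChannels int) [][]int {
	if numChannels <= 0 {
		return nil
	}
	n := len(samples) / numChannels
	out := make([][]int, n)
	for i := range out {
		out[i] = samples[i*numChannels : (i+1)*numChannels]
	}
	return out
}

func main() {
	// Stereo data, interleaved as L0 R0 L1 R1 L2 R2.
	samples := []int{10, -10, 20, -20, 30, -30}
	for i, f := range frames(samples, 2) {
		fmt.Printf("frame %d: left=%d right=%d\n", i, f[0], f[1])
	}
}
```

With one channel, every sample is its own frame; with two, six samples make only three frames.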
  19. NumChannels

  20. Exemplars ex·em·plar /iɡˈzemplər,iɡˈzemplär/ noun a person or thing serving as

    a typical example or excellent model. "he became the leading exemplar of conservative philosophy" Few designs are all-new • even novel designs derive from earlier artifacts intended for similar purposes • The besetting of expert designers is not designing the thing wrong, but designing the wrong thing. - Fred Brooks
  21. Python standard library:

    import wave

    waveFile = wave.open('sound.wav', 'rb')
    nChannels = waveFile.getnchannels()
    length = waveFile.getnframes()
    for i in range(0, length):
        waveBytes = waveFile.readframes(1)
        # I need to hand decode the values
    waveFile.close()
  22. Python SciPy:

    import scipy.io.wavfile

    rate, data = scipy.io.wavfile.read('sound.wav')
    for i in range(len(data)):
        print(data[i][0])
    # how about other channels
  23. data formats ☠: 8, 16, 24, 32, 64 bit integers (signed or not), or 32/64 bit floats, or bytes
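To give a feel for why this variety is painful, here is a small sketch decoding two of the listed formats from raw bytes. It assumes WAV conventions (8-bit samples unsigned and centered on 128, 24-bit samples signed little endian); `decode8` and `decode24` are hypothetical names.

```go
package main

import "fmt"

// decode8 converts an unsigned 8-bit sample (centered on 128) to a
// signed value in the range -128..127.
func decode8(b byte) int32 {
	return int32(b) - 128
}

// decode24 decodes a signed 24-bit little-endian sample from 3 bytes.
func decode24(b []byte) int32 {
	v := int32(b[0]) | int32(b[1])<<8 | int32(b[2])<<16
	// Shift left then right to sign-extend from 24 to 32 bits.
	return v << 8 >> 8
}

func main() {
	fmt.Println(decode8(0), decode8(128), decode8(255))
	fmt.Println(decode24([]byte{0xFF, 0xFF, 0x7F})) // largest 24-bit value
	fmt.Println(decode24([]byte{0x00, 0x00, 0x80})) // smallest 24-bit value
	fmt.Println(decode24([]byte{0xFF, 0xFF, 0xFF})) // -1
}
```

Neither format maps directly onto a native Go integer type, so every reader needs per-format decoding like this.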
  24. C: libsndfile

    sf_count_t sf_read_short   (SNDFILE *sndfile, short *ptr, sf_count_t items);
    sf_count_t sf_read_int     (SNDFILE *sndfile, int *ptr, sf_count_t items);
    sf_count_t sf_read_float   (SNDFILE *sndfile, float *ptr, sf_count_t items);
    sf_count_t sf_read_double  (SNDFILE *sndfile, double *ptr, sf_count_t items);
    sf_count_t sf_readf_short  (SNDFILE *sndfile, short *ptr, sf_count_t frames);
    sf_count_t sf_readf_int    (SNDFILE *sndfile, int *ptr, sf_count_t frames);
    sf_count_t sf_readf_float  (SNDFILE *sndfile, float *ptr, sf_count_t frames);
    sf_count_t sf_readf_double (SNDFILE *sndfile, double *ptr, sf_count_t frames);
  25. API design constraints
    • different sample rate values (44.1kHz, 96kHz..)
    • different bit depth values (8bit, 16bit…)
    • different literal types (int, unsigned int, float, bytes)
    • different channel values (mono, stereo…)
  26. API design challenges
    • different encapsulations (wave, aiff…)
    • processing might require type conversion
    • different usages (playback, info, conversion, manipulation, analysis…)
    • has to be fast
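The type conversion challenge can be sketched as follows: many effects are easier to write on floats in the range -1 to 1, while files usually store integers. This is one common convention (dividing by 32768), assumed here for illustration; `intToFloat` and `floatToInt` are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
)

// intToFloat maps a signed 16-bit sample onto the range [-1, 1).
func intToFloat(v int16) float64 {
	return float64(v) / 32768
}

// floatToInt maps a float sample back to 16 bits, clamping values
// that fall outside the representable range.
func floatToInt(f float64) int16 {
	v := math.Round(f * 32768)
	if v > math.MaxInt16 {
		v = math.MaxInt16
	}
	if v < math.MinInt16 {
		v = math.MinInt16
	}
	return int16(v)
}

func main() {
	fmt.Println(intToFloat(16384))  // 0.5
	fmt.Println(floatToInt(0.5))    // 16384
	fmt.Println(floatToInt(1.0))    // clamped to 32767
	fmt.Println(floatToInt(-1.0))   // -32768
}
```

Each conversion costs time and memory, which is why a generic API has to decide where such conversions happen.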
  27. Objective-C: Core Audio

    struct AudioBuffer {
        UInt32 mNumberChannels;
        UInt32 mDataByteSize;
        void*  mData;
    };
  28. overall design

  29. the buffer interactions

  30. // PCMBuffer encapsulates uncompressed audio data
    // and provides useful methods to read/manipulate this PCM data.
    type PCMBuffer struct {
        // Format describes the format of the buffer data.
        Format *Format
        // Ints is a store for audio sample data as integers.
        Ints []int
        // Floats is a store for audio samples data as float64.
        Floats []float64
        // Bytes is a store for audio samples data as raw bytes.
        Bytes []byte
        // DataType indicates the primary format used for the underlying data.
        // The consumer of the buffer might want to look at this
        // value to know what store to use to optimally retrieve data.
        DataType DataFormat
        // framePos is the position of the last frame we read
        framePos int64
    }
  31. // Format is a high level representation of the underlying data.
    type Format struct {
        // NumChannels is the number of channels contained in the data
        NumChannels int
        // SampleRate is the sampling rate in Hz
        SampleRate int
        // BitDepth is the number of bits of data for each sample
        BitDepth int
        // Endianness indicates the byte order of the underlying bytes
        Endianness binary.ByteOrder
    }
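To show what a consumer can derive from such a format description, here is a minimal sketch that copies the `Format` fields from the slide and computes frame size and playing time. The helpers `bytesPerFrame` and `duration` are hypothetical and are not claimed to exist in the talk's package.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"time"
)

// Format mirrors the fields of the struct shown on the slide.
type Format struct {
	NumChannels int
	SampleRate  int
	BitDepth    int
	Endianness  binary.ByteOrder
}

// bytesPerFrame (hypothetical helper) returns the storage size of one
// frame: one sample per channel, each rounded up to whole bytes.
func bytesPerFrame(f *Format) int {
	return f.NumChannels * ((f.BitDepth + 7) / 8)
}

// duration (hypothetical helper) returns the playing time of numSamples
// samples, counted across all channels.
func duration(f *Format, numSamples int) time.Duration {
	if f.NumChannels == 0 || f.SampleRate == 0 {
		return 0
	}
	numFrames := numSamples / f.NumChannels
	return time.Duration(numFrames) * time.Second / time.Duration(f.SampleRate)
}

func main() {
	cd := &Format{NumChannels: 2, SampleRate: 44100, BitDepth: 16, Endianness: binary.LittleEndian}
	fmt.Println("bytes per frame:", bytesPerFrame(cd))
	fmt.Println("duration of 88200 samples:", duration(cd, 88200))
}
```

Four small fields are enough to interpret an otherwise opaque slice of numbers.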
  32. freq := audio.RootA
    fs := 44100
    // generate a sine wave
    osc := generator.NewOsc(generator.WaveSine, freq, fs)
    data := osc.Signal(fs * 4)
    buf := audio.NewPCMFloatBuffer(data, audio.FormatMono4410016bBE)
    // drop the sample rate
    if err := transforms.Decimate(buf, *factorFlag); err != nil {
        panic(err)
    }
    fmt.Println("bit crushing to 8 bit sound")
    transforms.BitCrush(buf, 8)
    // upsample
    transforms.Resample(buf, float64(fs))
    // encode the sound file
    o, err := os.Create("resampled.wav")
    if err != nil {
        panic(err)
    }
    defer o.Close()
    e := wav.NewEncoder(o, buf.Format.SampleRate, buf.Format.BitDepth, 1, 1)
    if err := e.Write(buf); err != nil {
        panic(err)
    }
    e.Close()
  33. create + fill a buffer

    freq := audio.RootA
    fs := 44100
    // generate a sine wave
    osc := generator.NewOsc(generator.WaveSine, freq, fs)
    data := osc.Signal(fs * 4)
    buf := audio.NewPCMFloatBuffer(data, audio.FormatMono4410016bBE)
  34. process the buffer

    // drop the sample rate
    if err := transforms.Decimate(buf, *factorFlag); err != nil {
        panic(err)
    }
    transforms.BitCrush(buf, 8)
    transforms.Resample(buf, float64(fs))
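For readers unfamiliar with these transforms, the sketch below shows a naive version of two of them on a plain float slice. It is not the implementation behind `transforms.Decimate` or `transforms.BitCrush`; the lowercase `decimate` and `bitCrush` functions are hypothetical, assume samples in the range -1 to 1, and skip the low-pass filtering a production decimator would normally apply.

```go
package main

import (
	"fmt"
	"math"
)

// decimate keeps every factor-th sample, dividing the sample rate by
// factor. No low-pass filter is applied first, so aliasing may occur.
func decimate(xs []float64, factor int) []float64 {
	if factor <= 1 {
		return xs
	}
	out := make([]float64, 0, (len(xs)+factor-1)/factor)
	for i := 0; i < len(xs); i += factor {
		out = append(out, xs[i])
	}
	return out
}

// bitCrush quantizes samples in [-1, 1] to the given bit depth, in
// place, by rounding each one to the nearest representable level.
func bitCrush(xs []float64, bits int) {
	levels := math.Pow(2, float64(bits-1))
	for i, x := range xs {
		xs[i] = math.Round(x*levels) / levels
	}
}

func main() {
	xs := []float64{0.3, -0.8, 0.1, 0.9, 0.45, -0.2}
	ys := decimate(xs, 2) // keeps indexes 0, 2, 4
	bitCrush(ys, 2)       // only steps of 0.5 remain
	fmt.Println(ys)
}
```

Both operations deliberately throw information away, which is what gives the result its lo-fi character.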
  35. encode + write data

    // encode the sound file
    o, err := os.Create("resampled.wav")
    if err != nil {
        panic(err)
    }
    defer o.Close()
    e := wav.NewEncoder(o, buf.Format.SampleRate, buf.Format.BitDepth, 1, 1)
    if err := e.Write(buf); err != nil {
        panic(err)
    }
    e.Close()
  36. API design - summary
    • Creating a feeling of simplicity is one of the hardest things to do
    • Understanding the problem domain + edge cases is key to designing the right thing
    • Learn from previous attempts
    • Keep the scope small & iterate
  37. OBRIGADO (thank you) @mattetti