Designing a generic audio API

Digital audio is omnipresent in our lives, from your cellphone’s ringtone to streaming music on Spotify. The main audio formats were created in the early 90s, stood the test of time, and survived successive technology advancements. During the first part of this talk, we will take a look at the anatomy of these popular formats, why they were designed that way, and what makes them durable. Then we will talk about the challenges and lessons learned while designing and implementing a generic Go API for digital audio. This talk was designed for curious attendees of all levels. The focus on audio formats is mainly there as an excuse for us to reason about API design in an unfamiliar/unbiased context.

Matt Aimonetti

November 05, 2016

Transcript

  1. MUSIC MADE BETTER
    DESIGNING A GENERIC AUDIO API
    MATT AIMONETTI @mattetti
  2. Music creation
    Bedroom producers are the new rockstars.
    Remixing & collaboration is trending.
    And it’s not just electronic music.
  3. Designing an API, what’s a design?
    Design (Oxford dictionary): To form a plan to arrange or conceive in the mind... For later execution
  4. Sampling
    amount of red dots per sec = sample rate
    resolution of each sample = bit depth
  5. Sampling
    sampled the analog signal 44,100 times in a second
    each sample is on a scale from -32,768 through 32,767
    CD resolution: 44,100 Hz, 16 bit
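To make the two numbers on this slide concrete, here is a minimal Go sketch that samples a sine wave 44,100 times per second and maps each value onto the signed 16-bit scale. The function names `quantize` and `sampleSine` are made up for illustration and are not part of the talk's API.

```go
package main

import (
	"fmt"
	"math"
)

const (
	sampleRate = 44100 // samples per second (CD quality)
	maxInt16   = 32767
	minInt16   = -32768
)

// quantize maps an amplitude in [-1, 1] onto the signed 16-bit scale,
// clamping anything that falls outside that range.
func quantize(amplitude float64) int16 {
	v := math.Round(amplitude * maxInt16)
	if v > maxInt16 {
		v = maxInt16
	}
	if v < minInt16 {
		v = minInt16
	}
	return int16(v)
}

// sampleSine samples a sine wave of the given frequency for the given
// number of seconds at the CD sample rate.
func sampleSine(freqHz float64, seconds int) []int16 {
	samples := make([]int16, sampleRate*seconds)
	for i := range samples {
		t := float64(i) / sampleRate
		samples[i] = quantize(math.Sin(2 * math.Pi * freqHz * t))
	}
	return samples
}

func main() {
	samples := sampleSine(440, 1)
	// One second of audio holds exactly sampleRate samples.
	fmt.Println(len(samples), samples[0])
}
```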
  6. Digital storage (sound.wav)
    162 (16 bit integer)
    bytes: A2 00
    base 2: 10100010 00000000
    (little endian)
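The byte layout on this slide can be reproduced with Go's standard `encoding/binary` package. This is a small sketch; `encodeSample` is a hypothetical helper name used only for the example.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeSample stores one signed 16-bit sample as two little-endian bytes
// (least significant byte first).
func encodeSample(sample int16) []byte {
	buf := make([]byte, 2)
	binary.LittleEndian.PutUint16(buf, uint16(sample))
	return buf
}

func main() {
	b := encodeSample(162)
	fmt.Printf("% X\n", b)                // A2 00
	fmt.Printf("%08b %08b\n", b[0], b[1]) // 10100010 00000000
}
```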
  7. Design is a process
    • Goal: consume audio stream
    • Desiderata:
      • unified API
      • fast / low memory usage
      • extensible
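  8. Digital storage (sound.wav)
        // samples decoded from sound.wav
        xs := []int{162, 49, -23, 0, 99, 9241, 0, 0, 32400}

        // effect replaces each sample with the average of itself
        // and the sample that follows it.
        func effect(xs []int) {
            for i := 0; i < len(xs)-1; i++ {
                xs[i] = (xs[i] + xs[i+1]) / 2
            }
        }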
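The step from bytes on disk to a slice of integers is implied between the last two slides. A rough sketch of that step is below, assuming raw little-endian signed 16-bit PCM with no file header; `decodeInt16LE` is a hypothetical helper, and `effect` is the averaging function from the slide.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeInt16LE interprets raw bytes as little-endian signed 16-bit samples.
// A trailing odd byte, if any, is ignored.
func decodeInt16LE(raw []byte) []int {
	xs := make([]int, 0, len(raw)/2)
	for i := 0; i+1 < len(raw); i += 2 {
		xs = append(xs, int(int16(binary.LittleEndian.Uint16(raw[i:]))))
	}
	return xs
}

// effect replaces each sample with the average of itself and its successor.
// The last sample is left untouched.
func effect(xs []int) {
	for i := 0; i < len(xs)-1; i++ {
		xs[i] = (xs[i] + xs[i+1]) / 2
	}
}

func main() {
	raw := []byte{0xA2, 0x00, 0x31, 0x00, 0xE9, 0xFF}
	xs := decodeInt16LE(raw)
	fmt.Println(xs) // [162 49 -23]
	effect(xs)
	fmt.Println(xs) // [105 13 -23]
}
```

Once the samples are plain integers, an effect is just a loop over a slice, which is the point the slide makes.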
  9. Exemplars
    ex·em·plar /iɡˈzemplər, iɡˈzemplär/ noun
    a person or thing serving as a typical example or excellent model.
    "he became the leading exemplar of conservative philosophy"
    • Few designs are all-new
    • even novel designs derive from earlier artifacts intended for similar purposes
    • The besetting of expert designers is not designing the thing wrong, but designing the wrong thing.
    - Fred Brooks
  10. Python std lib
        import wave

        waveFile = wave.open('sound.wav', 'rb')
        nChannels = waveFile.getnchannels()
        length = waveFile.getnframes()
        for i in range(0, length):
            waveBytes = waveFile.readframes(1)
            # I need to hand decode the values
        waveFile.close()
  11. data formats ☠
    8, 16, 24, 32, 64 bit integers (signed or not)
    or 32/64 bit floats
    or bytes
  12. C: libsndfile
        sf_count_t sf_read_short   (SNDFILE *sndfile, short *ptr, sf_count_t items);
        sf_count_t sf_read_int     (SNDFILE *sndfile, int *ptr, sf_count_t items);
        sf_count_t sf_read_float   (SNDFILE *sndfile, float *ptr, sf_count_t items);
        sf_count_t sf_read_double  (SNDFILE *sndfile, double *ptr, sf_count_t items);

        sf_count_t sf_readf_short  (SNDFILE *sndfile, short *ptr, sf_count_t frames);
        sf_count_t sf_readf_int    (SNDFILE *sndfile, int *ptr, sf_count_t frames);
        sf_count_t sf_readf_float  (SNDFILE *sndfile, float *ptr, sf_count_t frames);
        sf_count_t sf_readf_double (SNDFILE *sndfile, double *ptr, sf_count_t frames);
  13. API design constraints
    • different sample rate values (44.1kHz, 96kHz..)
    • different bit depth values (8bit, 16bit…)
    • different literal types (int, unsigned int, float, bytes)
    • different channel values (mono, stereo…)
  14. API design challenges
    • different encapsulations (wave, aiff…)
    • processing might require type conversion
    • different usages (playback, info, conversion, manipulation, analysis…)
    • has to be fast
  15. PCMBuffer
        // PCMBuffer encapsulates uncompressed audio data
        // and provides useful methods to read/manipulate this PCM data.
        type PCMBuffer struct {
            // Format describes the format of the buffer data.
            Format *Format
            // Ints is a store for audio sample data as integers.
            Ints []int
            // Floats is a store for audio samples data as float64.
            Floats []float64
            // Bytes is a store for audio samples data as raw bytes.
            Bytes []byte
            // DataType indicates the primary format used for the underlying data.
            // The consumer of the buffer might want to look at this
            // value to know what store to use to optimally retrieve data.
            DataType DataFormat
            // framePos is the position of the last frame we read
            framePos int64
        }
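The comment on `DataType` suggests that consumers pick the store that avoids a conversion. One way that could look is sketched below with a trimmed-down stand-in type. The `buffer` type, the `Integer` and `Float` constants, and the `asFloats` method are all assumptions made for this example and may not match the real package.

```go
package main

import "fmt"

// DataFormat and its constants are hypothetical stand-ins for whatever
// the real package defines; only the idea matters here.
type DataFormat int

const (
	Integer DataFormat = iota
	Float
)

// buffer is a trimmed-down, illustrative version of the slide's PCMBuffer.
type buffer struct {
	BitDepth int
	Ints     []int
	Floats   []float64
	DataType DataFormat
}

// asFloats returns the samples as float64 values. When floats are already
// the primary store they are returned as-is; otherwise the integers are
// scaled down according to the bit depth.
func (b *buffer) asFloats() []float64 {
	if b.DataType == Float {
		return b.Floats
	}
	scale := float64(int64(1) << uint(b.BitDepth-1)) // 32768 for 16 bit
	out := make([]float64, len(b.Ints))
	for i, x := range b.Ints {
		out[i] = float64(x) / scale
	}
	return out
}

func main() {
	ints := &buffer{BitDepth: 16, Ints: []int{-32768, 0, 16384}, DataType: Integer}
	fmt.Println(ints.asFloats()) // [-1 0 0.5]

	floats := &buffer{BitDepth: 16, Floats: []float64{0.25}, DataType: Float}
	fmt.Println(floats.asFloats()) // [0.25]
}
```

Keeping several stores in one struct trades some memory for the ability to skip conversions on the hot path, which fits the "has to be fast" requirement.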
  16. Format
        // Format is a high level representation of the underlying data.
        type Format struct {
            // NumChannels is the number of channels contained in the data
            NumChannels int
            // SampleRate is the sampling rate in Hz
            SampleRate int
            // BitDepth is the number of bits of data for each sample
            BitDepth int
            // Endianness indicates the byte order of the underlying bytes
            // (binary.ByteOrder comes from the encoding/binary package)
            Endianness binary.ByteOrder
        }
  17. Full example
        freq := audio.RootA
        fs := 44100

        // generate a sine wave
        osc := generator.NewOsc(generator.WaveSine, freq, fs)
        data := osc.Signal(fs * 4)
        buf := audio.NewPCMFloatBuffer(data, audio.FormatMono4410016bBE)

        // drop the sample rate
        if err := transforms.Decimate(buf, *factorFlag); err != nil {
            panic(err)
        }
        fmt.Println("bit crushing to 8 bit sound")
        transforms.BitCrush(buf, 8)
        // upsample
        transforms.Resample(buf, float64(fs))

        // encode the sound file
        o, err := os.Create("resampled.wav")
        if err != nil {
            panic(err)
        }
        defer o.Close()
        e := wav.NewEncoder(o, buf.Format.SampleRate, buf.Format.BitDepth, 1, 1)
        if err := e.Write(buf); err != nil {
            panic(err)
        }
        e.Close()
  18. create + fill a buffer
        freq := audio.RootA
        fs := 44100

        // generate a sine wave
        osc := generator.NewOsc(generator.WaveSine, freq, fs)
        data := osc.Signal(fs * 4)
        buf := audio.NewPCMFloatBuffer(data, audio.FormatMono4410016bBE)
  19. process the buffer
        // drop the sample rate
        if err := transforms.Decimate(buf, *factorFlag); err != nil {
            panic(err)
        }
        transforms.BitCrush(buf, 8)
        transforms.Resample(buf, float64(fs))
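For readers unfamiliar with the two effects named on this slide, here is a deliberately naive, standard-library-only sketch of the general ideas. It is not the implementation behind `transforms.Decimate` or `transforms.BitCrush`; the lowercase `decimate` and `bitCrush` functions are written for this example, and real decimation would normally low-pass filter first to limit aliasing.

```go
package main

import (
	"fmt"
	"math"
)

// bitCrush reduces the resolution of samples in [-1, 1] in place by
// snapping each one to a grid whose step is 1/2^(bits-1).
func bitCrush(samples []float64, bits int) {
	steps := math.Pow(2, float64(bits-1))
	for i, s := range samples {
		samples[i] = math.Round(s*steps) / steps
	}
}

// decimate lowers the sample rate by keeping every factor-th sample.
// No filtering is applied, so high frequencies will alias.
func decimate(samples []float64, factor int) []float64 {
	if factor <= 1 {
		return samples
	}
	out := make([]float64, 0, (len(samples)+factor-1)/factor)
	for i := 0; i < len(samples); i += factor {
		out = append(out, samples[i])
	}
	return out
}

func main() {
	samples := []float64{0, 0.3, 0.6, 0.9, -0.3}
	half := decimate(samples, 2)
	fmt.Println(half) // [0 0.6 -0.3]

	bitCrush(half, 2)
	fmt.Println(half) // [0 0.5 -0.5]
}
```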
  20. encode + write data
        // encode the sound file
        o, err := os.Create("resampled.wav")
        if err != nil {
            panic(err)
        }
        defer o.Close()
        e := wav.NewEncoder(o, buf.Format.SampleRate, buf.Format.BitDepth, 1, 1)
        if err := e.Write(buf); err != nil {
            panic(err)
        }
        e.Close()
  21. API design - summary
    • Creating a feeling of simplicity is one of the hardest things to do
    • Understanding the problem domain + edge cases is key to designing the right thing
    • Learn from previous attempts
    • Keep the scope small & iterate