Rubyで音を視る (Seeing Sound with Ruby)

ydah
June 06, 2026

Slides for the talk "Rubyで音を視る" (Seeing Sound with Ruby), presented at Matsue RubyKaigi 12 (松江Ruby会議12)
https://matsue.rubyist.net/matrk12/ #matrk12

Transcript

  1. 1

  2. 3

  3. Samples → RMS → amplitude

    Plain mean of the samples [0.5, -0.5, 0.5, -0.5]:
    (0.5 + -0.5 + 0.5 + -0.5) / 4 = 0
    RMS of N samples:
    $\mathrm{RMS} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} x_i^2}$
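The contrast between the plain mean and RMS can be checked with a few lines of Ruby. This is a minimal sketch, not the talk's implementation: the method name rms and the assumption that samples are floats in -1.0..1.0 are mine.

```ruby
# Root mean square of a block of samples (assumed to be floats in -1.0..1.0).
def rms(samples)
  return 0.0 if samples.empty?

  # Square each sample so positive and negative values no longer cancel.
  Math.sqrt(samples.sum { |x| x * x } / samples.size.to_f)
end

samples = [0.5, -0.5, 0.5, -0.5]
mean  = samples.sum / samples.size # => 0.0
level = rms(samples)               # => 0.5
```

The mean of this alternating signal is 0 even though it is clearly audible, while the RMS reports 0.5.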
  4. Samples → FFT → magnitudes / peak_frequency

    $W_N = e^{-2\pi i/N}$
    $E_k = \sum_{m=0}^{N/2-1} x_{2m} W_{N/2}^{km}$
    $O_k = \sum_{m=0}^{N/2-1} x_{2m+1} W_{N/2}^{km}$
    $X_k = E_k + W_N^k O_k$
    $X_{k+N/2} = E_k - W_N^k O_k \quad (k = 0, 1, \dots, \tfrac{N}{2} - 1)$
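The even/odd recurrence on this slide is the radix-2 Cooley-Tukey FFT, and it maps almost line for line onto a recursive Ruby method. The sketch below is an illustration under my own assumptions: the method signatures are hypothetical, the input length must be a power of two, and only the names magnitudes and peak_frequency come from the slide.

```ruby
# Recursive radix-2 FFT. Input length must be a power of two.
def fft(samples)
  n = samples.size
  return samples.map { |x| Complex(x) } if n <= 1
  raise ArgumentError, "length must be a power of two" unless (n & (n - 1)).zero?

  even = fft(samples.each_slice(2).map(&:first)) # E_k: transform of x_0, x_2, ...
  odd  = fft(samples.each_slice(2).map(&:last))  # O_k: transform of x_1, x_3, ...

  half = n / 2
  result = Array.new(n)
  half.times do |k|
    # W_N^k * O_k, with W_N = e^(-2*pi*i/N)
    twiddled = Complex.polar(1.0, -2.0 * Math::PI * k / n) * odd[k]
    result[k]        = even[k] + twiddled
    result[k + half] = even[k] - twiddled
  end
  result
end

# Magnitudes of the non-redundant half of the spectrum (bins 0...N/2).
def magnitudes(samples)
  fft(samples).first(samples.size / 2).map(&:abs)
end

# Frequency in Hz of the strongest bin.
def peak_frequency(samples, sample_rate)
  peak_bin = magnitudes(samples).each_with_index.max_by { |mag, _| mag }.last
  peak_bin * sample_rate.to_f / samples.size
end
```

For a real-valued signal the upper half of the spectrum mirrors the lower half, which is why the sketch keeps only the first N/2 bins.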
  5. magnitudes → BandSplitter → sub / low / mid / high

    Vizcore splits the FFT magnitudes into four bands: sub, low, mid, high.
    sub: 20 Hz .. 60 Hz
    low: 60 Hz .. 250 Hz
    mid: 250 Hz .. 4000 Hz
    high: 4000 Hz .. 20000 Hz
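One way to go from FFT bins to these four bands is to convert each bin index to a frequency and average the magnitudes that fall in each range. The following is a sketch under assumptions, not Vizcore's actual BandSplitter: the constant BANDS, the method split_bands, the use of a plain average, and the choice to treat shared band edges as belonging to the upper band are all mine.

```ruby
# Band edges in Hz, taken from the slide. Shared edges go to the upper band (assumption).
BANDS = {
  sub:  20.0...60.0,
  low:  60.0...250.0,
  mid:  250.0...4000.0,
  high: 4000.0..20_000.0
}.freeze

# magnitudes: FFT magnitudes for bins 0...fft_size/2.
# Returns the mean magnitude per band (averaging is an assumption).
def split_bands(magnitudes, sample_rate:, fft_size:)
  bin_hz = sample_rate.to_f / fft_size # frequency step between adjacent bins
  BANDS.transform_values do |range|
    bins = magnitudes.each_with_index
                     .select { |_, i| range.cover?(i * bin_hz) }
                     .map(&:first)
    bins.empty? ? 0.0 : bins.sum / bins.size.to_f
  end
end
```

With a 1024-point FFT at 44.1 kHz each bin is about 43 Hz wide, so the sub band covers a single bin; a larger FFT size gives the low end finer resolution.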
  6. beat history → BPMEstimator → bpm / beat_phase / bar_phase

    BPM is derived from the frame lag between beats.
    Values: beat, beat_pulse, bpm, beat_phase, bar_phase
    bpm = 60.0 * frame_rate / best_lag
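The formula bpm = 60.0 * frame_rate / best_lag needs a best_lag to plug in. A common way to find it is autocorrelation of the beat history, sketched below. How the talk's BPMEstimator actually picks best_lag is not shown on the slide, so the method estimate_bpm, the 60..180 BPM search range, and the unnormalized correlation score are assumptions of mine.

```ruby
# history: beat/onset strength per analysis frame, oldest first.
# Returns the estimated BPM, or nil when the history is too short.
def estimate_bpm(history, frame_rate:, min_bpm: 60.0, max_bpm: 180.0)
  min_lag = (60.0 * frame_rate / max_bpm).ceil  # fastest tempo = shortest lag
  max_lag = (60.0 * frame_rate / min_bpm).floor # slowest tempo = longest lag
  max_lag = [max_lag, history.size - 1].min
  return nil if max_lag < min_lag

  # Score each lag by how well the history lines up with a shifted copy of itself.
  best_lag = (min_lag..max_lag).max_by do |lag|
    (0...(history.size - lag)).sum { |i| history[i] * history[i + lag] }
  end

  60.0 * frame_rate / best_lag
end
```

At 50 analysis frames per second, a beat every 25 frames gives best_lag = 25 and therefore 60.0 * 50 / 25 = 120 BPM.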
  7. bands + onset → drums confidence

    kick  = sublow_strength * onset # sub/low strength × onset
    snare = mid_strength * onset    # mid strength × onset
    hihat = high_strength * onset   # high strength × onset
  8. magnitudes / spectrum → spectral features

    Spectral features computed from the FFT, alongside amplitude and bands:
    centroid, rolloff, flatness, flux
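The slide names four spectral features without formulas. The sketch below uses their common textbook definitions, which may differ in detail from the talk's code: the method names, the 85% rolloff ratio, the eps guard against log(0), and the half-wave-rectified form of flux are all assumptions.

```ruby
# Centroid: magnitude-weighted mean frequency ("brightness").
def spectral_centroid(mags, bin_hz)
  total = mags.sum
  return 0.0 if total.zero?

  mags.each_with_index.sum { |m, i| m * i * bin_hz } / total
end

# Rolloff: frequency below which `ratio` of the total magnitude lies.
def spectral_rolloff(mags, bin_hz, ratio: 0.85)
  threshold = mags.sum * ratio
  acc = 0.0
  mags.each_with_index do |m, i|
    acc += m
    return i * bin_hz if acc >= threshold
  end
  0.0
end

# Flatness: geometric mean / arithmetic mean. Near 1 for noise, near 0 for a pure tone.
def spectral_flatness(mags, eps: 1e-12)
  return 0.0 if mags.empty?

  log_mean = mags.sum { |m| Math.log(m + eps) } / mags.size
  Math.exp(log_mean) / (mags.sum / mags.size.to_f + eps)
end

# Flux: sum of per-bin increases since the previous frame.
def spectral_flux(mags, previous)
  mags.zip(previous).sum { |cur, prev| [cur - (prev || 0.0), 0.0].max }
end
```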
  9. raw features → smoother → audio features

    Each frame is smoothed with an EMA (exponential moving average):
    $\mathrm{EMA}_t = \alpha x_t + (1 - \alpha)\,\mathrm{EMA}_{t-1}$
    amplitude: @sm.smooth(:amplitude, normalized[:amplitude])
    bands: @sm.smooth_hash(normalized[:bands], namespace: :bands)
    fft: @sm.smooth_array(normalized[:fft], namespace: :fft)
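A smoother with this interface can be built on one hash of previous values. The class below is a hypothetical sketch: only the method names smooth, smooth_hash, and smooth_array and the namespace: keyword appear on the slide, while the class name Smoother, the alpha default, and seeding the average with the first value are my assumptions.

```ruby
class Smoother
  def initialize(alpha: 0.3)
    @alpha = alpha
    @state = {} # last EMA value per key
  end

  # EMA_t = alpha * x_t + (1 - alpha) * EMA_{t-1}; the first value seeds the average.
  def smooth(key, value)
    previous = @state[key]
    @state[key] = previous.nil? ? value.to_f : @alpha * value + (1.0 - @alpha) * previous
  end

  # Smooth every entry of a hash, keeping one EMA per [namespace, name] pair.
  def smooth_hash(hash, namespace:)
    hash.to_h { |name, value| [name, smooth([namespace, name], value)] }
  end

  # Smooth every element of an array, keeping one EMA per [namespace, index] pair.
  def smooth_array(array, namespace:)
    array.each_with_index.map { |value, i| smooth([namespace, i], value) }
  end
end
```

A larger alpha follows the input more closely, while a smaller alpha gives steadier but slower-moving visuals.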
  10. amplitude, bass / low, high / treble, beat, beat_pulse, bpm / phase, spectral_*
  11. audio features → MappingResolver → layer params

    map beat_pulse, to: :radius, gain: 160.0, min: 56, max: 164, attack: 1.0, release: 0.2
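The slide does not say how gain, min, max, attack, and release interact. One plausible reading, sketched below, is: scale the feature by gain, clamp it into min..max, then move the parameter toward that target at the attack rate when rising and at the release rate when falling. The class ParamMapping and this interpretation are assumptions, not the behavior of the talk's MappingResolver.

```ruby
class ParamMapping
  attr_reader :value

  def initialize(gain:, min:, max:, attack:, release:)
    @gain, @min, @max = gain, min, max
    @attack, @release = attack, release
    @value = min.to_f # start at the resting value
  end

  # Called once per frame with the current audio feature (e.g. beat_pulse).
  def update(feature)
    target = (feature * @gain).clamp(@min, @max)
    rate = target > @value ? @attack : @release # rise fast, fall slowly
    @value += (target - @value) * rate
  end
end

radius = ParamMapping.new(gain: 160.0, min: 56, max: 164, attack: 1.0, release: 0.2)
radius.update(1.0) # beat: jumps straight to 160.0 because attack is 1.0
radius.update(0.0) # no beat: eases 20% of the way back toward 56
```

Under this reading, attack: 1.0 makes the ring snap outward on the beat, and release: 0.2 lets it shrink gradually over the following frames.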
  12. layer params → audio_frame (audio / scene / layers)

    {
      "audio": { "amplitude": 0.42, "beat": true, "beat_pulse": 1.0, "bands": { "low": 0.8 } },
      "scene": {
        "name": "main",
        "layers": [ { "name": "rings", "params": { "radius": 140 } …
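A frame of this shape can be assembled from plain hashes and serialized with Ruby's standard json library. The sketch below only reproduces the fields visible on the slide. The method build_audio_frame is hypothetical, and whatever follows the truncated part of the slide's JSON is not modeled.

```ruby
require "json"

# Combine the smoothed audio features and the resolved layer params into one frame.
def build_audio_frame(features, scene_name:, layers:)
  {
    "audio" => features,
    "scene" => { "name" => scene_name, "layers" => layers }
  }
end

frame = build_audio_frame(
  { "amplitude" => 0.42, "beat" => true, "beat_pulse" => 1.0, "bands" => { "low" => 0.8 } },
  scene_name: "main",
  layers: [{ "name" => "rings", "params" => { "radius" => 140 } }]
)

payload = JSON.generate(frame) # string ready to send to the renderer
```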
  13. 6/4

  14. BGM

  15. burst

    amplitude_delta = [
      current[:amplitude] - previous[:amplitude], # volume increase since the previous frame
      0.0
    ].max
    burst_strength = [
      current[:onset],             # onset of the sound
      current[:spectral_flux],     # amount of spectral change
      current.dig(:onsets, :mid),  # mid-band onset
      current.dig(:onsets, :high), # high-band onset
      amplitude_delta,             # increase in volume
      recent_onset_max             # largest onset in the recent window
    ].max
  16. spectral flatness

    s_score = weighted_average([
      high_ratio, # high-band strength
      flatness,   # noisiness
      zcr,        # fine fluctuation of the waveform
      centroid    # brightness of the sound
    ])
    n_score = weighted_average([
      nasal_score,      # nasality
      1.0 - high_ratio, # little high-band energy
      low_mid_ratio,    # low-to-mid band strength
      1.0 - zcr         # little fluctuation
    ])
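The slide calls weighted_average with a single array and does not show the weights. A hypothetical helper with that call shape could default to equal weights and accept explicit ones as a second argument, as sketched below. The signature and the defaults are assumptions, not the talk's implementation.

```ruby
# Weighted mean of `values`. With no weights given, every value counts equally.
def weighted_average(values, weights = Array.new(values.size, 1.0))
  raise ArgumentError, "values and weights differ in size" unless values.size == weights.size

  total = weights.sum
  return 0.0 if total.zero?

  values.zip(weights).sum { |value, weight| value * weight } / total
end

# Made-up feature values for a hissy, bright sound.
high_ratio, flatness, zcr, centroid = 0.8, 0.7, 0.9, 0.6
s_score = weighted_average([high_ratio, flatness, zcr, centroid])
```

Because every input is a 0..1 feature, the score also stays in 0..1, which keeps s_score and n_score directly comparable.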
  17. attack feature

    attack = {
      strength: attack[:attack_strength],   # attack strength
      centroid: attack[:centroid_norm],     # attack brightness
      high: attack[:high_ratio],            # high-band strength
      low: attack[:low_ratio],              # low-band strength
      flatness: attack[:spectral_flatness], # noisiness
      zcr: attack[:zero_crossing_rate],     # fine fluctuation
      f1: attack[:f1_frequency],            # F1-like peak
      f2: attack[:f2_frequency],            # F2-like peak
      nasal: attack_nasal_score             # nasality
    }
  18. Vowel × consonant

    kana_score = [
      vowel_score * 0.55,     # vowel likelihood
      consonant_score * 0.35, # consonant likelihood
      burst_bonus * 0.10      # attack correction
    ].sum
    candidate = {
      text: kana_table[consonant][vowel], # kana character
      vowel: vowel,                       # estimated vowel
      consonant: consonant,               # estimated consonant
      confidence: kana_score              # confidence of the candidate
    }
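To see the scoring in action, the sketch below builds two candidates with the slide's weights (0.55, 0.35, 0.10, which sum to 1.0) and keeps the one with the highest confidence. The two-row KANA_TABLE, the method kana_candidate, the input scores, and the max_by selection step are illustrative assumptions. The slide shows only how a single candidate is scored.

```ruby
# Tiny excerpt of a consonant -> vowel -> kana lookup (illustrative only).
KANA_TABLE = {
  k: { a: "か", i: "き" },
  s: { a: "さ", i: "し" }
}.freeze

def kana_candidate(vowel:, consonant:, vowel_score:, consonant_score:, burst_bonus:)
  kana_score = [
    vowel_score * 0.55,     # vowel likelihood
    consonant_score * 0.35, # consonant likelihood
    burst_bonus * 0.10      # attack correction
  ].sum

  { text: KANA_TABLE[consonant][vowel], vowel: vowel, consonant: consonant, confidence: kana_score }
end

candidates = [
  kana_candidate(vowel: :a, consonant: :k, vowel_score: 0.9, consonant_score: 0.4, burst_bonus: 0.5),
  kana_candidate(vowel: :a, consonant: :s, vowel_score: 0.9, consonant_score: 0.8, burst_bonus: 0.1)
]
best = candidates.max_by { |c| c[:confidence] }
```

Both candidates share the same vowel score, so the stronger consonant score decides the result: best[:text] is "さ".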