Neural FM Synthesizer

Fadis
November 18, 2017

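This talk explains how to use a convolutional neural network to find FM synthesizer parameters that produce a timbre close to a given sound.
These are the slides from a talk given at the 8th Kernel/VM Tankentai @ Kansai (第8回 カーネル/VM探検隊@関西) on November 18, 2017.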

Transcript

  1. 5.

    Figure: result of a frequency analysis of a piano C4 note (axes: pitch by note name, time in seconds).
  2. 6.

    The component at the frequency of C4 is the fundamental.

    Figure: result of a frequency analysis of a piano C4 note.
  3. 7.

    The components at integer multiples of that frequency are the overtones (harmonics).

    Figure: result of a frequency analysis of a piano C4 note.
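    The relationship between the fundamental and its overtones can be sketched in a few lines. This is only an illustration: the function name and parameters are made up here, and real instrument partials are not always exact integer multiples.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Frequencies of the fundamental (k = 1) and its overtones (k = 2, 3, ...).
// `fundamental_hz` and `count` are illustrative names, not from the slides.
std::vector<double> harmonic_series(double fundamental_hz, std::size_t count) {
  assert(fundamental_hz > 0.0);
  std::vector<double> frequencies;
  frequencies.reserve(count);
  for (std::size_t k = 1; k <= count; ++k) {
    // The k-th partial sits at k times the fundamental frequency.
    frequencies.push_back(fundamental_hz * static_cast<double>(k));
  }
  return frequencies;
}
```

    Calling harmonic_series(100.0, 4) yields the fundamental at 100 Hz followed by overtones at 200, 300 and 400 Hz.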
  4. 8.

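    When the set of overtones changes, the sound is heard as a different instrument.

    Figure: result of a frequency analysis of a marimba C4 note, with the component at the frequency of C4 marked.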
  5. 9.

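    The overtones contained in a timbre change over time. In most instruments, the overtones far from the fundamental fade out first.

    Figure: result of a frequency analysis of a piano C4 note.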
  6. 13.

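    Figure: result of frequency analysis while varying α.

    The larger the amplitude of the signal wave, the more the waveform is distorted and the more overtones appear.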
  7. 14.

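    Figure: result of frequency analysis while varying ωs (axis ticks in multiples of the base frequency).

    The positions at which overtones appear change with the frequency of the signal wave.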
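    The two observations above (amplitude controls how many overtones appear, signal frequency controls where they appear) match the textbook two-operator FM formula y(t) = sin(2π·fc·t + α·sin(2π·fs·t)). The sketch below assumes that formula; the carrier frequency fc is an assumed name that does not appear in the transcript, and the functions are illustrative rather than the author's softfm implementation.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

constexpr double kTwoPi = 6.283185307179586;

// One sample of a two-operator FM tone at time t (seconds).
// alpha scales the signal (modulating) wave, signal_hz is its frequency.
double fm_sample(double carrier_hz, double signal_hz, double alpha, double t) {
  return std::sin(kTwoPi * carrier_hz * t +
                  alpha * std::sin(kTwoPi * signal_hz * t));
}

// Frequencies where FM places partials: carrier, then carrier -/+ k * signal.
// Negative entries correspond to components reflected around 0 Hz.
std::vector<double> sideband_frequencies(double carrier_hz, double signal_hz,
                                         int order) {
  assert(order >= 0);
  std::vector<double> result{carrier_hz};
  for (int k = 1; k <= order; ++k) {
    result.push_back(carrier_hz - k * signal_hz);
    result.push_back(carrier_hz + k * signal_hz);
  }
  return result;
}
```

    With alpha set to 0 the output is a plain sine at the carrier frequency; raising alpha pushes more energy into the sidebands, and changing signal_hz moves them.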
  8. 18.
  9. 25.

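    $ cufind_fm_params -i ../Piano.mf.C4.aiff -o out -n 48 -m 6 -t 40 -w -8 -s 4 -c 1000
    real 163m42.037s
    user 139m52.715s
    sys  23m34.418s

    Time: 2 hours 43 minutes 42 seconds (the real time above)

    Environment (the numeric values are missing from this transcript):
    OS                         Gentoo Linux
    Compiler                   nvcc
    Library used for FFT       CuFFT
    CPU                        Intel Core i (Skylake)
    GPU                        nvidia GeForce GTX
    Memory                     host and device sizes in GB
    Length of sampled data     in seconds, with the sampling rate in kHz
    Termination condition      stop once a fixed generation has been computed

    Problem: it takes a long time.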
  10. 26.
  11. 32.

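    Backpropagation

    Diagram: inputs x, weights w, output yo ≠ expected output y.

    Suppose a value x is passed through a formal neuron with weights w, and the value yo that comes out differs from the value we wanted. Bringing the behavior of this neuron closer to what we expect then becomes the problem of treating x and y as constants and searching for the weight values that minimize the distance between y and yo.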
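    The search described above can be sketched as plain gradient descent on a single neuron. The assumptions here are mine, not the slides': the neuron is linear (no activation function), the distance is the squared error E = (yo - y)² / 2, and the learning rate and step count are arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Output of a linear formal neuron: yo = sum_i w[i] * x[i].
double neuron_output(const std::vector<double>& w,
                     const std::vector<double>& x) {
  assert(w.size() == x.size());
  double yo = 0.0;
  for (std::size_t i = 0; i < w.size(); ++i) yo += w[i] * x[i];
  return yo;
}

// Gradient descent on E = (yo - y)^2 / 2 with x and y held constant.
std::vector<double> fit_weights(std::vector<double> w,
                                const std::vector<double>& x, double y,
                                double learning_rate, int steps) {
  for (int s = 0; s < steps; ++s) {
    const double error = neuron_output(w, x) - y;  // dE/dyo
    for (std::size_t i = 0; i < w.size(); ++i) {
      w[i] -= learning_rate * error * x[i];  // dE/dw[i] = error * x[i]
    }
  }
  return w;
}
```

    Starting from zero weights with x = (1, 2, 3), y = 2 and a learning rate of 0.05, each step shrinks the error by a constant factor, so the output approaches y quickly.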
  12. 35.

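    Backpropagation

    Diagram: yo, x and w, annotated with the influence of each w on the error function and with the part that is already known.

    If this part is computed in order starting from the output side, the weights can be corrected even when formal neurons are stacked in many layers.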
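    Working from the output side backwards is the chain rule applied layer by layer. A minimal sketch follows, assuming a toy network of two linear scalar neurons (h = w1·x, yo = w2·h) and the squared error E = (yo - y)² / 2; none of these names come from the slides.

```cpp
#include <cassert>

struct Gradients {
  double d_w1;  // dE/dw1, weight of the layer nearer the input
  double d_w2;  // dE/dw2, weight of the layer nearer the output
};

// Network: h = w1 * x, yo = w2 * h, E = (yo - y)^2 / 2.
Gradients backpropagate(double x, double y, double w1, double w2) {
  const double h = w1 * x;       // forward pass, first layer
  const double yo = w2 * h;      // forward pass, second layer
  const double d_yo = yo - y;    // dE/dyo: known directly at the output
  const double d_w2 = d_yo * h;  // influence of w2 on the error
  const double d_h = d_yo * w2;  // error passed back to the first layer
  const double d_w1 = d_h * x;   // influence of w1 on the error
  return {d_w1, d_w2};
}
```

    Each layer only needs the gradient handed back from the layer after it, which is why the same procedure keeps working as layers are added.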
  13. 48.

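    Making the input-output relationship unique

    Diagram: two chains of three operators, each leading to an output.

    When merely reordering the operators results in the same computation, several different parameter sets produce the same sound. Data whose input-output relationship is not unique keeps training from converging.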
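    One way to remove that ambiguity is to put every parameter set into a canonical order before using it as a training label. The transcript does not show how the author did this, so the following is only a sketch under two assumptions: the interchangeable operators are summed in parallel (so their order does not affect the sound), and each operator is described by a hypothetical (frequency ratio, level) pair.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical operator description: (frequency ratio, output level).
using OperatorParams = std::pair<double, double>;

// Sort the operators so that every permutation of the same set maps to one
// and the same label vector.
std::vector<OperatorParams> canonicalize(std::vector<OperatorParams> operators) {
  std::sort(operators.begin(), operators.end());
  return operators;
}
```

    After canonicalization, two settings that differ only in operator order compare equal, so the network sees a single target for a given sound.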
  14. 53.

    Pooling layer

    Pooling: within a limited region, only the value of the neuron that returned the largest value is passed on to the next layer.
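    The pooling described above (keep only the largest value in each region) can be sketched as follows. The row-major layout, the square non-overlapping window and the function name are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Non-overlapping max pooling over a row-major height x width image.
std::vector<float> max_pool(const std::vector<float>& input, std::size_t height,
                            std::size_t width, std::size_t window) {
  assert(window > 0 && input.size() == height * width);
  assert(height % window == 0 && width % window == 0);
  const std::size_t out_h = height / window;
  const std::size_t out_w = width / window;
  std::vector<float> output(out_h * out_w);
  for (std::size_t oy = 0; oy < out_h; ++oy) {
    for (std::size_t ox = 0; ox < out_w; ++ox) {
      // Start from the top-left value of the region, then scan the rest.
      float best = input[oy * window * width + ox * window];
      for (std::size_t dy = 0; dy < window; ++dy) {
        for (std::size_t dx = 0; dx < window; ++dx) {
          best = std::max(
              best, input[(oy * window + dy) * width + ox * window + dx]);
        }
      }
      output[oy * out_w + ox] = best;
    }
  }
  return output;
}
```

    A 4x4 image pooled with a window of 2 becomes a 2x2 image holding the maximum of each quadrant.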
  15. 55.

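    Shallow vs. Deep Sum-Product Networks [Olivier Delalleau and Yoshua Bengio, 2011]
    https://papers.nips.cc/paper/4350-shallow-vs-deep-sum-product-networks

    A paper arguing that, when adding the same number of neurons, increasing the number of layers lets a network represent more complex functions than making the layers wider.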
  16. 59.

    GoogLeNet

    Pass the outputs at several points through the error function, so that gradient is resupplied partway along a long network.

    Diagram labels: backpropagation starts from here, from here as well, and from here too.

    Going Deeper with Convolutions [Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich, 2015]
    https://research.google.com/pubs/pub43022.html
  17. 60.

    ResNet

    The value that has passed through a block is mixed with the input of the block and then output.

    Deep Residual Learning for Image Recognition [Kaiming He and Xiangyu Zhang and Shaoqing Ren and Jian Sun, 2015]
    https://arxiv.org/abs/1512.03385
  18. 61.

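    ResNet

    Deep Residual Learning for Image Recognition [Kaiming He and Xiangyu Zhang and Shaoqing Ren and Jian Sun, 2015]
    https://arxiv.org/abs/1512.03385

    Diagram: convolution, ReLU, convolution, addition, ReLU.

    A fixed number of same-size convolutions are treated as one block. After the lower convolution, and before the result is passed through the activation function, the input is mixed in. Because ReLU has a large gradient as long as it is active, the gradient along the red route stays large even in the upper layers.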
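    The mixing step can be sketched independently of the convolutions. In the example below the convolution stack is abstracted as an arbitrary function f, which is an assumption of this sketch; only the "add the block input, then apply ReLU" structure follows the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Vector = std::vector<float>;

// out = ReLU(f(x) + x): the block input is added before the final activation.
// f stands in for the convolution layers inside the block.
Vector residual_block(const Vector& x,
                      const std::function<Vector(const Vector&)>& f) {
  const Vector fx = f(x);
  assert(fx.size() == x.size());
  Vector out(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    const float sum = fx[i] + x[i];    // shortcut: mix in the input
    out[i] = sum > 0.0f ? sum : 0.0f;  // ReLU
  }
  return out;
}
```

    If f returns all zeros, the block simply passes ReLU(x) through, which illustrates why the shortcut keeps a direct path for the gradient.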
  19. 63.
  20. 66.

    void training_data_generator<T,operator_count,resource_type>::poll() {
      bool active=true;
      constexpr size_t batch_size=15;
      constexpr size_t image_size=400*length;
      constexpr size_t config_size=config_length;
      const auto window=softfm::generate_window(1<<13);
      std::random_device seed_gen;
      resource_type resources(batch_size,192000,seed_gen());
      generate_training_data<T,operator_count,resource_type> generator;
      while(1) {
        if(active) {
          // Generate one batch of spectrum images and the matching settings.
          std::vector<T> pixels_(image_size*batch_size,0);
          std::vector<T> configs_(config_size*batch_size,0);
          generator(window,pixels_.data(),configs_.data(),batch_size,resources);
          std::lock_guard<std::mutex> lock(guard);
          configs.emplace(std::move(configs_));
          pixels.emplace(std::move(pixels_));
          // Pause generation once 100 batches are queued.
          if(pixels.size()==100)active=false;
          if(end)break;
        }
        else {
          // Resume once the consumer has drained the queue below 100 batches.
          std::lock_guard<std::mutex>lock(guard);
          if(pixels.size()< 100)active=true;
          if(end) break;
        }
      }
    }

    A thread that endlessly produces pairs of FM synth spectrum images and settings.
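    The queue-with-a-ceiling idea in this thread can be isolated into a small helper. The class below is a hypothetical sketch, not part of the author's softfm code: it keeps the same rule (stop accepting batches once a fixed number are waiting) but exposes it as non-blocking try_push and try_pop calls.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <queue>
#include <utility>
#include <vector>

// Thread-safe queue that refuses new batches once `capacity` are waiting.
template <typename T>
class BoundedBatchQueue {
 public:
  explicit BoundedBatchQueue(std::size_t capacity) : capacity_(capacity) {}

  // Returns false (and leaves `batch` untouched) when the queue is full.
  bool try_push(std::vector<T>& batch) {
    std::lock_guard<std::mutex> lock(guard_);
    if (batches_.size() >= capacity_) return false;
    batches_.push(std::move(batch));
    return true;
  }

  // Returns false when nothing is queued.
  bool try_pop(std::vector<T>& out) {
    std::lock_guard<std::mutex> lock(guard_);
    if (batches_.empty()) return false;
    out = std::move(batches_.front());
    batches_.pop();
    return true;
  }

 private:
  std::size_t capacity_;
  std::mutex guard_;
  std::queue<std::vector<T>> batches_;
};
```

    A producer thread would call try_push in a loop and back off while it returns false; the input layer would call try_pop once per forward pass.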
  21. 67.

    template <typename Dtype>
    void FMAudio8Layer<Dtype>::DataLayerSetUp(
        const vector<Blob<Dtype>*>& bottom,const vector<Blob<Dtype>*>& top){
      CHECK_GT(batch_size_*size_,0)<<"oops";
      labels_.resize(batch_size_,0);
      auto &gen=softfm::training_data_generator<Dtype,8,r>::get();
      data_[pos_]=std::move(gen.get_pixels());
      vector<int> label_shape(1,batch_size_);
      top[0]->Reshape(batch_size_,channels_,height_,width_);
      top[1]->Reshape(label_shape);
      added_data_.Reshape(batch_size_,channels_,height_,width_);
      added_label_.Reshape(label_shape);
      added_data_.cpu_data();
      added_label_.cpu_data();
    }

    template <typename Dtype>
    void FMAudio8Layer<Dtype>::Forward_cpu(
        const vector<Blob<Dtype>*>& bottom,const vector<Blob<Dtype>*>& top){
      top[0]->Reshape(batch_size_,channels_,height_,width_);
      top[1]->Reshape(batch_size_,1,1,1);
      // Hand the current batch to the network, then fetch the next one.
      top[0]->set_cpu_data(data_[pos_].data());
      top[1]->set_cpu_data(labels_.data());
      pos_=(pos_+1)%n_;
      auto &gen=softfm::training_data_generator<Dtype,8,r>::get();
      data_[pos_]=std::move(gen.get_pixels());
    }

    An input layer that picks up the generated FM synth spectrum images.
  22. 68.

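    layer {
      name: "data"
      type: "FMAudio8"
      top: "data"
      top: "dummy1"
    }
    layer {
      name: "label"
      type: "FMConfig8"
      top: "label"
      top: "dummy2"
    }
    layer {
      name: "conv_1"
      type: "Convolution"
      bottom: "data"
      top: "conv_1"

    List the implemented input layers in the network definition file. The first is a layer from which spectrum images well up endlessly; the second is a layer from which FM synth settings well up endlessly.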
  23. 69.

    layer {
      name: "conv_1"
      type: "Convolution"
      bottom: "data"
      top: "conv_1"
      param { lr_mult: 1 decay_mult: 1 }
      param { lr_mult: 2 decay_mult: 0 }
      convolution_param {
        num_output: 64
        pad: 3
        kernel_size: 7
        stride: 2

    A ResNet definition distributed under the BSD license already exists, so take it and attach it to the input layers from before.
    https://github.com/jay-mahadeokar/pynetbuilder
  24. 70.

    layer {
      name: "loss/classifier_"
      type: "InnerProduct"
      bottom: "conv_1by1_1000"
      top: "loss/classifier_"
      param { lr_mult: 1 decay_mult: 1 }
      param { lr_mult: 2 decay_mult: 0 }
      inner_product_param {
        num_output: 104
        weight_filler { type: "xavier" }

    Take an inner product so that the number of outputs equals the number of FM synth parameters,
  25. 71.

    and finally compute the Euclidean distance to the FM synth settings.

        num_output: 104
        weight_filler { type: "xavier" }
        bias_filler { type: "constant" value: 0 }
      }
    }
    layer {
      name: "loss/loss"
      type: "EuclideanLoss"
      bottom: "loss/classifier_"
      bottom: "label"
      top: "loss/loss"
      loss_weight: 1
    }
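    As a reference point for what this last layer measures, here is a sketch of a Euclidean loss over a batch. It assumes the form given in Caffe's documentation for EuclideanLoss, E = 1/(2N) · Σ ||prediction − label||² with N the batch size; the function itself is illustrative and not taken from Caffe.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// E = 1/(2N) * sum over the batch of squared differences, N = batch size.
double euclidean_loss(const std::vector<std::vector<double>>& predictions,
                      const std::vector<std::vector<double>>& labels) {
  assert(!predictions.empty() && predictions.size() == labels.size());
  double sum = 0.0;
  for (std::size_t n = 0; n < predictions.size(); ++n) {
    assert(predictions[n].size() == labels[n].size());
    for (std::size_t i = 0; i < predictions[n].size(); ++i) {
      const double diff = predictions[n][i] - labels[n][i];
      sum += diff * diff;
    }
  }
  return sum / (2.0 * static_cast<double>(predictions.size()));
}
```

    In the network above, each prediction and each label would be a vector of 104 values, one per FM synth parameter.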
  26. 74.

    Why ResNet-36?

    $ nvidia-smi
    Thu Nov 16 00:54:35 2017
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 1070    Off  | 00000000:01:00.0 Off |                  N/A |
    | 41%   69C    P2   118W / 151W |   6557MiB /  8114MiB |     99%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |    0     28991      C   ./resume                                    6547MiB |
    +-----------------------------------------------------------------------------+

    On a feeble GPU with only 8GB of memory, ResNet-50, the next smallest network after ResNet-36, did not fit in memory (with a batch size of 15). There is no VRAM left.
  27. 75.

    Training with Caffe's Solver

    void run(){
      caffe::SolverParameter solver_param;
      caffe::ReadProtoFromTextFileOrDie(FLAGS_solver, &solver_param);
      std::shared_ptr<caffe::Solver<float>> solver(caffe::SolverRegistry<float>::CreateSolver(solver_param));
      if (FLAGS_snapshot.size()) {
        // Continue from a saved solver state.
        LOG(INFO) << "Resuming from " << FLAGS_snapshot;
        solver->Restore(FLAGS_snapshot.c_str());
      } else if (FLAGS_weights.size()) {
        // Start from pretrained weights.
        CopyLayers(solver.get(), FLAGS_weights);
      }
      const auto net = solver->net();
      LOG(INFO) << "Solve start.";
      solver->Solve();
    }
  28. 76.

        return;
      }
      if( FLAGS_output.empty() ) {
        LOG(INFO) << "Required parameter output is not set." << FLAGS_snapshot;
        return;
      }
      const auto net = solver->net();
      float dummy = 0.f;
      constexpr size_t image_size = 400*400;
      std::vector< float > image( image_size, 0.f );
      // Turn the recorded instrument sound into a spectrum image.
      softfm::get_spectrum_image_flat(
        std::vector< std::pair< int, std::string > >{ { FLAGS_note, FLAGS_input } },
        image.data(), FLAGS_threshold );
      const auto input_layer = boost::dynamic_pointer_cast< caffe::MemoryDataLayer< float > >( net->layer_by_name("data") );
      input_layer->Reset( image.data(), &dummy, 1 );
      // One forward pass yields the estimated FM synth parameters.
      const auto result = net->Forward();
      const auto data = result[ 1 ]->cpu_data();
      const std::vector< float > values( data, std::next( data, result[ 1 ]->count() ) );
      namespace karma = boost::spirit::karma;
      std::string serialized;
      // Write the parameters as comma-separated floats.
      std::ofstream out( FLAGS_output, std::ios::out|std::ios::binary );
      std::ostreambuf_iterator< char > obuf( out.rdbuf() );
      karma::generate( obuf, karma::float_ % ',' , values );
    }

    Evaluate by passing the sound of a real instrument through the trained network.
  29. 84.

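    Results

    Starting from an already trained model, the computation is overwhelmingly faster than the genetic algorithm.

    Time required for the computation (the numeric values are missing from this transcript):

                               Genetic algorithm              Neural network
    OS                         Gentoo Linux                   Gentoo Linux
    Compiler                   nvcc                           gcc
    Library used for FFT       CuFFT                          FFTW (single precision, OpenMP, AVX)
    DNN framework              -                              Caffe
    CPU                        Intel Core i (Skylake)
    GPU                        nvidia GeForce GTX
    Memory                     host and device sizes in GB
    Length of the data         in seconds, with the sampling rate in kHz
    Termination condition      stop once a fixed generation   stop after a fixed number of
                               has been computed              forward computations
    Time required              hours, minutes and seconds     seconds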