
- The past, present, and future seen through the SSII Technology Map -

SSII 30th Anniversary Panel Discussion
Wednesday, June 12, 2024

https://confit.atlas.jp/guide/event/ssii2024/static/specialsession

Hironobu Fujiyoshi

June 12, 2024

Transcript

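  1. Self-introduction: Hironobu Fujiyoshi

    Education: completed the master's program at the Chubu University Graduate School; left the doctoral program at the Chubu University Graduate School after completing the required term (Ph.D.)
    Research career: postdoctoral researcher, Robotics Institute, Carnegie Mellon University, USA; lecturer, then associate professor, College of Engineering, Chubu University; visiting researcher, Robotics Institute, Carnegie Mellon University; professor, College of Engineering, Chubu University; SSII program committee chair; SSII executive committee chair; Machine Perception and Robotics Group, to the present
    Outside activities: director, Japan Deep Learning Association; cross-appointment (Denso)
    vol.162: Hironobu Fujiyoshi (Chubu University), "The future that '+AI' will change" (YouTube). Hironobu Fujiyoshi, Teresa Ikeda (Nogizaka46), Tom Kawada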
  2. An SSII anniversary project: organize the rapidly advancing image sensing technologies and extrapolate from them to preview the future

    SSII Technology Map 2014 (three maps):
    - Imaging: Masaki Suwa (OMRON SINIC X), Shinsaku Hiura (University of Hyogo). Surveys the input side of image sensing from the viewpoint of the various properties of light.
    - Recognition: Hironobu Fujiyoshi (Chubu University). Summarizes the evolution of keypoint detection and description, features, and classifiers, and introduces their trends.
    - 3D scene reconstruction: Akihiko Torii (Tokyo Institute of Technology). Summarizes the evolution of geometry, matching, and surface generation, and introduces their trends.
    [Figure: the three technology maps (editions dated 2014/05/06 and 2014/06/09) on a 2000-2015 time axis. The imaging map is organized into geometric information acquisition, optical information acquisition, and basic performance improvement. The recognition map is organized into feature extraction, pattern matching, keypoint detection and description, statistical learning methods, and nearest-neighbor search, with CPU, SSE, and GPU generations along the bottom.]
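  3. SSII2014 "Imaging" technology map: a look back (Masaki Suwa, OMRON SINIC X; Shinsaku Hiura, University of Hyogo)

    1. The three pillars of the imaging technology taxonomy (geometric information acquisition / optical information acquisition / basic performance improvement)
    - Wider recognition and use of ToF, LiDAR, polarization cameras, and similar devices
      → ToF and LiDAR have fully moved into the industrialization and social-implementation phase (driven by ADAS, autonomous driving, smartphone integration, and so on)
      → For ADAS LiDAR in particular, market expectations for solid-state designs that replace conventional mechanical scanning, cutting cost further while preserving optical efficiency, have remained high since 2014, but such designs have not yet become widespread
    - Rapidly accelerating spread of light-projection control devices (projectors) used for geometric and optical information acquisition
      → At the time we expected DLP (DMD) to carry the field alone, but the adoption of LCOS (reflective liquid crystal) in cinemas and theaters became the trigger, and deployment in factory automation (inspection equipment, measurement sensors, and so on) has progressed
    - For "reallocating resources as image sensors gain resolution" under basic performance improvement, no market formed for specialized products such as light field cameras
    2. New trends
    - Progress in research that applies ultra-high-resolution timing information (SPAD, event cameras) to uses other than distance measurement
      → NLOS (non-line-of-sight: technology for observing blind spots)
      → Pursuit of processing methods, such as ToF histograms and event frames, that do not integrate over the frame time
    - Expansion of what is imaged
      → Imaging of sound fields (using polarization cameras and high-speed cameras) and imaging of touch
      → For tactile imaging, recognition and use of visuotactile sensors in robot hand control is expanding
    [Figure: first page of the extended abstract "NLOS-NeuS: neural implicit surfaces for non-line-of-sight imaging" (Yuki Fujimura, Takahiro Kushida, Takuya Funatomi, Yasuhiro Mukaigawa; 26th Meeting on Image Recognition and Understanding), with a diagram of the NLOS setup: relay wall, NLOS scene, collocated light source and detector.]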
  4. SSII Technology Map 2014, "Recognition": a look back. Trends in keypoint detection and description
    (2014 slides: "Expectations and Possibilities for Image Recognition", Hironobu Fujiyoshi, Department of Information Engineering, College of Engineering, Chubu University)

    Technical areas covered by "Recognition"
    [Figure: "Recognition" technology map with the keypoint detection and description entries highlighted: SIFT (99), scale invariance; SURF (06), integral images, algorithmic speedup; GPU SIFT (06), hardware speedup; FAST (06), machine learning, corner detection; BRIEF (10), no learning, random sampling; ORB (11), unsupervised learning; D-BRIEF (12), supervised learning; BinBoost (13), supervised learning.]

    Keypoint detection and description: "Distinctive image features from scale-invariant keypoints" (SIFT) [Lowe]
    - Keypoint detection and description invariant to scale and rotation
    Points:
    - Keypoint detection with DoG (Difference of Gaussians)
    - Feature description with gradient orientation histograms
    [Figure: smoothed images at scales σ0, kσ0, k²σ0, k³σ0 and the DoG images between them; descriptor regions divided into blocks and orientations under a Gaussian window.]

    Speeding up SIFT: faster keypoint detection (faster scale space, faster through decision trees)
    - SURF: fast keypoint detection with an approximate Hessian matrix computed on integral images
    - FAST: corner detection accelerated by introducing machine learning (decision trees)
    - Spectral theory: faster and more accurate scale search based on spectral theory

    Speeding up SIFT: faster keypoint description by introducing binary features
    - BRIEF, ORB, CARD: binary feature description designed with distance computation in mind
    - D-BRIEF, BinBoost: the optimal binary pattern is obtained by supervised learning
    Points:
    - Making the features binary speeds up distance computation (Hamming distance) and allows SSE instructions to be used
    - Memory savings are achieved at the same time
    [Figure: ORB reference pairs; positive and negative samples for supervised learning in D-BRIEF.]
    A related talk is given in OS2 ("The Front Line of Object Recognition").
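The speedup from binary features noted on this slide comes from replacing floating-point distances with a bitwise XOR followed by a bit count. The following is a minimal sketch of that idea, assuming descriptors are stored as equal-length `bytes` objects; the function names and the toy 16-bit descriptors are illustrative and are not taken from BRIEF, ORB, or any library.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length binary descriptors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    # XOR leaves a 1 exactly where the two descriptors disagree.
    diff = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    return bin(diff).count("1")


def nearest_descriptor(query: bytes, database: list) -> int:
    """Return the index of the database descriptor closest to the query."""
    return min(range(len(database)),
               key=lambda i: hamming_distance(query, database[i]))


# Toy 16-bit descriptors (real binary descriptors are typically 128-512 bits).
d1 = bytes([0b10110010, 0b00001111])
d2 = bytes([0b10110000, 0b00001111])  # differs from d1 in one bit
d3 = bytes([0b01001101, 0b11110000])  # bitwise complement of d1

print(hamming_distance(d1, d2))          # 1
print(hamming_distance(d1, d3))          # 16
print(nearest_descriptor(d1, [d3, d2]))  # 1
```

On real hardware the XOR and bit count map onto a handful of SIMD or popcount instructions, which is where the SSE remark on the slide comes in; the pure-Python version above only shows the arithmetic.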
  5. SSII Technology Map 2014, "Recognition": a look back. Trends in local features and statistical learning methods: the evolution of handcrafted features

    Feature extraction and statistical learning: local image features combined with statistical learning methods
    - Face detection: Haar-like and sparse features + AdaBoost
    - Pedestrian detection: HOG + SVM
    - Human body part recognition: Random Forest
    - Image classification: SIFT + BoF + SVM
    Points:
    - Features are designed to fit the problem setting (handcrafted features)
    - From two-class problems to multi-class problems

    DPM, part-based object detection: "A Discriminatively Trained, Multiscale, Deformable Part Model" [Felzenszwalb]
    - Part-based object detection using a latent SVM
    Points:
    - An object is represented as a set of parts (Deformable Parts Model)
    - Pose variation is handled by taking the positional relations between parts into account
    [Figure: root filter, part filters, and the positional relations of the part filters.]

    Hash-based detection of 100,000 kinds of objects with binary codes: "Fast, Accurate Detection of 100,000 Object Classes on a Single Machine" [Dean]
    - Detects 100,000 kinds of objects in a matter of seconds
    Locality-sensitive hashing with WTA:
    - Compute a WTA code (e.g. 111101010011) from the HOG features
    - Split the WTA code into P parts
    - Look up the hash table for each of the P codes
    - Build a score histogram per class
    - Build a filter response map for each class
    Points:
    - Speeds up multi-class DPM
    - Uses WTA hashing over the set of parts to detect a very large number of classes

    [Figure: "Recognition" technology map with the local feature and statistical learning entries highlighted: Haar-like (01), texton (01), BoF (04), CHLAC (04), HOG (05), sparse features (06), CARD (11); SVM (95), AdaBoost (95), Random Forests (01), DPM (08).]
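The winner-take-all (WTA) hashing steps listed on this slide can be sketched in a few lines. This is a toy illustration under stated assumptions, not the detector from the cited paper: each code element records which of the first `k` entries of a randomly permuted feature vector is largest, and the code is then split into `p` bands that would serve as hash-table keys. The helper names, the feature vector, and all parameter values are hypothetical.

```python
import random


def make_permutations(dim, n_codes, seed=0):
    """Draw one random permutation of the feature indices per code element."""
    rng = random.Random(seed)
    perms = []
    for _ in range(n_codes):
        p = list(range(dim))
        rng.shuffle(p)
        perms.append(p)
    return perms


def wta_hash(x, perms, k):
    """For each permutation, keep the position of the maximum among the
    first k permuted entries of x (a value in 0..k-1)."""
    code = []
    for p in perms:
        window = [x[i] for i in p[:k]]
        code.append(max(range(k), key=window.__getitem__))
    return code


def split_code(code, p):
    """Split the code into p equal bands; each band is one hash-table key."""
    size = len(code) // p
    return [tuple(code[i * size:(i + 1) * size]) for i in range(p)]


perms = make_permutations(dim=6, n_codes=4, seed=0)
x = [0.3, 1.2, -0.5, 2.0, 0.9, 0.1]   # stand-in for a HOG feature vector
code = wta_hash(x, perms, k=3)
bands = split_code(code, p=2)

# The code depends only on the ordering of the values, so a monotonic
# rescaling of the features leaves it unchanged.
rescaled = [2 * v + 5 for v in x]
print(wta_hash(rescaled, perms, k=3) == code)
```

Because only rank order matters, the code is robust to changes in feature magnitude, which is one reason it suits hash-table lookup over a very large number of filters.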
  6. SSII Technology Map 2014, "Recognition": a look back. Recent trends (as of 2014): feature extraction with CNNs and the use of human knowledge

    Automatic acquisition of feature extraction and classifiers with deep learning: "Pedestrian Detection with Unsupervised Multi-Stage Feature Learning" [Sermanet]
    - Convolutional neural networks greatly improve pedestrian detection performance
    Points:
    - Sparse coding is used to train the convolutional neural network
    - Local and global features are extracted by merging the outputs of all layers
    → The feature extraction process itself is acquired automatically
    [Figure: feature extraction stage and classification stage; example filters of a convolutional layer (INRIA dataset); detection performance.]
    A related talk is given in OS2 ("The Front Line of Object Recognition").

    Feature extraction that uses human knowledge: "Fine-Grained Crowdsourcing for Fine-Grained Recognition" [Deng]
    - Features are described from the regions people paid attention to
    Points:
    - Fine differences are distinguished by having people select the regions that make classification easy
    - Crowdsourcing is used to collect a large amount of experience data
    [Figure: a game in which players select the regions they use to judge the answer. The area around each clicked coordinate changes from a blurred image to a color image, and the smaller the revealed area, the higher the score. Features are extracted from the regions selected in high-scoring plays, which are the regions that make classification easy.]
    A related talk is given in OS2 ("The Front Line of Object Recognition").

    [Figure: "Recognition" technology map with the recent-trend entries highlighted: Deep Learning (08), multilayer neural networks and representation learning, leading to automated feature extraction and automatic feature generation; Crowdsourcing (13), introducing human knowledge, leading to fine-grained image recognition.]
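  7. "Recognition" five years on (as of 2014): the working group's predictions at the time and their verification

    Predictions made in 2014 (from the slide "Recognition, five years from now"; originally published on SlideShare):
    - Fast multi-class classification: faster 100,000-category classification with WTA hashing
    - Fine-grained image description: 100,000-category classification
    - Adapting to samples outside the training set with zero-shot learning: deriving labels for unseen samples from related attributes; transfer learning, metric learning
    - Large-scale face recognition: faster deep neural networks
    - Sensory inspection and defect detection by a human-machine hybrid: fine-grained image recognition that brings in human knowledge
    - Acquiring invariance at keypoints: scale search, handling of affine changes

    As of 2024 (Fujiyoshi's personal view). Legend: ○ the prediction was correct, △ partly correct, × incorrect
    - △ Image classification has become many-class and models have become large-scale; pruning, knowledge distillation, and quantization are drawing attention as ways to speed them up
    - △ Methods that classify based on prototypes have been proposed and are effective in few-shot learning
    - △ CLIP made zero-shot classification possible, but it could not be foreseen that this would be achieved with vision-language models
    - △ Vision foundation models have emerged and grown large, but in terms of speed they are still developing
    - ○ Human-in-the-loop approaches for deep learning models are being actively studied
    - ○ Detection-free deep learning methods, trained end to end and not bound by keypoint constraints, have been proposed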
  8. "Recognition" five years on (as of 2014): predictions by experts and their verification

    Trends in image recognition five years ahead: Prof. Tatsuya Harada (The University of Tokyo)
    Attention will focus on the theoretical analysis of learning and classification methods with deep hierarchical structure, typified by deep learning, and on design principles for hierarchical structures based on that analysis. Building on these results, well-controlled images of generic objects found on the web will be recognized almost without error. At the same time, effort will shift to recognition in cluttered, more real-world situations, and the field will spread to more challenging problems.
    Problems expected to spread:
    - Summarizing video such as lifelogs, and fusion with natural language processing. As wearable devices spread, video summarization technology that makes effective use of time-series data will advance. Beyond improving recognition accuracy by understanding the preceding and following context, estimating interesting shots that match human sensibility and fusing vision technology with the grammatical knowledge systems cultivated in natural language research will provide a foothold for video summarization, which has been considered a hard problem.
    - Content generation, and fusion with graphics. Image recognition can be seen as the ultimate information compression technology, squeezing millions of pixels into a single category. Through fusion with graphics, content generation technology that restores real-world information from the compressed information will advance. This provides a foothold for information presentation in which the content can be grasped instantly from a single figure, without reading a long text.
    - Robot vision, and fusion with robotics (control). When well-controlled recognition targets are presented, sufficiently high classification performance is achieved. When the targets are not fixed in advance and cluttered images are input, however, the conventional generic object recognition framework has poor accuracy and is far from usable. By exploiting the robot's body, combining a gaze function that finds the objects to be recognized with robot control technology, truly active recognition and learning in the real world will develop.
    - Privacy-aware image recognition. Wearable devices are expected to develop, but image sensors carry the risk that privacy-violating images are captured unintentionally and shared on the web, which hinders the spread of wearable devices with image sensors. Against this background, technology will develop that hides all privacy information while keeping content such as the types of objects in the image sufficiently understandable. It will be indispensable for labeling images and video from wearable systems through crowdsourcing, for sharing on the web, and for remote operation while viewing images. It should work even with coarse, cluttered information.

    Trends in image recognition five years ahead: Prof. Tae-Kyun Kim (Imperial College London)
    - Combination of RF and deep learning: fusing Random Forest and deep learning. Example: Decision Forest [Shotton]
    - Long-term continuous learning: never-ending image learning, that is, realizing an image learning framework that never stops
    [Figure: excerpt from a paper on rooted binary decision DAGs, showing a DAG that classifies image patches as grass, cow, or sheep. Slide annotation: binary trees are connected as a network, which saves memory and avoids overfitting.]

    Verification as of 2024:
    - ○ Accurately predicted research that is now mainstream, such as fusion with natural language, graphics, and robotics
      → Fusion with natural language processing in particular has made great progress
    - ○ Pointed toward today's development of self-supervised learning and continual learning, which play an important role in building current foundation models
    Examples: NeRF, CLIP, self-supervised learning and continual learning, UniAD (autonomous driving realized end to end)
  9. SSII Technology Map 2024: fields (the present, 2024)

    Self-supervised learning, super-resolution, object detection, 3D foundation models, NeRF / Gaussian Splatting, robotics foundation models, large-scale distributed learning, learning with synthetic data, point cloud processing and 3D object recognition, Visual SLAM, video recognition and action recognition, event cameras, pruning, distillation, Diffusion Model / image generation, imaging technology, model architecture (ViT), layout generation, creative vision
    → Research in this field has become deeper and more finely subdivided

    [Figure: SSII 30th anniversary technology map, knowledge distillation. Axes: year; combination of models; type of knowledge and how it is transferred. Legend: teacher = trained model, student = untrained model; parameters fixed or updated; offline distillation (the teacher's knowledge is transferred to the student) and online distillation (knowledge is transferred among students only). Starting points: Model compression [Bucilǎ, SIGKDD], which trains one neural network using an ensemble's outputs as labels, and Dark Knowledge / Knowledge Distillation [Hinton, NIPSW], which trains the student using the teacher's probability distribution (knowledge). Branches: using intermediate-layer outputs, which carry more diverse information (FitNets, Attention Transfer, Flow of Solution Procedure, RKD); improving how intermediate-layer knowledge is transferred (VID, CRD, AFD, Knowledge Review, MGD, Knowledge Diffusion); improving how output-layer knowledge is transferred (Preparing Lessons, Gradual Sampling Gate, DIST); rethinking distillation as function matching (Function Matching); ensembles of multiple teachers (Multiple Teacher, FEED) and of multiple students (Large scale distributed, Dualnet); learning with students only (DML, ONE, Collaborative learning); self-distillation (Be your own teacher, BAN, Data distortion guided self-distillation); staged transfer and the capacity gap problem (Teacher Assistant, RCO, On the efficacy, Residual KD); transfer between different model structures (DeiT, One-for-All); aggregating teachers with different classes or tasks (Student becoming the master, Amalgamating Knowledge); multiple foundation models as teachers (AM-RADIO: DINOv2, CLIP, SAM); automatic design of knowledge distillation (KTG, Ensemble KTG, AutoKD, KD-Zero, Oracle Knowledge Distillation); knowledge designed for specific tasks, training schemes, and models (CLIP-KD, MiniViT, Manifold Distillation, SEED, Large scale incremental learning, segmentation, object detection); other: dataset distillation (Dataset Distillation, Data distillation).]

    [Figure: SSII 30th anniversary technology map, pruning. Timeline: the appearance of CNN models (LeNet [LeCun]), the beginning of pruning, the development of CNN models (AlexNet [Krizhevsky, NIPS]), the appearance of Transformer models (Attention Is All You Need [Vaswani, NIPS]). Criteria: gradient-based (pruning based on computed gradients) and magnitude-based (pruning based on the size of the weight parameters). Granularity: structured, semi-structured, and unstructured pruning. Timing: before training (evaluate the weights of the initial model), during training (learn the model and a mask, for example with sparse regularization), after training (evaluate the weights of the trained model), at inference (dynamically select the weights to use depending on the input). Tags: explainability-aware, layer-wise pruning, no fine-tuning needed after pruning, combined with other compression methods, prunable before training. Groups: CNN, Transformer, and the lottery ticket hypothesis. Representative entries: Optimal Brain Damage [LeCun, NIPS], Optimal Brain Surgeon [Hassibi & Stork, NIPS], Learning Both Weights and Connections for Efficient Neural Networks [Han], Network Slimming [Liu, ICCV], SNIP [Lee, ICLR], SynFlow [Tanaka, NIPS], The Lottery Ticket Hypothesis [Frankle, ICLR], SparseGPT [Frantar, ICML]. Hardware support for pruning: EIE [S. Han, ISCA], supporting weight sharing and sparse matrix operations; NVIDIA Ampere [NVIDIA, GTC], implementing a feature that accelerates inference after pruning; DRP-AI [Renesas, ISSCC], the AI accelerator built into the RZ/V2H microprocessor.]

    (Created by Chubu University MPRG and Denso IT Lab)
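The starting point of the distillation map, training a student from the teacher's probability distribution, can be written as a loss between temperature-softened distributions. The sketch below follows the commonly used formulation (KL divergence scaled by the square of the temperature); the function names, the temperature of 4, and the toy logits are illustrative assumptions, and in practice this term is usually combined with an ordinary cross-entropy loss on the true labels.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax with temperature; higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                      # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * (math.log(t) - math.log(s))
             for t, s in zip(p_teacher, p_student))
    return temperature ** 2 * kl


teacher = [1.0, 2.0, 3.0]
print(distillation_loss([1.0, 2.0, 3.0], teacher))  # 0.0: student matches teacher
print(distillation_loss([3.0, 2.0, 1.0], teacher))  # positive: student disagrees
```

The softened distribution keeps the relative probabilities the teacher assigns to wrong classes, which is the "dark knowledge" the hard label alone does not carry.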
  10. The present, 2024: foundation models

    - Foundation model
      A deep learning model trained on large and broad data that can be transferred to a variety of downstream tasks
      It could not be predicted in 2014 and is now one of the mainstream technologies
      e.g. Segment Anything Model (SAM) [Kirillov, ICCV], DINOv2 [Oquab, arXiv]
    - Making foundation models multimodal
      CLIP (Contrastive Language-Image Pre-training) [Radford, ICML]
      Extension to zero-shot classification and open-vocabulary recognition
    [Figure: CLIP overview. (1) Contrastive pre-training: an image encoder and a text encoder produce embeddings I1..IN and T1..TN, and the matrix of products Ii·Tj is trained so that matching pairs score highest. (2) Create dataset classifier from label text: labels such as plane, car, dog, bird are inserted into the prompt "A photo of a {object}." and passed through the text encoder. (3) Use for zero-shot prediction: the image embedding I1 is compared against T1..TN and the best match, "A photo of a dog.", is selected.]

    Why is zero-shot image classification possible?
    - Features of CIFAR-10 images and of prompts are reduced in dimension with UMAP and visualized
    - The features are separated by class, and images and prompts are embedded at the same locations
    [Figure: UMAP plot of image features and prompt features for the prompts "a photo of a airplane", "a photo of a automobile", "a photo of a bird", "a photo of a cat", "a photo of a deer", "a photo of a dog", "a photo of a frog", "a photo of a horse", "a photo of a ship", "a photo of a truck".]

    Contrastive learning with CLIP: associating image features with language → open-vocabulary recognition
    [Figure: excerpt from OVSeg [F. Liang, CVPR]: open-vocabulary segmentation with user-defined queries (Saturn V, blossom; golden gate, yacht; Oculus, Ukulele), and an example of class-definition ambiguity in evaluation (GT: building / Pred: skyscraper; GT: rail / Pred: road).]
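Steps (2) and (3) of the CLIP figure, building a classifier from label text and using it for zero-shot prediction, reduce to a cosine-similarity lookup once both encoders have produced embeddings. The sketch below assumes the embeddings are already available: the three-dimensional vectors are made-up stand-ins for real encoder outputs, and the scale of 100 applied before the softmax is an assumption rather than a value taken from this deck.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def zero_shot_classify(image_emb, text_embs, labels, scale=100.0):
    """Pick the label whose prompt embedding is most similar to the image."""
    img = normalize(image_emb)
    sims = [sum(a * b for a, b in zip(img, normalize(t))) for t in text_embs]
    m = max(sims)
    exps = [math.exp(scale * (s - m)) for s in sims]   # stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs


labels = ["plane", "car", "dog", "bird"]
prompts = [f"A photo of a {name}." for name in labels]

# Hypothetical embeddings; a real system would obtain them from the
# text encoder (one per prompt) and the image encoder.
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.7, 0.0, 0.7]]
image_emb = [0.1, 0.05, 0.9]

label, probs = zero_shot_classify(image_emb, text_embs, labels)
print(label)  # dog
```

No classifier weights are trained for the target classes: changing the label list changes the classifier, which is what makes the open-vocabulary extension on this slide possible.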
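  11. • Research prompted by creating the SSII Technology Map: our (MPRG) work

    Incorporating human knowledge into deep learning models
    Attention Branch Network [Fukui, CVPR] [Mitsuhara, IEICE]
    Attention maps are edited by hand using human knowledge, and ABN is fine-tuned with them
    → a hybrid approach that combines data-driven AI models with humans
    [Figure: cropped excerpt from the paper. "... the attention map is 14×14. Next, to correct the attention map, it is enlarged to 224×224 and corrected by hand. In the example of Fig. 2, an image whose ground-truth label is "Dalmatian" was input to ResNet-152+ABN and misrecognized as "Soccer ball" because the image contains two objects. Visualizing the attention map at this point confirms that the network attends strongly to the "Soccer ball". ..." Fig. 2: How the attention map is corrected.]
    A discovery-oriented research approach based on search
    Knowledge Transfer Graph [Minami, ACCV] [Okamoto, ECCV]
    Knowledge distillation between multiple models used to be limited in scope because researchers designed it by hand. Optimizing it by hyperparameter search led to the discovery of new knowledge transfer methods.
    [Figure: knowledge transfer graph with nodes m1, m2, m3, label losses L(ŷ,1), L(ŷ,2), L(ŷ,3) and inter-node losses L(1,2), L(1,3), L(2,1), L(2,3), L(3,1), L(3,2). The hyperparameter search covers:
      Gate functions: • Through Gate • Cutoff Gate • Linear Gate • Correct Gate
      Node under evaluation: • ResNet
      Auxiliary nodes: • ResNet • ResNet • WideResNet]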
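The manual correction described in the paper excerpt (enlarge the 14×14 attention map to 224×224, then edit it by hand) can be sketched as follows. This is a minimal illustration under stated assumptions: nearest-neighbour enlargement is assumed because the excerpt does not say which interpolation is used, the human edit is modelled as a boolean erase mask, and all function names are hypothetical.

```python
import numpy as np

def upsample_nearest(att, out_size=224):
    """Enlarge a square attention map by repeating each cell (nearest neighbour)."""
    factor = out_size // att.shape[0]
    return np.kron(att, np.ones((factor, factor)))

def apply_manual_edit(att_large, erase_mask):
    """Zero the attention wherever the annotator marked a region to ignore."""
    edited = att_large.copy()
    edited[erase_mask] = 0.0
    return edited

# Hypothetical 14x14 map with two hot spots (e.g. the dog and the ball).
att = np.zeros((14, 14))
att[3:6, 3:6] = 0.6      # region on the intended object
att[9:12, 9:12] = 1.0    # region on the distractor

att_large = upsample_nearest(att)            # shape (224, 224)
erase = np.zeros_like(att_large, dtype=bool)
erase[144:192, 144:192] = True               # annotator erases the distractor
edited = apply_manual_edit(att_large, erase)
```

The edited map would then serve as the target for the attention branch during fine-tuning; that training step is outside this sketch.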
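  12. Toward "think like an amateur, do as an expert" (素人発想玄人実行)

    • Proposed the knowledge transfer graph: collaborative learning with knowledge transfer graphs [Minami, MIRU, ACCV]
    Rather than fixing the problem setting and the solution method in advance, researchers apply their expertise to design a framework that can express knowledge transfer.
    Running an optimization search over that framework then uncovers new findings.
    Research methods for the Software 2.0 era: the computer searches a vast parameter space for a program that satisfies the specification.
    "Think like an amateur, do as an expert"
    We expect the new findings obtained through search to become the starting point for the next round of research.
    [Figure labels: research space; research so far; complexity of research; "think like an amateur, do as an expert".]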
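The idea of designing a framework and then searching over it can be sketched for a three-node knowledge transfer graph. This is a hedged illustration only: the slides name the four gates but do not define them, so the gate definitions below are assumptions, and `sample_graph`, `node_loss`, and `LABEL` are hypothetical names introduced for the example.

```python
import random
import numpy as np

# Each gate rescales the per-sample losses flowing along one edge (source -> target).
# The definitions below are assumptions made for illustration only.
def through_gate(losses, **_):
    return losses                            # pass the loss through unchanged

def cutoff_gate(losses, **_):
    return np.zeros_like(losses)             # switch the edge off

def linear_gate(losses, step, total_steps, **_):
    return losses * (step / total_steps)     # ramp the loss up during training

def correct_gate(losses, source_correct, **_):
    return losses * source_correct           # keep samples the source got right

GATES = {"through": through_gate, "cutoff": cutoff_gate,
         "linear": linear_gate, "correct": correct_gate}
LABEL = -1  # pseudo-node standing for the ground-truth label

def sample_graph(num_nodes, rng):
    """Randomly assign one gate to every edge, including label -> node edges."""
    edges = {}
    for target in range(num_nodes):
        sources = [LABEL] + [s for s in range(num_nodes) if s != target]
        for source in sources:
            edges[(source, target)] = rng.choice(sorted(GATES))
    return edges

def node_loss(target, edges, edge_losses, step, total_steps, correct):
    """Sum the gated mean losses of all edges pointing at `target`."""
    total = 0.0
    for (source, tgt), gate in edges.items():
        if tgt != target:
            continue
        gated = GATES[gate](edge_losses[(source, tgt)], step=step,
                            total_steps=total_steps, source_correct=correct[source])
        total += float(gated.mean())
    return total

graph = sample_graph(num_nodes=3, rng=random.Random(0))  # one candidate to evaluate
```

In a full search, each sampled graph would be trained and scored on the node under evaluation, and the best-scoring gate assignment kept. The training loop is omitted here.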
  13. • Sounded out Prof. Tae-Kyun Kim (KAIST) once more: toward the future

    "Hi! It's great to hear from you. Yes, the continuous learning and self-supervised learning approaches are indeed pivotal for the development of AGI. Google's Pathways architecture is particularly fascinating, as it aims to leverage a mixture of experts (MoE) strategy by training numerous neural networks on a variety of tasks and selectively activating a few for each inference. This method, while computationally intensive, represents a promising step towards AGI by efficiently managing computational resources and creating more specialized pathways for different inputs.
    The idea of combining a binary decision tree with deep neural networks to form a hierarchical MoE is also intriguing. It blends the interpretability of random forests with the powerful learning capabilities of deep neural networks, potentially offering a more transparent yet highly effective model. This hybrid approach could address some of the black-box issues inherent in deep learning models while maintaining high performance across diverse tasks.
    It’s an exciting time for AI research, and these advancements are paving the way for more sophisticated and capable AI systems."
    (The slide also shows a Japanese translation of this reply.)
    Google Pathways (YouTube video)
    MoE (Mixture of Experts) strategy toward AGI
    An approach that combines multiple expert models to improve overall performance.
    • Efficient computation: only a subset of the expert models is activated for each input, so the computational cost is lower than that of a comparably large model.
    • Better generality: different experts specialize in different tasks, so the model as a whole can handle a wide range of tasks.
    • Learning different modalities (text, images, audio, etc.) jointly makes it possible to combine information from different data sources and reach a more comprehensive understanding.
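The "activate only a few experts per input" point can be sketched with a top-k router. This is a minimal illustration, not the Pathways implementation: the experts are plain linear maps, the router is a single matrix, and the names `moe_forward`, `gate_w`, and `expert_ws` are hypothetical.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def moe_forward(x, gate_w, expert_ws, k=2):
    """Route one input vector through the k highest-scoring experts.

    x: (D,) input; gate_w: (D, E) router weights; expert_ws: list of E (D, D) matrices.
    Returns the mixed output and the indices of the experts that actually ran.
    """
    scores = x @ gate_w                       # one routing score per expert
    top = np.argsort(scores)[-k:]             # indices of the k best experts
    weights = softmax(scores[top])            # renormalise over the chosen ones
    out = sum(w * (x @ expert_ws[i]) for w, i in zip(weights, top))
    return out, sorted(top.tolist())

rng = np.random.default_rng(0)
D, E = 4, 8
x = rng.normal(size=D)
gate_w = rng.normal(size=(D, E))
expert_ws = [rng.normal(size=(D, D)) for _ in range(E)]
out, used = moe_forward(x, gate_w, expert_ws, k=2)
print(f"ran {len(used)} of {E} experts: {used}")
```

Only k of the E expert matrices are multiplied for each input, so the per-input cost grows with k rather than with the total number of experts.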
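  14. • Looking back on creating the SSII Technology Map (personal impressions)

    👎 Planning it was the easy part; carrying it out was hard work
    👎 Many other technologies could not be included on the map
    👌 It let me identify goals for what to work on
    • It has become hard for one person to keep track of technology trends
      Make use of researcher and developer communities that cross organizational boundaries, such as SSII and cvpaper.challenge
      A learning environment suited to the tortoise → the SSII community
        (i) you can pursue the subjects and hobbies you love to the fullest,
        (ii) doing so earns the respect of those around you, and
        (iii) by teaching that skill to your peers, not only they but you yourself can deepen your learning
      Kohei Itoh, "The Crisis of Universities and Japan: A Reconsideration" (大学と日本の危機 - 再考), https://www.keio.ac.jp/ja/about/president/blog
    The past, present, and future as seen through the SSII Technology Map
    Let's build the SSII Technology Map together again at the SSII of years to come!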
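  15. SSII Technology Map: contributors

    (In no particular order; affiliations as of that time)
    Coordinators:
      Masaki Suwa (Omron)
      Shinsaku Hiura (Osaka University)
      Hironobu Fujiyoshi (Chubu University)
      Akihiko Torii (Tokyo Institute of Technology)
    Contributors:
      Mitsuru Ambai (Denso IT Laboratory)
      Go Irie (NTT)
      Masakazu Iwamura (Osaka Prefecture University)
      Tae-Kyun Kim (Imperial College London)
      Takumi Kobayashi (National Institute of Advanced Industrial Science and Technology)
      Fumihiko Sakaue (Nagoya Institute of Technology)
      Ken Sakurada (Tohoku University)
      Atsushi Shimada (Kyushu University)
      Shigeki Sugimoto (Tokyo Institute of Technology)
      Toru Tamaki (Hiroshima University)
      Chikao Tsuchiya (Nissan Motor)
      Hajime Nagahara (Kyushu University)
      Hideki Nakayama (The University of Tokyo)
      Tatsuya Harada (The University of Tokyo)
      Seiji Hotta (Tokyo University of Agriculture and Technology)
      Yasuhiro Mukaigawa (Nara Institute of Science and Technology)
      Osamu Yamaguchi (Toshiba)
      Takayoshi Yamashita (Chubu University)
      Hayaru Shouno (The University of Electro-Communications)
      Masato Kazui (Samsung R&D Institute Japan)
      Keiji Hanawa (National Agriculture and Food Research Organization)
      Masaki Fukuchi (Sony)
      Yoshihisa Ijiri (Omron)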