
The 7th All-Japan Computer Vision Study Group: A Multiplexed Network for End-to-End, Multilingual OCR

Yamato.OKAMOTO

July 28, 2021
Transcript

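  1. Self-introduction (briefly!)
     Yamato Okamoto
     Twitter: Roadroller_DESU
     ✓ Majored in image recognition at Kyoto University as a student
     ✓ At OMRON, worked on everything from BizDev to R&D as a "dual-wielding" specialist in both business and technology
     ✓ Now a member of the Computer Vision Lab Team at LINE Corporation (★New)
     ➡ Currently building the team, aiming to make LINE CV Lab "the most fun team to work in in the universe"
     ➡ Dream: to create a research hub in Kyoto that people aspire to join
     ※ Today's talk introduces a paper that is publicly available.
     ※ It is unrelated to my employer.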
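  2. The paper introduced here
     • "A Multiplexed Network for End-to-End, Multilingual OCR" (Facebook AI)
       https://arxiv.org/pdf/…
     ✓ A study of optical character recognition (OCR)
     ✓ Earlier work mostly focused on the Latin alphabet
     ✓ Multilingual support was usually brute force: simply growing the number of output classes from tens to thousands
     ✓ This paper proposes a smarter multilingual model that covers eight languages in total, including Arabic and Japanese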
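  3. Before the paper: let me carefully settle the terminology
     Question: does OCR detect a Sentence? A Word? A Character?
     • Sentence ➔ "I am done with mankind JOJO" (punctuation lost in the transcript)
     • Word ➔ "I", "am", "done", "with", "mankind", "JOJO", plus the punctuation marks
     • Character ➔ "m", "a", "n", "k", "i", "n", "d"
     Answer: none of these
     • OCR detects strings in structural units, not semantic units (this is also called instance-level)
     • From here on, this deck calls a string that forms a reasonably coherent structural group a "Text"
     • Working on one character at a time (Character) requires different annotations and methods, so the two are kept clearly separate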
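  4. Before the paper: which annotations are used
     • Text label ← not drawn in the image on the slide; here "BALLYS" is the Text label. This is usually available.
     • Text-level bounding box (blue)
     • Text polygonal bounding box (red)
     • Bounding box that encloses the polygonal bounding box with a minimum rectangle (green) ← used when the blue box is not available
     • Character-level bounding box (yellow) ← almost never available; in most cases the data and ground-truth labels are generated by image synthesis.
     • Character label (red letters on a yellow fill) ← almost never available; in most cases the data and ground-truth labels are generated by image synthesis.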
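The green box in the list above (a rectangle that encloses the polygonal annotation) can be derived from the polygon alone. A minimal sketch, assuming an axis-aligned rectangle; the helper name `enclosing_box` and the sample polygon are hypothetical, and the slide's "minimum rectangle" may instead mean a rotated minimum-area rectangle.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def enclosing_box(polygon: List[Point]) -> Tuple[float, float, float, float]:
    """Return (x_min, y_min, x_max, y_max) of the axis-aligned box around a polygon."""
    if not polygon:
        raise ValueError("polygon must contain at least one vertex")
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical polygon annotation around a slightly curved word.
poly = [(10.0, 20.0), (60.0, 12.0), (110.0, 22.0),
        (108.0, 40.0), (60.0, 30.0), (12.0, 38.0)]
box = enclosing_box(poly)
print(box)
```

An axis-aligned box is the cheapest fallback when no text-level box was annotated, at the cost of including some background around curved or slanted text.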
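  5. What is Text Detection?
     Extract every Text region in the image as a rectangle or a polygon.
     Example 1: detect with a sliding window, as in Fast-RCNN
     • Existing detection models such as Fast-RCNN can be reused
     • When the text is slanted, the rectangle becomes larger than necessary
     • Features from outside the text region often get mixed in
     Example 2: segment the image into [Text / NOT Text]
     • Annotation is labor-intensive
     • With an added step such as masking, features can be extracted from the text region only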
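The point about slanted text can be made concrete with a little geometry. The sketch below, under the assumption of a hypothetical 200x20 pixel text line, computes how much area an axis-aligned box needs once the line is rotated; the function name is illustrative.

```python
import math


def axis_aligned_area(width: float, height: float, angle_deg: float) -> float:
    """Area of the axis-aligned box enclosing a width x height rectangle rotated by angle_deg."""
    theta = math.radians(angle_deg)
    c, s = abs(math.cos(theta)), abs(math.sin(theta))
    box_w = width * c + height * s
    box_h = width * s + height * c
    return box_w * box_h


text_area = 200.0 * 20.0  # hypothetical text line of 200x20 px
for angle in (0, 15, 30, 45):
    ratio = axis_aligned_area(200.0, 20.0, angle) / text_area
    print(f"{angle:>2} deg: enclosing box is {ratio:.2f}x the text area")
```

The extra area is background, which is why rectangle-based detectors tend to pass non-text features to the recognizer when text is rotated.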
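  6. What is Text Recognition?
     Recognize the sequence of Characters contained in a detected Text region (the slide's example image reads "LOVE").
     Example 1: predict Characters with semantic segmentation
     • Segmentation over each Character class plus a background class
     • Regions predicted as foreground with high confidence are assumed to contain a character
     • The Character class is decided by a majority vote over all pixels
     • Character-level annotation is required, so the labeling effort is large
     • Another issue is that the order of the characters is not obtained
     Example 2: output one character at a time with a Seq2Seq (Enc-Dec) model
     • Convert the features of the Text region into sequence form
     • Infer with an RNN-based module (LSTM, GRU, etc.)
     • A Spatial Attention structure is often added
     • No Character-level annotation is needed, so it is easy to adopt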
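The majority vote in the segmentation-based approach can be sketched in a few lines. This is an illustration only: the class ids, the `BACKGROUND` constant and the tiny class map are assumptions, not values from any of the cited papers.

```python
from collections import Counter
from typing import List, Optional

BACKGROUND = 0  # assumed id of the background class


def pixel_vote(class_map: List[List[int]], region: List[List[bool]]) -> Optional[int]:
    """Majority class among non-background pixels inside `region`; None if there are none."""
    votes = Counter(
        class_map[r][c]
        for r in range(len(class_map))
        for c in range(len(class_map[r]))
        if region[r][c] and class_map[r][c] != BACKGROUND
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Per-pixel predicted class ids for one character candidate (hypothetical).
class_map = [
    [0, 12, 12, 0],
    [0, 12, 15, 0],
    [0, 12, 12, 0],
]
region = [[False, True, True, False] for _ in range(3)]
winner = pixel_vote(class_map, region)
print(winner)
```

The vote yields one label per candidate region but no reading order, which matches the drawback noted on the slide.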
  7. Methods frequently cited in OCR
     E2E-MLT
     • Czech Technical University, Carnegie Mellon University
     • "E2E-MLT - an Unconstrained End-to-End Method for Multi-Language Scene Text"
     • https://arxiv.org/pdf/…
     CharNet
     • Malong Technologies
     • "Convolutional Character Networks"
     • https://arxiv.org/pdf/…
     TextSpotter
     • Facebook AI, Huazhong University
     • "Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting"
     • https://arxiv.org/pdf/…
     CRAFTS
     • Clova AI Research, NAVER Corp.
     • "Character Region Attention For Text Spotting"
     • https://arxiv.org/pdf/…
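  8. E2E-MLT (ACCV)
     Whole-image feature extraction: an FPN (Feature Pyramid Net) built on ResNet34.
     Text region detection: at each coordinate of the 1/4-scale map, infer Text/NOT Text, the b-box, and the angle. (No anchors are used.)
     Text-region feature extraction: to reduce rotation and distortion, a spatial transformer layer is applied to the detected b-box while its parameters are estimated.
     Text recognition: the recognizer is built from conv layers. Input: features whose width alone is variable. Output: log-softmax over the character set (about 7,500 characters). The number of output characters grows in proportion to the feature width.
     Training tricks: multilingual training data is built by image synthesis.
     ▲ Overview of the proposed model. The detection part and the recognition part are arranged in series.
     ▲ Synthetic images for training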
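Because the recognizer emits one log-softmax vector per feature column, the raw output is longer than the final string and has to be collapsed. The sketch below assumes CTC-style greedy decoding with a blank class at index 0; the slide does not state the decoding rule, so treat `BLANK`, `greedy_decode` and the toy scores as assumptions.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the blank class


def greedy_decode(log_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Pick the best class per feature column, merge repeats, drop blanks."""
    best = [max(range(len(col)), key=lambda i: col[i]) for col in log_probs]
    chars: List[str] = []
    prev = BLANK
    for idx in best:
        if idx != BLANK and idx != prev:
            chars.append(alphabet[idx - 1])  # class k maps to alphabet[k - 1]
        prev = idx
    return "".join(chars)


def column(best_idx: int, n_classes: int = 5) -> List[float]:
    """Toy score column that clearly prefers `best_idx` (not a normalized distribution)."""
    return [0.0 if i == best_idx else -10.0 for i in range(n_classes)]


# Eight feature columns for a four-letter word: wider features give more columns.
log_probs = [column(i) for i in (1, 1, 0, 2, 3, 3, 0, 4)]
decoded = greedy_decode(log_probs, "LOVE")
print(decoded)
```

With this scheme the number of columns only bounds the output length, so a wider text region can carry a longer string without changing the network.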
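  9. CharNet (CVPR)
     ▲ Compared with Mask TextSpotter and CRAFTS, recognition of the text and of the characters is done in parallel in a single pass.
     Whole-image feature extraction: a combination of ResNet-50 and Hourglass networks (Newell 2016).
     Text region detection, text-region feature extraction, and text recognition: a Text Detection Branch and a Character Branch are placed in parallel.
     ・Text Detection Branch: applies existing methods that also handle slanted and curved text (EAST, TextField). Outputs the text regions.
     ・Character Branch: three modules arranged in parallel
       - (1) [Text/NOT text] segmentation
       - (2) Character detection with b-boxes
       - (3) Multi-class segmentation over the character set
       Outputs the b-box and label of each Character.
     The final output is the set of characters contained in each detected text region (so, strictly speaking, the output is not a text string).
     Training tricks: training needs both Text-level and character-level annotations, so the model is first trained on synthetic data. It is then trained on real data with Weakly Supervised Learning.
     ▼ The Text Detection Branch detects text regions.
     ▼ The Character Branch runs its three internal processes in parallel to perform Character-level detection and recognition.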
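  10. TextSpotter v3 (ECCV)
     ▲ The model is trained with shrunken text regions as ground truth. This keeps neighboring texts from merging (the region only needs to be un-shrunk before the next step).
     ▲ The features are masked with a binary (0 or 1) mask.
     ▼ The string is obtained in two ways. The lower method on the slide (SAM) needs no character-level annotation.
     TextSpotter (v1~v3)
     Whole-image feature extraction: FPN on a ResNet50 base (v2); U-Net on a ResNet50 base (v3).
     Text region detection: anchor-based detection built on Fast-RCNN (v2); Text/NOT Text segmentation (v3).
     Text-region feature extraction: RoI Align applied to the rectangles detected by the anchors (v2); the segmentation result is cropped with a minimum rectangle, then RoI Align and a mask are applied to the features (v3).
     Text recognition:
       (1) Run segmentation over each character class plus background, then decide the character by majority vote (Pixel Voting) within each candidate character region.
       (2) Convert to sequential features and output the string with an attention-based seq2seq model.
       After obtaining both predictions (1) and (2), adopt the one with the higher confidence.
     Training tricks: (2) can be trained without Character-level annotation (※ it is still needed to train (1)).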
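The final selection between the segmentation-based and the seq2seq predictions is a simple comparison. A minimal sketch, assuming both branches report a confidence on a comparable scale; the `Prediction` type, the tie-breaking rule, and the sample values are hypothetical.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    text: str
    confidence: float  # assumed comparable across both recognizers


def pick_prediction(seg_pred: Prediction, seq_pred: Prediction) -> Prediction:
    """Keep whichever branch is more confident; ties go to the seq2seq branch."""
    return seg_pred if seg_pred.confidence > seq_pred.confidence else seq_pred


chosen = pick_prediction(Prediction("BALLYS", 0.71), Prediction("BALLY5", 0.64))
print(chosen.text)
```

Breaking ties toward the seq2seq branch is an arbitrary choice made here for determinism; the slide does not say how ties are handled.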
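  11. CRAFTS (ECCV)
     ▲ Rectification by TPS (in practice, the feature map is what gets transformed).
     ▲ Once the character regions are predicted, the polygon region can be derived from them.
     ▼ Real data has no character-level annotation, so training on it uses Weakly Supervised Learning with pseudo-labels.
     CRAFTS
     Whole-image feature extraction: a U-Net structure on a ResNet50 base.
     Text region detection: trained on synthetic data with custom ground-truth labels:
       (1) a Gaussian score whose origin is the character center
       (2) a score indicating the link between adjacent characters
       (3) the character orientation
       The text region is extracted as a polygon from predictions (1) and (2) by a fixed computation.
     Text-region feature extraction: the features are concatenated with predictions (1) and (2). A thin-plate spline then maps the polygon-shaped text region to a fixed-size rectangle, from which the features are extracted.
     Text recognition: convert to sequential features and output the string with an attention-based seq2seq model.
     Training tricks: custom training targets are built from the text annotations, and the model is trained to predict them.
     ▲ When the synthetic training data is created, custom ground-truth labels like the ones above are generated for training. For example, the Link Score contributes to better recognition of vertically written text.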
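Label (1), the Gaussian score centered on a character, can be illustrated as follows. This is a simplified sketch that assumes an isotropic Gaussian evaluated directly on the pixel grid; the function name, grid size and `sigma` are hypothetical, and the original method may shape the Gaussian to fit each character box.

```python
import math
from typing import List


def gaussian_score_map(height: int, width: int, cy: float, cx: float,
                       sigma: float) -> List[List[float]]:
    """Score in (0, 1] that peaks at the character center (cy, cx) and decays with distance."""
    return [
        [math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


score = gaussian_score_map(height=5, width=5, cy=2, cx=2, sigma=1.0)
for row in score:
    print(" ".join(f"{v:.2f}" for v in row))
```

A soft target like this tells the network where a character is centered without requiring a hard pixel mask per character.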
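  12. Idea 1: multilingual support through Multi-Head
     Problem: we want multilingual support, and we want it to be easy to extend!
     ➡ The simple approach is to enlarge the number of output classes so that every character of the 8 languages can be handled
     ➡ However, covering Japanese, Chinese, and Korean pushes the class count up to roughly 10,000
     ➡ On top of that, languages differ greatly in their characteristics: logographic scripts, vertical writing, horizontal writing, and so on
     ➡ Is it really appropriate to handle all of these with a single recognizer (Single Head)?
     ➡ Adding or removing even one character could force retraining from scratch
     Idea
     ➡ Adopt a Multi-head structure: place several character recognizers, each specialized for one language.
     ➡ Alongside them, place a language recognizer (Language Prediction Network, LPN).
     ➡ For each text region, predict the language and run inference with the single most suitable character recognizer.
     (Figure labels: use a different character recognizer per text region; switch recognizers according to the language identification result; multiple character recognizers are placed.)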
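The routing described above can be sketched as a dictionary of heads keyed by language. Everything named here (`route`, the language keys, the stand-in lambdas) is hypothetical and only illustrates the control flow, not the paper's implementation.

```python
from typing import Callable, Dict, List

Recognizer = Callable[[List[float]], str]


def route(features: List[float],
          language_probs: Dict[str, float],
          heads: Dict[str, Recognizer]) -> str:
    """Run only the recognition head of the most probable language."""
    language = max(language_probs, key=lambda name: language_probs[name])
    return heads[language](features)


# Stand-in heads; real heads would be trained recognizers.
heads: Dict[str, Recognizer] = {
    "latin": lambda f: "latin-head output",
    "japanese": lambda f: "japanese-head output",
}
# Adding a language only registers a new head; existing heads are untouched.
heads["arabic"] = lambda f: "arabic-head output"

result = route([0.1, 0.2], {"latin": 0.1, "japanese": 0.2, "arabic": 0.7}, heads)
print(result)
```

Keeping each language in its own head is what makes adding or removing a language a local change rather than a full retraining of one large classifier.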
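  13. Idea 2: training that does not need Language ground truth
     Problem: a language recognizer has been added, but there are too few language annotations to train it.
     ➡ If language annotations exist, the language recognizer can simply be trained to predict them.
     ➡ In practice, though, there are almost none. We also want to train end to end (E2E) rather than module by module.
     Idea
     ➡ Devise a method that trains E2E from text labels alone, even without language annotations.
     ➡ Apply a penalty when a character recognizer is given a character it does not support (a character from another language).
     When language annotations are available: have the language recognizer predict the language and train it with cross-entropy loss.
       l: language
       p(l): probability inferred by the model
       l_lang: total number of languages handled
     When language annotations are not available: train with the Text label only. When a character recognizer receives a character it does not support, the penalty β is applied. As a result, the language recognizer learns to select the appropriate character recognizer.
       c_t: the t-th character
       y_t: ground truth for the t-th character
       C_r: set of characters handled by the character recognizer
       𝟙: returns a value of 0 or 1
       T: maximum number of output characters (fixed parameter)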
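The formula images are not part of the transcript, so the following is a sketch reconstructed from the symbol definitions only. It assumes the per-head loss has the form -(1/T) Σ_t [𝟙(y_t ∈ C_r) log p(c_t = y_t) + 𝟙(y_t ∉ C_r) β], that β is a negative constant acting like a very low log-probability, and that the per-head losses are combined by weighting with the language recognizer's probabilities. The function names and all numeric values are hypothetical.

```python
import math
from typing import Dict, List, Set


def head_loss(step_probs: List[Dict[str, float]], target: str,
              charset: Set[str], beta: float, max_len: int) -> float:
    """-(1/T) * sum over t of: log p(y_t) if y_t is in C_r, else beta."""
    total = 0.0
    for probs, y_t in zip(step_probs, target):
        if y_t in charset:
            total += math.log(probs[y_t])  # ordinary log-likelihood term
        else:
            total += beta  # fixed penalty: this head cannot emit y_t
    return -total / max_len


def multiplexed_loss(head_losses: Dict[str, float],
                     language_probs: Dict[str, float]) -> float:
    """Assumed combination: each head's loss weighted by the predicted language probability."""
    return sum(language_probs[name] * loss for name, loss in head_losses.items())


BETA = -12.0  # hypothetical penalty value
T = 2         # hypothetical fixed maximum output length

latin_loss = head_loss([{"a": 0.5}, {"b": 0.25}], "ab",
                       set("abcdefghijklmnopqrstuvwxyz"), BETA, T)
kana_loss = head_loss([{}, {}], "ab", {"あ", "い"}, BETA, T)

good_routing = multiplexed_loss({"latin": latin_loss, "kana": kana_loss},
                                {"latin": 0.9, "kana": 0.1})
bad_routing = multiplexed_loss({"latin": latin_loss, "kana": kana_loss},
                               {"latin": 0.1, "kana": 0.9})
print(latin_loss, kana_loss, good_routing, bad_routing)
```

Under these assumptions, putting probability on a head that cannot emit the target characters raises the total loss, so the language recognizer is pushed toward the right head using text labels alone.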
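  14. Results for multilingual Text Spotting → "We beat everything except CRAFTS!"
     Performance was compared against earlier multilingual OCR models on the 8-language Text Spotting task.
     ※ The experiments use the "MLT19 Dataset", which contains multiple languages.
     Result: performance was generally high; only CRAFTS remained out of reach.
     (Table columns: F-score, Precision, Recall. The values are not in the transcript.)
     CRAFTS (paper): results as reported in the CRAFTS paper
     CRAFTS: results from the authors' own re-implementation and evaluation of CRAFTS
     Single-head TextSpotter: the proposed method with a Single-Head instead of Multi-Head, covering all characters of the 8 languages (about 9,000 kinds)
     Multiplexed TextSpotter: the proposed method introduced in this deck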
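The F-score column combines the other two columns as their harmonic mean. A minimal sketch; the input numbers are made up for illustration and are not results from the paper.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical values, only to show that the lower of the two dominates.
print(f"{f_score(0.8, 0.6):.3f}")
print(f"{f_score(0.9, 0.3):.3f}")
```

Because the harmonic mean is pulled toward the smaller value, a method cannot score well by maximizing precision while neglecting recall, or the reverse.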
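  15. Results for multilingual Text Detection → "We beat everything except CRAFTS (paper)!"
     Text detection performance was compared on multilingual data (MLT19).
     Result: performance was generally high; only CRAFTS (paper) remained out of reach.
     (Table columns: Average Precision, F-score, Precision, Recall. The values are not in the transcript.)
     CRAFTS (paper): results as reported in the CRAFTS paper
     CRAFTS: results from the authors' own re-implementation and evaluation of CRAFTS
     Single-head TextSpotter: the proposed method with a Single-Head instead of Multi-Head, covering all characters of the 8 languages (about 9,000 kinds)
     Multiplexed TextSpotter: the proposed method introduced in this deck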
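  16. Worth noting: even without SoTA, the contribution is enough for CVPR acceptance!
     Broken down by language, the text detection task shows improved performance (especially for Arabic and Chinese).
     (Table columns: Average Precision, F-score, Precision, Recall, and per-language F-score. The values are not in the transcript.)
     The original goal was to propose a multilingual model that is easy to handle
     ✓ The proposed model is Multi-Head, so languages are easy to add and remove 👍
     ✓ It is arguably a more elegant design than a Softmax over about 10,000 classes 👍
     ✓ Like other OCR models, it can still be trained E2E 👍
     It did not reach the performance of CRAFTS, but there is still plenty of room for improvement
     ✓ CRAFTS is also strong on vertically written text thanks to its Link Score
     ✓ The authors conclude that the gap in recognizing vertically written text decided the outcome
     ✓ The tricks used in CRAFTS could also be added to the proposed method (maybe a TextSpotter v4 that implements them will appear soon…!?)
     ✓ In addition, CRAFTS has far more parameters in its character recognizer and far more pre-training data, among other differences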
  17. References
     Mask TextSpotter v1: https://arxiv.org/pdf/…
     Mask TextSpotter v2: https://arxiv.org/pdf/…
     Mask TextSpotter v3: https://arxiv.org/pdf/…
     CRAFT: https://arxiv.org/pdf/…
     CRAFTS: https://arxiv.org/pdf/…
     What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis: https://arxiv.org/pdf/…
     CharNet: https://arxiv.org/pdf/…
     TextField: Learning A Deep Direction Field for Irregular Scene Text Detection: https://arxiv.org/pdf/…
     EAST: An Efficient and Accurate Scene Text Detector: https://arxiv.org/pdf/…
     Stacked Hourglass Networks: https://arxiv.org/pdf/…
     Dataset and Model Analysis: https://arxiv.org/pdf/…
     Towards Unconstrained End-to-End Text Spotting: https://arxiv.org/pdf/…
  18. Supplement: datasets
     Dataset | Language & Script | Data Difficulty | Annotation
     ICDAR 2017 MLT dataset (MLT17) | 9 languages representing 6 different scripts equally | multi-oriented scene text | annotated using quadrangle bounding boxes
     ICDAR 2019 MLT dataset (MLT19) | 10 languages representing 7 different scripts | multi-oriented scene text | annotated using quadrangle bounding boxes
     Total-Text dataset | English language | wide variety of horizontal, multi-oriented and curved text | annotated at word-level using polygon bounding boxes
     ICDAR 2019 ArT dataset (ArT19) | English and Chinese languages | highly challenging arbitrarily shaped text | annotated using arbitrary number of polygon vertices
     ICDAR 2017 RCTW dataset (RCTW17) | Chinese | scene text in Chinese | drawing polygons to surround every text line
     ICDAR 2019 LSVT dataset (LSVT19) | Chinese, but also has about 20% of its labels in English words | street view text in Chinese | drawing polygons to surround every text line
     ICDAR 2013 dataset (IC13) | English language | horizontal text | annotated at word-level using rectangular bounding boxes
     ICDAR 2015 dataset (IC15) | English language | multi-oriented scene text | annotated at word-level using quadrangle bounding boxes
     Note that none of these has Character-level annotation and that the languages are skewed.