NAVER Clova OCR

Hwalsuk Lee
NAVER OCR Team AI Researcher
https://linedevday.linecorp.com/jp/2019/sessions/D1-7

LINE DevDay 2019

November 20, 2019

Transcript

  1. 3.

    Text Detection. OCR (Optical Character Recognition) = Text Detection + Text Recognition. Image → Text Detection (text area) → Text Recognition → UTF-8 text, e.g. "SAKE BAR". Text detection: CVPR 2019; text recognition: ICCV 2019.
  2. 5.

    Rapid Improvements with Deep Learning. Examples: Japanese vertical text, very small text, rotated text.
  3. 6.

    Rapid Improvements with Deep Learning. Earlier detectors: simple horizontal text only. Examples: Japanese vertical text, very small text, rotated text.
  4. 7.

    Rapid Improvements with Deep Learning. Clova OCR detects small, vertical, and multilingual text. Examples: Japanese vertical text, very small text, rotated text.
  5. 8.

    Agenda: 1. Things to Know When Applying Recent OCR Approaches to Japanese; 2. Clova OCR Text Detector (CVPR 2019); 3. Clova OCR Text Recognizer (ICCV 2019); 4. Full Pipelines for Clova OCR.
  6. 17.

    How to Deal with Long Sentences? Existing papers are based on word- or line-level detection. We instead detect text character by character and then combine the characters! (Word/line level vs. character level)
  7. 18.

    Introduction | Why Is It So Difficult? Extreme aspect ratios, a wide variety of character sizes, and shape distortion → all caused by line/word-level detection!
  8. 19.

    Introduction | Why Is It So Difficult? Extreme aspect ratios, a wide variety of character sizes, and shape distortion → all caused by line/word-level detection! How about character-level detection?
  9. 20.

    Approach | Character-level Detection. Then, which annotations are necessary? Character-level ground truth (example shown) is too expensive to produce!
  10. 21.

    Approach | Character-level Detection. Weakly supervised training! (Weak supervision vs. strong supervision; see Hakan Bilen, “Weakly supervised object detection”, CVPR 2018 Workshop.)
  11. 22.

    CRAFT (Character Region Awareness for Text detection) | Definition of Model Outputs. Image (h×w×3) → Region score (h/2×w/2×1): the probability that a given pixel is the center of a character. Affinity score (h/2×w/2×1): the probability that a given pixel is the center of the space between adjacent characters.
  12. 23.

    CRAFT (Character Region Awareness for Text detection) | Definition of Model Outputs. Image (h×w×3) → Region score (h/2×w/2×1): the probability that a given pixel is the center of a character, used to find individual character areas. Affinity score (h/2×w/2×1): the probability that a given pixel is the center of the space between adjacent characters, used to locate line/word-level areas.
  13. 24.

    CRAFT | Definition of Model Outputs: Ground Truth Label Generation. Region score GT: when character-level annotations are provided, a 2D Gaussian distribution is created for each rectangular character box.
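A minimal NumPy sketch of region-score GT rendering, assuming axis-aligned character boxes already given in score-map pixels. Stretching an isotropic Gaussian to fit a rectangle yields an axis-aligned anisotropic Gaussian, so the sketch computes that directly; for rotated quadrilaterals the CRAFT paper instead warps a Gaussian with a perspective transform, which is omitted here. The name `render_region_score` and the `sigma_ratio` spread are hypothetical choices, not values from the talk.

```python
import numpy as np


def render_region_score(shape, char_boxes, sigma_ratio=0.25):
    """Render a region-score map from axis-aligned character boxes.

    shape: (H, W) of the score map.
    char_boxes: iterable of (x0, y0, x1, y1), x1/y1 exclusive, inside the map.
    sigma_ratio: hypothetical Gaussian spread relative to the box size.
    """
    score = np.zeros(shape, dtype=np.float32)
    for x0, y0, x1, y1 in char_boxes:
        if not (0 <= x0 < x1 <= shape[1] and 0 <= y0 < y1 <= shape[0]):
            raise ValueError(f"box {(x0, y0, x1, y1)} is empty or outside the map")
        w, h = x1 - x0, y1 - y0
        # Pixel-center offsets from the box center.
        xs = np.arange(x0, x1) + 0.5 - (x0 + x1) / 2.0
        ys = np.arange(y0, y1) + 0.5 - (y0 + y1) / 2.0
        xx, yy = np.meshgrid(xs, ys)  # shape (h, w), matches the slice below
        # Gaussian scaled separately along x and y to fit the rectangle.
        g = np.exp(-(xx ** 2 / (2 * (sigma_ratio * w) ** 2)
                     + yy ** 2 / (2 * (sigma_ratio * h) ** 2)))
        # Where boxes overlap, keep the larger value (written in place).
        region = score[y0:y1, x0:x1]
        np.maximum(region, g, out=region)
    return score
```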
  14. 25.

    CRAFT | Definition of Model Outputs: Ground Truth Label Generation. Region score GT: when character-level annotations are provided, a 2D Gaussian distribution is created for each rectangular character box. Affinity box generation: an affinity box is generated from each pair of adjacent character boxes, with triangle centers inside the character boxes as its corners (legend: center of a character box, center of a triangle, character box, affinity box).
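The affinity-box construction can be sketched as follows, based on our reading of the CRAFT paper: the diagonals through each character box's center split it into an upper and a lower triangle, and the centroids of those triangles in two neighboring characters become the affinity box's corners. The corner ordering and the name `affinity_boxes` are assumptions for illustration.

```python
import numpy as np


def affinity_boxes(char_boxes):
    """Build affinity boxes between adjacent character boxes.

    char_boxes: array-like of shape (N, 4, 2); corners ordered top-left,
                top-right, bottom-right, bottom-left; characters in reading order.
    Returns an array of shape (N - 1, 4, 2).
    """
    boxes = np.asarray(char_boxes, dtype=np.float64)
    # Box center: mean of the corners (the diagonal intersection for rectangles).
    centers = boxes.mean(axis=1)
    tl, tr, br, bl = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    # Centroids of the upper (tl, tr, center) and lower (bl, br, center) triangles.
    upper = (tl + tr + centers) / 3.0
    lower = (bl + br + centers) / 3.0
    # Corners: upper of char i, upper of char i+1, lower of char i+1, lower of char i.
    return np.stack([upper[:-1], upper[1:], lower[1:], lower[:-1]], axis=1)
```

Drawing the same Gaussian onto these boxes, instead of onto the character boxes, gives the affinity score GT.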
  15. 26.

    CRAFT | Definition of Model Outputs: Ground Truth Label Generation. Region score GT: when character-level annotations are provided, a 2D Gaussian distribution is created for each rectangular character box. Affinity box generation: an affinity box is generated from each pair of adjacent character boxes (legend: center of a character box, center of a triangle, character box, affinity box). Affinity score GT: the same 2D Gaussian is drawn on the affinity boxes.
  16. 27.

    CRAFT | Training: Weakly Supervised Learning. Synthetic images come with character-level annotations (synthetic GT) and are trained on directly. Real images have only word/line-level annotations, so their character-level GT is missing.
  17. 28.

    CRAFT | Training: Weakly Supervised Learning. Synthetic images: trained on the character-level synthetic GT. Real images (word/line-level annotations only): word regions are cropped, region scores are predicted on each crop, and characters are split to generate pseudo-GT, shown with confidences (6/6), (5/7), (5/6).
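The fractions on the slide match the word-level confidence defined in the CRAFT paper, s_conf(w) = (l(w) − min(l(w), |l(w) − l_c(w)|)) / l(w), where l(w) is the annotated word length and l_c(w) the number of characters split out of the crop; for example, finding 5 characters in a 7-character word gives 5/7. The sketch below assumes the splitting step itself (the paper applies a watershed to the predicted region scores) has already produced `detected_boxes`, and applies the paper's fallback as we understand it: below 0.5 confidence, the word box is divided into equal-width characters and the confidence is set to 0.5. Function names and the horizontal, axis-aligned box format are hypothetical.

```python
from fractions import Fraction


def word_confidence(word_len, n_detected):
    """s_conf(w) = (l(w) - min(l(w), |l(w) - l_c(w)|)) / l(w)."""
    diff = min(word_len, abs(word_len - n_detected))
    return Fraction(word_len - diff, word_len)


def pseudo_char_boxes(word_box, word_len, detected_boxes):
    """Return (character boxes, confidence) used as pseudo-GT for one word.

    word_box: (x0, y0, x1, y1) of a horizontal word (hypothetical format).
    detected_boxes: character boxes split out of the cropped word region.
    """
    conf = word_confidence(word_len, len(detected_boxes))
    if conf < Fraction(1, 2):
        # Unreliable split: ignore it and assume equal-width characters.
        x0, y0, x1, y1 = word_box
        step = (x1 - x0) / word_len
        boxes = [(x0 + i * step, y0, x0 + (i + 1) * step, y1)
                 for i in range(word_len)]
        return boxes, Fraction(1, 2)
    return list(detected_boxes), conf
```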
  18. 29.

    CRAFT | Training: Weakly Supervised Learning. Synthetic images: trained on the character-level synthetic GT. Real images: word regions are cropped and characters are split from the predicted region scores to generate pseudo-GT (6/6), (5/7), (5/6); a confidence map weights the pseudo-GT in the objective function.
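As we read the CRAFT paper (CVPR 2019), the objective weights a per-pixel squared error by a confidence map S_c built from the word confidence s_conf(w), where R(w) is the word region, S_r and S_a are the predicted region and affinity scores, and S_r^* and S_a^* are the (pseudo-)GT scores:

```latex
S_c(p) =
\begin{cases}
  s_{\mathrm{conf}}(w) & \text{if } p \in R(w) \\
  1 & \text{otherwise}
\end{cases}
\qquad
L = \sum_{p} S_c(p) \left( \lVert S_r(p) - S_r^{*}(p) \rVert_2^2
      + \lVert S_a(p) - S_a^{*}(p) \rVert_2^2 \right)
```

For synthetic images, whose character-level GT is exact, S_c(p) is 1 everywhere, so low-confidence pseudo-GT from real images contributes proportionally less to the loss.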
  19. 31.

    CRAFT | Training: Character Region Scores Change as Learning Progresses. [Figure: region scores at epochs #1, #2, #3, #4, ..., #10 for char-box and word-box annotated images]
  20. 32.

    CRAFT | Post-processing: Rectangle Bounding Box. Character region + affinity region → map binarization → merging & labeling.
  21. 33.

    CRAFT | Post-processing: Rectangle Bounding Box. ✓ Simple yet efficient! Character region + affinity region → map binarization → merging & labeling → minimum bounding box.
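The post-processing steps can be sketched in plain NumPy: binarize both maps, take their union so that affinity pixels bridge neighboring characters, label connected components, and box each component. This is a simplified sketch, not the official CRAFT code: the thresholds are hypothetical, small or low-scoring components are not filtered out, and it returns axis-aligned boxes where the slide's minimum bounding box would be a rotated rectangle (for example from OpenCV's `cv2.minAreaRect`).

```python
from collections import deque

import numpy as np


def detect_words(region, affinity, region_thr=0.4, affinity_thr=0.4):
    """Group characters into word boxes from region/affinity score maps.

    Thresholds are hypothetical. Returns axis-aligned boxes (x0, y0, x1, y1)
    in score-map pixels, x1/y1 exclusive.
    """
    # 1. Map binarization: character pixels OR affinity pixels.
    mask = (region > region_thr) | (affinity > affinity_thr)
    h, w = mask.shape
    seen = np.zeros(mask.shape, dtype=bool)
    boxes = []
    # 2. Merging & labeling: 4-connected components found by BFS.
    for sy in range(h):
        for sx in range(w):
            if not mask[sy, sx] or seen[sy, sx]:
                continue
            queue = deque([(sy, sx)])
            seen[sy, sx] = True
            ys, xs = [], []
            while queue:
                y, x = queue.popleft()
                ys.append(y)
                xs.append(x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            # 3. Bounding box of the component (axis-aligned simplification).
            boxes.append((min(xs), min(ys), max(xs) + 1, max(ys) + 1))
    return boxes
```

Because both score maps are h/2×w/2, box coordinates are doubled to map them back onto the input image.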
  22. 37.
  23. 38.
  24. 39.

    What Is Wrong with Scene Text Recognition Model Comparisons? Dataset and Model Analysis (ICCV 2019, Oral). Text Recognition for Japanese
  25. 44.

    What Is Wrong with STR Comparisons? 1. Different training datasets. 2. Different evaluation datasets. (In the comparison table, a marked cell means "not reported in paper".)
  26. 45.

    What Is Wrong with STR Comparisons? 1. Different training datasets. 2. Different evaluation datasets. 3. Speed and memory are not always evaluated.
  27. 46.

    Our Solution: Unified STR Evaluation. 1. Unified training data. 2. Unified evaluation data. 3. Time and memory evaluation.
  28. 50.

  29. 51.

  30. 52.

  31. 53.

    Analysis: Trade-off Plot. The model with the best accuracy (TPS+ResNet+BiLSTM+Attn) is not an existing combination.
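The ICCV 2019 paper decomposes STR models into four stages: transformation, feature extraction, sequence modeling, and prediction. The sketch below enumerates the module options per stage as we recall them from the paper; it shows that TPS+ResNet+BiLSTM+Attn is one point in a 24-combination grid, which is why the most accurate model need not coincide with any previously published combination. The `STAGES` dictionary and `all_combinations` function are hypothetical names.

```python
from itertools import product

# Module options per stage, as we recall them from the ICCV 2019 paper.
STAGES = {
    "Trans": ["None", "TPS"],           # transformation
    "Feat": ["VGG", "RCNN", "ResNet"],  # feature extraction
    "Seq": ["None", "BiLSTM"],          # sequence modeling
    "Pred": ["CTC", "Attn"],            # prediction
}


def all_combinations():
    """List every STR model in the grid as 'Trans+Feat+Seq+Pred'."""
    return ["+".join(choice) for choice in product(*STAGES.values())]
```

Training and evaluating every entry on the same unified data is what makes the accuracy/speed/memory trade-off plot a like-for-like comparison.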
  32. 55.
  33. 58.

    Full Pipeline: Text Detection → boxes and angles → compensate rotation for each box → Text Recognition (one model for horizontal/vertical text in Japanese/Korean/English) → e.g. "Welcome to JAPAN", "잘 오셨습니다" (Korean for "welcome").
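A minimal sketch of per-box rotation compensation, assuming each detected box is a quadrilateral with corners ordered top-left, top-right, bottom-right, bottom-left in image coordinates (y pointing down) and that its angle is measured along the top edge. The function names are hypothetical; in practice the corrected region would then be cropped from the image (for example with OpenCV's `getPerspectiveTransform` and `warpPerspective`) before recognition, and how vertical lines are told apart from horizontal ones is not covered here.

```python
import math


def box_angle_deg(box):
    """Angle of the top edge (top-left -> top-right) in degrees."""
    (x0, y0), (x1, y1) = box[0], box[1]
    return math.degrees(math.atan2(y1 - y0, x1 - x0))


def compensate_rotation(box):
    """Rotate a quadrilateral about its center so its top edge is horizontal."""
    angle = math.radians(-box_angle_deg(box))
    cx = sum(x for x, _ in box) / len(box)
    cy = sum(y for _, y in box) / len(box)
    c, s = math.cos(angle), math.sin(angle)
    # Standard 2D rotation by `angle` around (cx, cy).
    return [(cx + c * (x - cx) - s * (y - cy),
             cy + s * (x - cx) + c * (y - cy)) for x, y in box]
```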