NAVER Clova OCR

Hwalsuk Lee
NAVER OCR Team AI Researcher
https://linedevday.linecorp.com/jp/2019/sessions/D1-7

LINE DevDay 2019

November 20, 2019

Transcript

  1. 2019 DevDay NAVER Clova OCR > Hwalsuk Lee > NAVER OCR Team AI Researcher
  2. OCR (Optical Character Recognition) What is OCR?

  3. OCR = Text Detection + Text Recognition

    Optical Character Recognition: Image → Text Detection (CVPR 2019) → Text Area → Text Recognition (ICCV 2019) → UTF-8 (“SAKE BAR”)
  4. Isn’t OCR a conquered technology?

  5. Rapid Improvements with Deep Learning

    Japanese vertical / Very small / Rotated
  6. Rapid Improvements with Deep Learning

    Detect simple horizontal texts
    Japanese vertical / Very small / Rotated
  7. Rapid Improvements with Deep Learning

    Detect simple horizontal texts
    Clova OCR: Detect small, vertical, multilingual texts
    Japanese vertical / Very small / Rotated
  8. Agenda

    1. Things to Know When Applying Recent OCR Approaches to Japanese
    2. Clova OCR Text Detector (CVPR 2019)
    3. Clova OCR Text Recognizer (ICCV 2019)
    4. Full Pipelines for Clova OCR
  9. Things to Know When Applying Recent OCR Approaches to Japanese

  10. Number of Deep Learning Papers for OCR

  11. Deep Learning (DL) vs. Not DL

    Legend: DL, Not DL
    Panels: Detection, Recognition
  12. Most Papers Primarily Deal with English!

  13. KEY Difference 1: No Space Between Words

  14. KEY Difference 2: Vertical Writing

  15. KEY Difference 3: Number of Characters

    English: 26
    Japanese: Over 3K
  16. Text Detection for Japanese

    Character Region Awareness for Text Detection, CVPR 2019
  17. How to Deal with Long Sentences?

    Word/Line Level: Existing papers are based on word / line unit detection
    Character Level: We detect them character by character and combine them!
  18. Introduction | Why is it So Difficult?

    Extreme aspect ratio
    Variety of character sizes
    Shape distortion
    → Caused by line/word detection!
  19. Introduction | Why is it So Difficult?

    Extreme aspect ratio
    Variety of character sizes
    Shape distortion
    → Caused by line/word detection! How about character-level detection?
  20. Approach | Character-level Detection

    Then, what are the necessary annotations?
    Character-level ground truth example: Too expensive!
  21. Approach | Character-level Detection

    Weakly Supervised Training!
    Weak Supervision / Strong Supervision
    Hakan Bilen, “Weakly supervised object detection”, CVPR 2018 Workshop.
  22. CRAFT | Definition of Model Outputs

    CRAFT: Character Region Awareness for Text detection
    Image (h×w×3)
    Region score (h/2×w/2×1): The probability that the given pixel is the center of the character
    Affinity score (h/2×w/2×1): The center probability of the space between adjacent characters
  23. CRAFT | Definition of Model Outputs

    CRAFT: Character Region Awareness for Text detection
    Image (h×w×3)
    Region score (h/2×w/2×1): The probability that the given pixel is the center of the character; used to find individual character areas
    Affinity score (h/2×w/2×1): The center probability of the space between adjacent characters; used to locate line/word level areas
  24. CRAFT | Definition of Model Outputs: Ground Truth Label Generation

    When character-level annotations are provided:
    Character Boxes → Region Score GT: Create a 2D Gaussian distribution for each rectangular shape
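The "2D Gaussian for each rectangular shape" step can be sketched as follows. This is a simplified illustration, not the CRAFT implementation: it assumes axis-aligned character boxes given as `(x0, y0, x1, y1)` with exclusive right and bottom edges, and the names `gaussian_patch`, `region_score_map`, and the `sigma_ratio` value are hypothetical choices made for this example. Overlapping patches are merged by taking the maximum, which is also an assumption.

```python
import math


def gaussian_patch(width, height, sigma_ratio=0.25):
    """Return a height x width grid that peaks at 1.0 in the box center.

    sigma_ratio is a hypothetical knob: the standard deviation along each
    axis is that fraction of the box size, so the Gaussian is stretched to
    fit the rectangle.
    """
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    sx, sy = width * sigma_ratio, height * sigma_ratio
    return [
        [
            math.exp(-(((x - cx) / sx) ** 2 + ((y - cy) / sy) ** 2) / 2.0)
            for x in range(width)
        ]
        for y in range(height)
    ]


def region_score_map(map_h, map_w, char_boxes):
    """Paste one Gaussian patch per character box into an empty score map."""
    score = [[0.0] * map_w for _ in range(map_h)]
    for x0, y0, x1, y1 in char_boxes:
        patch = gaussian_patch(x1 - x0, y1 - y0)
        for dy, row in enumerate(patch):
            for dx, value in enumerate(row):
                y, x = y0 + dy, x0 + dx
                if 0 <= y < map_h and 0 <= x < map_w:
                    # Keep the stronger response where patches overlap.
                    score[y][x] = max(score[y][x], value)
    return score


# Two neighbouring 5x5 character boxes on an 8x14 map.
boxes = [(1, 1, 6, 6), (7, 1, 12, 6)]
region = region_score_map(8, 14, boxes)
```

Each box center receives a score of 1.0 and the score decays towards the box edges, which matches the reading of the region score as "the probability that the given pixel is the center of the character".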
  25. CRAFT | Definition of Model Outputs: Ground Truth Label Generation

    When character-level annotations are provided:
    Character Boxes → Region Score GT: Create a 2D Gaussian distribution for each rectangular shape
    Affinity Box Generation: Generate an affinity box from two adjacent character boxes
    Legend: Center of a character box, Center of a triangle, Character box, Affinity box
  26. CRAFT | Definition of Model Outputs: Ground Truth Label Generation

    When character-level annotations are provided:
    Character Boxes → Region Score GT: Create a 2D Gaussian distribution for each rectangular shape
    Affinity Box Generation: Generate an affinity box from two adjacent character boxes
    Affinity Boxes → Affinity Score GT
    Legend: Center of a character box, Center of a triangle, Character box, Affinity box
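The affinity box construction named on these slides (character box centers and triangle centers) can be sketched like this. It is an illustration under stated assumptions rather than the authors' code: each character box is a list of four corners in the order top-left, top-right, bottom-right, bottom-left, the box center is approximated by the mean of the corners (exact for rectangles), and the helper names `centroid`, `triangle_centers`, and `affinity_box` are hypothetical.

```python
def centroid(points):
    """Mean of a list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def triangle_centers(box):
    """Centers of the upper and lower triangles of one character box.

    The two diagonals split the box into four triangles that meet at the
    box center; only the upper and lower ones are used here.
    """
    tl, tr, br, bl = box
    center = centroid(box)  # assumption: corner mean stands in for the diagonal crossing
    upper = centroid([tl, tr, center])
    lower = centroid([bl, br, center])
    return upper, lower


def affinity_box(box_a, box_b):
    """Quadrilateral linking two adjacent character boxes (a left of b)."""
    upper_a, lower_a = triangle_centers(box_a)
    upper_b, lower_b = triangle_centers(box_b)
    # Corner order: top-left, top-right, bottom-right, bottom-left.
    return [upper_a, upper_b, lower_b, lower_a]


char_a = [(0, 0), (4, 0), (4, 6), (0, 6)]
char_b = [(6, 0), (10, 0), (10, 6), (6, 6)]
link = affinity_box(char_a, char_b)
```

The resulting quadrilateral spans the gap between the two characters, so a Gaussian drawn inside it peaks in the space between them, in line with the affinity score definition given earlier in the deck.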
  27. CRAFT | Training: Weakly Supervised Learning

    Train with Synthetic Image: Synthetic Image (Character level annotation) → Synthetic GT → Loss
    Train with Real Image: Real Image (Word/line level annotations) → ? → Loss
  28. CRAFT | Training: Weakly Supervised Learning

    Train with Synthetic Image: Synthetic Image (Character level annotation) → Synthetic GT → Loss
    Train with Real Image: Real Image (Word/line level annotations) → Cropped → Region scores → Splitting Characters → Generate Pseudo-GT → ? → Loss
    (6/6) (5/7) (5/6)
  29. CRAFT | Training: Weakly Supervised Learning

    Train with Synthetic Image: Synthetic Image (Character level annotation) → Synthetic GT → Loss
    Train with Real Image: Real Image (Word/line level annotations) → Cropped → Region scores → Splitting Characters → Generate Pseudo-GT → Pseudo GT → Loss
    Confidence map: (6/6) (5/7) (5/6)
    Objective function:
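The ratios (6/6), (5/7), and (5/6) are consistent with the word-level confidence defined in the CRAFT paper, where a word of known length l(w) whose crop was split into l_c(w) characters gets confidence (l(w) − min(l(w), |l(w) − l_c(w)|)) / l(w), and the per-pixel confidence then weights a squared-error loss on the region and affinity maps. The sketch below works through that arithmetic. The word lengths in the example are assumptions chosen to reproduce the slide's ratios, the maps are flattened to plain lists for brevity, and the function names are hypothetical.

```python
def word_confidence(word_length, estimated_length):
    """Confidence of a pseudo-GT word: how well the split matches the label.

    word_length      -- number of characters in the annotated transcription
    estimated_length -- number of characters found by splitting the crop
    """
    if word_length <= 0:
        raise ValueError("word_length must be positive")
    error = min(word_length, abs(word_length - estimated_length))
    return (word_length - error) / word_length


def confidence_weighted_loss(confidence, region_pred, region_gt,
                             affinity_pred, affinity_gt):
    """Sum over pixels of confidence * (region error^2 + affinity error^2)."""
    return sum(
        c * ((rp - rg) ** 2 + (ap - ag) ** 2)
        for c, rp, rg, ap, ag in zip(
            confidence, region_pred, region_gt, affinity_pred, affinity_gt
        )
    )


# Assumed (annotated length, characters found) pairs matching the slide.
examples = [(6, 6), (7, 5), (6, 5)]
confidences = [word_confidence(l, lc) for l, lc in examples]

# Two-pixel toy maps: the low-confidence pixel contributes half as much.
loss = confidence_weighted_loss(
    confidence=[1.0, 0.5],
    region_pred=[1.0, 0.0], region_gt=[0.0, 0.0],
    affinity_pred=[0.0, 1.0], affinity_gt=[0.0, 0.0],
)
```

For synthetic images, where character boxes are known, the confidence is simply 1 everywhere, so the same loss covers both training branches.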
  30. CRAFT | Training Ground Truth Label Generation

  31. CRAFT | Training: Character Region Scores Change as Learning Progresses

    Columns: Epoch #1, Epoch #2, Epoch #3, Epoch #4, ..., Epoch #10
    Rows: Charbox, Wordbox
  32. CRAFT | Post-processing: Rectangle Bounding Box

    Steps: Map Binarization → Merging & Labeling
    Legend: Character region, Affinity region
  33. CRAFT | Post-processing: Rectangle Bounding Box

    ✓ Simple yet efficient!
    Steps: Map Binarization → Merging & Labeling → Minimum Bounding Box
    Legend: Character region, Affinity region, Minimum bounding box
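The three post-processing steps (binarize the maps, merge and label connected regions, take a bounding box per label) can be sketched in plain Python as below. This is a simplification under stated assumptions: the threshold values are hypothetical, labeling uses 4-connectivity, and the box is axis-aligned rather than a rotated minimum-area rectangle. The function name `text_boxes` is made up for this example.

```python
from collections import deque


def text_boxes(region, affinity, region_thr=0.4, affinity_thr=0.4):
    """Return one (x0, y0, x1, y1) box per connected text component."""
    h, w = len(region), len(region[0])
    # Map binarization + merging: a pixel is text if either map fires.
    mask = [
        [region[y][x] >= region_thr or affinity[y][x] >= affinity_thr
         for x in range(w)]
        for y in range(h)
    ]
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Labeling: breadth-first flood fill over 4-connected neighbours.
            queue = deque([(y, x)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cy, cx = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return boxes


# Characters at x=1 and x=3 are linked by affinity at x=2; x=6 stands alone.
region = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.0, 0.9, 0.0, 0.0, 0.9, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
affinity = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.8, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
boxes = text_boxes(region, affinity)
```

The affinity map is what joins separate character blobs into one word-level component; without it, each character would come out as its own box.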
  34. Results | TotalText Dataset

  35. Results | TotalText Dataset

  36. Analysis | Improved Recognition Performance Correction of Curved Text

  37. Movie

  38. Summary

  39. Text Recognition for Japanese

    What is Wrong with Scene Text Recognition Model Comparisons? Dataset and Model Analysis, ICCV 2019, Oral
  40. How to Handle 100X Larger Characters?

    English: 26
    Japanese: Over 3K
  41. What is Scene Text Recognition (STR)?

    Image Patch with Text → Character String (“UNITED”)
  42. What is Wrong with STR Comparisons?

  43. What is Wrong with STR Comparisons? 1. Different training datasets.

  44. What is Wrong with STR Comparisons?

    1. Different training datasets.
    2. Different evaluation datasets. (‘’ means ‘not reported in paper’)
  45. What is Wrong with STR Comparisons?

    1. Different training datasets.
    2. Different evaluation datasets.
    3. Speed and memory are not always evaluated.
  46. Our Solution: Unified STR Evaluation

    1. Unified training data.
    2. Unified evaluation data.
    3. Time and memory evaluation.
  47. Which Module Contributes to the Performance?

    Transformation → Feature Extraction → Sequence Modeling → Prediction
  48. 2 × 3 × 2 × 2 = 24
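The count of 24 is the number of ways to pick one option for each of the four stages. A minimal sketch of that enumeration follows. The stage names come from the deck, and TPS, ResNet, BiLSTM, and Attn appear in its best-accuracy combination; the remaining option names (None, VGG, RCNN, CTC) are taken from the ICCV 2019 paper rather than from this transcript, and the `STAGES` dictionary is a hypothetical structure for this example.

```python
from itertools import product

# One list of options per stage, in pipeline order.
STAGES = {
    "Transformation": ["None", "TPS"],
    "Feature Extraction": ["VGG", "RCNN", "ResNet"],
    "Sequence Modeling": ["None", "BiLSTM"],
    "Prediction": ["CTC", "Attn"],
}

# Cartesian product over the stages: 2 * 3 * 2 * 2 candidates.
combinations = ["+".join(choice) for choice in product(*STAGES.values())]

for name in combinations[:3]:
    print(name)
print(len(combinations), "combinations in total")
```

Enumerating the full grid is what lets every combination be trained and evaluated under the same data, so that the contribution of each stage can be compared directly.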

  49. Accuracy vs. Time / Accuracy vs. Speed

  53. Analysis: Trade-off Plot

    The best accuracy model is not an existing combination. (TPS+ResNet+BiLSTM+Attn)
  54. Analysis: The Best Accuracy Model

  55. Summary

  56. Full Pipelines for Clova OCR

  57. Full Pipeline

    Text Detection → Boxes, Angles → Compensate rotation for each box
  58. Full Pipeline

    Text Detection → Boxes, Angles → Compensate rotation for each box → Text Recognition
    One Model for Horizontal/Vertical in JPN/KOR/ENG
    Welcome to JAPAN  잘 오셨습니다
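The "compensate rotation for each box" step can be illustrated on box coordinates alone. The sketch below rotates a detected quadrilateral about its own center by the negative of its estimated angle so that it becomes upright before recognition. It is only a geometric illustration under assumptions: the deck does not describe how Clova OCR represents boxes or angles, so the corner-list format, the degree-based angle with the standard mathematical rotation convention, and the name `compensate_rotation` are all hypothetical. A real pipeline would warp the image crop as well.

```python
import math


def compensate_rotation(corners, angle_deg):
    """Rotate box corners about the box center by -angle_deg.

    corners   -- list of (x, y) tuples for one detected box
    angle_deg -- estimated rotation of the box, in degrees
    """
    cx = sum(x for x, _ in corners) / len(corners)
    cy = sum(y for _, y in corners) / len(corners)
    theta = math.radians(-angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [
        (
            cx + (x - cx) * cos_t - (y - cy) * sin_t,
            cy + (x - cx) * sin_t + (y - cy) * cos_t,
        )
        for x, y in corners
    ]


# A 4x2 box centered at the origin that was detected rotated by 90 degrees.
rotated = [(1, -2), (1, 2), (-1, 2), (-1, -2)]
upright = compensate_rotation(rotated, 90)
```

After compensation the box is 4 units wide and 2 units tall again, so a single recognizer can be fed upright crops regardless of how the text appeared in the image.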
  59. https://ocrdemo.linebrain.ai

  60. Thank You for Listening