
NAVER Clova OCR

Hwalsuk Lee
NAVER OCR Team AI Researcher
https://linedevday.linecorp.com/jp/2019/sessions/D1-7

LINE DevDay 2019

November 20, 2019

Transcript

  1. 2019 DevDay
    NAVER Clova OCR
    > Hwalsuk Lee
    > NAVER OCR Team AI Researcher


  2. OCR (Optical Character Recognition)
    What is OCR?


  3. OCR = Text Detection + Text Recognition
    Image → Text Detection (CVPR 2019) → text area → Text Recognition (ICCV 2019) → UTF-8 string
    Example: a photo of a “SAKE BAR” sign

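    To make the two-stage split concrete, here is a minimal sketch of how a detector and a recognizer compose into one OCR call. detect_text_boxes and recognize_text are hypothetical placeholder names, not the actual Clova OCR API.

    # Minimal sketch of the two-stage pipeline: OCR = Text Detection + Text Recognition.
    # detect_text_boxes() and recognize_text() are hypothetical placeholders.
    from typing import List, Tuple

    Box = Tuple[int, int, int, int]  # x, y, width, height of a detected text area

    def detect_text_boxes(image) -> List[Box]:
        """Text detection: locate text areas in the input image."""
        raise NotImplementedError  # e.g. a CRAFT-style detector (CVPR 2019)

    def recognize_text(image, box: Box) -> str:
        """Text recognition: decode one text area into a UTF-8 string."""
        raise NotImplementedError  # e.g. a four-stage STR model (ICCV 2019)

    def ocr(image) -> List[str]:
        """Full OCR: run recognition on every detected text area."""
        return [recognize_text(image, box) for box in detect_text_boxes(image)]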

  4. Isn’t OCR a conquered technology?


  5. Rapid Improvements with Deep Learning
    Example images: Japanese vertical text, very small text, rotated text


  6. Rapid Improvements with Deep Learning
    Earlier systems could only detect simple horizontal texts
    Example images: Japanese vertical text, very small text, rotated text


  7. Rapid Improvements with Deep Learning
    Clova OCR detects small, vertical, multilingual texts
    Example images: Japanese vertical text, very small text, rotated text


  8. Agenda
    1. Things to Know When Applying Recent OCR Approaches to Japanese
    2. Clova OCR Text Detector (CVPR 2019)
    3. Clova OCR Text Recognizer (ICCV 2019)
    4. Full Pipelines for Clova OCR


  9. Things to Know
    When Applying Recent OCR
    Approaches to Japanese


  10. Number of Deep Learning Papers for OCR


  11. Deep Learning (DL) vs. Not DL
    (Chart legend: DL / Not DL, shown for Detection and for Recognition)


  12. Most Papers Primarily Deal with English!


  13. KEY Difference 1: No Space Between Words


  14. KEY Difference 2: Vertical Writing


  15. KEY Difference 3: Number of Characters
    English: 26 characters / Japanese: over 3,000


  16. Character Region Awareness for Text Detection, CVPR 2019
    Text Detection for Japanese


  17. How to Deal with Long Sentences?
    Existing papers are based on word/line-level detection.
    We detect text character by character and then combine the characters!


  18. Introduction | Why Is It So Difficult?
    Extreme aspect ratios, a wide variety of character sizes, shape distortion
    → Caused by line/word-level detection!


  19. Introduction | Why Is It So Difficult?
    Extreme aspect ratios, a wide variety of character sizes, shape distortion
    → Caused by line/word-level detection!
    How about character-level detection?


  20. Approach | Character-Level Detection
    Then what annotations are necessary?
    A character-level ground-truth example → too expensive to annotate!


  21. Approach | Character-level Detection
    Weakly Supervised Training!
    Hakan Bilen “Weakly supervised object detection”, CVPR 2018 Workshop.
    Weak Supervision vs. Strong Supervision


  22. CRAFT | Definition of Model Outputs
    CRAFT: Character Region Awareness For Text detection
    Input: image (h × w × 3)
    Region score (h/2 × w/2 × 1): the probability that a given pixel is the center of a character
    Affinity score (h/2 × w/2 × 1): the probability that a given pixel is the center of the space between adjacent characters


  23. CRAFT | Definition of Model Outputs
    Region score (h/2 × w/2 × 1): the probability that a given pixel is the center of a character; used to find individual character areas
    Affinity score (h/2 × w/2 × 1): the probability that a given pixel is the center of the space between adjacent characters; used to locate line/word-level areas

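    A shape-only sketch of the two output maps defined above; the tensors here are random stand-ins, not the published CRAFT architecture.

    # Shape convention for the CRAFT outputs: both score maps are predicted at
    # half the input resolution. The model itself is faked with random tensors.
    import torch

    h, w = 768, 768
    image = torch.rand(1, 3, h, w)              # input image, h x w x 3

    scores = torch.rand(1, 2, h // 2, w // 2)   # stand-in for the network output
    region_score = scores[:, 0]                 # P(pixel is the center of a character)
    affinity_score = scores[:, 1]               # P(pixel is the center of the gap between adjacent characters)

    assert region_score.shape == (1, h // 2, w // 2)
    assert affinity_score.shape == (1, h // 2, w // 2)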

  24. CRAFT | Definition of Model Outputs
    Ground Truth Label Generation
    Character boxes → Region Score GT
    When character-level annotations are provided, create a 2D Gaussian distribution for each rectangular character box.


  25. CRAFT | Definition of Model Outputs
    Ground Truth Label Generation
    Character boxes → Region Score GT
    When character-level annotations are provided, create a 2D Gaussian distribution for each rectangular character box.
    Affinity box generation: generate an affinity box from two adjacent character boxes
    (diagram legend: center of a character box, center of a triangle, character box, affinity box)


  26. CRAFT | Definition of Model Outputs
    Ground Truth Label Generation
    Character boxes → Region Score GT
    When character-level annotations are provided, create a 2D Gaussian distribution for each rectangular character box.
    Affinity box generation: generate an affinity box from two adjacent character boxes
    (diagram legend: center of a character box, center of a triangle, character box, affinity box)
    Affinity boxes → Affinity Score GT, generated with the same 2D Gaussian procedure

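    A simplified reading of the label-generation step above: warp an isotropic 2D Gaussian onto each character box to build the region score GT (the affinity score GT uses the same procedure on the affinity boxes). This is a sketch, not the official CRAFT code; the Gaussian size and sigma are illustrative.

    # Region-score GT: one warped 2D Gaussian per character box.
    import cv2
    import numpy as np

    def base_gaussian(size=64, sigma_ratio=0.25):
        """Isotropic 2D Gaussian on a square canvas, peak value 1.0."""
        xs = np.arange(size) - (size - 1) / 2.0
        g = np.exp(-(xs ** 2) / (2 * (size * sigma_ratio) ** 2))
        return np.outer(g, g).astype(np.float32)

    def render_region_score_gt(height, width, char_boxes):
        """char_boxes: list of 4x2 arrays, the corner points of each character box."""
        gauss = base_gaussian()
        size = gauss.shape[0]
        src = np.float32([[0, 0], [size, 0], [size, size], [0, size]])
        score = np.zeros((height, width), np.float32)
        for box in char_boxes:
            # Perspective-warp the square Gaussian onto the character quadrilateral.
            M = cv2.getPerspectiveTransform(src, np.float32(box))
            warped = cv2.warpPerspective(gauss, M, (width, height))
            score = np.maximum(score, warped)   # keep the strongest response per pixel
        return score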

  27. CRAFT | Training
    Weakly Supervised Learning
    Synthetic images come with character-level annotations, so the model is trained directly against the synthetic GT.
    Real images only have word/line-level annotations, so the character-level GT is unknown (?).


  28. CRAFT | Training
    Weakly Supervised Learning
    For real images: crop each annotated word, predict region scores with the current model, split the crop into characters, and generate a character-level pseudo-GT.
    Each word is scored by the ratio of split characters to the annotated word length, e.g. (6/6), (5/7), (5/6).
    Synthetic images are still trained against the synthetic character-level GT.


  29. CRAFT | Training
    Weakly Supervised Learning
    The pseudo-GT for real images is weighted by a confidence map built from the (6/6), (5/7), (5/6)-style ratios.
    Objective function: a pixel-wise loss on the region and affinity scores, scaled by the confidence map.
    Synthetic images keep their exact character-level GT and full confidence.

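    A hedged sketch of the objective described on this slide: a pixel-wise squared error on the region/affinity maps, weighted by the confidence map, plus the word-level confidence ratio behind the (6/6), (5/7), (5/6) numbers. Simplified for illustration, not the official training code.

    # Confidence-weighted loss over region and affinity score maps.
    import torch

    def craft_loss(region_pred, affinity_pred, region_gt, affinity_gt, confidence_map):
        """All tensors share the shape (batch, h/2, w/2); GT may be pseudo-GT for real images."""
        per_pixel = (region_pred - region_gt) ** 2 + (affinity_pred - affinity_gt) ** 2
        return (confidence_map * per_pixel).mean()

    def word_confidence(word_length: int, num_split_chars: int) -> float:
        """Confidence of one pseudo-labelled word, matching the (5/7)-style ratios above."""
        return (word_length - min(word_length, abs(word_length - num_split_chars))) / word_length

    print(word_confidence(6, 6), word_confidence(7, 5), word_confidence(6, 5))  # 1.0 0.714... 0.833...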

  30. CRAFT | Training
    Ground Truth Label Generation


  31. CRAFT | Training
    Character region scores change as learning progresses
    (Figure: character-box and word-box region scores at epochs 1, 2, 3, 4, …, 10)


  32. CRAFT | Post-processing
    Rectangle bounding box: map binarization → merging & labeling
    (Legend: character region, affinity region)


  33. CRAFT | Post-processing
    Rectangle bounding box: map binarization → merging & labeling → minimum bounding box
    ✓ Simple yet efficient!
    (Legend: character region, affinity region, minimum bounding box)

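    A sketch of the post-processing chain with OpenCV: binarize the score maps, label connected regions, then take the minimum-area bounding box of each region. The thresholds are illustrative values, not the tuned production ones.

    # Score maps -> rotated rectangle boxes: binarization, merging & labeling, min-area box.
    import cv2
    import numpy as np

    def score_maps_to_boxes(region_score, affinity_score, region_thresh=0.4, affinity_thresh=0.4):
        # 1) Map binarization: a pixel counts as text if either score is high enough.
        text_mask = ((region_score > region_thresh) | (affinity_score > affinity_thresh)).astype(np.uint8)
        # 2) Merging & labeling: affinity links characters, so connected components
        #    group them into word/line regions.
        num_labels, labels = cv2.connectedComponents(text_mask, connectivity=4)
        # 3) Minimum bounding box for each labeled region.
        boxes = []
        for label in range(1, num_labels):          # label 0 is the background
            ys, xs = np.where(labels == label)
            pts = np.stack([xs, ys], axis=1).astype(np.float32)
            rect = cv2.minAreaRect(pts)             # ((cx, cy), (w, h), angle)
            boxes.append(cv2.boxPoints(rect))       # 4 corner points
        return boxes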

  34. Results | TotalText Dataset


  35. Results | TotalText Dataset


  36. Analysis | Improved Recognition Performance
    Correction of Curved Text


  37. Movie


  38. Summary


  39. What is Wrong with Scene Text Recognition Model Comparisons?
    Dataset and Model Analysis, ICCV 2019, Oral
    Text Recognition for Japanese


  40. How to Handle a 100x Larger Character Set?
    English: 26 characters / Japanese: over 3,000

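    Mostly, the 100x larger character set shows up in the recognizer's prediction stage: the final classification layer (and its softmax) scales with the charset size. The numbers below are illustrative; the actual Clova charset is not spelled out on the slide.

    # Prediction-layer size for a small Latin charset vs. a Japanese charset.
    import string
    import torch.nn as nn

    hidden_size = 256
    english_charset_size = len(string.ascii_lowercase + string.digits)  # 36 symbols
    japanese_charset_size = 3000                                        # "over 3K" kana/kanji

    # +1 for the CTC blank (or an attention end-of-sequence token).
    english_head = nn.Linear(hidden_size, english_charset_size + 1)
    japanese_head = nn.Linear(hidden_size, japanese_charset_size + 1)

    print(sum(p.numel() for p in english_head.parameters()))    # 9,509 parameters
    print(sum(p.numel() for p in japanese_head.parameters()))   # 771,257 parameters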

  41. What Is Scene Text Recognition (STR)?
    Image patch with text → character string, e.g. “UNITED”


  42. What Is Wrong with STR Comparisons?


  43. What is Wrong with STR Comparisons?
    1. Different training datasets.


  44. What is Wrong with STR Comparisons?
    1. Different training datasets.
    2. Different evaluation datasets.
    (‘’ means ‘not reported in the paper’)


  45. What is Wrong with STR Comparisons?
    1. Different training datasets.
    2. Different evaluation datasets.
    3. Speed and memory are not always evaluated.


  46. Our Solution: Unified STR Evaluation
    1. Unified training data.
    2. Unified evaluation data.
    3. Time and memory evaluation.


  47. Which Module Contributes to the Performance?
    Transformation → Feature Extraction → Sequence Modeling → Prediction

    View Slide

  48. 2 × 3 × 2 × 2 = 24

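    The 24 comes from the four module stages and the options compared in the ICCV 2019 paper (None/TPS, VGG/RCNN/ResNet, None/BiLSTM, CTC/Attn); a quick enumeration:

    # 2 x 3 x 2 x 2 = 24 module combinations in the four-stage STR framework.
    from itertools import product

    transformation  = ["None", "TPS"]
    feature_extract = ["VGG", "RCNN", "ResNet"]
    sequence_model  = ["None", "BiLSTM"]
    prediction      = ["CTC", "Attn"]

    combos = list(product(transformation, feature_extract, sequence_model, prediction))
    print(len(combos))              # 24
    print("-".join(combos[-1]))     # TPS-ResNet-BiLSTM-Attn, the best-accuracy combination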

  49. Accuracy vs. Time / Accuracy vs. Speed (trade-off plots)


  53. Analysis: Trade-off Plot
    The best-accuracy model, TPS + ResNet + BiLSTM + Attn, is not a previously published combination.


  54. Analysis: The Best Accuracy Model


  55. Summary


  56. Full Pipelines for Clova OCR


  57. Full Pipeline
    Text Detection → boxes and angles → compensate the rotation of each box


  58. Full Pipeline
    Text Detection → boxes and angles → compensate the rotation of each box → Text Recognition
    One recognition model for horizontal/vertical text in JPN/KOR/ENG
    Example outputs: “Welcome to JAPAN”, “잘 오셨습니다” (Korean for “Welcome”)

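    A sketch of the pipeline glue implied by these two slides: rotate each detected crop upright before handing it to the single multilingual recognizer. detector and recognizer are hypothetical callables standing in for the models above.

    # Detection -> per-box rotation compensation -> recognition.
    import cv2
    import numpy as np

    def run_pipeline(image, detector, recognizer):
        results = []
        for box, angle in detector(image):       # box: 4x2 corner points, angle in degrees
            x, y, w, h = cv2.boundingRect(np.int32(box))
            crop = image[y:y + h, x:x + w]
            # Compensate the rotation of each box before recognition.
            M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
            upright = cv2.warpAffine(crop, M, (w, h))
            results.append((box, recognizer(upright)))   # e.g. "Welcome to JAPAN"
        return results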

  59. https://ocrdemo.linebrain.ai


  60. Thank You for Listening
