Collaborative Translation Efforts (CTE)

Linux Foundation Japan

September 20, 2019

Transcript

  1. Collaborative Translation Efforts (CTE)
    18th July, [email protected]
    Masao Taniguchi, NEC

  2. whoami
    ■ Marketing and business planning in DX (Digital Transformation)
    ■ 10+ years at the (so-called) OSPO at NEC
    ■ Translation work with the LF translation team
    • Open Source Audits in M&A Transactions (Dr. Ibrahim Haddad)
    • OpenChain Spec / Reference Training Material 1.1, 1.2 (congrats on the 2.0 release!)
    • LF Certification Prep Guide
    ■ I’m NOT a developer, nor a professional translator
    [Cover images: Open Source Audits in M&A Transactions, OpenChain Reference Training Material, OpenChain Spec 1.2, LF Certification Prep Guide]

  3. Power of a Word
    ■ A word is a meme of culture
    ■ A word sometimes reflects history, values, how people think, how they live, and what they desire
    Word and Culture (Takao Suzuki)
    “ありがとう” / “Gracias” / “Thank you”
    Parent(s) (親) = “Standing” (立) + “Tree” (木) + “To look/care” (見)
    “Pacific” sea = “太平洋”
    “忖度” = “Read the atmosphere”

  4. Power of a Phrase
    ■ Good words and phrases sometimes, even often, change someone’s mind, or even their life, for the better
    “継続は力なり”
    “Endurance makes you stronger.”
    “El que persevera alcanza.”

  5. Power of Translation
    ■ Good translations bring positive change across cultures
    ■ Awareness, new notions, inspiration, even innovation
    ■ And such good translations are brought to us by translators
    “A Translator in Edo era”
    Natsuko TODA, subtitler, film industry interpreter
    ->
    Translated by Naoko Ota

  6. Translation work is…
    ■ A wonderful experience
    ■ Enlightening
    ■ Sympathy with the author
    ■ Deep understanding
    ■ A feeling of achievement
    ■ It’s fun, a brief moment of fun
    ■ But basically a (very^2) painful experience
    ■ Frustrating
    ■ Requires stoicism
    ■ Takes time
    ■ A lonely battle (nobody knows about it)
    ■ You feel exhausted after finishing
    ■ Little feedback
    ■ Little reward (especially if you are a volunteer)
    So…
    Fun << Painful

  7. Motivation: Much more Fun in Translation
    Fun >> Painful

  8. Motivation: The OSS Way is not widespread enough
    ■ For developers and engineers, it comes naturally, even in Japan
    ■ The OSS Way is more and more important in DX business, but for traditional managers and the C-suite it does not come naturally
    ■ They are not aware of the power of mass collaboration (yet)
    ■ One cause may be translation: OSS-related translations are not ready enough, or are unseen or unreachable…?

  9. Vision of CTE (Tentative)
    More productive translation in the “Open Source” way: with more openness, more collaboration, and much more fun!
    Note: CTE is an idea from Nori (Fukuyasu)-san of LF Japan

  10. Translation (CTE)

  11. Current: Translations in LF resources
    ■ 60% (97/162) in Press (2018)
    ■ 7% (12/161) in Blog (2018)
    ■ 17% (5/30) in Publication
    ■ 91% (11/12) in Open Source Guide
    ■ 42% (9/21) in Open Source Good Readings
    [*] LF resources
    https://www.linuxfoundation.jp/newsroom/press/
    https://www.linuxfoundation.jp/newsroom/blog/
    https://www.linuxfoundation.jp/resources/publications/
    https://www.linuxfoundation.jp/resources/open-source-guides/
    https://www.linuxfoundation.org/open-source-guides-reading-list/

  12. Contents
    1. Challenges and Approach
    2. Demo
    3. Trials
    4. Thoughts so far
    5. Next steps

  13. Challenges and Approach

  14. Challenges in translation (especially for OSS resources)
    Process flow: Select -> Translate -> Review -> Publish
    ■ The challenges are about “process” (as in the OpenChain initiative)
    ■ Each process has a bottleneck

  15. Challenges in translation (especially for OSS resources)
    ■ “Quality (質)”: a dilemma between OSS expertise and translation skills
    • OSS-related translation strongly requires OSS expertise in both technology and culture. It tends to be hard even for professional translators.
    • Meanwhile, even if you understand OSS well enough, your translation quality may not be good enough
    ■ “Quantity (量)”: unscalable processes
    • There are bottlenecks in every process: Choose, Prepare, Translate, Review, and Publish
    • Each process is too slow

  16. Challenges in translation (especially for OSS resources)
    ■ Critical thinking and communication are important, but…
    ■ “忖度” (sontaku: reading the atmosphere) is a challenge in the review process
    ■ You should focus on the translation work itself
    ■ But “who translated it” weighs heavily on your mind
    ■ This can be a risk to translation quality
    “Who translated this?”
    “Is this really a good translation?”
    “Japanese Society”, author: Chie Nakane

  17. Quality vs. Quantity (量と質の問題: the problem of quantity and quality)
    ■ Essentially a trade-off: it seems hard to improve both
    ■ Good translation needs much time and brain power
    ■ This leads to lower productivity
    Quantity (量)
    Quality (質)
    Which do we tackle first?

  18. CTE Basic Approach
    ■ Putting priority on “quality” requires money (sometimes a lot)
    ■ So, focus on “quantity” first
    ■ Be “prolific”, above all
    ■ Put priority on “speed” and “scalability” (interestingly, similar to the cloud native approach)
    ■ With “fun”, of course
    Quantity (量)
    Quality (質)

  19. Major bottlenecks throughout the translation process
    Process flow: Select -> Prepare -> Translate -> Review -> Publish
    Management, Visualize
    Projects: A, B, C
    1. A single main translator must translate everything
    -> It takes time, depending on the time and skill available to the translator
    -> Besides, quality also depends on his/her skill, and that quality affects the review process
    2. Sequential reviews
    -> Reviews are done one by one, sequentially, which takes time
    -> Besides, you often have to do a “review of the review”, which also takes time
    -> E-mail based communication takes time (and needs your brain power)
    3. Release
    -> The workload is concentrated on the few people who have DTP or document cosmetics skills, which delays publishing
    We focus on these processes here

  20. Approach (Cont’d)
    Current processes: Preparation -> Translation (main translator; weeks/months) -> Sequential review (Reviewer A, Reviewer B, Reviewer C; weeks/months) -> Publish
    CTE approach: Preparation (none in some cases) -> Translation (machine; seconds/minutes) -> Simultaneous review (Reviewers A, B, C; weeks) -> Publish
    (GOTO next project)
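
The difference between sequential and simultaneous review can be sketched with a toy model. This is only an illustration under simplifying assumptions: each reviewer needs a fixed number of days, sequential reviews add up, and simultaneous reviews finish when the slowest reviewer finishes. The function names and the durations are hypothetical and do not come from the trials.

```python
def sequential_review_days(reviewer_days: list[float]) -> float:
    """Reviewers work one after another, so their durations add up."""
    return sum(reviewer_days)


def simultaneous_review_days(reviewer_days: list[float]) -> float:
    """Reviewers work at the same time, so the slowest one sets the duration."""
    return max(reviewer_days)


# Hypothetical durations in days for Reviewers A, B, and C (not measured data).
reviewer_days = [10.0, 7.0, 12.0]

print(sequential_review_days(reviewer_days))    # 29.0
print(simultaneous_review_days(reviewer_days))  # 12.0
```

The model ignores coordination overhead, which is what periodic short online sessions are meant to keep small.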

  21. Approach
    ■ To be “prolific” (and scalable), above all
    ■ Invigorate peer review and make it more fun
    ■ More collaborators
    ■ Use tools to automate
    • Tool 1: OmegaT as translation memory
    • Tool 2: Google machine translation
    -> Shorten the time spent in the Translation process and focus on the Review process
    • Tool 3: HackMD (CodiMD)
    -> Simultaneous, real-time review (editing Markdown content)
    • Tool 4: Slack
    -> Simultaneous, real-time review (communication)
    ■ Define metrics and measure them (bigger is better)
    1. Speed: words / day
    2. Efficiency: 1 / [hours / (persons * words)], that is, 1 / (time spent per word, per person)
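
The two metrics can be written as small functions. This is a minimal sketch, assuming that the Efficiency formula is read as persons * words / hours; the `Trial` class and its field names are illustrative. The sample values (3,600 words, 32 days, 16 hours, 1 person) are the figures reported in this deck for the 1st trial.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """Measurements for one translation project (illustrative field names)."""
    words: int          # size of the source text in words
    days: float         # calendar duration of the project
    hours: float        # total recorded work time
    persons: float = 1  # number of collaborators


def speed(trial: Trial) -> float:
    """Metric 1: words per calendar day."""
    return trial.words / trial.days


def efficiency(trial: Trial) -> float:
    """Metric 2: 1 / [hours / (persons * words)], i.e. persons * words / hours."""
    return trial.persons * trial.words / trial.hours


first_trial = Trial(words=3600, days=32, hours=16, persons=1)
print(speed(first_trial))       # 112.5
print(efficiency(first_trial))  # 225.0
```

Rounded, these values agree with the Speed of 113 and Efficiency of 225 reported for the 1st trial.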

  22. Approach
    Process flow: Prepare -> Translate -> Review -> Publish
    We focus on 2 processes

  23. Demo

  24. Trials

  25. Starting point: OpenChain Spec 1.0 translation
    ■ 2,200 words
    ■ E-mail based discussion
    ■ Translation (by me):
    • Duration: 1st March 2017 ~ 9th April 2017
    • Work time: 50 h (including e-mail communication)
    ■ Review: 3 reviewers (Kunai-san of LF, Sato-san of LF, Fukuchi-san of Sony), until 26th April 2017
    • 10 h per person
    https://github.com/OpenChain-Project/Specification-Translation-JP/blob/master/RELEASE/v1.0/openchainspec-1.0_jp.pdf
    Metric 1: Speed: 55
    Metric 2: Efficiency: 100

  26. Trials: Basic Rules
    ■ Measure and record the time you spend on each activity
    -> This is an important input for the evaluation metrics
    -> Rough measurement is OK
    ■ PERIODIC online cross-reviews in short sessions (1-2 hours)
    -> Eliminate the time spent thinking hard on your own
    -> Do not discuss for a long time. Let’s make reviews more casual
    -> Extract and manage a ToDo list
    ■ Online reviews should be done via a chat tool
    -> Chat is more casual for many people. We need more, and more diverse, collaborators!
    ■ Above all, make it fun!
    -> Enjoy the original content (the author’s ideas and thoughts)
    -> Enjoy interaction (going off topic is also important sometimes)
    -> Enjoy progress (we are getting close to the goal!)
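
The rule of recording time roughly could be supported by a very small log. The following is a sketch only: the log format, the names, and the minutes are all hypothetical, and any spreadsheet would serve just as well.

```python
from collections import defaultdict

# Hypothetical log entries: (person, phase, minutes). Rough figures are fine.
work_log = [
    ("alice", "translate", 30),
    ("alice", "self-review", 120),
    ("bob", "self-review", 90),
    ("alice", "online-review", 60),
    ("bob", "online-review", 60),
]


def minutes_by_phase(log):
    """Sum the recorded minutes for each phase."""
    totals = defaultdict(int)
    for _person, phase, minutes in log:
        totals[phase] += minutes
    return dict(totals)


totals = minutes_by_phase(work_log)
total_hours = sum(totals.values()) / 60

print(totals)
print(total_hours)  # 6.0
```

The total hours are what the Efficiency metric needs, and the per-phase totals show where the time goes.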

  27. 1st Trial
    … Alone.
    (Try to measure the metrics)
    ぼっち (“botchi”: all on my own)

  28. 1st Trial (by 1 person)
    ■ Objective: measure and evaluate how much time a single person takes
    ■ Target resource:
    • TODO Group “Building Leadership in an Open Source Community”
    • Approx. 3,600 words (24,000 characters)
    ■ Number of persons: 1 (Taniguchi)
    ■ Started: 28th Oct 2018

  29. 1st Trial (by 1 person): Work in progress
    Edit pane (Markdown)
    * Paste the machine translation output here first
    View pane

  30. 1st Trial (by 1 person): Result
    ■ Output:
    「オープンソース コミュニティのリーダーシップ」 (“Leadership in an Open Source Community”)
    https://hackmd.io/aibsz3_JTqStRbyTdVO7rA
    ■ Duration: 32 days (28th Oct 2018 - 26th Nov 2018)
    • Translation: Google machine translation -> 30 min. (with a little manual work)
    • Review -> 870 min. (14.5 h)
    • Review (correction by tool) -> 30 min.
    • Release (image linking) -> 30 min.
    • Sum: 960 min. (16 h)
    Note: the actual scores would be worse, because peer review is not included
    Metric 1: Speed: 113 (up)
    Metric 2: Efficiency: 225 (up)

  31. 1st Trial (by 1 person): Result
    https://hackmd.io/zgthoZZcTl-s3JXAg1pgkw?view

  32. 2nd Trial
    +
    … A little bit of collaboration (real-time review)
    Community developer’s support

  33. 2nd Trial (2 persons)
    ■ Objective: evaluate the effectiveness of collaboration between 2 people
    ■ Target material:
    • LF “Certification Preparation Guide”
    (1) Download site with introduction (104 words)
    (2) PDF slides (21 slides, 4,212 words)
    ■ Number of persons: 2 (Mieko-san and Taniguchi)
    ※ In addition, Inou-san from NEC Solution Innovator joined and did a technical check as an engineer
    ■ Started: 27th Dec 2018
    [Images: download site, PDF slides]

  34. 2nd Trial (2 persons): Work in progress
    Simultaneous, real-time online review; chats in Slack

  35. 2nd Trial (2 persons): Result
    ■ Duration:
    32 days (27th Dec 2018 - 28th Jan 2019), including the release process (DTP with InDesign)
    • Translation: Google machine translation -> 35 min. (mainly copy and paste)
    • Review (self-review) -> 1145 min. (19 h: 609 min. for Taniguchi, 540 min. for Mieko-san)
    • Review (online cross-review) -> 595 min. (5 sessions)
    • Release (DTP with InDesign) -> 420 min.
    Sum: 2190 min. (36 h)
    • Number of ToDo items in online review: approx. 20 (all closed)
    Metric 1: Speed: 135 (up)
    Metric 2: Efficiency: 299 (up)
    * Note: the actual scores should be (much) better than this, because this includes the “Publish” process

  36. 2nd Trial: Result (Outcome)
    DL site introduction PDF Slide (Prep Guide)

  37. 3rd Trial (Ongoing)
    #kubernetes-docs-ja
    +
    … A little bit more collaboration
    (Launch a HackMD site in the Tokyo region to improve real-time response)

  38. Case studies
    ■ General information, but essential
    ■ Of business interest
    ■ A good entry point for beginners
    ■ But low priority for engineers
    ■ We may contribute to the community (a little)

  39. 3rd Trial (3 persons, ongoing)
    ■ Objective: evaluate the effectiveness of collaboration among 3+ people
    ■ Target material: Kubernetes Case Studies (40+ case studies)
    • 1st case study: China Unicom
    • 1,016 words
    ■ Number of persons: 3
    ■ Started: 15th June 2019

  40. Result: 3rd Trial (for now)
    ■ From 15th to 30th June 2019
    ■ Translation: 5 minutes
    ■ Review
    • Self: 290 min. (4.8 h) for 3 reviewers
    • Online: 210 min. (3.5 h, held twice)
    ■ Publish
    • HTML format
    Metric 1: Speed: 72 (down)
    Metric 2: Efficiency: 367 (up)

  41. Trials Round-up (for now)
    [Chart: Speed (words/day) and Efficiency for each trial; points are given as (Speed, Efficiency)]
    Start: (55, 100), 4 persons
    1st trial: (113, 225), 1 person *1
    2nd trial: (135, 299), 2 persons (1 supporter) *2
    3rd trial (ongoing): (75, 367), 3 persons *2
    *1: Note: the actual scores would be worse, because peer review is not included
    *2: Note: the actual scores should be better than this, because this includes the “Publish” process
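
To read the round-up at a glance, each point can be expressed relative to the e-mail based starting point. This is a small sketch using only the (Speed, Efficiency) pairs listed on this slide; the function and variable names are illustrative.

```python
# (speed in words/day, efficiency) as listed on the round-up slide.
results = {
    "Start": (55, 100),
    "1st trial": (113, 225),
    "2nd trial": (135, 299),
    "3rd trial (ongoing)": (75, 367),
}


def relative_to_start(points):
    """Return each (speed, efficiency) pair as a ratio to the 'Start' pair."""
    base_speed, base_efficiency = points["Start"]
    return {
        name: (spd / base_speed, eff / base_efficiency)
        for name, (spd, eff) in points.items()
    }


for name, (speed_ratio, efficiency_ratio) in relative_to_start(results).items():
    print(f"{name}: speed x{speed_ratio:.2f}, efficiency x{efficiency_ratio:.2f}")
```

The ratios inherit the caveats in notes *1 and *2, so they are indicative rather than exact.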

  42. What’s next (currently in progress)
    https://todogroup.org/guides/marketing-open-source-projects/
    https://kubernetes.io/case-studies/appdirect/

  43. Thoughts (so far)
    ■ To some extent, translation can be made faster and more efficient in the Translation and Review processes
    ■ And we see that each process can be broken down into “sub-processes”
    ■ Above all, it’s becoming enjoyable
    ■ Measuring metrics and evaluating them is meaningful
    ■ But this is not solid yet. We need to keep looking for better approaches.

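  44. Thoughts (so far): tasks required within each process
    Processes: Select, Translate, Review, Publish, and Management
    Select
    • Decide what to translate
    • Check the license
    • Coordinate with the community
    • Decide the rules
    • Check the volume
    • Recruit and bring in members with the required knowledge
    • Select and prepare tools
    • Clarify the output and the scope of work
    Translate
    • Open an issue
    • Sign the CLA
    • Fork the repository (community, translation memory)
    • Run machine translation
    • Create the bilingual text
    • Prepare for online review
    • Measure work time
    Review
    • Share rules and terminology
    • Online review
    • Measure work time
    • Manage issues/TODOs
    • Confirm with the community
    • Merge work (content, memory, dictionaries, terminology, and so on)
    Publish
    • Decide design rules
    • Handle format conversion
    • Measure work time
    • Contribute upstream
    • Post to the site
    • Credits
    • Announcement
    Management
    • Managing and promoting progress • Providing a place to work • Issue management
    • Providing incentives • Understanding people and skills • Measuring reader feedback
    • Maintaining notation, rules, etc. • Quality checks
    • Preparing the necessary data and tools • Metric evaluation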

  45. Thoughts (so far)
    ■ Quality? Yes, quality and quantity are “trade-offs”
    ■ But both may be improved (especially in the 2nd trial)

  46. Summary

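  47. Challenges in translation (a broader view)
    • Process bottlenecks (sequential, time-consuming, unable to release, unproductive)
    • Everyone suffers in the same way
    • Incentives
    [Quantity issues]
    • Redundancy and fragmentation (local optimization, too dispersed)
    ■ Tools, policies, processes, know-how, and resources are scattered
    • Bottlenecks in each process
    ■ Sequential processes
    ■ CLA
    ■ File format dependence (.html, .idd, .docx, .pptx, .md, .txt, ODF…)
    ■ Communication…
    • Securing translators and reviewers
    [Quality issues]
    • Reserve and sontaku (忖度)
    • People and skills (tool skills in particular are a considerable barrier)
    • No feedback from readers in the first place (= no mechanism for trial and error)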

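  48. Approach (findings from this activity)
    We caught a glimpse of the potential for more efficient translation, a gradually lighter workload, and more fun
    • Streamlining the process: full use of machine translation (reduces reserve and sontaku)
    • Setting metrics and making them visible
    • Breaking processes down: what needs to be done becomes visible at the task level
    • Speeding up review: simultaneous online reviews at a fixed date and time (sequential ⇒ parallel)
    • More fun (casual participation, chat)
    • Incentives: contributions in the community
    • Raising quality by having engineers take part in translation review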

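  49. Remaining challenges (findings from this activity)
    The challenges surrounding translation are diverse, numerous, and complex. Trial and error is needed while taking a bird's-eye view of the whole and clarifying the issues
    • Are tools the bottleneck after all? (OmegaT/Git/InDesign)
    • Developing people, improving skills, sharing know-how ⇒ Meetup
    • Selection/Publish processes
    • Overall management process, visualization
    • Reducing fragmentation: unifying and sharing resources (notation, translation memory, term definitions, know-how…)
    • Simplifying procedures such as the CLA (Community Hub)
    • Copyright matters
    • Quality of machine translation
    • Further automation (automatic measurement of metrics)
    • Further visualization (overall progress)
    • Defining roles (casual reviewers and core reviewers)
    • What should the incentive for casual reviewers be?
    • Verifying market value (measuring feedback from readers)
    A PoC-like activity. Start by sharing information where we can, through trial and error, DevOps style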

  50. Centralizing resources (starting where it is needed and where it is possible)
    https://hackmd.io/@maabou512/S1DlLROBH

  51. Quality of machine translation
    “And this is why building and maintaining leadership in open source projects
    is key to corporate strategies and goals. However, it isn’t as easy as pounding
    desks and throwing around cash-based clout.”
    Open Source Guide: Building Leadership in an Open Source Community
    https://github.com/todogroup/guides/blob/master/building-leadership-in-an-open-source-community.md
    “オープンソースプロジェクトでリーダーシップを築き維持することが企業の戦略と
    目標の鍵となるのはこのためです。ただし、デスクを叩いたり、現金を使用した
    効果的な効果を出したりするのと同じくらい簡単ではありません。”
    That's why building and maintaining leadership with open source projects is the
    key to a company's strategy and goals. However, it's not as easy as hitting a desk
    or using cash for an effective effect
    The translated Japanese is weird… (the grammar is OK, but…)

  53. Casual reviewers
    Core reviewers

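  54. Ambitions beyond that
    • Our own machine translation models (shared by individuals and groups)
    • Build a solid foundation, then move on to a more creative world (translator’s skill level ⇒ translator’s individuality)
    • Connections and cooperation with the translation industry
    • Toward more cross-industry activity
    • Eventually, major works, different genres, even best-selling books?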

  55. Thank you very much.
    CollaboTrans.slack.com
