en.cookpad.com
- Import JP recipes from cookpad.com
- Have translators / reviewers make an EN version
- Started a few months ago, using Transifex
- Site has been publicly accessible since Aug 5th
Friday, August 30, 13
Slide 3
Transifex
Slide 4
In-house translation
Slide 5
Opportunities
- Need to clone Transifex’s “good” features:
  - Semi-automated phrase translation
  - Glossary / suggestions
  - Progress indicator, etc.
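Two of these features are simple enough to sketch. This is a minimal illustration, not Transifex’s API. The glossary entries and the names `GLOSSARY`, `suggest_terms` and `progress_percent` are hypothetical. Suggestions here are plain substring matches against the glossary, and progress is the share of strings that already have a translation.

```python
# Hypothetical glossary: Japanese term -> agreed English rendering.
GLOSSARY: dict[str, str] = {
    "玉ねぎ": "onion",
    "大さじ": "tablespoon",
    "みじん切り": "finely chopped",
}


def suggest_terms(phrase: str, glossary: dict[str, str] = GLOSSARY) -> dict[str, str]:
    """Return the glossary entries whose Japanese term occurs in the phrase."""
    return {term: english for term, english in glossary.items() if term in phrase}


def progress_percent(translated: int, total: int) -> float:
    """Share of strings that already have a translation, as a percentage."""
    if total == 0:
        return 100.0  # nothing to translate counts as done
    return 100.0 * translated / total


print(suggest_terms("玉ねぎのみじん切り"))  # {'玉ねぎ': 'onion', 'みじん切り': 'finely chopped'}
print(progress_percent(3, 4))  # 75.0
```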
Slide 6
Today
- Need to clone Transifex’s “good” features:
  - Semi-automated phrase translation
  - Glossary / suggestions
  - Progress indicator, etc.
Slide 7
Ingredient translation
- Restricted vocabulary
- (Almost) no grammar involved
- Not fun to translate
- Waste of time, and quality goes down
Slide 8
Challenges in translation
- word ambiguity (book a flight / read a book)
- word order (English / Japanese)
- pronoun reference ([...], it is good)
- etc.
Slide 9
In our case
- Japanese has no plural forms
  - 玉ねぎ → “Onion”? “Onions”?
- Can’t parse quantities easily
- Quantity ambiguities
  - 「大1」→「大1さじ」 (1 tablespoon)?「大きめの〇〇1」 (1 large 〇〇)?
  - 「2」→ “2 cloves”? “2 slices”? ...
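To make the ambiguity concrete, here is a minimal sketch. It assumes quantities arrive as short free-text strings. The code splits a quantity into an optional prefix, a number and an optional unit, and it returns every plausible reading instead of guessing. The regular expression, the `PREFIX_READINGS` table and the `readings` function are hypothetical. A reading with no unit means a human still has to choose the unit.

```python
import re
from typing import Optional

# Hypothetical table: abbreviated prefix -> every English reading we know of.
# 「大1」 may mean 大さじ1 (1 tablespoon) or 大きめの〇〇1 (1 large 〇〇).
PREFIX_READINGS: dict[str, list[str]] = {
    "大": ["tablespoon", "large"],
}

# Optional non-digit prefix, an integer or simple fraction, optional non-digit suffix.
QUANTITY = re.compile(r"^(\D*)(\d+(?:/\d+)?)(\D*)$")


def readings(quantity: str) -> list[tuple[str, Optional[str]]]:
    """Return all (number, unit) readings; unit None means the unit is unknown."""
    match = QUANTITY.match(quantity.strip())
    if match is None:
        return []  # e.g. 「少々」 (a pinch): no number to parse
    prefix, number, suffix = match.groups()
    if prefix in PREFIX_READINGS:
        # Ambiguous abbreviation: keep every reading for a translator to pick from.
        return [(number, unit) for unit in PREFIX_READINGS[prefix]]
    if not prefix and not suffix:
        # Bare number such as 「2」: cloves? slices? The field alone cannot tell.
        return [(number, None)]
    # Unit kept as written (still Japanese), e.g. 「1個」 -> ("1", "個").
    return [(number, prefix + suffix)]


print(readings("大1"))  # [('1', 'tablespoon'), ('1', 'large')]
print(readings("2"))    # [('2', None)]
```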
Slide 10
How do machine translation systems work?
Introduction to MT
Reference:
Slide 11
Direct translation
Word by word, no analysis, rule-based.
Translating “much”:
if previous word is ... then ...
else if previous word is ... and next is ... then ...
...
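A rule cascade like this can be sketched as a word-by-word translator. This is a toy example. It assumes a tiny hand-written English→French lexicon (`LEXICON`) and a single context rule for “much” (`rule_much`). Both are hypothetical and nowhere near complete. The crude output is intentional: each word is looked up on its own, and its only context is its immediate neighbours.

```python
from typing import Callable, Optional

# Hypothetical toy English -> French lexicon: one fixed rendering per word.
LEXICON: dict[str, str] = {
    "i": "je", "drink": "bois", "water": "eau", "thank": "merci",
    "you": "vous", "how": "comment", "too": "trop", "very": "très",
}

# A rule sees the previous and next source words and returns
# (translation, whether it replaces the previous word's translation).
Rule = Callable[[Optional[str], Optional[str]], tuple[str, bool]]


def rule_much(prev: Optional[str], nxt: Optional[str]) -> tuple[str, bool]:
    if prev == "how":
        return "combien", True       # "how much" -> "combien"
    if prev == "too":
        return "trop", True          # "too much" -> "trop"
    if prev == "very":
        return "beaucoup", True      # "very much" -> "beaucoup"
    if nxt in {"water", "money", "time"}:
        return "beaucoup de", False  # "much water" -> "beaucoup de ..."
    return "beaucoup", False


RULES: dict[str, Rule] = {"much": rule_much}


def direct_translate(sentence: str) -> str:
    words = sentence.lower().split()
    out: list[str] = []
    for i, word in enumerate(words):
        prev = words[i - 1] if i > 0 else None
        nxt = words[i + 1] if i + 1 < len(words) else None
        if word in RULES:
            translation, replaces_prev = RULES[word](prev, nxt)
            if replaces_prev and out:
                out.pop()  # drop the literal translation of the previous word
            out.append(translation)
        else:
            out.append(LEXICON.get(word, word))  # unknown words pass through
    return " ".join(out)


print(direct_translate("thank you very much"))     # merci vous beaucoup
print(direct_translate("I drink too much water"))  # je bois trop eau
```

A human translator would write “merci beaucoup” and “je bois trop d’eau”. Each such fix means yet another rule.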
Slide 12
Direct translation
Can achieve something for EN → FR, but
EN → JP is a different story
Doesn’t know anything about the context:
“he said that ...” vs. “I like that car”
Slide 13
Transfer- & interlingua-based systems
Analyze the input sentence
Build a representation of its meaning
(for interlingua: one that is independent of the language)
Anyway, it’s just crazy s**t
Slide 14
Statistical MT
Concept:
use sample sentences available in both languages
In practice:
- build parallel corpora and compute the conditional probability of each translation given its source sentence
- fetch the most likely translation for a given sentence
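The concept can be illustrated at the sentence level with relative frequencies. The approach counts how often each source sentence was translated a given way. It estimates P(target | source) as count(source, target) / count(source) and returns the most likely target. This is a deliberately simplified sketch: real statistical MT systems work on word and phrase alignments and add a language model. The corpus below is made-up sample data, and `train`, `probabilities` and `most_likely` are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import Optional


def train(pairs: list[tuple[str, str]]) -> dict[str, Counter]:
    """Count how often each source sentence was translated into each target."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    return counts


def probabilities(counts: dict[str, Counter], source: str) -> dict[str, float]:
    """P(target | source) estimated as count(source, target) / count(source)."""
    seen = counts.get(source)
    if not seen:
        return {}
    total = sum(seen.values())
    return {target: n / total for target, n in seen.items()}


def most_likely(counts: dict[str, Counter], source: str) -> Optional[str]:
    """Return the most probable translation, or None if the source is unseen."""
    probs = probabilities(counts, source)
    return max(probs, key=probs.__getitem__) if probs else None


# Made-up parallel corpus (Japanese source, English target).
corpus = [("玉ねぎ", "onion"), ("玉ねぎ", "onion"), ("玉ねぎ", "onions"), ("にんじん", "carrot")]
model = train(corpus)
print(probabilities(model, "玉ねぎ"))  # onion: 2/3, onions: 1/3
print(most_likely(model, "玉ねぎ"))    # onion
```

Adding more translated pairs sharpens these estimates, so the size of the corpus matters a great deal for this approach.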
Slide 15
Implementation
Dead simple (for now):
- Try an exact “name + quantity” match
- Fall back to “name” only
- Done
The more translations we have, the better the system gets
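The lookup can be sketched as follows. The sketch assumes past human translations are stored as (name, quantity) pairs. `Ingredient`, `TranslationMemory` and the sample entries are hypothetical. On fallback, the sketch picks the most frequent English name; this is an assumption, because the slide does not specify it. When only the name matches, the quantity is passed through untranslated for a translator to fix.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ingredient:
    name: str
    quantity: str


class TranslationMemory:
    def __init__(self) -> None:
        # Exact matches: (JP name, JP quantity) -> EN ingredient.
        self.exact: dict[tuple[str, str], Ingredient] = {}
        # Name-only matches: JP name -> how often each EN name was used.
        self.names: dict[str, Counter] = defaultdict(Counter)

    def add(self, source: Ingredient, translation: Ingredient) -> None:
        """Record one human translation."""
        self.exact[(source.name, source.quantity)] = translation
        self.names[source.name][translation.name] += 1

    def translate(self, source: Ingredient) -> Optional[Ingredient]:
        # 1. Try an exact "name + quantity" match.
        hit = self.exact.get((source.name, source.quantity))
        if hit is not None:
            return hit
        # 2. Fall back to the name only; the quantity is left as written.
        seen = self.names.get(source.name)
        if seen:
            english_name, _ = seen.most_common(1)[0]
            return Ingredient(english_name, source.quantity)
        # 3. Nothing known: a human has to translate it.
        return None


memory = TranslationMemory()
memory.add(Ingredient("玉ねぎ", "1個"), Ingredient("onion", "1"))
memory.add(Ingredient("玉ねぎ", "1/2個"), Ingredient("onion", "1/2"))
memory.add(Ingredient("玉ねぎ", "2個"), Ingredient("onions", "2"))

print(memory.translate(Ingredient("玉ねぎ", "1個")))  # Ingredient(name='onion', quantity='1')
print(memory.translate(Ingredient("玉ねぎ", "3個")))  # Ingredient(name='onion', quantity='3個')
```

Each new human translation added to the memory widens both the exact and the name-only coverage.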