Slide 1

Slide 1 text

pseudo SMT @ cookpad
Friday, August 30, 13

Slide 2

Slide 2 text

en.cookpad.com
- import JP content from cookpad.com
- have translators / reviewers make an EN version
- started a few months ago using Transifex
- site has been publicly accessible since Aug 5th

Slide 3

Slide 3 text

Transifex

Slide 4

Slide 4 text

In-house translation

Slide 5

Slide 5 text

Opportunities
- Need to clone Transifex’s “good” features:
  - Semi-automated phrase translation
  - Glossary / suggestions
  - Progress indicator
  - etc.

Slide 6

Slide 6 text

Today
- Need to clone Transifex’s “good” features:
  - Semi-automated phrase translation
  - Glossary / suggestions
  - Progress indicator
  - etc.

Slide 7

Slide 7 text

Ingredients translation
- Restricted vocabulary
- (Almost) no grammar involved
- Not fun to translate
- Waste of time & quality goes down

Slide 8

Slide 8 text

Challenges in translation
- word ambiguity (book a flight / read a book)
- word order (English / Japanese)
- pronoun meaning ([...], it is good)
- etc.

Slide 9

Slide 9 text

In our case
- JP doesn’t have plural forms
  - 玉ねぎ → “Onion”? “Onions”?
- Can’t parse the quantities easily
- Quantity ambiguities
  - 「大1」→「大さじ1」(1 tablespoon)、「大きめの○○1」(1 largish ○○)
  - 「2」→ “2 cloves”? “2 slices”? ...

Slide 10

Slide 10 text

How do machine translation systems work?
Introduction to MT
Reference:

Slide 11

Slide 11 text

Direct translation
- word by word
- no analysis
- rule-based
Translating “much”:
  if previous word is ... then ...
  else if previous word is ... and next is ... then ...
  .............
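A minimal sketch of what such hand-written rules can look like. The slide elides the actual conditions for “much”, so this example uses the ambiguous word “that” instead; the tiny EN to FR lexicon, the `REPORTING_VERBS` set, and both function names are assumptions made up for illustration, not the rules from the talk.

```python
# Toy EN->FR lexicon; entries are illustrative, not from the talk.
LEXICON = {
    "i": "je",
    "he": "il",
    "said": "a dit",
    "like": "aime",
    "car": "voiture",
}

# Hypothetical set of verbs after which "that" introduces a clause.
REPORTING_VERBS = {"said", "thinks", "knows"}


def translate_that(prev_word, next_word):
    """Hand-written context rule for the ambiguous word "that"."""
    if prev_word in REPORTING_VERBS:
        return "que"    # conjunction: "he said that ..."
    if next_word is not None:
        return "cette"  # demonstrative; ignores gender entirely
    return "cela"       # bare pronoun


def direct_translate(sentence):
    """Translate word by word, with one special-cased rule and no analysis."""
    words = sentence.lower().split()
    out = []
    for i, word in enumerate(words):
        prev_word = words[i - 1] if i > 0 else None
        next_word = words[i + 1] if i + 1 < len(words) else None
        if word == "that":
            out.append(translate_that(prev_word, next_word))
        else:
            # Unknown words are passed through unchanged.
            out.append(LEXICON.get(word, word))
    return " ".join(out)


print(direct_translate("I like that car"))
print(direct_translate("he said that"))
```

Every ambiguous word needs its own rule like `translate_that`, and the output still ignores agreement and elision (“je aime” instead of “j’aime”), which is why this approach scales poorly.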

Slide 12

Slide 12 text

Direct translation
- can achieve something for EN to FR, but EN to JP is a different story
- doesn’t know anything about the context: “he said that ...” / “I like that car”

Slide 13

Slide 13 text

Transfer & interlingua based systems
- analyze the data
- build a representation of the meaning of a sentence that is independent of the language
- anyway, it’s just crazy s**t

Slide 14

Slide 14 text

Statistical MT
Concept: use sample sentences translated in both languages
In practice:
- build parallel corpora with conditional probabilities for each sentence
- fetch the most likely translation for a given sentence
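A rough sketch of the idea, assuming the conditional probability of a translation is estimated by simple relative frequency over previously translated pairs. The corpus data and the names `build_model` and `most_likely` are hypothetical; real SMT systems use far richer models than this.

```python
from collections import Counter, defaultdict

# Hypothetical parallel corpus of (Japanese, English) pairs.
# The data is made up for illustration.
PARALLEL_CORPUS = [
    ("玉ねぎ", "onion"),
    ("玉ねぎ", "onions"),
    ("玉ねぎ", "onion"),
    ("にんにく", "garlic"),
]


def build_model(pairs):
    """Estimate P(target | source) by relative frequency."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    model = {}
    for source, targets in counts.items():
        total = sum(targets.values())
        model[source] = {t: n / total for t, n in targets.items()}
    return model


def most_likely(model, source):
    """Return the highest-probability translation, or None if unseen."""
    candidates = model.get(source)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


model = build_model(PARALLEL_CORPUS)
print(most_likely(model, "玉ねぎ"))
```

With this toy corpus, “onion” was seen twice and “onions” once for the same source, so “onion” gets probability 2/3 and is the translation returned.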

Slide 15

Slide 15 text

Implementation
Dead simple (for now)
- Try a perfect “name quantity” match
- Fall back to “name” only
- done
The more translations, the better the system
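The two lookup steps above could be sketched as follows. This is not Cookpad’s actual code: the `MEMORY` table, its key layout, and `translate_ingredient` are assumptions chosen to show the exact-match-then-fallback order.

```python
# Hypothetical translation memory built from earlier human translations.
# Keys with a quantity are exact "name quantity" entries; keys with None
# are name-only entries. All data here is made up for illustration.
MEMORY = {
    ("にんにく", "2"): ("garlic", "2 cloves"),
    ("にんにく", None): ("garlic", None),
    ("玉ねぎ", None): ("onion", None),
}


def translate_ingredient(name, quantity):
    """Return (translated_name, translated_quantity).

    A None in either slot means that part still needs a human translator.
    """
    # 1. Try a perfect "name quantity" match.
    exact = MEMORY.get((name, quantity))
    if exact is not None:
        return exact
    # 2. Fall back to the name only; the quantity stays untranslated.
    name_only = MEMORY.get((name, None))
    if name_only is not None:
        return (name_only[0], None)
    # 3. Nothing known yet.
    return (None, None)


print(translate_ingredient("にんにく", "2"))
print(translate_ingredient("玉ねぎ", "1"))
```

Each new human translation adds entries to the table, so exact matches become more frequent over time, which is the sense in which more translations make the system better.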

Slide 16

Slide 16 text

Demo