Nov. 03, 2018 Pointwise HSIC (EMNLP 2018)
Example 1: Collocation Extraction [Manning&Schütze,’99]
task: compute the co-occurrence strength of word bigrams to extract collocations
observed data: word bigrams from corpora
(figure: example word bigrams from a corpus, e.g. "Have you", "you ever", "? York", …)
Example 2: Dialogue Response Selection [Lowe+,’15]
task: compute the co-occurrence strength of input-response sentence pairs to rank the candidate responses
observed data: input-response message pairs from dialogue corpora
• (I'm hungry!, Let's have lunch)
• (Will it rain today?, It's about to rain)
• (I love this manga, I don't know)
• …
input by users: I've lost my wallet
response candidates:
• I saw it at the …
• I'm so sleepy
• I don't know
Example 2: Dialogue Response Selection by PMI [Li+’16]
$\mathrm{PMI}(x, y) = \log \frac{P(y \mid x)}{P(y)}$, with the probabilities estimated using RNNs
• learn parameters from observed sentences: (I'm hungry!, Let's have lunch), (Will it rain today?, It's about to rain), (I love this manga, I don't know), …
• compute co-occurrence strength: (I've lost my wallet, I saw it at the …) → ?
o can be applied to sparse expressions
o tough to learn
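As a rough illustration of the PMI score above, here is a minimal count-based sketch. It is not the RNN estimator of [Li+’16]: the probabilities are plain relative frequencies over a toy list of (input, response) pairs, and the function name `pmi` and the data are made up for illustration. It also hints at why counting struggles with sparse expressions: a pair that was never observed gets no finite score.

```python
import math
from collections import Counter

# Toy (input, response) pairs; hypothetical data for illustration only.
pairs = [
    ("I'm hungry!", "Let's have lunch"),
    ("I'm hungry!", "Let's have lunch"),
    ("Will it rain today?", "It's about to rain"),
    ("I love this manga", "I don't know"),
]

n = len(pairs)
joint = Counter(pairs)                    # counts of (x, y)
count_x = Counter(x for x, _ in pairs)    # counts of x
count_y = Counter(y for _, y in pairs)    # counts of y

def pmi(x, y):
    """Count-based PMI(x, y) = log P(y|x) / P(y); None if (x, y) was never observed."""
    if joint[(x, y)] == 0:
        return None  # sparse case: counting gives no usable estimate
    p_y_given_x = joint[(x, y)] / count_x[x]
    p_y = count_y[y] / n
    return math.log(p_y_given_x / p_y)

print(pmi("I'm hungry!", "Let's have lunch"))
print(pmi("I've lost my wallet", "I saw it at the …"))
```

Whole sentences almost never repeat in a corpus, so in practice nearly every candidate pair falls into the `None` branch; this is the gap that learned or kernel-based estimators try to close.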
• Cosine Similarity between Sentence Vectors
o Sentence vectors [Kiros+,’15; Dai&Le,’15; Iyyer+,’15; Hill+,’16; Cer+,’18]
o Sum of word vectors [Mikolov+,’13; Pennington+,’14; Bojanowski+,’17]
o Many pre-trained models are available off the shelf!
• Structured Kernels
o [Collins&Duffy,’02; Bunescu&Mooney,’06; Moschitti,’06]
• Combinations
o We can freely combine (sum, product) kernels
$\mathrm{PHSIC}(x, y; k, \ell)$
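The kernel choices listed above can be sketched as follows. This is only an illustration under stated assumptions: the word vectors are random stand-ins rather than pre-trained embeddings, and all function names (`sentence_vector`, `cosine_kernel`, `rbf_kernel`, `sum_kernel`, `product_kernel`) are hypothetical, not part of any released tool.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 5-dimensional word vectors; a real setup would load pre-trained ones.
vocab = ["i", "lost", "my", "wallet", "saw", "it", "am", "hungry"]
word_vec = {w: rng.normal(size=5) for w in vocab}

def sentence_vector(tokens):
    """Sentence vector as the sum of its word vectors."""
    return np.sum([word_vec[t] for t in tokens], axis=0)

def cosine_kernel(s, t):
    """Cosine similarity between two sentence vectors."""
    u, v = sentence_vector(s), sentence_vector(t)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def rbf_kernel(s, t, gamma=0.1):
    """RBF kernel on sentence vectors; gamma is an arbitrary illustrative value."""
    d = sentence_vector(s) - sentence_vector(t)
    return float(np.exp(-gamma * (d @ d)))

def sum_kernel(k1, k2):
    """The sum of two kernels is again a kernel."""
    return lambda s, t: k1(s, t) + k2(s, t)

def product_kernel(k1, k2):
    """The product of two kernels is again a kernel."""
    return lambda s, t: k1(s, t) * k2(s, t)

a = ["i", "lost", "my", "wallet"]
b = ["i", "saw", "it"]
combined = sum_kernel(cosine_kernel, rbf_kernel)
print(cosine_kernel(a, b), rbf_kernel(a, b), combined(a, b))
```

Because sums and products of positive semi-definite kernels stay positive semi-definite, either combinator can be passed wherever a single kernel such as $k$ or $\ell$ is expected.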
Mutual Information and Pointwise Mutual Information
• Mutual Information: dependence of $X$ and $Y$ (between all possible pairs)
$\mathrm{MI}(X, Y) = \mathrm{KL}[P_{XY} \,\|\, P_X P_Y] = \mathbb{E}_{(x, y)}\left[\log \frac{P_{XY}(x, y)}{P_X(x)\,P_Y(y)}\right]$
• Pointwise Mutual Information: co-occurrence of $x$ and $y$ (between a specific pair)
$\mathrm{PMI}(x, y; X, Y) = \log \frac{P_{XY}(x, y)}{P_X(x)\,P_Y(y)}$
• $\mathrm{PMI}(x, y)$ is the contribution of $(x, y)$ to $\mathrm{MI}(X, Y)$
• $\mathrm{MI}(X, Y)$ is the expectation of $\mathrm{PMI}(x, y)$
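The relation between the two quantities can be checked numerically. The sketch below uses a made-up 2 x 2 joint distribution (the numbers are purely illustrative) and confirms that averaging PMI under the joint distribution gives the same value as the KL divergence form of MI.

```python
import numpy as np

# Hypothetical joint distribution P(x, y) over 2 x 2 outcomes.
P_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
P_x = P_xy.sum(axis=1, keepdims=True)   # marginal of x, shape (2, 1)
P_y = P_xy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, 2)

# PMI for every specific pair (x, y).
pmi = np.log(P_xy / (P_x * P_y))

# MI as the expectation of PMI under P(x, y).
mi_from_pmi = float(np.sum(P_xy * pmi))

# MI as KL[P_XY || P_X P_Y], written out separately.
mi_from_kl = float(np.sum(P_xy * np.log(P_xy)) - np.sum(P_xy * np.log(P_x * P_y)))

print(pmi)
print(mi_from_pmi, mi_from_kl)
```

Pairs on the diagonal co-occur more often than independence would predict and get positive PMI; the off-diagonal pairs get negative PMI, and their weighted average is the non-negative MI.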
task: compute the co-occurrence strength of input-response sentence pairs to rank the candidate responses (use PHSIC)
observed data: input-response sentence pairs from dialogue corpora
• (I'm hungry!, Let's have lunch)
• (Will it rain today?, It's about to rain)
• (I love this manga, I don't know)
• …
input by users: I've lost my wallet
response candidates:
• I saw it at the …
• I'm so sleepy
• I don't know
corpus: Twitter (following [Sordoni+,’15])
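A possible shape of PHSIC-based response ranking is sketched below. It assumes that PHSIC can be estimated in feature space as $(\phi(x) - \bar{\phi})^\top \hat{C} (\psi(y) - \bar{\psi})$, where $\hat{C}$ is the empirical cross-covariance of the feature vectors; this form is an assumption for illustration and is not spelled out on the slide. Random vectors stand in for sentence vectors, and the function names (`fit_phsic`, `phsic`, `rank_responses`) are hypothetical and unrelated to the `phsic-cli` interface.

```python
import numpy as np

def fit_phsic(X, Y):
    """X: (n, d_x) and Y: (n, d_y) feature vectors of the observed pairs."""
    mean_x, mean_y = X.mean(axis=0), Y.mean(axis=0)
    # Empirical cross-covariance between input and response features.
    C = (X - mean_x).T @ (Y - mean_y) / len(X)
    return mean_x, mean_y, C

def phsic(x, y, model):
    """Assumed feature-space score: (x - mean_x)^T C (y - mean_y)."""
    mean_x, mean_y, C = model
    return float((x - mean_x) @ C @ (y - mean_y))

def rank_responses(x, candidates, model):
    """Return candidate indices sorted by descending score, plus the raw scores."""
    scores = [phsic(x, y, model) for y in candidates]
    order = [int(i) for i in np.argsort(scores)[::-1]]
    return order, scores

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                      # stand-in input vectors
Y = X @ rng.normal(size=(8, 6)) + 0.1 * rng.normal(size=(200, 6))  # correlated responses
model = fit_phsic(X, Y)

order, scores = rank_responses(X[0], [Y[0], Y[1], Y[2]], model)
print(order, scores)
```

Under this assumed form, fitting needs only means and one matrix product, and the average score over the training pairs equals the squared Frobenius norm of $\hat{C}$, mirroring the way MI is the expectation of PMI.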
BLEU after filtering (3M ⟼ 1M)
• random: 39.82 (-1.20)
• fast_align (baseline using word alignment [Dyer+,’13]): 40.56 (-0.46)
• PHSIC (RBF kernel of fastText): 40.95 (-0.07)
• using all (3M) training data: 41.02
PHSIC reduces the amount of training data to 1/3, almost without sacrificing BLEU
Corpus: ASPEC-JE corpus [WMT’14]
Model: Transformer [Vaswani+,’17] on fairseq
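The filtering step itself can be pictured as keeping the best-scoring third of the sentence pairs. The sketch below is a hedged illustration: the scores are made-up numbers standing in for a co-occurrence score such as PHSIC, and `select_top_fraction` is a hypothetical helper, not the pipeline used in the experiment.

```python
def select_top_fraction(pairs, scores, fraction=1 / 3):
    """Keep the highest-scoring fraction of sentence pairs, preserving corpus order."""
    k = max(1, round(len(pairs) * fraction))
    # Rank indices by score, best first.
    ranked = sorted(range(len(pairs)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:k])
    return [pairs[i] for i in keep]

# Toy parallel "corpus" with hypothetical scores.
pairs = ["pair_a", "pair_b", "pair_c", "pair_d", "pair_e", "pair_f"]
scores = [0.1, 0.9, -0.3, 0.5, 0.0, 0.2]

print(select_top_fraction(pairs, scores))
```

Swapping in a different scorer (random, word-alignment based, or kernel based) changes only the `scores` list, which is what makes the three rows of the comparison directly comparable.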
• PHSIC: a "smoothed variant of PMI" by kernels
o applicable to sentences (PHSIC "smooths" the matching by kernels)
o allows various similarity metrics to be plugged in as kernels
o requires very short learning time (estimators are reduced to matrix calculations)
• Experiments (Use Case)
o (Re-)Ranking: Dialogue Response Selection
o Data Selection: Noisy Parallel Corpus Filtering
github.com/cl-tohoku/phsic
pip install phsic-cli
Special Thanks to Ryo Takahashi
fast_align (… the SotA word aligners) [Dyer+,’13]
• 1. run word alignment on each $(x_i, y_i)$ with fast_align to get a set of aligned word pairs with their probabilities
• 2. compute the co-occurrence score of $(x_i, y_i)$ as the average log probability of the aligned word pairs
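Step 2 of this baseline amounts to a short computation. The sketch below assumes the alignment probabilities of one sentence pair are already available as a list of floats; it does not parse fast_align output, the probabilities are invented, and `alignment_score` is a hypothetical name.

```python
import math

def alignment_score(aligned_probs):
    """Average log probability of the aligned word pairs of one sentence pair."""
    if not aligned_probs:
        # Assumption: a pair with no aligned words gets the worst possible score.
        return float("-inf")
    return sum(math.log(p) for p in aligned_probs) / len(aligned_probs)

# Hypothetical alignment probabilities for a clean and a noisy sentence pair.
clean_pair = [0.9, 0.8, 0.85]
noisy_pair = [0.2, 0.05, 0.1]

print(alignment_score(clean_pair), alignment_score(noisy_pair))
```

Averaging rather than summing keeps the score from penalizing long sentence pairs simply for having more aligned words.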