Report on the International Conference SIGMOD 2022

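SIGMOD 2022 International Conference Report
2022.8.4
43rd Report Meeting on Trends in Advanced Database and Web Technologies
80th Chapter Meeting of the ACM SIGMOD Japan Chapter
Tsubasa TAKAHASHI, Seng Pei LIEW
Data Science Center / AI Company, LINE Corp.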

https://linecorp.com/ja/pr/news/ja/2022/4269

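Tsubasa TAKAHASHI, Ph.D
Senior Research Scientist / R&D Manager at LINE
R&D Activity
• R&D on Privacy x ML (LINE Data Science Center)
• R&D on Trustworthy AI (LINE AI Company)
• Invited researcher (WASEDA U.); serves on various committees, including as a DBSJ board member
Selected Publication
• Decentralized shuffling @SIGMOD 2022 w/ Liew
• Privacy x query processing @VLDB 2022 w/ Kato
• Privacy x data synthesis @ICDE 2021, ICLR 2022 w/ Takagi, Liew
• Poisoning attacks on graph NNs @BigData 2019
• Anomaly detection via tensor decomposition @WWW 2017
Career timeline (figure): Univ. → NEC → LINE
• Graduated from Kisarazu KOSEN (National College of Technology); B.S. and M.S., Univ. of Tsukuba; Ph.D. earned while working, Univ. of Tsukuba
• Visiting researcher at CMU; Kambayashi Young Researcher Award; assigned to the Central Research Laboratories
• Anonymization research 2010~15; AI security 2016~18; Privacy x ML 2019~; Trustworthy AI 2021~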

SIGMOD Conference
• One of the three major database conferences
  • SIGMOD/PODS, VLDB, ICDE
• Recent conferences
  • 2018 Houston (USA)
  • 2019 Amsterdam (Netherlands)
  • 2020 Portland (USA)
  • 2021 Xi'an (China) → virtual
  • 2022 Philadelphia (USA) + virtual (Zoom)
(Source) https://sigmod.org/

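SIGMOD 2022 @ Philadelphia
• First Hybrid SIGMOD Conference
  • On-site venue: Marriott Philadelphia Downtown
  • Virtual venue: Zoom + Gather.town
  • Some social programs were held on-site only
• Number of attendees
  • On-site: 550
  • Remote: 300
  • * Figures announced at the opening session on the first day
• Takahashi and Liew attended remotely
(Source) https://2022.sigmod.org/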

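Attendees by country
• Number of attendees (total)
  • 1st: USA
  • 2nd: Germany
  • 3rd: China
  • 4th: Canada
  • 5th: Switzerland
  • …
  • Japan was around 20th?
(Figure annotations) Mostly remote; on-site > remote; on-site ≈ remote
* Figures are screenshots of the projected slides.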

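Covid-19 measures
• Masks were mandatory
  • Except while eating and while presenting
• Daily Testing
  • COVID rapid self-tests for each evening
  • The number of positive cases was announced every day
• What was asked of anyone who tested positive
  • Report the people you had close contact with while unmasked
  • Self-isolate voluntarily as far as possible
  • Wear an N95 mask
(Source) https://2022.sigmod.org/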

Banquet
• @National Constitution Center
  • A museum of United States history
• We attended remotely and so did not join the banquet (it was not streamed)
(Source) https://2022.sigmod.org/

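Core organizing members of SIGMOD 2022
(Figure annotations: areas of expertise) Search / Web; query languages / DB theory; data integration / query processing; distributed systems; DB / learning theory
(Source) https://2022.sigmod.org/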

Sponsors of SIGMOD 2022
• GAFAM and BAT are all represented
* Figures are screenshots of the projected slides.

Improving the review process
• Review Quality Week
  • Held before feedback is sent to the authors
  • AEs instruct reviewers to ensure a constructive and professional review
• Author Feedback
  • AEs are required to respond to the authors' feedback
• AE meta-review
  • The AE works through the revision together with the authors, aiming for acceptance
• Revision Phase
  • 8+ weeks

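Case study: LINE's accepted paper
• Submission: 2nd Round
• Notification → revision
• Revision
  • Major revision over two months
  • Revision submitted
  • One week before the final notification, the meta-reviewer contacted us
  • This let us address the remaining minor fixes
• Acceptance notification
  • Final scores: A / A / R → the meta-reviewer pushed for the paper to be accepted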

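Submissions / acceptances
* Figures are screenshots of the projected slides.

Paper keywords
* Figures are screenshots of the projected slides.

Acceptance rate
* Figures are screenshots of the projected slides.

Acceptance rate (by topic)
• Graph / RDF / SNS (21%)
• Data management for ML (49%)
• ML for data management (27%)
• Security / Privacy (37%)
• Query processing / optimization (25%)
• Stream / Sensor (69%)
• Transaction (42%)
* Figures are screenshots of the projected slides.

Acceptance rate (by gender)
* Figures are screenshots of the projected slides.

Acceptance rate (by track)
(Figure labels) Number accepted (rate); number submitted (share)
* Figures are screenshots of the projected slides.

Submissions (by country)
* Figures are screenshots of the projected slides.

Review Discussion
• 0 means desk reject
* Figures are screenshots of the projected slides.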

Program structure
• 28 research sessions, 2 industrial, 5 demo
  • Live presentation or short pre-recorded presentation
• 3 keynotes
• 2 Diversity & inclusion events
  • D&I keynote
  • SIGMOD D&I panel
• New researcher symposium

Program

SIGMOD Panel 1: The DB community vis-a-vis grand challenges related to the environment, health, and society: innovation engine, plumber, or bystander?

Panel 1
• The DB community vis-a-vis grand challenges related to the environment, health, and society: innovation engine, plumber, or bystander?
• Organizer:
  • Magdalena Balazinska (Univ. of Washington)
• Panelists:
  • Anastasia Ailamaki (EPFL)
  • Leilani Battle (Univ. of Washington)
  • Johannes Gehrke (Microsoft Research)
  • Masaru Kitsuregawa (NII, Univ. of Tokyo)
  • David Maier (Portland State Univ.)
  • Christopher Ré (Stanford)
  • Meihui Zhang (Beijing Institute of Technology)

Questions to Panelists
• Q1: Should we engage and find collaborators?
• Q2: Should we build prototypes, open systems, hire teams?
• Q3: Should we collaborate with other researchers or practitioners?
• Q4: Should the database community organize itself to facilitate and recognize work that solves real-world problems and has practical impact?

Lessons from Panel 1 (1/3)
• Leilani Battle (U. Washington)
  • Helping people requires a meaningful connection and active dialogue.
• David Maier (Portland State U.)
  • Take care with people who think you will be gratified to help out simply because they have "interesting data".
• Anastasia Ailamaki (EPFL)
  • Talking to scientists is an endless source of inspiration.
  • Be patient with peer reviewers.
  • Keep an open mind toward building bridges across sciences.

Lessons from Panel 1 (2/3)
• Johannes Gehrke (Microsoft)
  • Keep your eye on the prize (keep watching how things change)
  • Find a real problem instance
  • Select the right collaborators
  • Don't underestimate indirect influence (advances in AI are built on data)
  • Invest in people
• Christopher Ré (Stanford)
  • The point of the projects is to develop people.
  • Students lead to new directions, and end bad ones.

Lessons from Panel 1 (3/3)
• Masaru Kitsuregawa (NII, U. Tokyo)
  • Most researchers rushed to develop their own models, rather than working together. The world produced hundreds of mediocre tools, rather than a handful of properly trained and tested ones.
  • Database researchers' role is to show the importance of data and the importance of data sharing by solving real-world problems with a Databased approach, in addition to writing papers.
• Meihui Zhang (Beijing Institute of Technology)
  • Working with non-computer scientists requires:
    • A lot of patience and understanding
    • Data collection and cleaning
    • Involvement in non-CS writing for subject matter experts to publish in their domain

SIGMOD Panel 2: Publication Culture and Review Processes in the Data Management Community: An Open Discussion

Panel 2: Publication Culture and Review Processes in the Data Management Community: An Open Discussion
• Title
  • Publication Culture and Review Processes in the Data Management Community: An Open Discussion
• Panelists
  • Sihem Amer-Yahia (CNRS LIG and Univ. Grenoble Alpes)
  • Sourav S. Bhowmick (NTU Singapore)
  • Xin Luna Dong (Meta)
  • Stratos Idreos (Harvard)
  • Wolfgang Lehner (TU Dresden)
• Organizer
  • Divesh Srivastava
* Figures are screenshots of the projected slides.

Diversity & Inclusion
• A data science track was established in 2021
  • 2021: Data Science & Engineering Track
  • 2022: Data Science Track and Data-centric Application Track
  • A short 8-page format, intended to let authors publish results early
  • The Data Management Track allows 12 pages
• Comments from the audience
  • Introducing multiple tracks is encouraging; we would like to see how it develops
  • What about introducing open review?

What is a good publication? What is a good paper?
• What is a good publication? What is the impact of the paper?
  • Citation numbers / h-index
  • How many of its ideas either spur new ideas or generate practical impact in industry that changes people's lives
    • E.g., booking airline tickets: a contribution of the database community
• What is a good paper?
  • Worth the readers' time
  • A paper should
    • Have a good idea
    • Be well written
    • Be easy to understand
    • Be inspiring

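How to raise review quality
• Raise the quality of PC member selection
  • Topic coverage / diversity
  • The issue is not how individual reviews are worded; the improvement has to happen further upstream
  • PC member selection is a matter of how the conference is designed
  • Detailed data sources are needed to decide who should make up the review board
• Comments from the audience
  • Many conferences have now adopted multi-round submission → a win for the DB community, which pioneered it
  • The quality of SIGMOD reviews has improved over the past two to three years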

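Strengths of the DB community's review process
• What the DB community does well
  • Reviewers feel responsible to the DB community
  • Senior PC members encourage discussion among PC members
  • "Better" than other communities by comparison
  • Reviews are delivered on time
• The case of VLDB reviewing
  • Only a very, very tiny percentage of papers are accepted outright
  • Most papers go through a revision
  • Papers on the borderline are given the chance to revise
  • Most papers that end up accepted are "borderline" papers
  • Offering this chance keeps papers from simply moving on to another conference
  • An FAQ for reviewers was created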

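Discussion on page limits
• DB papers are long, which is unfamiliar to other communities
• Benefits of a shorter page limit
  • A paper of around 8 pages can focus on describing a single contribution
  • Results can be published earlier
• A shorter page limit means changing expectations
  • DB papers tend to be expected to make three major technical contributions; moving to shorter papers requires changing that expectation
  • How much detail should be demanded of a single paper?
  • For DB papers: system / theory / performance
  • Some say that even 12 pages is too restrictive for systems papers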

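Impressions
• It left a good impression that the community shares a sense of urgency and debates these issues in earnest
• The SIGMOD/VLDB review process of the past two to three years is highly regarded
  • In our own experience as well, we received constructive comments
• The acceptance rate has roughly doubled compared with a few years ago, but regrettably the number of accepted papers from Japan has not grown much
• Compared with SIGMOD/VLDB, ICDE was hardly mentioned
  • Several people did mention EDBT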

Keynote
• Keynote 1
  • Reflections on a Career in Computer Science
  • Barbara Liskov (MIT)
• Keynote 2
  • On A Quest for Combating Filter Bubbles and Misinformation
  • Laks V.S. Lakshmanan (University of British Columbia)
• Keynote 3
  • Is Data Management the Beating Heart of AI Systems?
  • Christopher Ré (Stanford)
(Source) https://2022.sigmod.org/

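Keynote 3
* Figures are screenshots of the projected slides.

Riding the big wave of AI development
* Figures are screenshots of the projected slides.

Founded three startups, together with Apple
* Figures are screenshots of the projected slides.

Does applying deep learning solve the problem?
• The effect of data quality is unclear
• Changing the model (algorithm) makes only a slight difference in performance
• Label quality also has to be considered
* Figures are screenshots of the projected slides.

Training data is the bottleneck
* Figures are screenshots of the projected slides.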

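Snorkel: programmatic labeling
• Labeling functions are written so that labels are assigned automatically
• Addresses the cost and other problems of manual labeling
* Figures are screenshots of the projected slides.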
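
The idea of labeling functions can be sketched in a few lines of plain Python. This is only an illustration: the three heuristics, the label constants, and the `weak_label` helper are hypothetical, and the votes are combined by a simple majority vote rather than by the learned label model that Snorkel itself uses.

```python
from collections import Counter

# Hypothetical label constants for a toy spam task.
ABSTAIN, HAM, SPAM = -1, 0, 1


def lf_contains_link(text):
    """Heuristic: messages with a URL are likely spam."""
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_mentions_prize(text):
    """Heuristic: messages mentioning a prize are likely spam."""
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_message(text):
    """Heuristic: very short messages are likely harmless."""
    return HAM if len(text.split()) <= 3 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_message]


def weak_label(text):
    """Combine labeling-function votes by majority; abstain on no votes or ties."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]
```

Each function encodes one cheap heuristic and may abstain, so labels for a large unlabeled corpus come from code rather than from annotators.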

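Foundation Model
• The astonishing change brought about by huge models and huge data
• The latest models understand context well enough to generate fictional images such as "an astronaut riding a horse"
* Figures are screenshots of the projected slides.

Foundation Model
• A general-purpose model obtained by training a huge model on huge data
• With prompting, a wide range of tasks can be set up without additional training
  • Inference / translation / QA / summarization / …
* Figures are screenshots of the projected slides.

Foundation Model
• Code generation and image generation as well
* Figures are screenshots of the projected slides.

Foundation Model for Data Tasks
• Imputing missing values in tabular data
* Figures are screenshots of the projected slides.

Robustness / bias problems
• DNNs sometimes base their decisions on unintended features
  • A gender classifier based on the iris → strongly affected by makeup such as mascara
  • Animal classification → decides based on the background of the image
* Figures are screenshots of the projected slides.

Summary of Keynote 3
* Figures are screenshots of the projected slides.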

Introduction to the Best Paper Award paper
Tsubasa TAKAHASHI

Best Paper Award
https://dl.acm.org/doi/pdf/10.1145/3514221.3517844
Keywords: Differential Privacy / SPJA Query / Foreign-key Constraint
(Figure annotations on the authors) Also had papers accepted at PODS'22 / SIGMOD'21; a leading DP researcher (proposer of l-diversity)

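Differential Privacy
What is Differential Privacy (DP)?
• A measure that statistically expresses the level of privacy of the results of data collection and analysis
• Expresses statistically "how indistinguishable an individual is from others" through the privacy parameter ε
• A gold standard that Google, Apple, the US Census, and others have begun to deploy
Problems it addresses
• Privacy protection that is robust against linkage with any external knowledge (achieved by adding specially designed noise)
• Quantitative management of the cumulative privacy consumption that comes with using data
(Figure) Noise is added to the algorithm's output so that an individual cannot be told apart from others (indistinguishable to a degree of about ε). Privacy consumption can be managed quantitatively: it accumulates over Analysis #1, Analysis #2, and Analysis #3.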
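
Cumulative privacy consumption can be tracked with a small accountant. The sketch below assumes basic sequential composition, where the ε values of successive analyses simply add up; the `PrivacyBudget` class and its method names are hypothetical, not part of any library mentioned in the talk.

```python
class PrivacyBudget:
    """Track cumulative epsilon under basic sequential composition (an assumption)."""

    def __init__(self, total_epsilon):
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total = total_epsilon
        self.spent = 0.0

    @property
    def remaining(self):
        """Budget that later analyses may still consume."""
        return self.total - self.spent

    def spend(self, epsilon):
        """Charge one analysis; refuse it if the total budget would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon
        return self.remaining


budget = PrivacyBudget(total_epsilon=1.0)
budget.spend(0.4)  # Analysis #1
budget.spend(0.4)  # Analysis #2
# A third analysis at epsilon = 0.4 would raise RuntimeError.
```

Tighter composition theorems exist; simple addition is the most conservative and easiest to reason about.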

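Noise design and sensitivity
Sensitivity $\Delta_f$
• The maximum change in the output of a function $f$ (under the assumed notion of neighboring datasets $D$, $D'$)

$$\Delta_f = \sup_{D, D'} \lVert f(D) - f(D') \rVert_1$$

Examples: $\Delta_{\mathrm{histogram}} = 1$, $\Delta_{\mathrm{count}} = 1$, $\Delta_{\mathrm{mean}} = 1/n$
Laplace mechanism
• Noise is sampled from a Laplace distribution with mean 0 and scale $b = \Delta_f / \varepsilon$ (the variance is $2b^2$)

$$\mathcal{M}(D) = f(D) + \mathrm{Lap}(0, \Delta_f / \varepsilon)$$

* In the CDP setting
(Figure annotation) The sensitivity is the extent of change that the noise has to mask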
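
A minimal sketch of the Laplace mechanism, using only the standard library. The function names and the toy `ages` data are hypothetical; the noise is drawn as the difference of two exponential samples, which follows a Laplace distribution with the given scale.

```python
import random


def laplace_noise(scale, rng):
    """Sample Lap(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Return true_value plus Laplace noise of scale sensitivity / epsilon."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rng = rng or random.Random()
    return true_value + laplace_noise(sensitivity / epsilon, rng)


# Count query: adding or removing one person changes the count by at most 1.
ages = [23, 35, 41, 29, 52]
true_count = sum(1 for a in ages if a >= 30)
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5,
                                rng=random.Random(0))
```

A smaller ε gives a larger noise scale, so the released count is noisier but each individual is better hidden.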

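Overview of R2T
• Problem addressed
  • How should sensitivity be handled when the query uses self-joins?
• Conventional approach
  • Truncation mechanism:
    • Individuals whose contribution to the query (e.g., the magnitude of their values) exceeds τ are removed from the table
    • Even if one individual changes, the result changes by at most τ → sensitivity = τ
    • → This introduces bias into the query result → how should a τ with small bias be chosen?
• Contribution
  • Proposes Race-to-the-Top (R2T), an algorithm that derives a near-optimal upper bound τ on the sensitivity for queries with self-joins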
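
The conventional truncation mechanism can be sketched as follows for a sum query without self-joins. The function names and the toy data are hypothetical, and contributions are assumed to be non-negative with one entry per individual.

```python
import random


def laplace_noise(scale, rng):
    """Sample Lap(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def truncated_private_sum(contributions, tau, epsilon, rng=None):
    """Drop individuals whose contribution exceeds tau, then add Lap(tau / epsilon).

    After truncation, adding or removing one individual changes the sum by at
    most tau, so tau serves as the sensitivity (self-join-free case only).
    """
    if tau <= 0 or epsilon <= 0:
        raise ValueError("tau and epsilon must be positive")
    rng = rng or random.Random()
    kept = [c for c in contributions.values() if c <= tau]
    return sum(kept) + laplace_noise(tau / epsilon, rng)


spend = {"alice": 10, "bob": 20, "carol": 500}
released = truncated_private_sum(spend, tau=100, epsilon=1.0,
                                 rng=random.Random(0))
# carol is dropped, so the released value is centered on 30, not on the true 530.
```

A small τ keeps the noise low but discards heavy contributors, while a large τ keeps them at the cost of more noise; this trade-off is the bias problem in the bullet above.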

Why are self-joins a problem?
• Naive truncation does not work well
  • Adding or removing one user also affects other users
  • The result depends on the threshold τ far more than in the conventional setting without self-joins → how should τ be set?

SELECT SUM(Amount) FROM Transaction, People P1, People P2 WHERE P1.ID = From AND P2.ID = To;

People(ID, Location)
ID    Location
p1    Tokyo
p2    Kyoto
p3    Hokkaido
p4    Okinawa
…

Transaction(From, To, Amount)
From  To    Amount
p1    p2    1,000
p3    p4    1,000
p5    p6    1,000
p7    p8    1,000
p9    p10   1,000
…

(Figure) Transaction graph over p1 to p10 (instance I) and the neighboring instance I′ obtained by adding a user pz with transactions of amount α. With τ = 1,000, Q(I, τ) = 1,000 × N/2, whereas Q(I′, τ) = 0.
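
The failure of naive truncation under a self-join can be reproduced on a toy instance in the spirit of the slide. The `truncated_sum` helper is hypothetical, a user's contribution is taken to be the total amount of the transactions they take part in, and the amount 1 for pz's transactions is an assumption made only for this illustration.

```python
from collections import defaultdict


def truncated_sum(transactions, tau):
    """Naive truncation: remove users whose total contribution exceeds tau,
    then sum the transactions whose two endpoints both survive."""
    contribution = defaultdict(int)
    for src, dst, amount in transactions:
        contribution[src] += amount
        contribution[dst] += amount
    kept = {user for user, total in contribution.items() if total <= tau}
    return sum(a for s, d, a in transactions if s in kept and d in kept)


base = [("p1", "p2", 1000), ("p3", "p4", 1000), ("p5", "p6", 1000),
        ("p7", "p8", 1000), ("p9", "p10", 1000)]
users = sorted({u for s, d, _ in base for u in (s, d)})

# Neighboring instance: one extra user pz with a tiny transaction to everyone.
neighbor = base + [("pz", u, 1) for u in users]

before = truncated_sum(base, tau=1000)      # every user contributes exactly 1000
after = truncated_sum(neighbor, tau=1000)   # every original user now exceeds tau
```

Adding the single user pz pushes every other user over the threshold, so the truncated result drops from 5,000 to 0. The change is far larger than τ, and noise calibrated to τ would not hide pz.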

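Race-to-the-Top (R2T) mechanism
• Basic strategy: let many values of τ compete and output the value with the smallest error
  • The candidates for τ are restricted to powers of 2
  • While guaranteeing DP, $\tilde{Q}(I, \tau)$ …
• The candidates τ are raced against one another, and the maximum is taken as the output of the R2T mechanism
(Figure labels) True output; truncation by τ; truncation by τ + DP; error due to truncation; error due to the DP noise; noise required for DP; correction term that accounts for the added noise; R2T output
(Source) https://dl.acm.org/doi/pdf/10.1145/3514221.3517844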
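
The race can be sketched as below. Several things here are assumptions rather than the authors' implementation: the noise scale and the correction term follow our reading of the cited paper and should be checked against it, `global_sensitivity` and `beta` are illustrative parameters, and the truncation is simple per-user clipping, which is only valid without self-joins (the paper uses a more involved truncation for self-joins).

```python
import math
import random


def laplace_noise(scale, rng):
    """Sample Lap(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def clipped_sum(contributions, tau):
    """Stand-in for Q(I, tau): clip each non-negative contribution at tau."""
    return sum(min(c, tau) for c in contributions)


def r2t(contributions, epsilon, global_sensitivity, beta=0.1, rng=None):
    """Race powers of two as tau and return the largest corrected noisy answer."""
    rng = rng or random.Random()
    k = max(1, math.ceil(math.log2(global_sensitivity)))  # number of candidates
    best = 0.0  # Q(I, 0), which is 0 under clipping
    for j in range(1, k + 1):
        tau = 2 ** j
        scale = k * tau / epsilon  # each candidate gets epsilon / k of the budget
        penalty = k * math.log(k / beta) * tau / epsilon  # correction term
        noisy = clipped_sum(contributions, tau) + laplace_noise(scale, rng) - penalty
        best = max(best, noisy)
    return best


estimate = r2t([3, 5, 8], epsilon=1.0, global_sensitivity=16,
               rng=random.Random(0))
```

The correction term pushes each candidate below the true answer with high probability, so taking the maximum favors the τ whose truncation error and noise are jointly smallest.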

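Evaluation: Error Level
• R2T performs well (small error) on many queries
(Source) https://dl.acm.org/doi/pdf/10.1145/3514221.3517844

Effectiveness of the choice of τ
• The error of R2T is small, although it falls short of the best value achieved by the LP mechanism
(Source) https://dl.acm.org/doi/pdf/10.1145/3514221.3517844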

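Summary of R2T
• Problem addressed
  • How should sensitivity be handled when the query uses self-joins?
  • Truncation-based methods introduce a bias that depends on the choice of the threshold τ
  • → How can a τ with small bias be selected?
• Contribution
  • Proposes Race-to-the-Top (R2T), an algorithm that derives a near-optimal upper bound τ on the sensitivity for queries with self-joins
• Limitation
  • Group-by queries are not supported (mentioned as future work)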

Closing

Closing
• As our SIGMOD 2022 participation report, we covered the following
  • Overview of SIGMOD 2022
  • Submission counts, acceptance rates, the review process, and so on
  • Two panel discussions and one keynote
  • LINE's accepted paper "Network Shuffling"
  • The Best Paper Award paper "R2T"
• We will present the following paper at VLDB 2022 (9/5~9, @Sydney)
  • HDPView: Differentially Private Materialized View for Exploring High Dimensional Relational Data. F. Kato, T. Takahashi, S. Takagi, Y. Cao, S.P. Liew, M. Yoshikawa

Submission schedule
SIGMOD 2023
• 1st Round: 4/15
• 2nd Round: 7/15
• 3rd Round: 10/15
PODS 2023
• 1st Round: 5/30
• 2nd Round: 11/28
https://2023.sigmod.org/
