The Little-Known Scientists of Amazon and AWS

© 2022, Amazon Web Services, Inc. or its affiliates.
The Little-Known Scientists of Amazon and AWS
Takahiro Kubo (久保隆宏), Developer Relations, Machine Learning

Background (1/2)
• AWS sponsored the 2022 symposium of NLP若手の会 (YANS). We provided a dataset and an analysis environment, and presented the Applied Scientist Award.

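Background (2/2)
• Applied Scientist is one of the research and development roles at Amazon/AWS. As a perk, the winning team got to meet actual Applied Scientists, and the conversation was so lively that it took us by surprise. During planning, we had assumed the perk would be so unpopular that people would say they would rather have an Amazon gift card...
• So I put together this material to introduce the R&D roles at Amazon/AWS. As of October 2022, it is the first such introduction in Japanese. I hope this talk shows you the little-known work of Scientists and adds to your career options!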

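Takahiro Kubo / 久保隆宏
Developer Relations Engineer, Machine Learning
Career
1. SAP consultant (10 years) + kintone evangelist: Provided end-to-end implementation support, from business requirements definition through development to operations and maintenance. Came across kintone while exploring efficient ways to build business applications through SaaS integration, and became active as an evangelist.
2. Machine learning engineer (5 years): Worked on natural language processing research. Built dialogue application prototypes and worked on evaluating companies' non-financial information with NLP. While in the research department, wrote books including 「Pythonで学ぶ強化学習」 and 「直感 Deep Learning」. Also took part in community activities such as arXivTimes and NLP若手の会 (YANS).
3. Product manager (2 years): Worked as product manager on a service for browsing and checking non-financial information. Experienced the steep, muddy road of turning R&D into a product. The service can be used both by those who prepare non-financial disclosures and by those who evaluate them!
4. Developer Relations (1 year and counting): Joined AWS to learn how products that use machine learning are developed and to spread that knowledge.
Cybozu Days 2016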

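Machine Learning Developer Relations activities
Providing a continuous growth experience, from learning machine learning through to putting it to use.
• Learning: Make code from various universities and books learnable on AWS. Teaching materials that can be studied on Studio Lab are collected and published on the Community site.
• Experimenting: Support for running hackathons and competitions. Problem-solving hands-on sessions focused on specific business problems are also offered.
• Prototyping: For product development teams, a workshop that supports the use of machine learning in products. Details: ML Enablement Workshop
The occasion for this talk
Spoke jointly with Money Forward at SaaS on AWS 2022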

Agenda
• R&D roles at Amazon/AWS
• Research topics
• How to apply

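R&D roles at Amazon/AWS (1/3)
There are three roles.
1. Applied Scientist: Conducts research and development to improve the models used in actual products such as Amazon Echo and Amazon shopping. Because the role works closely with development teams, development skills are especially required.
2. Data Scientist: Often works on behalf of internal teams, for example analyzing Amazon's data to drive product recommendation initiatives or running A/B tests. Also includes people generally referred to as ML Engineers.
3. Research Scientist: Conducts fundamental research that will not become a product right away but will be useful in the future.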

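R&D roles at Amazon/AWS (2/3)
The role guidelines for the three "Scientist" roles are shared, so the job descriptions have much in common.
Reference: an example of the Applied Scientist role guidelines (*also used as the judging criteria for the Applied Scientist Award at the hackathon held at YANS)
• Novelty of approach (Ambiguity / Scientific Complexity): Does the solution extend or apply existing methods in a novel, creative way?
• Feasibility of implementation (Engineering Complexity): Is the implementation efficient, scalable, and applicable to real-world problems?
• Depth of background knowledge (Knowledge): Does the work show an understanding and use of advanced principles and methods?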

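R&D roles at Amazon/AWS (3/3)
Research roles are treated almost identically at AWS and Amazon, so moving between AWS and Amazon is relatively easy. Transfers to global teams after joining in Japan are also common.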

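Main research areas
• Images: Amazon product images and Amazon Prime videos are representative examples.
• Natural language processing: Amazon search and Alexa's spoken language understanding are representative examples.
• Machine learning in general: a wide range, including reinforcement learning and causal inference.
• Machine learning implementation: A/B testing, distributed training, more efficient development with SageMaker, and so on.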

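Image processing research (1/2)
Vision & Language seems to feature heavily in 2022.
FashionVLP: Vision language transformer for fashion retrieval with feedback 【CVPR2022】 Amazon Alexa Natural Understanding
A study on retrieving images that match given conditions from an image (of clothing, for example) and a request (such as "a shorter length"). The authors developed a Transformer model that takes as input the tags obtained from an object detection model together with image features of the whole image, of parts of it, and of the regions around keypoints, and achieved SOTA on FashionIQ.
Vision-language pre-training with triple contrastive learning 【CVPR2022】 Amazon
A study on improving the accuracy of feature matching between images and text. Because a text refers to only part of an image, a model may learn that unrelated regions have similar features. To focus the model on robust image features, training also includes judging pairs of differently transformed images and pairs taken from within the same image. Transferred to Vision & Language tasks, the approach achieved SOTA accuracy.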
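The contrastive idea in the triple contrastive learning summary above can be sketched with a generic InfoNCE loss. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: the function names `info_nce` and `combined_contrastive_loss`, the temperature value, and the equal weighting of terms are all choices made here, and the term for pairs within the same image is left out.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.07):
    """Generic InfoNCE: row i of `anchors` should match row i of `positives`."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                       # scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))            # matching pairs sit on the diagonal

def combined_contrastive_loss(img, img_aug, txt):
    """Cross-modal term (image <-> text) plus an intra-modal term (two views of an image)."""
    cross = 0.5 * (info_nce(img, txt) + info_nce(txt, img))
    intra = info_nce(img, img_aug)  # differently transformed views of the same image
    return cross + intra

# Toy embeddings (hypothetical data): text and augmented views lie close to the image vectors.
rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))
img_aug = img + 0.01 * rng.normal(size=(4, 8))
txt = img + 0.01 * rng.normal(size=(4, 8))
print(combined_contrastive_loss(img, img_aug, txt))
```

The intra-modal term is what rewards features that survive image transformations, which is the robustness the summary refers to.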

Image processing research (2/2)
We are also taking part in ECCV (October 23-27).
• Fine-grained fashion representation learning by online deep clustering
• GLASS: Global to local attention for scene-text spotting

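Natural language processing research (1/2)
In 2022 there is research on confidentiality and fairness, issues that arise when models are put to practical use.
Canary extraction in natural language understanding models 【ACL2022】 Amazon Alexa AI
Attempts to extract sensitive codes (such as phone numbers and postal codes) contained in the training data from a natural language understanding model, and confirms that under the best setting a four-digit number can be extracted with 50% probability.
On the intrinsic and extrinsic fairness evaluation metrics for contextualized language representations 【ACL2022】 Amazon Alexa AI
Fairness can be measured in two ways: on the base language model alone, or including the downstream task. The paper shows that the fairness of the language model does not necessarily carry over to downstream tasks.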

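Natural language processing research (2/2)
IR evaluation and learning in the presence of forbidden documents 【SIGIR2022】 Amazon
Proposes nDCGf, a metric for evaluating whether a product review search displays its results after properly filtering out advertising reviews and false information. When nDCG scores unwanted documents with negative values, the result no longer falls in the 0 to 1 range and training becomes unstable; nDCGf addresses this by normalizing with the scores of the best and worst subsets.
I wish I would have loved this one, but I didn’t: A multilingual dataset for counterfactual detection in product reviews 【EMNLP2021】 Amazon
A dataset of product reviews that contain counterfactuals ("if only it had been ..."). It covers English, German, and Japanese. Such reviews make up only about 1 to 2% of the total, but because they are not based on fact they degrade the user experience. Data was collected using the syntactic patterns of sentences that contain counterfactuals, and similar-looking sentences that are not counterfactual were collected using BERT similarity.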
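The normalization idea behind nDCGf summarized above (rescaling by the best and worst achievable scores) can be sketched as a min-max normalized DCG. This is a simplified illustration, not the paper's exact definition: the gain values, the function names, and the way the best and worst rankings are built from a candidate pool are assumptions made here.

```python
import math

def dcg(gains):
    """Discounted cumulative gain of a ranked list of gains."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))

def min_max_normalized_dcg(ranked_gains, pool_gains, k):
    """Rescale DCG@k to [0, 1] using the best and worst rankings drawn from the pool.

    Assumption: relevant documents have positive gain and forbidden
    documents have negative gain.
    """
    actual = dcg(ranked_gains[:k])
    best = dcg(sorted(pool_gains, reverse=True)[:k])  # highest gains first
    worst = dcg(sorted(pool_gains)[:k])               # most negative gains first
    if best == worst:
        return 0.0
    return (actual - worst) / (best - worst)

# Hypothetical pool: two relevant, one neutral, two forbidden documents.
pool = [3, 2, 0, -2, -2]
ranking = [-2, 3, 0]  # a forbidden document ranked first

plain_ratio = dcg(ranking) / dcg(sorted(pool, reverse=True)[:3])
print(plain_ratio)                               # dividing by the ideal DCG alone can go below 0
print(min_max_normalized_dcg(ranking, pool, 3))  # stays within [0, 1]
```

Because the score is bounded by construction, it can serve as a training signal without the out-of-range values the summary mentions.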

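General machine learning research (1/3)
The research is truly varied! Out of personal interest, I picked reinforcement learning and causal inference.
Faster deep reinforcement learning with slower online network 【NeurIPS 2022】 Amazon Web Services
When updating its network, DQN uses a Target Network whose weights are frozen. By keeping the weights being updated in the vicinity of the Target Network, the paper proposes DQN-Pro and Rainbow-Pro, which improve training stability. A simple change that yields a large performance gain.
Causal structure-based root cause analysis of outliers 【ICML2022】 Amazon Research Tübingen
A study that identifies, from data and a causal graph, what the root trigger was for an outlier event such as a system failure. The method is applied to a real case in the UK to analyze which of three rivers caused flooding downstream.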
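The DQN-Pro idea summarized above, keeping the online weights near the Target Network, can be sketched as a TD update with an added proximal penalty. This is a toy illustration under strong assumptions, not the paper's implementation: it uses a linear Q-function and a single transition, and `proximal_td_step`, `prox_coef`, and the feature vectors are hypothetical names and values introduced here.

```python
import numpy as np

def proximal_td_step(theta, theta_target, phi, phi_next, reward,
                     gamma=0.99, lr=0.1, prox_coef=1.0):
    """One gradient step on 0.5 * td_error**2 + 0.5 * prox_coef * ||theta - theta_target||**2.

    Assumes a linear Q-function Q(s, a) = phi(s, a) @ theta, where `phi_next`
    holds the features of the action chosen in the next state.
    """
    td_target = reward + gamma * (phi_next @ theta_target)  # computed with frozen target weights
    td_error = phi @ theta - td_target
    grad = td_error * phi + prox_coef * (theta - theta_target)  # TD gradient + proximal pull
    return theta - lr * grad

theta_target = np.zeros(3)
theta = np.array([0.5, -0.2, 0.1])
phi = np.array([1.0, 0.0, 1.0])
phi_next = np.array([0.0, 1.0, 0.0])

plain = proximal_td_step(theta, theta_target, phi, phi_next, reward=1.0, prox_coef=0.0)
prox = proximal_td_step(theta, theta_target, phi, phi_next, reward=1.0, prox_coef=1.0)

# Distance of the updated weights from the target network, without and with the penalty.
print(np.linalg.norm(plain - theta_target), np.linalg.norm(prox - theta_target))
```

With `prox_coef=0.0` the step reduces to an ordinary TD update, so the only difference between the two runs is the pull toward the target weights.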

General machine learning research (2/3)
For supply chain optimization, we offer scholarships (SCOT/INFORMS Scholarships). This year, 15 recipients were selected from 13 universities in 3 countries.

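General machine learning research (3/3)
AWS and Microsoft have jointly begun developing PyWhy, an OSS project for causal inference (the successor to DoWhy, which Microsoft had been developing). On the AWS side, we have system architecture uses in mind, such as inferring how a change in one service affects other services in a microservice architecture.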

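Machine learning implementation research (1/3)
Presenting efficient computation methods and practical know-how for machine learning.
DietCode: Automatic optimization for dynamic tensor program 【MLSys 2022】 Amazon Web Services
A study on computing efficiently on hardware even for networks whose computation graph is formed dynamically, such as models that handle sequences. By using a general search space rather than one per operator, it makes efficient computation possible for dynamic graphs as well.
Profiling deep learning workloads at scale using Amazon SageMaker 【KDD2022】 Amazon Web Services
A study on profiling the training performance of machine learning models. As models grow larger, whether the CPU and GPU are used efficiently becomes important for speeding up the experiment cycle. The tool visualizes metrics alongside the corresponding code so that the results lead to implementation improvements. The implementation is provided as open source.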

Machine learning implementation research (2/3)
SageMaker features that reduce the technical debt of machine learning are presented at academic conferences.
[Figure: components of a real-world ML system (Configuration, Data Collection, Data Verification, Machine Resource Management, Serving Infrastructure, ML Code, Analysis Tool, Process Management Tools, Feature Extraction, Monitoring), labeled with Ground Truth, Glue, Clarify, Data Wrangler, Feature Store, Processing Job, Studio, Auto Pilot, JumpStart, Debugger, Model Monitor, Endpoint, Pipeline, MWAA, Edge, Quick Sight, Experiments, Auto Scaling, Training Job]
"Only a small fraction of real-world ML systems is composed of the ML code"
Source: Hidden Technical Debt in Machine Learning Systems [D. Sculley et al.], 2015 https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf

Machine learning implementation research (3/3)
Amazon SageMaker automatic model tuning: Scalable gradient-free optimization 【KDD 2021】 AWS
A feature that automates model tuning.
Amazon SageMaker Clarify: Machine learning bias detection and explainability in the cloud 【KDD 2021】 AWS
A feature that detects bias in models.

By improving services that are deployed globally, you can make everyday life better for people all over the world (☚ check out the YouTube video)

How to apply
You can search for open positions on the Amazon Science site.
*There are currently few openings in Japan, though...

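The path to being hired
• Starting with an internship is the common route
• Overseas, summer internships usually run in the summer: you spend 1 to 2 months on development, and at the end the conversation often turns to joining the following year. Open internships can also be searched on the Amazon Science site.
• Internships may be held in Japan in the future as well.
• The role guidelines for the three Scientist roles are shared, so moving between roles is also common.
▪ If you want to join Amazon/AWS, it can be better to get in first rather than hold out for a particular Scientist role. Transfers to global teams after joining in Japan are also common.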

