Slide 1

Slide 1 text

Harnessing Backend.AI for AI Model Training in Supply Chain Context Leksikov Sergey, 권용근

Slide 2

Slide 2 text

Outline
1. Part 1 – Background and Demo
   1. Introduction
   2. Background
   3. Objective
   4. Demo
2. Part 2 – Data Preparation and Processing Pipeline
   1. Data acquisition for Domain Adaptation
   2. Synthetic Data Generation Pipeline
3. Part 3 – Model Development and Evaluation
   1. Domain Adaptation
   2. Pre-Instruction Tuning (PIT) (using FastTrack)
   3. Key-Fact Fine-tuning
   4. Fine-tuning & Evaluation

Slide 3

Slide 3 text

Part 1 – Background and Demo

Slide 4

Slide 4 text

Introduction  "In business, time is money" o Every minute wasted on manual processes in buyer-supplier communications is a missed opportunity for efficiency

Slide 5

Slide 5 text

Background: International Trade
 International Trade
o The exchange of goods or services across borders, connecting suppliers and buyers globally
o Key to enabling access to broader markets and reducing costs
o Supply Chain Management (SCM) ensures smooth operations by managing
 Logistics
 Customs
 Tariffs
 Deliveries

Slide 6

Slide 6 text

Background: Buyer-Supplier Interaction
– Buyer (Initial Inquiry):  "Hello, I would like to learn more about your products. In particular, I would appreciate details on the price and delivery time of product X."
– Supplier (Response):  "Hello. Product X is priced at 10,000 KRW per unit, with a minimum order quantity of 50 units. Delivery takes about 7 days after the order is placed."
– Buyer (Negotiation):  "Could the price be adjusted a little? I would like to know whether a discount is possible if I order 100 units."
– Supplier (Response to Negotiation):  "For an order of 100 units we can offer a 5% discount. In that case the unit price would be 9,500 KRW, and the delivery time remains the same."
– Buyer (Request for Quote):  "Please send a quote with these final terms. I will place the order right after reviewing it."
– Supplier (Final Quote):  "Please review the attached quote. If you have any further questions, feel free to contact us at any time."

Slide 7

Slide 7 text

Background: Quote Document  Quote (견적서) document: o Document from seller (supplier) to buyer o Contains:  price, quantity, delivery date, shipping options

Slide 8

Slide 8 text

Background: Quote Document – examples

Slide 9

Slide 9 text

Background: Key Challenges
 Large amounts of data
o Ex. Long email communications
 Scattered information across multiple emails
 Manual information extraction
o Errors can occur
o Missing details
o Takes a long time
 Time-consuming quote generation
 Privacy and confidentiality
o Cannot use OpenAI ChatGPT or other API services to help

Slide 10

Slide 10 text

Objective
 Eliminate manual work via automation
 Reduce time spent on information collection and quote generation
 Keep information private and confidential

Slide 11

Slide 11 text

Backend.AI Solution – "QuoteFlow"
 "QuoteFlow" system – an automated email processing and quote generation pipeline using an open-source, self-hosted LLM running on Backend.AI
 Summarization
 Key-Fact Extraction
 Tag Assignment
 Quote Document Generation

Slide 12

Slide 12 text

Solution – Inference Pipeline
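Below is a minimal sketch of how the four QuoteFlow stages could be chained in code. It assumes the self-hosted Gemma 2 model is exposed through an OpenAI-compatible endpoint on Backend.AI; the endpoint URL, model name, and prompts are illustrative placeholders, not the exact ones used in the talk.

```python
# Minimal sketch of the QuoteFlow inference pipeline (stages taken from the slide).
# Assumes the self-hosted Gemma 2 model sits behind an OpenAI-compatible endpoint;
# URL, model name, and prompt wording are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://my-backendai-endpoint/v1", api_key="EMPTY")

def generate(prompt: str) -> str:
    """Send a single-turn prompt to the self-hosted LLM and return its reply."""
    resp = client.chat.completions.create(
        model="gemma-2-2b-it",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2,
    )
    return resp.choices[0].message.content

def quoteflow(email_thread: str) -> dict:
    """Run the four QuoteFlow stages sequentially over one email thread."""
    summary = generate(f"Summarize this buyer-supplier email thread:\n{email_thread}")
    key_facts = generate(f"Extract key facts (price, quantity, delivery date) as YAML:\n{email_thread}")
    tags = generate(f"Assign topic tags to this conversation:\n{summary}")
    quote_md = generate(f"Generate a quote document in Markdown from these key facts:\n{key_facts}")
    return {"summary": summary, "key_facts": key_facts, "tags": tags, "quote": quote_md}
```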

Slide 13

Slide 13 text

Large Language Model use cases
 LoRA continual pre-training of the LLM for domain adaptation to international trade
 Synthetic data generation of email conversations using LLM inference
o Real email conversations are private data and cannot be obtained
 LoRA fine-tuning of the LLM for key-fact extraction
 Using the LLM for quote generation in Markdown syntax
 We use Gemma 2 2B Instruct
o Multilingual
o Small
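As an illustration of the LoRA setup mentioned above, here is a sketch using Hugging Face transformers and peft; the rank, alpha, and target modules are example values, not the hyperparameters used in the talk.

```python
# Sketch of LoRA setup for continual pre-training / fine-tuning Gemma 2 2B Instruct.
# Hyperparameters and target modules are illustrative, not the values used in the talk.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "google/gemma-2-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

lora_config = LoraConfig(
    r=16,                      # rank of the LoRA update matrices
    lora_alpha=32,             # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small LoRA adapters are trained
```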

Slide 14

Slide 14 text

Text Inference App Demo on Backend.AI - QuoteFlow

Slide 15

Slide 15 text

Summary and Tag Extraction

Slide 16

Slide 16 text

Tag Assignment

Slide 17

Slide 17 text

Key-Fact Extraction

Slide 18

Slide 18 text

Quote Generation

Slide 19

Slide 19 text

Part 2 – Data Preparation and Processing Pipeline: diving into more details

Slide 20

Slide 20 text

Domain Adaptation (DA) – Data Collection
 The Gemma 2 LLM may not know some concepts and definitions from international trade
 LoRA continual pre-training of Gemma 2 on vocabulary definitions and domain material helps it better understand email conversations
 Lecture transcripts were pre-processed by turning them into detailed summaries

Slide 21

Slide 21 text

Instruction Fine-tuning
 Continual pre-training for Domain Adaptation => Instruction fine-tuning the model to:
o Answer questions about the domain dataset
o Summarize given text

Slide 22

Slide 22 text

Synthetic Dataset Generation

Slide 23

Slide 23 text

Data Flow

Slide 24

Slide 24 text

1. Defining random variables for generation
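A minimal sketch of what step 1 could look like; the product list, value ranges, and field names are hypothetical examples, not the actual variables used in the pipeline.

```python
# Illustrative sketch of step 1: defining the random variables that seed each
# synthetic scenario. Categories and value ranges are hypothetical examples.
import random

PRODUCTS = ["solar panels", "industrial pumps", "LED modules", "steel fasteners"]
INCOTERMS = ["FOB", "CIF", "EXW", "DDP"]
CURRENCIES = ["KRW", "USD", "EUR"]

def sample_scenario_variables(seed=None) -> dict:
    rng = random.Random(seed)
    return {
        "product": rng.choice(PRODUCTS),
        "quantity": rng.randint(50, 5000),
        "unit_price": round(rng.uniform(5.0, 500.0), 2),
        "currency": rng.choice(CURRENCIES),
        "incoterm": rng.choice(INCOTERMS),
        "delivery_days": rng.randint(7, 60),
        "discount_requested": rng.random() < 0.5,
    }
```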

Slide 25

Slide 25 text

2. Initialize variables for Scenario Generation

Slide 26

Slide 26 text

3. Scenario Generation

Slide 27

Slide 27 text

4. Company Profile Generation

Slide 28

Slide 28 text

Step 4 – Email conversation generation • PyAutoGen was used to instantiate agents for the Buyer and the Supplier and have them converse
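A rough sketch of how PyAutoGen can instantiate the two agents and let them converse, assuming the pyautogen 0.2-style ConversableAgent API; the system messages, endpoint, and turn count are illustrative placeholders.

```python
# Sketch of step 4 with pyautogen: two ConversableAgents role-play buyer and supplier.
# System messages, endpoint, and turn count are illustrative placeholders.
from autogen import ConversableAgent

llm_config = {
    "config_list": [{
        "model": "gemma-2-2b-it",
        "base_url": "http://my-backendai-endpoint/v1",  # self-hosted, OpenAI-compatible
        "api_key": "EMPTY",
    }]
}

buyer = ConversableAgent(
    name="buyer",
    system_message="You are a purchasing manager requesting a quote for solar panels.",
    llm_config=llm_config,
    human_input_mode="NEVER",
)
supplier = ConversableAgent(
    name="supplier",
    system_message="You are a sales manager answering inquiries and preparing quotes.",
    llm_config=llm_config,
    human_input_mode="NEVER",
)

# Buyer opens the thread; the two agents then alternate for a few turns.
result = buyer.initiate_chat(
    supplier,
    message="Hello, could you share pricing and lead time for 500 solar panels?",
    max_turns=4,
)
conversation = result.chat_history  # list of {"role", "content", ...} messages
```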

Slide 29

Slide 29 text

5. Key-Fact extraction using LLM

Slide 30

Slide 30 text

6. Quote Document Generation

Slide 31

Slide 31 text

7. PDF file conversion

Slide 32

Slide 32 text

Preparing the dataset for fine-tuning
 Input: Email conversation
 Target: Key facts
 Total synthetic conversations generated and used for fine-tuning: 1155
 The conversations are represented in JSON format with 'role' and 'content' keys
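For illustration, one training record might look like the sketch below; the field names and values are hypothetical, the only constraint taken from the slide being the 'role'/'content' keys for the conversation turns.

```python
# Illustrative shape of one fine-tuning record: the email thread as a list of
# {"role", "content"} turns (input) and the key facts to extract (target).
example_record = {
    "input": [
        {"role": "buyer", "content": "Hello, please quote 500 units of product X."},
        {"role": "supplier", "content": "Unit price is 10,000 KRW; delivery in 7 days."},
    ],
    "target": {
        "product": "X",
        "quantity": 500,
        "unit_price": "10,000 KRW",
        "delivery": "7 days",
    },
}
```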

Slide 33

Slide 33 text

Training dataset with Input and Target – example panels: Train Input, Train Target

Slide 34

Slide 34 text

Formatting the dataset for training • The Gemma dialogue template was used to format the conversations
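A small sketch of applying the Gemma dialogue template with the Hugging Face tokenizer; the sample messages are placeholders.

```python
# Sketch of formatting a conversation with the Gemma dialogue template via the
# Hugging Face tokenizer; the sample messages are placeholders.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")

messages = [
    {"role": "user", "content": "Extract the key facts from this email thread: ..."},
    {"role": "assistant", "content": "product: X\nquantity: 500\nunit_price: 10,000 KRW"},
]

# Produces Gemma-style turns: <start_of_turn>user ... <end_of_turn><start_of_turn>model ...
formatted = tokenizer.apply_chat_template(messages, tokenize=False)
print(formatted)
```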

Slide 35

Slide 35 text

Part 3 – Model Training and Evaluation

Slide 36

Slide 36 text

Concept of Domain Adaptation

Slide 37

Slide 37 text

Why Domain Adaptation (vs training from scratch)
 Train model from scratch:
o No general knowledge
o High cost
o Requires a large domain dataset (at least tens of GB); difficult to build a sufficient domain-specific dataset
 DAPT (Domain-Adaptive Pre-Training):
o Already has general knowledge
o (Relatively) low cost
o Requires a relatively small domain dataset
 Either path aims to train a domain-adaptive model (domain-specific PLM)

Slide 38

Slide 38 text

Training Flow
 Foundation Model → (pre-training on raw domain dataset) → Domain-Adaptive Model → (fine-tuning on Q/A dataset, PIT) → Pre-Instructed Model → (fine-tuning on extraction details or Summarization & Tags) → Fine-tuned Model

Slide 39

Slide 39 text

Training Flow
 Foundation Model → (pre-training on raw domain dataset) → Domain-Adaptive Model → (fine-tuning on Q/A dataset, PIT) → Pre-Instructed Model → (fine-tuning on extraction details or Summarization & Tags) → Fine-tuned Model

Slide 40

Slide 40 text

Dataset: Pre-Instruction Tuning (PIT)
 Raw text, speech-to-text (lecture transcripts)
 JSON format, term–description pairs
 Raw text, question–options–answer
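As an illustration of how a term–description pair can be turned into a PIT training sample, here is a hypothetical conversion sketch; the question template is an assumption.

```python
# Illustrative conversion of one term-description pair into a Q/A training sample
# for Pre-Instruction Tuning; the phrasing template is a hypothetical example.
term_description = {
    "term": "Incoterms",
    "description": "Standard trade terms defining buyer and supplier responsibilities.",
}

qa_sample = {
    "question": f"What does the term '{term_description['term']}' mean in international trade?",
    "answer": term_description["description"],
}
```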

Slide 41

Slide 41 text

Dataset: Pre-Instruction Tuning (PIT)

Slide 42

Slide 42 text

Dataset: Pre-Instruction Tuning (PIT)  Q/A dataset: 25,993 pairs

Slide 43

Slide 43 text

Training: Pre-Instruction Tuning (PIT) (using FastTrack)

Slide 44

Slide 44 text

Training: Pre-Instruction Tuning (PIT) (using FastTrack)  FastTrack pipeline view showing task statuses (Skipped, Running, Todo) and log checking

Slide 45

Slide 45 text

Training: Pre-Instruction Tuning (PIT) (using FastTrack)  FastTrack log checking and run status view

Slide 46

Slide 46 text

Evaluation: Pre-Instruction Tuning (PIT)
1. SemScore
 Semantic similarity between the target response and the model response; 1.00 is best
 Base model: 0.62, Trained model: 0.77
 Example
o Q: What is the difference between an invoice and a packing list?
o Target response: An invoice includes quantity, unit price, and total amount, while a packing list includes net weight, gross weight, packing unit, and packing quantity. The two documents contain mostly the same content, but the packing list omits price information and adds packing-related details.
o Model response: An invoice and a packing list are similar in content, but the packing list records the packing state of the goods in detail and is used as a reference for checking the actual packaging.
o Similarity: 0.723
=> The trained model matches the meaning of the target responses better than the base model
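For reference, a SemScore-style similarity can be computed as the cosine similarity between sentence embeddings of the target and model responses; the embedding model below is an assumption, not necessarily the one used in this evaluation.

```python
# Sketch of a SemScore-style check: cosine similarity between the embeddings of the
# target response and the model response. The embedding model choice is an assumption.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def semscore(target: str, response: str) -> float:
    emb = embedder.encode([target, response], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()
```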

Slide 47

Slide 47 text

Evaluation: Pre-Instruction Tuning (PIT)
2. Truthfulness
 An LLM is given the model response and the target response and returns a score from 1 to 5 along with a reason
 Example
o Q: What role does transportation play in logistics?
o Target response: It plays an important role in reducing logistics costs.
o Model response: Transportation is the core of logistics, and efficient transportation requires choosing appropriate means and routes.
o Score: 4
o Reason: The model's response is mostly grounded in fact. It correctly identifies transportation as a core element of logistics and emphasizes the importance of choosing appropriate means and routes for efficient transportation. However, it does not directly address the specific aspect of cost reduction, so points were deducted.
=> Training was effective: the share of low scores (1-2) decreased and the share of high scores (4-5) increased

Slide 48

Slide 48 text

Training Flow
 Foundation Model → (pre-training on raw domain dataset) → Domain-Adaptive Model → (fine-tuning on Q/A dataset, PIT) → Pre-Instructed Model → (fine-tuning on extraction details or Summarization & Tags) → Fine-tuned Model

Slide 49

Slide 49 text

Fine-tuning: Key Fact Extraction  Goal: extract key facts in YAML format from an email conversation
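A minimal sketch of this extraction step: prompt the fine-tuned model for YAML and parse its reply; the prompt wording, field names, and the `generate` callable are illustrative assumptions.

```python
# Sketch of the key-fact extraction step: ask the fine-tuned model for YAML and
# parse the reply. Prompt wording and field names are illustrative.
import yaml

def extract_key_facts(email_thread: str, generate) -> dict:
    """`generate` is any callable that sends a prompt to the fine-tuned LLM."""
    prompt = (
        "Extract the key facts (product, quantity, unit_price, delivery_date, "
        "shipping_option) from the email conversation below as YAML:\n\n" + email_thread
    )
    raw = generate(prompt)
    return yaml.safe_load(raw)
```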

Slide 50

Slide 50 text

Dataset: Key Fact Extraction  A set of 1,143 synthetic email conversations

Slide 51

Slide 51 text

Multi-node training with DeepSpeed (on Backend.AI Cloud)
 Training code
 DeepSpeed config
 hostfile.txt (node IPs, number of GPUs)
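For illustration, a DeepSpeed hostfile simply lists each node and its GPU slots; the hostnames and GPU counts below are placeholders matching the main/sub container layout on the next slide, and the launch command in the comment is a typical invocation rather than the exact one used in the talk.

```python
# Sketch of the multi-node setup: a DeepSpeed hostfile listing each Backend.AI
# container and its GPU count. Hostnames and GPU counts are placeholders.
hostfile = """\
main1 slots=4
sub1 slots=4
sub2 slots=4
sub3 slots=4
"""
with open("hostfile.txt", "w") as f:
    f.write(hostfile)

# Typical launch from the main container (script and config names are placeholders):
#   deepspeed --hostfile=hostfile.txt train.py --deepspeed ds_config.json
```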

Slide 52

Slide 52 text

Multi-node training with DeepSpeed (on Backend.AI Cloud)  Containers: Main1, Sub1, Sub2, Sub3

Slide 53

Slide 53 text

Evaluation: Key Fact Extraction (metric: F1-score)  Example comparing the model response against the answer, highlighting key/value mismatches

Slide 54

Slide 54 text

Evaluation: Key Fact Extraction (metric: F1-score)
 True Positive: exact match
 True Negative: both the true value and the prediction are empty
 False Positive: the prediction is not null and differs from the true value
 False Negative: the prediction is null but the true value exists
 F1-score: 0.72
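A sketch of the F1 computation following these match rules; how empty values and extra keys are handled is an assumption.

```python
# Sketch of the F1 computation over extracted key-value pairs, following the match
# rules listed on the slide; treatment of missing/extra keys is an assumption.
def key_fact_f1(true_facts: dict, pred_facts: dict) -> float:
    tp = fp = fn = 0
    for key in set(true_facts) | set(pred_facts):
        true_val = true_facts.get(key)
        pred_val = pred_facts.get(key)
        if not true_val and not pred_val:
            continue                      # true negative: both empty
        if pred_val and pred_val == true_val:
            tp += 1                       # exact match
        elif pred_val:
            fp += 1                       # predicted something different (or extra)
        else:
            fn += 1                       # missed a value that exists
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```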

Slide 55

Slide 55 text

Training Flow
 Foundation Model → (pre-training on raw domain dataset) → Domain-Adaptive Model → (fine-tuning on Q/A dataset, PIT) → Pre-Instructed Model → (fine-tuning on extraction details or Summarization & Tags) → Fine-tuned Model

Slide 56

Slide 56 text

Fine-tuning: Summarization & Tags
Goal: extract a summary & tags from an email conversation
Summary (example):
1. Initial inquiry and quote request:
• Sender: Lee Min-jae, purchasing manager at Korea EnerTech.
• Sent a detailed quote request for energy solutions, emphasizing urgency and regulatory compliance. The budget was stated as 500 million KRW.
2. Response:
• Sender: David Park, senior sales manager at EnerTech Solutions.
• Provided a detailed quote including solar panels, wind turbines, energy storage systems, and installation/maintenance services, totaling 500 million KRW.
• Terms of service included payment on delivery, standard and expedited shipping options, regulatory compliance, and a 5-year warranty.
Tags (example): ['Trade', 'Negotiation', 'Supply Chain', 'Energy', 'Technology', 'Contract', 'Korea']

Slide 57

Slide 57 text

Dataset: Summarization & Tags  1,143 sets of summary & tags for the email conversations

Slide 58

Slide 58 text

Evaluation: Summarization
 FineSurE: Fine-grained Summarization Evaluation using LLMs (2024)
 Faithfulness: does the summary include information not in the document, or incorrect information that contradicts the document? (hallucination)
 Completeness: does the summary include all key facts?
 Conciseness: does the model's summary avoid unnecessary details?

Slide 59

Slide 59 text

Evaluation: Summarization
 FineSurE: Fine-grained Summarization Evaluation using LLMs (2024)
 Chart: faithfulness, completeness, and conciseness evaluated with FineSurE for nine LLMs (Phi-2, Mixtral-8x7b, Mixtral-8x7b-Inst, Llama3-70b-Inst, Gemini-1-pro, GPT-3.5-turbo, GPT-4-turbo, GPT-4-omni, and our fine-tuned model)
 Our fine-tuned model shows:
o High faithfulness
o High completeness
o Better conciseness than GPT-4 Turbo

Slide 60

Slide 60 text

Evaluation: Tags
 Model-generated tags: ['Energy solutions', 'Energy regulations', 'Expedited delivery', 'Payment terms', 'Order confirmation', 'Delivery', 'Warranty', 'Korea EnerTech Co., Ltd.']
 Adjusted model-generated tags: ['Energy', 'Energy', 'Expedited delivery', 'Payment terms', 'Order confirmation', 'Negotiation', 'Warranty', 'Korea']
 Target tags: ['Trade', 'Negotiation', 'Supply Chain', 'Energy', 'Technology', 'Contract', 'Korea']
 Semantic similarity comparison → F1-score computation

Slide 61

Slide 61 text

Evaluation: Tags
• Metric: F1-score
• Because the number and scope of tags to extract are not clearly defined, the model's performance may be underestimated
• Semantically similar words are counted as correctly extracted
• Tag extraction improved by 89% compared with the base model
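A sketch of one way to implement this evaluation: match each generated tag to its most similar target tag by embedding similarity, count matches above a threshold as correct, and compute F1. The embedding model, threshold, and exact matching protocol are assumptions.

```python
# Sketch of the tag evaluation: embedding similarity decides whether a generated tag
# counts as matching a target tag; model, threshold, and protocol are assumptions.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def tag_f1(pred_tags: list, target_tags: list, threshold: float = 0.6) -> float:
    if not pred_tags or not target_tags:
        return 0.0
    sims = util.cos_sim(
        embedder.encode(pred_tags, convert_to_tensor=True),
        embedder.encode(target_tags, convert_to_tensor=True),
    )
    tp = sum(1 for row in sims if row.max().item() >= threshold)        # matched predictions
    matched_targets = sum(1 for col in sims.t() if col.max().item() >= threshold)
    precision = tp / len(pred_tags)
    recall = matched_targets / len(target_tags)
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```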

Slide 62

Slide 62 text

Conclusion  "QuoteFlow" - Backend.AI system pipeline introduction  Domain Adaptation training  Instruction Finetuning  Synthetic Dataset Generation  Key-Fact Extraction  Evaluation

Slide 63

Slide 63 text

No content

Slide 64

Slide 64 text

Appendix