Running RAG on Audio Data with Distil-Whisper, 6x Faster than Whisper

Slide 1

Slide 1 text

Running RAG on Audio Data with Distil-Whisper, 6x Faster than Whisper
Hyerim Baek, LangChainKR 2023

Slide 2

Slide 2 text

Speaker introduction
Previous focus areas: speech enhancement, source separation
Main interests since leaving my job: LLMs, speech, multimodal, MLOps, plus teaching
Hyerim Baek, [email protected]
LinkedIn : https://www.linkedin.com/in/hyerimbaek-227489185/
TechBlog : https://rimiyeyo.tistory.com

Slide 3

Slide 3 text

Contents
1. Overview
2. STT Model
3. FlashAttention
4. Text Split & Embedding
5. RAG
6. Demo

Slide 4

Slide 4 text

Project Overview

Slide 5

Slide 5 text

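Overview (pipeline diagram)
1. Enter a YouTube URL
2. Extract the audio (crawling)
3. Distil-Whisper (released on Hugging Face on 2023-11-02) transcribes the audio
4. Text extracted from the audio
5. Text Splitters
6. Embeddings (Hugging Face Embedding)
7. Vector Database
8. Question, relevancy search, LLM (ChatGPT API), answer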

Slide 6

Slide 6 text

STT Model

Slide 7

Slide 7 text

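Whisper
An automatic speech recognition (ASR, speech-to-text) model released in September 2022.
Trained on multilingual data: 680,000 hours of training data in total, of which about 8,000 hours are Korean.
Source : https://arxiv.org/abs/2212.04356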

Slide 8

Slide 8 text

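Distil-Whisper
Distil-Whisper is 6x faster than Whisper while staying within 1% WER of it.
The encoder is kept as-is, and only 2 decoder layers are kept.
Trained with a weighted distillation loss.
Training data: 9 permissively licensed open-source datasets, about 22,000 hours in total.
Training labels are pseudo-labels generated by Whisper.
A WER filter is applied so that only pseudo-labels with a WER of 10% or less are kept.
Source : https://arxiv.org/abs/2311.00430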
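
The WER filter can be sketched in a few lines. This is a minimal illustration, not the Distil-Whisper training code: it assumes the filter compares each Whisper pseudo-label with the dataset's ground-truth transcript and discards pairs whose word error rate exceeds 10%. The function names `word_error_rate` and `filter_pseudo_labels` are hypothetical, and the text normalization applied before scoring in practice is omitted.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the ref prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def filter_pseudo_labels(pairs, max_wer=0.10):
    """Keep (ground_truth, pseudo_label) pairs whose WER is at most max_wer."""
    return [(gt, pl) for gt, pl in pairs if word_error_rate(gt, pl) <= max_wer]


pairs = [
    ("the cat sat on the mat", "the cat sat on the mat"),  # identical transcript
    ("the cat sat on the mat", "a dog sat on a mat"),      # three substitutions
]
kept = filter_pseudo_labels(pairs)
print(len(kept), "of", len(pairs), "pseudo-labels kept")
```

Filtering this way removes training examples where the teacher model's transcript disagrees strongly with the reference, so the student is not trained on likely hallucinations.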

Slide 9

Slide 9 text

Distil-Whisper
Distil-Whisper is 6x faster than Whisper while staying within 1% WER of it.

Slide 10

Slide 10 text

Distil-Whisper
Distil-Whisper is 6x faster than Whisper while staying within 1% WER of it.
Pay attention to Flash Attention 2!

Slide 11

Slide 11 text

Flash Attention 2
Distil-Whisper is 6x faster than Whisper while staying within 1% WER of it.
Pay attention to Flash Attention 2!
pip install flash-attn --no-build-isolation

Slide 12

Slide 12 text

Flash Attention 2
Distil-Whisper is 6x faster than Whisper while staying within 1% WER of it.
Pay attention to Flash Attention 2!
It does not run on an ordinary computer!
pip install flash-attn --no-build-isolation
RuntimeError: FlashAttention only supports Ampere GPUs or newer.

Slide 13

Slide 13 text

Flash Attention 2
Distil-Whisper is 6x faster than Whisper while staying within 1% WER of it.
Pay attention to Flash Attention 2!
It does not run on an ordinary computer!
Flash Attention 2 requires a recent NVIDIA GPU such as the RTX 3090, RTX 4090, A100, or H100.
pip install flash-attn --no-build-isolation
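
Before installing flash-attn, it can help to check whether the GPU is Ampere or newer. The sketch below assumes "Ampere or newer" corresponds to a CUDA compute capability of 8.0 or higher; the helper name `supports_flash_attention_2` is hypothetical, and the capabilities are hard-coded so the example runs without a GPU.

```python
def supports_flash_attention_2(capability):
    """Return True if a CUDA compute capability (major, minor) is Ampere or newer."""
    major, _minor = capability
    return major >= 8


# Compute capabilities of a few well-known GPUs, hard-coded for illustration.
# With PyTorch installed, torch.cuda.get_device_capability() returns the
# (major, minor) tuple for the current device instead.
KNOWN_GPUS = {
    "Tesla T4": (7, 5),   # Turing
    "A100": (8, 0),       # Ampere
    "RTX 3090": (8, 6),   # Ampere
    "RTX 4090": (8, 9),   # Ada Lovelace
    "H100": (9, 0),       # Hopper
}

for name, cap in KNOWN_GPUS.items():
    status = "ok" if supports_flash_attention_2(cap) else "not supported"
    print(f"{name}: compute capability {cap[0]}.{cap[1]} -> {status}")
```

A pre-Ampere card such as the T4 is the kind of device on which the RuntimeError shown on the previous slide would be expected.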

Slide 14

Slide 14 text

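Flash Attention 2: GPU architecture
At runtime, a GPU can access three kinds of memory.
- On-chip memory (SRAM), located alongside the execution cores: limited in size (up to about 20 MB on an A100) but very fast!
- Off-chip memory that is still on the card (HBM): it is on the GPU, but not co-located with the cores themselves. An A100 has 40 GB of HBM, but its bandwidth is 1.5 TB/s.
- Conventional CPU RAM: too slow.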
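
A quick back-of-the-envelope calculation shows why this hierarchy matters for attention. The sketch uses the A100 figures from the slide (about 20 MB of SRAM, 1.5 TB/s HBM bandwidth) and assumes fp16 scores (2 bytes per element) for a single attention head; the sequence lengths are arbitrary examples.

```python
SRAM_BYTES = 20 * 1024**2   # about 20 MB of on-chip SRAM on an A100 (from the slide)
HBM_BANDWIDTH = 1.5e12      # 1.5 TB/s HBM bandwidth (from the slide)


def attention_matrix_bytes(seq_len, bytes_per_element=2):
    """Size of one seq_len x seq_len attention score matrix (fp16 by default)."""
    return seq_len * seq_len * bytes_per_element


for seq_len in (1024, 4096, 16384):
    size = attention_matrix_bytes(seq_len)
    fits = size <= SRAM_BYTES
    # Time for a single pass over the matrix at HBM bandwidth, in microseconds.
    pass_time_us = size / HBM_BANDWIDTH * 1e6
    print(f"N={seq_len:>6}: {size / 1024**2:8.1f} MiB, "
          f"fits in SRAM: {fits}, one HBM pass: {pass_time_us:.1f} us")
```

The score matrix grows quadratically with sequence length, so for long sequences it cannot stay on-chip even in aggregate, and a standard implementation has to write it to HBM and read it back.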

Slide 15

Slide 15 text

Flash Attention
Source : https://www.marktechpost.com/2022/06/03/researchers-at-stanford-university-propose-flashattention-fast-and-memory-efficient-exact-attention-with-io-awareness/
FlashAttention keeps key/value blocks in SRAM so that they are not re-read from high-bandwidth memory (HBM) at every step.
This reduces accesses to the slow HBM and improves throughput.
Some inefficiencies still remain in GPU thread utilization and work partitioning.
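
The block-by-block idea can be illustrated with an online softmax, which lets attention be computed one key/value block at a time without ever holding all scores at once. This is a simplified pure-Python sketch for a single query vector, not the actual GPU kernel; the function names and the toy data are illustrative assumptions.

```python
import math


def naive_attention(q, keys, values):
    """Softmax(q . k) weighted sum of values, materialising all scores at once."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def blockwise_attention(q, keys, values, block_size=2):
    """Same result, but visiting keys/values one small block at a time."""
    dim = len(values[0])
    running_max = float("-inf")
    running_sum = 0.0
    acc = [0.0] * dim  # unnormalised output accumulator
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [sum(a * b for a, b in zip(q, k)) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale everything accumulated under the previous maximum.
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]


q = [1.0, 0.5]
keys = [[0.1, 0.2], [0.3, -0.4], [1.0, 1.0], [-0.5, 0.2], [0.0, 0.7]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, -1.0], [0.5, 0.5]]
print(naive_attention(q, keys, values))
print(blockwise_attention(q, keys, values))
```

Both functions return the same vector up to floating-point rounding, which is why the blockwise form counts as exact attention rather than an approximation.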

Slide 16

Slide 16 text

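Flash Attention 2
Source : https://crfm.stanford.edu/2023/07/17/flash2.html
- Fewer non-matmul FLOPs: the algorithm is adjusted to reduce non-matrix-multiplication operations, which are slower on GPUs, and to increase the share of fast matmul FLOPs.
- Better parallelism: parallelism over sequence length is added on top of batch size and heads, which improves GPU utilization for long sequences.
- Smarter work partitioning: work is divided among GPU threads in a way that reduces unnecessary synchronization and shared-memory usage.
Models can be trained with a much longer context in the same amount of time!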

Slide 17

Slide 17 text

Whisper vs Distil-Whisper
Source : https://www.linkedin.com/posts/liorsinclair_a-team-just-made-openais-whisper-6x-faster-activity-7130263371920097280-cQsi?utm_source=share&utm_medium=member_desktop
Try the demo : https://huggingface.co/spaces/Xenova/distil-whisper-web

Slide 18

Slide 18 text

Text Splitter

Slide 19

Slide 19 text

Overview (same pipeline diagram as Slide 5)

Slide 20

Slide 20 text

Text Splitter
A tool that splits text into small, semantically meaningful chunks.
Why the Text Splitter matters: the maximum number of tokens differs from one LLM to another.

Slide 21

Slide 21 text

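Text Splitter
A tool that splits text into small, semantically meaningful chunks.
Why the Text Splitter matters: the maximum number of tokens differs from one LLM to another.
RecursiveCharacterTextSplitter splits by number of characters by default, not by number of tokens.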

Slide 22

Slide 22 text

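Text Splitter
A tool that splits text into small, semantically meaningful chunks.
Why the Text Splitter matters: the maximum number of tokens differs from one LLM to another.
RecursiveCharacterTextSplitter splits by number of characters by default, not by number of tokens.
RecursiveCharacterTextSplitter.from_tiktoken_encoder splits by number of tokens.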
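
To make the character-based splitting concrete, here is a simplified pure-Python sketch of recursive splitting. It is not LangChain's implementation: it has no chunk overlap, and the function name `recursive_split` and the separator order are assumptions chosen for illustration. It tries the coarsest separator first and falls back to finer ones only for pieces that are still too long.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, rest = separators[0], separators[1:]
    if sep == "":
        # Last resort: cut at fixed character positions.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate          # keep merging small pieces
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry with a finer separator.
            chunks.extend(recursive_split(piece, chunk_size, rest))
    if current:
        chunks.append(current)
    return chunks


transcript = (
    "Distil-Whisper is six times faster than Whisper.\n\n"
    "It keeps the encoder and only two decoder layers.\n"
    "Flash Attention 2 needs an Ampere or newer GPU."
)
for chunk in recursive_split(transcript, chunk_size=50):
    print(len(chunk), repr(chunk))
```

Because the limit here is counted in characters, the number of tokens per chunk still varies; a token-based splitter measures the same limit with the model's tokenizer instead.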

Slide 23

Slide 23 text

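Embedding
Embedding means converting unstructured data such as text or images into high-dimensional vectors.
In natural language processing, converting words or sentences into high-dimensional vectors is called word embedding.
The main goal of embedding is to convert data into a form that can be computed on while preserving as much of its original meaning and characteristics as possible.
Embedding Projector : https://projector.tensorflow.org/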

Slide 24

Slide 24 text

Embedding
Sources : https://www.graphable.ai/blog/knowledge-graph-embeddings/ https://pranay-dave9.medium.com/openai-embeddings-the-key-to-powerful-text-clustering-342706b22d12

Slide 25

Slide 25 text

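VectorStore
A special type of database that can store and query embeddings and their associated documents.
It retrieves "semantically similar" items: when searching for documents similar to 'a cat wearing a hat', it can find results related to cats, hats, or other animals wearing hats.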

Slide 26

Slide 26 text

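VectorStore
A special type of database that can store and query embeddings and their associated documents.
Retrieving the desired search results from a vector store ultimately requires measuring the similarity between embedding vectors.
Vector stores are designed to perform cosine-similarity search efficiently.
They index embedding vectors in a particular way when storing them, which greatly reduces search time.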
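
The similarity search can be sketched with a tiny in-memory store. This is a brute-force illustration, assuming hand-made 3-dimensional toy embeddings (roughly "cat", "hat", "finance" axes); real vector stores use embeddings from a model and an index to avoid scanning every vector. The class name `TinyVectorStore` is hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Stores (text, embedding) pairs and searches them by cosine similarity."""

    def __init__(self):
        self._items = []

    def add(self, text, embedding):
        self._items.append((text, embedding))

    def search(self, query_embedding, k=2):
        scored = [(cosine_similarity(query_embedding, emb), text)
                  for text, emb in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


store = TinyVectorStore()
store.add("All about cats", [1.0, 0.1, 0.0])
store.add("A history of hats", [0.1, 1.0, 0.0])
store.add("A dog wearing a hat", [0.4, 0.9, 0.1])
store.add("Stock market report", [0.0, 0.0, 1.0])

# Toy embedding for the query 'a cat wearing a hat'.
results = store.search([0.9, 0.8, 0.0], k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

The unrelated finance document scores zero against the query, while documents about cats, hats, or other animals in hats rank at the top, matching the example on the previous slide.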

Slide 27

Slide 27 text

Load the LLM and run RAG
RAG (Retrieval-Augmented Generation)
How can we select the appropriate content for a question?
Source : https://opentutorials.org/module/6369
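
One way to picture the retrieval and augmentation steps is the sketch below. It is a toy under stated assumptions: relevance is scored by word overlap as a stand-in for the embedding similarity search used in the talk, the prompt wording is made up, and the final call to the LLM is left out. The function names `retrieve` and `build_prompt` are hypothetical.

```python
def retrieve(question, chunks, k=2):
    """Rank chunks by how many words they share with the question (toy score)."""
    q_words = set(question.lower().split())
    scored = [(len(q_words & set(c.lower().split())), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:k] if score > 0]


def build_prompt(question, contexts):
    """Place the retrieved chunks in front of the question."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return ("Answer the question using only the context below.\n"
            f"Context:\n{context_block}\n"
            f"Question: {question}\nAnswer:")


chunks = [
    "Distil-Whisper keeps the Whisper encoder and two decoder layers.",
    "FlashAttention reduces reads from HBM.",
    "The speaker thanked the audience.",
]
question = "How many decoder layers does Distil-Whisper keep?"
contexts = retrieve(question, chunks)
prompt = build_prompt(question, contexts)
print(prompt)
```

The resulting prompt is what would be sent to the LLM, so the model answers from the transcript chunks rather than from its own memory alone.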

Slide 28

Slide 28 text

Thank you for listening ☺