
Audio and Video Processing with Generative AI

Issei.Komori
September 26, 2024

Presentation slides for NES Sapporo Tech Brewery #1
https://nec-solutioninnovators.connpass.com/event/323337/



Transcript

  1. Generative AI lately
    • Now widely recognized by the general public
      ◦ TV commercials
        ▪ "Everyone, ask the AI Gemini"
      ◦ Bookstores
        ▪ "Let ChatGPT do the tedious work"
  2. Map of consumer-facing AI chat services
    OpenAI: ChatGPT (Free, Plus, Team, Enterprise), GPTs
    Anthropic: Claude (Free, Pro, Team), Projects
    Google: Gemini (Free, Advanced), Gemini for Google Workspace (Business, Enterprise, ...), Gems, NotebookLM
    X: Grok
    Perplexity: Perplexity
    Genspark: Genspark
    Microsoft
  3. Map of commonly used clouds and models
    OpenAI: GPT (4o, 4o-mini, o1), Whisper, DALL-E, TTS, Embeddings
    Anthropic: Claude (Haiku, Sonnet, Opus)
    Google: Gemini (Ultra, Pro, Flash, Nano), Gemma
    Google Cloud: Vertex AI
    Azure: AOAI
    AWS: Bedrock
    Microsoft
  4. Map of developer consoles and tools
    OpenAI: Playground, API
    Anthropic: Console
    Google: Google AI Studio
    Google Cloud: Vertex AI, Vertex AI Studio
    Azure: AOAI, Azure OpenAI Studio
    AWS: Bedrock, Amazon Bedrock Studio, Amazon Q Developer
    GitHub: GitHub Copilot
    Microsoft
  5. ex) Whisper
    from openai import OpenAI

    client = OpenAI()
    audio_file = open("/path/to/file/german.mp3", "rb")
    transcriptions = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",
        timestamp_granularities=["segment"],  # ← timestamps
    )
    print(transcriptions.text)  # ← transcription
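With `response_format="verbose_json"` and segment granularity, the response also carries per-segment start and end times in seconds. The following is a minimal sketch of turning such segments into timestamped lines. It assumes the segments have already been converted to plain dicts with `start`, `end` and `text` keys; the helper names and the sample data are hypothetical, not from the slides.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(segments):
    """Return one '[start - end] text' line per segment dict."""
    return [
        f"[{format_timestamp(s['start'])} - {format_timestamp(s['end'])}] "
        f"{s['text'].strip()}"
        for s in segments
    ]


# Made-up sample shaped like the assumed segment dicts.
sample_segments = [
    {"start": 0.0, "end": 3.5, "text": " Hey, are we still meeting at the cafe?"},
    {"start": 3.5, "end": 65.25, "text": " Yes, I'm good with that!"},
]

for line in format_segments(sample_segments):
    print(line)
```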
  6. ex) Whisper: if the file is likely to exceed the size limit, split it into chunks
    from pydub import AudioSegment

    song = AudioSegment.from_mp3("good_morning.mp3")

    # PyDub handles time in milliseconds
    ten_minutes = 10 * 60 * 1000
    first_10_minutes = song[:ten_minutes]
    first_10_minutes.export("good_morning_10.mp3", format="mp3")
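The slide exports only the first ten minutes. Covering a whole recording means computing every chunk boundary first. A minimal sketch of that step follows, using plain integers in milliseconds so it runs without pydub. The function name `chunk_ranges` and the 25-minute duration are assumptions for illustration.

```python
def chunk_ranges(total_ms: int, chunk_ms: int):
    """Return (start, end) millisecond pairs that cover [0, total_ms)."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    return [
        (start, min(start + chunk_ms, total_ms))
        for start in range(0, total_ms, chunk_ms)
    ]


ten_minutes = 10 * 60 * 1000
# Hypothetical 25-minute recording; the last chunk is shorter than the rest.
ranges = chunk_ranges(25 * 60 * 1000, ten_minutes)
print(ranges)

# With pydub (assumed installed), each pair could be used as a slice:
#   song = AudioSegment.from_mp3("good_morning.mp3")
#   for i, (start, end) in enumerate(chunk_ranges(len(song), ten_minutes)):
#       song[start:end].export(f"good_morning_{i}.mp3", format="mp3")
```

Cutting at fixed offsets can split a word in half, so transcripts from neighboring chunks may need a manual check at the boundaries.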
  7. Transcription
    Person A: Hey, are we still meeting at the cafe this afternoon? I’m thinking of working on our project there.
    Person B: Yes, I’m good with that! What time are we meeting again?
    Person C: I think we said around 3 PM, right? I’ll be there a bit earlier to grab a table.
    Person A: Perfect! I’ll bring my laptop and notes. Do we have any updates from last time?
    Person B: I worked on the new designs and made some progress. I’ll show them to you when we meet.
    Person C: Great! I’ve been working on the backend, so we can discuss how to integrate everything later.
    Person A: Awesome, looking forward to it. See you guys at 3!
    Person B: See you!
    Person C: See you soon!

    Diarization: the question is whether you need the A/B/C speaker information. It can only be extracted from the audio itself.
  8. ex) Whisper: add a preprocessing step before transcription
    from pyannote.audio import Audio, Pipeline

    pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1")
    diarization = pipeline(audio_file.name)
    audio = Audio(sample_rate=16000, mono=True)
    for segment, _, speaker in diarization.itertracks(yield_label=True):
        # Cut the speaker's segment out of the audio file
        waveform, sample_rate = audio.crop(no_silence_audio_file.name, segment)
        ・・・
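Diarization can yield several short consecutive turns for the same speaker, and each one would become a separate transcription request. One possible extra step, not shown on the slide, is to merge adjacent turns first. The sketch below assumes the turns were collected as `(start, end, speaker)` tuples in seconds, for example from `segment.start`, `segment.end` and `speaker` inside the loop above. `merge_turns` and `max_gap` are hypothetical names.

```python
def merge_turns(turns, max_gap=0.5):
    """Merge consecutive turns of one speaker separated by at most max_gap seconds."""
    merged = []
    for start, end, speaker in sorted(turns):
        if merged and merged[-1][2] == speaker and start - merged[-1][1] <= max_gap:
            prev_start, prev_end, _ = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), speaker)
        else:
            merged.append((start, end, speaker))
    return merged


# Made-up turns with labels in the style of diarization output.
sample_turns = [
    (0.0, 2.0, "SPEAKER_00"),
    (2.2, 4.0, "SPEAKER_00"),   # short pause, same speaker: merged
    (4.5, 6.0, "SPEAKER_01"),
    (9.0, 10.0, "SPEAKER_01"),  # long pause: kept separate
]
print(merge_turns(sample_turns))
```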
  9. ex) Whisper
    [Figure: five MP3 files, followed by the speaker-labeled transcript]
    Person A: Hey, are we still meeting at the cafe this afternoon? I’m thinking of working on our project there.
    Person B: Yes, I’m good with that! What time are we meeting again?
    Person C: I think we said around 3 PM, right? I’ll be there a bit earlier to grab a table.
    Person A: Perfect! I’ll bring my laptop and notes. Do we have any updates from last time?
    Person B: I worked on the new designs and made some progress. I’ll show them to you when we meet.
    Person C: Great! I’ve been working on the backend, so we can discuss how to integrate everything later.
    Person A: Awesome, looking forward to it. See you guys at 3!
    Person B: See you!
    Person C: See you soon!
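Once each speaker segment has been transcribed separately, the pieces still have to be stitched back into a dialogue like the one on the slide. The following is a minimal sketch, assuming the per-segment results are available as `(speaker_label, text)` pairs in time order. The function name, the label strings and the sample pairs are illustrative assumptions.

```python
import string


def label_speakers(pieces):
    """Join (speaker_label, text) pairs into 'Person X: text' lines.

    Labels are mapped to Person A, Person B, ... in order of first
    appearance, so this sketch supports at most 26 distinct speakers.
    """
    names = {}
    lines = []
    for speaker, text in pieces:
        if speaker not in names:
            names[speaker] = f"Person {string.ascii_uppercase[len(names)]}"
        lines.append(f"{names[speaker]}: {text.strip()}")
    return "\n".join(lines)


# Made-up per-segment transcription results.
sample_pieces = [
    ("SPEAKER_00", " Hey, are we still meeting at the cafe this afternoon?"),
    ("SPEAKER_01", " Yes, I'm good with that!"),
    ("SPEAKER_00", " Perfect! I'll bring my laptop and notes."),
]
print(label_speakers(sample_pieces))
```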
  10. ex) Vertex AI (Gemini 1.5 Pro)
    from google.cloud import storage
    from vertexai.generative_models import GenerationConfig, GenerativeModel, Part

    # bucket_name, file_name, file and model are defined outside this slide
    bucket = storage.Client().bucket(bucket_name)
    blob = bucket.blob(file_name)
    blob.upload_from_file(file)  # ← save to storage
    gcs_uri = f"gs://{bucket_name}/{file_name}"
    video = Part.from_uri(mime_type="video/mp4", uri=gcs_uri)  # ← here
    prompt = "XXXXX"
    response = model.generate_content([video, prompt], stream=True)
    result = ""  # initialize before accumulating the streamed chunks
    for chunk in response:
        result += chunk.text
    print(result)
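The slide hard-codes `mime_type="video/mp4"`. If the same flow were used for audio files as well, the MIME type could be derived from the file name instead. This is a minimal sketch using only the standard library; `gcs_uri_and_mime` is a hypothetical helper and the bucket and file names are placeholders.

```python
import mimetypes


def gcs_uri_and_mime(bucket_name: str, file_name: str):
    """Build the gs:// URI and guess the MIME type from the file extension."""
    mime_type, _ = mimetypes.guess_type(file_name)
    if mime_type is None:
        raise ValueError(f"cannot guess MIME type for {file_name!r}")
    return f"gs://{bucket_name}/{file_name}", mime_type


# Placeholder names for illustration only.
uri, mime = gcs_uri_and_mime("my-bucket", "meeting.mp4")
print(uri, mime)

# The pair could then be passed on as in the slide:
#   Part.from_uri(mime_type=mime, uri=uri)
```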
  11. ex) Vertex AI (Gemini 1.5 Pro)
    • Transcription
    • Diarization
    • TimeStamp
    • File Size
    • Expression