
Enjoy Reasoning Segmentation with LISA


These are the slides I presented at "AI Meeting", hosted by Osaka Driven Development (大阪駆動開発) on 2023/09/13.
They describe my impressions after trying out "LISA: Reasoning Segmentation via Large Language Model" (Lai et al., 2023).

If you spot any errors in the slides or have comments, please contact me at [email protected].

Osaka Driven Development (大阪駆動開発)
connpass: https://osaka-driven-dev.connpass.com/

AI Meeting 2023/09/13 #AIMTG #ChatGPT #GPT4 #PaLM
connpass: https://osaka-driven-dev.connpass.com/event/292953/

LISA: Reasoning Segmentation via Large Language Model (Lai et al., 2023)
github: https://github.com/dvlab-research/LISA
arXiv: https://arxiv.org/abs/2308.00692

Hirofumi Matsuda

October 11, 2023


Transcript

  1. What is LISA?
    "LISA: Reasoning Segmentation via Large Language Model" (Lai et al., 2023, arXiv: 2308.00692) → reasoning segmentation using a large language model. LISA stands for Large Language Instructed Segmentation Assistant.
  2. What is LISA? Example 1
    INPUT: "Can you segment the camera lens that is more suitable for photographing nearby objects in this image?"
    OUTPUT: "Sure, [SEG]."
    Source: https://github.com/dvlab-research/LISA
  3. What is LISA? Example 2
    INPUT: "Who was the president of the US in this image? Please output segmentation mask and explain the reason."
    OUTPUT: "Sure, the segmentation result is [SEG]. The President of the United States in the image is President Obama."
    Source: https://github.com/dvlab-research/LISA
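  4. What is LISA?
    Conventional approach (INPUT: image → NN → OUTPUT: segmentation mask):
    • Target objects must be categorized in advance and specified explicitly
    • Implicit instructions cannot be understood
    LISA (INPUT: image + text → LMM → OUTPUT: segmentation mask):
    • Complex reasoning is possible, using the input text as the query
      Example: "Where's that thing for the TV...? The thing you change channels with, you know, that thing" → answer: the remote control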
  5. Example of a complex query: "Describe what animals are there"
    INPUT: "Can you segment the animal? Please explain what animals are there."
    OUTPUT: "ASSISTANT: Sure, [SEG]. In the image, there is a cow statue and a calf statue outside a building."
  6. Example of a complex query: "Tell me where the building entrance is and how to get there"
    INPUT: "Can you segment the entrance of building? Please explain how we can get to there."
    OUTPUT: "ASSISTANT: Sure, the segmentation result is [SEG]. In the image, there is a large white cow statue located outside of a building. To get to the entrance of the building, one would need to walk around the cow statue and approach the building from the front. The cow statue is quite prominent, so it should be easy to spot and navigate around while approaching the building."
  7. LLaVA: Large Language and Vision Assistant (a multimodal LLM)
    Source: https://github.com/haotian-liu/LLaVA, https://llava.hliu.cc/
  8. References
    • LISA: Reasoning Segmentation via Large Language Model: https://github.com/dvlab-research/LISA, https://arxiv.org/abs/2308.00692
    • LLaVA: Large Language and Vision Assistant: https://github.com/haotian-liu/LLaVA
    • Segment Anything: https://github.com/facebookresearch/segment-anything
    • VisionLLM: https://github.com/OpenGVLab/VisionLLM, https://arxiv.org/abs/2305.11175
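In the examples in these slides, the mask appears in the text answer only as a [SEG] placeholder. In the LISA paper, the LMM's last-layer hidden state at each [SEG] token is passed through an MLP projection and then, together with image features, into a SAM-style mask decoder. The following is a toy sketch of that flow under stated assumptions, not LISA's code: the function names are hypothetical, the MLP is reduced to a single linear projection, and the SAM decoder is swapped out for a plain dot product between the projected [SEG] embedding and per-pixel image features, followed by a sigmoid threshold.

```python
import numpy as np

SEG_TOKEN = "[SEG]"  # placeholder string as it appears in the example answers


def render_answer(answer: str) -> str:
    """Hypothetical helper: replace each [SEG] with a numbered mask reference for display."""
    parts = answer.split(SEG_TOKEN)
    out = parts[0]
    for i, rest in enumerate(parts[1:]):
        out += f"<mask {i}>" + rest
    return out


def seg_embeddings_to_masks(hidden_states, token_ids, seg_token_id, proj,
                            pixel_features, threshold=0.5):
    """Toy stand-in for LISA's mask decoding (NOT the real SAM decoder).

    hidden_states:  (seq_len, d_llm) last-layer LMM hidden states
    token_ids:      (seq_len,) token ids of the generated sequence
    seg_token_id:   id assigned to the [SEG] token (assumed)
    proj:           (d_llm, d_vis) linear projection (an MLP in the paper)
    pixel_features: (H, W, d_vis) per-pixel image features (assumed)
    Returns one boolean (H, W) mask per [SEG] token, in order.
    """
    seg_positions = np.flatnonzero(token_ids == seg_token_id)
    masks = []
    for pos in seg_positions:
        query = hidden_states[pos] @ proj          # project [SEG] state: (d_vis,)
        logits = pixel_features @ query            # per-pixel score: (H, W)
        probs = 1.0 / (1.0 + np.exp(-logits))      # sigmoid
        masks.append(probs > threshold)            # binarize
    return masks
```

For example, with the answer "Sure, [SEG]." from Example 1, `render_answer` returns "Sure, <mask 0>.", and the first mask returned by `seg_embeddings_to_masks` is the one that placeholder refers to. Keeping the mask out of the text and tying it to a token position is what lets a single answer mix free-form explanation (as in Examples 5 and 6) with one or more masks.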