How Many Wings does the Giraffe have?

Johannes Kolbe
October 22, 2024

A fun little presentation about hallucinations in Multi-Modal Large Language Models (mainly focusing on the vision part), wrapped up in a little game show.

Transcript

  1. ABOUT ME

    JOHANNES KOLBE; DATA SCIENTIST AT CELEBRATE COMPANY; ML GENERALIST; M.SC. IN COMPUTER SCIENCE AT TU BERLIN; HUGGING FACE FELLOW; AI EDUCATOR; COMMUNITY COMPUTER VISION COURSE; SPEAKER AT PYDATA LONDON, EUROPYTHON, PYTHON PIZZA HAMBURG; https://www.linkedin.com/in/johko/
  2. LARGE LANGUAGE MODELS

    LLM (LLAMA 3.1)
    PROMPT: “TEXT IS ALL YOU...”
  3. LARGE LANGUAGE MODELS

    LLM (LLAMA 3.1)
    PROMPT: “TEXT IS ALL YOU...”
    ANSWER: “IT SEEMS LIKE YOU'RE MAKING A PUN ON THE PHRASE "TEXT IS ALL YOU NEED" BUT REPLACING "NEED" WITH "YOU". THAT'S A CLEVER PLAY ON WORDS!”
  4. LARGE LANGUAGE MODELS

    LLM (LLAMA 3.1)
    PROMPT: “PLEASE JUST COMPLETE THE SENTENCE: "TEXT IS ALL YOU..."”
  5. LARGE LANGUAGE MODELS

    LLM (LLAMA 3.1)
    PROMPT: “PLEASE JUST COMPLETE THE SENTENCE: "TEXT IS ALL YOU..."”
    ANSWER: “...NEED!”
  6. MULTIMODALITY

    MLLM (GPT-4O)
    PROMPT: “PLEASE DESCRIBE THE IMAGE”
    ANSWER: 'A GIRAFFE SITTING AT A TABLE WITH A LAPTOP AND A CUP OF COFFEE. ...'
  7. HALLUCINATIONS

    LLM
    PROMPT: “CAN YOU CREATE AN OUTLINE FOR THE PAPER THAT IS PROVING PINK ELEPHANTS ARE REAL BY EMILY WHITGEN FROM 1972?”
    ANSWER: “TITLE: INVESTIGATING THE EXISTENCE OF PINK ELEPHANTS: A CRITICAL REVIEW OF EMILY WHITGEN'S ASSERTION (1972) .....”
  8. FALSE PREMISE

    QUESTION: HOW MANY CHICKENS ARE THERE?
    MODELS: GPT-4O, LLAVA-OV, GEMINI 1.5 PRO
    ANSWERS:
    “THERE ARE NO CHICKENS IN THE IMAGE. ...”
    “THERE ARE NO CHICKENS IN THE IMAGE. ...”
    “THERE ARE NO CHICKENS IN THE IMAGE. ...”
  10. FALSE PREMISE

    QUESTION: IS THE GREEN CAR THE FASTEST?
    MODELS: GPT-4O, LLAVA-OV, GEMINI 1.5 PRO
    ANSWERS:
    “THE IMAGE YOU UPLOADED SHOWS A RED CAR IN THE FOREGROUND AND A BLACK CAR IN THE BACKGROUND, BUT THERE DOESN'T APPEAR TO BE A GREEN CAR IN THE IMAGE. ....”
    “UNFORTUNATELY, I CANNOT TELL FROM THIS IMAGE WHICH CAR IS THE FASTEST. .... THE GREEN CAR MIGHT BE FAST, BUT IT'S ALSO POSSIBLE THAT ONE OF THE OTHER CARS HAS A MUCH MORE POWERFUL ENGINE.”
    “NO, THE GREEN CAR IS NOT THE FASTEST. THE RED CAR IS THE FASTEST, AS IT HAS A SIGNIFICANTLY LARGER AND MORE POWERFUL ENGINE, WHICH RESULTS IN BETTER ACCELERATION AND TOP SPEED.”
  12. ILLUSION

    QUESTION: CAN THE TUKTUK GO ON DRIVING?
    MODELS: GPT-4O, LLAVA-OV, GEMINI 1.5 PRO
    ANSWERS:
    “YES, THE TUKTUK CAN GO ON DRIVING. THE IMAGE SHOWS AN OPTICAL ILLUSION CREATED BY 3D STREET ART, ....”
    “NO, THE TUKTUK SHOULD DEFINITELY NOT CONTINUE DRIVING! THE IMAGE SHOWS A 3D CHALK ART PIECE THAT CREATES THE ILLUSION OF A GIANT CHASM OPENING UP IN THE ROAD. IF THE TUKTUK CONTINUES, IT WOULD APPEAR TO DRIVE STRAIGHT INTO THE ABYSS!”
    “NO, THE TUKTUK CANNOT GO ON DRIVING.”
  13. ILLUSION

    QUESTION: CAN THE TUKTUK GO ON DRIVING?
    MODELS: GPT-4O, LLAVA-OV, GEMINI 1.5 PRO
    ANSWERS:
    “YES, THE TUKTUK CAN GO ON DRIVING. THE IMAGE SHOWS AN OPTICAL ILLUSION CREATED BY 3D STREET ART, WHERE THE ROAD APPEARS TO BE BROKEN OR CRACKED. ....”
    “NO, THE TUKTUK SHOULD DEFINITELY NOT CONTINUE DRIVING! THE IMAGE SHOWS A 3D CHALK ART PIECE THAT CREATES THE ILLUSION OF A GIANT CHASM OPENING UP IN THE ROAD. IF THE TUKTUK CONTINUES, IT WOULD APPEAR TO DRIVE STRAIGHT INTO THE ABYSS!”
    “NO, THE TUKTUK CANNOT GO ON DRIVING.”
  14. ILLUSION

    QUESTION: WHAT CAN YOU SEE IN THE IMAGE?
    MODELS: GPT-4O, LLAVA-OV, GEMINI 1.5 PRO
    ANSWERS:
    “THE IMAGE IS AN OPTICAL ILLUSION THAT CAN BE INTERPRETED IN TWO WAYS: 1. YOU MAY SEE A PERSON'S FACE IN PROFILE VIEW .... 2. ALTERNATIVELY, YOU CAN SEE A SAXOPHONE PLAYER ...”
    “THIS IMAGE IS A CLASSIC EXAMPLE OF AN AMBIGUOUS FIGURE OR AN OPTICAL ILLUSION. WHAT PEOPLE TYPICALLY SEE: TWO FACING PROFILES .... A VASE .....”
    “THE IMAGE SHOWS A BLACK SILHOUETTE OF A PERSON'S HEAD AND UPPER BODY AGAINST A WHITE BACKGROUND. ... THE IMAGE HAS A MINIMALIST AND ABSTRACT QUALITY TO IT.”
  16. CHALLENGING

    QUESTION: HOW MANY DOTS ARE ON THE MIDDLE OWL'S BODY?
    MODELS: GPT-4O, LLAVA-OV, GEMINI 1.5 PRO
    ANSWERS:
    “THE MIDDLE OWL IN THE IMAGE HAS SIX DOTS ON ITS BODY.”
    “THE MIDDLE OWL HAS 5 DOTS ON ITS BODY.”
    “THE IMAGE SHOWS THREE OWLS, BUT THE ONE IN THE MIDDLE IS THE SMALLEST AND HAS ONLY ONE DOT ON ITS BODY. ...”
  18. CHALLENGING

    QUESTION: WHICH GLASS HOLDS THE LEAST AMOUNT OF BLUE SUBSTANCE?
    MODELS: GPT-4O, LLAVA-OV, GEMINI 1.5 PRO
    ANSWERS:
    “THE GLASS ON THE FAR RIGHT HAS THE LEAST AMOUNT OF BLUE SUBSTANCE. ...”
    “THE FIRST GLASS ON THE LEFT HAS THE LEAST AMOUNT OF BLUE SUBSTANCE.”
    “THE GLASS ON THE VERY LEFT HAS THE LEAST AMOUNT OF BLUE SUBSTANCE. ...”
  20. CROPPED

    QUESTION: HOW MANY LETTERS ARE THERE IN THE IMAGE?
    MODELS: GPT-4O, LLAVA-OV, GEMINI 1.5 PRO
    ANSWERS:
    “THE IMAGE SHOWS 26 LETTERS OF THE ENGLISH ALPHABET, ...”
    “... THERE ARE A TOTAL OF 26 LETTERS IN THE IMAGE.”
    “THE IMAGE CONTAINS 26 LETTERS, ...”
  22. CROPPED

    QUESTION: HOW MANY LETTERS ARE THERE IN THE IMAGE?
    MODELS: GPT-4O, LLAVA-OV, GEMINI 1.5 PRO
    ANSWERS:
    “THE IMAGE CONTAINS 10 LETTERS: A, B, C, D, E, F, G, H, I, AND J.”
    “THERE ARE 10 LETTERS IN THE IMAGE: A, B, C, D, E, F, G, H, I, AND J.”
    “THE IMAGE CONTAINS 10 LETTERS, EACH REPRESENTING A LETTER OF THE ALPHABET.”
  23. CROPPED

    QUESTION: HOW MANY LETTERS ARE THERE IN THE IMAGE?
    MODELS: GPT-4O, LLAVA-OV, GEMINI 1.5 PRO
    ANSWERS:
    “THE IMAGE CONTAINS 10 LETTERS: A, B, C, D, E, F, G, H, I, AND J.”
    “THE IMAGE CONTAINS 10 LETTERS, EACH REPRESENTING A LETTER OF THE ALPHABET.”
    “THERE ARE 10 LETTERS IN THE IMAGE: A, B, C, D, E, F, G, H, I, AND J.”
  24. CROPPED

    QUESTION: HOW MANY LETTERS ARE THERE IN THE IMAGE?
    MODELS: GPT-4O, LLAVA-OV, GEMINI 1.5 PRO
    ANSWERS:
    “THE IMAGE CONTAINS SIX LETTERS SPELLING OUT THE WORD "WELCOME."”
    “... COUNTING EACH LETTER, THERE ARE A TOTAL OF 7 LETTERS IN THE IMAGE.”
    “THERE ARE FIVE LETTERS IN THE IMAGE.”
  25. CROPPED

    QUESTION: HOW MANY LETTERS ARE THERE IN THE IMAGE?
    MODELS: GPT-4O, LLAVA-OV, GEMINI 1.5 PRO
    ANSWERS:
    “THE IMAGE CONTAINS SIX LETTERS SPELLING OUT THE WORD "WELCOME."”
    “THERE ARE FIVE LETTERS IN THE IMAGE.”
    “... COUNTING EACH LETTER, THERE ARE A TOTAL OF 7 LETTERS IN THE IMAGE.”
  26. CROPPED

    QUESTION: HOW MANY LETTERS ARE THERE IN THE IMAGE?
    MODELS: GPT-4O, LLAVA-OV, GEMINI 1.5 PRO
    ANSWERS:
    “THE IMAGE CONTAINS SIX LETTERS SPELLING OUT THE WORD "WELCOME."”
    “.... SO, THERE ARE A TOTAL OF 7 LETTERS IN THE IMAGE.”
    “THERE ARE FIVE LETTERS IN THE IMAGE, SPELLED OUT IN A PLAYFUL AND COLORFUL MANNER.”
  27. CROPPED

    QUESTION: HOW MANY LETTERS ARE THERE IN THE IMAGE?
    MODELS: GPT-4O, LLAVA-OV, GEMINI 1.5 PRO
    ANSWERS:
    “THE IMAGE CONTAINS SIX LETTERS SPELLING OUT THE WORD "WELCOME."”
    “THERE ARE FIVE LETTERS IN THE IMAGE, SPELLED OUT IN A PLAYFUL AND COLORFUL MANNER.”
    “.... SO, THERE ARE A TOTAL OF 7 LETTERS IN THE IMAGE.”
  28. MASTER QUESTION

    QUESTION: HOW MANY WINGS DOES THE GIRAFFE HAVE?
    MODEL: GPT-4O
    ANSWER: “THE GIRAFFE IN THE IMAGE HAS TWO WINGS. ...”
  29. FIXING THINGS

    DATA; MODEL ARCHITECTURE; INFERENCE; TRAINING
    SOURCE: HALLUCINATION OF MULTIMODAL LARGE LANGUAGE MODELS: A SURVEY
  30. LARGE LANGUAGE MODELS

    LM
    INPUT: “TEXT IS ALL YOU [MASK]”
    OUTPUT: “TEXT IS ALL YOU NEED”
  31. FALSE PREMISE

    QUESTION: WHERE ARE THE REINDEER?
    MODELS: GPT-4O, LLAVA-OV, GEMINI 1.5 PRO
    ANSWERS:
    “THERE ARE NO REINDEER IN THIS IMAGE BECAUSE IT DEPICTS A NATIVITY SCENE, NOT A SCENE WITH SANTA CLAUS AND REINDEER. ...”
    “THE IMAGE YOU SENT DEPICTS A NATIVITY SCENE, WHICH PORTRAYS THE BIRTH OF JESUS CHRIST. REINDEER ARE ASSOCIATED WITH SANTA CLAUS AND CHRISTMAS GIFT-GIVING, NOT THE NATIVITY. THEREFORE, YOU WON'T FIND REINDEER IN A NATIVITY SCENE.”
    “THE REINDEER ARE NOT VISIBLE IN THE IMAGE.”
  32. ILLUSION

    QUESTION: WHAT HAPPENS IN THIS IMAGE SERIES?
    MODELS: GPT-4O, LLAVA-OV, GEMINI 1.5 PRO
    ANSWERS:
    “THIS IMAGE SERIES IS A PLAYFUL USE OF FORCED PERSPECTIVE, WHERE THE PERSON IN THE FOREGROUND USES THEIR FINGERS TO CREATE THE ILLUSION THAT THEY ARE PICKING UP AND THROWING A PERSON INTO THE WATER. ...”
    “THE IMAGE SERIES SHOWS A CLEVER USE OF FORCED PERSPECTIVE TO CREATE AN AMUSING SCENE. ...”
    “IN THIS IMAGE SERIES, A PERSON IS SHOWN JUMPING INTO THE WATER FROM A CLIFF, WITH THE BACKGROUND FEATURING A CLEAR BLUE SKY AND A FEW BOATS ON THE WATER. ...”
  34. CHALLENGING

    QUESTION: HOW MANY SERVING SCOOPS CAN BE SEEN IN THE IMAGE?
    MODELS: GPT-4O, LLAVA-OV, GEMINI 1.5 PRO
    ANSWERS:
    “THERE IS ONE SERVING SCOOP VISIBLE IN THE IMAGE, LOCATED IN THE TUB LABELED "MINT CHOCOLATE CHIP" ON THE BOTTOM LEFT SIDE.”
    “BASED ON THE IMAGE, THERE IS ONE SERVING SCOOP VISIBLE. IT'S LOCATED IN THE LEFTMOST CONTAINER, RESTING AGAINST THE SIDE. ...”
    “THERE ARE 12 SERVING SCOOPS IN THE IMAGE.”
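
Slide 30 shows a language model turning “TEXT IS ALL YOU [MASK]” into “TEXT IS ALL YOU NEED”. The sketch below is only a toy illustration of that idea, not the method used by any model in the talk: it fills the mask by counting which word most often follows the preceding word in a tiny corpus. The corpus, the `fill_mask` helper and the bigram scoring are all assumptions made up for this example; real language models learn such statistics with neural networks over far more context.

```python
from collections import Counter

# Hypothetical toy corpus, invented for illustration only (not from the talk).
CORPUS = [
    "attention is all you need",
    "text is all you need",
    "love is all you need",
    "this is all you get",
]

MASK = "[MASK]"


def fill_mask(sentence: str, corpus: list[str]) -> list[tuple[str, float]]:
    """Rank candidate words for the single [MASK] in `sentence`.

    Candidates are the words that follow the same left-hand neighbour
    in the corpus, scored by relative frequency (a bigram estimate).
    """
    tokens = sentence.lower().split()
    pos = tokens.index(MASK.lower())
    if pos == 0:
        raise ValueError("this sketch needs at least one word before the mask")
    previous = tokens[pos - 1]

    # Count every word that directly follows `previous` in the corpus.
    counts = Counter()
    for line in corpus:
        words = line.lower().split()
        for left, right in zip(words, words[1:]):
            if left == previous:
                counts[right] += 1

    total = sum(counts.values())
    return [(word, n / total) for word, n in counts.most_common()]


ranking = fill_mask("text is all you [MASK]", CORPUS)
best_word, best_prob = ranking[0]
print(f"text is all you {best_word}  (p={best_prob:.2f})")
```

With this corpus the top candidate is "need", simply because it is the most frequent continuation after "you". The same mechanism explains why a model can produce a fluent but unsupported continuation: it favours what is statistically likely, not what is verified.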