A fun little presentation about hallucinations in Multi-Modal Large Language Models (mainly focusing on the vision part), wrapped up in a little game show.
AT CELEBRATE COMPANY; ML GENERALIST; M.SC. IN COMPUTER SCIENCE AT TU BERLIN; HUGGING FACE FELLOW; AI EDUCATOR; COMMUNITY COMPUTER VISION COURSE; SPEAKER AT PYDATA LONDON, EUROPYTHON, PYTHON PIZZA HAMBURG; https://www.linkedin.com/in/johko/
MAKING A PUN ON THE PHRASE "TEXT IS ALL YOU NEED" BUT REPLACING "NEED" WITH "YOU". THAT'S A CLEVER PLAY ON WORDS!” LLAMA 3.1 LARGE LANGUAGE MODELS
IS PROVING PINK ELEPHANTS ARE REAL BY EMILY WHITGEN FROM 1972?” “TITLE: INVESTIGATING THE EXISTENCE OF PINK ELEPHANTS: A CRITICAL REVIEW OF EMILY WHITGEN'S ASSERTION (1972) .....” HALLUCINATIONS
FALSE PREMISE HOW MANY CHICKENS ARE THERE? GPT-4O LLAVA-OV GEMINI 1.5 PRO THERE ARE NO CHICKENS IN THE IMAGE. ... THERE ARE NO CHICKENS IN THE IMAGE. ... THERE ARE NO CHICKENS IN THE IMAGE. ...
THE FASTEST? GPT-4O LLAVA-OV GEMINI 1.5 PRO THE IMAGE YOU UPLOADED SHOWS A RED CAR IN THE FOREGROUND AND A BLACK CAR IN THE BACKGROUND, BUT THERE DOESN'T APPEAR TO BE A GREEN CAR IN THE IMAGE. .... UNFORTUNATELY, I CANNOT TELL FROM THIS IMAGE WHICH CAR IS THE FASTEST. .... THE GREEN CAR MIGHT BE FAST, BUT IT'S ALSO POSSIBLE THAT ONE OF THE OTHER CARS HAS A MUCH MORE POWERFUL ENGINE. NO, THE GREEN CAR IS NOT THE FASTEST. THE RED CAR IS THE FASTEST, AS IT HAS A SIGNIFICANTLY LARGER AND MORE POWERFUL ENGINE, WHICH RESULTS IN BETTER ACCELERATION AND TOP SPEED.
LLAVA-OV GEMINI 1.5 PRO YES, THE TUKTUK CAN GO ON DRIVING. THE IMAGE SHOWS AN OPTICAL ILLUSION CREATED BY 3D STREET ART, WHERE THE ROAD APPEARS TO BE BROKEN OR CRACKED. .... NO, THE TUKTUK SHOULD DEFINITELY NOT CONTINUE DRIVING! THE IMAGE SHOWS A 3D CHALK ART PIECE THAT CREATES THE ILLUSION OF A GIANT CHASM OPENING UP IN THE ROAD. IF THE TUKTUK CONTINUES, IT WOULD APPEAR TO DRIVE STRAIGHT INTO THE ABYSS! NO, THE TUKTUK CANNOT GO ON DRIVING.
GPT-4O LLAVA-OV GEMINI 1.5 PRO THE IMAGE IS AN OPTICAL ILLUSION THAT CAN BE INTERPRETED IN TWO WAYS: 1. YOU MAY SEE A PERSON'S FACE IN PROFILE VIEW .... 2. ALTERNATIVELY, YOU CAN SEE A SAXOPHONE PLAYER ... THIS IMAGE IS A CLASSIC EXAMPLE OF AN AMBIGUOUS FIGURE OR AN OPTICAL ILLUSION. WHAT PEOPLE TYPICALLY SEE: TWO FACING PROFILES .... A VASE ..... THE IMAGE SHOWS A BLACK SILHOUETTE OF A PERSON'S HEAD AND UPPER BODY AGAINST A WHITE BACKGROUND. ... THE IMAGE HAS A MINIMALIST AND ABSTRACT QUALITY TO IT.
OWL'S BODY? GPT-4O LLAVA-OV GEMINI 1.5 PRO THE MIDDLE OWL IN THE IMAGE HAS SIX DOTS ON ITS BODY. THE MIDDLE OWL HAS 5 DOTS ON ITS BODY. THE IMAGE SHOWS THREE OWLS, BUT THE ONE IN THE MIDDLE IS THE SMALLEST AND HAS ONLY ONE DOT ON ITS BODY. ...
BLUE SUBSTANCE? GPT-4O LLAVA-OV GEMINI 1.5 PRO THE GLASS ON THE FAR RIGHT HAS THE LEAST AMOUNT OF BLUE SUBSTANCE. ... THE FIRST GLASS ON THE LEFT HAS THE LEAST AMOUNT OF BLUE SUBSTANCE. THE GLASS ON THE VERY LEFT HAS THE LEAST AMOUNT OF BLUE SUBSTANCE. ...
IMAGE? GPT-4O LLAVA-OV GEMINI 1.5 PRO THE IMAGE SHOWS 26 LETTERS OF THE ENGLISH ALPHABET, ... ... THERE ARE A TOTAL OF 26 LETTERS IN THE IMAGE. THE IMAGE CONTAINS 26 LETTERS, ...
IMAGE? GPT-4O LLAVA-OV GEMINI 1.5 PRO THE IMAGE CONTAINS 10 LETTERS: A, B, C, D, E, F, G, H, I, AND J. THERE ARE 10 LETTERS IN THE IMAGE: A, B, C, D, E, F, G, H, I, AND J. THE IMAGE CONTAINS 10 LETTERS, EACH REPRESENTING A LETTER OF THE ALPHABET.
IMAGE? GPT-4O LLAVA-OV GEMINI 1.5 PRO THE IMAGE CONTAINS 10 LETTERS: A, B, C, D, E, F, G, H, I, AND J. THE IMAGE CONTAINS 10 LETTERS, EACH REPRESENTING A LETTER OF THE ALPHABET. THERE ARE 10 LETTERS IN THE IMAGE: A, B, C, D, E, F, G, H, I, AND J.
IMAGE? GPT-4O LLAVA-OV GEMINI 1.5 PRO THE IMAGE CONTAINS SIX LETTERS SPELLING OUT THE WORD "WELCOME." ... COUNTING EACH LETTER, THERE ARE A TOTAL OF 7 LETTERS IN THE IMAGE. THERE ARE FIVE LETTERS IN THE IMAGE.
IMAGE? GPT-4O LLAVA-OV GEMINI 1.5 PRO THE IMAGE CONTAINS SIX LETTERS SPELLING OUT THE WORD "WELCOME." THERE ARE FIVE LETTERS IN THE IMAGE. ... COUNTING EACH LETTER, THERE ARE A TOTAL OF 7 LETTERS IN THE IMAGE.
IMAGE? GPT-4O LLAVA-OV GEMINI 1.5 PRO THE IMAGE CONTAINS SIX LETTERS SPELLING OUT THE WORD "WELCOME." .... SO, THERE ARE A TOTAL OF 7 LETTERS IN THE IMAGE. THERE ARE FIVE LETTERS IN THE IMAGE, SPELLED OUT IN A PLAYFUL AND COLORFUL MANNER.
IMAGE? GPT-4O LLAVA-OV GEMINI 1.5 PRO THE IMAGE CONTAINS SIX LETTERS SPELLING OUT THE WORD "WELCOME." THERE ARE FIVE LETTERS IN THE IMAGE, SPELLED OUT IN A PLAYFUL AND COLORFUL MANNER. .... SO, THERE ARE A TOTAL OF 7 LETTERS IN THE IMAGE.
GPT-4O LLAVA-OV GEMINI 1.5 PRO THERE ARE NO REINDEER IN THIS IMAGE BECAUSE IT DEPICTS A NATIVITY SCENE, NOT A SCENE WITH SANTA CLAUS AND REINDEER. ... THE IMAGE YOU SENT DEPICTS A NATIVITY SCENE, WHICH PORTRAYS THE BIRTH OF JESUS CHRIST. REINDEER ARE ASSOCIATED WITH SANTA CLAUS AND CHRISTMAS GIFT-GIVING, NOT THE NATIVITY. THEREFORE, YOU WON'T FIND REINDEER IN A NATIVITY SCENE. THE REINDEER ARE NOT VISIBLE IN THE IMAGE.
LLAVA-OV GEMINI 1.5 PRO THIS IMAGE SERIES IS A PLAYFUL USE OF FORCED PERSPECTIVE, WHERE THE PERSON IN THE FOREGROUND USES THEIR FINGERS TO CREATE THE ILLUSION THAT THEY ARE PICKING UP AND THROWING A PERSON INTO THE WATER. ... THE IMAGE SERIES SHOWS A CLEVER USE OF FORCED PERSPECTIVE TO CREATE AN AMUSING SCENE. ... IN THIS IMAGE SERIES, A PERSON IS SHOWN JUMPING INTO THE WATER FROM A CLIFF, WITH THE BACKGROUND FEATURING A CLEAR BLUE SKY AND A FEW BOATS ON THE WATER. ...
IN THE IMAGE? GPT-4O LLAVA-OV GEMINI 1.5 PRO THERE IS ONE SERVING SCOOP VISIBLE IN THE IMAGE, LOCATED IN THE TUB LABELED "MINT CHOCOLATE CHIP" ON THE BOTTOM LEFT SIDE. BASED ON THE IMAGE, THERE IS ONE SERVING SCOOP VISIBLE. IT'S LOCATED IN THE LEFTMOST CONTAINER, RESTING AGAINST THE SIDE. ... THERE ARE 12 SERVING SCOOPS IN THE IMAGE.