Unveiling the Deception: Google's Most Impressive Gemini Demo Was Faked
Google's latest Gemini AI model made its highly anticipated debut yesterday to a mixed reception. Now the company's technology, and its integrity, face fresh scrutiny: it has come to light that the most impressive demo of Gemini was largely fabricated.
A captivating video titled "Hands-on with Gemini: Interacting with multimodal AI" racked up a million views within a day, and its popularity is understandable. The demo showcased a range of interactions with Gemini, highlighting the model's ability to combine language and visual understanding in a flexible, responsive way.
The video opens with Gemini narrating the evolution of a simple squiggle into a detailed drawing of a duck, albeit in an unrealistic color, then expressing surprise ("What the quack!") upon encountering a blue toy duck. The demo proceeds with Gemini responding to various voice queries about the toy, and goes on to show other impressive feats: tracking a ball in a cup-switching game, recognizing gestures in shadow puppetry, rearranging sketches of planets, and more.
Gemini's responsiveness is particularly striking, although the video does acknowledge that some segments were altered for brevity and to reduce latency, so occasional hesitations and overly long answers have been edited out. All told, it was a genuinely awe-inspiring display of prowess in multimodal understanding. My own skepticism about Google's ability to deliver a competitive AI model wavered after watching this hands-on presentation.
There is just one problem: while Gemini does appear to have generated the responses shown in the video, the speed, accuracy, and basic mode of interaction with the model are misrepresented, leaving viewers misled.
For example, at 2:45 in the video, a hand silently performs a series of gestures, and Gemini promptly responds, "I know what you're doing! You're playing Rock, Paper, Scissors!" But the accompanying documentation makes clear that the model does not reason from the individual gestures. In reality, all three gestures must be shown at once, together with the prompt, "What do you think I'm doing? Hint: it's a game." Only then does Gemini respond, "You're playing rock, paper, scissors."
These interactions do not feel equivalent; they seem fundamentally different. One is an intuitive evaluation that captures an abstract idea effortlessly; the other is a contrived, heavily guided interaction that reveals as many limitations as capabilities. Gemini demonstrated the latter, not the former. The "interaction" depicted in the video did not actually occur.
Likewise, when three sticky notes doodled with the Sun, Saturn, and Earth are placed on a surface, Gemini is asked in the video, "Is this the correct order?" It promptly gives the correct answer. But the genuine written prompt asks, "Is this the right order? Consider the distance from the sun and explain your reasoning." Did Gemini genuinely get it right, or did it need coaching to produce an answer suitable for the video? Did it even recognize the planets, or did it need help with that as well?
Similarly, in the video a ball of paper is swapped under a cup, and Gemini seemingly detects and tracks the movement instantly. In the accompanying post, however, not only does the activity have to be explained, but the model also has to be taught (albeit quickly, and in natural language) to perform it. The examples go on.
These instances may seem trivial at first glance. After all, recognizing hand gestures as a game with such speed is genuinely impressive for a multimodal model, as is making a judgment call on whether an incomplete picture depicts a duck. But since the blog post offers no explanation of the duck sequence at all, doubts now extend to the authenticity of that interaction as well.
Had the video explicitly stated, "This is a stylized representation of interactions our researchers tested," few would have questioned it; we often expect such videos to blend fact and aspiration. But the video is titled "Hands-on with Gemini," and when it claims to present "our favorite interactions," it implies that the interactions depicted are authentic. They were not. Some were embellished, some were entirely different, and some appear not to have happened at all. The video also never specifies which model it represents: the currently available Gemini Pro, or the Ultra version scheduled for release next year.
Should we have assumed that Google was merely offering a glimpse of what to expect, given how they described it? Perhaps we should now assume that all capabilities shown in Google AI demos are exaggerated for dramatic effect. In the headline, I assert that this video was "faked." Initially I questioned whether such strong language was warranted, and Google's spokesperson requested a change. But despite containing genuine elements, the video simply does not reflect reality. It is, indeed, fake.
Google asserts that the video "shows real outputs from Gemini," which is technically true. But the claim that only a few edits were made to the demo, and disclosed transparently, is misleading. This was not a genuine demo, and the interactions shown in the video were substantially different from the real ones created to inform it.