Google’s Will Smith Double and the Crunchy AI Spaghetti

On Tuesday, Google launched Veo 3, a new AI video synthesis model that introduces something earlier generators lacked: audio tracks synchronized with the video. Between 2022 and 2024, early AI video generators could produce only short, silent clips. Veo 3 can deliver eight-second high-definition clips complete with voices, dialogue, and sound effects.

Naturally, many people want to know how well Veo 3 can render Oscar-winning actor Will Smith eating spaghetti. The interest dates to March 2023, when an early example made with an open-source model called ModelScope went viral for its comedic imperfections. "AI Will Smith" eating spaghetti became iconic enough that Smith himself parodied it in February 2024.

Notably, ModelScope's rendition wasn't the most sophisticated AI video available at the time. Another AI video generator, Runway's Gen-2, had achieved better results but was less well known because it was not publicly accessible. The ModelScope video nonetheless lived on as a humorous reference point for progress in AI video.

This week, AI app developer Javi Lopez tested Veo 3 on the now-famous Will Smith spaghetti scenario and shared his results online. The video's soundtrack featured a distinct crunching sound as the character ate. The glitch is attributed to training data that disproportionately paired footage of chewing with crunching sound effects.

In a personal test of Veo 3 using the prompt "A black man eating spaghetti," the system produced a similar crunching sound, underscoring how experimental Veo 3's audio capabilities still are. The generic prompt stood in for the original because content filters appear to block Will Smith's likeness. Even so, the results mirrored Lopez's findings and pointed to the same sound synthesis weakness.

Veo 3's capabilities don't end there: the model can also generate coherent dialogue and music. Examples on social media show it creating lifelike scenes, such as a man singing a comedic opera about eating spaghetti. That is a notable leap from 2023, and it suggests AI video technology will keep evolving and raising cultural and ethical questions as it does.

While celebrity-likeness filters currently limit what Veo 3 will generate, the model is a step toward a future where virtually any scene could be fabricated with unprecedented realism. That prospect recalls discussions of the "cultural singularity," the point at which AI blurs the line between real and synthetic media.

As AI video models grow more sophisticated, balancing what the technology can do against ethical considerations remains crucial. Veo 3 exemplifies both the remarkable progress of AI media technology and the humorous glitches that persist. Expect more detailed examinations of AI's trajectory in future coverage. Bon appétit!