OpenAI Breaks The Internet: AI Generated Videos Changed Forever 🤯
And → Gemini 1.5 Announcement, Meta Release, and more!
Welcome back, my AI-obsessed friends.
Shake off the sleep, and have your coffee—it’s Friday. The AI world got even crazier today! This is the latest issue of /Imagine, your go-to source for the freshest news and developments in AI. Here’s EVERYTHING you need to know. Let’s dive in…
🎥 OpenAI Just Teased Sora: Largest Leap Forward For AI Video Yet
The Bytes:
✣ OpenAI has dropped an absolute bomb on our heads and announced Sora, a groundbreaking AI model capable of generating high-definition videos from text prompts. Sora stands out for its ability to create realistic and imaginative scenes, far surpassing anything we’ve seen on the market to date.
✣ With Sora, users can generate videos up to a minute long that maintain visual quality and adhere closely to the user’s instructions. From bustling city streets to serene natural landscapes, Sora can bring any scenario to life with stunning detail. This blows the Runways and Pikas of the world, with their four-second generations, out of the water.
✣ Sora empowers users to craft complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. Sam Altman had some fun on X today, generating videos from users’ requests on demand.
✣ OpenAI is sharing Sora with a small group of safety testers and handpicked creators to ensure the model’s reliability and ethical use. While there are no immediate plans for a public release, it’s safe to assume it will be in our hands soon.
✣ Sora represents a significant leap in content creation, offering a glimpse into a future where AI assists in producing cinematic-quality videos with ease.
Why You Should Care: It’s hard to describe this as anything other than remarkable. Sora’s text-to-video generation opens up a world of possibilities for creators, artists, educators, and businesses alike. It’s a tool that could redefine the landscape of visual media and storytelling. If you have 12 minutes, I suggest watching Marques’s video on it that I shared above. Buckle up, my friends. Things only get weirder from here.
🤖 Google’s Gemini 1.5: A Milestone in AI with 1 Million Token Context
Image Credit: Google
The Bytes:
✣ Google has introduced Gemini 1.5, the latest iteration of its AI model, building on the capabilities of the previous version, Gemini 1.0 Ultra. This next-generation model boasts a remarkable ability to understand and process up to 1 million freaking tokens, the longest context window of any large-scale foundation model to date.
✣ Gemini 1.5 utilizes a new Mixture-of-Experts (MoE) architecture, enhancing its efficiency. This allows the model to route each request to specialized “expert” neural networks, making it a speed demon that still delivers quality responses.
✣ The 1 million token context window is an absolute game-changer, enabling the model to process vast amounts of information, such as entire PDFs, books, code repositories, or lengthy videos in a single prompt. It is currently available in private preview for AI Studio and Vertex AI users.
Why You Should Care: The release of Gemini 1.5 represents a significant leap forward in AI technology, with its extended context window paving the way for new capabilities and applications. Google is picking up the pace. I can’t wait to get my hands on this.
🌐 Meta AI’s V-JEPA: Teaching Machines to Grasp the Physical World
Researchers at Meta recently shared MAGNeT, a single non-autoregressive transformer model for text-to-music & text-to-sound generation capable of generating audio on-par with the quality of SOTA models — at 7x the speed. MAGNeT is open source as part of AudioCraft. Hear audio… twitter.com/i/web/status/1…
— AI at Meta (@AIatMeta), 5:53 PM • Feb 14, 2024
The Bytes:
✣ Meta AI has released V-JEPA (Video Joint Embedding Predictive Architecture), a pioneering AI model that learns to understand and model the physical world by observing videos. The march towards advanced machine intelligence continues.
✣ V-JEPA’s approach mirrors human learning, where understanding is gleaned through observation rather than explicit instruction.
✣ The model has reached competitive performance levels on standard benchmarks, showcasing its ability to understand complex interactions and temporal scene evolution.
✣ In line with its open research practices, Meta AI has released V-JEPA under a Creative Commons NonCommercial license, allowing anyone to dive in and build on it.
Why You Should Care: V-JEPA’s innovative approach to video comprehension could make AI more human-like and might transform how machines understand and interact with our world.
/Imagination of The Day: Let's finish the day with something creative. Today, I’m sharing more Star Wars generations and the prompt formula. Try it out, then tweet at me what you create or reply to this newsletter. Check it out:
Prompt Share: CGI Star Wars character + they embody the personality and identity of [state] + background terrain of [state] + [state identifier, ex. landmark or what they are known for] + 8k, surreal clarity, powerful --style raw --stylize 750 --v 6
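For instance, here’s one hypothetical way you might fill in the formula (example values of my own, not one of today’s shared generations): CGI Star Wars character + they embody the personality and identity of Texas + background terrain of desert plains and canyons + cowboy hat and lone star emblem + 8k, surreal clarity, powerful --style raw --stylize 750 --v 6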
Join Us: Don't forget to follow me on Twitter/X at @jake_joseph and the brand on its fresh new page @slashimagineai. Let’s be friends.
Share The Insights: Know someone else who's obsessed with AI? Forward this newsletter and let's expand the community.
Feedback
How did we do with today's newsletter? Share your vote to help us deliver the #1 resource on AI for our community.
Your perspectives help shape the content and hopefully help me make the newsletter suck less every day. Share your feedback on this issue or what you'd like to see next by directly replying to this email or by reacting to the poll above.