Sora can raise the bar for the benefits of generative AI in the creative world. (Photo: Gizchina)
JAKARTA – OpenAI continues to innovate, this time with the ability to convert text into highly realistic video. The new feature, called Sora, could raise the bar for what generative AI can offer the creative world.
According to CNet, Saturday (17/2/2024), like Google's text-to-video converter, Lumiere, Sora's availability is limited. Unlike Lumiere, however, Sora can produce videos up to 1 minute long.
Text-to-video conversion is now the latest attraction in generative AI development at OpenAI, Google, Microsoft, and others. The trend in text and image generation is believed to be on track to push these providers' revenue to 1.3 trillion US dollars by 2032.
An official post from OpenAI explains that Sora is aimed at visual creators, designers, and filmmakers. Testing it is considered crucial for addressing the potential for deepfakes when AI is used to create images and videos.
In addition to gathering feedback from outside the organization, the AI startup wants to showcase Sora's capabilities for further AI exploration.
One thing that might differentiate Sora is its ability to interpret long prompts of up to 135 words. Sample videos shared by OpenAI last week show that Sora can create a variety of characters and scenes, from people, animals, and furry monsters to cityscapes, landscapes, zen gardens, and even a New York City submerged underwater.
This is thanks in part to OpenAI's previous work on its Dall-E and GPT models. The Dall-E 3 text-to-image generator, released in September 2023, was a big step up from 2022's Dall-E 2, alongside OpenAI's latest AI model, GPT-4 Turbo, released last November.
In particular, Sora adopts Dall-E 3's recaptioning technique, which OpenAI claims produces highly descriptive captions for visual training data.
“Sora is able to produce complex scenes with multiple characters, specific types of movement, and accurate details of subjects and backgrounds,” the post states. “The model understands not only what the user is asking for in the prompt, but also how those things exist in the physical world,” OpenAI wrote.