China's Kling launches "Video O1" for AI-powered video editing
Arabian Sea Newspaper - Special
The Chinese AI company Kling AI has launched its "Video O1" model, describing it as "the world's first unified multi-modal video model," one that combines video generation and editing capabilities within a single platform, with no need for separate tools.

The model can generate short clips of 3 to 10 seconds from text prompts or reference images, and can also edit existing footage, swapping characters, changing the weather, and altering the visual style and colors, all through a single command. It can likewise add new elements, change the background, and modify the artistic style simultaneously within the same context.

"Video O1" is built around simultaneous processing of multiple input types, handling up to seven elements including images, videos, and text. Users can edit a video with simple commands such as "remove passersby" or "turn daylight into night," without any manual work. The model also allows characters, props, or scenes to be uploaded and reused in different contexts, while maintaining the consistency of elements across multiple shots.

According to the company, Kling AI ran internal comparisons between Video O1 and both Google's Veo 3.1 and Runway's Aleph models. The results showed a clear lead in video creation tasks based on a reference image, surpassing the performance of Google's "Ingredients to Video" feature. The company also says evaluators preferred Video O1 over Runway Aleph in video transformation tasks at a rate of 230%.

Video O1 is currently available through the Kling platform on the web, and its launch comes amid fierce competition among AI companies in this market.