
Generative AI has captured the public imagination with a leap into creating elaborate, plausibly real text and imagery out of verbal prompts. But the catch — and there is often a catch — is that the results are often far from perfect when you look a little closer.
People point out strange fingers, floor tiles that slip out of alignment, and math problems that are precisely that: problematic, because sometimes they simply don't add up.
Now, Synthesia — one of the ambitious AI startups working in video, specifically custom avatars designed for business users to create promotional, training and other enterprise video content — is releasing an update that it hopes will help it leapfrog over some of the challenges in its particular field. Its latest version features avatars — built from actual humans captured in its studio — that deliver more emotion, better lip tracking and what it says are more expressive, natural and humanlike movements when they are fed text to generate videos.
The release comes on the heels of some impressive progress for the company to date. Unlike other generative AI players like OpenAI, which has built a two-pronged strategy — raising huge public awareness with consumer tools like ChatGPT while also building out a B2B offering, with its APIs used by independent developers as well as giant enterprises — Synthesia is leaning into the approach that some other prominent AI startups are taking.
Much as Perplexity has focused on nailing generative AI search, Synthesia is focused on building the most humanlike generative video avatars possible. More specifically, it is looking to do this only for the business market and use cases like training and marketing.
That focus has helped Synthesia stand out in what has become a very crowded AI market that runs the risk of getting commoditized when hype settles down into more long-term concerns like ARR, unit economics and operational costs attached to AI implementations.
Synthesia describes its new Expressive Avatars, the version being released Thursday, as a first of their kind: “The world’s first avatars fully generated with AI.” The avatars are built on large, pretrained models, and Synthesia says its breakthrough has been in how those models are combined to produce multimodal distributions that more closely mimic how actual humans speak.
These expressions are generated on the fly, Synthesia says, which is meant to be closer to how we speak and react in real life. That stands in contrast to how a lot of avatar-based AI video tools work today: typically, many short pieces of video are quickly stitched together to create facial responses that line up, more or less, with the scripts fed into them. Synthesia's aim is for its avatars to appear less robotic and more lifelike.