Reka, the AI startup founded by researchers from DeepMind, Google, Baidu and Meta, has announced Yasa-1, a multimodal AI assistant that goes beyond text to understand images, short videos and audio snippets.
Available in private preview, Yasa-1 can be customized on private datasets of any modality, allowing enterprises to build new experiences for a myriad of use cases. The assistant supports 20 different languages and also brings the ability to provide answers with context from the internet, process long context documents and execute code.
It arrives as a direct competitor to OpenAI’s ChatGPT, which recently got its own multimodal upgrade with support for visual and audio prompts.
“I’m proud of what the team has achieved, going from an empty canvas to an actual full-fledged product in under 6 months,” Yi Tay, the chief scientist and co-founder of the company, wrote on X (formerly Twitter).
This, Reka said, included everything, right from pretraining the base models and aligning for multimodality to optimizing the training and serving infrastructure and setting up an internal evaluation framework.
However, the company also emphasized that the assistant is still very new and has some limitations – which will be ironed out over the coming months.
Yasa-1 and its multimodal capabilities
Available via APIs and as Docker containers for on-premises or VPC deployment, Yasa-1 uses a single unified model trained by Reka to deliver multimodal understanding: it handles not only words and phrases but also images, audio and short video clips.
This capability allows users to combine traditional text-based prompts with multimedia files to get more specific answers.
For instance, Yasa-1 can be prompted with the image of a product to generate a social media post promoting it, or it could be used to detect a particular sound or the source that made it, whether it was an instrument, a machine, or an organism.
Reka says the assistant can even tell what’s going on in a video, including the topics being discussed, and predict what the subject may do next. This kind of comprehension can come in handy for video analytics, but there are still some kinks in the technology.
“For multimodal tasks, Yasa excels at providing high-level descriptions of images, videos, or audio content,” the company wrote in a blog post. “However, without further customization, its ability to discern intricate details in multimodal media is limited. For the current version, we recommend audio or video clips be no longer than one minute for the best experience.”
It also said that the model, like most LLMs out there, can hallucinate and should not be solely relied upon for critical advice.
Beyond multimodality, Yasa-1 also brings additional features such as support for 20 different languages, long-context document processing and the ability to actively execute code (exclusive to on-premises deployments) to perform arithmetic operations, analyze spreadsheets or create visualizations for specific data points.
“The latter is enabled via a simple flag. When active, Yasa automatically identifies the code block within its response, executes the code, and appends the result at the end of the block,” the company wrote.
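The flow Reka describes (find the code block in a response, run it, append the result) can be illustrated with a minimal Python sketch. This is not Reka's implementation or API: the function name `run_code_blocks`, the fenced-block format, and the `Result:` line are all assumptions made for illustration, and the result is placed directly after the block as one possible reading of "at the end of the block".

```python
import contextlib
import io
import re

# Hypothetical convention: code in a response sits inside a fenced "python" block.
FENCE = "`" * 3
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def run_code_blocks(response: str) -> str:
    """Execute each fenced Python block and append its printed output after it."""

    def execute(match: re.Match) -> str:
        buffer = io.StringIO()
        # Capture anything the snippet prints to stdout.
        with contextlib.redirect_stdout(buffer):
            # exec() runs arbitrary code; a real service would need a sandbox.
            exec(match.group(1), {})
        return f"{match.group(0)}\nResult: {buffer.getvalue().strip()}"

    return CODE_BLOCK.sub(execute, response)


# Example response containing one code block (assumed format).
response = "Sum:\n" + FENCE + "python\nprint(2 + 3)\n" + FENCE + "\nDone."
annotated = run_code_blocks(response)
```

Here `annotated` keeps the original text and block, with a `Result: 5` line added right after the block. The sketch omits the isolation, timeouts and resource limits that running model-generated code would require in practice.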
Moreover, users will also get the option to have the latest content from the web incorporated into Yasa-1’s answers. This will be done through another flag, which will connect the assistant to various commercial search engines in real-time, allowing it to use up-to-date information without any cut-off date restriction.
Notably, ChatGPT was also recently updated with the same capability using a new foundation model, GPT-4V. However, for Yasa-1, Reka notes that there’s no guarantee that the assistant will fetch the most relevant documents as citations for a particular query.
In the coming weeks, Reka plans to give more enterprises access to Yasa-1 and work towards improving the capabilities of the assistant while ironing out its limitations.
“We are proud to have one of the best models in its compute class, but we are only getting started. Yasa is a generative agent with multimodal capabilities. It is a first step towards our long-term mission to build a future where superintelligent AI is a force for good, working alongside humans to solve our major challenges,” the company noted.
While having a core team with researchers from companies like Meta and Google can give Reka an advantage, it is important to note that the company is still very new in the AI race. It came out of stealth just three months ago with $58 million in funding from DST Global Partners, Radical Ventures and several angel investors, and it is competing against deep-pocketed players, including Microsoft-backed OpenAI and Amazon-backed Anthropic.