
The Voice AI Nobody Expected (AI News You Can Use)



Introduction

It has been another wild week in the world of AI, and my team and I have been tracking all the releases and testing the most significant ones for you. So, just like every Friday, let's look at every piece of AI news that you can actually use. The filter we apply here is simple: every item is either an application you can use today or something you can act on.

Moshi AI: An Open-Source Voice Assistant

Starting with the very first one: someone has done the unthinkable. Kyutai, a new AI lab out of France, has unveiled the open-source Moshi AI. What is Moshi AI? It is a voice model you can talk to through a web interface with very low latency. It tries to be the voice assistant everyone was so excited about, but it’s not quite there yet. Its base model has 7 billion parameters, whereas state-of-the-art models like GPT-4 and its successors likely have around 400 billion parameters.

Nevertheless, the most interesting thing about Moshi AI is they plan to open-source the code for it. So, expect to see this integrated into various applications soon.

Testing Moshi AI

I tried out Moshi AI, and while the latency is indeed low, the voice assistant features didn't perform as well as advertised. The emotional awareness and tone modification were mostly absent. Occasionally, it could detect emotions, but it often interpreted shouting as happiness.

Despite these shortcomings, its open-source nature holds promise for future improvements. Check it out while it is free, and see for yourself how it performs. Expect better iterations down the line as the base model gets more advanced.

Gen-1 Generator: New State-of-the-Art Video Generator

Next up, let's talk about the biggest release of this week: Gen-1 Generator. This state-of-the-art video generator has become widely available. Although practical applications are still limited, I found one worth mentioning: Motorola used AI video tools in their ad campaign.

Testing the Gen-1 Generator

Intrigued, I tested it myself. The video quality is impressive, but the model is only as good as its training data. Generating something specific, like an otter surfing a wave, proved challenging and costly. Each 10-second clip costs around $1 to generate, and achieving a satisfactory result requires multiple iterations.

However, simpler prompts, such as a lighthouse view from a drone, worked beautifully, suggesting the model excels with more common visual elements.
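
To put the per-clip pricing in perspective, here is a minimal sketch of how iteration count drives total spend. Only the roughly $1 per 10-second clip figure comes from the test above; the function name, the number of attempts per usable clip, and the number of clips needed are hypothetical values chosen for illustration.

```python
def clip_budget(cost_per_clip: float, attempts_per_keeper: int, keepers_needed: int) -> float:
    """Total spend if every usable clip takes the same number of attempts."""
    return cost_per_clip * attempts_per_keeper * keepers_needed


COST_PER_CLIP = 1.00  # roughly $1 per 10-second clip, as reported above

# Hypothetical scenario: 8 attempts per usable clip, 3 usable clips
# (about 30 seconds of finished footage).
total = clip_budget(COST_PER_CLIP, attempts_per_keeper=8, keepers_needed=3)
print(f"Estimated spend: ${total:.2f}")
```

Under these assumed numbers, a specific prompt that needs many retries costs several times more than a simple prompt that works on the first or second attempt.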

Eleven Labs Reader App & Other New Releases

Eleven Labs, known for its AI voices, released a new app called Eleven Labs Reader. Available for iOS in the US, it can read any text aloud using some of the best AI voices around. The company also introduced "iconic voices" such as James Dean and Burt Reynolds that can read your text. Additionally, it now offers a tool to isolate voices in noisy audio.

Similarly, the startup Suno released a mobile app for generating AI music on the go. It is also currently limited to iOS users in the US, with more features to come.

Luma Labs Keyframes & Motorola AI Ad Campaign

Luma Labs released a new feature called Luma Keyframes, which creates smooth transitions in AI video. Initial impressions suggest the results vary in quality, but the potential is certainly there.

Motorola also showcased a real-world use of AI in their ad campaign, creating visually interesting videos using ControlNet and Stable Diffusion technologies.

Perplexity AI's New Pro Search Feature

Perplexity AI introduced a new Pro Search feature, which includes multi-step reasoning and integrations for math, programming, and Wolfram Alpha. The more advanced functionality is reserved for subscribers and is meant to make AI search smarter and more intuitive.

Fun AI Applications

Not everything in AI has to be about productivity gains. For example, the "Interdimensional Cable" site from WebSim AI creates random, often hilarious videos in the style of the multiverse TV from the animated show Rick and Morty. It’s fun, unpredictable, and showcases the lighter side of AI.

Open-Source & Uncensored Models

There's also growing interest in fully uncensored models. Dolphin Vision 72B, although it requires substantial computing power, illustrates both the potential and the ethical questions that arise when open-source communities ramp up their efforts.
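
To give a rough sense of the computing power a 72-billion-parameter model demands, the sketch below estimates the memory needed just to hold the weights. The helper name and the precision levels are illustrative assumptions, and the estimate ignores activations, caches, and any runtime overhead, so real requirements are higher.

```python
def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate memory for model weights only, in gigabytes (10^9 bytes)."""
    # billions of parameters * bytes per parameter = billions of bytes = GB
    return params_billions * bytes_per_param


# Assumed storage precisions: 16-bit, 8-bit, and 4-bit weights.
for label, bytes_per_param in [("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    size = weight_memory_gb(72, bytes_per_param)
    print(f"72B parameters at {label}: ~{size:.0f} GB")
```

By the same arithmetic, a 7-billion-parameter model such as Moshi's base model needs about 14 GB at 16-bit precision, which is why smaller models are far easier to run on consumer hardware.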

Figma's New AI Features

Figma announced multiple new AI features. The most notable is the "prompt to UI" feature, which was later disabled because its output resembled Apple's weather app interface too closely. Another impressive feature is visual search, powered by multimodal models, which lets users find visuals using natural language.

Google Crossword Game

Google introduced an AI-powered crossword game, which uses simple yes/no responses to guide users through solving puzzles. It’s a fun and simple application of AI.

Hugging Face's New Leaderboard

Hugging Face revamped its leaderboard to include new benchmarks and community features for more reliable model evaluation. The new benchmarks include MMLU-Pro and MuSR, among others, giving users better tools for comparing models.


Keywords

  • Moshi AI
  • Open-source
  • Voice Assistant
  • Gen-1 Generator
  • Eleven Labs Reader
  • AI Music
  • Luma Keyframes
  • Motorola Ad Campaign
  • Perplexity AI
  • Fun AI Applications
  • Dolphin Vision 72B
  • Figma AI
  • Visual Search
  • Google Crossword Game
  • Hugging Face Leaderboard

FAQ

What is Moshi AI?

Moshi AI is an open-source voice assistant developed by Kyutai. It features low latency and aims to offer emotional awareness and tone modification.

How does the Gen-1 Generator perform?

Gen-1 Generator produces high-quality video, though it often takes many iterations to get a satisfactory result, especially for unique or uncommon prompts.

What new features has Eleven Labs released?

Eleven Labs released the Eleven Labs Reader app which can read any text with their advanced AI voices. They also launched "iconic voices" like James Dean and Burt Reynolds, and a new voice isolation tool.

What is Luma Keyframes and how does it work?

Luma Keyframes is a feature that allows for smooth transitions between scenes in AI video. However, effectiveness can vary based on the complexity of the transition.

How is AI being used in advertising?

Motorola utilized AI video tools to create specific visuals for their ad campaign, showcasing creative uses of AI in real-world marketing.

What is Pro Search by Perplexity AI?

Pro Search is a premium feature by Perplexity AI that offers multi-step reasoning and integrations with math, programming, and Wolfram Alpha to enhance search functionalities.

How does the Google Crossword Game use AI?

The game provides yes/no responses to aid users in solving crossword puzzles, making it a simple yet effective application of AI.

What improvements have been made to the Hugging Face leaderboard?

Hugging Face introduced new benchmarks and normalizations for their leaderboard, improving the reliability and reproducibility of AI model evaluations.