Bits & Neurons
ChatGPT can now see, hear, and speak

Ts Yotov
September 25, 2023

Today's issue is brought to you by NightCafe - the AI art studio with a twist!

In today’s edition:

🗞️ News

📰 Amazon to invest up to $4 billion in AI startup Anthropic

📰 ChatGPT can now see, hear, and speak

📚 Nerd section - Meet BlindChat: Operating solely within the browser and more

💡 Small bits - Getty Changes Tune on AI, Spotify to clone voices, and more

🎨 BONUS - Goodbye Summer Prompt

📰 Amazon to invest up to $4 billion in AI startup Anthropic

Amazon is investing in the AI startup Anthropic, with an initial $1.25 billion for a minority stake and a potential total of $4 billion, bolstering its AI presence against rivals.

Anthropic operates an AI-powered chatbot, Claude, which competes with Google's Bard and with ChatGPT from Microsoft-backed OpenAI.
Anthropic aims to raise up to $5 billion in two years and recently introduced its premium subscription for chatbot Claude 2.
They're developing a more advanced AI model named "Claude-Next" which is projected to be 10 times more powerful than current leading AIs.
As part of the deal, Anthropic will primarily use Amazon’s AWS for crucial tasks, including safety research and model development.
Amazon's CEO, Andy Jassy, emphasizes the collaboration's potential to enhance various customer experiences.

Why it Matters: The partnership solidifies Amazon's position in the competitive AI landscape, marking a strategic alliance with an emerging player. It reflects the e-commerce giant's ambitions in the rapidly evolving AI sector, potentially reshaping industry dynamics.

📰 ChatGPT can now see, hear, and speak

OpenAI introduced ChatGPT’s new voice and image capabilities, letting users converse using voice and share images to discuss topics more interactively.

ChatGPT now supports voice interactions and can process and respond to images.
Users can snap pictures (e.g., landmarks or items in their pantry) and discuss them with ChatGPT.
The voice feature is available on iOS and Android, while the image feature is accessible across all platforms.
The voice feature uses a new text-to-speech model, with voice samples crafted in collaboration with professional actors.
Image processing is driven by multimodal GPT-3.5 and GPT-4, which can interpret and discuss a range of images.

Why it Matters: This leap marks ChatGPT's evolution into a multimodal LLM, positioning it ahead of competitors like Google's Gemini in versatility. The enhancements not only improve user interactions but also mark a major stride in AI capabilities and applications.

📚 Nerd section 📚

🪟 Microsoft introduces KOSMOS-2.5, a multimodal model adept at reading text-rich images, combining visual and textual data within one Transformer-based system. While it excels in tasks like document-level text recognition and converting images to markdown text, there are limitations in controlling document elements' spatial positions.

🔒 Meet BlindChat by MithrilSecurity - an open-source conversational AI offering complete user privacy by operating solely within the browser, eliminating third-party data access risks. Prioritizing user data security, BlindChat performs local inference and uses secure enclaves, ensuring user data remains confidential. It offers two privacy settings: on-device, where inference is handled locally, and Zero-trust AI APIs, where data is sent to a secure enclave for remote inference.

📖 Researchers from China Introduce A Large-Scale, Real-World Multi-View Dataset Named ‘FreeMan’ - to enhance 3D human pose estimation. Addressing the shortcomings of current datasets, which are collected in controlled environments, FreeMan boasts 11 million frames from 8,000 sequences across various real-world scenarios.

💡 Small Bits 💡

💵 Novo Clinches AI-Driven Drug Deal Worth as Much as $2.7 Billion - agreement with Valo will harness its AI platform for research.

📸 Getty Changes Tune on AI, Reveals Art Generator Trained on Its Own Images - using Nvidia Picasso’s platform, offering users copyright-safe content and a diverse range of editing options.

🎙️ Spotify is going to clone podcasters’ voices - and translate them to other languages. A partnership with OpenAI will let podcasters replicate their voices to automatically create foreign-language versions of their shows.

👩‍🏫 ‘AI has killed the industry’: Frederik Pedersen, co-founder of EasyTranslate, believes AI advancements have made traditional translation obsolete and is steering his company, previously a pioneer in translation software, toward generative AI content creation.

🤖 Meta could soon announce AI chatbots for young people on Instagram and Facebook - targeting younger users with multiple engaging personas, including a "sassy robot" and celebrity chatbots.

💁🏽 Swedish gaming company replaces half its staff with AI - Mindark, a gaming company based in Gothenburg, will cut up to 25 jobs.

🦾 Tesla Optimus can now sort objects autonomously - with fluid, natural-looking dexterity and movement.

🎨 BONUS - Goodbye Summer Prompt 🎨

It’s time to say goodbye to summer.

Prompt: minimalistic photograph of a straw hat on sand, contrasting with the ocean in the background --ar 4:3

…better than a real picture 🙂

Thank you for reading today’s edition!

We love to hear back from you!

Feel free to reply to our emails with questions, suggestions, or topics you'd like to see covered, or drop us a message on Twitter or Facebook.

Until tomorrow,
- Tsvetelin (Bits and Neurons)