OpenAI announces a major update to ChatGPT enabling it to analyze and react to images, as well as hold voice conversations with users.
OpenAI, the startup behind the increasingly popular chatbot, announced on Monday that it would be rolling out new features, including the ability for users to engage in voice conversations with ChatGPT.
ChatGPT can now see, hear, and speak. Rolling out over next two weeks, Plus users will be able to have voice conversations with ChatGPT (iOS & Android) and to include images in conversations (all platforms). https://t.co/uNZjgbR5Bm pic.twitter.com/paG0hMshXb
— OpenAI (@OpenAI) September 25, 2023
“We are beginning to roll out new voice and image capabilities in ChatGPT,” OpenAI said on its website. “They offer a new, more intuitive type of interface by allowing you to have a voice conversation or show ChatGPT what you’re talking about.”
On its mobile app, OpenAI is adding a speech synthesis option to its GPT-3.5 and GPT-4 AI models, enabling full verbal conversations with the AI assistant. The new voice features resemble Amazon’s Alexa and Apple’s Siri voice assistants.
The company further mentioned that this new feature can be used to “request a bedtime story for your family or settle a dinner table debate.”
During a demo of the new update reported by CNN Business, a user asked ChatGPT to come up with a story about a “sunflower hedgehog named Larry.” The chatbot narrated the story out loud in a human-sounding voice.
OpenAI says it is collaborating with other organizations to let them develop synthetic voices. One example is its partnership with Spotify, where a tool that translates podcasters’ voices into other languages is being developed.
The voice-interaction aspect has been possible to carry out for at least a year, along with image recognition and detection, including object and character recognition and detection; it all depends on how the environment is set up. It is much easier to carry out using the OpenAI API, but possible to do with the web version as well. Additionally, public third-party URLs allowed for analyzing several formats (though not audio, or lengthy audio, since the data was too big for the virtual environment to process, likely due to a resource cap; if chunked, it would also have worked).
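As a rough illustration of the API route the comment above describes, an image-analysis request to OpenAI's chat completions endpoint can be sketched as follows. This only builds the request payload rather than sending it; the model name and image URL are placeholder assumptions, not details from the article.

```python
# Sketch of an image-analysis payload for OpenAI's chat completions
# endpoint. Model name and image URL below are illustrative assumptions.

def build_vision_payload(prompt: str, image_url: str) -> dict:
    """Build a chat completions payload mixing a text prompt and an image URL."""
    return {
        "model": "gpt-4-vision-preview",  # assumed vision-capable model
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": 300,
    }

payload = build_vision_payload(
    "What objects appear in this image?",
    "https://example.com/photo.jpg",  # hypothetical public URL
)
# In practice this dict would be POSTed to
# https://api.openai.com/v1/chat/completions with an
# "Authorization: Bearer <API key>" header.
```

Public image URLs work here because the server fetches the image itself; as the comment notes, large audio files would instead need to be chunked and sent through a separate transcription endpoint.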