NVIDIA’s new text-to-speech model to make AI voices more realistic
Category: #technology  By Pranali Mehta  Date: 2021-09-02

NVIDIA Corporation, the American multinational technology company, has introduced a new tool that captures natural speech qualities by letting users train the AI model on their own voice.

According to sources, the company’s text-to-speech research team has built a new model called ‘RAD-TTS’ that lets an individual train a text-to-speech system on their own voice, capturing details such as pacing, timbre, and tonality.

A key feature of RAD-TTS is voice conversion, which lets a user deliver one speaker’s words in another person’s voice. The model’s interface also gives greater control over the energy, duration, and pitch of a synthesized voice.

Using this technology, researchers at NVIDIA have created more conversational-sounding narration for the company’s ‘I AM AI’ video series, which uses synthesized rather than human voices.

According to a statement by NVIDIA, the interface lets the firm’s video producers record themselves reading the video script and then use the AI system to convert a male producer’s speech into a female narrator’s voice.

Starting from this baseline narration, the producer can direct the AI much like a voice actor, slightly adjusting the synthesized voice to emphasize certain words and changing the pace of the narration to better express the video’s tone.

According to NVIDIA, most of the models are trained using tens of thousands of hours of audio data on its DGX systems. Additionally, developers can fine-tune any model to suit their needs and accelerate training using mixed-precision computing on the company’s Tensor Core GPUs.
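
NVIDIA's statement does not describe its training code, so the following is only a rough illustration of why mixed-precision training usually pairs half-precision (fp16) arithmetic with loss scaling. The sketch uses only Python's standard library; the helper names `to_fp16` and `scaled_fp16_gradient`, the gradient value, and the scale factor of 1024 are all assumptions chosen for illustration and are not taken from RAD-TTS or NeMo.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    # The struct format character "e" is a 16-bit half-precision float.
    return struct.unpack("<e", struct.pack("<e", x))[0]


def scaled_fp16_gradient(grad: float, loss_scale: float) -> float:
    """Hypothetical helper: keep grad * loss_scale in fp16, then unscale.

    The division happens in full precision, mimicking how a
    mixed-precision optimizer unscales gradients before the weight update.
    """
    return to_fp16(grad * loss_scale) / loss_scale


tiny_grad = 1e-8  # illustrative value, too small for fp16 to represent

# Stored directly in fp16, the gradient underflows to zero.
naive = to_fp16(tiny_grad)

# Scaled up before the fp16 round trip, most of its value survives.
recovered = scaled_fp16_gradient(tiny_grad, loss_scale=1024.0)

print("fp16 without scaling:", naive)
print("fp16 with loss scaling:", recovered)
```

In practice, deep learning frameworks apply this scaling automatically and keep a full-precision copy of the weights, so developers rarely write it by hand.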

According to sources, NVIDIA is making some of this research, optimized to run efficiently on its GPUs, available as open source to anyone who wishes to try it. It is offered through the NeMo Python toolkit for GPU-accelerated conversational AI, which is available on the tech giant’s NGC hub of software and containers.

Source Credit: https://techcrunch.com/2021/08/31/nvidias-latest-tech-makes-ai-voices-more-expressive-and-realistic/

 

 


