AI Weekly: These researchers are improving AI’s ability to understand different accents

March 6, 2021   Big Data

The pandemic appears to have supercharged voice app usage, which was already on an upswing. According to a study by NPR and Edison Research, the percentage of voice-enabled device owners who use commands at least once a day rose between the beginning of 2020 and the start of April. Just over a third of smart speaker owners say they listen to more music, entertainment, and news from their devices than they did before, and owners report requesting an average of 10.8 tasks per week from their assistant this year compared with 9.4 different tasks in 2019. According to a new report from Juniper Research, consumers will interact with voice assistants on 8.4 billion devices by 2024.

But despite their growing popularity, assistants like Alexa, Google Assistant, and Siri still struggle to understand diverse regional accents. According to a study by the Life Science Centre, 79% of people with accents alter their voice to make sure that they’re understood by their digital assistants. And in a recent survey commissioned by the Washington Post, popular smart speakers made by Google and Amazon were 30% less likely to understand non-American accents than those of native-born users.

Traditional approaches to narrowing the accent gap would require collecting and labeling large datasets of different languages, a time- and resource-intensive process. That’s why researchers at MLCommons, a nonprofit related to MLPerf, an industry-standard set of benchmarks for machine learning performance, are embarking on a project called 1000 Words in 1000 Languages. It’ll involve creating a freely available pipeline that can take any recorded speech and automatically generate clips to train compact speech recognition models.

“In the context of consumer electronic devices, for instance, you don’t want to have to go out and build new language datasets because that’s costly, tedious, and error-prone,” Vijay Janapa Reddi, an associate professor at Harvard and a contributor on the project, told VentureBeat in a phone interview. “What we’re developing is a modular pipeline where you’ll be able to plug in different sources of speech and then specify the [words] for training that you want.”

The pipeline will be limited in scope: it will only create training datasets for small, low-power models that continually listen for specific keywords (e.g., “OK Google” or “Alexa”). Even so, it could represent a significant step toward truly accent-agnostic speech recognition systems. Conventionally, training a new keyword-spotting model requires manually collecting thousands of labeled audio clips for each keyword. When the pipeline is released, developers will be able to provide a list of keywords they wish to detect along with a speech recording, and the pipeline will automate the extraction, training, and validation of models without requiring any labeling.
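To make the extraction step concrete, here is a minimal Python sketch of how keyword clips might be located in a longer recording. It is not the MLCommons pipeline, whose internals the project has not described here. It assumes word-level timestamps are already available (for example, from a forced aligner), and the names `AlignedWord`, `find_keyword_clips`, and `clip_seconds` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedWord:
    """One spoken word with start/end times in seconds (assumed aligner output)."""
    text: str
    start: float
    end: float


def find_keyword_clips(words, keywords, clip_seconds=1.0):
    """Map each keyword to a list of (clip_start, clip_end) windows.

    Each window has a fixed length, is centered on the spoken word, and is
    shifted forward if it would otherwise start before 0.0 seconds.
    """
    targets = {k.lower() for k in keywords}
    clips = {k: [] for k in targets}
    for word in words:
        token = word.text.lower()
        if token in targets:
            center = (word.start + word.end) / 2
            start = max(0.0, center - clip_seconds / 2)
            clips[token].append((start, start + clip_seconds))
    return clips


# Hypothetical aligned transcript of a short recording.
transcript = [
    AlignedWord("Alexa", 0.10, 0.30),
    AlignedWord("play", 0.35, 0.95),
    AlignedWord("music", 1.00, 1.40),
    AlignedWord("yes", 1.75, 2.05),
]

clips = find_keyword_clips(transcript, ["yes", "play"])
for keyword, spans in sorted(clips.items()):
    print(keyword, spans)
```

The resulting time windows could then be cut from the audio and used as training examples for a keyword-spotting model. The point of the sketch is that the labels come from searching the existing recording, not from manual annotation.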

“It’s not even really creating a dataset, it’s just training a dataset that comes about as a result of searching the larger corpus,” Reddi explained. “It’s like doing a Google search. What you’re trying to do is find a needle in a haystack — you end up with a subset of results with different accents and whatever else you have in there.”

The 1000 Words in 1000 Languages project builds on existing efforts to make speech recognition models more accessible and more equitable. Mozilla’s Common Voice, an open source, annotated speech dataset, consists of voice snippets along with voluntarily contributed metadata, such as speakers’ ages, sex, and accents, that is useful for training speech engines. As a part of Common Voice, Mozilla maintains a dataset target segment that aims to collect voice data for specific purposes and use cases, including the digits “zero” through “nine” as well as the words “yes,” “no,” “hey,” and “Firefox.” For its part, in December, MLCommons released the first iteration of a public 86,000-hour dataset for AI researchers, with later versions due to branch into more languages and accents.

“The organizations that have a huge amount of speech are often large organizations, but speech is something that has many applications,” Reddi said. “The question is, how do you get this into the hands of small organizations that don’t have the same scale as big entities like Google and Microsoft? If they have a pipeline, they can just focus on what they’re building.”

For AI coverage, send news tips to Khari Johnson and Kyle Wiggers — and be sure to subscribe to the AI Weekly newsletter and bookmark our AI channel, The Machine.

Thanks for reading,

Kyle Wiggers

AI Staff Writer

VentureBeat

Big Data – VentureBeat
