Researchers find that large language models struggle with math

March 9, 2021 | Big Data

Mathematics is the foundation of countless sciences, allowing us to model things like planetary orbits, atomic motion, signal frequencies, protein folding, and more. It is also a valuable testbed for problem-solving ability, because it requires analyzing a challenge, picking out good methods, and chaining them together to produce an answer.

It’s revealing, then, that even today’s most sophisticated machine learning models struggle to answer the bulk of math problems correctly. A new study published by researchers at the University of California, Berkeley finds that large language models, including OpenAI’s GPT-3, can solve only 2.9% to 6.9% of the problems in a dataset of more than 12,500. The coauthors believe that new algorithmic advancements will likely be needed to give models stronger problem-solving skills.

Prior research has demonstrated the usefulness of AI that has a firm grasp of mathematical concepts. For example, OpenAI recently introduced GPT-f, an automated prover and proof assistant for the Metamath formalization language. GPT-f found new short proofs that have been accepted into the main Metamath library, the first time a machine learning-based system contributed proofs that were adopted by a formal mathematics community. For its part, Facebook also claims to have experimented successfully with math-solving AI algorithms. In a blog post last January, researchers at the company said they’d taught a model to view complex mathematical equations “as a kind of language and then [treat] solutions as a translation problem.”

“While most other text-based tasks are already nearly solved by enormous language models, math is notably different. We showed that accuracy is slowly increasing and, if trends continue, the community will need to discover conceptual and algorithmic breakthroughs to attain strong performance on math,” the coauthors wrote. “Given the broad reach and applicability of mathematics, solving math datasets with machine learning would be of profound practical and intellectual significance.”

To measure the problem-solving ability of large and general-purpose language models, the researchers created a dataset called MATH, which consists of 12,500 problems taken from high school math competitions. Given a problem from MATH, language models must generate a sequence that reveals the final answer.

[Figure: A comparison of a MATH dataset problem with problems from DeepMind’s Mathematics Dataset and a Metamath module. Image credit: MATH]

Problems in MATH are labeled by difficulty from 1 to 5 and span seven subjects, including geometry, algebra, calculus, statistics, linear algebra, and number theory. They also come with step-by-step solutions so that language models can learn to answer new questions they haven’t seen before.
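
To make these labels concrete, a single record in a MATH-style dataset might look like the sketch below. The field names and the \boxed{} answer convention are illustrative assumptions, not necessarily the dataset’s exact schema.

```python
# One illustrative MATH-style record. Field names and the \boxed{}
# answer convention are assumptions, not the dataset's exact schema.
record = {
    "problem": "If $x + 7 = 10$, what is the value of $x$?",
    "level": 1,            # difficulty, from 1 (easiest) to 5 (hardest)
    "subject": "algebra",  # one of the seven subject areas
    "solution": "Subtracting 7 from both sides gives $x = \\boxed{3}$.",
}
```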

Training models on the fundamentals of mathematics required the researchers to create a separate dataset with hundreds of thousands of solutions to common math problems. This second dataset, the Auxiliary Mathematics Problems and Solutions (AMPS), comprises more than 100,000 problems from Khan Academy with solutions and over 5 million problems generated using Mathematica scripts based on 100 hand-designed modules. In total, AMPS contains 23GB of content.
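
The Mathematica scripts behind AMPS aren’t reproduced in this article, but as a rough analogue, a hand-designed generation module for one problem type might look like the following Python/sympy sketch. Everything in it is an assumption about the general approach, not the paper’s actual generation code.

```python
# Rough analogue of one hand-designed generation module. The real AMPS
# problems were produced by Mathematica scripts, not this code.
import random
import sympy as sp

def derivative_problem(seed: int) -> dict:
    """Generate one 'differentiate a polynomial' problem with its solution."""
    rng = random.Random(seed)
    x = sp.Symbol("x")
    # Random cubic with small positive integer coefficients.
    poly = sum(rng.randint(1, 9) * x**k for k in range(4))
    deriv = sp.diff(poly, x)
    return {
        "problem": f"Find the derivative of ${sp.latex(poly)}$ with respect to $x$.",
        "solution": f"The derivative is ${sp.latex(deriv)}$.",
    }

print(derivative_problem(0))
```

Varying the templates and the sampled coefficients across 100 such modules is, in spirit, how a corpus of millions of problems can be produced.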

As the researchers explain, the step-by-step solutions in the datasets allow the language models to use a “scratch space” much like a human mathematician might. Rather than having to arrive at the correct answer right away, models can first “show their work” in partial solutions that step toward the right answer.
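
Mechanically, the “scratch space” amounts to letting the model emit intermediate text before the final answer is read off the end of its output. Here is a minimal sketch of the two decoding styles, assuming a hypothetical generate function and a \boxed{} final-answer convention.

```python
# Sketch of answering with and without "scratch space". `generate` is a
# hypothetical stand-in for any text-completion model API.
import re

def answer_directly(generate, problem: str) -> str:
    """Ask for the answer in one shot, with no visible working."""
    return generate(f"Problem: {problem}\nFinal answer:")

def answer_with_scratch_space(generate, problem: str) -> str | None:
    """Let the model write out its steps, then read off the last boxed answer."""
    solution = generate(f"Problem: {problem}\nStep-by-step solution:")
    boxed = re.findall(r"\\boxed\{([^{}]*)\}", solution)
    return boxed[-1].strip() if boxed else None
```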

Even with the solutions, the coauthors found that accuracy remained low for the large language models they benchmarked: GPT-3 and its predecessor, GPT-2. Having the models generate their own solutions before producing an answer actually degraded accuracy, because many of the generated steps, while related to the question, were illogical. Moreover, simply increasing training time and the number of parameters in the models, which sometimes improves performance, proved impractically costly. (In machine learning, parameters are variables whose values control the learning process.)
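
For context, accuracy figures like those above amount to exact-match scoring of the final answer across the test set. A minimal sketch, again assuming a hypothetical generate function and records shaped like the earlier example:

```python
# Minimal exact-match scoring sketch. `generate` is hypothetical, and
# the reference answer is assumed to sit in a \boxed{} in the solution.
import re

def final_boxed_answer(text: str) -> str | None:
    """Return the contents of the last \\boxed{...} in a solution string."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None

def exact_match_accuracy(records, generate) -> float:
    """Fraction of problems whose generated answer matches the reference."""
    correct = 0
    for record in records:
        predicted = final_boxed_answer(generate(record["problem"]))
        reference = final_boxed_answer(record["solution"])
        if predicted is not None and predicted == reference:
            correct += 1
    return correct / len(records)
```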

Even so, the researchers showed that step-by-step solutions still provide benefits in the form of improved performance. In particular, providing models with solutions at training time increased accuracy substantially, with pretraining on AMPS boosting accuracy by around 25%, the equivalent of a 15-times increase in model size.

“Despite these low accuracies, models clearly possess some mathematical knowledge: they achieve up to 15% accuracy on the easiest difficulty level, and they are able to generate step-by-step solutions that are coherent and on-topic even when incorrect,” the coauthors wrote. “Having models train on solutions increases relative accuracy by 10% compared to training on the questions and answers directly.”

The researchers have released MATH and AMPS in open source so that, along with existing mathematics datasets like DeepMind’s, they can spur further research in this direction.


Big Data – VentureBeat
