A super-fast machine learning model for finding user search intent

November 30, 2019   Big Data

In April 2019, Benjamin Burkholder (who is awesome, by the way) published a Medium article showing off a script he wrote that uses SERP result features to infer a user’s search intent. The script uses the SerpAPI.com API for its data and labels search queries in the following way:

  • Informational — The person is looking for more information on a topic. This is indicated by whether an answer box or PAA (people also ask) boxes are present.
  • Navigational — The person is searching for a specific website. This is indicated by whether a knowledge graph is present or if site links are present.
  • Transactional — The person is aiming to purchase something. This is indicated by whether shopping ads are present.
  • Commercial Investigation — The person is aiming to make a purchase soon but is still researching. This is indicated by whether paid ads are present, an answer box is present, PAAs are present, or if there are ads present at the bottom of the SERP.
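The four rules above can be sketched as a small function. This is a minimal illustration, not Burkholder's actual script: the feature names (`answer_box`, `people_also_ask`, and so on) are hypothetical keys chosen for readability and are not guaranteed to match SerpAPI's response fields. Because the rules as described overlap (an answer box counts toward both Informational and Commercial Investigation), this sketch returns every label whose rule fires rather than assuming a priority order.

```python
from typing import Dict, List


def infer_intents(serp: Dict[str, bool]) -> List[str]:
    """Return every intent label whose SERP-feature rule fires.

    `serp` maps hypothetical feature names to True/False; missing keys
    are treated as "feature not present".
    """
    def has(key: str) -> bool:
        return bool(serp.get(key, False))

    labels: List[str] = []
    # Informational: answer box or "people also ask" boxes are present.
    if has("answer_box") or has("people_also_ask"):
        labels.append("Informational")
    # Navigational: knowledge graph or site links are present.
    if has("knowledge_graph") or has("site_links"):
        labels.append("Navigational")
    # Transactional: shopping ads are present.
    if has("shopping_ads"):
        labels.append("Transactional")
    # Commercial Investigation: paid ads, answer box, PAA, or bottom ads.
    if (has("paid_ads") or has("answer_box")
            or has("people_also_ask") or has("bottom_ads")):
        labels.append("Commercial Investigation")
    return labels
```

A query whose SERP shows only shopping ads would come back as `["Transactional"]`, while one with no recognized features returns an empty list.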

This is one of the coolest ways to estimate search intent, because it uses Google’s understanding of search intent (as expressed by the SERP features shown for that search).

The one problem with Burkholder’s approach is its reliance on SerpAPI. To find intent for a large set of search queries, you need to pass each query through the API, which runs the search and returns the SERP features that Burkholder’s script then classifies. On a large set of queries, this is time-consuming and prohibitively expensive.

SerpAPI charges ~$0.01 per keyword, so analyzing 5,000 keywords will cost you $50. Running those 5,000 keywords through Burkholder’s labeler script also takes 3 to 5 hours.
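To make that arithmetic easy to rerun for other list sizes, here is a small sketch. The price is the approximate per-keyword rate quoted above, not an official SerpAPI figure, and the function names are made up for this example.

```python
from decimal import Decimal

# Approximate per-keyword rate quoted in the article (an assumption,
# not an official price list).
PRICE_PER_KEYWORD = Decimal("0.01")


def labeling_cost(num_keywords: int) -> Decimal:
    """Estimated API cost in dollars for labeling `num_keywords` queries."""
    return PRICE_PER_KEYWORD * num_keywords


def seconds_per_keyword(num_keywords: int, hours: float) -> float:
    """Average wall-clock seconds spent per keyword for a run of `hours`."""
    return hours * 3600 / num_keywords


if __name__ == "__main__":
    print("Cost for 5,000 keywords:", labeling_cost(5000))
    print("Seconds per keyword (3 h run):", seconds_per_keyword(5000, 3))
    print("Seconds per keyword (5 h run):", seconds_per_keyword(5000, 5))
```

At the quoted rate and run times, each keyword costs about a cent and takes roughly two to four seconds to label.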

So I got to thinking: What if I adapted Burkholder’s approach so that, rather than use it to classify intent directly, I could use it to train a machine learning model that I would then use to classify intent? In other words, I’d incur one-time costs to produce my Burkholder-labeled training set, and, assuming it was accurate enough, I could then use that training set for all further classification, cost free.

With an accurate training set, anyone could label huge numbers of keywords super quickly, without spending a dime.

Finding a model

Hamlet Batista has written a few stellar posts about how to leverage natural language models like BERT for labeling intent.

In his posts, he uses an existing intent labeling model that returns categories from Kaggle’s Question Answering Dataset. While these labels can be useful, they are not really “intent categories” in the sense of a typical search intent taxonomy. Instead, they are labels such as Description, Entity, Human, Numeric, and Location.

He achieved excellent results by training a BERT encoder, getting near 90% accuracy in predicting labels for new/unlabeled search keywords.

The big question for me was, could I leverage the same tech (Uber’s Ludwig BERT encoder) to create an accurate model using the search intent labels I’d get from Burkholder’s code?

It turns out the answer is yes!

How to do it

Here’s how the process works:

1. Gather your list of keywords. If you’re planning on training your own model, I recommend doing so within a specific category/niche. Training on clothing-related keywords and then using that model to label finance-related keywords will likely be significantly less accurate than using it to label other, unlabeled clothing-related keywords. That said, I did try using a model trained on one category/niche to label another, and the results still seemed quite good to me.

2. Run Burkholder’s script over your list of keywords from Step 1. This will require signing up for SerpAPI.com and buying credits. I recommend getting labels for at least 10,000 search queries with this script to use for training. The more training data, the more accurate your model will likely be.

3. Use the labeled data from the previous step as your training data for the BERT model. Batista’s code to do this is very straightforward, and this article will guide you through the process. I was able to get about 72% accuracy using about 10,000 labels of training data.

4. Use your model from Step 3 to label unlabeled search data, and then take a look at your results!
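Steps 3 and 4 might be sketched as follows. This is not Batista's code, and it rests on several assumptions: the column names `keyword` and `intent` are hypothetical, the config uses Ludwig's declarative input/output feature format with a `bert` text encoder (the exact shape of the encoder setting differs between Ludwig releases, and older releases needed extra BERT checkpoint settings), and the `LudwigModel` calls follow the `ludwig.api` interface of Ludwig 0.3 and later. Ludwig is imported lazily, so the data-preparation helpers run without it installed.

```python
import csv
import random
from typing import Dict, List, Sequence, Tuple

Row = Tuple[str, str]  # (keyword, intent label from the rules-based script)


def split_rows(rows: Sequence[Row], holdout: float = 0.2,
               seed: int = 42) -> Tuple[List[Row], List[Row]]:
    """Shuffle labeled rows and hold out a fraction for checking accuracy."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout)
    cut = len(shuffled) - n_holdout
    return shuffled[:cut], shuffled[cut:]


def write_csv(path: str, rows: Sequence[Row]) -> None:
    """Write rows to a CSV file with hypothetical column names."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["keyword", "intent"])
        writer.writerows(rows)


def build_config() -> Dict:
    """Assumed Ludwig config: text in, intent category out."""
    return {
        "input_features": [
            {"name": "keyword", "type": "text", "encoder": "bert"},
        ],
        "output_features": [
            {"name": "intent", "type": "category"},
        ],
    }


def train_and_label(train_csv: str, unlabeled_csv: str):
    """Train on labeled keywords, then predict intent for unlabeled ones.

    Assumes Ludwig >= 0.3; adjust the calls for the version you install.
    """
    from ludwig.api import LudwigModel  # imported lazily, optional dependency

    model = LudwigModel(build_config())
    model.train(dataset=train_csv)
    predictions, _ = model.predict(dataset=unlabeled_csv)
    return predictions
```

Holding out part of the labeled data gives you a set of keywords the model never saw, which is what an accuracy figure should be measured on.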

The results

I ran through this process using a huge list (13,000 keywords) of clothing/fashion-related search terms from SEMrush as my training data. My resulting model gets just about 80% accuracy.

It seems likely that training the model with more data will continue to improve its accuracy up to a point. If any of you attempt it and improve on 80% accuracy, I would love to hear about it. I think with 20,000+ labeled searches, we could see up to maybe 85-90% accuracy.

This means that when you ask this model to predict the intent of unlabeled search queries, 8 times out of 10 it will give you the same label that Burkholder’s SerpAPI rules-based classifier would have returned. It can also do this for free, in large volumes, and incredibly fast.
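Accuracy in this sense is simply the share of queries where the model's label matches the rules-based label. A minimal sketch of that calculation, with a made-up function name and toy labels rather than real results:

```python
from typing import Sequence


def agreement_rate(model_labels: Sequence[str],
                   rule_labels: Sequence[str]) -> float:
    """Fraction of queries where the model matches the rules-based label."""
    if len(model_labels) != len(rule_labels):
        raise ValueError("Both label lists must have the same length.")
    if not model_labels:
        raise ValueError("Need at least one labeled query.")
    matches = sum(m == r for m, r in zip(model_labels, rule_labels))
    return matches / len(model_labels)


if __name__ == "__main__":
    # Toy labels for illustration only, not data from the article.
    rules = ["Informational"] * 5 + ["Transactional"] * 5
    model = ["Informational"] * 4 + ["Navigational"] + ["Transactional"] * 4 + ["Informational"]
    print(agreement_rate(model, rules))
```

Note that this measures agreement with the SERP-based rules, so the model inherits whatever mistakes those rules make.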

So something that would have taken a few thousand dollars and days of scraping can now be done for free in just minutes.

To test the model, I labeled keywords from a related domain (makeup) instead of clothing keywords, and overall I think it did a pretty good job. Labeling 5,000 search queries took under two minutes with the BERT model.

The implications

For SEO tools to be useful, they need to be scalable. Keyword research, content strategy, PPC strategy, and SEO strategy usually rely on being able to do analysis across entire niches/themes/topics/websites.

In many industries, the keyword long tail can extend into the millions. So a faster, more affordable alternative to Burkholder’s solution can make a lot of difference.

I foresee AI and machine learning tools being used more and more in our industry, giving SEOs, paid search specialists, and content marketers superpowers that weren’t possible before these new AI breakthroughs.

Happy analyzing!

Kristin Tynski is a founder and the SVP of Creative at Fractl, a boutique growth agency based in Delray Beach, FL.

Big Data – VentureBeat
