How did the phrase “defiantly recommend,” as in “I defiantly recommend this product,” come into common usage on the internet? The answer offers a good look inside the workings of supervised learning.
Supervision, generally from humans, is instrumental in much of statistical and machine learning. Google’s precise search algorithms are not public, but the general approach is to return to the user a set of links that are statistically “close” to the search string. A similar approach is used in spell-check. If the user types something that is not in the dictionary, the spell-checker provides legitimate dictionary words that are close to the misspelled term.
Big Data Enables Supervision
Next comes the supervision part. Users choose the link or the word that matches what they are looking for. Again and again – Google processes over 40,000 such queries per second. Over time, for each search string and each misspelling, Google observes which of its suggestions receives the most votes and moves it to the top of the list.
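Google’s actual ranking machinery is not public, but the bookkeeping behind this kind of supervision can be sketched in a few lines: log which suggestion each user accepts, tally the votes, and serve the most-voted suggestion first. The click log below is invented purely for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical click log of (misspelling, suggestion the user accepted).
# In reality this stream arrives at tens of thousands of queries per second.
click_log = [
    ("definatly", "defiantly"),
    ("definatly", "defiantly"),
    ("definatly", "definitely"),
    ("recieve", "receive"),
]

# Tally the "votes" each suggestion receives for each misspelling.
votes = defaultdict(Counter)
for misspelling, accepted in click_log:
    votes[misspelling][accepted] += 1

def ranked_suggestions(misspelling):
    """Return the candidate corrections seen so far, most-voted first."""
    return [word for word, _ in votes[misspelling].most_common()]

print(ranked_suggestions("definatly"))  # ['defiantly', 'definitely']
```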
“Defiant” recommendations arose out of an initial misspelling – users meant “definitely recommend,” but some typed “definatly recommend.” Recognizing that “definatly” was not a correct spelling, Google suggested alternatives. In the early days, without any supervision to go on, Google listed “defiantly” ahead of “definitely” because it was closer to “definatly” in spelling – it is the same set of letters, with just two adjacent letters swapped. “Definitely” needs more changes – lose the “a” and add an “e” and an “i” – so early on it was listed second.
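The exact closeness measure Google used is not public; a standard yardstick is edit distance, the number of single-character changes needed to turn one string into another. Here is a minimal sketch, assuming a variant that counts a swap of two adjacent letters as a single edit, which reproduces the ranking described above:

```python
def edit_distance(a, b):
    """Edit distance (insert, delete, substitute) that also counts a swap
    of two adjacent letters as a single edit (the 'optimal string
    alignment' variant of Damerau-Levenshtein distance)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete a character
                          d[i][j - 1] + 1,         # insert a character
                          d[i - 1][j - 1] + cost)  # substitute a character
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                # swap of two adjacent letters counted as one edit
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]

print(edit_distance("definatly", "defiantly"))   # 1 (just the 'n'/'a' swap)
print(edit_distance("definatly", "definitely"))  # 2 (more changes needed)
```

The choice to count an adjacent swap as one edit is what makes “defiantly” strictly closer here; under plain Levenshtein distance, which treats the swap as two substitutions, both candidates would score two edits.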
If supervision were working properly, users would correct the error by choosing the correct spelling. Unfortunately, the early human supervisors were lazy. They simply OK’d the first option on offer – “defiantly” – and Google learned that this was the proper correction for “definatly.”
Supervised learning, in its first appearances, was a modified form of traditional statistical modeling, in which a model is fit to a set of data in order to describe and, hopefully, explain the relationship between predictor variables and an outcome. Supervised learning adds two elements (illustrated in the sketch after the list):

- The primary purpose becomes predicting outcomes for new records
- The model is validated and adjusted using out-of-sample data (a sample withheld from the original model fit)
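Here is a minimal sketch of those two elements, using scikit-learn on synthetic data; the dataset and model settings are stand-ins, not a real application.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real labeled dataset (e.g., loan records with a
# default/no-default outcome); purely illustrative.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Element 2: withhold a sample from the model fit, for validation.
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Element 1: the primary purpose is predicting outcomes for new records...
new_record_predictions = model.predict(X_holdout)

# ...and the model is judged on that out-of-sample performance.
print("hold-out accuracy:", accuracy_score(y_holdout, new_record_predictions))
```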
Predicting credit scores using logistic regression was an early (and continuing) application of supervised learning. The same paradigm was applied to data-centric algorithms that do not impose a linear or other structural model, such as the following (compared in the sketch after the list):

- Nearest neighbor algorithms (label new cases as other, similar cases are labeled)
- Tree algorithms (repeatedly split the data according to predictors that do a good job of separating the outcomes)
- Naive Bayes (identify the most probable outcome, given the predictors)
- Neural nets (repeatedly pass the cases through a set of weights applied to predictors, iteratively adjusting the weights to improve predictions)
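The same fit-then-validate paradigm applies to each of these families. A sketch using scikit-learn’s implementations on the same kind of synthetic hold-out setup (the data and hyperparameters are arbitrary stand-ins, chosen only to keep the example small):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data again stands in for a real labeled dataset.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.3, random_state=0)

models = {
    "nearest neighbor": KNeighborsClassifier(n_neighbors=5),
    "tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "naive Bayes": GaussianNB(),
    "neural net": MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000,
                                random_state=0),
}

# Same supervised-learning paradigm for each algorithm: fit on the training
# sample, then judge its predictions on the withheld sample.
for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_holdout, model.predict(X_holdout))
    print(f"{name}: hold-out accuracy = {acc:.2f}")
```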
The term “artificial intelligence” has somewhat eclipsed “machine learning” in the popular imagination, but supervised learning remains a core function in data science.