On Wednesday, March 27, the 2018 Turing Award in computing was given to Yoshua Bengio, Geoffrey Hinton and Yann LeCun for their work on deep learning. Deep learning by complex neural networks lies behind the applications that are finally bringing artificial intelligence out of the realm of science fiction into reality. Voice recognition allows you … Continue reading “A Deep Dive into Deep Learning”
Author Archives: Dave Flatley
Industry Spotlight: Credit Scoring
In the U.S., credit scoring is dominated by three companies – Experian, TransUnion and Equifax, employing roughly 30,000 people. An important player in the scoring methodology is FICO, previously Fair Isaac Corporation, and the scores are typically called “FICO scores.” Credit scoring is the oldest application of predictive modeling, fulfilling a need that has been … Continue reading “Industry Spotlight: Credit Scoring”
Industry Spotlight: The IRS is Watching You
The IRS (U.S. Internal Revenue Service) has been using computers to choose tax returns for audit since 1962. Early on, the selection was rule-based, but the IRS turned to statistical modeling in 1969, using the oldest predictive analytics model in the toolbox – discriminant analysis. Discriminant analysis, a linear classification technique, was first proposed by … Continue reading “Industry Spotlight: The IRS is Watching You”
Book Review: Weapons of Math Destruction
When it was first published in 2016, Cathy O’Neil’s Weapons of Math Destruction sounded an early alarm about big data algorithms and their potential for social evil. The cover is adorned with a robotic death’s head and the subtitle reads “How Big Data Increases Inequality and Threatens Democracy.” O’Neil’s book begins with stories that … Continue reading “Book Review: Weapons of Math Destruction”
Historical Spotlight: Alan Turing
Eighty years ago, in 1939, Alan Turing began work on the code-breaking system that would eventually prove key in helping Britain survive the German submarine threat in the Atlantic. Last month, the Turing Award in computer science (sometimes referred to as the “Nobel Prize of Computing”) was given to three researchers, Yann LeCun, Geoffrey … Continue reading “Historical Spotlight: Alan Turing”
Confusing Terms in Data Science – A Look at Synonyms
To a statistician, a sample is a collection of observations (cases). To a machine learner, it’s a single observation. Modern data science has its origin in several different fields, which leads to potentially confusing synonyms, like these:
Confusing Terms in Data Science – A Look at Homonyms and more Synonyms
To a statistician, a sample is a collection of observations (cases). To a machine learner, it’s a single observation. Modern data science has its origin in several different fields, which leads to potentially confusing homonyms like these:
Confusing Terms in Data Science – A Look at Synonyms, Homonyms and more
To a statistician, a sample is a collection of observations (cases). To a machine learner, it’s a single observation. Modern data science has its origin in several different fields, which leads to potentially confusing homonyms and synonyms, like these: Homonyms (words with multiple meanings): Bias: To a lay person, bias refers to an opinion about something … Continue reading “Confusing Terms in Data Science – A Look at Synonyms, Homonyms and more”
Industry Spotlight: Package Delivery
Nothing better illustrates the encroachment of data science and analytics on the older “economy of tangible things” than the business of delivering packages. The use of analytics in package delivery is not new. Companies like UPS and FedEx are longtime users of operations research methods like optimization and simulation to route inter-city shipments, site new … Continue reading “Industry Spotlight: Package Delivery”
Ethical Practice in Data Mining
Prior to the advent of internet-connected devices, the largest source of big data was public interaction on the internet. Social media users, as well as shoppers and searchers on the internet, make an implicit deal with the big companies that provide these services: users can take advantage of powerful search, shopping and social interaction tools … Continue reading “Ethical Practice in Data Mining”
Job Spotlight: Sports Statistician
The field of sports statistics is not exactly new; the American Statistical Association’s section on Sports Statistics was formed in 1992. Three of Statistics.com’s instructors have professional experience in sports statistics – Ben Baumer (SQL) served as statistician for the NY Mets, Stephanie Kovalchik (Meta Analysis in R) with Tennis Australia, and Joe Hilbe, who … Continue reading “Job Spotlight: Sports Statistician”
Industry Spotlight: Baseball – Opening Day & Statistics in Sports
The U.S. baseball season opens Thursday, March 28, and celebrates the 48th season of analytics in baseball, beginning with the founding of the Sabermetric Society in 1971 (the same year that Satchel Paige entered the Hall of Fame). Analytics has come a long way in sports, and now has its own conference, the MIT Sports … Continue reading “Industry Spotlight: Baseball – Opening Day & Statistics in Sports”
Jaccard’s coefficient
When variables have binary (yes/no) values, a couple of issues come up when measuring distance or similarity between records. One of them is the “yacht owner” problem.
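A minimal sketch of the issue, assuming the “yacht owner” problem refers to the way shared absences (0-0 matches, such as two people who both do not own a yacht) inflate similarity when rare attributes are counted. The records, the attribute layout and the function names below are hypothetical illustrations, not taken from the original post. Simple matching counts 0-0 agreements; Jaccard’s coefficient ignores them.

```python
def simple_matching(a, b):
    """Share of attributes on which two binary records agree (1-1 and 0-0)."""
    agreements = sum(x == y for x, y in zip(a, b))
    return agreements / len(a)


def jaccard(a, b):
    """Share of 1-1 matches among attributes where at least one record has a 1."""
    both = sum(x == 1 and y == 1 for x, y in zip(a, b))
    either = sum(x == 1 or y == 1 for x, y in zip(a, b))
    # If neither record has any 1s, there is nothing to compare; return 0.0 by convention.
    return both / either if either else 0.0


# Hypothetical records over ten rare attributes (1 = yes, 0 = no).
person_a = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
person_b = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
person_c = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
person_d = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

print(simple_matching(person_a, person_b))  # high, driven only by shared absences
print(jaccard(person_a, person_b))          # zero, no attribute is shared
print(jaccard(person_c, person_d))          # one shared attribute out of two present
```

Here person_a and person_b have nothing in common except the things neither of them owns, yet simple matching rates them as very similar. Jaccard’s coefficient scores only the attributes that at least one of them has.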
Darwin’s Legacy in Statistics
Charles Darwin, the most famous grandson of the Enlightenment thinker Erasmus Darwin, published his ground-breaking theory of evolution, “The Origin of Species,” 160 years ago. Another grandson of Erasmus, Francis Galton, became one of the founding fathers of statistics (correlation, the “wisdom of the crowd,” regression and regression to the mean are all Galton’s ideas). Heavily … Continue reading “Darwin’s Legacy in Statistics”
Industry Spotlight: Customer Segmentation
Are you “young and rustic?” Or perhaps a “toolbelt traditionalist?” These are nicknames given to customer segments identified by market research firm Claritas, with its statistical clustering tool. Long before the advent of individualized product recommendations, businesses sought to segment customers into distinct groups on the basis of purchase behavior, demographic variables, and geography, to … Continue reading “Industry Spotlight: Customer Segmentation”
Industry Spotlight: CROs
CROs, or contract research organizations, are a $40 billion industry, growing at close to 12% per year. They provide contract services to the pharmaceutical industry, including statistical design and analysis, laboratory services, administration of clinical trials, and monitoring of drugs once they are on the market. Developing a new drug and bringing it to market … Continue reading “Industry Spotlight: CROs”
Handling the Noise – Boost It or Ignore It?
In most statistical modeling or machine learning prediction tasks, there will be cases that can be easily predicted based on their predictor values (signal), as well as cases where predictions are unclear (noise). Two statistical learning methods, boosting and ProfWeight, use those difficult cases in exactly opposite ways – boosting up-weights them, and ProfWeight down-weights … Continue reading “Handling the Noise – Boost It or Ignore It?”
Problem of the Week: Probability
Your country is at war, and an enemy plane has crashed on your territory. It bears the number 60, and a spy has told you that the aircraft are numbered serially. Can you make a guess about the total number of aircraft the enemy has produced? Solution: This problem is one of those published by … Continue reading “Problem of the Week: Probability”
Rectangular data
Rectangular data are the staple of statistical and machine learning models. Rectangular data are multivariate cross-sectional data (i.e., not time-series or repeated-measures data) in which each column is a variable (feature), and each row is a case or record.
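As a rough illustration of that layout, the sketch below stores a small table as a list of rows, where every row is a case and every key is a variable. The column names, values and helper functions are hypothetical and use only the Python standard library.

```python
# Each dict is one row (a case or record); each key is one column (a variable or feature).
records = [
    {"age": 34, "income": 52000, "owns_home": True},
    {"age": 51, "income": 87000, "owns_home": True},
    {"age": 27, "income": 39000, "owns_home": False},
]


def is_rectangular(rows):
    """True when every row has exactly the same set of variables as the first row."""
    if not rows:
        return True
    expected = set(rows[0])
    return all(set(row) == expected for row in rows)


def column(rows, name):
    """Values of one variable across all cases, in row order."""
    return [row[name] for row in rows]


n_rows = len(records)
n_columns = len(records[0])

print(is_rectangular(records))   # every case has the same variables
print((n_rows, n_columns))       # table dimensions as (cases, variables)
print(column(records, "age"))    # one variable across all cases
```

A row with a missing or extra key would make the check fail, which is the sense in which such data stop being rectangular.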
“Defiant” Supervision
How did the phrase “defiantly recommend”, as in “I defiantly recommend this product,” come into common usage on the internet? The answer is a good look inside the workings of supervised learning. Supervision, generally from humans, is instrumental in much of statistical and machine learning. Google’s precise search algorithms are not public, but the general … Continue reading ““Defiant” Supervision”