Big Sample, Unreliable Result
Which would you rather have? A large sample that is biased, or a representative sample that is small? The American Statistical Association committee that reviewed the 1948 Kinsey report on male sexual behavior, based on interviews with over 5000 men, left no doubt of their preference for the latter. The statisticians – William Cochran, Frederick…
Category Archives: Blog Type
Mixed Models – When to Use
Companies now have a lot of data on their customers at an individual level. Suppose you are tasked with forecasting customer spending at a grocery chain, and you want to understand how customer attributes, local economic factors, and store issues affect customer spending. You could design your study with hierarchical and mixed linear modeling methods…
The Normal Share of Paupers
In 2009, China began regional pilot programs that adapted credit scores to a broader purpose – scoring a person’s “social credit.” One hundred years earlier, at the height of the eugenics craze, the famous statistician Francis Galton undertook to repurpose statistical concepts in service of social engineering. The starting point was a social survey of London…
UpLift and Persuasion
The goal of any direct mail campaign, or other messaging effort, is to persuade somebody to do something. In the business world, it is usually to buy something. In the political world, it is usually to vote for someone (or, if you think you know who they will vote for, to encourage them to actually…
Lift and Persuasion
Predicting the probability that something or someone will belong to a certain category (classification problems) is perhaps the oldest type of problem in analytics. Consider the category “repays loan.” Equifax, the oldest of the agencies that provides credit scores, was founded in 1899 as the Retail Credit Company by two brothers, Cator and Guy Woolford.
Going Beyond the Canary Trap
In 2008, Elon Musk was concerned about leaks of sensitive information at Tesla Motors. To catch the leaker, he prepared multiple unique versions of a new nondisclosure agreement he asked senior officers to sign. Whichever version got leaked would reveal the leak source. This is known as a “canary trap.” The canary trap only works…
Book Review: Mining Your Own Business by Gerhard Pilcher and Jeff Deal
“Mining Your Own Business: A Primer for Executives on Understanding and Employing Data Mining and Predictive Analytics” is a short book, befitting its intended audience – managers and executives with responsibility for data science and analytics projects. It outlines the requirements for success – not technical model success, but rather successful implementation in a way that builds…
Choosing the Right Analytics Problem
The “streetlight effect”: A man is looking for his keys under a streetlight. Policeman: “Where did you lose them?” Man: “In the alley, near the door to the bar.” Policeman: “Why are you looking here?” Man: “The light’s better.” This is related to the more general “Statistical Type 4 Error” – asking the wrong question, and…
Ethical Dilemmas in Data Science
Know those ads that follow you around the web, as a result of tracking cookies? Many see them as an invasion of privacy, and EU rules made them subject to user consent. Google recently announced that Chrome will eventually stop supporting these cookies. A win for the consumer? Perhaps, but there is another side to…
Not Glamorous, But Lucrative
What do stormy days, weekend evenings, and the last day of the month have in common? They are all good times to negotiate a good price for a new car. Inclement days yield less customer traffic in auto showrooms, which is good for the buyer. Weekend evenings, just before closing time, may make salespeople…
Historical Spotlight: Statistical Analysis and Human Rights
Artificial intelligence and analytics have gotten some bad press recently, from the role that social media has played in fracturing and heightening divisions in democratic society to the “big brother” role that data mining and image recognition have played in China’s suppression of minorities. But statistical analysis has also long played a role in documenting, …
Simulating the Complex Sale
Every 30 minutes a new business book is published; many of them purport to teach effective selling. Most of them make sense, but solid quantitative analysis is rarely on the front burner. This is strange, because effective selling requires demonstrating value. Sales professionals are taught to show components of value such as cost savings or…
Analytics Meets the Cardboard Box
“Do you have a bag?” or “Would you like a bag?” have become common parts of the brick-and-mortar retail transaction. Reusable bags, or simply doing without, have reduced the flow of plastic and paper into recycling. E-commerce is a different matter. I just unpacked a box of wine, and dealing with the protective spacers and…
Detecting a Slots Payout Difference of 2%
Most businesses use statistics and analytics to one degree or another, but there is only one industry that is built solely on this discipline. This week we look at the casino business – in particular, the odds on slots. Slot machines are a casino’s best friend. Able to run 24/7 with consistently sized bets, slots realize…
Book Review: Big Data in Practice by Bernard Marr
This short book is essentially an enriched list of 45 examples of how companies have used big data analytics. Marr sticks to high-level generalities, and the book is in the spirit of light business journalism rather than a detailed exposition that walks you through a successful big data implementation. However, private companies, and…
Problem of the Week: Missing Data
Question: You have a supervised learning task with 30 predictors, in which 5% of the values are missing. The missing data are randomly distributed across variables and records. If your strategy for coping with missing data is to drop records with missing data, what proportion of the records will be dropped? Is the assumption of…
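One way to get a feel for the arithmetic behind this question is a quick sketch in Python. It rests on an assumption that is a reading of the question, not something the post confirms: each of the 30 predictor values in a record is missing independently with probability 0.05. Under that assumption a record survives only if all 30 values are present. The function names `fraction_dropped` and `simulate_fraction_dropped` are made up for this illustration.

```python
import random


def fraction_dropped(p_missing: float, n_predictors: int) -> float:
    """Expected share of records with at least one missing value.

    Assumes (hypothetically) that every value is missing independently
    with probability p_missing.
    """
    p_complete = (1.0 - p_missing) ** n_predictors
    return 1.0 - p_complete


def simulate_fraction_dropped(p_missing: float, n_predictors: int,
                              n_records: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo check of the same quantity under the same assumption."""
    rng = random.Random(seed)
    dropped = 0
    for _ in range(n_records):
        # A record is dropped if any one of its predictor values is missing.
        if any(rng.random() < p_missing for _ in range(n_predictors)):
            dropped += 1
    return dropped / n_records


if __name__ == "__main__":
    print(f"analytic:  {fraction_dropped(0.05, 30):.4f}")  # 1 - 0.95**30, about 0.7854
    print(f"simulated: {simulate_fraction_dropped(0.05, 30):.4f}")
```

If the independence assumption holds, roughly three quarters of the records would be lost even though only 5% of the individual values are missing, which is why dropping incomplete records can be so costly with many predictors.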
“Money and Brains” and “Furs and Station Wagons”
“Money and Brains” and “Furs and Station Wagons” were evocative customer shorthands that the marketing company Claritas came up with more than half a century ago. These names, which facilitated the work of marketers and salespeople, were shorthand descriptions of segments of customers identified through statistical cluster analysis. Cluster analysis is also used in market…