There are more than three dozen curses in Harry Potter. Data scientists have only one: the “curse of dimensionality.” Dimensionality is the number of predictors, or input variables, in a model, and the “curse” refers to the problems that result from including too many features (predictor variables) in a model. Old curses are awakened…
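One well-known symptom of the curse is that, as dimensions are added, distances between points become nearly indistinguishable, which weakens distance-based methods such as nearest neighbors. The sketch below is an illustration under stated assumptions, not the post's own example: points are drawn uniformly from a unit cube, distance is Euclidean, and `relative_contrast` is a hypothetical helper that measures (max - min) / min distance from one random query point.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(max - min) / min of Euclidean distances from a random query point
    to n_points random points in the unit cube of the given dimension."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # math.dist needs Python 3.8+
    return (max(dists) - min(dists)) / min(dists)

# The contrast shrinks as dimension grows: "near" and "far" blur together.
for d in (2, 10, 100, 1000):
    print(d, round(relative_contrast(d), 3))
```

A small contrast means the nearest point is barely nearer than the farthest one, so "closeness" carries little information.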
Author Archives: Dave Flatley
Student Spotlight: Peter Mulready
Peter Mulready is an independent consultant who previously worked as a system architect at Boehringer Ingelheim, one of the world’s largest pharmaceutical companies. Peter earned his degree in biology, but his focus shifted to managing and optimizing the use of data in drug discovery research. Specifically, he led the information technology team responsible for managing…
Anomaly Detection via Conversation: “How was your vacation?”
A friendly query about your holiday might come from a roaming agent in the check-in area at Tel Aviv airport. Israel, considered to have the most effective airport security in the world, does not rely solely on routine mechanical screening of passengers and baggage by low-paid workers. It also uses…
e-cigarettes
Last week, the Trump administration announced a forthcoming ban on e-cigarettes, following news stories about a spate of deaths from vaping. On Friday the 13th, The Wall Street Journal published both an editorial and an op-ed piece suggesting that any harm from e-cigarettes is minor and unproven, and is counterbalanced by the good they do in…
Book Review: Bandit Algorithms for Website Optimization, by John Myles White
A classic statistical experimental design comparing treatments (two treatments, treatment versus control, or multiple treatments) specifies a sample size, collects the data, and then makes a decision, typically based on hypothesis testing: the winning treatment must attain a level of statistical significance; otherwise you go with the default “null hypothesis.”
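The decision step of that classic fixed-sample design can be sketched as follows. This is a minimal illustration, not the book's method: it assumes a binary outcome (e.g. a conversion), a two-sided two-proportion z-test with a pooled standard error, and alpha = 0.05; the function names are hypothetical.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled standard error)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

def decide(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Switch to treatment B only if it wins at level alpha;
    otherwise keep control A, the default under the null hypothesis."""
    z, p_value = two_proportion_z_test(conv_a, n_a, conv_b, n_b)
    return "B" if (p_value < alpha and z > 0) else "A"

print(decide(100, 1000, 150, 1000))  # large lift: significant
print(decide(100, 1000, 105, 1000))  # small lift: not significant, keep A
```

Note that the sample sizes are fixed before any data arrive; bandit algorithms, the book's subject, instead shift traffic toward the better-performing option while the experiment is running.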
“Islands in Search of Continents”
“Islands in Search of Continents” is the subtitle of an article by Michael Clarke and Iain Chalmers in the Journal of the American Medical Association (1998; 280: 280-282). It refers to the fact that many studies are conducted and reported in isolation from other studies on the same subject. A good review of the subject can be…
Meta Analysis
In 2011, PubMed indexed 1.2 million scientific papers (see “Are Scientists Doing Too Much Research”), ample proof that lots of people are studying the same or similar things. For example, there have been over 100 studies of suicide following psychiatric institutionalization; 38 studies of whether e-cigarettes help you quit smoking – 38 studies…
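One standard way a meta-analysis combines many studies of the same question is fixed-effect, inverse-variance pooling: each study's estimate is weighted by 1 / SE². The sketch below assumes that model (random-effects models are a common alternative when studies disagree), and the study values are hypothetical, not taken from the studies mentioned above.

```python
import math

def fixed_effect_pool(estimates, std_errors):
    """Inverse-variance weighted (fixed-effect) pooled estimate and its standard error."""
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need matching, non-empty lists")
    weights = [1.0 / se ** 2 for se in std_errors]  # precise studies count more
    total = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)

# Hypothetical effect sizes (e.g. log odds ratios) from three studies
pooled, se = fixed_effect_pool([0.30, 0.10, 0.25], [0.10, 0.20, 0.15])
low, high = pooled - 1.96 * se, pooled + 1.96 * se
print(f"pooled={pooled:.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

The pooled standard error is smaller than any single study's, which is the statistical payoff of connecting the "islands."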
Industry Spotlight: Health Analytics
Patient Data Management
Health analytics is a hot topic now, but to do the analytics you need data, and this is where Electronic Health Records (EHR) come in. An integrated, standardized system for sharing and accessing health data has been “just around the corner” for more than a decade. Despite a big push by…
Superusers
“Superusers” of medical services are the small fraction of patients who account for a disproportionately large share of the medical services consumed. An article published online August 14, 2019, in JAMA Surgery reports on the application of machine learning methods to data on 1,049,160 Medicare patients who underwent surgery and were then tracked over the following year to assess…
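The idea that a small fraction of patients accounts for a large share of consumption can be made concrete by computing the cost share of the top group. This is a hypothetical sketch: `top_share` and the cost figures are illustrative assumptions, not the JAMA Surgery article's data or method.

```python
import math

def top_share(costs, fraction):
    """Share of total cost incurred by the top `fraction` of patients, ranked by cost."""
    if not costs or not 0 < fraction <= 1:
        raise ValueError("need non-empty costs and 0 < fraction <= 1")
    ranked = sorted(costs, reverse=True)
    k = max(1, math.ceil(fraction * len(ranked)))  # size of the top group
    return sum(ranked[:k]) / sum(ranked)

# Hypothetical annual costs for ten patients
costs = [500, 700, 800, 900, 1000, 1200, 1500, 2000, 6000, 40000]
print(round(top_share(costs, 0.10), 3))  # share consumed by the top 10%
```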
Job Spotlight: Biostatisticians
Biostatisticians are the shepherds (and the police) who guide the science of developing new therapies for disease. They come in several different flavors: those involved in gathering information, designing experiments, and analyzing data at the drug discovery stage, trying to sort out what works and what doesn’t and learning which research directions have potential…
Aug 16: Statistics in Practice
Here in Part 2 of the Weekly Brief, we offer some tools to help you with the question, “What is the optimal set of alternatives to offer consumers?”
Our course spotlight is on:
Aug 30 – Sep 27: Discrete Choice Modeling and Conjoint Analysis
See you in class!
– Peter Bruce, Chief Academic Officer, Author, Instructor, and…
Problem of the Week: The Second Heads
QUESTION: A friend tosses two coins, and you ask, “Is one of them a heads?” The friend replies, “Yes.” What is the probability that the other is a heads?
ANSWER: One-third. There are four ways the coins could have landed originally:
HH: 0.25 probability
HT: 0.25 probability
TH: 0.25 probability
TT: 0.25 probability…
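The enumeration above can be checked exactly and by simulation. This sketch assumes the friend answers “Yes” whenever at least one coin shows heads; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product
import random

def exact_probability_other_is_heads():
    """Condition the four equally likely outcomes on 'at least one head'."""
    outcomes = list(product("HT", repeat=2))  # HH, HT, TH, TT, each 1/4
    at_least_one_head = [o for o in outcomes if "H" in o]  # HH, HT, TH survive
    both_heads = [o for o in at_least_one_head if o == ("H", "H")]
    return Fraction(len(both_heads), len(at_least_one_head))

def simulated_probability(trials=100_000, seed=42):
    """Monte Carlo estimate: among tosses where the friend says 'Yes', how often are both heads?"""
    rng = random.Random(seed)
    yes = both = 0
    for _ in range(trials):
        a, b = rng.choice("HT"), rng.choice("HT")
        if a == "H" or b == "H":  # friend answers "Yes"
            yes += 1
            both += (a == "H" and b == "H")
    return both / yes

print(exact_probability_other_is_heads())  # 1/3
print(round(simulated_probability(), 3))
```

The key step is that the “Yes” rules out only TT, leaving three equally likely outcomes, just one of which is HH.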
Aug 13: Statistics in Practice
This week we discuss the distinction between explanatory and predictive modeling, and spotlight the workhorses of statistical modeling:
Oct 4 – Nov 1: Regression Analysis
Oct 4 – Nov 1: Categorical Data Analysis
See you in class!
– Peter Bruce, Chief Academic Officer, Author, Instructor, and Founder, The Institute for Statistics Education at Statistics.com
Explain or Predict? Are you flummoxed by the profusion of…
Explain or Predict?
Casual users of machine learning methods like CART or naive Bayes are accustomed to evaluating a model by measuring how well it predicts new data. When they examine the output of statistical models, they are often flummoxed by the profusion of assessment metrics. Typical multiple linear regression output will contain, in addition to a distribution…
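One common way to see the distinction: an explanatory analysis focuses on the fitted coefficients and in-sample fit (such as R²), while a predictive analysis judges the model by its error on data it has not seen. The sketch below assumes a single predictor fit by ordinary least squares on simulated data (true slope 2, intercept 1, noise SD 1) with a simple train/holdout split; it is an illustration, not the post's own example.

```python
import math
import random

def fit_simple_ols(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared(xs, ys, slope, intercept):
    """In-sample proportion of variance explained (an explanatory metric)."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def rmse(xs, ys, slope, intercept):
    """Root mean squared prediction error (a predictive metric when computed on holdout data)."""
    return math.sqrt(sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(ys))

# Simulated data: y = 2x + 1 + noise
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

slope, intercept = fit_simple_ols(train_x, train_y)
print("explain: slope", round(slope, 3), "in-sample R^2", round(r_squared(train_x, train_y, slope, intercept), 3))
print("predict: holdout RMSE", round(rmse(test_x, test_y, slope, intercept), 3))
```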
Intervals (confidence, prediction and tolerance)
Small Ball: Calling all thinkers!
I was visiting New York a couple of weeks ago, transferring from Amtrak to the PATH trains at Newark. PATH takes you to Wall Street – the #1 financial center in the world – and yet the process of paying for my $2.75 PATH ticket was excruciating. When I arrived at Newark, my colleague, who…
Aug 9: Statistics in Practice
We continue Monday’s discussion of “people analytics” with a look from the customer’s side and a call for all thinkers! (see below)
Our course spotlight is on:
Sep 6 – Oct 4: Predictive Analytics 1 – Machine Learning Tools
Sep 6 – Oct 4: Programming 1 (R or Python)
See you in class!
– Peter Bruce, Chief Academic…
Industry Spotlight: HR (People Analytics)
Analytics has come to HR. It’s partly Orwellian, tracking what employees do on the computer, and partly warm and fuzzy, using network analysis to uncover an organization’s true, informal structure (jump into Friday’s Network Analysis course to learn the basics). One dimension assumes the worst about employees and gives bosses extra powers to keep tabs on…
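As a rough illustration of how network analysis can surface informal structure, the sketch below computes degree centrality (each person's number of distinct contacts divided by n - 1) from a list of communication pairs. The edge list, names, and the choice of degree centrality are hypothetical assumptions; the post does not say which measures are used, and betweenness or eigenvector centrality are common alternatives.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Normalized degree centrality for an undirected graph given as (a, b) pairs."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}

# Hypothetical "who emails whom" pairs
edges = [("ana", "ben"), ("ana", "cy"), ("ana", "dee"), ("ben", "cy")]
centrality = degree_centrality(edges)
print(max(centrality, key=centrality.get))  # most connected person
```

Someone with modest formal rank but high centrality may be an informal hub, which is the kind of signal this "warm and fuzzy" side of people analytics looks for.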
Aug 5: Statistics in Practice
In this week’s Brief, analytics comes to the HR department (“people analytics”), and our course spotlight is on:
Sep 6 – Oct 4: Predictive Analytics 1
Sep 6 – Oct 4: Programming 1 (R or Python)
These courses are excellent entry points into our data science certificate programs: Analytics for Data Science (focuses on off…
Aug 2: Statistics in Practice
In Part 1 of this week’s Brief, we looked at political analytics; in Part 2 we extend that look to commercial domains. Our course spotlight is Persuasion Analytics, taught by Ken Strasma, who pioneered the use of statistical modeling to microtarget voters in the 2004 U.S. presidential campaign.
Aug 23 – Sep 20: Persuasion Analytics
See you in class!