In a couple of days, the Wall Street Journal will come out with its November survey of economists’ forecasts. It’s a particularly sensitive time, with elections in a few days and President Trump attacking the Federal Reserve for raising interest rates. It’s a good time to recall major forecasting gaffes of the past. In 1987, best-selling Continue reading “Examples of Bad Forecasting”
Blog
Historical Spotlight: Risk Simulation – Since 1946
Simulation – a Venerable History One of the most consequential and valuable analytical tools in business is simulation, which helps us make decisions in the face of uncertainty, such as these: An airline knows, on average, what proportion of ticketed passengers show up for a flight, but the number for any given flight is uncertain. Continue reading “Historical Spotlight: Risk Simulation – Since 1946”
VARIANCE
It is 100 years since R. A. Fisher introduced the concept of “variance” (in his 1918 paper “The Correlation Between Relatives on the Supposition of Mendelian Inheritance”).
“out-of-bag,” as in “out-of-bag error”
“Bag” refers to “bootstrap aggregating”: repeatedly drawing bootstrap samples from a dataset and aggregating the results of statistical models applied to those samples. (A bootstrap sample is a resample drawn with replacement.)
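To make "out-of-bag" concrete, here is a minimal sketch in Python. It is not taken from the original post: the function names are hypothetical, and the "model" is deliberately trivial (it predicts the mean of the in-bag values) so that the bookkeeping stays visible. Each observation is scored only by the bags that did not draw it, and those predictions are averaged before the error is computed.

```python
import random


def in_bag_and_oob(n, rng):
    """Draw one bootstrap sample of indices; return (in-bag, out-of-bag) indices."""
    in_bag = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
    drawn = set(in_bag)
    oob = [i for i in range(n) if i not in drawn]  # indices never drawn
    return in_bag, oob


def oob_mse_of_mean_model(y, n_bags=200, seed=0):
    """Out-of-bag mean squared error for a toy model that predicts the in-bag mean.

    The mean predictor is an illustrative stand-in (an assumption of this
    sketch); in practice the model fitted to each bag would be a tree or similar.
    """
    rng = random.Random(seed)
    n = len(y)
    pred_sums = [0.0] * n
    pred_counts = [0] * n
    for _ in range(n_bags):
        in_bag, oob = in_bag_and_oob(n, rng)
        fitted = sum(y[i] for i in in_bag) / n  # "fit" the toy model on the bag
        for i in oob:  # score only the observations this bag never saw
            pred_sums[i] += fitted
            pred_counts[i] += 1
    # Aggregate: average each observation's out-of-bag predictions, then score.
    errors = [
        (y[i] - pred_sums[i] / pred_counts[i]) ** 2
        for i in range(n)
        if pred_counts[i] > 0
    ]
    return sum(errors) / len(errors)


y = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 3.9]
print(oob_mse_of_mean_model(y))
```

Because every observation is left out of roughly a third of the bags, the out-of-bag error behaves like a built-in holdout estimate without setting aside a separate validation set.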
BOOTSTRAP
I used the term in my message about bagging and several people asked for a review of the bootstrap. Put simply, to bootstrap a dataset is to draw a resample from the data, randomly and with replacement.
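The definition above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not code from the original message: the data values are made up, the function names are hypothetical, and the sample mean is just one example of a statistic you might bootstrap.

```python
import random
import statistics


def bootstrap_sample(data, rng):
    """Draw one resample the same size as the data, randomly and with replacement."""
    return rng.choices(data, k=len(data))


def bootstrap_means(data, n_resamples=1000, seed=0):
    """Compute the mean of each of n_resamples bootstrap resamples."""
    rng = random.Random(seed)
    return [statistics.mean(bootstrap_sample(data, rng)) for _ in range(n_resamples)]


# Made-up illustrative data (an assumption of this sketch).
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8]

means = bootstrap_means(data)
# The spread of the resampled means estimates the standard error of the mean.
print("bootstrap standard error:", statistics.stdev(means))
```

Drawing with replacement is the key step: some observations appear more than once in a given resample and others not at all, which is what lets the resamples vary.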
Same thing, different terms
100 years of variance
It is 100 years since R. A. Fisher introduced the concept of “variance” (in his 1918 paper “The Correlation Between Relatives on the Supposition of Mendelian Inheritance”). There is much that statistics has given us in the century that followed. Randomized clinical trials, and the means to analyze them, moved medicine fully into the modern, science-based era. Continue reading “100 years of variance”
Early Data Scientists
Casting back long before the advent of Deep Learning for the “founding fathers” of data science, at first glance you would rule out antecedents who long predate the computer and data revolutions of the last quarter century. But some consider John Tukey (below), the Princeton statistician who named and developed the field of “exploratory data analysis,” Continue reading “Early Data Scientists”
Python for Analytics
Python started out as a general-purpose language when it was created in 1991 by Guido van Rossum. It was embraced early on by Google founders Sergey Brin and Larry Page (“Python where we can, C++ where we must” was reputedly their mantra). In 2006, van Rossum (right) went to work at Google, where he Continue reading “Python for Analytics”
Course Spotlight: Deep Learning
Deep learning is essentially “neural networks on steroids” and it lies at the core of the most intriguing and powerful applications of artificial intelligence. Facial recognition (which you encounter daily in Facebook and other social media) harnesses many levels of data science tools, including algorithms that compare images and match those with similar measurements between Continue reading “Course Spotlight: Deep Learning”
Course Spotlight: Structural Equation Modelling (SEM)
SEM stands for “structural equation modeling,” and we are fortunate to have Prof. Randall Schumacker teaching this subject at Statistics.com. Randy created the Structural Equation Modeling (SEM) journal in 1994 and the Structural Equation Modeling Special Interest Group (SIG) at the American Educational Research Association (AERA). He has also co-authored several books, including: A Beginner’s Continue reading “Course Spotlight: Structural Equation Modelling (SEM)”
Benford’s Law Applies to Online Social Networks
Fake social media accounts and Russian meddling in US elections have been in the news lately, with Mark Zuckerberg (Facebook founder) testifying this week before the US Congress. Dr. Jen Golbeck, who teaches Network Analysis at Statistics.com, published an ingenious way to determine whether a Facebook, Twitter or other social media account is fraudulent. Her Continue reading “Benford’s Law Applies to Online Social Networks”
The Real Facebook Controversy
Cambridge Analytica’s wholesale scraping of Facebook user data is big news now, and people are shocked that personal data is being shared and traded on a massive scale on the internet. But the real issue with social media is not harm to individual users whose information was shared, but sophisticated and sometimes subtle mass manipulation Continue reading “The Real Facebook Controversy”
Course Spotlight: Two statistical modeling courses
Two important statistical modeling courses are coming up in May.
May 18 – Jun 15: Principal Components and Factor Analysis
May 18 – Jun 15: Modeling Count Data
Factor analysis is used frequently in social science research where you want to examine that which you cannot observe (latent variables) using data that you can Continue reading “Course Spotlight: Two statistical modeling courses”
Masters Programs versus an Online Certificate in Data Science from Statistics.com
We just attended the INFORMS (Institute for Operations Research and the Management Sciences) analytics conference this week in Baltimore, and they held a special meeting for directors of academic analytics programs to better align what universities are producing with what industry is seeking. The number of such programs is still growing rapidly (>200), Continue reading “Masters Programs versus an Online Certificate in Data Science from Statistics.com”
Course Spotlight: Likert scale assessment surveys
Do you work with multiple-choice tests, or Likert scale assessment surveys? Rasch methods help you construct linear measures from these forms of scored observations and analyze the results from such surveys and tests. “Practical Rasch Measurement – Core Topics”: In this course, you will learn practical aspects of data setup, analysis, output interpretation, fit Continue reading “Course Spotlight: Likert scale assessment surveys”
Course Spotlight: Customer Analytics in R
“The customer is always right” was the motto Selfridge’s department store coined in 1909. “We’ll tell the customer what they want” was Madison Avenue’s mantra starting in the 1950s. Now data scientists like Karolis Urbonas help companies like Amazon (where he works in Europe as Head of Data Science, Amazon Devices) use data to figure Continue reading “Course Spotlight: Customer Analytics in R”
Course Spotlight: Predictive Analytics
Predicting whether an internet user will click on a link or buy a product, whether an insurance claim is fraudulent, whether a home mortgage will be paid on time (or early), how much a house will sell for, what internet ad you should see next, whether a discharged patient will need to return to the Continue reading “Course Spotlight: Predictive Analytics”
Course Spotlight: Spatial Statistics Using R
Have you ever needed to analyze data with a spatial component? Geographic clusters of disease, crimes, animals, plants, events? Or to describe the spatial variation of something, and perhaps correlate it with some other predictor? Or to assess whether the geographic distribution of something departs from randomness? Location data is ubiquitous, as are maps drawn by GIS software. Skilled Continue reading “Course Spotlight: Spatial Statistics Using R”
“Money and Brains” and “Furs and Station Wagons”
“Money and Brains” and “Furs and Station Wagons” were evocative customer shorthands that the marketing company Claritas came up with over a half century ago. These names, which facilitated the work of marketers and salespeople, were shorthand descriptions of segments of customers identified through statistical cluster analysis. Cluster analysis is also used in market Continue reading ““Money and Brains” and “Furs and Station Wagons””