Category Archives: General Post
Three Myths in Data Science
Predicting “Do Not Disturbs”
Safety in Numbers – Calculating Probabilities for Convoys
Rare Event Syndrome
Ethical Data Science
Statistical Arbitrage
When Probabilities Sum to More than One
Tracking Your Wanderings, for the Public Good
Evolutionary Algorithms
COVID-19: Sensitivity, Specificity, and More
Conversations with Data Scientists about R and Python
Elder Research Capabilities
Coronavirus Death Toll
P-Values – Are They Needed?
Covid-19 Parameters
There are many moving parts in modeling the spread of an epidemic, a subject that has lately attracted the attention of great numbers of statistically oriented non-epidemiologists (like me). I’ve put together a “lay statistician’s guide” to some of the important parameters and factors (and I welcome corrections/additions!). Terms. Case fatality rate or CFR: Deaths as…
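Under the usual definition, the CFR divides deaths by confirmed cases, so anything that undercounts cases inflates it. The Python sketch below is illustrative only. The counts are made up and are not real epidemic data. The 1-in-5 confirmation ratio is an assumption. `fatality_rate_among_infections` is a hypothetical helper for the rate over all infections, confirmed or not.

```python
def case_fatality_rate(deaths: int, confirmed_cases: int) -> float:
    """Deaths as a fraction of confirmed cases (the conventional CFR)."""
    if confirmed_cases <= 0:
        raise ValueError("confirmed_cases must be positive")
    return deaths / confirmed_cases


def fatality_rate_among_infections(deaths: int, estimated_infections: int) -> float:
    """Hypothetical helper: deaths as a fraction of all infections, confirmed or not."""
    if estimated_infections <= 0:
        raise ValueError("estimated_infections must be positive")
    return deaths / estimated_infections


# Made-up counts for illustration only; not real data.
deaths = 50
confirmed = 2_000
estimated_infections = 10_000  # assumption: only 1 in 5 infections is ever confirmed

cfr = case_fatality_rate(deaths, confirmed)
ifr = fatality_rate_among_infections(deaths, estimated_infections)
print(f"CFR: {cfr:.1%}")                    # CFR: 2.5%
print(f"Among all infections: {ifr:.1%}")   # Among all infections: 0.5%
```

The numerator is identical in both calls. Only the denominator changes, and that alone moves the rate fivefold.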
Coronavirus – in Search of the Elusive Denominator
Anyone with internet access these days has their eyes on two constellations of data – the spread of the coronavirus, and the resulting collapse of the financial markets. Following the 13% one-day drop of the stock market a week ago, The Wall Street Journal forecast a quarterly GDP drop of as much as 10%…
Ensemble Learning
In his book, The Wisdom of Crowds, James Surowiecki recounts how Francis Galton, a prominent 19th-century statistician, attended an event at a country fair in England where the object was to guess the weight of an ox. Individual contestants were relatively well informed on the subject (the audience consisted of farmers), but their…
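The crowd effect that ensemble learning borrows can be simulated in a few lines. This Python sketch makes several assumptions, none of them Galton's recorded figures. It uses a hypothetical true weight and crowd size, and it models each guess as unbiased with independent Gaussian noise. It then compares the error of the crowd's median guess with the error of a typical individual.

```python
import random
import statistics

# Hypothetical setup: true weight, crowd size, and noise level are assumptions.
TRUE_WEIGHT = 1_200  # pounds (illustrative)
N_GUESSERS = 800
GUESS_SD = 75        # spread of individual guesses (assumed)

rng = random.Random(42)
# Each contestant is reasonably well informed: an unbiased guess plus independent noise.
guesses = [rng.gauss(TRUE_WEIGHT, GUESS_SD) for _ in range(N_GUESSERS)]

# Aggregate the crowd with the median (the mean behaves similarly here).
crowd_estimate = statistics.median(guesses)
crowd_error = abs(crowd_estimate - TRUE_WEIGHT)

# Error of a "typical" contestant: the median absolute error across individuals.
individual_errors = [abs(g - TRUE_WEIGHT) for g in guesses]
typical_individual_error = statistics.median(individual_errors)

print(f"crowd error: {crowd_error:.1f} lb")
print(f"typical individual error: {typical_individual_error:.1f} lb")
```

The gain depends on the errors being independent. A bias shared by every guesser (or every model in an ensemble) does not average away, which is why ensembles work best with diverse models.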
Big Sample, Unreliable Result
Which would you rather have? A large sample that is biased, or a representative sample that is small? The American Statistical Association committee that reviewed the 1948 Kinsey report on male sexual behavior, based on interviews with over 5000 men, left no doubt of their preference for the latter. The statisticians – William Cochran, Frederick…
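A quick simulation shows why a committee might prefer the small representative sample. The Python sketch below uses a hypothetical population of 100,000 standard-normal scores. It assumes an exaggerated form of self-selection: only people scoring above -0.5 can enter the large sample. It then compares a 5,000-person biased sample with a 100-person simple random sample. None of this models the Kinsey data itself.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population of 100,000 scores with mean near 0.
population = [rng.gauss(0, 1) for _ in range(100_000)]
true_mean = statistics.fmean(population)

# Large but biased sample: assumed self-selection admits only units above -0.5.
eligible = [x for x in population if x > -0.5]
big_biased = rng.sample(eligible, 5_000)

# Small but representative sample: simple random sample from everyone.
small_random = rng.sample(population, 100)

big_error = abs(statistics.fmean(big_biased) - true_mean)
small_error = abs(statistics.fmean(small_random) - true_mean)

print(f"biased n=5000 error: {big_error:.3f}")
print(f"random n=100 error:  {small_error:.3f}")
```

Enlarging the biased sample shrinks its random wobble, but its error stays near the selection bias, about 0.5 in this setup. The small random sample's error, by contrast, shrinks toward zero as it grows.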
Mixed Models – When to Use
Companies now have a lot of data on their customers at an individual level. Suppose you are tasked with forecasting customer spending at a grocery chain, and you want to understand how customer attributes, local economic factors, and store issues affect customer spending. You could design your study with hierarchical and mixed linear modeling methods…
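One core idea behind hierarchical and mixed models is partial pooling. In a random-intercept model, a store with few customers has its mean pulled toward the chain-wide mean more strongly than a store with many customers. The Python sketch below makes several assumptions. The spending figures and store names are hypothetical. The variance components are treated as known, whereas in practice a mixed-model fit (for example, statsmodels' `MixedLM`) would estimate them. The plain grand mean stands in for the fitted fixed intercept, which is a simplification.

```python
import statistics

# Hypothetical per-customer weekly spending (dollars), grouped by store.
spending_by_store = {
    "store_a": [52.0, 61.0, 58.0, 49.0, 55.0, 63.0, 57.0, 60.0],
    "store_b": [70.0, 66.0],  # few customers, so a noisy store mean
    "store_c": [45.0, 48.0, 51.0, 47.0, 50.0],
}

# Assumed variance components; a real mixed-model fit would estimate these.
SIGMA2 = 25.0  # within-store (customer-level) variance
TAU2 = 16.0    # between-store variance of the store intercepts

all_values = [v for vals in spending_by_store.values() for v in vals]
grand_mean = statistics.fmean(all_values)  # simplification: plug-in overall mean


def shrunken_store_mean(values):
    """Partial-pooling estimate of a store's mean under a random-intercept model."""
    n = len(values)
    # More customers -> larger weight on the store's own sample mean.
    weight = TAU2 / (TAU2 + SIGMA2 / n)
    return grand_mean + weight * (statistics.fmean(values) - grand_mean)


for store, values in spending_by_store.items():
    raw = statistics.fmean(values)
    pooled = shrunken_store_mean(values)
    print(f"{store}: raw mean {raw:.2f}, partially pooled {pooled:.2f}")
```

Store B's two-customer mean is shrunk the most. That is the behavior that makes mixed models attractive when some groups are thinly observed.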