Blog

Historical Spotlight: Risk Simulation – Since 1946

Simulation – a Venerable History

One of the most consequential and valuable analytical tools in business is simulation, which helps us make decisions in the face of uncertainty, such as these: An airline knows, on average, what proportion of ticketed passengers show up for a flight, but the number for any given flight is uncertain. Continue reading “Historical Spotlight: Risk Simulation – Since 1946”

VARIANCE

It is 100 years since R. A. Fisher introduced the concept of “variance” (in his 1918 paper The Correlation Between Relatives on the Supposition of Mendelian Inheritance).

“out-of-bag,” as in “out-of-bag error”

“Bag” refers to “bootstrap aggregating”: repeatedly drawing bootstrap samples from a dataset and aggregating the results of statistical models applied to those samples. (A bootstrap sample is a resample drawn with replacement.)
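To make the idea concrete, here is a minimal sketch of bagging with an out-of-bag error estimate. It assumes a deliberately trivial "model" (the mean of each bootstrap sample) and made-up data; the function name `bagged_mean_with_oob` and its parameters are hypothetical, chosen only for illustration. Records that a given bootstrap sample happens to leave out are "out of bag" for that model, so they can be used to test it.

```python
import random

def bagged_mean_with_oob(y, n_bags=200, seed=0):
    """Bag a trivial mean-predictor and estimate its out-of-bag squared error.

    Hypothetical illustration: the "model" fitted to each bootstrap sample
    is just the sample mean, which then serves as the prediction.
    """
    rng = random.Random(seed)
    n = len(y)
    oob_preds = [[] for _ in range(n)]  # predictions from models that never saw record i
    bag_preds = []

    for _ in range(n_bags):
        # Bootstrap sample: n row indices drawn randomly, with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(idx)
        pred = sum(y[i] for i in idx) / n  # "fit" the model on the resample
        bag_preds.append(pred)
        # Records not drawn into this sample are out of bag for this model.
        for i in range(n):
            if i not in in_bag:
                oob_preds[i].append(pred)

    # Aggregate: average the predictions of all the bagged models.
    bagged = sum(bag_preds) / n_bags

    # Out-of-bag error: score each record only with models that did not train on it.
    sq_errs = [(y[i] - sum(p) / len(p)) ** 2 for i, p in enumerate(oob_preds) if p]
    oob_mse = sum(sq_errs) / len(sq_errs)
    return bagged, oob_mse

y = [12.0, 15.5, 9.8, 14.1, 11.3, 13.7, 10.2, 16.4]  # made-up values
bagged, oob_mse = bagged_mean_with_oob(y)
print(f"bagged prediction: {bagged:.2f}, out-of-bag MSE: {oob_mse:.2f}")
```

In practice the mean would be replaced by a real model such as a decision tree, but the bookkeeping (draw, fit, record which rows were left out, score on those rows) stays the same.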

BOOTSTRAP

I used the term in my message about bagging and several people asked for a review of the bootstrap. Put simply, to bootstrap a dataset is to draw a resample from the data, randomly and with replacement.
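As a rough sketch of what that looks like in code, the snippet below draws bootstrap resamples from a small made-up dataset and records the mean of each one. The function names, the data values, and the choice of the mean as the statistic are all assumptions made for illustration.

```python
import random

def bootstrap_sample(data, rng):
    """Draw one resample the same size as data, randomly and with replacement."""
    return rng.choices(data, k=len(data))

def bootstrap_means(data, n_resamples, seed=0):
    """Return the mean of each of n_resamples bootstrap resamples."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        sample = bootstrap_sample(data, rng)
        means.append(sum(sample) / len(sample))
    return means

data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7]  # made-up values
means = bootstrap_means(data, n_resamples=1000)

# The spread of the resampled means gives a sense of how much the
# sample mean might vary from one sample to another.
print(f"lowest resampled mean: {min(means):.2f}, highest: {max(means):.2f}")
```

Because the draw is with replacement, some original values appear more than once in a given resample and others not at all.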

Same thing, different terms...

The field of data science is rife with terminology anomalies, arising from the fact that the field comes from multiple disciplines.

100 years of variance

It is 100 years since R. A. Fisher introduced the concept of “variance” (in his 1918 paper “The Correlation Between Relatives on the Supposition of Mendelian Inheritance”). There is much that statistics has given us in the century that followed. Randomized clinical trials, and the means to analyze them, moved medicine fully into the modern, science-based era. Continue reading “100 years of variance”

Early Data Scientists

Casting back for the “founding fathers” of data science, long before the advent of deep learning, you would at first glance rule out antecedents who long predate the computer and data revolutions of the last quarter century. But some consider John Tukey, the Princeton statistician who named and developed the field of “exploratory data analysis,”... Continue reading “Early Data Scientists”

Python for Analytics

Python started out as a general-purpose language when it was created in 1991 by Guido van Rossum. It was embraced early on by Google founders Sergey Brin and Larry Page (“Python where we can, C++ where we must” was reputedly their mantra). In 2006, van Rossum went to work at Google, where he... Continue reading “Python for Analytics”

Course Spotlight: Deep Learning

Deep learning is essentially “neural networks on steroids,” and it lies at the core of the most intriguing and powerful applications of artificial intelligence. Facial recognition (which you encounter daily on Facebook and other social media) harnesses many levels of data science tools, including algorithms that compare images and match those with similar measurements between... Continue reading “Course Spotlight: Deep Learning”

Course Spotlight: Structural Equation Modelling (SEM)

SEM stands for “structural equation modeling,” and we are fortunate to have Prof. Randall Schumacker teaching this subject at Statistics.com. Randy created the Structural Equation Modeling (SEM) journal in 1994 and the Structural Equation Modeling Special Interest Group (SIG) at the American Educational Research Association (AERA). He has also co-authored several books, including: A Beginner’s... Continue reading “Course Spotlight: Structural Equation Modelling (SEM)”

Benford’s Law Applies to Online Social Networks

Fake social media accounts and Russian meddling in US elections have been in the news lately, with Mark Zuckerberg (Facebook founder) testifying this week before the US Congress. Dr. Jen Golbeck, who teaches Network Analysis at Statistics.com, published an ingenious way to determine whether a Facebook, Twitter or other social media account is fraudulent. Her... Continue reading “Benford’s Law Applies to Online Social Networks”

The Real Facebook Controversy

Cambridge Analytica’s wholesale scraping of Facebook user data is big news now, and people are shocked that personal data is being shared and traded on a massive scale on the internet. But the real issue with social media is not harm to individual users whose information was shared, but sophisticated and sometimes subtle mass manipulation... Continue reading “The Real Facebook Controversy”

Course Spotlight: Two statistical modeling courses

Two important statistical modeling courses are coming up in May:

May 18 – Jun 15: Principal Components and Factor Analysis
May 18 – Jun 15: Modeling Count Data

Factor analysis is used frequently in social science research where you want to examine that which you cannot observe (latent variables) using data that you can... Continue reading “Course Spotlight: Two statistical modeling courses”

Masters Programs versus an Online Certificate in Data Science from Statistics.com

We just attended the INFORMS (Institute for Operations Research and the Management Sciences) analytics conference this week in Baltimore, where a special meeting was held for directors of academic analytics programs to better align what universities are producing with what industry is seeking. The number of such programs is still growing rapidly (>200),... Continue reading “Masters Programs versus an Online Certificate in Data Science from Statistics.com”

Course Spotlight: Likert scale assessment surveys

Do you work with multiple-choice tests, or Likert scale assessment surveys? Rasch methods help you construct linear measures from these forms of scored observations and analyze the results from such surveys and tests.

“Practical Rasch Measurement – Core Topics”

In this course, you will learn practical aspects of data setup, analysis, output interpretation, fit... Continue reading “Course Spotlight: Likert scale assessment surveys”

Course Spotlight: Customer Analytics in R

“The customer is always right” was the motto Selfridge’s department store coined in 1909. “We’ll tell the customer what they want” was Madison Avenue’s mantra starting in the 1950s. Now data scientists like Karolis Urbonas help companies like Amazon (where he works in Europe as Head of Data Science, Amazon Devices) use data to figure... Continue reading “Course Spotlight: Customer Analytics in R”

Course Spotlight: Predictive Analytics

Predicting whether an internet user will click on a link or buy a product, whether an insurance claim is fraudulent, whether a home mortgage will be paid on time (or early), how much a house will sell for, what internet ad you should see next, whether a discharged patient will need to return to the... Continue reading “Course Spotlight: Predictive Analytics”

Course Spotlight: Text Mining

The term text mining is sometimes used with two different meanings in computational statistics:

Using predictive modeling to label many documents (e.g. legal docs might be “relevant” or “not relevant”) – this is what we call text mining.
Using grammar and syntax to parse the meaning of individual documents – we use the term natural... Continue reading “Course Spotlight: Text Mining”

CONVOLUTION and TENSOR

Today’s Words of the Week are convolution and tensor, key components of deep learning.