Word of the Week Archives - Statistics.com: Data Science, Analytics & Statistics Courses

Week #8 – Confusion matrix

In a classification model, the confusion matrix shows the counts of correct and erroneous classifications. In a binary classification problem, the matrix consists of 4 cells.
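As a minimal sketch of the four cells in the binary case, the snippet below tallies true positives, false positives, false negatives, and true negatives from two lists of labels. The function name `confusion_matrix`, the choice of `1` as the positive class, and the sample labels are assumptions made for illustration only.

```python
from collections import Counter


def confusion_matrix(actual, predicted, positive=1):
    """Count the four cells of a binary confusion matrix."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: correct (TP) or erroneous (FP).
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted negative: erroneous (FN) or correct (TN).
            counts["FN" if a == positive else "TN"] += 1
    return {cell: counts[cell] for cell in ("TP", "FP", "FN", "TN")}


# Illustrative labels (hypothetical data, not from the article).
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
cm = confusion_matrix(actual, predicted)
print(cm)
```

The correct classifications are the TP and TN cells; the erroneous ones are FP and FN.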

Week #5 – Features vs. Variables

The predictors in a predictive model are sometimes given different terms by different disciplines. Traditional statisticians think in terms of variables.

Week #48 – Structured vs. unstructured data

Structured data is data that is in a form that can be used to develop statistical or machine learning models (typically a matrix where rows are records and columns are variables or features).

Week #39 – Censoring

Censoring in time-series data occurs when some event causes subjects to cease producing data for reasons beyond the control of the investigator, or for reasons external to the issue being studied.

Week #32 – Predictive modeling

Predictive modeling is the process of using a statistical or machine learning model to predict the value of a target variable (e.g. default or no-default) on the basis of a series of predictor variables (e.g. income, house value, outstanding debt, etc.).

Week #29 – Goodness-of-fit

Goodness-of-fit measures the difference between an observed frequency distribution and a theoretical probability distribution which
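One common way to quantify this kind of difference is Pearson's chi-square statistic, which sums the squared gaps between observed and expected counts, each scaled by the expected count. The sketch below is only an illustration under that assumption; the entry above does not say which measure it has in mind, and the die-roll counts are hypothetical.

```python
def chi_square_statistic(observed, expected_probs):
    """Pearson chi-square statistic for observed counts vs. a theoretical distribution."""
    n = sum(observed)
    # Expected count in each category under the theoretical distribution.
    expected = [n * p for p in expected_probs]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 120 rolls of a die, compared with a fair-die distribution.
observed = [18, 22, 20, 25, 15, 20]
stat = chi_square_statistic(observed, [1 / 6] * 6)
print(round(stat, 3))
```

A larger statistic indicates a larger discrepancy between the observed frequencies and the theoretical distribution.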

Week #23 – Adjacency Matrix

An adjacency matrix describes the relationships in a network. Nodes are listed in the top…
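A small sketch may help make the idea concrete. It builds an adjacency matrix for an undirected network from a list of edges, with a 1 marking a link between two nodes and a 0 marking no link. The node names, the edge list, and the function name `adjacency_matrix` are assumptions chosen for illustration.

```python
def adjacency_matrix(nodes, edges):
    """Build a 0/1 adjacency matrix for an undirected network."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for a, b in edges:
        # Undirected link: mark both (a, b) and (b, a).
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1
    return matrix


# Hypothetical four-node network.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("C", "D")]
m = adjacency_matrix(nodes, edges)
for node, row in zip(nodes, m):
    print(node, row)
```

Row i and column j refer to the same ordering of nodes, so the matrix for an undirected network is symmetric.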

Week #51 – Type I error

In a test of significance (also called a hypothesis test), Type I error is the error of rejecting the null hypothesis when it is true — of saying an effect or event is statistically significant when it is not.
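A simulation can illustrate the definition. The sketch below repeatedly flips a fair coin (so the null hypothesis of fairness is true), tests each sample with a two-sided z-test based on the normal approximation, and reports how often the null is wrongly rejected. The number of trials, the sample size, the critical value 1.96 (roughly a 5% significance level), and the seed are all assumptions for illustration.

```python
import random


def type_i_error_rate(trials=2000, flips=100, z_crit=1.96, seed=0):
    """Estimate how often a true null hypothesis (fair coin) is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        heads = sum(rng.random() < 0.5 for _ in range(flips))
        # z-score of the head count under the null: mean n/2, sd sqrt(n/4).
        z = (heads - flips * 0.5) / (flips * 0.25) ** 0.5
        if abs(z) > z_crit:
            rejections += 1  # A Type I error: the null is true but rejected.
    return rejections / trials


rate = type_i_error_rate()
print(rate)
```

The estimated rate should land near the chosen significance level, although it will not match it exactly because head counts are discrete and the simulation is finite.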

Week #49 – Data partitioning

Data partitioning in data mining is the division of the whole data available into two or three non-overlapping sets: the training set (used to fit the model), the validation set (used to compare models), and the test set (used to predict performance on new data).
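The three-way split can be sketched as a random shuffle followed by slicing. The 60/20/20 proportions, the seed, and the function name `partition` below are assumptions for illustration; the entry does not prescribe particular proportions.

```python
import random


def partition(records, train_frac=0.6, valid_frac=0.2, seed=0):
    """Randomly split records into non-overlapping training, validation, and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # Whatever remains becomes the test set.
    return train, valid, test


# Hypothetical data set of 100 record IDs.
train, valid, test = partition(range(100))
print(len(train), len(valid), len(test))
```

Because each record lands in exactly one slice of the shuffled list, the three sets cannot overlap.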

Week #43 – Longitudinal data

Longitudinal data records multiple observations over time for a set of individuals or units. A typical…

Week #42 – Cross-sectional data

Cross-sectional data refer to observations of many different individuals (subjects, objects) at a given time, each observation belonging to a different individual. A simple…

Week #32 – CHAID

CHAID stands for Chi-squared Automatic Interaction Detector. It is a method for building classification trees and regression trees from a training sample comprising already-classified objects.

Week #29 – Training data

Also called the training sample, training set, calibration sample. The context is predictive modeling (also called supervised data mining) – where you have data with multiple predictor variables and a single known outcome or target variable.

Week #18 – Centroid

The centroid is a measure of center in multi-dimensional space.
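For a set of points, the centroid is the mean taken separately along each dimension. A minimal sketch, using hypothetical three-dimensional points:

```python
def centroid(points):
    """Return the coordinate-wise mean of a collection of equal-length points."""
    n = len(points)
    # zip(*points) groups the values of each dimension together.
    return tuple(sum(coords) / n for coords in zip(*points))


# Hypothetical points in three-dimensional space.
points = [(1.0, 2.0, 3.0), (3.0, 4.0, 5.0), (5.0, 0.0, 1.0)]
c = centroid(points)
print(c)  # (3.0, 2.0, 3.0)
```

In one dimension this reduces to the ordinary arithmetic mean.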

2013 – The International Year of Statistics

Promoting better understanding of statistics throughout the world.

Congratulations to Michelle Everson!

New Editor of Journal of Statistics Education

Airline Passenger Screening Can Be Random

Read Peter’s Letter to the Editor in Saturday’s Washington Post.

Churn Trigger

Last year’s popular story out of the Predictive Analytics World conference series was Andrew Pole’s presentation of Target’s methodology for predicting which customers were pregnant.

Randomized Trials on online learning

Evidence shows that there is no significant difference between taking an online introductory statistics course and a traditional in-person class.

Facebook IPO

Facebook began trading around 11:30 this morning, and I spent 8 minutes