Glossaries Archive - Page 25 of 33 - Statistics.com: Data Science, Analytics & Statistics Courses

Autoregressive (AR) Models

Autoregressive (AR) Models: Autoregressive (AR) models are used in time series analysis to describe stationary time series. These models represent time series that are generated by passing white noise through a recursive linear filter. The output of such a filter at the moment is a...
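
The recursive-filter idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `simulate_ar`, its parameters, the Gaussian noise, and the zero starting values are all choices made here, not part of the glossary entry.

```python
import random

def simulate_ar(coeffs, n, sigma=1.0, seed=0):
    """Simulate x[t] = coeffs[0]*x[t-1] + ... + coeffs[p-1]*x[t-p] + e[t]."""
    rng = random.Random(seed)
    p = len(coeffs)
    x = [0.0] * p  # assumption: the filter starts from zeros
    for _ in range(n):
        noise = rng.gauss(0.0, sigma)  # white-noise input (Gaussian by assumption)
        # Recursive step: weighted sum of the p most recent outputs, plus noise.
        past = sum(c * x[-1 - i] for i, c in enumerate(coeffs))
        x.append(past + noise)
    return x[p:]  # drop the artificial starting values

series = simulate_ar([0.6], n=500)  # an AR(1) series with coefficient 0.6
```

With a single coefficient whose absolute value is below 1, the simulated AR(1) series is stationary once the effect of the starting values has died out.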

Association Rules

Association Rules: Association rules are a method of data mining. The idea is to find a statistical association between some items in a large set of items, e.g. items purchased in a supermarket by a customer in one visit. In contrast to deterministic (non-statistical) rules, which are formulated as...
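
As a rough illustration of what "statistical association" between items can mean, the sketch below computes two commonly used measures, support and confidence, on a toy set of shopping baskets. The measures, the function names, and the data are assumptions chosen for this example; the excerpt above does not name them.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)

# Hypothetical baskets, one set of items per supermarket visit.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(baskets, {"bread", "milk"})          # 2 of 4 baskets
rule_confidence = confidence(baskets, {"bread"}, {"milk"})  # 2 of the 3 bread baskets
```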

Average Group Linkage

Average Group Linkage: Average group linkage is a method of calculating distance between clusters in hierarchical cluster analysis. The linkage function specifying the distance between two clusters is computed as the distance between the average values (the mean vectors or centroids) of the two clusters.
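
A minimal sketch of this linkage function, assuming Euclidean distance and clusters given as lists of coordinate tuples (the function names are illustrative):

```python
import math

def centroid(cluster):
    """Mean vector of a cluster given as a list of equal-length tuples."""
    n = len(cluster)
    return [sum(coords) / n for coords in zip(*cluster)]

def average_group_linkage(a, b):
    """Distance between the centroids of clusters a and b (Euclidean by assumption)."""
    return math.dist(centroid(a), centroid(b))

a = [(0.0, 0.0), (2.0, 0.0)]  # centroid (1, 0)
b = [(3.0, 4.0), (5.0, 4.0)]  # centroid (4, 4)
d = average_group_linkage(a, b)  # → 5.0
```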

Average Linkage Clustering

Average Linkage Clustering: Average linkage clustering is a method of calculating distance between clusters in hierarchical cluster analysis. The linkage function specifying the distance between two clusters is computed as the average distance between objects from the first cluster and objects from the second cluster. The averaging is...
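
The sketch below illustrates one common reading of this definition: the average is taken over all pairs formed by one object from each cluster. The Euclidean metric and the function name are assumptions made for the example.

```python
import math

def average_linkage(a, b):
    """Mean distance over all pairs (p, q) with p in cluster a and q in cluster b."""
    total = sum(math.dist(p, q) for p in a for q in b)
    return total / (len(a) * len(b))

a = [(0.0,), (2.0,)]
b = [(5.0,), (7.0,)]
d = average_linkage(a, b)  # pair distances 5, 7, 3, 5, so the mean is 5.0
```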

Bernoulli Distribution (Graphical)

Bernoulli Distribution: A random variable x has a Bernoulli distribution with parameter 0 < p < 1 if P(x = 1) = p and P(x = 0) = 1 - p, where P(A) is the probability of outcome A. The parameter p is often called the "probability of success". For example, a single toss of a coin has a Bernoulli distribution with p = 0.5...
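
To make the coin-toss example concrete, here is a small simulation sketch. The function name `bernoulli` and the coding of "success" as 1 are conventions assumed for this example.

```python
import random

def bernoulli(p, rng):
    """Return 1 ("success") with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0

rng = random.Random(42)
tosses = [bernoulli(0.5, rng) for _ in range(10_000)]
freq = sum(tosses) / len(tosses)  # observed share of successes, close to p = 0.5
```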

Beta Distribution (Graphical)

Beta Distribution: Suppose x1, x2, ... , xn are n independent values of a random variable uniformly distributed within the interval [0,1]. If you sort the values in ascending order, then the k-th value will have a beta distribution with parameters k, n - k + 1. The density of beta distribution is given...
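
The sorting construction can be checked by simulation. The sketch below draws the k-th smallest of n uniform values many times and compares the sample mean with k / (n + 1), which is the mean of a beta distribution with parameters k and n - k + 1 (a standard result). The function name and the choice n = 5, k = 2 are illustrative.

```python
import random

def kth_smallest_uniform(n, k, rng):
    """Draw n uniform values on [0, 1) and return the k-th smallest (k is 1-based)."""
    return sorted(rng.random() for _ in range(n))[k - 1]

rng = random.Random(0)
n, k = 5, 2
draws = [kth_smallest_uniform(n, k, rng) for _ in range(20_000)]

sample_mean = sum(draws) / len(draws)
theoretical_mean = k / (n + 1)  # mean of Beta(k, n - k + 1)
```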

Bias

Bias: A general statistical term meaning a systematic (not random) deviation of an estimate from the true value. A bias of a measurement or a sampling procedure may pose a more serious problem for a researcher than random errors because it cannot be reduced by simply increasing the sample size....

Bonferroni Adjustment (Graphical)

Bonferroni Adjustment: Bonferroni adjustment is used in multiple comparison procedures to calculate an adjusted probability of comparison-wise type I error from the desired probability of family-wise type I error. The calculation guarantees that the use of the adjusted probability in pairwise comparisons keeps the actual probability of family-wise type I errors...
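
In its usual form, the adjustment divides the desired family-wise error probability by the number of comparisons. A minimal sketch, with illustrative function names and an assumed example of four groups compared pairwise:

```python
def n_pairwise(k):
    """Number of distinct pairwise comparisons among k groups."""
    return k * (k - 1) // 2

def bonferroni_alpha(family_alpha, n_comparisons):
    """Comparison-wise type I error level under the Bonferroni adjustment."""
    return family_alpha / n_comparisons

m = n_pairwise(4)                         # 6 comparisons among 4 groups
adjusted_alpha = bonferroni_alpha(0.05, m)  # each test is run at level 0.05 / 6
```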

Calibration Sample

Calibration Sample: The calibration sample is the subset of the data available to a data mining routine used as the training set.

Classification and Regression Trees (CART)

Classification and Regression Trees (CART): Classification and regression trees (CART) are a set of techniques for classification and prediction. The technique is aimed at producing rules that predict the value of an outcome (target) variable from known values of predictor (explanatory) variables. The predictor variables may be a mixture of...

White Hat Bias

White Hat Bias is bias leading to distortion in, or selective presentation of, data that is considered by investigators or reviewers to be acceptable because it is in the service of righteous goals. The term was coined by Cope and Allison in 2009, and is exemplified by the view of...

Natural Language

Natural Language: A natural language is what most people outside the field of computer science think of as just a language (Spanish, English, etc.). The term "natural" simply signifies that the reference is not to a programming language (C++, Java, etc.). The context is usually "natural language processing (NLP)" or...

Tokenization

Tokenization: In processing unstructured text, tokenization is the step by which the character string in a text segment is turned into units - tokens - for further analysis. Ideally, those tokens would be words, but numbers and other characters can also count as tokens. A big challenge in tokenization is...
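
A deliberately naive tokenizer shows both the basic step and why it is hard. The regular expression and the sample sentence are assumptions made for this sketch, not a recommended production rule.

```python
import re

def tokenize(text):
    """Split text into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

tokens = tokenize("Dr. Smith paid $3.50.")
# → ['Dr', '.', 'Smith', 'paid', '$', '3', '.', '50', '.']
```

The abbreviation "Dr." and the amount "$3.50" are each broken into several tokens, which is the kind of case a practical tokenizer has to handle with extra rules.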

Z score (Graphical)

Z score: An observation's z-score tells you the number of standard deviations it lies away from the population mean (and in which direction). The calculation is as follows: z = (x - μ) / σ, where x is the observation itself, μ is the mean of the distribution, and σ is the standard deviation of the distribution.
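
As a short worked example, the sketch below computes a z-score for a hypothetical observation of 130 from a distribution with mean 100 and standard deviation 15 (the numbers and the function name are illustrative).

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_above = z_score(130.0, 100.0, 15.0)  # → 2.0, two standard deviations above
z_below = z_score(85.0, 100.0, 15.0)   # → -1.0, one standard deviation below
```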

Weighted Mean (Calculation)

Weighted Mean (Calculation): To simplify calculation of the weighted mean, weights are often standardized to make their sum equal to the unit value, i.e. by dividing every weight by the total sum of all weights: w'_i = w_i / (w_1 + ... + w_n). Then, the weighted mean is computed using weights w'_i, standardized according to...

Weighted Mean

Weighted Mean: The weighted mean is a measure of central tendency. The weighted mean of a set of values x_1, ..., x_n is computed according to the following formula: (w_1 x_1 + ... + w_n x_n) / (w_1 + ... + w_n), where w_1, ..., w_n are non-negative coefficients, called "weights", that are ascribed to the corresponding values x_1, ..., x_n. Only the relative values of the weights...
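
The sketch below computes a weighted mean directly and again with weights standardized to sum to one, as described in the Weighted Mean (Calculation) entry; both routes give the same result. Function names and data are illustrative.

```python
def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

def standardize(weights):
    """Rescale the weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

values = [2.0, 4.0, 10.0]
weights = [1.0, 1.0, 2.0]

direct = weighted_mean(values, weights)            # (2 + 4 + 20) / 4 = 6.5
unit_weights = standardize(weights)                # [0.25, 0.25, 0.5]
via_unit = sum(w * x for x, w in zip(values, unit_weights))  # also 6.5
```

Doubling every weight leaves the result unchanged, since only the relative sizes of the weights enter the formula.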

White Noise

White Noise: White noise is a stationary time series or a stationary random process with zero autocorrelation. In other words, in white noise any pair of values x(t1) and x(t2) taken at different moments t1 and t2 of time are not correlated - i.e. the correlation coefficient is equal to zero. The white...
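
The zero-autocorrelation property can be checked on simulated data. The sketch below assumes Gaussian white noise and uses a simple sample autocorrelation estimate; the function name, the series length, and the seed are illustrative.

```python
import random

def autocorr(x, lag):
    """Sample autocorrelation of the series x at the given positive lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var

rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # independent draws

r1 = autocorr(noise, 1)  # near zero, up to sampling error
r5 = autocorr(noise, 5)  # near zero as well
```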

Ward's Linkage

Ward's Linkage: Ward's linkage is a method for hierarchical cluster analysis. The idea has much in common with analysis of variance (ANOVA). The linkage function specifying the distance between two clusters is computed as the increase in the "error sum of squares" (ESS) after fusing two clusters into a...
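
A minimal sketch of the ESS-increase calculation, assuming ESS means the sum of squared Euclidean deviations of a cluster's points from its centroid, and clusters given as lists of coordinate tuples (function names are illustrative):

```python
def ess(cluster):
    """Error sum of squares: squared deviations of the points from their centroid."""
    n = len(cluster)
    center = [sum(coords) / n for coords in zip(*cluster)]
    return sum(
        sum((v - m) ** 2 for v, m in zip(point, center))
        for point in cluster
    )

def ward_distance(a, b):
    """Increase in ESS caused by fusing clusters a and b into one cluster."""
    return ess(a + b) - ess(a) - ess(b)

a = [(0.0,), (2.0,)]  # ESS = 2
b = [(4.0,), (6.0,)]  # ESS = 2
d = ward_distance(a, b)  # merged ESS is 20, so the increase is 16.0
```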

Variate

Variate: The term "variate" is often used as a synonym for "variable". Some definitions require that variate values be numeric. Sometimes "variate" is used as a synonym for "a value of the given variable for a particular element of the sample" - e.g. sex is a variable, its value, say, male...

Variable-Selection Procedures (Graphical)

Variable-Selection Procedures: In regression analysis, variable-selection procedures are aimed at selecting a reduced set of the independent variables - the ones providing the best fit to the model. The criterion for selecting is usually the following F-statistic: where n is the total number of data points, SSE is the sum...
