
Validity

Validity: Validity characterises the extent to which a measurement procedure actually measures what it is supposed to measure. Normally, the term "validity" is used in situations where measurement is indirect and cannot be precise in principle, e.g. in psychological IQ tests purporting to measure intellect. (In direct...


Validation Sample

Validation Sample: The validation sample is the subset of the data available to a data mining routine that is used as the validation set.


Validation Set

Validation Set: A validation set is a portion of a data set used in data mining to assess the performance of prediction or classification models that have been fit on a separate portion of the same data set (the training set). Typically both the training and validation sets are...


Uniform Distribution

Uniform Distribution: The uniform distribution describes the probabilistic properties of a continuous random variable that is equally likely to take any value within an interval, and never takes on values outside this interval. The uniform distribution is characterised by two parameters - the lower and the upper boundaries of the...
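
A minimal sketch in plain Python (the interval endpoints 2.0 and 5.0 are arbitrary illustration values): draws from a uniform distribution always land inside the interval, and their average settles near the midpoint (a + b) / 2.

```python
import random

a, b = 2.0, 5.0                      # lower and upper boundaries (example values)
density = 1.0 / (b - a)              # the density is constant inside [a, b], zero outside

samples = [random.uniform(a, b) for _ in range(10_000)]

# Every draw falls inside the interval, never outside it.
assert a <= min(samples) and max(samples) <= b

# The sample mean is close to the theoretical mean (a + b) / 2 = 3.5.
print(sum(samples) / len(samples))
```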


t-statistic (Graphical)

t-statistic: A t-statistic is a statistic whose sampling distribution is a t-distribution. Often, the term "t-statistic" is used in a narrower sense - as the standardized difference between a sample mean m and a population mean μ: t = (m − μ) / (s / √N), where N is the sample size, and m and s are the mean and the standard deviation...
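
The narrower sense of the term can be sketched in plain Python (the six measurements and the claimed population mean of 100 are hypothetical):

```python
import math

def t_statistic(sample, mu):
    """Standardized difference between the sample mean and a population mean mu."""
    n = len(sample)
    mean = sum(sample) / n
    # Sample standard deviation (n - 1 in the denominator).
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return (mean - mu) / (s / math.sqrt(n))

# Hypothetical measurements, tested against a claimed population mean of 100.
data = [102, 98, 105, 97, 103, 101]
print(round(t_statistic(data, 100), 3))   # 0.808
```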


Differencing (of Time Series)

Differencing (of Time Series): Differencing of a time series in discrete time is the transformation of the series to a new time series whose values are the differences between consecutive values of the original series. This procedure may be applied consecutively more than once, giving rise to the "first differences", "second...
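
A minimal sketch with a hypothetical series: applying the transformation once gives the first differences, applying it again gives the second differences.

```python
def difference(series):
    """First differences: each value minus the previous value."""
    return [b - a for a, b in zip(series, series[1:])]

series = [3, 5, 9, 11, 16]           # hypothetical time series
first = difference(series)           # first differences: [2, 4, 2, 5]
second = difference(first)           # second differences: [2, -2, 3]
print(first, second)
```

Note that each pass shortens the series by one value, since the first observation has no predecessor to subtract.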


Test-Retest Reliability

Test-Retest Reliability: The test-retest reliability of a survey instrument, like a psychological test, is estimated by administering the same survey to the same respondents at different points in time. The closer the results, the greater the test-retest reliability of the survey instrument. The correlation coefficient between the two sets of...
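
A minimal sketch, assuming hypothetical scores for five respondents who took the same test twice: the correlation coefficient between the two administrations serves as the reliability estimate.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores of five respondents on two administrations of the same test.
first_run  = [12, 15, 11, 18, 14]
second_run = [13, 14, 12, 17, 15]

reliability = pearson(first_run, second_run)
assert 0.8 < reliability <= 1.0   # close results -> high test-retest reliability
```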


Negative Binomial

Negative Binomial: The negative binomial distribution is the probability distribution of the number of Bernoulli (yes/no) trials required to obtain r successes. Contrast it with the binomial distribution - the probability of x successes in n trials. Also with the Poisson distribution - the probability distribution of the number of...
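
The "trials until r successes" definition can be sketched by simulation (the parameter values p = 0.5 and r = 3 are arbitrary illustration choices):

```python
import random

def trials_until_r_successes(p, r, rng):
    """Number of Bernoulli(p) trials needed to obtain r successes."""
    trials = successes = 0
    while successes < r:
        trials += 1
        if rng.random() < p:
            successes += 1
    return trials

rng = random.Random(0)               # fixed seed so the sketch is reproducible
draws = [trials_until_r_successes(p=0.5, r=3, rng=rng) for _ in range(20_000)]

assert min(draws) >= 3               # at least r trials are always needed
mean = sum(draws) / len(draws)       # theoretical mean is r / p = 6
assert abs(mean - 6) < 0.1
```

Contrast this with the binomial setup, where the number of trials n is fixed and the number of successes is random; here the number of successes r is fixed and the number of trials is random.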


Trimmed Mean

Trimmed Mean: The trimmed mean is a family of measures of central tendency. The p%-trimmed mean of a set of values is computed by sorting all the values, discarding p% of the smallest and p% of the largest values, and computing the mean of the remaining values. For example, to...
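
A minimal sketch with ten hypothetical values, one of them an outlier: trimming 10% from each end drops the extremes before averaging.

```python
def trimmed_mean(values, percent):
    """Mean after discarding `percent`% of the smallest and largest values."""
    k = int(len(values) * percent / 100)     # values cut from each end
    kept = sorted(values)[k:len(values) - k]
    return sum(kept) / len(kept)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]      # 100 is an outlier
print(trimmed_mean(data, 10))                # drops 1 and 100: 5.5
print(trimmed_mean(data, 0))                 # ordinary mean: 14.5
```

The trimmed mean is much less affected by the outlier than the ordinary mean, which is the usual motivation for trimming.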


Triangular Filter

Triangular Filter: The triangular filter is a linear filter that is usually used as a smoother. The output of the triangular filter at a given moment is a weighted mean of the input values at the adjacent moments of discrete time. In contrast to the rectangular filter...
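
A minimal sketch using the simplest triangular weights (1, 2, 1) / 4, an assumption for illustration; wider triangular windows work the same way:

```python
def triangular_filter(x):
    """Smooth a series with triangular weights (1, 2, 1) / 4.

    Unlike a rectangular (equal-weight) filter, values nearer the current
    moment get more weight. Endpoints are omitted because they lack a
    neighbour on one side.
    """
    return [(x[i - 1] + 2 * x[i] + x[i + 1]) / 4 for i in range(1, len(x) - 1)]

noisy = [1.0, 5.0, 1.0, 5.0, 1.0]    # hypothetical noisy input
print(triangular_filter(noisy))      # [3.0, 3.0, 3.0]
```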


Training Set

Training Set: A training set is a portion of a data set used to fit (train) a model for prediction or classification of values that are known in the training set, but unknown in other (future) data. The training set is used in conjunction with validation and/or test sets that...


Systematic Error

Systematic Error: Systematic error is the error that is constant in a series of repetitions of the same experiment or observation. Usually, systematic error is defined as the expected value of the overall error. An example of systematic error is an electronic scale that, if loaded with a...


t-distribution (Graphical)

t-distribution: A continuous distribution with a single-peaked probability density, symmetric around zero, and a bell-curve shape. The t-distribution is specified completely by one parameter - the number of degrees of freedom. If X and Y are independent random variables, X has the standard normal distribution and Y a chi-square...
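
The construction from a standard normal X and a chi-square Y can be sketched by simulation (the choice of 5 degrees of freedom is arbitrary): X / sqrt(Y / df) is t-distributed, and the simulated draws show the symmetry around zero.

```python
import random

def t_draw(df, rng):
    """One t-distributed draw via the definition X / sqrt(Y / df), where X is
    standard normal and Y is chi-square with df degrees of freedom
    (a sum of df squared standard normals)."""
    x = rng.gauss(0, 1)
    y = sum(rng.gauss(0, 1) ** 2 for _ in range(df))
    return x / (y / df) ** 0.5

rng = random.Random(1)               # fixed seed for reproducibility
draws = [t_draw(5, rng) for _ in range(20_000)]

# Symmetric around zero: roughly half the draws are negative.
share_negative = sum(d < 0 for d in draws) / len(draws)
assert abs(share_negative - 0.5) < 0.02
```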


Test Set

Test Set: A test set is a portion of a data set used in data mining to assess the likely future performance of a single prediction or classification model that has been selected from among competing models, based on its performance with the validation set. While the validation set provides...
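
The three-way division described in the Training Set, Validation Set and Test Set entries can be sketched as follows; the 60/20/20 proportions are a common choice, not a rule, and the integer "rows" stand in for real data records.

```python
import random

rng = random.Random(42)
rows = list(range(100))              # stand-in for 100 data records
rng.shuffle(rows)                    # shuffle before splitting

train      = rows[:60]               # fit candidate models
validation = rows[60:80]             # compare models, pick the best one
test       = rows[80:]               # assess the chosen model's future performance

# The three portions cover the data without overlapping.
assert len(train) + len(validation) + len(test) == len(rows)
assert set(train).isdisjoint(validation) and set(train).isdisjoint(test)
assert set(validation).isdisjoint(test)
```

Keeping the test set untouched until a single model has been selected is what makes its performance estimate an honest forecast of future performance.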


Survey

Survey: Statistical surveys are general methods of gathering quantitative information about a particular population. "Population" here does not necessarily mean a set of human beings; it may consist of other types of units - firms, households, universities, hospitals, etc. While there are many types and forms of surveys, they have one...


Sufficient Statistic (Graphical)

Sufficient Statistic: Suppose X is a random vector with probability distribution (or density) P(X | V), where V is a vector of parameters, and Xo is a realization of X. A statistic T(X) is called a sufficient statistic if the conditional probability (density) of X given T(X) does not depend upon V for any...
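
A small numeric illustration (a standard textbook example, not from this glossary entry): for independent Bernoulli(p) observations, the sum of the observations is sufficient for p. Given the sum s, every arrangement of the 0s and 1s is equally likely, 1 / C(n, s), whatever p is.

```python
from itertools import product
from math import comb

def conditional_prob(x, p):
    """P(X = x | sum(X) = s) for a vector of independent Bernoulli(p) draws."""
    n, s = len(x), sum(x)
    px = p ** s * (1 - p) ** (n - s)                 # P(X = x)
    psum = sum(p ** sum(v) * (1 - p) ** (n - sum(v))
               for v in product([0, 1], repeat=n)
               if sum(v) == s)                       # P(sum(X) = s)
    return px / psum

x = (1, 0, 1, 0)                                     # n = 4 observations, s = 2
# The conditional probability is 1 / C(4, 2) = 1/6 for every value of p,
# so the parameter drops out: the sum is sufficient for p.
for p in (0.2, 0.5, 0.9):
    assert abs(conditional_prob(x, p) - 1 / comb(4, 2)) < 1e-12
```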


Split-Halves Method

Split-Halves Method: In psychometric surveys, the split-halves method is used to measure the internal consistency reliability of survey instruments, e.g. psychological tests. The idea is to split the items (questions) related to the same construct to be measured, e.g. the anxiety level, into two halves, and to compare the results obtained...
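
A minimal sketch, assuming hypothetical item scores for four respondents on six items measuring one construct, with an odd/even split of the items (one common splitting convention among several):

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x)
                           * sum((b - my) ** 2 for b in y))

# Hypothetical scores: 4 respondents x 6 items measuring the same construct.
answers = [
    [4, 5, 4, 4, 5, 4],
    [2, 1, 2, 2, 1, 1],
    [3, 3, 4, 3, 3, 4],
    [5, 4, 5, 5, 5, 4],
]

# Split the items into two halves (odd- vs even-numbered) and score each half.
half_a = [sum(row[0::2]) for row in answers]
half_b = [sum(row[1::2]) for row in answers]

consistency = pearson(half_a, half_b)   # high value -> internally consistent
assert consistency > 0.9
```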


Standard error

Standard error: The standard error measures the variability of an estimator (or sample statistic) from sample to sample. There are two approaches to estimating standard error: 1. The bootstrap. With the bootstrap, you take repeated simulated samples (usually resamples from the observed data, of the same size as the original...
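
The bootstrap approach can be sketched with hypothetical data: resample the observed values with replacement, recompute the statistic (here, the mean) each time, and take the spread of those recomputed values as the standard error. For the mean it should roughly agree with the textbook formula s / sqrt(n).

```python
import math
import random

rng = random.Random(7)               # fixed seed for reproducibility
observed = [12.1, 9.8, 11.4, 10.2, 13.0, 9.5, 11.8, 10.9, 12.4, 10.1]

# Bootstrap: repeated resamples from the observed data, with replacement,
# each of the same size as the original sample.
boot_means = []
for _ in range(5_000):
    resample = [rng.choice(observed) for _ in range(len(observed))]
    boot_means.append(sum(resample) / len(resample))

grand = sum(boot_means) / len(boot_means)
se_boot = math.sqrt(sum((m - grand) ** 2 for m in boot_means) / len(boot_means))

# Compare with the formula s / sqrt(n):
n = len(observed)
mean = sum(observed) / n
s = math.sqrt(sum((x - mean) ** 2 for x in observed) / (n - 1))
se_formula = s / math.sqrt(n)

assert abs(se_boot - se_formula) < 0.15   # the two estimates roughly agree
```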


Spline

Spline: A spline is a continuous function which coincides with a polynomial on every subinterval of the whole interval on which it is defined. In other words, splines are functions which are piecewise polynomial. The coefficients of the polynomial differ from interval to interval, but the order of the polynomial is...
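
A minimal sketch of the idea (the two quadratic pieces and the knot at t = 1 are arbitrary illustration choices): different coefficients on each subinterval, the same order throughout, and the pieces meeting at the knot so the function is continuous.

```python
def spline(t):
    """A piecewise-quadratic function on [0, 2] with a knot at t = 1.

    The coefficients change at the knot, but both pieces are quadratics,
    and they agree at the knot, so the function is continuous.
    """
    if t <= 1:
        return t ** 2                               # piece on [0, 1]
    return 1 + 2 * (t - 1) + 3 * (t - 1) ** 2       # piece on (1, 2]

# Both pieces give the same value at the knot, so the spline is continuous.
left  = spline(1.0)
right = 1 + 2 * (1.0 - 1) + 3 * (1.0 - 1) ** 2
assert left == right == 1.0
```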
