Exponential Filter: The exponential filter is the simplest linear recursive filter. Exponential filters are widely used in time series analysis, especially for forecasting time series (see the short course Time Series Forecasting). The exponential filter is described by the following expression: $s_t = \alpha x_t + (1 - \alpha)\, s_{t-1}$, where $s_t$ is the output of the...
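A minimal sketch of the recursion above in Python; the smoothing constant alpha = 0.3 and the choice to seed the output with the first observation are illustrative assumptions, not prescribed by the entry.

```python
# Minimal sketch of an exponential filter (simple exponential smoothing).
def exponential_filter(x, alpha=0.3):
    """Return the smoothed series s, where s[t] = alpha*x[t] + (1-alpha)*s[t-1]."""
    s = [x[0]]                                 # seed the recursion with the first value
    for value in x[1:]:
        s.append(alpha * value + (1 - alpha) * s[-1])
    return s

series = [10, 12, 11, 15, 14, 13, 18]
print(exponential_filter(series, alpha=0.3))
```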
Error: Error is a general concept related to deviation of the estimated quantity from its true value: the greater the deviation, the greater the error. Errors are categorised according to their probabilistic nature into systematic errors and random errors, and, according to their relation to the true value, into...
Endogenous Variable: Endogenous variables in causal modeling are the variables with causal links (arrows) leading to them from other variables in the model. In other words, endogenous variables have explicit causes within the model. The concept of an endogenous variable is fundamental in path analysis and structural equation modeling. The...
Data Partition: Data partitioning in data mining is the division of all the available data into two or three non-overlapping sets: the training set, the validation set, and the test set. If the data set is very large, often only a portion of it is selected for...
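A small sketch of such a three-way partition, assuming illustrative 60/20/20 proportions and a fixed random seed; neither choice is prescribed by the entry.

```python
# Illustrative three-way partition into non-overlapping training,
# validation and test sets. The 60/20/20 split and the seed are arbitrary.
import random

def partition(data, train_frac=0.6, valid_frac=0.2, seed=0):
    rows = list(data)
    random.Random(seed).shuffle(rows)
    n_train = int(train_frac * len(rows))
    n_valid = int(valid_frac * len(rows))
    train = rows[:n_train]
    valid = rows[n_train:n_train + n_valid]
    test = rows[n_train + n_valid:]            # remainder goes to the test set
    return train, valid, test

train, valid, test = partition(range(100))
print(len(train), len(valid), len(test))       # 60 20 20
```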
Econometrics: Econometrics is a discipline concerned with the application of statistics and mathematics to various problems in economics and economic theory. This term literally means "economic measurement". A central task is quantification (measurement) of various qualitative concepts of economic theory - like demand, supply, propensity to spend,...
Divisive Methods (of Cluster Analysis): In divisive methods of hierarchical cluster analysis, the clusters obtained at the previous step are subdivided into smaller clusters. Such methods start from a single cluster comprising all N objects and, after N-1 steps, end with N clusters, each comprising a single object...
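A rough sketch of the divisive idea, assuming scikit-learn's KMeans is used to bisect the largest remaining cluster at each step; classical divisive algorithms such as DIANA use other splitting criteria, so this is only one possible instantiation.

```python
# Divisive sketch: start with one cluster holding all objects and repeatedly
# bisect the largest remaining cluster with 2-means.
import numpy as np
from sklearn.cluster import KMeans

def divisive_steps(X, n_steps):
    clusters = [np.arange(len(X))]             # one cluster with all N objects
    for _ in range(n_steps):                   # after N-1 steps: N singletons
        # index of the largest cluster that still has more than one object
        splittable = [i for i, c in enumerate(clusters) if len(c) > 1]
        if not splittable:
            break
        target = clusters.pop(max(splittable, key=lambda j: len(clusters[j])))
        labels = KMeans(n_clusters=2, n_init=10).fit_predict(X[target])
        clusters.append(target[labels == 0])
        clusters.append(target[labels == 1])
    return clusters

X = np.random.rand(10, 2)
print([len(c) for c in divisive_steps(X, n_steps=3)])
```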
Divergent Validity: In psychometrics, the divergent validity of a survey instrument, like an IQ test, indicates that the results obtained by this instrument do not correlate too strongly with measurements of a similar but distinct trait. For example, if a test is supposed to measure suitability of applicants to a...
Dispersion (Measures of): Measures of dispersion express quantitatively the degree of variation or dispersion of values in a population or in a sample. Along with measures of central tendency, measures of dispersion are widely used in practice as descriptive statistics. Some measures of dispersion are the standard...
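A brief illustration of a few common measures of dispersion, using Python's standard library; the sample values are made up for the example.

```python
# Range, sample variance and sample standard deviation of a small sample.
import statistics

sample = [4, 8, 6, 5, 3, 7, 9, 5]

print("range:   ", max(sample) - min(sample))
print("variance:", statistics.variance(sample))   # sample variance (n - 1 denominator)
print("std dev: ", statistics.stdev(sample))      # sample standard deviation
```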
Discrete Distribution: A discrete distribution describes the probabilistic properties of a random variable that takes on a set of values that are discrete, i.e. separate and distinct from one another - a discrete random variable. Discrete values are separated only by a finite number of units - in flipping...
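A short worked example of a discrete distribution: the number of heads in three flips of a fair coin, with the probabilities computed from the binomial formula.

```python
# P(heads = k) for k = 0..3 flips of a fair coin: C(n, k) * p^k * (1-p)^(n-k).
from math import comb

n, p = 3, 0.5
for k in range(n + 1):
    prob = comb(n, k) * p**k * (1 - p)**(n - k)
    print(f"P(heads = {k}) = {prob:.3f}")
```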
Design of Experiments: Design of experiments is concerned with optimizing the plan of experimental studies. The goal is to improve the quality of the decisions made from the study's outcome on the basis of statistical methods, and to ensure that maximum information is obtained from...
Dichotomous: Dichotomous (outcome or variable) means "having only two possible values", e.g. "yes/no", "male/female", "head/tail", "age > 35 / age <= 35" etc. For example, the outcome of an experiment with coin tossing is dichotomous ("head" or "tail"); the variable "biological sex" in a social study is dichotomous ("male" or...
Dependent and Independent Variables: Statistical models normally specify how one set of variables, called dependent variables, functionally depends on another set of variables, called independent variables. While analysts typically specify variables in a model to reflect their understanding or theory of "what causes what," setting up a model in this...
Data Mining: Data mining is concerned with finding latent patterns in large databases. The goal is to discover unsuspected relationships that are of practical importance, e.g., in business. A broad range of statistical and machine learning approaches are used in data mining. See, for example, XLMiner online help for...
Dendrogram: The dendrogram is a graphical representation of the results of hierarchical cluster analysis. This is a tree-like plot where each step of hierarchical clustering is represented as a fusion of two branches of the tree into a single one. The branches represent clusters obtained at each step of...
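An illustrative sketch of producing a dendrogram, assuming SciPy and matplotlib are available; the random data and the "ward" linkage method are arbitrary choices for the example.

```python
# Dendrogram from hierarchical (agglomerative) clustering of random points.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.random.rand(12, 2)            # 12 objects described by 2 features
Z = linkage(X, method="ward")        # each row of Z records one fusion step

dendrogram(Z)
plt.xlabel("object index")
plt.ylabel("fusion distance")
plt.show()
```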
Cover Time: Cover time is the expected number of steps in a random walk required to visit all the vertices of a connected graph (a graph in which there is always a path, consisting of one or more edges, between any two vertices). Blom, Holst and Sandell (in Problems and...
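A small simulation sketch that estimates cover time by averaging the number of steps many random walks need before every vertex has been visited; the cycle graph on 6 vertices and the number of trials are arbitrary choices (for a cycle, the known exact cover time n(n-1)/2 = 15 gives a check).

```python
# Estimate cover time of a connected graph by simulating random walks.
import random

def cover_time_estimate(adjacency, start=0, trials=2000):
    total = 0
    for _ in range(trials):
        visited = {start}
        current = start
        steps = 0
        while len(visited) < len(adjacency):
            current = random.choice(adjacency[current])   # one random step
            visited.add(current)
            steps += 1
        total += steps
    return total / trials

# Cycle on 6 vertices: each vertex is joined to its two neighbours.
n = 6
cycle = {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}
print(cover_time_estimate(cycle))    # should be close to n(n-1)/2 = 15
```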
Density Functions: A probability density function or curve is a non-negative function $f(x)$ that describes the distribution of a continuous random variable $X$. If $f(x)$ is known, then the probability that a value of the variable is within an interval $[a, b]$ is described by the following integral: $P(a \le X \le b) = \int_a^b f(x)\,dx$. For very small intervals...
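A sketch of the integral above for a concrete density, the standard normal, evaluated numerically with SciPy's quad routine; the interval [-1, 1] is chosen only for illustration.

```python
# P(a <= X <= b) for the standard normal density, by numerical integration.
from math import exp, pi, sqrt
from scipy.integrate import quad

def f(x):                                   # standard normal density
    return exp(-x**2 / 2) / sqrt(2 * pi)

a, b = -1.0, 1.0
prob, _ = quad(f, a, b)                     # integral of f over [a, b]
print(round(prob, 4))                       # about 0.6827
```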
Cross-Validation: Cross-validation is a general computer-intensive approach used in estimating the accuracy of statistical models. The idea of cross-validation is to split the data into N subsets, to put one subset aside, to estimate parameters of the model from the remaining N-1 subsets, and to use the retained subset to...
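A minimal sketch of the splitting scheme described above (often called k-fold cross-validation), written without any modelling library; the "model" here is just the mean of the training values and the error measure is the mean squared error on the held-out subset, both placeholders.

```python
# Split data into N subsets; repeatedly fit on N-1 subsets and assess on the rest.
import random

def cross_validation_error(values, n_folds=5, seed=0):
    data = list(values)
    random.Random(seed).shuffle(data)
    folds = [data[i::n_folds] for i in range(n_folds)]   # N disjoint subsets
    errors = []
    for i, held_out in enumerate(folds):
        training = [v for j, fold in enumerate(folds) if j != i for v in fold]
        estimate = sum(training) / len(training)          # "fit" on N-1 subsets
        mse = sum((v - estimate) ** 2 for v in held_out) / len(held_out)
        errors.append(mse)                                # assess on the retained subset
    return sum(errors) / len(errors)

print(cross_validation_error(range(30)))
```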
Clustered Sampling: Clustered sampling is a sampling technique based on dividing the whole population into groups ("clusters"), then using random sampling to select elements from the groups. For example, if the target population is the whole population of a city, a researcher might select 100 households at random and to...
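An illustrative sketch of clustered sampling following the household example above; the synthetic population, the number of households drawn, and the choice to include every member of each selected household are assumptions made for this example.

```python
# Clustered sampling: group the population into clusters (households), draw
# clusters at random, and take the elements of the selected clusters.
import random

rng = random.Random(0)

# Synthetic population: 50 households with 1-5 members each.
households = {h: [f"person_{h}_{i}" for i in range(rng.randint(1, 5))]
              for h in range(50)}

selected = rng.sample(list(households), 10)        # randomly choose 10 households
sample = [person for h in selected for person in households[h]]
print(len(selected), "households,", len(sample), "people in the sample")
```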
Data: Data are recorded observations made on people, objects, or other things that can be counted, measured, or quantified in some way. In statistics, data are categorized according to several criteria, for example, according to the type of values used to quantify the observations, e.g. categorical data or continuous...
Criterion Validity: The criterion validity of survey instruments, like the tests used in psychometrics, is a measure of agreement between the results obtained by the given survey instrument and more "objective" results for the same population. The "objective" results are obtained either by a well-established instrument ("the gold...