Input variable
See dependent and independent variables.
Output variable
See dependent and independent variables.
Dendrogram: Statistical distance is a measure calculated between two records that are typically part of a larger dataset, where rows are records and columns are variables. To calculate Euclidean distance, one possible distance metric, the steps are: 1. [Typically done, but not always] Convert all the values in each...
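As an illustration of the distance calculation mentioned above, here is a minimal Python sketch of Euclidean distance between two records. The function name and the sample records are hypothetical, and the optional preprocessing step is left out because the excerpt is truncated before it is fully described.

```python
import math


def euclidean_distance(record_a, record_b):
    """Square root of the summed squared differences across variables."""
    if len(record_a) != len(record_b):
        raise ValueError("records must have the same number of variables")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(record_a, record_b)))


# Hypothetical records with two variables each.
print(euclidean_distance([1.0, 2.0], [4.0, 6.0]))  # 5.0
```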
Decision Trees: In the machine learning community, a decision tree is a branching set of rules used to classify a record, or predict a continuous value for a record. For example, one path in a tree modeling customer churn (abandonment of subscription) might look like this: IF payment is month-to-month,...
Feature Selection: In predictive modeling, feature selection, also called variable selection, is the process (usually automated) of sorting through variables to retain variables that are likely to be informative in prediction, and discard or combine those that are redundant. “Features” is a term used by the machine learning community, sometimes...
Bagging: In predictive modeling, bagging is an ensemble method that uses bootstrap replicates of the original training data to fit predictive models. For each record, the predictions from all available models are then averaged for the final prediction. For a classification problem, a majority vote of the models is used....
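The bagging procedure for classification can be sketched roughly as follows. This is only an illustration under stated assumptions: the base model is a hypothetical one-dimensional 1-nearest-neighbor classifier chosen for brevity, and the training data, function names, and number of models are invented for the example.

```python
import random
from collections import Counter


def fit_1nn(sample):
    """Return a 1-nearest-neighbor classifier over (x, label) pairs."""
    def predict(x):
        return min(sample, key=lambda pair: abs(pair[0] - x))[1]
    return predict


def bagged_predict(train, x, n_models=25, seed=0):
    """Fit one model per bootstrap replicate and take a majority vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        # Bootstrap replicate: sample the training data with replacement.
        replicate = [rng.choice(train) for _ in train]
        votes.append(fit_1nn(replicate)(x))
    return Counter(votes).most_common(1)[0][0]


# Hypothetical training data: (feature value, class label).
train = [(1.0, "a"), (1.5, "a"), (2.0, "a"), (8.0, "b"), (8.5, "b"), (9.0, "b")]
print(bagged_predict(train, 1.2))
```

For a regression problem, the vote would be replaced by the average of the models' predictions.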
Decile Lift: In predictive modeling, the goal is to make predictions about outcomes on a case-by-case basis: an insurance claim will be fraudulent or not, a tax return will be correct or in error, a subscriber will terminate a subscription or not, a customer will purchase $X, etc. Lift is...
Boosting: In predictive modeling, boosting is an iterative ensemble method that starts out by applying a classification algorithm and generating classifications. The classifications are then assessed, and a second round of model-fitting occurs in which the records classified incorrectly in the first round are given a higher weight in the...
Ensemble Methods: In predictive modeling, ensemble methods refer to the practice of taking multiple models and averaging their predictions. In the case of classification models, the average can be that of a probability score attached to the classification. Models can differ with respect to algorithms used (e.g. neural net, logistic regression), settings...
A Priori Probability: A priori probability is the probability estimate prior to receiving new information. See also Bayes' Theorem and posterior probability.
Bayes' Theorem: Bayes' theorem is a formula for revising a priori probabilities after receiving new information. The revised probabilities are called posterior probabilities. For example, consider the probability that you will develop a specific cancer in the next year. An estimate of this probability based on general population data would...
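To make the revision step concrete, the following Python sketch applies Bayes' theorem to a single hypothesis and a single piece of evidence. The function name and all numbers are hypothetical and are not taken from the truncated example above.

```python
def posterior_probability(prior, p_evidence_if_true, p_evidence_if_false):
    """P(H | E) = P(E | H) P(H) / [P(E | H) P(H) + P(E | not H) P(not H)]."""
    numerator = p_evidence_if_true * prior
    denominator = numerator + p_evidence_if_false * (1.0 - prior)
    return numerator / denominator


# Hypothetical inputs: a priori probability 1%, evidence seen in 90% of
# true cases and in 5% of false cases.
posterior = posterior_probability(0.01, 0.90, 0.05)
print(round(posterior, 4))  # 0.1538
```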
Bootstrapping: Bootstrapping is sampling with replacement from observed data to estimate the variability in a statistic of interest. See also permutation tests, a related form of resampling. A common application of the bootstrap is to assess the accuracy of an estimate based on a sample of data from a larger...
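A minimal sketch of this idea in Python, assuming the statistic of interest is the sample mean and that its variability is summarized by the standard deviation of the bootstrap estimates. The data values, function name, and number of resamples are hypothetical.

```python
import random
import statistics


def bootstrap_standard_error(data, statistic=statistics.mean,
                             n_resamples=1000, seed=0):
    """Estimate the variability of a statistic by resampling with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        # Each resample has the same size as the observed data.
        resample = rng.choices(data, k=len(data))
        estimates.append(statistic(resample))
    return statistics.stdev(estimates)


# Hypothetical observed sample.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9]
print(bootstrap_standard_error(data))
```

Passing a different function as `statistic` (for example `statistics.median`) gives the bootstrap variability of that statistic instead.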
Categorical Data Analysis: Categorical data analysis is a branch of statistics dealing with categorical data. This sort of analysis is of great practical importance because a wide variety of data are of a categorical nature. The most common type of data analyzed in categorical data analysis are contingency table...
Collinearity: In regression analysis, collinearity of two variables means that strong correlation exists between them, making it difficult or impossible to estimate their individual regression coefficients reliably. The extreme case of collinearity, where the variables are perfectly correlated, is called singularity. See also: Multicollinearity
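One simple way to screen two predictors for collinearity is to compute their Pearson correlation. The sketch below is an illustration only: the variable values are hypothetical, and the entry itself does not prescribe any particular diagnostic or threshold.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical predictors: x2 is an exact multiple of x1, so they are
# perfectly correlated (the extreme case described above).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 4.0, 6.0, 8.0, 10.0]
print(pearson_r(x1, x2))
```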
Complete Statistic: A sufficient statistic T is called a complete statistic if no function of it has zero expected value for all distributions concerned unless this function itself is zero for all possible distributions concerned (except possibly a set of measure zero). The property of completeness of a statistic guarantees...
Contingency Table: A contingency table is a tabular representation of categorical data. A contingency table usually shows frequencies for particular combinations of values of two discrete random variables X and Y. Each cell in the table represents a mutually exclusive combination of X-Y values. For example, consider a...
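The cell frequencies of such a table can be tallied from paired observations, as in this short Python sketch. The observations and the function name are hypothetical and unrelated to the truncated example above.

```python
from collections import Counter


def contingency_table(xs, ys):
    """Count how often each (X, Y) combination occurs."""
    counts = Counter(zip(xs, ys))
    rows = sorted(set(xs))
    cols = sorted(set(ys))
    # Nested dict: table[x_value][y_value] -> frequency (0 if never observed).
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}


# Hypothetical paired observations of two categorical variables.
x = ["yes", "yes", "no", "no", "yes"]
y = ["low", "high", "low", "low", "low"]
table = contingency_table(x, y)
print(table)
```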
Continuous Random Variable: A continuous random variable is any random variable that takes on values on a continuous scale.