Sunday, November 21, 2010

A Powerful Classification Technique in Data Mining - Discriminant Analysis (Part III)

Terminology:

F value: The ratio of the between-groups sum of squares to the within-groups sum of squares for a variable; large values indicate variables that separate the groups well.

Wilks’ Lambda: The ratio of the within-groups sum of squares to the total sum of squares for the entire set of variables in the analysis. It varies between 0 and 1; values near 0 indicate strong group differences, values near 1 weak ones. Also called the U statistic.

Classification matrix: A matrix containing the numbers of correctly classified and misclassified cases.

Hit Ratio: The percentage of cases correctly classified by the discriminant function.
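
As a made-up numerical illustration of the last two terms, suppose a two-group model is applied to 100 holdout cases with these results:

Actual group 1 (50 cases): 40 classified into group 1 (correct), 10 into group 2
Actual group 2 (50 cases): 15 classified into group 1, 35 into group 2 (correct)

These four counts form the classification matrix, and the hit ratio is (40 + 35)/100 = 75%.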

The DISCRIM Procedure

• PROC DISCRIM can be used for many different types of analysis (a usage sketch follows this list), including

• canonical discriminant analysis

• assessing and confirming the usefulness of the functions (empirical validation and cross-validation)

• predicting group membership on new data using the functions (scoring)

• linear and quadratic discriminant analysis

• nonparametric discriminant analysis
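
A minimal PROC DISCRIM sketch covering these uses; the data set names (train, new_customers) and the variables (segment, income, age, tenure) are hypothetical:

proc discrim data=train              /* training data with known groups    */
             testdata=new_customers  /* new data to score                  */
             testout=scored          /* output data set with predictions   */
             method=normal           /* parametric; METHOD=NPAR gives the  */
                                     /* nonparametric analysis             */
             pool=yes                /* YES = linear, NO = quadratic       */
             crossvalidate;          /* leave-one-out validation           */
   class segment;                    /* grouping (dependent) variable      */
   var income age tenure;            /* predictor variables                */
run;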

Discriminant Function



Linear discriminant analysis constructs one or more discriminant equations (linear combinations of the predictor variables Xk) such that the different groups differ as much as possible on Z:

Z = a + W1X1 + W2X2 + … + WkXk

Where,


Z = Discriminant score, a number used to predict group membership of a case

a = Discriminant constant

Wk = Discriminant weight or coefficient, a measure of the extent to which variable Xk discriminates among the groups of the dependent variable

Xk = An independent (predictor) variable; in discriminant analysis the predictors are metric

Number of discriminant functions = min(number of groups – 1, k). For example, with 3 groups and 5 predictors, min(3 – 1, 5) = 2 functions are estimated.

k = Number of predictor variables.

Discriminant Function: Interpretation
 

• The weights are chosen so that one can compute a discriminant score for each subject and then do an ANOVA on Z (see the sketch after this list).
• More precisely, the weights of the discriminant function are calculated in such a way that the ratio (between-groups SS)/(within-groups SS) is as large as possible.
• The value of this ratio is the eigenvalue of the function.
• The first discriminant function Z1 provides the greatest possible separation among the groups; the second function Z2 is uncorrelated with the first and captures as much of the remaining between-group variation as possible, and so on.
Note: Discriminant analysis estimates the values of the parameters a and Wk that minimize the within-group SS; in the two-group case this is equivalent to an OLS regression on a suitably coded group indicator.
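
A minimal sketch of this idea in SAS, assuming a hypothetical data set train with class variable segment and predictors income, age, and tenure: PROC CANDISC writes the discriminant scores, and an ANOVA on the first score shows the between-/within-group partition that the weights maximize.

proc candisc data=train out=scores ncan=1;  /* scores go to variable Can1 */
   class segment;
   var income age tenure;
run;

proc anova data=scores;    /* one-way ANOVA on the discriminant score Z */
   class segment;
   model Can1 = segment;
run;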

Partitioning Sums of Squares in Discriminant Analysis


In Linear Regression:

• Total sums of squares are partitioned into Regression sums of squares and Residual sums of squares.

• The goal is to estimate the parameters that minimize the residual SS.

In Discriminant Analysis:

• The total sum of squares is partitioned into the between-groups sum of squares and the within-groups sum of squares:

Σ (Zi – Z)² = Σ nj (Zj – Z)² + Σ (Zi – Zj)²

(the first sum runs over all cases i, the second over the groups j, and the third over the cases within each group)

Where,

i = an individual case,

j = group j,

nj = number of cases in group j,

Zi = individual discriminant score,

Z = grand mean of the discriminant scores,

Zj = mean discriminant score for group j

Here, the goal is to estimate the parameters that minimize the within-group sum of squares (equivalently, maximize the between-to-within ratio).
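
As a made-up numerical illustration: if the discriminant scores of 100 cases have a total SS of 100, of which 60 lies between groups and 40 within groups, the partition reads 100 = 60 + 40, and the eigenvalue of the function is 60/40 = 1.5. A competing set of weights that left only 55 between groups (eigenvalue 55/45 ≈ 1.22) would be an inferior discriminant function.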


Thursday, November 11, 2010

A Powerful Classification Technique in Data Mining - Discriminant Analysis (Part II)

Discriminant analysis attempts to find a rule that separates the predefined groups to the maximum possible extent.


Discriminant Analysis - Assumptions

The underlying assumptions of Discriminant Analysis (DA) are:

– Each group is normally distributed; discriminant analysis is relatively robust to departures from normality.

– The groups defined by the dependent variable exist a priori.

– The predictor variables Xk are multivariate normally distributed, independent, and non-collinear.

– The variance/covariance matrices of the predictor variables across the various groups are the same in the population, i.e. homogeneous (a test of this assumption is sketched after this list).

– The relationship is linear in its parameters

– Absence of leverage point outliers

– The sample is large enough: unequal sample sizes are acceptable, but the sample size of the smallest group needs to exceed the number of predictor variables. As a rule of thumb, the smallest group should have at least 20 cases for a few (4 or 5) predictors. The maximum number of independent variables is n – 2, where n is the sample size; while such a small sample may work, it is not encouraged, and it is generally best to have 4 or 5 times as many observations as independent variables.

– Errors are randomly distributed
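
The homogeneity assumption can be checked directly in SAS. A minimal sketch with hypothetical names (train, segment, income, age, tenure): POOL=TEST asks PROC DISCRIM to test the equality of the within-group covariance matrices and to use the linear (pooled) or quadratic (within-group) function depending on the outcome.

proc discrim data=train method=normal
             pool=test      /* test homogeneity of covariance matrices */
             slpool=0.05;   /* significance level for the pooling test */
   class segment;
   var income age tenure;
run;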

Drawbacks of Discriminant Analysis

– An important drawback of discriminant analysis is its dependence on a relatively equal distribution of group membership. If one group within the population is substantially larger than the other, as is often the case in real life, discriminant analysis might classify all observations into a single group. A balanced sample (e.g., an equal good-bad sample in credit scoring) should therefore be chosen for building the discriminant analysis model.

– Another significant restriction of discriminant analysis is that it can’t handle categorical independent variables.

– Discriminant analysis is more rigid than logistic regression in its assumptions. In contrast to ordinary linear regression, discriminant analysis does not have unique coefficients: each coefficient depends on the other coefficients in the estimation, so there is no way of determining the absolute contribution of any single variable.

Discriminant Analysis Vs Logistic Regression



Similarity: Both techniques predict membership in the groups of a categorical dependent variable from a set of predictor variables.

Difference: Logistic regression makes no assumption of multivariate normality or equal covariance matrices and can accommodate categorical predictors, whereas discriminant analysis requires metric predictors and, as noted above, is more rigid in its assumptions.


Discriminant Analysis Vs ANOVA

Similarity: Both techniques involve a single dependent variable and multiple independent variables, and both examine differences across groups.

Difference: In discriminant analysis the independent variables are metric and the dependent variable is categorical, whereas in ANOVA the independent variables are categorical and the dependent variable is metric.

Reference:

http://userwww.sfsu.edu/~efc/classes/biol710/discrim/discrim.pdf
www.shsu.edu/~icc_cmf/cj_742/stats7.doc

Saturday, November 6, 2010

A Powerful Classification Technique in Data Mining - Discriminant Analysis (Part I)

Classification is a data mining technique used to predict group membership for data instances. In predictive customer analytics, classification techniques are deployed frequently across most applications, including acquisition, cross-sell, attrition, credit scoring, collections, and first-time-buyer classification. The objective of any classification model is to classify customers into two or more groups based on a predicted outcome associated with each customer, e.g. responder or non-responder, defaulter or non-defaulter, churner or non-churner, valuable or non-valuable customer. Businesses are interested in predicting the likelihood of each customer behaving in a particular fashion, and classification techniques provide them with predictive models for this purpose.


Various parametric and non-parametric methods are used to solve classification problems. Traditional statistical methods are parametric in nature: they rest on assumptions about the underlying distributions and estimate the parameters of those distributions to solve the problem. Non-parametric methods, on the other hand, make no assumptions about the specific distributions involved and are therefore distribution-free.

Discriminant analysis is a technique for classifying a set of observations into two or more predefined classes. The purpose is to determine the class of an observation based on a set of variables known as predictors or input variables (analogous to independent variables in regression). The model is built based on a set of observations for which the classes are known. This set of observations is sometimes referred to as the training set. Based on the training set, the technique constructs a set of linear functions of the predictors, known as discriminant functions, such that

L = b1x1 + b2x2 + … + bnxn + c, where the b's are the discriminant coefficients, the x's are the input variables or predictors, and c is a constant.

These discriminant functions are used to predict the class of a new observation with unknown class. For a k-class problem, k discriminant functions are constructed, one per class. Given a new observation, all k functions are evaluated and the observation is assigned to class i if the ith function has the highest value, as in the sketch below.
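
A sketch of this decision rule as a SAS data step; the data set new_obs, the predictors x1 and x2, and all coefficients are made up for illustration (in practice PROC DISCRIM applies the estimated functions automatically):

data classified;
   set new_obs;                        /* hypothetical input with x1, x2 */
   /* one linear discriminant function per class (illustrative b's, c's) */
   L1 = 0.50*x1 + 1.20*x2 + 2.10;
   L2 = 0.80*x1 + 0.40*x2 + 1.30;
   L3 = 1.10*x1 - 0.20*x2 + 0.70;
   /* assign the class whose function value is highest */
   if L1 >= max(L2, L3) then pred_class = 1;
   else if L2 >= L3 then pred_class = 2;
   else pred_class = 3;
run;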

Discriminant Analysis (DA), a multivariate statistical technique, is commonly used to build a predictive/descriptive model of group discrimination based on observed predictor variables and to classify each observation into one of the groups. In DA, multiple quantitative attributes are used to discriminate a single classification variable. DA differs from cluster analysis in that prior knowledge of the classes, usually in the form of a sample from each class, is required.

The common objectives of DA are:

i. To investigate differences between groups;

ii. To discriminate between groups effectively;

iii. To identify important discriminating variables;

iv. To perform hypothesis testing on the differences between the expected groupings;

v. To classify new customers into pre-existing groups.

Commonly used DA techniques available in the SAS System are listed below (a combined usage sketch follows the list):

DISCRIM: Computes various discriminant functions for classifying observations. Linear or quadratic discriminant functions can be used for data with approximately multivariate normal within-class distributions. Nonparametric methods can be used without making any assumptions about these distributions.

CANDISC: Performs a canonical analysis to find linear combinations of the quantitative variables that best summarize the differences among the classes.

STEPDISC: Uses forward selection, backward elimination, or stepwise selection to find a subset of the quantitative variables that best reveals differences among the classes.
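
A sketch of how these procedures chain in practice, with hypothetical names (train, segment, and the candidate predictors); suppose STEPDISC retains income, tenure, and balance:

proc stepdisc data=train method=stepwise
              slentry=0.05 slstay=0.05;  /* entry/stay significance levels */
   class segment;
   var income age tenure balance n_products;
run;

/* refit and validate with the retained variables */
proc discrim data=train method=normal pool=test crossvalidate;
   class segment;
   var income tenure balance;
run;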

Reference:
http://www2.sas.com/proceedings/sugi27/p247-27.pdf