Multi-dimensional Statistical Classification (MESTEV)
Fraunhofer ITWM
DASMOD
MESTEV is one of the projects making up the Cluster of Excellence Dependable Adaptive Systems and Mathematical Modeling (DASMOD). Participants within the Fraunhofer ITWM are the departments System Analysis, Prognosis and Control, Financial Mathematics and Models and Algorithms in Image Processing. The overall-goal of MESTEV is to deepen the existing knowledge in the area of classification methods, partially gained in co operations with enterprises of the private sector. A description of the MESTEV project can be found on the official DASMOD webpage .
In the working group Decision Support in the Medical Science and in Technology classification methods like for example discriminant analysis or cluster analysis are important tools. Consequently the contribution of the department System Analysis, Prognosis and Control lies in that fields:
Discriminant Analysis
Discriminant analysis in the simplest case deals with the following problem: the starting point is a finite set of data grouped into two groups such that every element of the set belongs to exactly one of the groups. Such an element consists of the values of a certain set of data attributes. The problem lies in determining a classification rule that allows to assign a sample to one of the groups based on the values of its attributes. Such a rule should be optimal in the sense that the assignment of elements to groups leads to sufficiently high hit rates in both groups. Furthermore samples not contained in the given data set should be classified correctly with a high probability.
In the case that the attribute values all are real numbers, the samples can be considered as points in some Euclidean space. The classification rule sought-after frequently can then be formulated utilizing a so-called discriminant function d(x). It assigns to each vector x of attribute values a real number d(x) that allows to classify x using the following scheme:
- The sample is assigned to the first group if d(x)<0 holds.
- The sample is assigned to the second group if d(x)>0 holds.
- If d(x)=0, then no definite assignment is possible.
The function d(x) itself is chosen from within a parameterised family of functions, such that a rate function typically derived from the hit rates in the two groups becomes optimal.
Kernel methods in Discriminant Analysis
In the year 1999 a flexible method for determining non-linear discriminant functions, based on the theory of reproducing kernels in Hilbert spaces was published. Roughly speaking this method consists of first transforming the data using a non-linear map and consequently computing a linear discriminant function following the well-known Fisher method. The non-linear transformation itself is determined in an elegant way based on a so-called kernel function K(x,y) defined for vectors x and y from the Euclidean space considered. Altogether this procedure leads to discriminant functions of the form
The subsequent graphic displays a grouping of a two-dimensional set of data that can be separated only poorly using straight lines - the two groups are symbolized by blue and red dots. One can also see the level sets of a discriminant function estimated from the data utilizing a polynomial kernel of degree three. The line given by the equation d(x)=0 and separating the two groups is plotted in black colour.
The distribution of the function values d(x) within the two groups is visualised in the subsequent graphic:
In the context of MESTEV properties of Kernel-Fisher-Discriminant Analysis in comparison to other methods are studied. A particular focus herein lies on the practical relevance, which is also the motivation for investigating a second method of discriminant analysis:
ROC-based Discriminant Analysis
The goodness of a discriminant function mostly is measured based on a function derived from the group-wise hit rates. However it is frequently not advisable to use those hit rates directly. Besides the resulting risk of over fitting, a practical reason appearing for example in the medical science is that depending on the present situation one would like to change parameters of the discriminant function in order to modify the hit rate in one group at the expense of the hit rate in the other one. To this end it is reasonable to find a parameterised family of discriminant functions with the property that all of its members possess a sufficient goodness but having an ample variation in the group-wise hit rates at the same time.
To establish a family of discriminant functions possessing the properties just described the so-called Receiver-Operating-Characteristic (ROC) can be employed. In the simplest case of a family {d(x;p)} of functions depending on a real parameter p, the ROC is acquired as follows: one first marks one of the two groups as being distinguished; of course the marking is context-dependent. So if the group 1 for example is the distinguished one, then to each parameter value p one assigns the number FP(p) of samples in group 2 falsely classified by the discriminant function d(x;p) and the number TP(p) of samples in group 1 correctly classified by d(x;p). The plane curve obtained through the function p->(FP(p),TP(p)) is called ROC-curve of the family {d(x;p)} considered.
As a measure of goodness of a family {d(x;p)} taking care of the properties discussed one can use the area under the associated ROC-curve - usually called AUC. To find such a suitable family from within a given number of candidates one thus has to optimise with respect to AUC - a task that seems not be solved in a satisfactory way until now.
The subsequent graphic shows a set of two-dimensional data with a grouping that can be separated quite well using affine discriminant functions of the form d(x;n,p):=<x,n>+p - through straight lines thus, geometrically speaking. </x,n>
If one uses Fisher's method to estimate the parameters n and p of a discriminant function d(x;n,p)=<x,n>+p from the data, then the normal vector n is chosen such that the ratio between the between-groups-variance and the sum of the inner-group-variances is maximal. The value of this so-called Fisher discriminant depending on the location of the normal n on the unit circle is displayed as the red-coloured curve in the subsequent graphic. The ordinate of the coordinate system in that graphic shows the angle between the normal n and the x-axis. </x,n>
For fixed normal n one can compute the ROC-curve of the family {d(x;p)=<x,n>+p} as well as the area under that curve. The AUC-values depending on n thus obtained are displayed in the foregoing graphic as a blue-coloured curve. One can see that the maxima of the Fisher discriminant and the AUC-values coincide at approximately 135 degrees. The AUC-curve is rather flat for angles between approximately 50 to 160 degrees and takes values above 0.85. This means that given a normal n in that region one can always find a parameter p such that the associated discriminant function d(x;p)=<x,n>+p possesses sufficiently high hit rates in both of the groups. Thus through an analysis of the AUC-curve one gets hints for the definition of a suitable affine discriminant function that are of practical relevance. </x,n></x,n>
Further Information
- Type of Project: Cluster of Excellence in Rhineland-Palatinate
- Project Partners: Departments System Analysis, Prognosis and Control, Financial Mathematics and Models and Algorithms in Image Processing (Fraunhofer Institute for Industrial Mathematics)
- Duration: October 2005 - December 2007



