Statistical evaluation of large data sets
Fraunhofer ITWM
Large-scale measurement campaigns and warranty databases can provide a considerable amount of information about vehicle usage, but this information is not structured according to the requirements of experimental design. Each individual record may be classified by region, mission type, or technical characteristics of the vehicle, yet the overall distribution of the data among the possible combinations of these attributes is highly irregular: some combinations may be heavily populated while others contain few records or none. As a result, evaluating such data with standard statistical tools is often difficult or impossible.
ITWM develops methods that make it easier to draw conclusions from unbalanced data. For example, we might ask whether nominally distinct operating conditions actually differ significantly with regard to service loads or fuel consumption. Bayesian factor models, an extension of multivariate analysis of variance, are an important tool for this purpose. They allow multiple comparisons in data sets of arbitrary structure. Of course, detecting a particular effect still requires a certain minimum amount of data, but the prerequisites for performing the analysis at all are considerably reduced.
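The general idea of Bayesian multiple comparisons on unbalanced groups can be illustrated with a small Python sketch. It is not the Bayesian factor model described above, which deals with multivariate responses and several factors. Instead, it assumes a much simpler one-way, univariate normal model with a common variance and the standard noninformative prior p(mu, sigma^2) ∝ 1/sigma^2, under which the posterior can be sampled directly. Group sizes may differ arbitrarily; small groups simply receive wider posteriors. The function names `posterior_draws` and `pairwise_comparisons` and the example data are hypothetical.

```python
import random
import statistics
from itertools import combinations


def posterior_draws(groups, n_draws=20_000, seed=0):
    """Sample group means from their joint posterior (one-way normal model).

    Assumed model (illustrative only): y_gi ~ N(mu_g, sigma^2) with one common
    variance and the noninformative prior p(mu_1, ..., mu_G, sigma^2) ∝ 1 / sigma^2.
    Then sigma^2 | y = SSE / X with X ~ chi^2(N - G), and
    mu_g | sigma^2, y ~ N(ybar_g, sigma^2 / n_g). Group sizes n_g may differ.
    """
    rng = random.Random(seed)
    names = list(groups)
    sizes = {g: len(groups[g]) for g in names}
    means = {g: statistics.fmean(groups[g]) for g in names}
    # Within-group (residual) sum of squares, pooled over all groups
    sse = sum((y - means[g]) ** 2 for g in names for y in groups[g])
    dof = sum(sizes.values()) - len(names)
    if dof < 1:
        raise ValueError("need more observations than groups")
    draws = {g: [] for g in names}
    for _ in range(n_draws):
        # A chi-square(dof) variate is Gamma(shape=dof/2, scale=2)
        sigma2 = sse / rng.gammavariate(dof / 2, 2.0)
        for g in names:
            # Small groups get a wider conditional posterior for their mean
            draws[g].append(rng.gauss(means[g], (sigma2 / sizes[g]) ** 0.5))
    return draws


def pairwise_comparisons(groups, **kwargs):
    """For every pair (a, b): P(mu_a > mu_b | y) and a 95% interval for mu_a - mu_b."""
    draws = posterior_draws(groups, **kwargs)
    results = {}
    for a, b in combinations(draws, 2):
        diff = sorted(x - y for x, y in zip(draws[a], draws[b]))
        k = len(diff)
        results[(a, b)] = {
            "p_greater": sum(d > 0 for d in diff) / k,
            "ci95": (diff[int(0.025 * k)], diff[int(0.975 * k) - 1]),
        }
    return results


if __name__ == "__main__":
    # Hypothetical fuel consumption (l/100 km) under three operating
    # conditions with deliberately unequal sample sizes
    data = {
        "urban": [31.2, 29.8, 33.5, 30.9, 32.1, 31.7, 30.4],
        "highway": [27.9, 28.4, 27.1],
        "regional": [30.8, 29.5, 31.9, 30.2, 29.9, 31.1, 30.6, 28.8, 30.0, 31.4],
    }
    for (a, b), r in pairwise_comparisons(data).items():
        lo, hi = r["ci95"]
        print(f"{a} vs {b}: P(mu_a > mu_b) = {r['p_greater']:.3f}, "
              f"95% interval [{lo:.2f}, {hi:.2f}]")
```

Because all pairwise comparisons are computed from the same joint posterior draws, a sparsely observed condition (here the three "highway" records) does not prevent the analysis; its comparisons just come with wider intervals.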