Fraunhofer Parallel File System (FhGFS)
Fraunhofer ITWM
Scalable Storage with the Fraunhofer Parallel File System
FraunhoferFS (FhGFS) is the high-performance parallel file system from the Fraunhofer Competence Center for High Performance Computing (CC HPC). Its distributed metadata architecture is designed to provide the scalability and flexibility required to run today's most demanding HPC applications.
As processors and network technologies become steadily faster and make ever larger computer clusters possible, the demand for more realistic and more detailed simulation results is growing as well. Such simulations work with large data sets, which now often amount to several hundred gigabytes or even terabytes. The problem is that the performance of hard discs lies significantly below that of the other system components, so the run-time of a compute job is often determined primarily by the speed of hard disc access.
To counteract this, the CC HPC has been working on the parallel file system FhGFS for several years. With this file system, each file is distributed across multiple servers chunk by chunk and can therefore be read or written in parallel. This allows data sets to be processed at many times the conventional speed and directly shortens the time needed to reach the calculation result. Along with very good scalability, the development team placed great importance on ease of use, supported by graphical management tools, and on a high degree of flexibility in the installation. FhGFS can therefore run on separate servers that act as common parallel storage for a cluster, or it can combine the hard discs of the cluster compute nodes themselves in the same manner. In addition, the distribution pattern of the data can be flexibly adapted to the requirements of users, such as geographically separate data centers, in order to further reduce the access time to the data.
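The chunk-by-chunk distribution described above can be illustrated with a small sketch. It is not taken from the FhGFS code base: it assumes a simple round-robin placement of fixed-size chunks, and the function names, the chunk size and the server count are hypothetical values chosen only for illustration. The real distribution pattern is configurable and may differ.

```python
from typing import List, Tuple


def locate_chunk(offset: int, chunk_size: int, num_servers: int) -> Tuple[int, int, int]:
    """Map a byte offset in a file to (server_index, chunk_index, offset_in_chunk).

    Assumption: chunks are placed round-robin, so chunk i lives on
    server i % num_servers. This is an illustrative model only.
    """
    if offset < 0 or chunk_size <= 0 or num_servers <= 0:
        raise ValueError("offset must be >= 0; chunk_size and num_servers must be > 0")
    chunk_index = offset // chunk_size
    server_index = chunk_index % num_servers
    return server_index, chunk_index, offset % chunk_size


def split_request(offset: int, length: int, chunk_size: int,
                  num_servers: int) -> List[Tuple[int, int, int, int]]:
    """Split one read or write into per-chunk pieces.

    Each piece is (server_index, chunk_index, offset_in_chunk, piece_length).
    Pieces that land on different servers could be issued in parallel.
    """
    pieces = []
    end = offset + length
    while offset < end:
        server, chunk, inner = locate_chunk(offset, chunk_size, num_servers)
        # A piece never crosses a chunk boundary or the end of the request.
        piece_len = min(chunk_size - inner, end - offset)
        pieces.append((server, chunk, inner, piece_len))
        offset += piece_len
    return pieces


if __name__ == "__main__":
    # Hypothetical setup: 512 KiB chunks striped over 4 storage servers.
    for piece in split_request(offset=0, length=2 * 1024 * 1024,
                               chunk_size=512 * 1024, num_servers=4):
        print(piece)
```

In this model a 2 MiB request against four servers breaks into four 512 KiB pieces, one per server, which is where the parallel read and write speed-up comes from.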
In recent years, cooperation with industry partners such as SGI has shown that FhGFS can deliver significantly better throughput for typical workloads than comparable commercial solutions. For this reason, the file system is also used for the storage of the Fraunhofer Cell Cluster (ranked first on the worldwide Green Top 500 List until November 2008), where it enables a data throughput of several gigabytes per second. This year the system was presented live at workshops and trade fairs and sparked great interest in the HPC community.
The file system is already in use on a variety of clusters with several hundred compute nodes. Work on a high-availability mode should be completed early next year, and support for Windows will follow. This will also make the file system attractive to users outside of the HPC area, for example as fail-safe project storage or for home directories. FhGFS can be downloaded free of charge at http://www.fhgfs.com. Commercial support is also available.

