Action Recognition

Content-Based Video Analysis

Human actions are among the most important elements of movies and videos, and recognizing them correctly is vital for content-based video analysis. Typical actions observed in movies include »sitting down«, »shaking hands«, »driving«, and »kissing«. Learning realistic actions from videos recorded in unconstrained settings is a challenging task in visual recognition.

© Fraunhofer ITWM
Foreground detections (»poselets«) describe the pose of a person and the motion of individual body parts over time.

Video Analysis and Action Recognition

The video presents aspects of content-based video analysis, in particular the automatic recognition and labeling of different actions in visual content.

Hollywood2 Benchmark

Characteristics of Modeling People and Context

For action recognition, the actors, the tools they use, and their motion are of central importance. We separate the foreground elements of an action from the background using motion cues. This allows separate descriptors to be defined for the foreground regions, while combined foreground-background descriptors still capture the context of the action. Poselet activations in the foreground area indicate the actor and their pose. We track these poselets over time to obtain detailed motion features of the actor.
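The page does not say how poselets are tracked. As a minimal sketch, assuming poselet detections are already available per frame as `(poselet_id, box)` pairs, one could link detections of the same poselet greedily from frame to frame by bounding-box overlap (IoU) and read off the displacement of each track's box center as a simple motion feature. The names `link_poselets` and `Track`, the 0.3 IoU threshold, and the greedy linking strategy are hypothetical illustrations, not the ITWM method.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Track:
    """Hypothetical track: one poselet type followed over consecutive frames."""
    poselet_id: int
    boxes: List[Box] = field(default_factory=list)

    def motion_feature(self) -> List[Tuple[float, float]]:
        """Frame-to-frame displacement (dx, dy) of the box center."""
        centers = [((b[0] + b[2]) / 2, (b[1] + b[3]) / 2) for b in self.boxes]
        return [(c1[0] - c0[0], c1[1] - c0[1]) for c0, c1 in zip(centers, centers[1:])]


def link_poselets(frames: List[List[Tuple[int, Box]]], min_iou: float = 0.3) -> List[Track]:
    """Greedily extend each active track with the best-overlapping detection
    of the same poselet type in the next frame (assumed strategy)."""
    active: List[Track] = []
    finished: List[Track] = []
    for detections in frames:
        unmatched = list(detections)
        next_active: List[Track] = []
        for track in active:
            best, best_score = None, min_iou
            for det in unmatched:
                pid, box = det
                if pid != track.poselet_id:
                    continue
                score = iou(track.boxes[-1], box)
                if score >= best_score:
                    best, best_score = det, score
            if best is not None:
                track.boxes.append(best[1])
                unmatched.remove(best)
                next_active.append(track)
            else:
                finished.append(track)  # track ends when no match is found
        # Every unmatched detection starts a new track.
        next_active.extend(Track(pid, [box]) for pid, box in unmatched)
        active = next_active
    return finished + active
```

For example, a poselet whose box moves two pixels to the right in each of three frames yields a single track with motion feature `[(2.0, 0.0), (2.0, 0.0)]`. Greedy IoU linking is chosen here only for brevity; a real tracker would typically add appearance cues and handle missed detections.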

Soft saliency maps generated by our foreground segmentation on sample videos from the Hollywood2 benchmark. Red areas indicate foreground regions. The main actors are well covered by the saliency maps and clearly separated from the scene background.

Motion Saliency
Detected persons in videos (Hollywood2 benchmark). Red areas show the persons in the foreground, so the video scene can be clearly decomposed into foreground and background regions.
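The page does not specify how the soft saliency maps are computed. The following sketch swaps in a simple heuristic, assuming a dense optical-flow field of shape H×W×2 is already available: motion magnitude above the frame's median (a crude stand-in for camera-motion compensation) is mapped to a soft foreground weight in [0, 1). That weight then produces a foreground-only histogram of flow orientations, while an unweighted histogram keeps the combined foreground-background context, mirroring the separate and combined descriptors described above. The function names and the parameters `tau` and `bins` are hypothetical.

```python
import numpy as np


def soft_motion_saliency(flow: np.ndarray, tau: float = 1.0) -> np.ndarray:
    """Soft foreground weight per pixel from flow magnitude (assumed heuristic).
    Pixels moving no faster than the median get 0; fast pixels approach 1."""
    mag = np.linalg.norm(flow, axis=-1)
    residual = np.maximum(mag - np.median(mag), 0.0)
    return 1.0 - np.exp(-residual / tau)


def orientation_histogram(flow: np.ndarray, weights: np.ndarray, bins: int = 8) -> np.ndarray:
    """Magnitude-weighted histogram of flow directions, normalized to sum 1."""
    angles = np.arctan2(flow[..., 1], flow[..., 0])  # in [-pi, pi]
    mag = np.linalg.norm(flow, axis=-1)
    hist, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi), weights=mag * weights)
    total = hist.sum()
    return hist / total if total > 0 else hist


def describe(flow: np.ndarray, tau: float = 1.0, bins: int = 8):
    """Return the saliency map and [foreground descriptor, combined descriptor]."""
    saliency = soft_motion_saliency(flow, tau)
    foreground = orientation_histogram(flow, saliency, bins)
    combined = orientation_histogram(flow, np.ones_like(saliency), bins)
    return saliency, np.concatenate([foreground, combined])


# Usage: a static 10x10 scene with a small patch moving right and slightly down.
flow = np.zeros((10, 10, 2))
flow[3:6, 3:6] = (3.0, 1.0)
saliency, descriptor = describe(flow)
```

In a real video the background also carries camera motion, so the two halves of the descriptor would differ; in this toy example only the patch moves, and both histograms put all their mass in the same orientation bin.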