Action Recognition

Content-Based Video Analysis

Human actions are among the most important elements of movies and videos, and recognizing them correctly is vital for content-based video analysis. Typical actions observed in movies include "sitting down", "shaking hands", "driving", and "kissing". Learning realistic actions from videos in unconstrained settings is a challenging task in visual recognition.

 Illustration of our poselet motion features: We detect poselets in the foreground and describe their motions over time. Thus, we encode pose information as well as the movement of specific body parts into our feature representation.
© Fraunhofer ITWM

Video Analysis & Action Recognition

Characteristics of Modeling People and Context

For action recognition, the actors, the tools they use, and their motion are of central importance. We separate the foreground elements of an action from the background using motion cues. As a result, separate descriptors can be defined for the foreground regions, while combined foreground-background descriptors still capture the context of the action. Poselet activations in the foreground area indicate the actor and their pose. We track these poselets over time to obtain detailed motion features of the actor.
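
The idea of splitting a scene by motion and describing foreground and background separately can be illustrated with a minimal sketch. It is not the method used in this project: simple frame differencing stands in for the motion cues, generic per-pixel feature arrays stand in for the actual descriptors, and tracked box centers stand in for poselet tracks. The function names `soft_saliency`, `split_descriptors`, and `track_motion` are hypothetical and chosen only for illustration.

```python
import numpy as np


def soft_saliency(frames, eps=1e-8):
    """Soft foreground map in [0, 1] from temporal frame differences.

    frames: array of shape (T, H, W), a grayscale video clip.
    Assumption: frame differencing is a stand-in for the motion cues.
    """
    frames = np.asarray(frames, dtype=np.float64)
    # Mean absolute change between consecutive frames, per pixel.
    motion = np.abs(np.diff(frames, axis=0)).mean(axis=0)
    # Scale so that the strongest motion receives a weight close to 1.
    return motion / (motion.max() + eps)


def split_descriptors(features, saliency, eps=1e-8):
    """Pool local features into foreground, background, and combined descriptors.

    features: array of shape (H, W, D) with one D-dimensional feature per pixel.
    saliency: array of shape (H, W) with soft foreground weights in [0, 1].
    """
    w_fg = saliency[..., None]
    w_bg = 1.0 - w_fg
    # Saliency-weighted averages over all pixels.
    fg = (features * w_fg).sum(axis=(0, 1)) / (w_fg.sum() + eps)
    bg = (features * w_bg).sum(axis=(0, 1)) / (w_bg.sum() + eps)
    # The concatenation keeps the scene context next to the actor description.
    return fg, bg, np.concatenate([fg, bg])


def track_motion(centers):
    """Frame-to-frame displacement of one tracked detection.

    centers: array-like of shape (T, 2) with (x, y) positions over time.
    Returns an array of shape (T - 1, 2).
    """
    return np.diff(np.asarray(centers, dtype=np.float64), axis=0)
```

In this sketch, a clip with a static background and one moving region yields saliency weights near 1 on the moving region and 0 elsewhere, so the foreground descriptor is dominated by the actor while the combined descriptor retains the scene context.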

Soft saliency maps generated from our foreground segmentation for sample videos from the Hollywood2 benchmark. Red areas mark foreground regions. The main actors are well covered by the saliency maps and clearly separated from the scene background.

Detected people in videos (Hollywood2 benchmark). Red areas show the people in the foreground. This allows the video scene to be split into foreground and background areas.
© Fraunhofer ITWM