Video Detection and Retrieval

Abstract

We have developed a solution that detects scenes from individual video clips within TV shows. Our project partners are Bavarian Broadcasting, a major German TV company, and AVID, a large audio and video production company.

Project Description

TV companies and broadcasters have a very large number of video clips, which are used for the production of TV shows. During the editing process the videos are often altered. Many scenes are shortened and the scene order is rearranged in some cases. Also, the video frames can be modified in various ways.
Typical examples are the insertion of TV logos, subtitles, and patterns. Other modifications include changes in resolution, aspect ratio, sharpness, color, and contrast. Sometimes a video also passes through a series of complex transformations, for example when it is projected onto a screen in a virtual TV studio using chroma keying. Such effects are frequently used in the production of news shows. These modifications make it very difficult to automatically match scenes from the original video clips with their modified versions in the TV shows.

© Fraunhofer ITWM
Detection of scenes from individual video clips in a TV broadcast.

Metadata

Most of the video clips are linked to metadata that allows the clips to be mapped to the TV shows using simple database queries. However, such a mapping does not reveal which scenes have actually been used or where they are located in the TV shows. This information is important for accurately tracking the editing and production process.

To solve this problem, we developed a reliable and accurate solution that is able to detect the video scenes in the TV shows, even if the original videos are significantly modified. To achieve this goal, we cooperated closely with our project partners. Our software is actively used by a major German TV company (Bavarian Broadcasting).

Detection Results

Our solution is robust against:

pixelated videos
low-quality encoding
contrast and color changes
affine and perspective transformations (camcording)
geometrical deformations
pattern insertions (TV logos and subtitles)

The following figure shows some detection results. Even when the changes are severe, our procedure detects the corresponding video scenes and assigns the individual video frames correctly. More examples can be found in the animation at the beginning of this page.

Technical Background

In general, videos consist of hundreds of thousands of individual frames. Processing and matching every frame would require a significant amount of computing resources. We therefore reduce the number of frames by considering only a small number of representative keyframes. To select the keyframes, we analyze motion information and look for salient motion patterns. Typically, only 0.5% to 1.5% of all frames in a video are selected as keyframes.
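The keyframe selection described above can be illustrated with a minimal sketch. It assumes that a per-frame motion score is already available (for example, the mean absolute difference between consecutive frames) and simply keeps the strongest local peaks. The function name, the `min_gap` spacing, and the `top_fraction` budget are hypothetical choices for illustration and are not taken from the actual software, whose motion analysis is not described in detail here.

```python
def select_keyframes(motion, min_gap=25, top_fraction=0.01):
    """Pick frame indices whose motion score is a strong local peak.

    motion:       list of per-frame motion scores (assumed input).
    min_gap:      minimum distance in frames between two keyframes (assumption).
    top_fraction: share of frames to keep; 0.01 lies in the 0.5%-1.5% range.
    """
    n = len(motion)
    budget = max(1, round(n * top_fraction))

    # Local maxima: larger than the left neighbor, not smaller than the right.
    peaks = [
        i for i in range(1, n - 1)
        if motion[i] > motion[i - 1] and motion[i] >= motion[i + 1]
    ]

    # Visit the strongest peaks first and enforce the minimum spacing.
    peaks.sort(key=lambda i: motion[i], reverse=True)
    chosen = []
    for i in peaks:
        if len(chosen) == budget:
            break
        if all(abs(i - j) >= min_gap for j in chosen):
            chosen.append(i)
    return sorted(chosen)
```

With a 200-frame clip and the default budget of 1%, at most two keyframes are returned, and a weaker peak that lies too close to a stronger one is skipped.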

Compact Index Structure Describes Keyframes

To process the selected keyframes efficiently, we compute a compact index representation for each keyframe. This is done by projecting high-dimensional feature vectors (VLAD or Fisher vectors) onto vectors of much lower dimension. The method we use is context-adaptive and retains only the most significant information. This approach allows fast and efficient matching of a large number of keyframes. We also use query expansion and reranking methods to further improve the matching results. Temporally consistent matches are grouped together by clustering.
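As a rough illustration of the indexing and grouping steps, the sketch below reduces feature vectors, matches them by cosine similarity, and groups matches that agree on a time offset. It is not the method used in the software: the context-adaptive projection is replaced here by a much simpler stand-in that keeps the dimensions with the largest variance, and the clustering is replaced by bucketing time offsets. All function names and the `tolerance` parameter are hypothetical.

```python
import math
from collections import defaultdict


def top_variance_dims(vectors, k):
    """Return the indices of the k dimensions with the largest variance."""
    n = len(vectors)
    dims = len(vectors[0])
    variances = []
    for d in range(dims):
        column = [v[d] for v in vectors]
        mean = sum(column) / n
        variances.append(sum((x - mean) ** 2 for x in column) / n)
    order = sorted(range(dims), key=lambda d: variances[d], reverse=True)
    return sorted(order[:k])


def compact(vector, dims):
    """Keep only the selected dimensions and L2-normalize the result."""
    reduced = [vector[d] for d in dims]
    norm = math.sqrt(sum(x * x for x in reduced)) or 1.0
    return [x / norm for x in reduced]


def best_match(query, index):
    """Return (position, cosine similarity) of the closest index entry."""
    scores = [sum(q * x for q, x in zip(query, entry)) for entry in index]
    pos = max(range(len(scores)), key=scores.__getitem__)
    return pos, scores[pos]


def group_by_offset(matches, tolerance=1.0):
    """Group (clip_time, show_time) matches whose time offsets agree.

    Offsets are bucketed by rounding, so two offsets near a bucket
    boundary may be split; a real clustering step would avoid this.
    """
    groups = defaultdict(list)
    for clip_t, show_t in matches:
        bucket = round((show_t - clip_t) / tolerance)
        groups[bucket].append((clip_t, show_t))
    return sorted(groups.values(), key=len, reverse=True)
```

In this sketch, the largest group returned by `group_by_offset` corresponds to a run of keyframes that appear in the show with the same shift in time, which is one simple reading of "temporally consistent".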

Image Matching Verifies Frames

Finally, each match is verified using a state-of-the-art image matching method. This is a very important step because it removes wrong matches and increases precision. The image matching method we use achieves much better results than many standard approaches found in computer vision libraries such as OpenCV. An example can be seen in the following figure: the left side shows the results of our image matching method, and the right side shows the results of a frequently used standard method. Correctly found local correspondences between image structures are visualized by orange lines.
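The idea of verifying a candidate match through local correspondences can be sketched as follows. The example assumes that putative point correspondences between two frames have already been found, and it accepts the match only if enough of them agree on a single translation. This is a deliberately simplified stand-in: the actual image matching method is not described here, and handling perspective changes would require a richer geometric model such as a homography. The function names and thresholds are hypothetical.

```python
def count_translation_inliers(pairs, tolerance=3.0):
    """Largest number of correspondences that agree on one translation.

    pairs: list of ((x1, y1), (x2, y2)) putative point correspondences.
    Each pair proposes a translation; the proposal with most support wins.
    """
    best = 0
    for (ax, ay), (bx, by) in pairs:
        dx, dy = bx - ax, by - ay
        inliers = sum(
            1
            for (px, py), (qx, qy) in pairs
            if abs((qx - px) - dx) <= tolerance
            and abs((qy - py) - dy) <= tolerance
        )
        best = max(best, inliers)
    return best


def verify_match(pairs, min_inliers=4, tolerance=3.0):
    """Accept a frame match only if enough correspondences are consistent."""
    return count_translation_inliers(pairs, tolerance) >= min_inliers
```

A match supported by only scattered, mutually inconsistent correspondences is rejected, which is how a verification step of this kind removes wrong matches and raises precision.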

Our software supports Windows and Linux based operating systems.

Partner

Bayerischer Rundfunk (Bavarian Broadcasting, BR)
AVID

© Fraunhofer ITWM
Left: the result of our procedure. Right: the result of a frequently used standard procedure. Local matches between the compared images are drawn as orange lines.