A team of researchers from South Korea has developed a system that makes video content easier to search. As described in the International Journal of Computational Vision and Robotics, the approach extracts spoken words from video recordings, converts them to text, and makes that text searchable. It therefore does not depend on embedded keywords, curated tags, or hashtags.

How It Works
The system relies on the dialogue or spoken commentary in a video and associates it with the specific scenes a user may wish to find. The feature may be redundant for videos with embedded subtitles, but it is particularly useful for the vast amount of other video content available online, where it helps with cataloging and efficient searching.

Researchers Kitae Hwang, In Hwan Jung, and Jae Moon Lee from the School of Computer Engineering at Hansung University in Seoul have created an Android app that demonstrates this technology. Because its name conflicts with those of existing apps, the app may need to be renamed if it is released on the Google Play Store.

Key Features
The app uses FFmpeg to extract the audio track from a video and processes it in 10-second increments. Speech recognition transcribes each audio segment, and the resulting text is indexed along the video timeline, which makes the timeline searchable.
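
The extraction step can be sketched roughly as follows. This is a minimal illustration rather than the researchers' implementation: the function names, the output file pattern, and the mono 16 kHz audio settings are assumptions, and the app itself runs on Android rather than calling the FFmpeg command-line tool from Python. The sketch builds an FFmpeg command that discards the video stream and writes 10-second audio files, and it computes the time window each file covers.

```python
import subprocess
from pathlib import Path

SEGMENT_SECONDS = 10  # segment length reported in the article


def build_ffmpeg_command(video_path, out_dir, segment_seconds=SEGMENT_SECONDS):
    """Return an FFmpeg command that writes fixed-length audio segments."""
    pattern = str(Path(out_dir) / "segment_%04d.wav")  # hypothetical file naming
    return [
        "ffmpeg", "-i", str(video_path),
        "-vn",                    # ignore the video stream
        "-ac", "1",               # mono audio (assumed recognizer input)
        "-ar", "16000",           # 16 kHz sample rate (assumed)
        "-f", "segment",          # FFmpeg's segment muxer
        "-segment_time", str(segment_seconds),
        pattern,
    ]


def segment_windows(duration_seconds, segment_seconds=SEGMENT_SECONDS):
    """List the (start, end) time in seconds covered by each segment."""
    windows = []
    start = 0
    while start < duration_seconds:
        end = min(start + segment_seconds, duration_seconds)
        windows.append((start, end))
        start += segment_seconds
    return windows


def extract_segments(video_path, out_dir):
    """Run FFmpeg; requires the ffmpeg binary to be installed and on PATH."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_command(video_path, out_dir), check=True)
```

With these assumptions, a 20-minute video (1,200 seconds) yields 120 segments, each of which would be passed to a speech recognizer and stored with its start time.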

The process is fast: for a 20-minute video, transcription and indexing are completed in two to three minutes, running in the background as the video plays. Users can then search for a specific term and find every instance of it within the video.
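
Searching such an index might look like the sketch below. The Segment class, the function names, and the sample transcript are hypothetical, and a plain case-insensitive substring match stands in for whatever matching the app actually uses. Each transcribed segment keeps its start time, so a search returns the points in the video where the term was spoken.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: int   # seconds from the start of the video
    text: str    # transcript of this 10-second segment


def search(segments, term):
    """Return the start time of every segment whose transcript contains term."""
    needle = term.lower()
    return [seg.start for seg in segments if needle in seg.text.lower()]


def format_timestamp(seconds):
    """Format a time in seconds as MM:SS."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes:02d}:{secs:02d}"


# Made-up transcript used only for illustration.
timeline = [
    Segment(0, "Welcome to the lecture on operating systems"),
    Segment(10, "Today we cover process scheduling"),
    Segment(20, "Scheduling decides which process runs next"),
]

hits = search(timeline, "scheduling")
print([format_timestamp(t) for t in hits])  # prints ['00:10', '00:20']
```

A player could use the returned start times to jump directly to each matching scene.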

Applications and Benefits
This technology has significant potential applications in various fields:

Education: Students can quickly find specific information in lecture recordings.
News Analysis: Journalists can locate specific statements in interviews or news footage.
Content Cataloging: Libraries and streaming services can better manage and search their video archives.

This system could revolutionize how users interact with video content, providing quick access to specific information and enhancing the utility of video databases across the internet.

More: https://techxplore.com/news/2024-05-spoken-language-video-searchable-text.html