SEARCHABLE MULTIMEDIA STREAM
Abstract
The present invention provides a system and a method for making an archived conference or presentation searchable after it has been stored on the archive server. According to the invention, one or more media streams coded according to H.323 or SIP are transmitted to a conversion engine, which converts the multimedia content into a standard streaming format. That format may be a cluster of files, each representing a certain medium (audio, video, data), and/or a structure file that synchronizes and associates the different media. Once the conversion has been carried out, the structure file is copied and forwarded to a post-processing server. The post-processing server includes, among other things, a speech recognition engine that generates a text file of alphanumeric characters representing all recognized words in the audio file. The text file is then added to the cluster of files, with each identified word associated with a timing tag in the structure file. After this post-processing, a conventional search engine can readily find key words and their associated points in time in the media stream.
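The keyword lookup that the abstract leaves to a conventional search engine can be illustrated with a small sketch. It assumes a text file of tab-separated `seconds<TAB>word` lines; that layout and the names `build_index` and `find_keyword` are hypothetical and are not taken from the patent.

```python
from collections import defaultdict

def build_index(lines):
    """Map each lowercased word to the times (in seconds) at which it was recognized."""
    index = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        seconds, word = line.split("\t", 1)
        index[word.lower()].append(float(seconds))
    return index

def find_keyword(index, keyword):
    """Return the sorted playback positions for a keyword, or an empty list."""
    return sorted(index.get(keyword.lower(), []))

# Hypothetical contents of the word/timing text file.
transcript = ["12.5\tbudget", "47.0\tschedule", "93.2\tBudget"]
index = build_index(transcript)
print(find_keyword(index, "budget"))
```

A player could then seek the media stream to any of the returned positions.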
8 Claims
1. A method in a streaming and archiving system for post-processing a multimedia stream converted from a conventional conference format coded data stream for the purpose of making the multimedia stream searchable, characterized in
monitoring in a H.323/SIP compatible conversion engine whether a H.323 or SIP coded data stream is received, and if so,
converting the conventional conference format coded data stream to a multimedia stream in a defined multimedia streaming format including timing information related to respective fragments of the multimedia stream,
analyzing fragments of sound from an audio part of said multimedia stream in a speech recognition engine by
generating a model of each respective fragment of sound or sequences of fragments of sound,
comparing the respective model of each respective fragment of sound or sequences of fragments of sound with reference models of pronunciations of known words or phonemes stored in a database,
assigning a timing information referring to a fragment or a sequence of fragments whose model said speech recognition engine has found to match a reference model of a pronunciation of a known word in said database, and
associatively storing the said timing information and said word in a text file.
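The analyzing, comparing, assigning and storing steps of claim 1 can be sketched roughly as follows. This is only an illustration under loud assumptions: each fragment "model" is a toy feature vector, matching is a nearest-neighbour Euclidean distance with a threshold (standing in for a real speech recognition engine), and the reference database, the threshold value and all function names are hypothetical.

```python
import math

# Hypothetical reference database: word -> toy feature vector
# (a stand-in for a stored pronunciation model).
REFERENCE_MODELS = {
    "hello": (1.0, 0.0, 0.0),
    "world": (0.0, 1.0, 0.0),
}

def match_word(model, references, max_distance=0.5):
    """Return the closest reference word within max_distance, else None."""
    best_word, best_dist = None, max_distance
    for word, ref in references.items():
        dist = math.dist(model, ref)  # Euclidean distance (Python 3.8+)
        if dist <= best_dist:
            best_word, best_dist = word, dist
    return best_word

def transcribe(fragments, references):
    """Pair each matched fragment's start time with the recognized word."""
    entries = []
    for start_time, model in fragments:
        word = match_word(model, references)
        if word is not None:
            entries.append((start_time, word))
    return entries

def to_text(entries):
    """Serialize (time, word) pairs as tab-separated lines for the text file."""
    return "\n".join(f"{t:.2f}\t{w}" for t, w in entries)

# Each fragment carries the timing information added during conversion.
fragments = [
    (0.0, (0.9, 0.1, 0.0)),  # close to "hello"
    (0.6, (0.5, 0.5, 0.5)),  # matches nothing within the threshold
    (1.2, (0.1, 1.0, 0.0)),  # close to "world"
]
entries = transcribe(fragments, REFERENCE_MODELS)
print(to_text(entries))
```

Fragments that match no reference model are simply left out of the text file, so only recognized words carry a timing entry.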
8. A system for post-processing a multimedia stream converted from a conventional conference format coded data stream for the purpose of making the multimedia stream searchable, characterized in
a converting engine configured to receive a H.323 or SIP coded data stream and converting the conventional conference format coded data stream to a multimedia stream in a defined multimedia streaming format including timing information related to respective fragments of the multimedia stream,
a post-processing server configured to receive said multimedia stream or a copy of said multimedia stream,
a speech recognition engine included in or connected to said post-processing server configured to analyze fragments of sound from an audio part of said multimedia stream and compare a model of each respective fragment of sound or sequences of fragments of sound with models of pronunciations of known words or phonemes stored in a database,
a time assigning means configured to associate a timing information referring to a fragment or a sequence of fragments whose model said speech recognition engine has found to match a reference model of a pronunciation of a known word in said database, and associatively storing the said timing information and said word in a text file.
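How the components of claim 8 might be wired together is sketched below. The class names, the fixed fragment length, the byte payloads and the lookup-table recognizer are all hypothetical placeholders chosen for illustration; the claim does not prescribe any of them.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

SUPPORTED_PROTOCOLS = {"H.323", "SIP"}

@dataclass
class Fragment:
    start_seconds: float  # timing information attached during conversion
    audio: bytes

@dataclass
class MultimediaStream:
    fragments: List[Fragment]

class ConversionEngine:
    """Accepts only H.323 or SIP input and attaches timing to each fragment."""
    def __init__(self, fragment_seconds: float = 1.0):
        self.fragment_seconds = fragment_seconds  # assumed fixed fragment length

    def convert(self, protocol: str, payloads: List[bytes]) -> Optional[MultimediaStream]:
        if protocol not in SUPPORTED_PROTOCOLS:
            return None  # not a conference stream this engine handles
        fragments = [Fragment(i * self.fragment_seconds, p)
                     for i, p in enumerate(payloads)]
        return MultimediaStream(fragments)

class PostProcessingServer:
    """Runs a recognizer over each fragment and records (time, word) pairs."""
    def __init__(self, recognize: Callable[[bytes], Optional[str]]):
        self.recognize = recognize

    def process(self, stream: MultimediaStream) -> List[Tuple[float, str]]:
        entries = []
        for fragment in stream.fragments:
            word = self.recognize(fragment.audio)
            if word is not None:
                entries.append((fragment.start_seconds, word))
        return entries

def toy_recognizer(audio: bytes) -> Optional[str]:
    """Placeholder for a speech recognition engine: a fixed lookup table."""
    return {b"\x01": "agenda", b"\x02": "deadline"}.get(audio)

engine = ConversionEngine(fragment_seconds=2.0)
server = PostProcessingServer(toy_recognizer)
stream = engine.convert("SIP", [b"\x01", b"\x00", b"\x02"])
result = server.process(stream)
rejected = engine.convert("RTSP", [b"\x01"])
print(result, rejected)
```

Passing the recognizer in as a callable mirrors the claim's wording that the speech recognition engine may be "included in or connected to" the post-processing server.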
Specification