SEARCHABLE MULTIMEDIA STREAM

US 20070156843A1
Filed: 11/29/2006
Published: 07/05/2007
Est. Priority Date: 12/30/2005
Status: Active Grant

Abstract

The present invention provides a system and a method for making an archived conference or presentation searchable after it has been stored on the archive server. According to the invention, one or more media streams coded according to H.323 or SIP are transmitted to a conversion engine that converts the multimedia content into a standard streaming format. This format may be a cluster of files, each representing a certain medium (audio, video, data), and/or a structure file that synchronizes and associates the different media. When the conversion is carried out, the structure file is copied and forwarded to a post-processing server. The post-processing server includes, among other things, a speech recognition engine that generates a text file of alphanumeric characters representing all recognized words in the audio file. The text file is then entered into the cluster of files, associating each identified word with a timing tag in the structure file. After this post-processing, a conventional search engine can easily find key words and their associated points of time in the media stream.
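The search step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the layout of the text file, so the tab-separated `seconds<TAB>word` format and the names `load_index`, `find_keyword`, and `SAMPLE_INDEX` below are assumptions made purely for illustration.

```python
from typing import Dict, List


def load_index(text: str) -> Dict[str, List[float]]:
    """Parse 'seconds<TAB>word' lines (an assumed layout) into word -> [times]."""
    index: Dict[str, List[float]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        seconds, word = line.split("\t", 1)
        # Lower-case the word so the lookup is case-insensitive.
        index.setdefault(word.lower(), []).append(float(seconds))
    return index


def find_keyword(index: Dict[str, List[float]], keyword: str) -> List[float]:
    """Return every time offset (in seconds) at which the keyword was recognized."""
    return index.get(keyword.lower(), [])


# Hypothetical contents of the text file produced by post-processing.
SAMPLE_INDEX = "0.5\twelcome\n12.0\tbudget\n47.25\tBudget\n90.0\tquestions\n"

index = load_index(SAMPLE_INDEX)
print(find_keyword(index, "budget"))  # [12.0, 47.25]
```

A player could then seek to any of the returned offsets in the archived stream.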

8 Claims

1. A method in a streaming and archiving system for post-processing a multimedia stream converted from a conventional conference format coded data stream for the purpose of making the multimedia stream searchable, characterized in
    - monitoring in a H.323/SIP compatible conversion engine whether a H.323 or SIP coded data stream is received, and if so,
    - converting the conventional conference format coded data stream to a multimedia stream in a defined multimedia streaming format including timing information related to respective fragments of the multimedia stream,
    - analyzing fragments of sound from an audio part of said multimedia stream in a speech recognition engine by
    - generating a model of each respective fragment of sound or sequences of fragments of sound,
    - comparing the respective model of each respective fragment of sound or sequences of fragments of sound with reference models of pronunciations of known words or phonemes stored in a database,
    - assigning a timing information referring to a fragment or a sequence of fragments whose model said speech recognition engine has found to match a reference model of a pronunciation of a known word in said database, and
    - associatively storing the said timing information and said word in a text file.
- 2. A method according to claim 1, characterized in that the step of analyzing further includes:
    - extracting and temporarily storing information indicating a time position within said multimedia stream of the current fragment of sound,
    - if a match between a model of a current fragment of sound or a sequence of fragments of sound with said current sound included and a reference model of a pronunciation of a known word or phoneme in said database is found, then using said time position as said timing information which associatively is being stored together with said word or an input word or tag in said text file.
- 3. A method according to claim 1 or 2, characterized in storing, in the streaming and archiving system, said text file when all fragments of sound from said audio part of said multimedia stream are analyzed, making said text file accessible for later search in said multimedia stream.
- 4. A method according to one of the preceding claims, characterized in that said models and reference models include Markov models.
- 5. A method according to one of the preceding claims, characterized in that said defined multimedia streaming format is an Active Stream Format (ASF).
- 6. A method according to claim 5, characterized in that said timing information is a time field and/or an offset field of the ASF associated with the start or the end of matched fragment or sequence of fragments.
- 7. A method according to one of the preceding claims, characterized in that conventional conference format coded data stream is a H.323, H.320 or SIP coded data stream.
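The analysis and storing steps of claims 1 to 3 can be sketched roughly as follows. This is a toy illustration, not the claimed speech recognition engine: each fragment's "model" is reduced to a single character, matching is an exact dictionary lookup rather than a statistical comparison, and `Fragment`, `REFERENCE_MODELS`, `index_stream`, and `to_text_file` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Fragment:
    start: float  # timing information from the streaming format, in seconds
    model: str    # toy stand-in for a model of the fragment's sound


# Hypothetical reference database: sequence of fragment models -> known word.
REFERENCE_MODELS: Dict[Tuple[str, ...], str] = {
    ("h", "e", "l", "o"): "hello",
    ("b", "i", "d"): "bid",
}


def index_stream(fragments: List[Fragment], max_len: int = 4) -> List[Tuple[float, str]]:
    """Return (time, word) pairs for fragment sequences that match a reference model."""
    entries: List[Tuple[float, str]] = []
    i = 0
    while i < len(fragments):
        # Temporarily store the time position of the current fragment (claim 2).
        time_position = fragments[i].start
        matched = False
        # Try the longest candidate sequence starting here first.
        for n in range(min(max_len, len(fragments) - i), 0, -1):
            key = tuple(f.model for f in fragments[i:i + n])
            word = REFERENCE_MODELS.get(key)
            if word is not None:
                entries.append((time_position, word))
                i += n
                matched = True
                break
        if not matched:
            i += 1  # no known word starts at this fragment; move on
    return entries


def to_text_file(entries: List[Tuple[float, str]]) -> str:
    """Serialize the entries as 'seconds<TAB>word' lines (an assumed layout)."""
    return "".join(f"{t:.2f}\t{w}\n" for t, w in entries)


# Eight fragments, 0.25 s apart; the first one matches nothing.
stream = [Fragment(i * 0.25, m) for i, m in enumerate("xhelobid")]
print(to_text_file(index_stream(stream)), end="")
```

Here the timing information is taken from the start of the matched sequence; claim 6 also allows the end of the matched fragment or sequence.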

8. A system for post-processing a multimedia stream converted from a conventional conference format coded data stream for the purpose of making the multimedia stream searchable, characterized in
    - a converting engine configured to receive a H.323 or SIP coded data stream and converting the conventional conference format coded data stream to a multimedia stream in a defined multimedia streaming format including timing information related to respective fragments of the multimedia stream,
    - a post-processing server configured to receive said multimedia stream or a copy of said multimedia stream,
    - a speech recognition engine included in or connected to said post-processing server configured to analyze fragments of sound from an audio part of said multimedia stream and compare a model of each respective fragment of sound or sequences of fragments of sound with models of pronunciations of known words or phonemes stored in a database,
    - a time assigning means configured to associate a timing information referring to a fragment or a sequence of fragments whose model said speech recognition engine has found to match a reference model of a pronunciation of a known word in said database, and associatively storing the said timing information and said word in a text file.
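Claim 4 states that the models may include Markov models. One common way to compare an observed sequence against such reference models is to score it under each word's hidden Markov model with the forward algorithm and pick the most likely word. The sketch below assumes this approach; the claims do not prescribe it, and the two-state models, the observation symbols `0`/`1`, and the names `forward_likelihood`, `REFERENCE_HMMS`, and `best_match` are invented for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# (start probabilities, transition matrix, emission matrix) of a discrete HMM
Hmm = Tuple[List[float], List[List[float]], List[List[float]]]


def forward_likelihood(obs: Sequence[int], hmm: Hmm) -> float:
    """Compute P(obs | hmm) with the forward algorithm (obs must be non-empty)."""
    start, trans, emit = hmm
    n_states = len(start)
    # Initialization: probability of starting in each state and emitting obs[0].
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    # Induction: propagate through the transitions, then emit the next symbol.
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][symbol]
            for s in range(n_states)
        ]
    return sum(alpha)


# Hypothetical left-to-right reference models for two words.
LEFT_TO_RIGHT = [[0.5, 0.5], [0.0, 1.0]]
REFERENCE_HMMS: Dict[str, Hmm] = {
    "yes": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.9, 0.1], [0.1, 0.9]]),  # mostly 0s, then 1s
    "no": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.9], [0.9, 0.1]]),   # mostly 1s, then 0s
}


def best_match(obs: Sequence[int]) -> str:
    """Return the word whose reference model gives the observation the highest likelihood."""
    return max(REFERENCE_HMMS, key=lambda w: forward_likelihood(obs, REFERENCE_HMMS[w]))


print(best_match([0, 0, 1, 1]))  # yes
```

A practical engine would also apply a likelihood threshold so that sounds matching no known word are rejected instead of being assigned to the nearest model.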

Current Assignee: Cisco Technology, Inc. (Cisco Systems, Inc.)
Original Assignee: Tandberg Telecom AS (Cisco Systems, Inc.)
Inventors: Christensen, Espen; Grodum, Nicolai; Sandbakken, Geir Arne; Lovhaugen, Norma; Sagen, Hallgrim
Granted Patent: US 8,103,507 B2
US Class Current: 709/217
CPC Class Codes:
- G10L 15/26   Speech to text systems G10L...
- H04N 21/234309   by transcoding between form...
- H04N 21/278   Content descriptor database...
