Method and apparatus for voice annotation and retrieval of multimedia data
Abstract
A method, an apparatus, a computer program product and a system for voice annotating and retrieving digital media content are disclosed. An annotation module (420) post-annotates digital media data (410), including audio, image and/or video data, with speech. A word lattice (222) can be created from the speech annotation (210) based on acoustic and/or linguistic knowledge. An indexing module (430) then indexes the speech-annotated data (422). The word lattice (222) is reverse indexed (230), and content addressing (240) is applied to produce the indexed data (432, 242). A speech query (474) can be generated as input to a retrieval module (480) for retrieving a segment of the indexed digital media data (432). The speech query (474, 310) is converted into a word lattice (322), and a shortlist (344) is produced from that lattice by confidence filtering (330). The shortlist (344) is input to a lattice search engine (350), which searches the indexed content (342) to obtain the search result (352).
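The indexing path described in the abstract (word lattice, reverse indexing, content addressing) can be pictured with a toy example. The sketch below is illustrative only and is not the patent's implementation: the `LatticeArc` record, the `build_reverse_index` function, the segment addresses and the confidence scores are all hypothetical, and a real lattice would come from a speech recognizer rather than being written by hand.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LatticeArc:
    """One word hypothesis in a word lattice (hypothetical structure)."""
    word: str
    start: float       # start time within the spoken annotation, in seconds
    end: float         # end time within the spoken annotation, in seconds
    confidence: float  # recognizer score in [0, 1]; invented for this example


def build_reverse_index(lattices):
    """Map each hypothesized word to the media segments that may contain it.

    `lattices` maps a segment address (for example "video_003#00:30") to the
    arcs of that segment's annotation lattice. For each (word, segment) pair
    only the best-scoring arc's confidence is kept.
    """
    index = defaultdict(dict)
    for address, arcs in lattices.items():
        for arc in arcs:
            best = index[arc.word].get(address, 0.0)
            index[arc.word][address] = max(best, arc.confidence)
    return {word: dict(postings) for word, postings in index.items()}


# Hand-written lattices with competing hypotheses ("beach" vs "peach").
lattices = {
    "photo_017": [
        LatticeArc("sunset", 0.0, 0.6, 0.92),
        LatticeArc("beach", 0.6, 1.1, 0.71),
        LatticeArc("peach", 0.6, 1.1, 0.24),
    ],
    "video_003#00:30": [
        LatticeArc("birthday", 0.0, 0.7, 0.88),
        LatticeArc("beach", 0.7, 1.2, 0.35),
    ],
}

reverse_index = build_reverse_index(lattices)
print(reverse_index["beach"])
```

Keeping the low-confidence alternative ("peach") in the index is the point of indexing a lattice rather than a single best transcript: a later query can still match a word the recognizer ranked second.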
71 Claims
1. A method of voice annotating source digital media data, said method including the steps of:
speech annotating one or more portions of said source digital media data with a speech annotation that is independent of the source digital media data thereby providing a speech annotated digital media data; and
indexing said speech annotated digital media data by said speech annotation to provide an indexed media content.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
applying said speech annotation to a lattice engine and further applying at least one of acoustic and linguistic knowledge to said lattice engine to generate a word lattice;
applying said word lattice to a reverse index engine to build a reversed index table for said lattice; and
applying said reversed index table to an addressing engine module to create the indexed media content.
8. The method according to claim 7, wherein said acoustic knowledge is based on a hidden Markov model.
9. The method according to claim 7, wherein said linguistic knowledge is an N-gram statistical linguistic model.
10. An apparatus for voice annotating source digital media data, said apparatus including:
means for speech annotating one or more portions of said source digital media data with a speech annotation that is independent of the source digital media data to provide a speech annotated digital media data; and
means for indexing said speech annotated digital media data by said speech annotation to provide an indexed media content.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
means for applying said speech annotation to a lattice engine and further applying at least one of acoustic and linguistic knowledge to said lattice engine to generate a word lattice;
means for applying said word lattice to a reverse index engine to build a reversed index table for said lattice; and
means for applying said reversed index table to an addressing engine module to create the indexed media content.
17. The apparatus according to claim 16, wherein said acoustic knowledge is based on a hidden Markov model.
18. The apparatus according to claim 16, wherein said linguistic knowledge is an N-gram statistical linguistic model.
19. A computer program product having a computer readable medium having a computer program recorded therein for voice annotating source digital media data, said computer program product including:
means for speech annotating one or more portions of said source digital media data with a speech annotation that is independent of the source digital media data to provide a speech annotated digital media data; and
means for indexing said speech annotated digital media data by said speech annotation to provide an indexed media content.
Dependent claims: 20, 21, 22, 23, 24, 25, 26, 27
means for applying said speech annotation to a lattice engine and further applying at least one of acoustic and linguistic knowledge to said lattice engine to generate a word lattice;
means for applying said word lattice to a reverse index engine to build a reversed index table for said lattice; and
means for applying said reversed index table to an addressing engine module to create the indexed media content.
26. The computer program product according to claim 25, wherein said acoustic knowledge is based on a hidden Markov model.
27. The computer program product according to claim 26, wherein said linguistic knowledge is an N-gram statistical linguistic model.
28. A method of voice retrieving digital media data annotated with speech, said method including the steps of:
providing an indexed digital media data, said indexed digital media data derived from a word lattice created from a speech annotation of said digital media data, wherein said speech annotation is independent of a source digital media data;
generating a speech query; and
retrieving one or more portions of said indexed digital media data dependent upon said speech query.
Dependent claims: 29, 30, 31, 32, 33, 34, 35, 36
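One way to picture the retrieval step is as a lookup of query-word hypotheses in a reversed index. The sketch below is an illustration under stated assumptions, not the claimed method itself: it borrows the confidence-filtering idea from the abstract, and the `shortlist` and `search` functions, the 0.5 threshold, the scoring rule (query confidence times annotation confidence, summed per segment) and all sample data are hypothetical.

```python
from collections import defaultdict

# Hypothetical reversed index: word -> {segment address: annotation confidence}.
REVERSE_INDEX = {
    "sunset": {"photo_017": 0.92},
    "beach": {"photo_017": 0.71, "video_003#00:30": 0.35},
    "birthday": {"video_003#00:30": 0.88},
}

# Hypothetical query lattice, flattened to (word, confidence) hypotheses.
QUERY_HYPOTHESES = [("beach", 0.80), ("peach", 0.15), ("sunset", 0.65)]


def shortlist(hypotheses, threshold=0.5):
    """Confidence filtering: keep query words scored at or above the threshold."""
    return [(word, conf) for word, conf in hypotheses if conf >= threshold]


def search(index, query_words):
    """Rank segments by the summed product of query and annotation confidence."""
    scores = defaultdict(float)
    for word, query_conf in query_words:
        for address, annotation_conf in index.get(word, {}).items():
            scores[address] += query_conf * annotation_conf
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


kept = shortlist(QUERY_HYPOTHESES)
results = search(REVERSE_INDEX, kept)
for address, score in results:
    print(f"{address}: {score:.3f}")
```

Filtering before the search keeps unlikely query hypotheses (here "peach") from pulling in unrelated segments. The threshold trades recall for precision and would need tuning against real recognizer output.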
37. An apparatus for voice retrieving digital media data annotated with speech, said apparatus including:
means for providing an indexed digital media data, said indexed digital media data derived from a word lattice created from a speech annotation of said digital media data, wherein said speech annotation is independent of a source digital media data;
means for generating a speech query; and
means for retrieving one or more portions of said indexed digital media data dependent upon said speech query.
Dependent claims: 38, 39, 40, 41, 42, 43, 44, 45
46. A computer program product having a computer readable medium having a computer program recorded therein for voice retrieving digital media data annotated with speech, said computer program product including:
means for providing an indexed digital media data, said indexed digital media data derived from a word lattice created from a speech annotation of said digital media data, wherein said speech annotation is independent of a source digital media data;
means for generating a speech query; and
means for retrieving one or more portions of said indexed digital media data dependent upon said speech query.
Dependent claims: 47, 48, 49, 50, 51, 52, 53, 54
55. A system for voice annotating and retrieving source digital media data, said system including:
means for speech annotating at least one segment of said source digital media data with a speech annotation that is independent of the source digital media data to provide a speech annotated digital media data;
means for indexing said speech-annotated digital media data by said speech annotation to provide an indexed digital media data;
means for generating a speech or voice query; and
means for retrieving one or more portions of said indexed digital media data dependent upon said speech query.
Dependent claims: 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68
69. A method of voice annotating source digital media data, said method comprising the steps of:
speech annotating, independently of the source media data, one or more portions of said source media data using a formal spoken language;
applying said annotated speech to a lattice engine and further applying at least one of acoustic and linguistic knowledge to said lattice engine to generate a word lattice;
applying said word lattice to a reverse index engine to build a reversed index table for said lattice; and
applying said reversed index table to an addressing engine module to create indexed media content.
70. An apparatus for voice annotating source digital media data, comprising:
means for speech annotating, independently of the source media data, one or more portions of said source media data using a formal spoken language;
means for applying said annotated speech to a lattice engine and further applying at least one of acoustic and linguistic knowledge to said lattice engine to generate a word lattice;
means for applying said word lattice to a reverse index engine to build a reversed index table for said lattice; and
means for applying said reversed index table to an addressing engine module to create indexed media content.
71. A computer program product having a computer readable medium having a computer program recorded therein for voice annotating source digital media data, said computer program product including:
means for speech annotating, independently of the source media data, one or more portions of said source media data using a formal spoken language;
means for applying said annotated speech to a lattice engine and further applying at least one of acoustic and linguistic knowledge to said lattice engine to generate a word lattice;
means for applying said word lattice to a reverse index engine to build a reversed index table for said lattice; and
means for applying said reversed index table to an addressing engine module to create indexed media content.
Specification