Method and apparatus for voice annotation and retrieval of multimedia data

US 6,397,181 B1
Filed: 06/04/1999
Issued: 05/28/2002
Est. Priority Date: 01/27/1999
Status: Expired due to Term

Abstract

A method, an apparatus, a computer program product and a system for voice annotating and retrieving digital media content are disclosed. An annotation module (420) post annotates digital media data (410), including audio, image and/or video data, with speech. A word lattice (222) can be created from speech annotation (210) dependent upon acoustic and/or linguistic knowledge. An indexing module (430) then indexes the speech-annotated data (422). The word lattice (222) is reverse indexed (230), and content addressing (240) is applied to produce the indexed data (432, 242). A speech query (474) can be generated as input to a retrieval module (480) for retrieving a segment of the indexed digital media data (432). The speech query (474, 310) is converted into a word lattice (322), and a shortlist (344) is produced from it (322) by confidence filtering (330). The shortlist (344) is input to a lattice search engine (350) to search the indexed content (342) to obtain the search result (352).
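
The reverse indexing step described in the abstract can be sketched in Python as follows. This is a minimal illustration only: the `Arc` layout, the field names and the sample words are assumptions, since the text on this page does not specify how the word lattice (222) or the reverse index table is stored.

```python
from collections import defaultdict
from typing import NamedTuple


class Arc(NamedTuple):
    """One word hypothesis in a lattice (hypothetical layout)."""
    word: str
    start: float  # start time in seconds within the spoken annotation
    end: float    # end time in seconds
    score: float  # recogniser confidence in [0, 1]


def reverse_index(lattices):
    """Map each word to the (annotation id, arc) pairs where it was hypothesised."""
    table = defaultdict(list)
    for annotation_id, arcs in lattices.items():
        for arc in arcs:
            table[arc.word].append((annotation_id, arc))
    return dict(table)


# Two annotated media items; "beach" and "peach" compete for the same time span.
lattices = {
    "photo_001": [
        Arc("beach", 0.0, 0.4, 0.9),
        Arc("peach", 0.0, 0.4, 0.3),
        Arc("sunset", 0.4, 1.0, 0.8),
    ],
    "video_007": [Arc("sunset", 0.0, 0.6, 0.7)],
}
index = reverse_index(lattices)
```

Keeping the competing hypotheses in the table, rather than only the single best transcription, is what lets a later search recover from a recognition error in either the annotation or the query.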

263 Citations

71 Claims

1. A method of voice annotating source digital media data, said method including the steps of:
- speech annotating one or more portions of said source digital media data with a speech annotation that is independent of the source digital media data thereby providing a speech annotated digital media data; and
  
  indexing said speech annotated digital media data by said speech annotation to provide an indexed media content.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
  - 2. The method according to claim 1, wherein said indexing step includes the step of creating a word lattice from said speech annotation.
  - 3. The method according to claim 2, wherein said step of creating said word lattice is dependent upon at least one of acoustic and linguistic knowledge.
  - 4. The method according to claim 2, further including the step of reverse indexing said word lattice to provide a reverse index table.
  - 5. The method according to claim 4, further including the step of content addressing said reverse index table.
  - 6. The method according to claim 1, wherein said annotating step is dependent upon at least one of a customised vocabulary and Backus-Naur Form grammar.
  - 7. The method according to claim 1, wherein said indexing step further includes the steps of:
  - 8. The method according to claim 7, wherein said acoustic knowledge is based on a hidden Markov model.
  - 9. The method according to claim 7, wherein said linguistic knowledge is an N-gram statistical linguistic model.
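
As an illustration of the N-gram statistical linguistic model named in claim 9, the sketch below trains a bigram model (N = 2) with add-one smoothing and uses it to compare two competing word sequences from a lattice. The toy corpus, the function names and the choice of smoothing are assumptions made for the example, not details taken from the patent.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their left contexts over a list of word lists."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for words in sentences:
        padded = ["<s>"] + words  # <s> marks the sentence start
        vocab.update(words)
        contexts.update(padded[:-1])
        bigrams.update(zip(padded, padded[1:]))
    return contexts, bigrams, len(vocab)


def log_prob(path, model):
    """Log probability of a word sequence under the add-one smoothed bigram model."""
    contexts, bigrams, vocab_size = model
    padded = ["<s>"] + path
    total = 0.0
    for prev, word in zip(padded, padded[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size))
    return total


corpus = [["sunset", "at", "the", "beach"], ["the", "beach", "at", "noon"]]
model = train_bigram(corpus)

# Two acoustically similar hypotheses; the language model prefers the first.
likely = log_prob(["the", "beach"], model)
unlikely = log_prob(["the", "peach"], model)
```

A lattice engine could add such a score to the acoustic score of each path so that word sequences seen in the training text outrank similar-sounding but unseen ones.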

10. An apparatus for voice annotating source digital media data, said apparatus including:
- means for speech annotating one or more portions of said source digital media data with a speech annotation that is independent of the source digital media data to provide a speech annotated digital media data; and
  
  means for indexing said speech annotated digital media data by said speech annotation to provide an indexed media content.
- Dependent Claims (11, 12, 13, 14, 15, 16, 17, 18)
  - 11. The apparatus according to claim 10, wherein said indexing means includes means for creating a word lattice from said speech annotation.
  - 12. The apparatus according to claim 11, wherein said means for creating said word lattice is dependent upon at least one of acoustic and linguistic knowledge.
  - 13. The apparatus according to claim 11, further including means for reverse indexing said word lattice to provide a reverse index table.
  - 14. The apparatus according to claim 13, further including means for content addressing said reverse index table.
  - 15. The apparatus according to claim 10, wherein said annotating means is dependent upon at least one of a customised vocabulary and Backus-Naur Form grammar.
  - 16. The apparatus according to claim 10, wherein said means for indexing further includes:
  - 17. The apparatus according to claim 16, wherein said acoustic knowledge is based on a hidden Markov model.
  - 18. The apparatus according to claim 16, wherein said linguistic knowledge is an N-gram statistical linguistic model.
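
Claim 17 bases the acoustic knowledge on a hidden Markov model. The sketch below shows Viterbi decoding, the standard way to find the most likely hidden state sequence in such a model. The two states, the quantised observations and every probability are invented for the example; a real acoustic model would use many states per phone and continuous feature vectors.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for the observations."""
    # best[s] holds (log probability, path) of the best path ending in state s.
    best = {
        s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), [s])
        for s in states
    }
    for obs in observations[1:]:
        step = {}
        for s in states:
            # Pick the predecessor that maximises the score of reaching s.
            score, path = max(
                (best[prev][0] + math.log(trans_p[prev][s]), best[prev][1])
                for prev in states
            )
            step[s] = (score + math.log(emit_p[s][obs]), path + [s])
        best = step
    return max(best.values())[1]


# Toy model (all values are assumptions): two phone-like states, two symbols.
states = ["b", "p"]
start_p = {"b": 0.5, "p": 0.5}
trans_p = {"b": {"b": 0.7, "p": 0.3}, "p": {"b": 0.3, "p": 0.7}}
emit_p = {"b": {"low": 0.8, "high": 0.2}, "p": {"low": 0.2, "high": 0.8}}

decoded = viterbi(["low", "low", "high"], states, start_p, trans_p, emit_p)
```

Working in log probabilities avoids numeric underflow on long observation sequences; the example assumes no probability is exactly zero.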

19. A computer program product having a computer readable medium having a computer program recorded therein for voice annotating source digital media data, said computer program product including:
- means for speech annotating one or more portions of said source digital media data with a speech annotation that is independent of the source digital media data to provide a speech annotated digital media data; and
  
  means for indexing said speech annotated digital media data by said speech annotation to provide an indexed media content.
- Dependent Claims (20, 21, 22, 23, 24, 25, 26, 27)
  - 20. The computer program product according to claim 19, wherein said indexing means includes means for creating a word lattice from said speech annotation.
  - 21. The computer program product according to claim 19, wherein said means for creating said word lattice is dependent upon at least one of acoustic and linguistic knowledge.
  - 22. The computer program product according to claim 20, further including means for content addressing said reverse index table.
  - 23. The computer program product according to claim 20, further including means for reverse indexing said word lattice to provide a reverse index table.
  - 24. The computer program product according to claim 19, wherein said annotating means is dependent upon at least one of a customised vocabulary and Backus-Naur Form grammar.
  - 25. The computer program product according to claim 19, wherein said means for indexing further includes:
  - 26. The computer program product according to claim 25, wherein said acoustic knowledge is based on a hidden Markov model.
  - 27. The computer program product according to claim 26, wherein said linguistic knowledge is an N-gram statistical linguistic model.

28. A method of voice retrieving digital media data annotated with speech, said method including the steps of:
- providing an indexed digital media data, said indexed digital media data derived from a word lattice created from a speech annotation of said digital media data, wherein said speech annotation is independent of a source digital media data;
  
  generating a speech query; and
  
  retrieving one or more portions of said indexed digital media data dependent upon said speech query.
- Dependent Claims (29, 30, 31, 32, 33, 34, 35, 36)
  - 29. The method according to claim 28, further including the step of creating a word lattice from said speech query.
  - 30. The method according to claim 29, further including the step of searching said indexed digital media data dependent upon said speech query by matching said word lattice created from said speech query with word lattices of said speech annotation of said indexed digital media data.
  - 31. The method according to claim 30, further including the step of confidence filtering said lattice created from said speech query to produce a short-list for said searching step.
  - 32. The method according to claim 29, wherein said word lattice is created dependent upon at least one of acoustic and linguistic knowledge.
  - 33. The method according to claim 32, wherein said acoustic knowledge is based on a hidden Markov model.
  - 34. The method according to claim 32, wherein said linguistic knowledge is an N-gram statistical linguistic model.
  - 35. The method according to claim 28, further including the step of searching said indexed digital media data dependent upon a text query.
  - 36. The method according to claim 28, wherein said speech query is generated dependent upon at least one of a customised vocabulary and Backus-Naur Form grammar.
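
The confidence filtering and searching steps of claims 30 and 31 can be sketched as below. The threshold value, the scoring rule (sum of products of query and annotation confidences) and the flattened index layout are all assumptions for illustration; the claims do not fix any of them.

```python
def confidence_filter(query_arcs, threshold=0.5):
    """Build a shortlist of query words whose best score meets the threshold."""
    best = {}
    for word, score in query_arcs:
        best[word] = max(score, best.get(word, 0.0))
    return {word: score for word, score in best.items() if score >= threshold}


def search(shortlist, index):
    """Rank annotations by the summed product of query and annotation scores."""
    ranking = {}
    for word, query_score in shortlist.items():
        for annotation_id, annotation_score in index.get(word, []):
            ranking[annotation_id] = (
                ranking.get(annotation_id, 0.0) + query_score * annotation_score
            )
    return sorted(ranking.items(), key=lambda item: item[1], reverse=True)


# Hypothetical index: word -> [(annotation id, confidence in that annotation)].
index = {
    "beach": [("photo_001", 0.9)],
    "peach": [("photo_001", 0.3), ("photo_042", 0.8)],
    "sunset": [("photo_001", 0.8), ("video_007", 0.7)],
}
# Word hypotheses from the spoken query; "peach" is a low-confidence competitor.
query_arcs = [("sunset", 0.85), ("beach", 0.6), ("peach", 0.2)]

shortlist = confidence_filter(query_arcs)
results = search(shortlist, index)
```

Filtering before the search keeps unlikely query hypotheses, such as "peach" here, from pulling unrelated media into the result list.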

37. An apparatus for voice retrieving digital media data annotated with speech, said apparatus including:
- means for providing an indexed digital media data, said indexed digital media data derived from a word lattice created from a speech annotation of said digital media data, wherein said speech annotation is independent of a source digital media data;
  
  means for generating a speech query; and
  
  means for retrieving one or more portions of said indexed digital media data dependent upon said speech query.
- Dependent Claims (38, 39, 40, 41, 42, 43, 44, 45)
  - 38. The apparatus according to claim 37, further including means for creating a word lattice from said speech query.
  - 39. The apparatus according to claim 38, further including means for searching said indexed digital media data dependent upon said speech query by matching said word lattice created from said speech query with word lattices of said speech annotation of said indexed digital media data.
  - 40. The apparatus according to claim 39, further including means for confidence filtering said lattice created from said speech query to produce a short-list for said searching means.
  - 41. The apparatus according to claim 38, wherein said word lattice is created dependent upon at least one of acoustic and linguistic knowledge.
  - 42. The apparatus according to claim 41, wherein said acoustic knowledge is based on a hidden Markov model.
  - 43. The apparatus according to claim 41, wherein said linguistic knowledge is an N-gram statistical linguistic model.
  - 44. The apparatus according to claim 37, further including means for searching said indexed digital media data dependent upon a text query.
  - 45. The apparatus according to claim 37, wherein said speech query is generated dependent upon at least one of a customised vocabulary and Backus-Naur Form grammar.

46. A computer program product having a computer readable medium having a computer program recorded therein for voice retrieving digital media data annotated with speech, said computer program product including:
- means for providing an indexed digital media data, said indexed digital media data derived from a word lattice created from a speech annotation of said digital media data, wherein said speech annotation is independent of a source digital media data;
  
  means for generating a speech query; and
  
  means for retrieving one or more portions of said indexed digital media data dependent upon said speech query.
- Dependent Claims (47, 48, 49, 50, 51, 52, 53, 54)
  - 47. The computer program product according to claim 46, further including means for creating a word lattice from said speech query.
  - 48. The computer program product according to claim 47, further including means for searching said indexed digital media data dependent upon said speech query by matching said word lattice created from said speech query with word lattices of said speech annotation of said indexed digital media data.
  - 49. The computer program product according to claim 48, further including means for confidence filtering said lattice created from said speech query to produce a short-list for said searching means.
  - 50. The computer program product according to claim 47, wherein said word lattice is created dependent upon at least one of acoustic and linguistic knowledge.
  - 51. The computer program product according to claim 50, wherein said acoustic knowledge is based on a hidden Markov model.
  - 52. The computer program product according to claim 50, wherein said linguistic knowledge is an N-gram statistical linguistic model.
  - 53. The computer program product according to claim 46, further including means for searching said indexed digital media data dependent upon a text query.
  - 54. The computer program product according to claim 46, wherein said speech query is generated dependent upon at least one of a customised vocabulary and Backus-Naur Form grammar.

55. A system for voice annotating and retrieving source digital media data, said system including:
- means for speech annotating at least one segment of said source digital media data with a speech annotation that is independent of the source digital media data to provide a speech annotated digital media data;
  
  means for indexing said speech-annotated digital media data by said speech annotation to provide an indexed digital media data;
  
  means for generating a speech or voice query; and
  
  means for retrieving one or more portions of said indexed digital media data dependent upon said speech query.
- Dependent Claims (56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68)
  - 56. The system according to claim 55, further including means for creating a lattice structure from said speech annotation.
  - 57. The system according to claim 56, wherein said means for creating said lattice structure is dependent upon at least one of acoustic and linguistic knowledge.
  - 58. The system according to claim 57, wherein said acoustic knowledge is based on a hidden Markov model.
  - 59. The system according to claim 57, wherein said linguistic knowledge is an N-gram statistical linguistic model.
  - 60. The system according to claim 56, further including means for reverse indexing said lattice structure to provide a reverse index table.
  - 61. The system according to claim 60, further including means for content addressing said reverse index table.
  - 62. The system according to claim 55, wherein said speech-annotating means post-annotates said source digital media data.
  - 63. The system according to claim 55, further including means for creating a lattice structure from said speech query.
  - 64. The system according to claim 63, further including means for searching said indexed digital media data dependent upon said speech query by matching said lattice structure created from said speech query with lattice structures of said speech annotation of said indexed digital media data.
  - 65. The system according to claim 64, further including means for confidence filtering said lattice structure created from said speech query to produce a short-list for said searching means.
  - 66. The system according to claim 63, wherein said lattice structure is created dependent upon at least one of acoustic and linguistic knowledge.
  - 67. The system according to claim 55, further including means for searching said indexed digital media data dependent upon a text query.
  - 68. The system according to claim 55, wherein at least one of said annotating means and said speech query is dependent upon at least one of a customised vocabulary and Backus-Naur Form grammar.
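
To illustrate how a customised vocabulary and a Backus-Naur Form grammar (claim 68) might constrain what can be spoken, the sketch below encodes a tiny grammar as a Python dictionary and checks word sequences against it. The grammar rules, the vocabulary and the function names are hypothetical and are not taken from the patent.

```python
# Hypothetical grammar in BNF terms:
#   <annotation> ::= <subject> "at" <place> | <subject>
#   <subject>    ::= "sunset" | "family" "dinner"
#   <place>      ::= "beach" | "home"
GRAMMAR = {
    "<annotation>": [["<subject>", "at", "<place>"], ["<subject>"]],
    "<subject>": [["sunset"], ["family", "dinner"]],
    "<place>": [["beach"], ["home"]],
}


def match(symbol, words, pos):
    """Return every position reachable after matching symbol starting at pos."""
    if symbol not in GRAMMAR:  # terminal: must equal the next word
        return {pos + 1} if pos < len(words) and words[pos] == symbol else set()
    ends = set()
    for production in GRAMMAR[symbol]:
        positions = {pos}
        for part in production:
            positions = {end for p in positions for end in match(part, words, p)}
        ends |= positions
    return ends


def accepts(words):
    """True if the whole word sequence is derivable from <annotation>."""
    return len(words) in match("<annotation>", words, 0)


ok = accepts(["family", "dinner", "at", "home"])
incomplete = accepts(["sunset", "at"])
```

This recursive matcher assumes a grammar without left recursion. Restricting the recogniser to such a grammar shrinks the set of word sequences it has to consider.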

69. A method of voice annotating source digital media data, said method comprising the steps of:
- speech annotating, independently of the source media data, one or more portions of said source media data using a formal spoken language;
  
  applying said annotated speech to a lattice engine and further applying at least one of acoustic and linguistic knowledge to said lattice engine to generate a word lattice;
  
  applying said word lattice to a reverse index engine to build a reversed index table for said lattice; and
  
  applying said reversed index table to an addressing engine module to create indexed media content.

70. An apparatus for voice annotating source digital media data, comprising:
- means for speech annotating, independently of the source media data, one or more portions of said source media data using a formal spoken language;
  
  means for applying said annotated speech to a lattice engine and further applying at least one of acoustic and linguistic knowledge to said lattice engine to generate a word lattice;
  
  means for applying said word lattice to a reverse index engine to build a reversed index table for said lattice; and
  
  means for applying said reversed index table to an addressing engine module to create indexed media content.

71. A computer program product having a computer readable medium having a computer program recorded therein for voice annotating source digital media data, said computer program product including:
- means for speech annotating, independently of the source media data, one or more portions of said source media data using a formal spoken language;
  
  means for applying said annotated speech to a lattice engine and further applying at least one of acoustic and linguistic knowledge to said lattice engine to generate a word lattice;
  
  means for applying said word lattice to a reverse index engine to build a reversed index table for said lattice; and
  
  means for applying said reversed index table to an addressing engine module to create indexed media content.

Current Assignee: Kent Ridge Digital Labs
Original Assignee: Kent Ridge Digital Labs
Inventors: Narasimhalu, Arcot Desai; Li, Haizhou; Wu, Jiankang
Primary Examiner(s): Dorvil, Richemond
Application Number: US09/319,319
Time in Patent Office: 1,089 Days
Field of Search: 704/200,270,275,235,257,255,256,251,231
US Class Current: 704/256.4
CPC Class Codes:
G10L 15/08 Speech classification or se...
G10L 15/197 Probabilistic grammars, e.g...
