System and method for indexing and querying audio archives
Abstract
A system and method for indexing segments of audio/multimedia files and data streams for storage in a database according to audio information such as speaker identity, the background environment and channel (music, street noise, car noise, telephone, studio noise, speech plus music, speech plus noise, speech over speech), and/or the transcription of the spoken utterances. The content or topic of the transcribed text can also be determined using natural language understanding to index based on the context of the transcription. A user can then retrieve desired segments of the audio file from the database by generating a query having one or more desired parameters based on the indexed information.
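The abstract's core idea, tagging audio segments with speaker, environment, and topic information and retrieving them by multi-parameter query, can be sketched as follows. This is a minimal illustration only; the names `Segment` and `AudioIndex` are not from the patent.

```python
# Minimal sketch of the indexing-and-query idea from the abstract.
# Segment and AudioIndex are illustrative names, not from the patent.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Segment:
    start: float                       # segment start time, seconds
    end: float                         # segment end time, seconds
    speaker: Optional[str] = None      # verified speaker tag
    environment: Optional[str] = None  # e.g. "studio", "street noise"
    topics: List[str] = field(default_factory=list)

class AudioIndex:
    """Stores tagged segments and answers multi-parameter queries."""

    def __init__(self) -> None:
        self.segments: List[Segment] = []

    def add(self, seg: Segment) -> None:
        self.segments.append(seg)

    def query(self, speaker=None, environment=None, topic=None):
        # A segment matches only if it satisfies every supplied parameter.
        hits = []
        for s in self.segments:
            if speaker is not None and s.speaker != speaker:
                continue
            if environment is not None and s.environment != environment:
                continue
            if topic is not None and topic not in s.topics:
                continue
            hits.append(s)
        return hits
```

A query with several parameters narrows the result set, matching the abstract's "one or more desired parameters" retrieval model.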
34 Claims
1. A method for processing an audio data file, comprising the steps of:
segmenting the audio data file into segments based on detected speaker changes;
performing speaker identification for each segment and assigning at least one speaker identification tag to each segment based on an identified speaker;
verifying the identity of the speaker associated with the at least one identification tag for each segment; and
indexing the segments of the audio data file for storage in a database in accordance with the identification tags of verified speakers. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
generating a voiceprint for each segment of the audio data file; and
storing each voiceprint with its corresponding segment in the database.
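The four steps of claim 1 (segment on speaker change, identify, verify, index only verified tags) can be sketched as below. The helper callables `detect_change`, `identify`, and `verify` are stand-ins for real detectors, which the claim does not specify.

```python
# Hypothetical sketch of claim 1's pipeline. detect_change, identify,
# and verify are stand-ins for real speaker-change detection,
# identification, and verification components.
def segment_on_speaker_change(frames, detect_change):
    """Split a frame sequence wherever detect_change flags a boundary."""
    segments, current = [], [frames[0]]
    for prev, cur in zip(frames, frames[1:]):
        if detect_change(prev, cur):
            segments.append(current)
            current = []
        current.append(cur)
    segments.append(current)
    return segments

def index_verified(segments, identify, verify):
    """Map verified speaker tags to the segments they label."""
    index = {}
    for seg in segments:
        tag = identify(seg)
        if verify(seg, tag):        # only verified tags reach the index
            index.setdefault(tag, []).append(seg)
    return index
```

Note that verification acts as a gate: a segment whose identified speaker fails verification is simply not indexed under that tag.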
6. The method of claim 5, wherein the user query is a voiceprint associated with the desired speaker, and wherein the retrieving step includes the steps of:
comparing the input speaker voiceprint with each of the stored voiceprints of the segments; and
selecting at least one segment having a corresponding voiceprint stored therewith that matches the input voiceprint.
7. The method of claim 5, wherein the user query is an audio segment of the desired speaker, and wherein the retrieving step includes the steps of:
generating a voiceprint from the input audio segment;
comparing the generated voiceprint with each of the stored voiceprints of the segments; and
selecting at least one segment having a corresponding voiceprint stored therewith that matches the generated voiceprint.
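Claims 6 and 7 both reduce to comparing a query voiceprint against stored voiceprints. A common way to do this, assumed here since the patent does not name a metric, is cosine similarity against a threshold:

```python
# Sketch of the voiceprint-matching step of claims 6-7. Cosine
# similarity with a fixed threshold is an assumption; the patent
# does not specify the comparison metric.
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def match_voiceprint(query, stored, threshold=0.9):
    """Return ids of segments whose stored voiceprint matches the query."""
    return [sid for sid, vp in stored.items()
            if cosine(query, vp) >= threshold]
```

For claim 7, the only extra step is generating the query voiceprint from an input audio segment before calling the matcher.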
8. The method of claim 5, further including the step of storing for each segment one of a corresponding waveform, acoustic features, and both.
9. The method of claim 8, wherein the user query is an audio segment of the desired speaker, and wherein the retrieving step includes the steps of:
comparing the audio segment with one of the stored waveforms and stored acoustic features of each of the segments in the database; and
selecting at least one segment having a corresponding one of a waveform and acoustic features stored therewith that match the audio segment of the speaker of interest.
10. The method of claim 9, wherein the audio segment of the speaker of interest is one of input by the user and selected from the database.
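Claims 8-10 match a query audio segment against stored waveforms or acoustic features rather than voiceprints. As a toy stand-in, assuming per-frame feature vectors (the patent does not fix a representation), one can average frames into a summary vector and compare by Euclidean distance:

```python
# Toy sketch of the feature-matching retrieval of claims 8-10.
# Frame-averaged vectors and a Euclidean-distance cutoff are
# assumptions for illustration only.
def mean_feature(frames):
    """Average per-frame feature vectors into one summary vector."""
    n = len(frames)
    return [sum(f[d] for f in frames) / n for d in range(len(frames[0]))]

def nearest_segments(query_frames, stored, max_dist=1.0):
    """Return ids of stored segments whose features lie near the query."""
    q = mean_feature(query_frames)
    hits = []
    for sid, frames in stored.items():
        s = mean_feature(frames)
        dist = sum((a - b) ** 2 for a, b in zip(q, s)) ** 0.5
        if dist <= max_dist:
            hits.append(sid)
    return hits
```

A production system would more likely use frame-level alignment (e.g. dynamic time warping) or a learned embedding, but the retrieval shape is the same.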
11. The method of claim 1, further including the steps of:
segmenting the audio data file into segments based on detected changes in environment; and
identifying at least one environment of each segment and assigning at least one environment tag to each segment corresponding to the at least one identified environment;
wherein the indexing step further includes indexing the segments of the audio data file for storage in the database in accordance with the environment tags of the segments.
12. The method of claim 11, wherein the step of detecting changes in environment includes detecting changes in one of a background noise, a channel, and a combination thereof.
13. The method of claim 11, including the step of retrieving at least one segment from the database in accordance with a user query based on one of an identity of a desired speaker, the identity of a desired environment, and a combination thereof.
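Claims 11-13 add a second tagging axis: each segment carries an environment tag alongside its speaker tag, and a query may constrain either or both. A minimal sketch, with the index layout (tuples of id, speaker, environment) assumed for illustration:

```python
# Toy illustration of claims 11-13. The (segment_id, speaker_tag,
# environment_tag) index layout and the classify callable are
# assumptions, not taken from the patent.
def tag_environments(segments, classify):
    """Attach an environment tag to each segment via a classifier."""
    return [(seg, classify(seg)) for seg in segments]

def query_index(index, speaker=None, environment=None):
    """index: iterable of (segment_id, speaker_tag, environment_tag)."""
    return [seg for seg, spk, env in index
            if (speaker is None or spk == speaker)
            and (environment is None or env == environment)]
```

Combining both parameters corresponds to claim 13's "combination thereof" query.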
14. The method of claim 1, further including the steps of:
recognizing spoken words of each segment; and
storing the recognized words for each corresponding segment in the database.
15. The method of claim 14, wherein the recognizing step includes the steps of:
identifying one of channel acoustic components, background acoustic components, and a combination thereof, for each segment; and
decoding the spoken words of each segment using trained models based on the identified acoustic components.
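The decoding step of claim 15 amounts to model selection: identify the segment's channel or background condition, then decode with a model trained for that condition. A sketch, where the `models` mapping and fallback key are assumptions:

```python
# Sketch of claim 15's condition-dependent decoding. The models dict
# (condition -> recognizer callable) and the "clean" fallback are
# hypothetical; the patent only says trained models are selected
# based on the identified acoustic components.
def decode_segment(segment, acoustic_condition, models, default="clean"):
    """Choose the recognizer matching the condition, else fall back."""
    recognize = models.get(acoustic_condition, models[default])
    return recognize(segment)
```

Matching the acoustic model to the channel (telephone vs. studio, for example) is a standard way to reduce recognition errors from channel mismatch.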
16. The method of claim 14, further including the steps of:
performing natural language understanding (NLU) of the recognized words of each segment to determine at least one NLU topic of each segment;
wherein the indexing step further includes indexing the segments of the audio data file for storage in the database in accordance with the determined NLU topics.
17. The method of claim 16, including the step of retrieving at least one segment from the database in accordance with a user query based on one of an identity of a speaker of interest, at least one user-selected keyword, context of the recognized text, at least one NLU topic, and a combination thereof.
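The NLU topic assignment of claims 16-17 can be crudely approximated by keyword lookup, shown below as a stand-in; a real system would use a trained topic model or language-understanding engine, and the lexicon here is purely illustrative.

```python
# Crude keyword lookup standing in for the NLU topic step of
# claims 16-17. TOPIC_KEYWORDS is an illustrative lexicon, not
# from the patent.
TOPIC_KEYWORDS = {
    "finance": {"stock", "market", "earnings"},
    "weather": {"rain", "forecast", "storm"},
}

def topics_of(words, lexicon=TOPIC_KEYWORDS):
    """Return sorted topics whose keywords appear among the words."""
    found = {topic for topic, kws in lexicon.items() if kws & set(words)}
    return sorted(found)
```

Segments would then be indexed under each returned topic, enabling the topic-based queries of claim 17.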
18. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform method steps for processing an audio data file, the method steps comprising:
segmenting the audio data file into segments based on detected speaker changes;
performing speaker identification for each segment and assigning at least one speaker identification tag to each segment based on an identified speaker;
verifying the identity of the speaker associated with the at least one identification tag for each segment; and
indexing the segments of the audio data file for storage in a database in accordance with the identification tags of verified speakers. - View Dependent Claims (19, 20, 21, 22, 23, 24, 25, 26)
segmenting the audio data file into segments based on detected changes in environment; and
identifying at least one environment of each segment to assign at least one environment tag to each segment corresponding to the at least one identified environment;
wherein the instructions for performing the indexing step further include instructions for indexing the segments of the audio data file for storage in the database in accordance with the environment tags.
21. The program storage device of claim 20, wherein the step of detecting changes in environment includes detecting changes in one of a background, a channel, and a combination thereof.
22. The program storage device of claim 20, further including instructions for performing the step of retrieving at least one segment from the database in accordance with a user query based on one of an identity of a desired speaker, the identity of a desired environment, and a combination thereof.
23. The program storage device of claim 18, further including instructions for performing the steps of:
recognizing spoken words of each segment; and
storing the recognized words for each corresponding segment in the database.
24. The program storage device of claim 23, wherein the instructions for performing the recognizing step include instructions for performing the steps of:
identifying one of channel acoustic components, background acoustic components, and a combination thereof, for each segment; and
decoding the spoken words of each segment using trained models based on the identified acoustic components.
25. The program storage device of claim 23, further including instructions for performing the steps of:
performing natural language understanding (NLU) of the recognized words of each segment to determine at least one NLU topic of each segment;
wherein the instructions for performing the indexing step include instructions for indexing the segments of the audio data file for storage in the database in accordance with the determined NLU topics.
26. The program storage device of claim 25, further including instructions for performing the step of retrieving at least one segment from the database in accordance with a user query based on one of an identity of a speaker of interest, at least one user-selected keyword, context of the recognized text, at least one NLU topic, and a combination thereof.
27. A system for managing a database of audio data files, comprising:
a segmenter for dividing an input audio data file into segments by detecting speaker changes in the input audio data file;
a speaker identifier for identifying a speaker of each segment and assigning at least one identity tag to each segment;
a speaker verifier for verifying the at least one identity tag of each segment; and
an indexer for indexing the segments of the input audio data file for storage in the database in accordance with the identity tags of verified speakers. - View Dependent Claims (28, 29, 30, 31, 32, 33, 34)
a speech recognizer for recognizing spoken words of each segment, wherein the recognized words for each segment are stored in the database and indexed to the corresponding segment.
33. The system of claim 32, further comprising means for performing natural language understanding (NLU) of the recognized words of each segment to determine at least one NLU topic of each segment, wherein the indexer indexes the segments of the audio data file for storage in the database in accordance with the determined NLU topics.
34. The system of claim 33, further comprising a search engine for retrieving at least one segment from the database by processing a user query based on one of an identity of a speaker of interest, at least one user-selected keyword, context of the recognized text, at least one NLU topic, and a combination thereof.
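The system claims name distinct components: a segmenter, a speaker identifier, a speaker verifier, and an indexer, with a search engine over the resulting index. One illustrative wiring, with every component passed in as a callable since the patent does not specify implementations:

```python
# Illustrative composition of the components named in claims 27-34.
# AudioArchive and all callables are hypothetical names; the patent
# defines only the component roles, not their implementations.
class AudioArchive:
    def __init__(self, segmenter, identifier, verifier):
        self.segmenter = segmenter    # audio -> list of segments
        self.identifier = identifier  # segment -> speaker tag
        self.verifier = verifier      # (segment, tag) -> bool
        self.index = {}               # speaker tag -> list of segments

    def ingest(self, audio):
        """Segment, identify, verify, and index an audio data file."""
        for seg in self.segmenter(audio):
            tag = self.identifier(seg)
            if self.verifier(seg, tag):
                self.index.setdefault(tag, []).append(seg)

    def search(self, speaker):
        """Search-engine role: retrieve segments by verified speaker tag."""
        return self.index.get(speaker, [])
```

Keeping each role behind a callable mirrors the claims' separation of concerns: any one component can be swapped without touching the others.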
Specification