Automated voice and speech labeling

US 9,129,605 B2
Filed: 03/14/2013
Issued: 09/08/2015
Est. Priority Date: 03/30/2012
Status: Active Grant

Abstract

A system and method for voice and speech analysis which correlates a speaker signal source and a normalized signal comprising measurements of input acoustic data to a database of language, dialect, accent, and/or speaker attributes in order to create a transcription of the input acoustic data.

20 Claims

1. A method for converting speech to text, comprising the steps of:
- receiving a digital signal comprising a recorded spoken input;
  
  obtaining at least one measurement of said digital signal, the measurement comprising a first measured portion of said recorded spoken input and a second measured portion of said recorded spoken input;
  
  identifying at least one characteristic of said digital signal by comparing said first measured portion of said recorded spoken input to a first database of digital audio signal characteristics;
  
  transcribing said first measured portion of said recorded spoken input using said at least one characteristic of said digital signal to create an initial transcription;
  
  backfilling said first database of digital audio signal characteristics with at least one characteristic from a second database of digital audio signal characteristics;
  
  identifying a second characteristic of said digital signal by comparing said second measured portion of said digital signal to said backfilled first database of digital audio signal characteristics;
  
  transcribing said second measured portion of said recorded spoken input using said second characteristic of said digital signal.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 19)
  - 2. The method of claim 1, further comprising the step of processing the received digital signal.
  - 3. The method of claim 2, wherein the step of processing the received digital signal comprises the step of obtaining at least one measurement of said spoken input.
  - 4. The method of claim 2, wherein the step of processing the received digital signal comprises the step of labeling vocal and non-vocal portions of said spoken input.
  - 5. The method of claim 1, further comprising the step of normalizing said digital signal.
  - 6. The method of claim 1, wherein said at least one characteristic of said digital signal is a characteristic of the speaker of said spoken input.
  - 7. The method of claim 1, wherein at least one characteristic of said digital audio signal is used to produce said first level transcription.
  - 8. The method of claim 1, further comprising the step of displaying said second transcription on a monitor.
  - 19. The method of claim 1, wherein said second database of digital audio signal characteristics is populated with characteristics derived from a plurality of speakers.
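
The two-pass flow of claim 1 (compare, transcribe, backfill, compare again) can be illustrated with a minimal sketch. The claims do not specify how characteristics are stored or compared, so everything concrete here is an assumption made for illustration: the databases are plain dictionaries mapping a label to a reference feature vector, comparison is nearest-neighbor by Euclidean distance, and the names `nearest_label`, `backfill`, and `transcribe_two_pass` are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+

def nearest_label(features, database):
    """Return the label whose reference vector is closest to `features`.

    Assumption: a "database of digital audio signal characteristics" is
    modeled as a dict of label -> reference feature vector.
    """
    return min(database, key=lambda label: dist(features, database[label]))

def backfill(first_db, second_db):
    """Add entries from second_db that first_db lacks.

    Assumption: entries already present in first_db take precedence.
    """
    merged = dict(second_db)
    merged.update(first_db)
    return merged

def transcribe_two_pass(first_portion, second_portion, first_db, second_db):
    # Pass 1: identify a characteristic of the first portion using the
    # first database only; this yields the initial transcription.
    first_label = nearest_label(first_portion, first_db)
    # Backfill the first database from the second database.
    backfilled = backfill(first_db, second_db)
    # Pass 2: identify a characteristic of the second portion using the
    # backfilled database.
    second_label = nearest_label(second_portion, backfilled)
    return [first_label, second_label]

# Hypothetical data: a sparse first database and a richer second database.
speaker_db = {"ah": (1.0, 0.0)}
multi_db = {"ah": (0.9, 0.1), "ee": (0.0, 1.0)}
result = transcribe_two_pass((0.95, 0.05), (0.1, 0.9), speaker_db, multi_db)
print(result)  # ['ah', 'ee']
```

In this toy data the second portion can only be labeled "ee" after backfilling, because the first database has no such entry on its own. A real system would operate on acoustic measurements rather than hand-picked two-element vectors.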

9. A system for converting speech to text, the system comprising:
- a digital audio signal comprising an encoding of a recorded spoken input;
  
  means for obtaining at least one measurement of said digital audio signal, the measurement comprising a first measured portion of said recorded spoken input and a second measured portion of said recorded spoken input;
  
  means for comparing said first measured portion of said recorded spoken input to a first database of digital audio signal characteristics;
  
  means for identifying at least one characteristic of said digital audio signal based on said comparison;
  
  means for transcribing said first measured portion of said spoken input using said at least one characteristic of the digital audio signal to create an initial transcription;
  
  means for backfilling said first database of digital audio signal characteristics with at least one characteristic from a second database of digital audio signal characteristics;
  
  means for identifying a second characteristic of said digital signal by comparing said second measured portion of said digital signal to said backfilled first database of digital audio signal characteristics; and
  
  means for transcribing said second measured portion of said recorded spoken input using said second characteristic of said digital signal.
- Dependent Claims (10, 11, 12, 13, 14, 15, 16, 17, 18, 20)
  - 10. A system according to claim 9, further comprising means for filtering the digital audio signal.
  - 11. A system according to claim 9, further comprising normalization means for normalizing the digital audio signal.
  - 12. A system according to claim 9, comprising normalization means for normalizing prosodic speech features.
  - 13. A system according to claim 9, comprising means for labeling the vocal and nonvocal portions of said digital audio signal.
  - 14. A system according to claim 9, further comprising means identifying at least one characteristic of said digital audio signal.
  - 15. A system according to claim 9, further comprising means for comparing characteristics to produce said initial transcript.
  - 16. A system according to claim 9, further comprising means for constructing a multi-speaker feature map.
  - 17. A system according to claim 9, further comprising means for using a multi-speaker feature map to backfill a single-speaker feature map.
  - 18. A system according to claim 9, further comprising displaying means for displaying the second transcription on a monitor.
  - 20. The system of claim 9, wherein said second database of digital audio signal characteristics is populated with characteristics derived from a plurality of speakers.
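
Claims 11 and 13 recite normalizing the digital audio signal and labeling its vocal and nonvocal portions without saying how. The sketch below shows one simple way such steps could look, under stated assumptions: peak normalization and a fixed frame-energy threshold are stand-in techniques chosen for illustration, not methods taken from the patent, and the function names, frame size, and threshold value are hypothetical.

```python
def normalize_peak(samples):
    """Scale samples so the largest absolute value becomes 1.0.

    Assumption: peak normalization stands in for the unspecified
    "normalization means".
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silent or empty input: nothing to scale
    return [s / peak for s in samples]

def label_frames(samples, frame_size=4, threshold=0.1):
    """Label each frame "vocal" or "non-vocal" by mean squared amplitude.

    Assumption: a fixed energy threshold stands in for the unspecified
    labeling means; frame_size and threshold are illustrative values.
    """
    labels = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        energy = sum(s * s for s in frame) / len(frame)
        labels.append("vocal" if energy >= threshold else "non-vocal")
    return labels

# Hypothetical signal: a quiet frame followed by a loud frame.
signal = [0.0, 0.01, -0.01, 0.0, 0.5, -0.5, 0.4, -0.4]
normalized = normalize_peak(signal)
labels = label_frames(normalized)
print(labels)  # ['non-vocal', 'vocal']
```

Normalizing before labeling keeps the threshold meaningful across recordings with different gain, which is one plausible reason to pair the two steps.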

Current Assignee: SRC, Inc.
Original Assignee: SRC, Inc.
Inventors: Eller, David Donald; Morphet, Steven Brian; Boyett, Watson Brent
Primary Examiner(s): Chawan, Vijay B
Application Number: US13/828,856
Publication Number: US 20130262111A1
Time in Patent Office: 908 Days
Field of Search: 704/235, 704/256.7, 704/260, 704/243, 704/231, 704/251, 704/257, 704/258, 704/240, 379/265.02, 379/309, 379/52, 379/88.16
US Class Current: 1/1
CPC Class Codes

G06F 16/685   using automatically derived...

G10L 15/10   using distance or distortio...

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...
