AUTOMATED VOICE AND SPEECH LABELING
First Claim
1. A method for converting speech to text, comprising the steps of:
receiving a digital signal comprising a recorded spoken input;
obtaining at least one measurement of said digital signal;
identifying at least one characteristic of said digital signal by comparing said at least one measurement of said digital signal to a database of digital audio signal characteristics;
transcribing at least a portion of said recorded spoken input using said at least one characteristic of said digital signal to create an initial transcription;
calculating at least one speech sound phonetic identification value;
transcribing at least a second portion of said recorded spoken input using said first transcription and said at least one speech sound phonetic identification value.
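The measurement and comparison steps of the claim can be illustrated with a short sketch. The claim does not say which measurements are taken or how the comparison is performed, so everything specific here is an assumption made for illustration: the two measurements (RMS level and zero-crossing rate), the nearest-neighbor comparison, the `PROFILES` table standing in for the "database of digital audio signal characteristics", and all function names.

```python
import math

def measure(samples):
    """Return (rms, zero_crossing_rate) for PCM samples scaled to [-1, 1].

    These two measurements are illustrative choices, not taken from the claim.
    """
    n = len(samples)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    return (rms, crossings / (n - 1))

# Hypothetical stand-in for the database of digital audio signal
# characteristics: each label maps to a reference (rms, zcr) pair.
PROFILES = {
    "quiet_low_pitch": (0.10, 0.02),
    "loud_high_pitch": (0.60, 0.20),
}

def identify_characteristic(measurement, database=PROFILES):
    """Pick the database entry closest to the measurement (Euclidean distance)."""
    return min(database, key=lambda name: math.dist(measurement, database[name]))

# Synthetic input: a sine wave with 10 samples per cycle and amplitude 0.6.
signal = [0.6 * math.sin(2 * math.pi * 0.1 * i + 0.3) for i in range(200)]
measurement = measure(signal)
label = identify_characteristic(measurement)
```

In a pipeline shaped like the claim, the identified label would then guide the initial transcription pass. That pass and the later phonetic-value pass are not sketched here.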
Abstract
A system and method for voice and speech analysis which correlates a speaker signal source and a normalized signal comprising measurements of input acoustic data to a database of language, dialect, accent, and/or speaker attributes in order to create a transcription of the input acoustic data.
18 Claims
1. A method for converting speech to text, as recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system for converting speech to text, the system comprising:
a digital audio signal comprising an encoding of a recorded spoken input;
means for obtaining at least one measurement of said digital audio signal;
means for comparing said at least one measurement of the digital signal to a database of digital audio signal characteristics;
means for identifying at least one characteristic of said digital audio signal based on said comparison;
means for transcribing at least a portion of said spoken input using said at least one characteristic of the digital audio signal to create an initial transcription;
means for constructing a multi-speaker feature map;
means for backfilling said speaker's feature map from a multi-speaker feature map; and
means for correlating said at least one measurement to said initial transcription.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17, 18
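One possible reading of the feature-map elements in claim 9 is sketched below. The claim does not define the structure of a feature map or the backfilling rule, so this is an assumption throughout: a feature map is modeled as a dictionary from a phone label to a feature vector, the multi-speaker map is a per-phone average, and backfilling copies in only the phones missing from the individual speaker's map. The function names and the sample values are hypothetical.

```python
def build_multi_speaker_map(speaker_maps):
    """Average each phone's feature vector across all speakers that have it."""
    sums = {}
    counts = {}
    for fmap in speaker_maps:
        for phone, vec in fmap.items():
            if phone not in sums:
                sums[phone] = [0.0] * len(vec)
                counts[phone] = 0
            sums[phone] = [s + v for s, v in zip(sums[phone], vec)]
            counts[phone] += 1
    return {p: [s / counts[p] for s in sums[p]] for p in sums}

def backfill(speaker_map, multi_map):
    """Fill phones missing from speaker_map with multi-speaker values.

    Entries the speaker already has take precedence over the averages.
    """
    filled = dict(multi_map)
    filled.update(speaker_map)
    return filled

# Hypothetical two-dimensional features per phone (illustrative values only).
known_speakers = [
    {"AA": [700.0, 1200.0], "IY": [300.0, 2300.0]},
    {"AA": [800.0, 1300.0], "IY": [280.0, 2200.0]},
]
multi_map = build_multi_speaker_map(known_speakers)

# A new speaker for whom only one phone has been observed so far.
new_speaker = {"AA": [720.0, 1180.0]}
completed = backfill(new_speaker, multi_map)
```

Here `completed` keeps the new speaker's own `"AA"` entry and borrows `"IY"` from the multi-speaker average. This lets the speaker's own measurements override population values wherever they exist.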
Specification