Hyperarticulation detection in repetitive voice queries using pairwise comparison for improved speech recognition
First Claim
1. A system for detecting hyperarticulation in repetitive voice queries, the system comprising:
a machine-readable memory storing computer-executable instructions; and
one or more hardware processors in communication with the machine-readable memory that, having executed the computer-executable instructions, configure the system to;
receive a first audio utterance comprising at least one word;
receive a second audio utterance comprising at least one word;
determine whether the second audio utterance is likely to include hyperarticulation based on a comparison of the first audio utterance with the second audio utterance;
in response to a determination that the second audio utterance is likely to include hyperarticulation, determining a plurality of hyperarticulation features, at least one hyperarticulation feature comprising a probability that a portion of the second audio utterance includes hyperarticulation;
determine a first plurality of candidate hypotheses corresponding to the second audio utterance;
determine a score for each of the candidate hypotheses based on the determined plurality of hyperarticulation features, where each candidate hypothesis is associated with a rank corresponding to the determined score; and
select a hypothesis from the first plurality of hypotheses based on the rank associated with the selected hypothesis.
Abstract
Automatic speech recognition systems can benefit from cues in the user's voice, such as hyperarticulation. Traditional approaches typically attempt to define and detect an absolute state of hyperarticulation, which is very difficult, especially on short voice queries. This disclosure provides an approach to hyperarticulation detection that uses pairwise comparisons on a real-world speech recognition system. The disclosed approach uses delta features extracted from a pair of repetitive user utterances. The disclosed systems and methods improve word error rate by using hyperarticulation information as a feature in a second-pass N-best hypothesis rescoring setup.
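The pairwise idea in the abstract can be illustrated with a minimal sketch. It assumes each utterance has already been reduced to a few summary statistics; the statistic names, the weights, and the logistic scoring function below are hypothetical choices made for illustration and are not taken from the disclosure.

```python
import math
from dataclasses import dataclass


@dataclass
class UtteranceStats:
    """Hypothetical per-utterance summary (not specified by the source)."""
    duration_s: float      # total spoken duration in seconds
    mean_pitch_hz: float   # average fundamental frequency
    mean_energy_db: float  # average frame energy


def delta_features(first: UtteranceStats, second: UtteranceStats) -> dict:
    """Describe how the second utterance differs from the first.

    Duration and pitch use relative change; energy is already on a
    log scale, so a plain difference is used.
    """
    return {
        "duration": (second.duration_s - first.duration_s) / first.duration_s,
        "pitch": (second.mean_pitch_hz - first.mean_pitch_hz) / first.mean_pitch_hz,
        "energy": second.mean_energy_db - first.mean_energy_db,
    }


# Illustrative weights only; a real system would learn them from labeled pairs.
WEIGHTS = {"duration": 4.0, "pitch": 2.0, "energy": 0.3}
BIAS = -1.0


def hyperarticulation_probability(deltas: dict) -> float:
    """Map delta features to a probability with a logistic function."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in deltas.items())
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    first = UtteranceStats(duration_s=1.0, mean_pitch_hz=180.0, mean_energy_db=60.0)
    second = UtteranceStats(duration_s=1.5, mean_pitch_hz=200.0, mean_energy_db=64.0)
    print(hyperarticulation_probability(delta_features(first, second)))
```

Because the features are differences between the two utterances, the sketch never has to decide what hyperarticulated speech looks like in absolute terms; it only asks whether the repeat is slower, higher, or louder than the original.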
13 Claims
1. A system for detecting hyperarticulation in repetitive voice queries, the system comprising:
a machine-readable memory storing computer-executable instructions; and
one or more hardware processors in communication with the machine-readable memory that, having executed the computer-executable instructions, configure the system to;
receive a first audio utterance comprising at least one word;
receive a second audio utterance comprising at least one word;
determine whether the second audio utterance is likely to include hyperarticulation based on a comparison of the first audio utterance with the second audio utterance;
in response to a determination that the second audio utterance is likely to include hyperarticulation, determining a plurality of hyperarticulation features, at least one hyperarticulation feature comprising a probability that a portion of the second audio utterance includes hyperarticulation;
determine a first plurality of candidate hypotheses corresponding to the second audio utterance;
determine a score for each of the candidate hypotheses based on the determined plurality of hyperarticulation features, where each candidate hypothesis is associated with a rank corresponding to the determined score; and
select a hypothesis from the first plurality of hypotheses based on the rank associated with the selected hypothesis.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A system for detecting hyperarticulation in repetitive voice queries, the system comprising:
a machine-readable memory storing computer-executable instructions; and
one or more hardware processors in communication with the machine-readable memory that, having executed the computer-executable instructions, configures the system to;
receive a first audio utterance comprising at least one word;
determine a first candidate hypothesis for the received first audio utterance;
output a message in response to the received first candidate hypothesis;
receive a second audio utterance comprising at least one word;
determine whether the second audio utterance is likely to include hyperarticulation based on a comparison of the first audio utterance with the second audio utterance;
in response to a determination that the second audio utterance is likely to include hyperarticulation, determining a plurality of hyperarticulation features, at least one hyperarticulation feature comprising a probability that a portion of the second audio utterance includes hyperarticulation;
determine a first plurality of candidate hypotheses corresponding to the second audio utterance;
determine a score for each of the candidate hypotheses based on the determined plurality of hyperarticulation features, where each candidate hypothesis is associated with a rank corresponding to the determined score;
select a hypothesis from the first plurality of hypotheses based on the rank associated with the selected hypothesis; and
perform an operation based on the selected hypothesis.
Dependent claims: 9, 10, 11, 12, 13
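The scoring, ranking, and selection steps recited in the independent claims can be sketched as follows. This is an illustrative reading only: the `Hypothesis` fields, the penalty formula, and the `weight` parameter are assumptions, and the claims do not state how the hyperarticulation features enter the score. The sketch assumes that a hyperarticulated repeat suggests the earlier recognition was wrong, so candidates resembling the earlier result are penalized.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    first_pass_score: float    # hypothetical recognizer score, higher is better
    overlap_with_first: float  # hypothetical: share of words also present in
                               # the result returned for the first utterance


def rescore(nbest, hyper_prob, weight=1.0):
    """Return (rank, score, hypothesis) tuples, best first.

    The penalty grows with the hyperarticulation probability and with how
    closely a candidate repeats the earlier (presumably wrong) result.
    This formula is an assumption made for illustration.
    """
    scored = [
        (h.first_pass_score - weight * hyper_prob * h.overlap_with_first, h)
        for h in nbest
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(rank, score, h) for rank, (score, h) in enumerate(scored, start=1)]


def select(nbest, hyper_prob):
    """Pick the rank-1 hypothesis after rescoring."""
    return rescore(nbest, hyper_prob)[0][2]


if __name__ == "__main__":
    nbest = [
        Hypothesis("call bob", first_pass_score=0.60, overlap_with_first=1.0),
        Hypothesis("call rob", first_pass_score=0.55, overlap_with_first=0.5),
    ]
    print(select(nbest, hyper_prob=0.9).text)
    print(select(nbest, hyper_prob=0.0).text)
```

With a hyperarticulation probability of zero the first-pass order is left unchanged, so the rescoring only intervenes when the pairwise comparison indicates that the user is repeating the query with extra emphasis.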
Specification