Hyperarticulation detection in repetitive voice queries using pairwise comparison for improved speech recognition

US 10,354,642 B2
Filed: 06/15/2017
Issued: 07/16/2019
Est. Priority Date: 03/03/2017
Status: Active Grant

Abstract

Automatic speech recognition systems can benefit from cues in a user's voice, such as hyperarticulation. Traditional approaches typically attempt to define and detect an absolute state of hyperarticulation, which is very difficult, especially on short voice queries. This disclosure provides an approach to hyperarticulation detection that uses pair-wise comparisons and is applied to a real-world speech recognition system. The disclosed approach uses delta features extracted from a pair of repetitive user utterances. The disclosed systems and methods improve word error rate by using hyperarticulation information as a feature in a second-pass N-best hypotheses rescoring setup.
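
The second-pass rescoring idea can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Hypothesis` type, the `rescore` function, the linear scoring rule and the `penalty` weight are all hypothetical. The sketch assumes that when a repeated query is probably hyperarticulated, the earlier top result was probably wrong, so a candidate that matches it is penalised.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    first_pass_score: float  # first-pass recognizer score, higher is better


def rescore(hypotheses, hyper_prob, previous_top, penalty=2.0):
    """Return the N-best list sorted best-first after a second pass.

    Assumption (not from the patent): a hypothesis equal to the earlier
    top result is penalised in proportion to the hyperarticulation
    probability of the repeated utterance.
    """
    def score(h):
        repeats_earlier_result = 1.0 if h.text == previous_top else 0.0
        return h.first_pass_score - penalty * hyper_prob * repeats_earlier_result

    return sorted(hypotheses, key=score, reverse=True)


# Illustrative N-best list for the second utterance (made-up values).
n_best = [Hypothesis("call jon", 0.9), Hypothesis("call joan", 0.8)]
selected = rescore(n_best, hyper_prob=0.7, previous_top="call jon")[0]
```

With a hyperarticulation probability of zero the first-pass order is left unchanged, so the extra feature only affects ranking when the pairwise comparison suggests the user is repeating a misrecognised query.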

13 Claims

1. A system for detecting hyperarticulation in repetitive voice queries, the system comprising:
- a machine-readable memory storing computer-executable instructions; and
  
  one or more hardware processors in communication with the machine-readable memory that, having executed the computer-executable instructions, configure the system to:
  
  receive a first audio utterance comprising at least one word;
  
  receive a second audio utterance comprising at least one word;
  
  determine whether the second audio utterance is likely to include hyperarticulation based on a comparison of the first audio utterance with the second audio utterance;
  
  in response to a determination that the second audio utterance is likely to include hyperarticulation, determining a plurality of hyperarticulation features, at least one hyperarticulation feature comprising a probability that a portion of the second audio utterance includes hyperarticulation;
  
  determine a first plurality of candidate hypotheses corresponding to the second audio utterance;
  
  determine a score for each of the candidate hypotheses based on the determined plurality of hyperarticulation features, where each candidate hypothesis is associated with a rank corresponding to the determined score; and
  
  select a hypothesis from the first plurality of hypotheses based on the rank associated with the selected hypothesis.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The system of claim 1, wherein the second audio utterance is further determined to include hyperarticulation where the second audio utterance is received within a predetermined time interval from a time when the first audio utterance was received.
  - 3. The system of claim 1, wherein the comparison of the first audio utterance with the second audio utterance includes determining a phonetic similarity between the first audio utterance and the second audio utterance.
  - 4. The system of claim 1, wherein the system is further configured to:
    - determine a second plurality of candidate hypotheses corresponding to the first audio utterance; and
      
      the comparison of the first audio utterance with the second audio utterance includes determining whether a hypothesis from the first plurality of hypotheses is included in the second plurality of hypotheses.
  - 5. The system of claim 1, wherein the score for each of the candidate hypotheses of the first plurality of candidate hypotheses is further based on a metaphone similarity between the first audio utterance and the second audio utterance and a phonetics similarity between the first audio utterance and the second audio utterance.
  - 6. The system of claim 1, wherein:
    - the first audio utterance comprises a first plurality of word segments;
      
      the second audio utterance comprises a second plurality of word segments; and
      
      the system is further configured to align the first plurality of word segments with the second plurality of word segments.
  - 7. The system of claim 1, wherein:
    - the first audio utterance comprises a first plurality of word segments;
      
      the second audio utterance comprises a second plurality of word segments; and
      
      the system is further configured to determine that the second audio utterance is likely to include hyperarticulation where at least one word segment of the second plurality of word segments is longer in duration than a corresponding word segment of the first plurality of word segments.
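
The pairwise checks described in claims 2, 6 and 7 can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed system: the `WordSegment` type, the position-based `align` helper and the 60-second `max_gap` default are hypothetical, and a real system would align segments from recognizer timings rather than by position.

```python
from dataclasses import dataclass


@dataclass
class WordSegment:
    word: str
    duration: float  # seconds


def align(first, second):
    """Pair word segments by position (a simplification of real alignment)."""
    return list(zip(first, second))


def likely_hyperarticulated(first, second, gap_seconds, max_gap=60.0):
    """Flag the second utterance as likely hyperarticulated.

    The repeat must arrive within max_gap seconds of the first utterance
    (hypothetical threshold), and at least one aligned word segment must be
    longer in the second utterance than in the first.
    """
    if gap_seconds > max_gap:
        return False
    return any(b.duration > a.duration for a, b in align(first, second))


# Made-up segment durations for a query and its slower repetition.
first = [WordSegment("call", 0.30), WordSegment("joan", 0.35)]
second = [WordSegment("call", 0.31), WordSegment("joan", 0.60)]
flagged = likely_hyperarticulated(first, second, gap_seconds=5.0)
```

The per-segment duration differences computed inside `any(...)` are one example of a feature derived from comparing the two utterances rather than from the second utterance alone.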

8. A system for detecting hyperarticulation in repetitive voice queries, the system comprising:
- a machine-readable memory storing computer-executable instructions; and
  
  one or more hardware processors in communication with the machine-readable memory that, having executed the computer-executable instructions, configure the system to:
  
  receive a first audio utterance comprising at least one word;
  
  determine a first candidate hypothesis for the received first audio utterance;
  
  output a message in response to the received first candidate hypothesis;
  
  receive a second audio utterance comprising at least one word;
  
  determine whether the second audio utterance is likely to include hyperarticulation based on a comparison of the first audio utterance with the second audio utterance;
  
  in response to a determination that the second audio utterance is likely to include hyperarticulation, determining a plurality of hyperarticulation features, at least one hyperarticulation feature comprising a probability that a portion of the second audio utterance includes hyperarticulation;
  
  determine a first plurality of candidate hypotheses corresponding to the second audio utterance;
  
  determine a score for each of the candidate hypotheses based on the determined plurality of hyperarticulation features, where each candidate hypothesis is associated with a rank corresponding to the determined score;
  
  select a hypothesis from the first plurality of hypotheses based on the rank associated with the selected hypothesis; and
  
  perform an operation based on the selected hypothesis.
- Dependent claims (9, 10, 11, 12, 13):
  - 9. The system of claim 8, wherein the second audio utterance is further determined to include hyperarticulation where the second audio utterance is received within a predetermined time interval from a time when the first audio utterance was received.
  - 10. The system of claim 8, wherein the comparison of the first audio utterance with the second audio utterance includes determining a phonetic similarity between the first audio utterance and the second audio utterance.
  - 11. The system of claim 8, wherein the system is further configured to:
    - determine a second plurality of candidate hypotheses corresponding to the first audio utterance; and
      
      the comparison of the first audio utterance with the second audio utterance includes determining whether a hypothesis from the first plurality of hypotheses is included in the second plurality of hypotheses.
  - 12. The system of claim 8, wherein the score for each of the candidate hypotheses of the first plurality of candidate hypotheses is further based on a metaphone similarity between the first audio utterance and the second audio utterance and a phonetics similarity between the first audio utterance and the second audio utterance.
  - 13. The system of claim 8, wherein:
    - the first audio utterance comprises a first plurality of word segments;
      
      the second audio utterance comprises a second plurality of word segments; and
      
      the system is further configured to determine that the second audio utterance is likely to include hyperarticulation where at least one word segment of the second plurality of word segments is longer in duration than a corresponding word segment of the first plurality of word segments.
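
The comparison signals in claims 10 to 12 can be approximated with a short Python sketch. It is only a rough illustration: the function names and the feature dictionary are hypothetical, and the character-level ratio from the standard library's `difflib.SequenceMatcher` is a crude stand-in for the phonetic and metaphone similarities named in the claims, not an implementation of them.

```python
from difflib import SequenceMatcher


def text_similarity(a, b):
    """Character-level similarity in [0, 1]; a stand-in for phonetic similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def repetition_features(first_n_best, second_n_best):
    """Compare the N-best lists of two consecutive utterances.

    Returns whether any hypothesis for the second utterance also appears
    among the hypotheses for the first, plus the similarity of the two
    top-ranked hypotheses.
    """
    shared = set(first_n_best) & set(second_n_best)
    return {
        "shares_hypothesis": bool(shared),
        "top_similarity": text_similarity(first_n_best[0], second_n_best[0]),
    }


# Made-up N-best lists (best first) for a query and its repetition.
features = repetition_features(
    ["call jon", "call john"],
    ["call joan", "call jon"],
)
```

Features like these could feed both the decision that the second utterance repeats the first and the later rescoring step; how they are weighted is left open here.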

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors: Gurunath Kulkarni, Ranjitha; El Kholy, Ahmed Moustafa; Bawab, Ziad Al; Alon, Noha; Zitouni, Imed
Primary Examiner(s): Vo, Huyen X

Application Number: US15/624,451
Publication Number: US 20180254035A1
Time in Patent Office: 761 Days
Field of Search: 704/1-10, 704/230-257, 704/270-277
US Class Current
CPC Class Codes

G10L 15/02   Feature extraction for spee...

G10L 15/065   Adaptation

G10L 15/20   Speech recognition techniqu...

G10L 15/22   Procedures used during a sp...

G10L 25/51   for comparison or discrimin...
