System and method of improving speech recognition using context

US 9,626,963 B2
Filed: 04/30/2013
Issued: 04/18/2017
Est. Priority Date: 04/30/2013
Status: Active Grant

Abstract

A system and method are provided for improving speech recognition accuracy. Contextual information about user speech may be received, and then speech recognition analysis can be performed on the user speech using the contextual information. This allows the system and method to improve accuracy when performing tasks like searching and navigating using speech recognition.

15 Claims

1. A system, comprising:
- a processor;
  
  a single microphone configured to both record user speech and to record ambient sounds; and
  
  a speech recognition module configured to:
  
  identify that the ambient sounds are of a particular type by comparing the ambient sounds to stored waveforms;
  
  select a dictionary based on the identified particular type of ambient sounds;
  
  identify, as contextual information, terms related to the identified particular type of ambient sounds based on identification of the identified particular type of ambient sounds, the terms being generated as contextual information;
  
  alter, in response to identification of the terms related to the identified particular type of ambient sounds, the dictionary such that the dictionary includes the terms related to the identified particular type of ambient sounds;
  
  assign, in the dictionary, score values to the terms related to the identified particular type of ambient sounds based on identifying that the terms are related to the identified particular type of ambient sounds; and
  
  analyze the user speech by comparing each potential output word or phoneme in the user speech to waveforms stored for the dictionary to attempt to match the potential output word or phoneme to a waveform corresponding to a particular word or phoneme in the dictionary, an analysis varying based on the assigned scores to the terms identified as contextual information.
- 2. The system of claim 1, wherein the ambient sounds include music and the identification that the ambient sounds are of the particular type includes identifying that the ambient sounds are music and identifying the music, wherein the speech recognition module is further configured to identify, as the contextual information, terms related to the identified music.
- 3. The system of claim 1, further comprising a sensor, and wherein the contextual information includes information identified from sensor information detected by the sensor.
- 4. The system of claim 3, wherein the sensor is a global positioning system module and the contextual information includes location.
- 5. The system of claim 3, wherein the sensor is a global positioning system module and the contextual information includes speed.
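
As an illustration only, the steps recited in claim 1 can be sketched as a small pipeline: classify the ambient sound against stored waveforms, select a dictionary, add context terms, score them, and let the scores influence the match. Everything named here (`STORED_WAVEFORMS`, `DICTIONARIES`, `CONTEXT_TERMS`, the cosine-similarity comparison, the `boost` value, and the pre-computed acoustic match scores) is a hypothetical assumption made for the sketch and is not taken from the patent.

```python
import math

# Hypothetical stored reference waveforms, one per ambient-sound type.
STORED_WAVEFORMS = {
    "music": [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0],
    "traffic": [0.5, 0.5, 0.4, 0.5, 0.6, 0.5, 0.4, 0.5],
}

# Hypothetical base dictionaries and context terms for each ambient type.
DICTIONARIES = {
    "music": {"play", "pause", "song"},
    "traffic": {"navigate", "route", "home"},
}
CONTEXT_TERMS = {
    "music": ["artist", "album"],
    "traffic": ["detour", "exit"],
}


def cosine_similarity(a, b):
    """Assumed comparison measure; the claim only says 'comparing'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_ambient_type(ambient):
    """Return the stored type whose waveform is most similar to the input."""
    return max(
        STORED_WAVEFORMS,
        key=lambda kind: cosine_similarity(ambient, STORED_WAVEFORMS[kind]),
    )


def build_scored_dictionary(ambient_type, boost=2.0):
    """Select a dictionary, add the context terms, and assign them scores."""
    scores = {word: 1.0 for word in DICTIONARIES[ambient_type]}
    for term in CONTEXT_TERMS[ambient_type]:
        scores[term] = boost  # context terms receive a higher score
    return scores


def recognize(acoustic_matches, scored_dictionary):
    """Pick the dictionary word with the best score-weighted acoustic match.

    acoustic_matches maps candidate words to an assumed match quality in
    [0, 1]; a real recognizer would derive this from waveform comparison.
    """
    in_dictionary = [w for w in acoustic_matches if w in scored_dictionary]
    if not in_dictionary:
        return None
    return max(in_dictionary, key=lambda w: acoustic_matches[w] * scored_dictionary[w])
```

With music-like ambient input, the context term "artist" can win over "pause" even though its raw acoustic match is slightly weaker, which is the kind of score-dependent variation the final step of the claim describes.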

6. A method comprising:
- recording sounds using a single microphone;
  
  identifying, using one or more processors, potential output words and phonemes as well as ambient sounds in the sounds recorded by the single microphone;
  
  identifying that the ambient sounds are of a particular type by comparing the ambient sounds to stored waveforms;
  
  selecting a dictionary based on the identified particular type of ambient sounds;
  
  identifying, as contextual information, terms related to the identified particular type of ambient sounds based on identification of the identified particular type of ambient sounds, the terms being generated as contextual information;
  
  assigning, in the dictionary, score values to the terms related to the identified particular type of ambient sounds based on identifying that the terms are related to the identified particular type of ambient sounds; and
  
  analyzing user speech by comparing each potential output word or phoneme in the user speech to waveforms stored for the dictionary to attempt to match the potential output word or phoneme to a waveform corresponding to a particular word or phoneme in the dictionary, the analyzing varying based on the assigned scores to the terms identified as contextual information.
- 7. The method of claim 6, wherein the contextual information includes user location.
- 8. The method of claim 6, wherein the contextual information includes speed of movement of a user.
- 9. The method of claim 6, wherein the ambient sounds include music and the identification that the ambient sounds are of the particular type includes identifying that the ambient sounds are music and identifying the music, wherein the speech recognition module is further configured to identify, as the contextual information, terms related to the identified music.
- 10. The method of claim 6, further comprising altering the dictionary based on the contextual information such that the dictionary includes the terms related to the identified particular type of ambient sounds.
- 11. The method of claim 10, wherein the dictionary is altered by replacing the dictionary with a different dictionary.
- 12. The method of claim 10, wherein the dictionary is altered by adding words pertaining to the contextual information to the dictionary.
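
Claims 11 and 12 describe two ways of altering the dictionary: swapping it for a different one, or adding context words to it. A minimal sketch of the difference follows, assuming a dictionary is modeled as a mapping from word to score; the function names and the default score are hypothetical and not part of the patent.

```python
def alter_by_replacement(current, replacement):
    """Claim 11 style (assumed model): use a different dictionary entirely."""
    return dict(replacement)


def alter_by_addition(current, context_terms, score=2.0):
    """Claim 12 style (assumed model): keep existing words, add context words."""
    updated = dict(current)  # copy so the caller's dictionary is not mutated
    for term in context_terms:
        # Never lower a score that is already higher than the context score.
        updated[term] = max(score, updated.get(term, 0.0))
    return updated


base = {"play": 1.0, "pause": 1.0}
added = alter_by_addition(base, ["album"])
replaced = alter_by_replacement(base, {"route": 1.0})
```

Addition preserves the general vocabulary while favoring context terms; replacement narrows recognition to the new vocabulary only.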

13. A non-transitory machine-readable storage medium comprising a set of instructions which, when executed by a processor, causes execution of operations comprising:
- recording sounds using a single microphone;
  
  identifying potential output words and phonemes as well as ambient sounds in the sounds recorded by the single microphone;
  
  identifying that the ambient sounds are of a particular type by comparing the ambient sounds to stored waveforms;
  
  selecting a dictionary based on the identified particular type of ambient sounds;
  
  identifying, as contextual information, terms related to the identified particular type of ambient sounds based on identification of the identified particular type of ambient sounds, the terms being generated as contextual information;
  
  assigning, in the dictionary, score values to the terms related to the identified particular type of ambient sounds based on identifying that the terms are related to the identified particular type of ambient sounds; and
  
  analyzing the user speech by comparing each potential output word or phoneme in the user speech to waveforms stored for the dictionary to attempt to match the potential output word or phoneme to a waveform corresponding to a particular word or phoneme in the dictionary, the analyzing varying based on the assigned scores to the terms identified as contextual information.
- 14. The non-transitory machine-readable storage medium of claim 13, wherein the speech recognition analysis includes utilizing a hidden Markov model.
- 15. The non-transitory machine-readable storage medium of claim 13, wherein the ambient sounds include music and the identification that the ambient sounds are of the particular type includes identifying that the ambient sounds are music and identifying the music, wherein the speech recognition module is further configured to identify, as the contextual information, terms related to the identified music.
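
Claim 14 states only that the analysis utilizes a hidden Markov model; it does not say how. One common way such a model is used is Viterbi decoding, which finds the most likely hidden sequence (for example, phonemes) for a series of acoustic observations. The sketch below is a generic textbook version with made-up states, observation labels, and probabilities, all of which are assumptions for illustration.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for the observations."""
    # Each row maps state -> (best path probability, previous state).
    rows = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = rows[-1]
        row = {}
        for s in states:
            # Best predecessor for state s at this time step.
            prob, back = max((prev[r][0] * trans_p[r][s], r) for r in states)
            row[s] = (prob * emit_p[s][obs], back)
        rows.append(row)

    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: rows[-1][s][0])
    path = [state]
    for row in reversed(rows[1:]):
        state = row[state][1]
        path.append(state)
    path.reverse()
    return path


# Toy model: two hypothetical phoneme states and two acoustic frame labels.
states = ["p", "l"]
start_p = {"p": 0.6, "l": 0.4}
trans_p = {"p": {"p": 0.3, "l": 0.7}, "l": {"p": 0.4, "l": 0.6}}
emit_p = {
    "p": {"burst": 0.8, "liquid": 0.2},
    "l": {"burst": 0.1, "liquid": 0.9},
}
decoded = viterbi(["burst", "liquid", "liquid"], states, start_p, trans_p, emit_p)
```

In a context-aware recognizer, the score values assigned to context terms could plausibly be folded into such probabilities, though the claims do not specify this.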

Current Assignee: PayPal, Inc. (PayPal Holdings, Inc.)
Original Assignee: PayPal, Inc. (PayPal Holdings, Inc.)
Inventors: Farraro, Eric J.
Primary Examiner(s): Hudspeth, David
Assistant Examiner(s): OGUNBIYI, OLUWADAMILOL M
Application Number: US13/874,304
Publication Number: US 20140324428A1
Time in Patent Office: 1,449 Days
Field of Search: 704231, 704244

CPC Class Codes
G10L 15/06   Creation of reference templ...
G10L 15/063   Training
G10L 15/22   Procedures used during a sp...
G10L 15/30   Distributed recognition, e....
G10L 2015/0635   updating or merging of old ...
G10L 2015/226   using non-speech characteri...
G10L 25/48   specially adapted for parti...
G10L 25/51   for comparison or discrimin...
G10L 25/84   for discriminating voice fr...
