CONTEXT-BASED SPEECH RECOGNITION

US 20150039299A1
Filed: 09/18/2013
Published: 02/05/2015
Est. Priority Date: 07/31/2013
Status: Active Grant

Abstract

A processing system receives an audio signal encoding a portion of an utterance. The processing system receives context information associated with the utterance, wherein the context information is not derived from the audio signal or any other audio signal. The processing system provides, as input to a neural network, data corresponding to the audio signal and the context information, and generates a transcription for the utterance based on at least an output of the neural network.
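
The flow in the abstract can be sketched as a single forward pass over a joint input. This is a minimal illustration, not the patented implementation: the layer sizes, the random weights, the ReLU and softmax choices, and every name below (`forward`, `make_layer`, the example feature values) are assumptions made for the example.

```python
import math
import random

def softmax(scores):
    """Turn raw scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, biases):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def make_layer(n_in, n_out, rng):
    """Hypothetical untrained layer with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(audio_features, context_features, hidden_layer, output_layer):
    """Feed audio-derived data and non-audio context to one network."""
    joint_input = list(audio_features) + list(context_features)
    hidden = relu(dense(joint_input, *hidden_layer))
    return softmax(dense(hidden, *output_layer))

rng = random.Random(0)
audio = [0.2, -0.1, 0.4, 0.0]   # assumed acoustic features for one frame
context = [1.0, 0.0, 0.5]       # assumed encoding of non-audio context
hidden_layer = make_layer(len(audio) + len(context), 8, rng)
output_layer = make_layer(8, 5, rng)
posteriors = forward(audio, context, hidden_layer, output_layer)
```

Here `posteriors` is a distribution over five placeholder output labels. A real system would train the weights and use a far larger network.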

20 Claims

1. A computer-implemented method comprising:
- receiving an audio signal encoding a portion of an utterance;
  
  receiving context information associated with the utterance, wherein the context information is not derived from the audio signal or any other audio signal;
  
  providing, as input to a neural network, data corresponding to the audio signal and the context information; and
  
  generating a transcription for the utterance based on at least an output of the neural network.
- Dependent Claims (2, 3, 4, 5, 6, 7)
  - 2. The method of claim 1, wherein providing, as an input to a neural network, data corresponding to the audio signal and the context information comprises providing, as an input to a neural network, a set of acoustic feature vectors derived from the audio signal and data corresponding to the context information.
  - 3. The method of claim 1, wherein receiving context information associated with the utterance comprises receiving an internet protocol (IP) address of a client device from which the audio signal originated.
  - 4. The method of claim 1, wherein receiving context information associated with the utterance comprises receiving a geographic location of a client device from which the audio signal originated.
  - 5. The method of claim 1, wherein receiving context information associated with the utterance comprises receiving a search history associated with a speaker of the utterance.
  - 6. The method of claim 1, further comprising:
    - receiving a set of data derived from the audio signal, the set of data corresponding to one or more time-independent characteristics of the audio signal;
      
      wherein providing, as input to a neural network, data corresponding to the audio signal and the context information comprises providing, as input to a neural network, data corresponding to the audio signal, the context information, and the set of data derived from the audio signal.
  - 7. The method of claim 6, wherein the set of data corresponding to one or more time-independent characteristics of the audio signal includes one or more of a signal corresponding to an accent of a speaker of the utterance, a signal corresponding to background noise of the audio signal, a signal corresponding to recording channel properties of the audio signal, a signal corresponding to a pitch of the speaker, and a signal corresponding to an age of the speaker.
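
Claims 2 through 7 name the kinds of data that may accompany the audio: acoustic feature vectors, an IP address, a geographic location, a search history, and time-independent characteristics of the audio. The sketch below shows one way such inputs could be turned into a single numeric vector. The encodings (octet scaling, coordinate scaling, a hashed bag of words) and all function names are assumptions chosen for illustration; the claims do not prescribe any particular encoding.

```python
import hashlib

def encode_ip(ip_address):
    """Scale each IPv4 octet to the range [0, 1] (assumed encoding)."""
    return [int(octet) / 255.0 for octet in ip_address.split(".")]

def encode_location(latitude, longitude):
    """Scale coordinates in degrees to the range [-1, 1] (assumed encoding)."""
    return [latitude / 90.0, longitude / 180.0]

def encode_search_history(queries, buckets=8):
    """Hashed bag of words over past queries, normalized to sum to 1."""
    counts = [0.0] * buckets
    for query in queries:
        for word in query.lower().split():
            digest = hashlib.md5(word.encode("utf-8")).digest()
            counts[digest[0] % buckets] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts

def build_network_input(acoustic_frame, context, utterance_traits):
    """Concatenate per-frame audio features, non-audio context, and
    time-independent characteristics derived from the audio (claim 6)."""
    return list(acoustic_frame) + list(context) + list(utterance_traits)

# Context information: none of it is derived from an audio signal.
context = (encode_ip("192.0.2.10")
           + encode_location(37.42, -122.08)
           + encode_search_history(["weather boston", "boston red sox"]))

frame = [0.1] * 13      # hypothetical 13-dimensional acoustic feature vector
traits = [0.3, 0.7]     # hypothetical accent and background-noise signals
network_input = build_network_input(frame, context, traits)
```

The resulting vector has 13 + 14 + 2 = 29 entries. Keeping the context encoders separate from `build_network_input` mirrors the distinction the claims draw between context that does not come from audio and the time-independent characteristics that do.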

8. A system comprising:
- one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  
  receiving an audio signal encoding a portion of an utterance;
  
  receiving context information associated with the utterance, wherein the context information is not derived from the audio signal or any other audio signal;
  
  providing, as input to a neural network, data corresponding to the audio signal and the context information; and
  
  generating a transcription for the utterance based on at least an output of the neural network.
- Dependent Claims (9, 10, 11, 12, 13, 14)
  - 9. The system of claim 8, wherein providing, as an input to a neural network, data corresponding to the audio signal and the context information comprises providing, as an input to a neural network, a set of acoustic feature vectors derived from the audio signal and data corresponding to the context information.
  - 10. The system of claim 8, wherein receiving context information associated with the utterance comprises receiving an internet protocol (IP) address of a client device from which the audio signal originated.
  - 11. The system of claim 8, wherein receiving context information associated with the utterance comprises receiving a geographic location of a client device from which the audio signal originated.
  - 12. The system of claim 8, wherein receiving context information associated with the utterance comprises receiving a search history associated with a speaker of the utterance.
  - 13. The system of claim 8, wherein the operations further comprise:
    - receiving a set of data derived from the audio signal, the set of data corresponding to one or more time-independent characteristics of the audio signal;
      
      wherein providing, as input to a neural network, data corresponding to the audio signal and the context information comprises providing, as input to a neural network, data corresponding to the audio signal, the context information, and the set of data derived from the audio signal.
  - 14. The system of claim 13, wherein the set of data corresponding to one or more time-independent characteristics of the audio signal includes one or more of a signal corresponding to an accent of a speaker of the utterance, a signal corresponding to background noise of the audio signal, a signal corresponding to recording channel properties of the audio signal, a signal corresponding to a pitch of the speaker, and a signal corresponding to an age of the speaker.

15. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- receiving an audio signal encoding a portion of an utterance;
  
  receiving context information associated with the utterance, wherein the context information is not derived from the audio signal or any other audio signal;
  
  providing, as input to a neural network, data corresponding to the audio signal and the context information; and
  
  generating a transcription for the utterance based on at least an output of the neural network.
- Dependent Claims (16, 17, 18, 19, 20)
  - 16. The computer-readable medium of claim 15, wherein providing, as an input to a neural network, data corresponding to the audio signal and the context information comprises providing, as an input to a neural network, a set of acoustic feature vectors derived from the audio signal and data corresponding to the context information.
  - 17. The computer-readable medium of claim 15, wherein receiving context information associated with the utterance comprises receiving an internet protocol (IP) address of a client device from which the audio signal originated.
  - 18. The computer-readable medium of claim 15, wherein receiving context information associated with the utterance comprises receiving a geographic location of a client device from which the audio signal originated.
  - 19. The computer-readable medium of claim 15, wherein receiving context information associated with the utterance comprises receiving a search history associated with a speaker of the utterance.
  - 20. The computer-readable medium of claim 15, wherein the operations further comprise:
    - receiving a set of data derived from the audio signal, the set of data corresponding to one or more time-independent characteristics of the audio signal;
      
      wherein providing, as input to a neural network, data corresponding to the audio signal and the context information comprises providing, as input to a neural network, data corresponding to the audio signal, the context information, and the set of data derived from the audio signal.
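
The claims state only that the transcription is generated "based on at least an output of the neural network"; they do not say how. One common simplification, assumed here purely for illustration, is a CTC-style greedy collapse: take the most probable label in each frame, merge consecutive repeats, and drop a blank symbol. The label set, the blank symbol, and the function name are hypothetical.

```python
def greedy_decode(frame_posteriors, labels, blank="_"):
    """Pick the best label per frame, merge repeats, and remove blanks."""
    best = [labels[max(range(len(p)), key=p.__getitem__)]
            for p in frame_posteriors]
    collapsed = []
    previous = None
    for label in best:
        if label != previous:
            collapsed.append(label)
        previous = label
    return "".join(label for label in collapsed if label != blank)

labels = ["_", "h", "i"]        # hypothetical output labels; "_" is the blank
frames = [
    [0.10, 0.80, 0.10],         # "h"
    [0.20, 0.70, 0.10],         # "h" again, merged with the previous frame
    [0.90, 0.05, 0.05],         # blank
    [0.10, 0.10, 0.80],         # "i"
]
transcript = greedy_decode(frames, labels)   # "hi"
```

A production recognizer would more likely combine the network output with a language model and a search over word sequences; the greedy pass above only shows the general shape of going from per-frame outputs to text.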

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Weinstein, Eugene; Moreno Mengibar, Pedro J.; Schalkwyk, Johan

Granted Patent: US 9,311,915 B2
US Class Current: 704/202
CPC Class Codes: G10L 15/16 using artificial neural net...
