Background audio identification for speech disambiguation

US 10,224,024 B1
Filed: 06/14/2017
Issued: 03/05/2019
Est. Priority Date: 06/01/2012
Status: Active Grant

Abstract

Implementations relate to techniques for providing context-dependent search results. A computer-implemented method includes receiving an audio stream at a computing device during a time interval, the audio stream comprising user speech data and background audio, separating the audio stream into a first substream that includes the user speech data and a second substream that includes the background audio, identifying concepts related to the background audio, generating a set of terms related to the identified concepts, influencing a speech recognizer based on at least one of the terms related to the background audio, and obtaining a recognized version of the user speech data using the speech recognizer.
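
The stages named in the abstract can be read as a simple chain. The following is a minimal sketch of that chain, not the patented implementation: the `Pipeline` class, its four stage names, and the toy show "Harbor Lights" with a character "Kaine" are all hypothetical, and each stage is a stub that a real system would replace with source separation, audio matching, and a speech recognizer.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Audio = Sequence[float]  # hypothetical stand-in for raw audio samples


@dataclass
class Pipeline:
    """Chains the stages listed in the abstract. All stage names are assumptions."""
    separate: Callable[[Audio], Tuple[Audio, Audio]]  # stream -> (speech, background)
    identify_concepts: Callable[[Audio], List[str]]   # background -> concepts
    expand_terms: Callable[[List[str]], List[str]]    # concepts -> related terms
    recognize: Callable[[Audio, List[str]], str]      # speech + bias terms -> text

    def run(self, stream: Audio) -> str:
        speech, background = self.separate(stream)
        concepts = self.identify_concepts(background)
        terms = self.expand_terms(concepts)
        # The recognizer is influenced by the terms before it produces text.
        return self.recognize(speech, terms)


def demo_pipeline() -> str:
    """Runs the chain with placeholder stages on a dummy stream."""
    pipeline = Pipeline(
        # Placeholder split; real separation would not simply halve the stream.
        separate=lambda s: (s[: len(s) // 2], s[len(s) // 2:]),
        identify_concepts=lambda background: ["Harbor Lights"],
        expand_terms=lambda concepts: concepts + ["Kaine"],
        recognize=lambda speech, terms: (
            "who plays Kaine" if "Kaine" in terms else "who plays cane"
        ),
    )
    return pipeline.run([0.0] * 8)
```

Keeping each stage behind a plain callable makes the ordering explicit: the terms derived from the background audio exist before the recognizer is asked for a result.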

20 Claims

1. A computer-implemented method comprising:
- receiving an audio stream containing first audio data and second audio data, the first audio data corresponding to an utterance spoken by a user and the second audio data corresponding to background audio associated with playback of an item of media content;
  
  processing the second audio data to generate at least one term associated with the background audio;
  
  adjusting a probability or relevance score associated with a speech recognition model recognizing the at least one term associated with the background audio in the first audio data;
  
  after adjusting the probability or relevance score associated with the speech recognition model recognizing the at least one term associated with the background audio, transcribing the first audio data into a textual representation of the utterance using the speech recognition model; and
  
  transmitting the textual representation of the utterance to a computing device associated with the user, the textual representation of the utterance when received by the computing device causing the computing device to at least one of display the textual representation on a display or perform a particular task based on the textual representation.
- Dependent claims (2, 3, 4, 5, 6, 7)
  - 2. The computer-implemented method of claim 1, wherein processing the second audio data to generate the at least one term comprises:
    - identifying one or more concepts from audio features of the second audio data; and
      
      generating a set of terms related to the identified one or more concepts.
  - 3. The computer-implemented method of claim 2, wherein generating the set of terms related to the identified one or more concepts comprises querying a conceptual expansion database for the set of terms using the one or more concepts identified from the audio features of the second audio data.
  - 4. The computer-implemented method of claim 1, wherein receiving the audio stream containing the first audio data and the second audio data comprises receiving the first audio data and the second audio data from the computing device associated with the user, the first audio data and the second audio data captured by the computing device during a same time interval.
  - 5. The computer-implemented method of claim 2, further comprising:
    - generating conceptual bias data using the at least one term associated with the background audio; and
      
      adjusting the probability or relevance score associated with the speech recognition model recognizing the at least one term associated with the background audio in the first audio data based on the conceptual bias data.
  - 6. The computer-implemented method of claim 2, wherein identifying the one or more concepts from the audio features of the second audio data comprises:
    - comparing the audio features of the second audio data to known audio segments;
      
      determining that the audio features of the second audio data correspond to audio features of a particular known audio segment; and
      
      identifying the one or more concepts as being related to the particular known audio segment.
  - 7. The computer-implemented method of claim 5, wherein transcribing the first audio data into the textual representation of the utterance comprises selecting, by the speech recognition model, the textual representation from a set of textual representations that have substantially similar frequencies of occurrence in a particular language by using the conceptual data to weigh a statistical selection of the textual representation from the set of textual representations.
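
One way to picture claims 1, 5, and 7 together is as a score adjustment applied to competing transcriptions that the language model rates almost equally. The sketch below uses n-best rescoring as a stand-in technique; the claims do not say the adjustment is done this way. The function names, the `boost` parameter, and the example hypotheses are assumptions made for illustration.

```python
import math
from typing import Dict, Iterable, List, Tuple

Hypothesis = Tuple[str, float]  # (candidate text, log-probability from the recognizer)


def build_bias(terms: Iterable[str], boost: float = 2.0) -> Dict[str, float]:
    """Hypothetical 'conceptual bias data': a log-score bonus per background term."""
    return {term.lower(): math.log(boost) for term in terms}


def rescore(hypotheses: List[Hypothesis], bias: Dict[str, float]) -> List[Hypothesis]:
    """Adds the bonus for every biased word in a hypothesis, then sorts best first."""
    rescored = []
    for text, log_prob in hypotheses:
        bonus = sum(bias.get(word, 0.0) for word in text.lower().split())
        rescored.append((text, log_prob + bonus))
    return sorted(rescored, key=lambda h: h[1], reverse=True)


def transcribe(hypotheses: List[Hypothesis], terms: Iterable[str]) -> str:
    """Builds the bias first, then selects the top hypothesis under the adjusted scores."""
    return rescore(hypotheses, build_bias(terms))[0][0]


# Two near-homophone candidates with substantially similar scores (toy values).
candidates = [("who plays cane", -4.00), ("who plays kaine", -4.05)]
```

With these toy values, `transcribe(candidates, ["Kaine"])` selects "who plays kaine", because the bonus of log 2 outweighs the 0.05 gap, while `transcribe(candidates, [])` leaves the original ranking unchanged.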

8. A system comprising:
- one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  
  receiving an audio stream containing first audio data and second audio data, the first audio data corresponding to an utterance spoken by a user and the second audio data corresponding to background audio associated with playback of an item of media content;
  
  processing the second audio data to generate at least one term associated with the background audio;
  
  adjusting a probability or relevance score associated with a speech recognition model recognizing the at least one term associated with the background audio in the first audio data;
  
  after adjusting the probability or relevance score associated with the speech recognition model recognizing the at least one term associated with the background audio, transcribing the first audio data into a textual representation of the utterance using the speech recognition model; and
  
  transmitting the textual representation of the utterance to a computing device associated with the user, the textual representation of the utterance when received by the computing device causing the computing device to at least one of display the textual representation on a display or perform a particular task based on the textual representation.
- Dependent claims (9, 10, 11, 12, 13, 14)
  - 9. The system of claim 8, wherein processing the second audio data to generate the at least one term comprises:
    - identifying one or more concepts from audio features of the second audio data; and
      
      generating a set of terms related to the identified one or more concepts.
  - 10. The system of claim 9, wherein generating the set of terms related to the identified one or more concepts comprises querying a conceptual expansion database for the set of terms using the one or more concepts identified from the audio features of the second audio data.
  - 11. The system of claim 8, wherein receiving the audio stream containing the first audio data and the second audio data comprises receiving the first audio data and the second audio data from the computing device associated with the user, the first audio data and the second audio data captured by the computing device during a same time interval.
  - 12. The system of claim 9, wherein the operations further comprise:
    - generating conceptual bias data using the at least one term associated with the background audio; and
      
      adjusting the probability or relevance score associated with the speech recognition model recognizing the at least one term associated with the background audio in the first audio data based on the conceptual bias data.
  - 13. The system of claim 9, wherein identifying the one or more concepts from the audio features of the second audio data comprises:
    - comparing the audio features of the second audio data to known audio segments;
      
      determining that the audio features of the second audio data correspond to audio features of a particular known audio segment; and
      
      identifying the one or more concepts as being related to the particular known audio segment.
  - 14. The system of claim 12, wherein transcribing the first audio data into the textual representation of the utterance comprises selecting, by the speech recognition model, the textual representation from a set of textual representations that have substantially similar frequencies of occurrence in a particular language by using the conceptual data to weigh a statistical selection of the textual representation from the set of textual representations.
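
Claims 9, 10, and 13 describe matching background audio features against known segments and then expanding the matched concepts into terms. A minimal sketch of that lookup follows, assuming feature vectors compared by cosine similarity and in-memory dictionaries in place of a real fingerprint index and conceptual expansion database. The table contents, the 0.9 threshold, and all names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

# Toy stand-ins for a known-segment index and a conceptual expansion database.
KNOWN_SEGMENTS: Dict[str, List[float]] = {
    "harbor_lights_theme": [0.9, 0.1, 0.4],
    "evening_news_jingle": [0.1, 0.8, 0.2],
}
SEGMENT_CONCEPTS: Dict[str, List[str]] = {
    "harbor_lights_theme": ["Harbor Lights"],
    "evening_news_jingle": ["news"],
}
EXPANSION_DB: Dict[str, List[str]] = {
    "Harbor Lights": ["Kaine", "harbor"],
    "news": ["headline", "weather"],
}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_concepts(features: Sequence[float], threshold: float = 0.9) -> List[str]:
    """Returns concepts of the most similar known segment, or [] if none reaches the threshold."""
    best_id: Optional[str] = None
    best_sim = threshold
    for segment_id, known in KNOWN_SEGMENTS.items():
        sim = cosine(features, known)
        if sim >= best_sim:
            best_id, best_sim = segment_id, sim
    return SEGMENT_CONCEPTS.get(best_id, []) if best_id else []


def expand(concepts: List[str]) -> List[str]:
    """Queries the toy expansion table; keeps each concept and its related terms once."""
    terms: List[str] = []
    for concept in concepts:
        for term in [concept] + EXPANSION_DB.get(concept, []):
            if term not in terms:
                terms.append(term)
    return terms
```

For example, `expand(identify_concepts([0.88, 0.12, 0.41]))` matches the first toy segment and yields its concept plus the two related terms, while a vector that resembles neither segment yields an empty list, so no bias is applied.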

15. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- receiving an audio stream containing first audio data and second audio data, the first audio data corresponding to an utterance spoken by a user and the second audio data corresponding to background audio associated with playback of an item of media content;
  
  processing the second audio data to generate at least one term associated with the background audio;
  
  adjusting a probability or relevance score associated with a speech recognition model recognizing the at least one term associated with the background audio in the first audio data;
  
  after adjusting the probability or relevance score associated with the speech recognition model recognizing the at least one term associated with the background audio, transcribing the first audio data into a textual representation of the utterance using the speech recognition model; and
  
  transmitting the textual representation of the utterance to a computing device associated with the user, the textual representation of the utterance when received by the computing device causing the computing device to at least one of display the textual representation on a display or perform a particular task based on the textual representation.
- Dependent claims (16, 17, 18, 19, 20)
  - 16. The computer-readable medium of claim 15, wherein processing the second audio data to generate the at least one term comprises:
    - identifying one or more concepts from audio features of the second audio data; and
      
      generating a set of terms related to the identified one or more concepts.
  - 17. The computer-readable medium of claim 16, wherein generating the set of terms related to the identified one or more concepts comprises querying a conceptual expansion database for the set of terms using the one or more concepts identified from the audio features of the second audio data.
  - 18. The computer-readable medium of claim 15, wherein receiving the audio stream containing the first audio data and the second audio data comprises receiving the first audio data and the second audio data from the computing device associated with the user, the first audio data and the second audio data captured by the computing device during a same time interval.
  - 19. The computer-readable medium of claim 16, wherein the operations further comprise:
    - generating conceptual bias data using the at least one term associated with the background audio; and
      
      adjusting the probability or relevance score associated with the speech recognition model recognizing the at least one term associated with the background audio in the first audio data based on the conceptual bias data.
  - 20. The computer-readable medium of claim 16, wherein identifying the one or more concepts from the audio features of the second audio data comprises:
    - comparing the audio features of the second audio data to known audio segments;
      
      determining that the audio features of the second audio data correspond to audio features of a particular known audio segment; and
      
      identifying the one or more concepts as being related to the particular known audio segment.
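
The final element of each independent claim has the receiving device either display the transcription or perform a task based on it. The claims do not say how the device chooses, so the sketch below is purely illustrative: the `Action` type, the prefix table, and the task names are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Action:
    kind: str    # "display" or "task"
    detail: str  # text to show, or "task_name:argument"


# Hypothetical mapping from spoken command prefixes to device tasks.
TASK_PREFIXES: Dict[str, str] = {
    "call ": "dial",
    "play ": "start_playback",
}


def handle_transcription(text: str) -> Action:
    """Performs a task when the text starts with a known command, otherwise displays it."""
    lowered = text.lower()
    for prefix, task in TASK_PREFIXES.items():
        if lowered.startswith(prefix):
            return Action("task", f"{task}:{text[len(prefix):]}")
    return Action("display", text)
```

Under these assumptions, "Play Harbor Lights" maps to a playback task with the argument "Harbor Lights", and a question such as "who plays Kaine" is simply displayed.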

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google LLC (Alphabet Inc.)
Inventors
Sanders, Jason; Taubman, Gabriel; Lee, John J.
Primary Examiner(s)
Chawan, Vijay B

Application Number

US15/622,341
Time in Patent Office

629 Days
Field of Search

704/235, 704/270, 704/233, 704/247, 704/231, 704/246, 704/250, 704/260, 709/204, 715/758
CPC Class Codes

G06F 16/685   using automatically derived...

G10L 15/08   Speech classification or se...

G10L 15/1815   Semantic context, e.g. disa...

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...

G10L 2015/225   Feedback of the input speech

G10L 21/0208   Noise filtering

G10L 21/0272   Voice signal separating

G10L 25/48   specially adapted for parti...

H04M 2201/40   using speech recognition

H04M 2203/352   In-call/conference informat...

H04M 3/4936   Speech interaction details ...
