Audio signal transmission techniques

US 9,111,542 B1
Filed: 03/26/2012
Issued: 08/18/2015
Est. Priority Date: 03/26/2012
Status: Expired due to Fees

Abstract

A voice interaction architecture that compiles multiple audio signals captured at different locations within an environment, determines a time offset between a primary audio signal and the other captured audio signals, and identifies differences between the primary signal and the other signal(s). Thereafter, the architecture may provide the primary audio signal, an indication of the determined time offset(s), and the identified differences to remote computing resources for further processing. For instance, the architecture may send this information to a network-accessible distributed computing platform that performs beamforming and/or automatic speech recognition (ASR) on the received audio. The distributed computing platform may in turn determine a response to provide based upon the beamforming and/or ASR.
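As a rough illustration of what the abstract describes sending upstream, the sketch below bundles a primary signal with a time offset and difference data for each additional microphone. The class names, field names, and the JSON encoding are assumptions made for this example; the patent record does not specify a wire format.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class SecondaryChannel:
    """Data sent for one non-primary microphone (hypothetical layout)."""
    time_offset_samples: int
    # Segment index -> residual samples; empty when nothing differs.
    differences: dict = field(default_factory=dict)


@dataclass
class AudioPayload:
    """One upload to the remote computing resources (hypothetical layout)."""
    sample_rate_hz: int
    primary: list
    secondary: list = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses, lists, and dicts.
        return json.dumps(asdict(self))


payload = AudioPayload(
    sample_rate_hz=16000,
    primary=[0.0, 1.0, 3.0, -2.0],
    secondary=[SecondaryChannel(time_offset_samples=2,
                                differences={0: [0.0, 0.75]})],
)
encoded = payload.to_json()
# JSON object keys are strings, so segment index 0 arrives as "0".
decoded = json.loads(encoded)
```

Only the primary channel is carried in full here; each further channel costs one integer plus whatever residual segments are non-empty.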

30 Claims

1. A method comprising:
- under control of one or more computing devices configured with executable instructions, obtaining a first audio signal from a first audio transducer and a second audio signal from a second audio transducer, wherein the first audio transducer and the second audio transducer are included in an audio transducer array located within an environment;
  
  determining a time offset between the first audio signal and the second audio signal based at least in part on a time-difference-of-arrival between the first audio signal and the second audio signal;
  
  identifying, based at least in part on the time offset, at least one difference between at least a portion of the first audio signal and at least a portion of the second audio signal; and
  
  sending, to one or more computing resources that are remote from the environment, the first audio signal, information indicative of the at least one difference between the first audio signal and the second audio signal, and an indication of the time offset between the first audio signal and the second audio signal, the one or more computing resources that are remote from the environment being configured to:
  
  (i) construct a representation of the second audio signal using the first audio signal, the information indicative of the at least one difference between the first audio signal and the second audio signal, and the indication of the time offset between the first audio signal and the second audio signal, and (ii) perform automatic speech recognition (ASR) to identify a voice command present in at least one of the first audio signal or the representation of the second audio signal.
  - 2. A method as recited in claim 1, further comprising:
    - obtaining a third audio signal from a third audio transducer within the environment;
      
      determining a time offset between the third audio signal and the first audio signal; and
      
      identifying, based at least in part on the time offset between the third audio signal and the first audio signal, at least one difference between at least a portion of the first audio signal and the third audio signal; and
      
      wherein sending further comprises sending information indicative of the at least one difference between the first audio signal and the third audio signal, and the determined time offset between the first audio signal and the third audio signal.
  - 3. A method as recited in claim 1, wherein:
    - the first audio signal includes a first set of segments and the second audio signal comprises a second set of segments;
      
      sending of the first audio signal includes sending the first set of segments; and
      
      sending information indicative of the at least one difference between the first audio signal and the second audio signal includes sending at least one segment of the second set of segments.
  - 4. A method as recited in claim 3, wherein the at least one segment of the second set of segments includes an audio signal that is different than audio signals of the first set of segments.
  - 5. A method as recited in claim 1, wherein:
    - the first audio signal includes a first set of segments and the second audio signal includes a second set of segments;
      
      sending of the first audio signal includes sending the first set of segments; and
      
      sending information indicative of the at least one difference between the first audio signal and the second audio signal includes sending information indicative of differences between corresponding segments of the first audio signal and the second audio signal.
  - 6. A method as recited in claim 1, wherein the time-difference-of-arrival between the first audio signal and the second audio signal is determined using cross-correlation.
  - 7. A method as recited in claim 1, wherein the one or more computing resources that are remote from the environment form a portion of a network-accessible computing platform.
  - 8. A method as recited in claim 1, wherein the one or more computing resources that are remote from the environment form a portion of a distributed computing platform.
  - 9. A method as recited in claim 1, further comprising compressing at least one of the first audio signal and the information indicative of the at least one difference between the first audio signal and the second audio signal prior to the sending.
  - 10. A method as recited in claim 1, further comprising removing from the first audio signal and at least a portion of the second audio signal an audio frequency that is included in the first audio signal and the second audio signal prior to the sending.
  - 11. A method as recited in claim 10, wherein the audio frequency is not perceptible to a human ear.
  - 12. A method as recited in claim 10, wherein the audio frequency is constant.
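Claim 6 names cross-correlation as one way to determine the time-difference-of-arrival. The following is a minimal sketch of that idea in plain Python, assuming both signals are equal-rate sample lists; the function name `estimate_offset` and the `max_lag` search window are illustrative choices, not terms from the patent.

```python
def estimate_offset(first, second, max_lag):
    """Return the lag (in samples) at which `second` best matches `first`.

    A positive result means `second` is a delayed copy of `first`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation at this lag: sum of first[n] * second[n + lag].
        score = 0.0
        for n, sample in enumerate(first):
            m = n + lag
            if 0 <= m < len(second):
                score += sample * second[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


first = [0.0, 0.0, 1.0, 3.0, -2.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
second = [0.0] * 3 + first[:-3]  # same waveform, arriving 3 samples later
offset = estimate_offset(first, second, max_lag=5)
```

The brute-force search costs time proportional to the signal length times the window size. A practical system would more likely use an FFT-based correlation, but the peak-picking logic is the same.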

13. One or more non-transitory computer-readable media storing computer-executable instructions that, when executed on one or more processors, cause the one or more processors to perform acts comprising:
- receiving during a time interval a first signal based on audio within an environment;
  
  receiving during the time interval a second signal based on the audio within the environment;
  
  identifying difference information between the first signal and the second signal; and
  
  sending the first signal and the difference information to one or more computing resources that are remote from the environment, the one or more computing resources that are remote from the environment configured to:
  
  (i) construct a representation of the second signal using the first signal and the difference information, and (ii) perform automatic speech recognition (ASR) using at least one of the first signal or the representation of the second signal to identify a voice command uttered within the environment.
  - 14. One or more non-transitory computer-readable media as recited in claim 13, wherein the first signal represents audio captured by a first audio transducer at a first location in the environment and the second signal represents the audio captured by a second audio transducer at a second location in the environment.
  - 15. One or more non-transitory computer-readable media as recited in claim 13, wherein the difference information includes at least a portion of the second signal.
  - 16. One or more non-transitory computer-readable media as recited in claim 13, wherein the difference information identifies that there are no differences between the first signal and the second signal that exceed a threshold.
  - 17. One or more non-transitory computer-readable media as recited in claim 13, wherein the difference information includes information indicative of a difference between the first signal and the second signal that exceeds a threshold.
  - 18. One or more non-transitory computer-readable media as recited in claim 13, the acts further comprising determining a time offset between the first signal and the second signal.
  - 19. One or more non-transitory computer-readable media as recited in claim 18, wherein the identifying comprises:
    - aligning segments of the first signal to corresponding segments of the second signal based at least in part on the determined time offset; and
      
      comparing the aligned segments to identify the difference information of the first signal and the second signal.
  - 20. One or more non-transitory computer-readable media as recited in claim 18, wherein sending further includes sending the determined time offset to the one or more computing resources that are remote from the environment.
  - 21. One or more non-transitory computer-readable media as recited in claim 13, the acts further comprising:
    - receiving during the first time interval a third signal based on the audio in the environment; and
      
      identifying difference information between the third signal and the first signal.
  - 22. One or more non-transitory computer-readable media as recited in claim 21, wherein sending further includes sending the identified difference information between the third signal and the first signal.
  - 23. One or more non-transitory computer-readable media as recited in claim 21, the acts further comprising:
    - determining a time offset between the first signal and the third signal; and
      
      wherein the sending further includes sending:
      
      the identified difference information between the third signal and the first signal; and
      
      the determined time offset between the first signal and the third signal.
  - 24. One or more non-transitory computer-readable media as recited in claim 13, the acts further comprising:
    - receiving during the first time interval a third signal based on the audio in the environment; and
      
      identifying difference information between the third signal and the second signal.
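Claims 16, 17, and 19 together describe aligning segments by the time offset, comparing them, and reporting differences relative to a threshold. One possible reading is sketched below. The residual encoding (aligned second signal minus first signal), the fixed segment length, and the name `difference_info` are assumptions for illustration only.

```python
def difference_info(first, second, offset, seg_len, threshold):
    """Align `second` to `first` using `offset`, then compare by segment.

    Returns {segment_index: residual_samples} for segments whose largest
    absolute residual exceeds `threshold`. An empty dict signals that no
    difference exceeded the threshold.
    """
    # Undo the delay: aligned[n] is the sample of `second` matching first[n].
    # Samples shifted past the end of `second` are treated as silence.
    aligned = [second[n + offset] if 0 <= n + offset < len(second) else 0.0
               for n in range(len(first))]
    diffs = {}
    for start in range(0, len(first), seg_len):
        residual = [a - f for a, f in zip(aligned[start:start + seg_len],
                                          first[start:start + seg_len])]
        if max(abs(r) for r in residual) > threshold:
            diffs[start // seg_len] = residual
    return diffs


first = [0.0, 1.0, 3.0, -2.0, 0.5, 0.0, 0.0, 0.0]
clean_second = [0.0, 0.0] + first[:-2]  # delayed by two samples
second = list(clean_second)
second[5] += 0.75  # something only the second microphone picked up
diffs = difference_info(first, second, offset=2, seg_len=4, threshold=0.1)
```

With these inputs only segment 0 is reported, and passing `clean_second` instead yields an empty dict, which corresponds to the "no differences exceed a threshold" case of claim 16.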

25. One or more computing resources comprising:
- one or more processors; and
  
  one or more computer-readable media storing computer-executable instructions that, when executed on the one or more processors, cause the one or more processors to perform acts comprising:
  
  receiving a first signal based at least in part on audio captured by a first audio transducer of an audio transducer array located within an environment;
  
  receiving information indicative of a difference between the first signal and a second signal based at least in part on the audio as captured by a second audio transducer of the audio transducer array;
  
  constructing a representation of the second signal based at least in part on the received first signal and the received information indicative of the difference between the first signal and the second signal; and
  
  performing, using at least one of the first signal or the constructed representation of the second signal, automatic speech recognition (ASR) to identify a voice command uttered within the environment.
  - 26. One or more computing resources as recited in claim 25, wherein receiving information indicative of a difference includes receiving individual segments of the second signal.
  - 27. One or more computing devices as recited in claim 25, the acts further comprising receiving an indication of a time offset between the first signal and the second signal, and wherein constructing a representation of the second signal is based at least in part on the indication of the time offset between the first signal and the second signal.
  - 28. One or more computing devices as recited in claim 25, the acts further comprising:
    - receiving information indicative of a difference between the first signal and a third signal, the third signal representing the audio captured by a third audio transducer at a third location within the environment;
      
      receiving an indication of a time offset between first signal and the third signal;
      
      constructing a representation of the third audio signal based at least in part on:
      
      the information indicative of the difference between the first signal and the third signal; and
      
      the time offset between the first signal and the third signal.
  - 29. One or more computing devices as recited in claim 25, the acts further comprising:
    - determining, based at least in part on the received first signal and the constructed representation of the second signal, a response to provide to one or more computing devices at the environment; and
      
      providing data indicating the response to the one or more computing devices at the environment.
  - 30. One or more computing devices as recited in claim 25, the acts further comprising:
    - receiving information indicative of a difference between the second signal and a third signal, the third signal representing the audio captured by a third audio transducer at a third location within the environment;
      
      receiving an indication of a time offset between second signal and the third signal;
      
      constructing a representation of the third audio signal based at least in part on:
      
      the constructed representation of the second signal;
      
      the information indicative of the difference between the second signal and the third signal; and
      
      the time offset between the second signal and the third signal.
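On the receiving side, claims 25 and 27 describe constructing a representation of the second signal from the first signal, the difference information, and the time offset. The sketch below shows one way that could work, under assumed conventions that are not taken from the patent: `offset` is the delay of the second signal in samples, and `diffs` maps a segment index to residuals (aligned second signal minus first signal).

```python
def reconstruct_second(first, offset, diffs, seg_len):
    """Rebuild an approximation of the second signal at the remote side."""
    # Start from the first signal and add back the transmitted residuals.
    aligned = list(first)
    for index, residual in diffs.items():
        start = index * seg_len
        for i, r in enumerate(residual):
            aligned[start + i] += r
    # Re-apply the delay so each sample sits where the second microphone
    # heard it; positions with no source sample are filled with silence.
    n = len(aligned)
    return [aligned[m - offset] if 0 <= m - offset < n else 0.0
            for m in range(n)]


first = [0.0, 1.0, 3.0, -2.0, 0.5, 0.0, 0.0, 0.0]
rebuilt = reconstruct_second(first, offset=2,
                             diffs={0: [0.0, 0.0, 0.0, 0.75]}, seg_len=4)
```

The chained case in claim 30 could reuse the same function by passing the rebuilt second signal as the base for a third signal, together with the offset and differences measured between the second and third signals.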

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Hart, Gregory M.; Bezos, Jeffrey P.
Primary Examiner(s): COLUCCI, MICHAEL C
Application Number: US13/430,407
Time in Patent Office: 1,240 Days
Field of Search: 704/275, 704/201, 704/215, 704/224, 704/226, 704/231, 704/233, 704/234, 704/250, 704/251, 704/254, 704/270, 704/500, 375/285, 381/56, 381/66, 381/92, 381/94.7, 455/570
US Class Current: 1/1
CPC Class Codes

G10L 15/20   Speech recognition techniqu...

G10L 15/30   Distributed recognition, e....

G10L 19/00   Speech or audio signals ana...

G10L 2021/02082   the noise being echo, rever...

G10L 2021/02166   Microphone arrays; Beamforming

G10L 21/00   Speech or voice signal proc...

G10L 21/0216   characterised by the method...

G10L 21/0272   Voice signal separating
