Progressive encoding of audio

US 8,509,931 B2
Filed: 09/30/2011
Issued: 08/13/2013
Est. Priority Date: 09/30/2010
Status: Active Grant

Abstract

The present disclosure includes processing a signal to generate a first sub-set of data, transmitting the first sub-set of data for generation of a reconstructed audio signal, the reconstructed audio signal having a fidelity relative to the signal, processing the signal to generate a second sub-set of data and a third sub-set of data, the second sub-set of data defining a second portion of the signal and comprising data that is different than data of the first sub-set of data, and the third sub-set of data defining a third portion of the signal and comprising data that is different than data of the first and second sub-sets of data, comparing a priority of the second sub-set of data to a priority of the third sub-set of data, and transmitting one of the second sub-set of data and the third sub-set of data over the network for improving the fidelity.

16 Citations

25 Claims

1. A system, comprising:
- one or more computers; and
  
  a computer-readable medium coupled to the one or more computers and having instructions stored thereon which, when executed by the one or more computers, cause the one or more computers to perform operations comprising:
  
  retrieving a digital audio signal;
  
  processing the digital audio signal to generate a first sub-set of data, the first sub-set of data defining a first portion of the digital audio signal, the first sub-set of data represented as a first node in a direct acyclic graph;
  
  transmitting the first sub-set of data for generation of a first version of a reconstructed audio signal, the first version of the reconstructed audio signal having a first fidelity relative to the digital audio signal;
  
  receiving a first signal indicating that speech from the first version of the reconstructed audio signal was not recognized;
  
  in response to receiving the first signal, processing the digital audio signal to generate a second sub-set of data and a third sub-set of data, the second sub-set of data defining a second portion of the digital audio signal and comprising data that is different than data of the first sub-set of data, and the third sub-set of data defining a third portion of the digital audio signal and comprising data that is different than data of the first and second sub-sets of data, the second and the third sub-set of data represented as a second and a third node, respectively, in the direct acyclic graph, the graph including edges between the first, the second, and the third nodes based on dependencies between the first, the second, and the third nodes;
  
  comparing a priority of the second sub-set of data to a priority of the third sub-set of data, the comparing including:
  
  identifying a particular node of the second and the third nodes for which each of the remaining nodes of the first, the second, and the third nodes that has an edge pointing to the particular node is previously transmitted;
  
  transmitting, based on the identifying, at least one of the second sub-set of data and the third sub-set of data, wherein at least one of the second sub-set of data and the third sub-set of data is useable to obtain a second version of the reconstructed audio signal having a second fidelity relative to the digital audio signal, the second fidelity greater than the first fidelity;
  
  receiving a second signal indicating that speech from the second version of the reconstructed audio signal was recognized; and
  
  in response to receiving the second signal, ceasing generation of subsequent sub-sets of data based on the digital audio signal.
- Dependent claims (2 to 12):
  - 2. The system of claim 1, wherein the second sub-set of data includes more data than the first sub-set of data.
  - 3. The system of claim 1, wherein transmitting further comprises transmitting only one of the second sub-set of data and the third sub-set of data, wherein the operations further comprise subsequently transmitting the other of the second sub-set of data and the third sub-set of data to obtain a third version of the reconstructed audio signal having a third fidelity relative to the digital audio signal, the third fidelity of the third version of the reconstructed audio signal greater than the second fidelity.
  - 4. The system of claim 1, wherein the third sub-set of data includes more data than each of the second sub-set of data and the first sub-set of data.
  - 5. The system of claim 1, wherein processing the digital audio signal to generate a first sub-set of data comprises:
    - determining an original sampling rate of the digital audio signal; and
      
      down-sampling data of the digital audio signal at a first sampling rate that is less than the original sampling rate to provide the first sub-set of data.
  - 6. The system of claim 5, wherein processing the digital audio signal to generate a second sub-set of data comprises:
    - up-sampling data of the first sub-set of data at the original sampling rate to provide first up-sampled data;
      
      subtracting the first up-sampled data from data of the digital audio signal to provide first residual data; and
      
      down-sampling the first residual data at a second sampling rate that is greater than the first sampling rate and that is less than the original sampling rate to provide the second sub-set of data.
  - 7. The system of claim 6, wherein processing the digital audio signal to generate a third sub-set of data comprises:
    - up-sampling data of the second sub-set of data at the original sampling rate to provide second up-sampled data; and
      
      subtracting the second up-sampled data from the first residual data to provide second residual data, the second residual data defining the third sub-set of data.
  - 8. The system of claim 1, wherein processing the digital audio signal to generate a first sub-set of data comprises:
    - determining a bit-depth of data of the digital audio signal; and
      
      extracting a first bit of each sample of the data of the digital audio signal to provide first extracted data, the first extracted data defining the first sub-set of data and the first bit being determined based on the bit-depth.
  - 9. The system of claim 8, wherein processing the digital audio signal to generate a second sub-set of data comprises extracting a second bit of each sample of the data of the data set to provide second extracted data, the second extracted data defining the second sub-set of data and the second bit being determined based on the bit-depth.
  - 10. The system of claim 1, wherein the first signal further indicates that the fidelity of the first version of the reconstructed audio signal is less than a threshold fidelity.
  - 11. The system of claim 10, wherein the first signal further indicates that the fidelity of the second version of the reconstructed audio signal is greater than the threshold fidelity.
  - 12. The system of claim 1, wherein the operations further comprise compressing the first sub-set of data and the one of the second sub-set of data and the third sub-set of data.
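
The sampling-rate layering recited in claims 5 to 7 can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the decimation factors (4 and 2), naive decimation without an anti-aliasing filter, and sample-and-hold up-sampling. The claims do not specify any of these choices.

```python
from typing import List, Optional, Tuple


def downsample(x: List[int], factor: int) -> List[int]:
    """Keep every `factor`-th sample (naive decimation, no anti-alias filter)."""
    return x[::factor]


def upsample(x: List[int], factor: int, length: int) -> List[int]:
    """Repeat each sample `factor` times (sample-and-hold), trimmed to `length`."""
    return [x[i // factor] for i in range(length)]


def subtract(a: List[int], b: List[int]) -> List[int]:
    return [p - q for p, q in zip(a, b)]


def add(a: List[int], b: List[int]) -> List[int]:
    return [p + q for p, q in zip(a, b)]


def encode(signal: List[int], f1: int = 4, f2: int = 2) -> Tuple[List[int], List[int], List[int]]:
    """Split `signal` into three sub-sets in the manner of claims 5 to 7."""
    n = len(signal)
    first = downsample(signal, f1)                        # claim 5: lowest rate
    residual1 = subtract(signal, upsample(first, f1, n))  # claim 6: first residual
    second = downsample(residual1, f2)                    # claim 6: intermediate rate
    third = subtract(residual1, upsample(second, f2, n))  # claim 7: second residual
    return first, second, third


def decode(n: int, first: List[int], second: Optional[List[int]] = None,
           third: Optional[List[int]] = None, f1: int = 4, f2: int = 2) -> List[int]:
    """Rebuild a version of the signal from whichever sub-sets have arrived."""
    out = upsample(first, f1, n)
    if second is not None:
        out = add(out, upsample(second, f2, n))
    if third is not None:
        out = add(out, third)
    return out


signal = list(range(8))
first, second, third = encode(signal)
v1 = decode(len(signal), first)                 # first fidelity
v2 = decode(len(signal), first, second)         # second fidelity
v3 = decode(len(signal), first, second, third)  # exact reconstruction
```

For this ramp input, the absolute error falls from 12 (first sub-set only) to 4 (first and second) to 0 (all three). With this naive decimation, a monotonic decrease is not guaranteed for arbitrary signals; only the final exact reconstruction is.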

13. A system, comprising:
- one or more computers; and
  
  one or more computer-readable media coupled to the one or more computers and having instructions stored thereon which, when executed by the one or more computers, cause the one or more computers to perform operations comprising:
  
  receiving a first sub-set of data, the first sub-set of data having been generated based on a digital audio signal, the first sub-set of data represented as a first node in a direct acyclic graph;
  
  processing the first sub-set of data to generate a first version of a reconstructed audio signal, the first version of the reconstructed audio signal having a first fidelity relative to the digital audio signal;
  
  determining that speech from the first version of the reconstructed audio signal was not recognized and, in response, transmitting a first signal for transmission of subsequent sub-sets of data;
  
  receiving at least one of a second sub-set of data and a third sub-set of data based on a comparison of a priority of the second sub-set of data to a priority of the third sub-set of data, the second sub-set of data defining a second portion of the digital audio signal and comprising data that is different than data of the first sub-set of data, and the third sub-set of data defining a third portion of the digital audio signal and comprising data that is different than data of the first and second sub-sets of data, the second and the third sub-set of data represented as a second and a third node, respectively, in the direct acyclic graph, the graph including edges between the first, the second, and the third nodes based on dependencies between the first, the second, and the third nodes;
  
  processing the at least one of the second sub-set of data and third sub-set of data, wherein at least one of the second sub-set of data and third sub-set of data is useable to obtain a second version of the reconstructed audio signal having a second fidelity relative to the digital audio signal, the second fidelity greater than the first fidelity; and
  
  determining that speech from the second version of the reconstructed audio signal was recognized and, in response, transmitting a second signal for ceasing generation of subsequent sub-sets of data based on the digital audio signal, wherein receiving at least one of the second sub-set of data and the third sub-set of data includes receiving a particular node of the second and the third nodes for which each of the remaining nodes of the first, the second, and the third nodes that has an edge pointing to the particular node is previously transmitted.
- Dependent claims (14 to 23):
  - 14. The system of claim 13, wherein the second sub-set of data includes more data than the first sub-set of data.
  - 15. The system of claim 13, wherein processing further comprises processing only one of the second sub-set of data and the third sub-set of data, wherein the operations further comprise:
    - receiving the other of the second sub-set of data and the third sub-set of data; and
      
      processing the other of the second sub-set of data and the third sub-set of data to obtain a third version of the reconstruction audio signal having a third fidelity relative to the digital audio signal, the third fidelity of the third version of the reconstructed audio signal greater than the second fidelity.
  - 16. The system of claim 13, wherein the third sub-set of data includes more data than each of the second sub-set of data and the first sub-set of data.
  - 17. The system of claim 13, wherein processing the first sub-set of data comprises up-sampling data of the first data sub-set at an original sampling rate of the data set to provide first up-sampled data, the first version of the reconstructed signal being generated based on the first up-sampled data, and the first data sub-set having been generated using a first sampling rate that is less than the original sampling rate.
  - 18. The system of claim 17, wherein processing one of the second sub-set of data and the third sub-set of data comprises up-sampling data of the one of the second sub-set of data and the third sub-set of data at the original sampling rate to provide second up-sampled data, the second up-sampled data being added to the first version of the reconstructed audio signal to obtain the second version of the reconstructed audio signal, and the one of the second sub-set of data and the third sub-set of data having been generated using a second sampling rate that is less than the original sampling rate and that is greater than the first sampling rate.
  - 19. The system of claim 18, wherein the operations further comprise:
    - up-sampling data of the other of the second sub-set of data and the third sub-set of data at the original sampling rate to provide third up-sampled data; and
      
      adding the third up-sampled data to the second version of the reconstructed audio signal to obtain a third version of the reconstructed audio signal.
  - 20. The system of claim 13, wherein the first sub-set of data is generated by extracting a first bit of each sample of data of the digital audio signal to provide first extracted data, the first extracted data defining the first sub-set of data and the first bit being determined based on a bit-depth.
  - 21. The system of claim 20, wherein the second sub-set of data is generated by extracting a second bit of each sample of data of the digital audio signal to provide second extracted data, the second extracted data defining the second sub-set of data and the second bit being determined based on the bit-depth.
  - 22. The system of claim 13, wherein the first signal further indicates that the fidelity of the first version of the reconstructed audio signal is less than a threshold fidelity and the second signal further indicates that the fidelity of the second version of the reconstructed audio signal is greater than the threshold fidelity.
  - 23. The system of claim 13, wherein the operations further comprise decompressing the first sub-set of data and the one of the second sub-set of data and third sub-set of data.
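
The bit-wise extraction described in claims 20 and 21 (and in claims 8 and 9) might look like the following sketch. It assumes unsigned integer samples and assumes that the "first bit" is the most significant bit for the given bit-depth. The claims say only that the bit is determined based on the bit-depth, so this ordering and the helper names are illustrative.

```python
from typing import Dict, List


def extract_bit_plane(samples: List[int], bit_index: int) -> List[int]:
    """Return bit `bit_index` of every sample as a list of 0/1 values."""
    return [(s >> bit_index) & 1 for s in samples]


def reconstruct(planes: Dict[int, List[int]], length: int) -> List[int]:
    """Combine the received bit planes; planes not yet received count as zero."""
    out = [0] * length
    for bit_index, bits in planes.items():
        for i, bit in enumerate(bits):
            out[i] |= bit << bit_index
    return out


BIT_DEPTH = 8                      # assumed bit-depth of the unsigned samples
samples = [200, 13, 255, 0]

first = extract_bit_plane(samples, BIT_DEPTH - 1)   # most significant bit
second = extract_bit_plane(samples, BIT_DEPTH - 2)  # next most significant bit

# Version available after two sub-sets have been received.
coarse = reconstruct({BIT_DEPTH - 1: first, BIT_DEPTH - 2: second}, len(samples))

# Version available once every bit plane has been received.
all_planes = {b: extract_bit_plane(samples, b) for b in range(BIT_DEPTH)}
exact = reconstruct(all_planes, len(samples))
```

Sending the high-order bits first means each additional plane halves the worst-case per-sample error, which is one plausible reason to order the sub-sets this way.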

24. A method, comprising:
- receiving a first sub-set of data, the first sub-set of data having been generated based on a digital audio signal, the first sub-set of data represented as a first node in a direct acyclic graph;
  
  processing the first sub-set of data to generate a first version of a reconstructed audio signal, the first version of the reconstructed audio signal having a first fidelity relative to the digital audio signal;
  
  determining that speech from the first version of the reconstructed audio signal was not recognized and, in response, transmitting a first signal for transmission of subsequent sub-sets of data;
  
  receiving at least one of a second sub-set of data and a third sub-set of data based on a comparison of a priority of the second sub-set of data to a priority of the third sub-set of data, the second sub-set of data defining a second portion of the digital audio signal and comprising data that is different than data of the first sub-set of data, and the third sub-set of data defining a third portion of the digital audio signal and comprising data that is different than data of the first and second sub-sets of data, the second and the third sub-set of data represented as a second and a third node, respectively, in the direct acyclic graph, the graph including edges between the first, the second, and the third nodes based on dependencies between the first, the second, and the third nodes;
  
  processing the at least one of the second sub-set of data and third sub-set of data, wherein at least one of the second sub-set of data and third sub-set of data is useable to obtain a second version of the reconstructed audio signal having a second fidelity relative to the digital audio signal, the second fidelity greater than the first fidelity; and
  
  determining that speech from the second version of the reconstructed audio signal was recognized and, in response, transmitting a second signal for ceasing generation of subsequent sub-sets of data based on the digital audio signal, wherein receiving at least one of the second sub-set of data and the third sub-set of data includes receiving a particular node of the second and the third nodes for which each of the remaining nodes of the first, the second, and the third nodes that has an edge pointing to the particular node is previously transmitted.
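
The node-selection condition at the end of this claim (a node is eligible only when every node with an edge pointing to it has already been transmitted) can be sketched as below. The dictionary layout, the numeric priorities, and the rule of picking the highest-priority eligible node are assumptions; the claims state only that priorities are compared and that predecessors must have been transmitted.

```python
from typing import Dict, Optional, Set


def next_node(deps: Dict[str, Set[str]],
              priority: Dict[str, int],
              transmitted: Set[str]) -> Optional[str]:
    """Pick the highest-priority untransmitted node whose predecessors were all sent.

    `deps[n]` is the set of nodes that have an edge pointing to node `n`.
    Returns None when nothing is eligible.
    """
    ready = [
        node for node, parents in deps.items()
        if node not in transmitted and parents <= transmitted
    ]
    if not ready:
        return None
    return max(ready, key=lambda node: priority[node])


# Hypothetical graph: "third" depends on both "first" and "second".
deps = {"first": set(), "second": {"first"}, "third": {"first", "second"}}
priority = {"first": 3, "second": 1, "third": 2}

order = []
sent: Set[str] = set()
while (node := next_node(deps, priority, sent)) is not None:
    order.append(node)
    sent.add(node)
```

Here "third" has a higher priority than "second" but is still sent last, because its dependency on "second" is not satisfied until "second" has been transmitted.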

25. One or more non-transitory computer-readable media coupled to one or more computers and having instructions stored thereon which, when executed by the one or more computers, cause the one or more computers to perform operations comprising:
- receiving a first sub-set of data, the first sub-set of data having been generated based on a digital audio signal, the first sub-set of data represented as a first node in a direct acyclic graph;
  
  processing the first sub-set of data to generate a first version of a reconstructed audio signal, the first version of the reconstructed audio signal having a first fidelity relative to the digital audio signal;
  
  determining that speech from the first version of the reconstructed audio signal was not recognized and, in response, transmitting a first signal for transmission of subsequent sub-sets of data;
  
  receiving at least one of a second sub-set of data and a third sub-set of data based on a comparison of a priority of the second sub-set of data to a priority of the third sub-set of data, the second sub-set of data defining a second portion of the digital audio signal and comprising data that is different than data of the first sub-set of data, and the third sub-set of data defining a third portion of the digital audio signal and comprising data that is different than data of the first and second sub-sets of data, the second and the third sub-set of data represented as a second and a third node, respectively, in the direct acyclic graph, the graph including edges between the first, the second, and the third nodes based on dependencies between the first, the second, and the third nodes;
  
  processing the at least one of the second sub-set of data and third sub-set of data, wherein at least one of the second sub-set of data and third sub-set of data is useable to obtain a second version of the reconstructed audio signal having a second fidelity relative to the digital audio signal, the second fidelity greater than the first fidelity; and
  
  determining that speech from the second version of the reconstructed audio signal was recognized and, in response, transmitting a second signal for ceasing generation of subsequent sub-sets of data based on the digital audio signal, wherein receiving at least one of the second sub-set of data and the third sub-set of data includes receiving a particular node of the second and the third nodes for which each of the remaining nodes of the first, the second, and the third nodes that has an edge pointing to the particular node is previously transmitted.
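
The receive, reconstruct, recognize, and signal cycle shared by claims 13, 24, and 25 can be outlined as a small simulation. This is a sketch under loud assumptions: the additive decimal layering in `make_layers`, the template-matching `toy_recognizer`, and all names are hypothetical stand-ins, not the patent's encoder or a real speech recognizer. A Python generator models "ceasing generation of subsequent sub-sets": a layer is produced only when the receiver asks for more.

```python
from typing import Callable, Iterator, List, Optional, Tuple


def make_layers(signal: List[int], log: List[str]) -> Iterator[List[int]]:
    """Lazily yield additive layers of non-negative samples (hypothetical scheme)."""
    log.append("first")
    yield [s - s % 100 for s in signal]          # hundreds only
    log.append("second")
    yield [(s % 100) - (s % 10) for s in signal]  # tens
    log.append("third")
    yield [s % 10 for s in signal]                # units


def receive(layers: Iterator[List[int]],
            recognize: Callable[[List[int]], Optional[str]]) -> Tuple[Optional[str], int]:
    """Consume layers until the recognizer succeeds; return (result, layers used)."""
    version: Optional[List[int]] = None
    used = 0
    for layer in layers:  # each iteration plays the role of the "first signal"
        used += 1
        version = layer if version is None else [a + b for a, b in zip(version, layer)]
        result = recognize(version)
        if result is not None:
            return result, used  # "second signal": stop requesting sub-sets
    return None, used


def toy_recognizer(version: List[int]) -> Optional[str]:
    """Stand-in recognizer: succeeds when every sample is within 10 of a template."""
    template = [120, 340, 560]
    if all(abs(v - t) <= 10 for v, t in zip(version, template)):
        return "hello"
    return None


generated: List[str] = []
result, used = receive(make_layers([123, 345, 567], generated), toy_recognizer)
```

In this run the recognizer fails on the first version and succeeds on the second, so the third layer is never generated.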

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Lloyd, Matthew I.; Jansche, Martin
Primary Examiner(s): MCCORD, PAUL C
Application Number: US13/250,576
Publication Number: US 20120084089A1
Time in Patent Office: 683 Days
Field of Search: 700/94
US Class Current: 700/94
CPC Class Codes:
G10L 15/18   using natural language mode...
G10L 15/20   Speech recognition techniqu...
G10L 2015/223   Execution procedure of a sp...
