Audio frame labeling to achieve unequal error protection for audio frames of unequal importance

US 10,354,660 B2
Filed: 04/28/2017
Issued: 07/16/2019
Est. Priority Date: 04/28/2017
Status: Active Grant

Abstract

An endpoint device receives a sequence of audio frames. The endpoint device determines for each audio frame a respective importance level among possible importance levels ranging from a low importance level to a high importance level based on content in the audio frame indicative of the respective importance level. The endpoint device associates each audio frame with the respective importance level, to produce different subsets of audio frames associated with respective ones of different importance levels. The endpoint device, for each subset of audio frames, applies forward error correction to a fraction of audio frames in the subset of audio frames, wherein the fraction increases as the importance level of the audio frames in the subset increases, and does not apply forward error correction to remaining audio frames in the subset.
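
As an illustrative sketch of the flow the abstract describes (not the patented implementation), the Python below groups frames by importance level and marks a fraction of each group for forward error correction. The fraction values in `FEC_FRACTION`, the integer importance levels, and the rule of protecting the first `round(fraction * n)` frames of each subset are assumptions chosen for illustration; the source specifies only that the fraction increases with importance and that the remaining frames receive no forward error correction.

```python
from collections import defaultdict

# Hypothetical mapping from importance level to the fraction of frames
# in that subset that receive FEC. The values are illustrative only; the
# source requires just that the fraction grows with importance.
FEC_FRACTION = {0: 0.0, 1: 0.25, 2: 0.5, 3: 1.0}


def split_for_fec(frames):
    """Partition (frame_id, importance) pairs into FEC and non-FEC lists."""
    # Build one subset of frame ids per importance level.
    subsets = defaultdict(list)
    for frame_id, importance in frames:
        subsets[importance].append(frame_id)

    protected, unprotected = [], []
    for importance, ids in sorted(subsets.items()):
        # Assumption: protect the first k frames of each subset.
        k = round(FEC_FRACTION[importance] * len(ids))
        protected.extend(ids[:k])
        unprotected.extend(ids[k:])
    return protected, unprotected


# Example input: (frame_id, importance) pairs in arrival order.
frames = [(0, 0), (1, 3), (2, 1), (3, 1), (4, 3), (5, 2), (6, 2), (7, 1), (8, 1)]
protected, unprotected = split_for_fec(frames)
print("FEC applied to:", protected)
print("sent without FEC:", unprotected)
```

In this example both frames at the highest level are protected, one of the two frames at level 2, one of the four frames at level 1, and none at level 0.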

20 Claims

1. A method comprising:
- receiving a sequence of audio frames;
  
  labeling each audio frame with a respective one of multiple possible labels for silence, noise, concealable speech, and un-concealable speech based on content in the audio frame;
  
  determining for each audio frame based on the respective label of the audio frame a respective importance level among possible importance levels ranging from a low importance level to a high importance level;
  
  producing different subsets of audio frames such that all of the audio frames in each subset have the same label;
  
  applying forward error correction to a respective fraction of audio frames of each subset of audio frames, such that the respective fraction increases in an order of the labels silence, noise, concealable speech and un-concealable speech, wherein the applying forward error correction further includes applying to the audio frames labeled as un-concealable speech first forward error correction that uses a sequential recovery code, and applying to audio frames not labeled as un-concealable speech second forward error correction that uses a code having a longer delay than the sequential recovery code; and
  
  not applying forward error correction to remaining audio frames in each subset.
- - 2. The method of claim 1, the method further comprising:
    - storing a mapping of each label and importance level among the possible importance levels to a corresponding fraction of audio frames that, when associated with that importance level, are to receive forward error correction, wherein the mapping is such that the fraction of audio frames that are to receive forward error correction increases as the importance level increases, wherein, for each subset of audio frames, the applying the forward error correction includes applying the forward error correction to the fraction of the audio frames in the subset as indicated by the mapping.
  - 3. The method of claim 1, wherein the determining for each audio frame the respective importance level includes:
    - determining, based on the content in the audio frame, an ease with which a loss of the audio frame is able to be concealed from human hearing using error concealment; and
      
      assigning the respective importance level to the audio frame such that the respective importance level increases as the ease with which the loss of the audio frame is able to be concealed from human hearing decreases.
  - 4. The method of claim 3, wherein:
    - the determining the ease with which the loss of the audio frame is able to be concealed includes determining whether the audio frame includes either silence/noise or speech, wherein silence/noise is deemed to be more easily concealed than speech; and
      
      the assigning includes assigning either a lower importance level or a higher importance level that is greater than the lower importance level to the audio frame when the audio frame is determined to include either silence/noise or speech, respectively.
  - 5. The method of claim 1, wherein the determining for each audio frame a respective importance level includes:
    - determining one or more speech phonemes in the audio frame when the audio frame includes speech; and
      
      assigning the respective importance level such that the respective importance level increases as an importance of the one or more speech phonemes to the human intelligibility of speech increases.
  - 6. The method of claim 1, further comprising:
    - storing information that maps each of the possible importance levels to corresponding ones of the fractions of audio frames that are to receive the forward error correction, wherein each fraction is in a range from 0 to 1.
  - 7. The method of claim 1, further comprising:
    - for each subset of audio frames, transmitting the remaining audio frames that did not receive forward error correction, and transmitting encoded audio frames produced as a result of the applying the forward error correction.
  - 8. The method of claim 7, further comprising:
    - monitoring available transmission bandwidth for the transmitting; and
      
      increasing or decreasing the fractions of audio frames that are to receive error correction as the monitoring indicates corresponding increasing and decreasing available transmission bandwidth, respectively.
  - 9. The method of claim 1, further comprising:
    - selecting different types of forward error correction based on different importance levels, wherein the applying forward error correction includes applying the selected type of forward error correction to the audio frames.
  - 10. The method of claim 1, wherein the second forward error correction that uses the code having a larger delay than the sequential recovery code includes forward error correction that uses a random linear code (RLC) or a Maximum Distance Separable (MDS) code.
  - 11. The method of claim 1, wherein the applying further includes generating forward error correction packets from the subsets, each forward error correction packet based on audio frames from different ones of the subsets of the audio frames.
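
The label-to-fraction mapping of claims 2 and 6, the bandwidth adaptation of claim 8, and the choice of code per label in claims 1 and 10 might be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the base fractions, the linear scaling rule in `scale_fractions`, the `bandwidth_ratio` parameter, and the string code names are all hypothetical.

```python
# Labels in the order of increasing protection named in claim 1.
LABELS = ["silence", "noise", "concealable_speech", "unconcealable_speech"]

# Hypothetical base fractions; the claims require only values in [0, 1]
# that increase in the order of LABELS.
BASE_FRACTION = dict(zip(LABELS, [0.0, 0.1, 0.4, 0.8]))


def scale_fractions(base, bandwidth_ratio):
    """Scale fractions with available bandwidth (assumed linear rule).

    bandwidth_ratio > 1 means more bandwidth than the reference, < 1 less.
    Results are clamped to [0, 1]. Scaling by a non-negative factor and
    clamping keep the fractions in the same order, although clamping can
    produce ties at 0 or 1.
    """
    return {label: min(1.0, max(0.0, f * bandwidth_ratio))
            for label, f in base.items()}


def select_code(label):
    """Pick the FEC type per claim 1.

    Un-concealable speech gets a sequential recovery code; every other
    label gets a code with a longer delay (claim 10 names RLC or MDS
    codes as examples). The returned strings are placeholder names.
    """
    if label == "unconcealable_speech":
        return "sequential_recovery"
    return "longer_delay_rlc_or_mds"


more = scale_fractions(BASE_FRACTION, 1.5)  # bandwidth increased
less = scale_fractions(BASE_FRACTION, 0.5)  # bandwidth decreased
for label in LABELS:
    print(label, select_code(label), more[label], less[label])
```

A linear rule is only one option; any adjustment that raises the fractions when bandwidth grows and lowers them when it shrinks would fit the wording of claim 8.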

12. An apparatus comprising:
- a network interface unit configured to enable communications over a communication network; and
  
  a processor coupled to the network interface unit and configured to:
  
  receive a sequence of audio frames;
  
  label each audio frame with a respective one of multiple possible labels for silence, noise, concealable speech, and un-concealable speech based on content in the audio frame;
  
  determine for each audio frame based on the respective label of the audio frame a respective importance level among possible importance levels ranging from a low importance level to a high importance level;
  
  produce different subsets of audio frames such that all of the audio frames in each subset have the same label;
  
  apply forward error correction to a respective fraction of audio frames of each subset of audio frames, such that the respective fraction increases in an order of the labels silence, noise, concealable speech and un-concealable speech, wherein the processor is configured to apply the forward error correction by applying to the audio frames labeled as un-concealable speech first forward error correction that uses a sequential recovery code, and applying to audio frames not labeled as un-concealable speech second forward error correction that uses a code having a longer delay than the sequential recovery code; and
  
  not apply forward error correction to remaining audio frames in the subset.
- - 13. The apparatus of claim 12, wherein the processor is further configured to:
    - store a mapping of each label and importance level among the possible importance levels to a corresponding fraction of audio frames that, when associated with that importance level, are to receive forward error correction, wherein the processor is configured to map such that the fraction of audio frames that are to receive forward error correction increases as the importance level increases, wherein, for each subset of audio frames, the processor is configured to apply the forward error correction by applying the forward error correction to the fraction of the audio frames in the subset as indicated by the mapping.
  - 14. The apparatus of claim 12, wherein the processor is configured to determine for each audio frame the respective importance level by:
    - determining, based on the content in the audio frame, an ease with which a loss of the audio frame is able to be concealed from human hearing using error concealment; and
      
      assigning the respective importance level to the audio frame such that the respective importance level increases as the ease with which the loss of the audio frame is able to be concealed from human hearing decreases.
  - 15. The apparatus of claim 14, wherein:
    - the processor is configured to perform the determining the ease with which the loss of the audio frame is able to be concealed by determining whether the audio frame includes either silence/noise or speech, wherein silence/noise is deemed to be more easily concealed than speech; and
      
      the processor is configured to perform the assigning by assigning either a lower importance level or a higher importance level that is greater than the lower importance level to the audio frame when the audio frame is determined to include either silence/noise or speech, respectively.
  - 16. The apparatus of claim 12, wherein the processor is configured to determine for each audio frame a respective importance level by:
    - determining one or more speech phonemes in the audio frame when the audio frame includes speech; and
      
      assigning the respective importance level such that the respective importance level increases as an importance of the one or more speech phonemes to the human intelligibility of speech increases.
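
One way to picture the labeling step and the concealability-based importance of claims 14 and 15 is the toy classifier below. It is a sketch under loud assumptions, not the method disclosed in the patent: the RMS thresholds, the use of normalized correlation with the previous frame as a stand-in for "ease of concealment", and all function and constant names are hypothetical. The idea behind the correlation test is that a lost frame resembling its predecessor could plausibly be concealed by repeating that predecessor.

```python
import math

# Importance rises as the loss becomes harder to conceal (claim 14).
IMPORTANCE = {"silence": 0, "noise": 1,
              "concealable_speech": 2, "unconcealable_speech": 3}


def rms(frame):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def label_frame(frame, prev_frame, silence_rms=0.01, speech_rms=0.1,
                similarity=0.9):
    """Label one frame using assumed energy and similarity thresholds."""
    energy = rms(frame)
    if energy < silence_rms:
        return "silence"
    if energy < speech_rms:
        return "noise"
    if prev_frame is not None:
        # Normalized correlation with the previous frame (assumed proxy).
        num = sum(a * b for a, b in zip(frame, prev_frame))
        den = math.sqrt(sum(a * a for a in frame) *
                        sum(b * b for b in prev_frame))
        if den > 0 and num / den >= similarity:
            return "concealable_speech"
    # No usable predecessor, or the frame differs too much from it.
    return "unconcealable_speech"


quiet = [0.0] * 160
hiss = [0.05, -0.05] * 80
voiced = [0.5 * math.sin(2 * math.pi * i / 40) for i in range(160)]
onset = [0.5 * math.sin(2 * math.pi * i / 7) for i in range(160)]

labels = [label_frame(quiet, None), label_frame(hiss, quiet),
          label_frame(voiced, voiced), label_frame(onset, voiced)]
print(labels, [IMPORTANCE[label] for label in labels])
```

A production system would more likely rely on a voice activity detector or a phoneme classifier, as claim 16 suggests; the thresholds here exist only to make the example run.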

17. A non-transitory processor readable medium encoded with instructions that, when executed by a processor, cause the processor to perform operations including:
- receiving a sequence of audio frames;
  
  labeling each audio frame with a respective one of multiple possible labels for silence, noise, concealable speech, and un-concealable speech based on content in the audio frame;
  
  determining for each audio frame based on the respective label of the audio frame a respective importance level among possible importance levels ranging from a low importance level to a high importance level;
  
  producing different subsets of audio frames such that all of the audio frames in each subset have the same label;
  
  applying forward error correction to a respective fraction of audio frames of each subset of audio frames, such that the respective fraction increases in an order of the labels silence, noise, concealable speech and un-concealable speech, wherein the applying forward error correction further includes applying to the audio frames labeled as un-concealable speech first forward error correction that uses a sequential recovery code, and applying to audio frames not labeled as un-concealable speech second forward error correction that uses a code having a longer delay than the sequential recovery code; and
  
  not applying forward error correction to remaining audio frames in the subset.
- - 18. The non-transitory processor readable medium of claim 17, further comprising instructions to cause the processor to perform:
    - storing a mapping of each label and importance level among the possible importance levels to a corresponding fraction of audio frames that, when associated with that importance level, are to receive forward error correction, wherein the mapping is such that the fraction of audio frames that are to receive forward error correction increases as the importance level increases, wherein the instructions to cause the processor to perform the applying further comprise instructions to cause the processor to perform, for each subset of audio frames, applying the forward error correction to the fraction of the audio frames in the subset as indicated by the mapping.
  - 19. The non-transitory processor readable medium of claim 17, wherein the instructions to cause the processor to perform the determining for each audio frame the respective importance level include instructions to cause the processor to perform:
    - determining, based on the content in the audio frame, an ease with which a loss of the audio frame is able to be concealed from human hearing using error concealment; and
      
      assigning the respective importance level to the audio frame such that the respective importance level increases as the ease with which the loss of the audio frame is able to be concealed from human hearing decreases.
  - 20. The non-transitory processor readable medium of claim 19, wherein:
    - the instructions to cause the processor to perform determining the ease with which the loss of the audio frame is able to be concealed include instructions to cause the processor to perform determining whether the audio frame includes either silence/noise or speech, wherein silence/noise is deemed to be more easily concealed than speech; and
      
      the instructions to cause the processor to perform assigning include instructions to cause the processor to perform assigning either a lower importance level or a higher importance level that is greater than the lower importance level to the audio frame when the audio frame is determined to include either silence/noise or speech, respectively.

Current Assignee: Cisco Technology, Inc. (Cisco Systems, Inc.)
Original Assignee: Cisco Technology, Inc. (Cisco Systems, Inc.)
Inventors: Badr, Ahmed; Khisti, Ashish J.; Tan, Wai-tian; Ramalho, Michael A.; Apostolopoulos, John G.
Primary Examiner(s): Blankenagel, Bryan S
Application Number: US15/581,104
Publication Number: US 20180315431A1
Time in Patent Office: 809 Days
CPC Class Codes

G10L 15/08   Speech classification or se...

G10L 19/005   Correction of errors induce...

G10L 2015/025   Phonemes, fenemes or fenone...

G10L 25/84   for discriminating voice fr...

G10L 25/93   Discriminating between voic...

H04L 1/004   by using forward error cont...

H04L 1/007   Unequal error protection fo...

H04L 2001/0098   Unequal error protection
