Ratio of Speech to Non-Speech Audio such as for Elderly or Hearing-Impaired Listeners

US 20100106507A1
Filed: 02/12/2008
Published: 04/29/2010
Est. Priority Date: 02/12/2007
Status: Active Grant

Abstract

The invention relates to audio signal processing and speech enhancement. In accordance with one aspect, the invention combines a high-quality audio program that is a mix of speech and non-speech audio with a lower-quality copy of the speech components contained in the audio program for the purpose of generating a high-quality audio program with an increased ratio of speech to non-speech audio such as may benefit the elderly, hearing impaired or other listeners. Aspects of the invention are particularly useful for television and home theater sound, although they may be applicable to other audio and sound applications. The invention relates to methods, apparatus for performing such methods, and to software stored on a computer-readable medium for causing a computer to perform such methods.

37 Claims

1. A method for enhancing speech portions of an audio program having speech and non-speech components, comprising receiving the audio program having speech and non-speech components, the audio program having a high quality such that when reproduced in isolation the program does not have audible artifacts that listeners would deem objectionable, receiving a copy of speech components of the audio program, the copy having a low quality such that when reproduced in isolation the copy has audible artifacts that listeners would deem objectionable, and combining the low-quality copy of speech components and the high-quality audio program in such proportions that the ratio of speech to non-speech components in the resulting audio program is increased and the audible artifacts of the low-quality copy of speech components are masked by the high-quality audio program.
- View Dependent Claims (3, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 32, 33, 34, 35, 36, 37)
  - 3. A method according to claim 1 or claim 2 wherein the proportions of combining the copy of speech components and the audio program are such that the speech components in the resulting audio program have substantially the same dynamic characteristics as the corresponding speech components in the audio program and the non-speech components in the resulting audio program have a compressed dynamic range relative to the corresponding non-speech components in the audio program.
  - 6. A method according to claim 3 wherein the level of speech components in the resulting audio program is substantially the same as the level of the corresponding speech components in the audio program.
  - 7. A method according to claim 6 wherein the level of non-speech components in the resulting audio program increases more slowly than the level of non-speech components in the audio program increases.
  - 8. A method according to claim 1 or claim 2 wherein the combining is in accordance with complementary scale factors applied, respectively, to the copy of speech components and to the audio program.
  - 9. A method according to claim 1 or claim 2 wherein the combining is an additive combination of the copy of speech components and the audio program in which the copy of speech components is scaled with a scale factor α and the audio program is scaled with the complementary scale factor (1-α), α having a range of 0 to 1.
  - 10. A method according to claim 9 wherein α is a function of the level of non-speech components of the audio program.
  - 11. A method according to claim 9 wherein α has a fixed maximum value α_max.
  - 12. A method according to claim 9 wherein α has a dynamic maximum value α_max.
  - 13. A method according to claim 12 wherein the value α_max is based on a prediction of auditory masking caused by the main audio program.
  - 14. A method according to claim 12 further comprising receiving α_max.
  - 15. A method according to claim 1 or claim 2 wherein the proportions of combining the copy of speech components and the audio program are such that the speech components in the resulting audio program have a compressed dynamic range relative to the corresponding speech components in the audio program and the non-speech components in the resulting audio program have substantially the same dynamic characteristics as the corresponding non-speech components in the audio program.
  - 32. Apparatus adapted to perform the methods of any one of claims 1, 2, 26, 28 and 30.
  - 33. A computer program, stored on a computer-readable medium for causing a computer to perform the methods of any one of claims 1, 2, 26, 28 and 30.
  - 34. A method according to claim 10 wherein α has a fixed maximum value α_max.
  - 35. A method according to claim 10 wherein α has a dynamic maximum value α_max.
  - 36. A method according to claim 35 wherein the value α_max is based on a prediction of auditory masking caused by the main audio program.
  - 37. A method according to claim 36 further comprising receiving α_max.
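
Claims 9 to 12 describe the combination as a mix with complementary scale factors. The Python sketch below illustrates that reading; it is not taken from the specification. The function names (`mix_speech`, `alpha_from_nonspeech_level`, `speech_gain_db`), the representation of audio as lists of samples, the default levels, and the linear ramp that maps non-speech level to α are all assumptions made for illustration. The claims do not state which mapping claim 10 uses.

```python
import math


def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def mix_speech(program, speech_copy, alpha, alpha_max=1.0):
    """Return alpha * speech_copy + (1 - alpha) * program, sample by sample.

    alpha is limited to [0, alpha_max] and alpha_max to [0, 1],
    following the ranges recited in claims 9, 11 and 12.
    """
    if len(program) != len(speech_copy):
        raise ValueError("program and speech_copy must have the same length")
    a = clamp(alpha, 0.0, clamp(alpha_max, 0.0, 1.0))
    return [a * s + (1.0 - a) * p for p, s in zip(program, speech_copy)]


def alpha_from_nonspeech_level(level_db, onset_db=-40.0, full_db=-10.0,
                               alpha_max=0.8):
    """Hypothetical mapping for claim 10: alpha grows with non-speech level.

    Assumption: a linear ramp from 0 at onset_db up to alpha_max at full_db.
    """
    if full_db <= onset_db:
        raise ValueError("full_db must be greater than onset_db")
    ramp = (level_db - onset_db) / (full_db - onset_db)
    return alpha_max * clamp(ramp, 0.0, 1.0)


def speech_gain_db(alpha):
    """Rise in speech-to-non-speech ratio for an idealized speech copy.

    If the copy equals the speech already in the program, speech passes at
    alpha + (1 - alpha) = 1 while non-speech is scaled by (1 - alpha).
    """
    if alpha >= 1.0:
        return math.inf
    return -20.0 * math.log10(1.0 - alpha)


# Toy signals: "speech" and "music" are sinusoids, the program is their sum.
speech = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
music = [0.5 * math.sin(2 * math.pi * 13 * n / 100) for n in range(100)]
program = [s + m for s, m in zip(speech, music)]
enhanced = mix_speech(program, speech, alpha=0.5)
```

Under the idealized assumption that the copy equals the speech already in the program, `enhanced` is the speech at its original level plus the music at half amplitude, so α = 0.5 raises the speech-to-non-speech ratio by about 6 dB. When α rises with the non-speech level, the non-speech output grows more slowly than its input while the speech level stays the same, which is the behaviour recited in claims 6 and 7.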

2. A method for enhancing speech portions of an audio program having speech and non-speech components with a copy of speech components of the audio program, the copy having a low quality such that when reproduced in isolation the copy has audible artifacts that listeners would deem objectionable, comprising combining the low-quality copy of the speech components and the audio program in such proportions that the ratio of speech to non-speech components in the resulting audio program is increased and the audible artifacts of the low-quality copy of speech components are masked by the audio program.

4-5. (canceled)

16-25. (canceled)

26. A method for assembling audio information for use in enhancing speech portions of an audio program having speech and non-speech components, comprising obtaining an audio program having speech and non-speech components, encoding the audio program with a high quality such that when decoded and reproduced in isolation the program does not have audible artifacts that listeners would deem objectionable, obtaining a copy of speech components of the audio program, encoding the copy with a low quality such that when reproduced in isolation the copy has audible artifacts that listeners would deem objectionable, and transmitting or storing the encoded audio program and the encoded copy of speech components of the audio program.
- View Dependent Claims (27)
  - 27. A method according to claim 26 further comprising multiplexing the audio program and the copy of speech components of the audio program before transmitting or storing them.

28. A method for assembling audio information for use in enhancing speech portions of an audio program having speech and non-speech components, comprising obtaining an audio program having speech and non-speech components, encoding the audio program with a high quality such that when decoded and reproduced in isolation the program does not have audible artifacts that listeners would deem objectionable, deriving a prediction of the auditory masking threshold of the encoded audio program, obtaining a copy of speech components of the audio program, encoding the copy with a low quality such that when reproduced in isolation the copy has audible artifacts that listeners would deem objectionable, deriving a measure of the coding noise of the encoded copy, and transmitting or storing the encoded audio program, the prediction of its auditory masking threshold, the encoded copy of speech components of the audio program and the measure of its coding noise.
- View Dependent Claims (29)
  - 29. A method according to claim 28 further comprising multiplexing the audio program, the prediction of its auditory masking threshold, the copy of speech components of the audio program, and the measure of its coding noise before transmitting or storing them.

30. A method for assembling audio information for use in enhancing speech portions of an audio program having speech and non-speech components, comprising obtaining an audio program having speech and non-speech components, encoding the audio program with a high quality such that when decoded and reproduced in isolation the program does not have audible artifacts that listeners would deem objectionable, deriving a prediction of the auditory masking threshold of the encoded audio program, obtaining a copy of speech components of the audio program, encoding the copy with a low quality such that when reproduced in isolation the copy has audible artifacts that listeners would deem objectionable, deriving a measure of the coding noise of the encoded copy, deriving a parameter based on a function of the prediction of the auditory masking threshold and the measure of the coding noise, and transmitting or storing the encoded audio program, the encoded copy of speech components of the audio program and the parameter.
- View Dependent Claims (31)
  - 31. A method according to claim 30 further comprising multiplexing the audio program, the copy of speech components of the audio program, and the parameter before transmitting or storing them.
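
Claims 26 to 31 recite what the encoding side transmits or stores. As a rough illustration of claims 30 and 31, the Python sketch below derives a single parameter from per-band masking thresholds and coding-noise levels, then packs it together with the two encoded streams. Everything specific here is an assumption: the derivation rule (treating the parameter as an α_max for which the scaled coding noise does not exceed the scaled masking threshold in any band), the dB inputs, the function names, and the byte layout. The claims name neither a formula nor a container format.

```python
import struct

# Hypothetical header: alpha_max (float64), program length, speech-copy length.
HEADER = ">dII"


def alpha_max_from_masking(mask_db, noise_db):
    """Largest alpha keeping scaled coding noise under the scaled mask.

    Assumption: in each band the condition is
        alpha * noise <= (1 - alpha) * mask   (amplitudes),
    which gives alpha <= r / (1 + r) with r = mask / noise.
    The smallest per-band limit is returned.
    """
    if not mask_db or len(mask_db) != len(noise_db):
        raise ValueError("need equally sized, non-empty band lists")
    limits = []
    for m, n in zip(mask_db, noise_db):
        ratio = 10.0 ** ((m - n) / 20.0)  # mask-to-noise amplitude ratio
        limits.append(ratio / (1.0 + ratio))
    return min(limits)


def multiplex(encoded_program, encoded_speech, alpha_max):
    """Pack the parameter and both encoded streams into one byte string."""
    header = struct.pack(HEADER, alpha_max,
                         len(encoded_program), len(encoded_speech))
    return header + encoded_program + encoded_speech


def demultiplex(payload):
    """Recover (encoded_program, encoded_speech, alpha_max) from a payload."""
    alpha_max, n_prog, n_speech = struct.unpack_from(HEADER, payload, 0)
    start = struct.calcsize(HEADER)
    program = payload[start:start + n_prog]
    speech = payload[start + n_prog:start + n_prog + n_speech]
    return program, speech, alpha_max
```

A receiver following claim 14 or claim 37 could read the parameter back with `demultiplex` and use it as the upper limit on α when mixing. Sending one derived parameter, as in claim 30, needs less side information than sending both the masking threshold and the coding-noise measure, as in claim 28.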

Current Assignee
Dolby Laboratories Licensing Corporation (Dolby Laboratories Incorporated)
Original Assignee
Dolby Laboratories Licensing Corporation (Dolby Laboratories Incorporated)
Inventors
Muesch, Hannes

Granted Patent

US 8,494,840 B2
US Class Current

704/270.100
CPC Class Codes

H04R 2225/43 Signal processing in hearin...

H04R 25/356 Amplitude, e.g. amplitude s...
