Creating, rendering and interacting with a multi-faceted audio cloud

US 10,007,724 B2
Filed: 06/29/2012
Issued: 06/26/2018
Est. Priority Date: 06/29/2012
Status: Active Grant

Abstract

Methods and arrangements for effecting a cloud representation of audio content. An audio cloud is created and rendered, and user interaction with at least a clip portion of the audio cloud is afforded.

17 Claims

1. An apparatus comprising:
- at least one processor; and
- a non-transitory computer readable storage medium having computer readable program code embodied therewith and executable by the at least one processor, the computer readable program code comprising:
  - computer readable program code configured to segment audio provided in a first language not having available automatic speech recognition capabilities into speech units, wherein to segment comprises employing a language sub-word recognition technique selected from the group consisting of:
    - a statistical system for sub-word unit recognition;
    - a voice-activity-detection technique; and
    - a syllable segmentation technique, wherein the language sub-word recognition technique comprises utilizing a sub-word recognition technique of a second language having available automatic speech recognition capabilities and different from the first language of the audio;
  - computer readable program code configured to identify prominent speech units, wherein to identify comprises detecting a repeated speech unit by identifying speech patterns within the audio and using a language agnostic speech unit comparison technique, wherein the language agnostic speech unit comparison technique comprises a technique where a language associated with the speech unit is disregarded;
  - wherein to identify further comprises determining a frequency of occurrence of a speech unit and wherein a prominent speech unit comprises a speech unit that exceeds a predetermined frequency of occurrence threshold;
  - computer readable program code configured to create an audio cloud comprising audio signals of the prominent speech units, wherein each of the audio signals comprise a playable audio unit that when played provides an audible output from the audio of the corresponding prominent speech unit;
  - computer readable program code configured to render the audio cloud, wherein the audio cloud comprises a visual representation of the audio signals, wherein the audio signals are arranged in order of decreasing frequency of occurrence and wherein a volume of the audio signals is based upon the frequency of occurrence; and
  - computer readable program code configured to afford user interaction with at least a clip portion of the audio cloud.
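The prominence and rendering limitations of claim 1 (a frequency-of-occurrence threshold, ordering by decreasing frequency, and a volume based on frequency) can be illustrated with a minimal Python sketch. It assumes the audio has already been segmented and that repeated units have been grouped under hypothetical labels such as "u1"; the names `CloudEntry` and `build_audio_cloud` and the linear volume scaling are illustrative assumptions, not taken from the patent.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CloudEntry:
    unit: str      # hypothetical label for a group of repeated speech units
    count: int     # frequency of occurrence in the audio
    volume: float  # playback gain in (0, 1], scaled by frequency (assumption)


def build_audio_cloud(unit_labels, threshold):
    """Keep units whose count exceeds `threshold`, most frequent first."""
    counts = Counter(unit_labels)
    prominent = [(u, c) for u, c in counts.items() if c > threshold]
    if not prominent:
        return []
    # Sort by decreasing frequency; ties are broken by label for determinism.
    prominent.sort(key=lambda uc: (-uc[1], uc[0]))
    top = prominent[0][1]
    # Assumed mapping: volume is proportional to frequency, loudest = 1.0.
    return [CloudEntry(u, c, c / top) for u, c in prominent]


labels = ["u3", "u1", "u3", "u2", "u3", "u1", "u4", "u1", "u3"]
cloud = build_audio_cloud(labels, threshold=1)
for entry in cloud:
    print(entry.unit, entry.count, entry.volume)
```

In a fuller system each entry would also carry the playable audio clip of the unit; that part is omitted here because the claim does not specify a storage format.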

2. A non-transitory computer program storage device comprising:
- a non-transitory computer readable storage device having computer readable program code embodied therewith, the computer readable program code comprising:
  - computer readable program code configured to segment audio provided in a first language not having available automatic speech recognition capabilities into speech units, wherein to segment comprises employing a language sub-word recognition technique selected from the group consisting of:
    - a statistical system for sub-word unit recognition;
    - a voice-activity-detection technique; and
    - a syllable segmentation technique, wherein the language sub-word recognition technique comprises utilizing a sub-word recognition technique of a second language having available automatic speech recognition capabilities and different from the first language of the audio;
  - computer readable program code configured to identify prominent speech units, wherein to identify comprises detecting a repeated speech unit by identifying speech patterns within the audio and using a language agnostic speech unit comparison technique, wherein the language agnostic speech unit comparison technique comprises a technique where a language associated with the speech unit is disregarded;
  - wherein to identify further comprises determining a frequency of occurrence of a speech unit and wherein a prominent speech unit comprises a speech unit that exceeds a predetermined frequency of occurrence threshold;
  - computer readable program code configured to create an audio cloud comprising audio signals of the prominent speech units, wherein each of the audio signals comprise a playable audio unit that when played provides an audible output from the audio of the corresponding prominent speech unit;
  - computer readable program code configured to render the audio cloud, wherein the audio cloud comprises a visual representation of the audio signals, wherein the audio signals are arranged in order of decreasing frequency of occurrence and wherein a volume of the audio signals is based upon the frequency of occurrence; and
  - computer readable program code configured to afford user interaction with at least a clip portion of the audio cloud.
- Dependent claims (3, 4, 5, 6, 7, 8)
  - 3. The non-transitory computer program storage device according to claim 2, comprising computer readable program code configured to detect speech units.
  - 4. The non-transitory computer program storage device according to claim 2, wherein said computer readable program code is configured to render the audio cloud via at least one member selected from the group consisting of:
    - audio-based rendering; and
    - visual-display-based rendering.
  - 5. The non-transitory computer program storage device according to claim 2, wherein said computer readable program code is configured to afford the creating and rendering of the audio cloud as interactive based on user input.
  - 6. The non-transitory computer program storage device according to claim 2, wherein a language sub-word recognition technique comprises a speech analysis technique where accuracy of the technique is not dependent on the language and language characteristics of the speaker.
  - 7. The non-transitory computer program storage device according to claim 2, wherein the audio cloud comprises a plurality of audio segments.
  - 8. The non-transitory computer program storage device according to claim 2, wherein the prominent speech units within the rendered audio cloud are presented in an order based upon the prominence of the speech unit.
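One of the segmentation options recited in claim 2 is a voice-activity-detection technique. The claims do not say which detector is used, so the following Python sketch substitutes a simple frame-energy threshold purely as an illustration; the function name `segment_by_energy`, the frame length, and the threshold value are all assumptions.

```python
def segment_by_energy(samples, frame_len, energy_threshold):
    """Return (start, end) sample index pairs for runs of high-energy frames.

    This is a stand-in energy-based detector, not the patent's method.
    """
    segments = []
    start = None
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len  # mean squared amplitude
        if energy > energy_threshold:
            if start is None:
                start = i * frame_len  # a speech-like run begins here
        elif start is not None:
            segments.append((start, i * frame_len))  # run ended at this frame
            start = None
    if start is not None:
        segments.append((start, n_frames * frame_len))  # run reaches the end
    return segments


# Toy signal: silence, a burst, silence, a shorter burst, silence.
signal = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5] + [0.0] * 4 + [0.8, -0.8] + [0.0] * 2
segments = segment_by_energy(signal, frame_len=2, energy_threshold=0.01)
print(segments)
```

Because the detector looks only at signal energy, it needs no language model, which is in the spirit of segmenting audio in a language without available speech recognition.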

9. A non-transitory computer program storage device comprising:
- a non-transitory computer readable storage device having computer readable program code embodied therewith and executable by the at least one processor, the computer readable program code comprising:
  - computer readable program code configured to segment audio provided in a first language not having available automatic speech recognition capabilities into speech units;
  - wherein to segment comprises employing a language sub-word recognition technique selected from the group consisting of:
    - a statistical system for sub-word unit recognition;
    - a voice-activity-detection technique; and
    - a syllable segmentation technique, wherein the language sub-word recognition technique comprises utilizing a sub-word recognition technique of a second language having available automatic speech recognition capabilities and different from the first language of the audio;
  - computer readable program code configured to identify, by detecting a repeated speech unit by identifying speech patterns within the audio and via employing a language-agnostic speech unit comparison technique, prominent speech units within the audio, wherein the language agnostic speech unit comparison technique comprises a technique where a language associated with the speech unit is disregarded;
  - wherein to identify further comprises determining a frequency of occurrence of a speech unit and wherein a prominent speech unit comprises a speech unit that exceeds a predetermined frequency of occurrence threshold;
  - computer readable program code configured to create an audio cloud comprising audio signals of the identified prominent speech units, wherein each of the audio signals comprise a playable audio unit that when played provides an audible output from the audio of the corresponding prominent speech unit;
  - computer readable program code configured to render the audio cloud, wherein the audio cloud comprises a visual representation of the audio signals, wherein the audio signals are arranged in order of decreasing frequency of occurrence and wherein a volume of the audio signals is based upon the frequency of occurrence.
- Dependent claims (10, 11, 12, 13, 14, 15, 16, 17)
  - 10. The non-transitory computer program storage device according to claim 9, wherein a language sub-word recognition technique comprises a speech analysis technique where accuracy of the technique is not dependent on a language and language characteristics of the speaker.
  - 11. The non-transitory computer program storage device according to claim 10, wherein the language characteristics comprise at least one characteristic selected from the group consisting of:
    - dialect, accent, and vocabulary.
  - 12. The non-transitory computer program storage device according to claim 9, wherein the audio segments within the rendered audio cloud are presented in an order based upon the prominence of the unit.
  - 13. The non-transitory computer program storage device according to claim 9, wherein to identify prominent units comprises employing a language agnostic statistical sub-word analysis.
  - 14. The non-transitory computer program storage device according to claim 9, wherein to identify prominent units comprises creating a repetition score.
  - 15. The non-transitory computer program storage device according to claim 9, wherein to identify prominent units comprises employing term frequency inverse document frequency operations.
  - 16. The non-transitory computer program storage device according to claim 9, comprising computer readable program code configured to detect speech units.
  - 17. The non-transitory computer program storage device according to claim 9, wherein to render the audio cloud comprises a type of rendering selected from the group consisting of:
    - audio-based rendering; and
    - visual-display-based rendering.
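Claim 15 recites term frequency inverse document frequency operations for identifying prominent units, without fixing a particular formula. As a hedged illustration, the Python sketch below applies one common variant (relative term frequency times the natural log of N over document frequency) to speech-unit labels, treating each recording as a "document". The labels, the function name `tfidf_scores`, and the choice of variant are assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_scores(recordings):
    """Score each unit label per recording with tf * log(N / df).

    `recordings` is a list of lists of hypothetical speech-unit labels.
    """
    n = len(recordings)
    df = Counter()
    for units in recordings:
        df.update(set(units))  # count each label once per recording
    scores = []
    for units in recordings:
        tf = Counter(units)
        total = len(units)
        scores.append(
            {u: (c / total) * math.log(n / df[u]) for u, c in tf.items()}
        )
    return scores


recordings = [
    ["a", "a", "b", "c"],
    ["a", "d", "d", "d"],
    ["a", "b", "e", "e"],
]
scores = tfidf_scores(recordings)
for i, s in enumerate(scores):
    print(i, sorted(s.items(), key=lambda kv: -kv[1]))
```

Under this variant a unit that occurs in every recording (here "a") scores zero, so units that are frequent in one recording but rare elsewhere rank as the most prominent for that recording.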

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Ajmera, Jitendra; Deshmukh, Om Dadaji; Jain, Anupam; Nanavati, Amit Anil; Rajput, Nitendra
Primary Examiner(s): JACKSON, JAKIEDA R
Application Number: US13/538,988
Publication Number: US 20140006011A1
Time in Patent Office: 2,188 Days
Field of Search: 704254
US Class Current
CPC Class Codes:
G06F 16/64 Browsing; Visualisation the...
G10L 15/04 Segmentation; Word boundary...
