SPEECH-DRIVEN SELECTION OF AN AUDIO FILE

US 20080065382A1
Filed: 02/12/2007
Published: 03/13/2008
Est. Priority Date: 02/10/2006
Status: Active Grant

Abstract

A system and method for detecting a refrain in an audio file having vocal components. The method and system include generating a phonetic transcription of a portion of the audio file, analyzing the phonetic transcription, and identifying a vocal segment in the generated phonetic transcription that is repeated frequently. The method and system further relate to speech-driven selection of an audio file based on the similarity between the detected refrain and user input.

241 Citations

25 Claims

1. A method for detecting a refrain in an audio file having vocal components, the method comprising:
- generating a phonetic transcription of at least a portion of the audio file; and
- identifying a vocal segment in the generated phonetic transcription, which vocal segment is repeated at least once.

Dependent claims (2, 3, 4, 5, 6):
- 2. The method of claim 1, further including pre-segmenting the audio file into vocal and non-vocal components.
- 3. The method of claim 2, further including (i) either or both attenuating the non-vocal components of the audio file and amplifying the vocal components of the audio file and (ii) generating the phonetic transcription based on the resulting audio file.
- 4. The method of claim 1, further including identifying repeating segments of melody, rhythm, power, and harmonics of the audio file.
- 5. The method of claim 1, where identifying includes identifying a vocal segment which is repeated at least twice in the phonetic transcription.
- 6. The method of claim 1, where the phonetic transcription is generated for a majority of the audio file.
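
The two steps of claim 1 can be illustrated with a minimal sketch. It assumes the phonetic transcription is already available as a list of phoneme symbols (how it is produced is not shown), and it uses exact n-gram matching, which is a simplification chosen here for illustration and not a technique stated in the claims. The function name `find_refrain_candidate` and its parameters are hypothetical.

```python
from collections import defaultdict


def find_refrain_candidate(phonemes, min_len=4, min_count=2):
    """Return (segment, start_positions) for the longest phoneme n-gram that
    occurs at least `min_count` times without overlapping, or None.

    Assumption: exact symbol matching stands in for whatever similarity
    measure a real system would use on noisy transcriptions.
    """
    n_total = len(phonemes)
    # Try the longest possible segment first, then shorter ones.
    for n in range(n_total // min_count, min_len - 1, -1):
        positions = defaultdict(list)
        for i in range(n_total - n + 1):
            key = tuple(phonemes[i:i + n])
            starts = positions[key]
            # Record an occurrence only if it does not overlap the previous one.
            if not starts or i >= starts[-1] + n:
                starts.append(i)
        repeated = [(k, v) for k, v in positions.items() if len(v) >= min_count]
        if repeated:
            # Prefer the most frequent segment; break ties by earliest start.
            segment, starts = max(repeated, key=lambda kv: (len(kv[1]), -kv[1][0]))
            return list(segment), starts
    return None
```

For a transcription built as verse, chorus, verse, chorus, the function returns the chorus symbols together with the index of each occurrence. Setting `min_count=3` corresponds to the "repeated at least twice" variant of claim 5.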

7. A method for processing an audio file having at least vocal components, the method comprising:
- detecting a refrain of the audio file;
- generating either or both a phonetic or acoustic representation of the refrain; and
- storing the generated phonetic or acoustic representation together with the audio file.

Dependent claims (8, 9, 10, 11):
- 8. The method of claim 7, where detecting the refrain includes detecting vocal segments that are repeated at least once in the audio file.
- 9. The method of claim 7, where detecting the refrain includes generating a phonetic transcription of a majority of the audio file and identifying repeating similar segments within the phonetic transcription of the audio file.
- 10. The method of claim 9, where detecting the refrain further includes identifying repeating similar segments of melody, harmony or rhythm or any combination thereof in the audio file.
- 11. The method of claim 7, further including decomposing the detected refrain and further dividing the refrain into subparts based upon prosody, loudness, vocal pauses or combinations thereof, within the refrain.
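
As a rough illustration of the storing step of claim 7 and the subdivision of claim 11, the sketch below splits a refrain at vocal pauses and writes the result next to the audio file. Everything specific here is an assumption made for the example: the `<pause>` marker, the JSON sidecar file, its `.refrain.json` suffix, and the function names. The claims only require that the representation be stored together with the audio file and do not prescribe a format.

```python
import json
from pathlib import Path

PAUSE = "<pause>"  # hypothetical marker for a vocal pause in the transcription


def split_at_pauses(refrain):
    """Divide a refrain (list of phoneme symbols) into subparts at pause markers."""
    parts, current = [], []
    for symbol in refrain:
        if symbol == PAUSE:
            if current:  # ignore leading or repeated pauses
                parts.append(current)
                current = []
        else:
            current.append(symbol)
    if current:
        parts.append(current)
    return parts


def store_refrain(audio_path, refrain):
    """Write the refrain representation to a JSON sidecar next to the audio file.

    Assumption: a sidecar file is one possible way to keep the representation
    "together with" the audio; embedding it in file metadata would be another.
    """
    audio_path = Path(audio_path)
    sidecar = audio_path.with_name(audio_path.name + ".refrain.json")
    record = {
        "audio_file": audio_path.name,
        "refrain": [s for s in refrain if s != PAUSE],
        "subparts": split_at_pauses(refrain),
    }
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

Only vocal pauses are handled in this sketch; splitting on prosody or loudness, which claim 11 also names, would need signal-level features that are outside its scope.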

12. A method of speech-driven selection of an audio file from a plurality of audio files in an audio player, each of the audio files having at least vocal components, the method comprising:
- detecting a refrain in each of the audio files of the plurality of audio files;
- determining either or both phonetic or acoustic representations of at least part of a refrain of each of the audio files;
- supplying each of the phonetic or acoustic representations to a speech recognition unit;
- comparing the phonetic or acoustic representations to the voice command of the user of the audio player; and
- selecting an audio file based on the best matching result of the comparison.

Dependent claims (13, 14, 15, 16, 17, 18, 19, 20, 21, 22):
- 13. The method of claim 12, where a statistical model is used for comparing the voice command to the phonetic or acoustic representation.
- 14. The method of claim 12, where the phonetic or acoustic representations of refrains are integrated into a speech recognizer as elements in a finite grammar or statistical language model.
- 15. The method of claim 12, where selecting an audio file based on the best matching result of comparison includes selecting the audio file based additionally on either or both the phonetic or acoustic representation of the refrain.
- 16. The method of claim 15, where selecting an audio file based on the best matching result of comparison includes selecting the audio file based additionally on the phonetic data of the refrain.
- 17. The method of claim 12, where detecting a refrain further includes further segmenting the detected refrain.
- 18. The method of claim 12, where detecting a refrain further includes further segmenting either or both the generated phonetic or acoustic representation of the detected refrain.
- 19. The method of claim 17, where the further segmentation is based upon the prosody, loudness, vocal pauses or any combination thereof of the audio file.
- 20. The method of claim 12, where detecting a refrain in each of the audio files includes generating a phonetic transcription of a majority of the audio file and identifying a vocal segment in the generated phonetic transcription that is repeated at least once.
- 21. The method of claim 20, where generating the phonetic or acoustic representation of the refrain includes processing the audio file by a method comprising:
  - detecting a refrain of the audio file;
  - generating either or both a phonetic or acoustic representation of the refrain; and
  - storing the generated phonetic or acoustic representation together with the audio file.
- 22. The method of claim 12, further including:
  - determining the melody of the refrain;
  - determining the melody of the speech command;
  - comparing the two melodies; and
  - selecting at least one of the audio files based upon best match of either or both the phonetic or acoustic representations and melody comparison.
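
The comparison and selection steps of claim 12 can be sketched as a nearest-match search. The example assumes both the voice command and each refrain are available as lists of phoneme symbols, and it uses normalized Levenshtein (edit) distance as the similarity measure. That measure is a simple stand-in chosen for illustration; the claims leave the comparison to a speech recognition unit and mention a statistical model (claim 13), which is not implemented here. The names `edit_distance` and `select_audio_file` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two symbol sequences (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def select_audio_file(command, refrains):
    """Return (file_name, score) for the refrain closest to the voice command.

    `refrains` maps a file name to the phoneme list of that file's refrain.
    The score is the edit distance divided by the longer sequence length,
    so 0.0 is an exact match and 1.0 means nothing aligns.
    """
    def score(refrain):
        return edit_distance(command, refrain) / max(len(command), len(refrain), 1)

    best = min(refrains, key=lambda name: score(refrains[name]))
    return best, score(refrains[best])
```

A command that differs from one stored refrain by a single dropped phoneme still selects that file, which is the tolerance to recognition errors that a best-match rule is meant to provide. The melody comparison of claim 22 could be added as a second score and combined with this one, but that is not shown.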

23. A system for detecting a refrain in an audio file having at least vocal components, the system comprising:
- a phonetic transcription unit that generates a phonetic transcription of at least a portion of the audio file;
- an analyzing unit that identifies vocal segments within the phonetic transcription that are repeated at least once.

24. A system for processing an audio file having at least vocal components, the system comprising:
- a detecting unit that detects the refrain of the audio file;
- a transcription unit that generates a phonetic or acoustic representation of the refrain; and
- a control unit that stores the phonetic or acoustic representation linked to the audio data.

25. A system for a speech-driven selection of an audio file comprising:
- a refrain detecting unit that detects the refrain of an audio file;
- a transcription unit that generates a phonetic or acoustic representation of the detected refrain;
- a speech recognition unit that compares the phonetic or acoustic representation to the voice command of the user selecting the audio file and that determines the best matching result of the comparison; and
- a control unit that selects the audio file in accordance with the result of the comparison.

Current Assignee
Harman Becker Automotive Systems GmbH (Samsung Electronics Co. Ltd.)
Original Assignee
Harman Becker Automotive Systems GmbH (Samsung Electronics Co. Ltd.)
Inventors
Franz Gerl, Daniel Willett, Raymond Brueckner

Granted Patent

US 7,842,873 B2
US Class Current

704/258
CPC Class Codes

G10H 1/0008   Associated control or indic...

G10H 2210/046   for differentiation between...

G10H 2210/066   for pitch analysis as part ...

G10H 2210/076   for extraction of timing, t...

G10H 2210/081   for automatic key or tonali...

G10H 2240/135   Library retrieval index, i....

G10H 2240/141   Library retrieval matching,...

G10L 25/48   specially adapted for parti...

G10L 25/87   Detection of discrete point...
