Method and apparatus for searching multimedia data using speech recognition in mobile device

US 8,200,490 B2
Filed: 02/09/2007
Issued: 06/12/2012
Est. Priority Date: 03/02/2006
Status: Expired due to Fees

Abstract

A method of searching music using speech recognition in a mobile device, the method including: recognizing a speech signal uttered by a user as a phoneme sequence; and searching music information by performing partial symbol matching between the recognized phoneme sequence and a standard pronunciation sequence.

22 Claims

1. A method of searching music using speech recognition, the method comprising:
- recognizing as a phoneme sequence a speech signal uttered by a user; and
  
  searching music information by performing partial symbol matching between the recognized phoneme sequence and a standard pronunciation sequence, considering pronunciation differences between a pronunciation of sequenced partial symbols in the standard pronunciation sequence and a pronunciation of a single partial symbol within the sequenced partial symbols or a pronunciation of a sequence of partial symbols less than all of the sequenced partial symbols within the sequenced partial symbols,
  
  wherein the recognizing comprises:
  
  extracting a feature vector sequence of the speech signal uttered by the user; and
  
  converting the extracted feature vector sequence to the phoneme sequence, so that the speech signal is recognized as the phoneme sequence.
- - 2. The method of claim 1, wherein the searching music information comprises:
    - calculating a match score according to a result of the partial symbol matching; and
      
      displaying a music information search result according to the match score.
  - 3. The method of claim 2, wherein the match score is calculated by a phoneme confusion matrix.
  - 4. The method of claim 2, wherein, in the displaying a music information search result according to the match score, only a music information search result having the match score greater than a predetermined reference value is displayed.
  - 5. The method of claim 1, further comprising extracting a recognition target vocabulary from a predetermined music file and generating the music information with respect to the extracted recognition target vocabulary.
  - 6. The method of claim 5, further comprising:
    - generating a pronunciation dictionary with the recognition target vocabulary; and
      
      sorting the generated pronunciation dictionary.
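
Claims 1 to 4 describe matching a recognized phoneme sequence against part of a stored pronunciation, scoring the match with a phoneme confusion matrix, and showing only results above a reference value. The following is a minimal sketch of that idea, not the patented implementation. It assumes a substitution-only sliding window, and the probabilities, phoneme symbols, and the names `CONFUSION`, `partial_match_score`, and `search` are all hypothetical.

```python
import math

# Hypothetical confusion probabilities P(recognized | reference); illustrative only.
CONFUSION = {
    ("d", "t"): 0.3,  # reference "d" heard as "t"
    ("b", "p"): 0.3,  # reference "b" heard as "p"
}
P_SAME = 0.9    # assumed probability of recognizing a phoneme correctly
P_OTHER = 0.01  # assumed floor for any confusion not listed above


def confusion_prob(ref, rec):
    """Probability that reference phoneme `ref` is recognized as `rec`."""
    if ref == rec:
        return P_SAME
    return CONFUSION.get((ref, rec), P_OTHER)


def partial_match_score(recognized, reference):
    """Best mean log-probability of `recognized` over every window of
    `reference` that has the same length (substitutions only)."""
    n = len(recognized)
    if n == 0 or n > len(reference):
        return float("-inf")
    best = float("-inf")
    for start in range(len(reference) - n + 1):
        window = reference[start:start + n]
        total = sum(math.log(confusion_prob(r, q))
                    for r, q in zip(window, recognized))
        best = max(best, total / n)
    return best


def search(recognized, dictionary, threshold):
    """Return (title, score) pairs scoring above `threshold`, best first."""
    hits = []
    for title, reference in dictionary.items():
        score = partial_match_score(recognized, reference)
        if score > threshold:
            hits.append((title, score))
    hits.sort(key=lambda hit: -hit[1])
    return hits
```

Averaging the log-probabilities keeps scores comparable across queries of different lengths, which makes a single threshold workable. A fuller system would also need to handle inserted and deleted phonemes.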

7. A non-transitory computer-readable recording medium in which a program to execute a method of searching music using speech recognition is recorded, the method comprising:
- recognizing as a phoneme sequence a speech signal uttered by a user; and
  
  searching music information by performing partial symbol matching between the recognized phoneme sequence and a standard pronunciation sequence, using a phoneme confusion matrix based on pronunciation differences between a pronunciation of sequenced partial symbols in the standard pronunciation sequence and a pronunciation of a single partial symbol within the sequenced partial symbols or a pronunciation of a sequence of partial symbols less than all of the sequenced partial symbols within the sequenced partial symbol,
  
  wherein the recognizing comprises:
  
  extracting a feature vector sequence of the speech signal uttered by the user; and
  
  converting the extracted feature vector sequence to the phoneme sequence, so that the speech signal is recognized as the phoneme sequence.

8. A music search apparatus comprising:
- a music database storing a pronunciation dictionary with respect to music and music information;
  
  a feature extraction unit extracting a feature vector sequence from a speech signal;
  
  a phoneme decoding unit decoding the feature vector sequence into a candidate phoneme sequence;
  
  a matching unit matching the candidate phoneme sequence with a reference phoneme pattern in the pronunciation dictionary with respect to the music information, with the pronunciation dictionary relating pronunciation differences between a pronunciation of sequenced partial symbols and a pronunciation of a single partial symbol within the sequenced partial symbols or a pronunciation of a sequence of partial symbols less than all of the sequenced partial symbols within the sequenced partial symbols;
  
  a calculation unit calculating a match score according to a result of the matching; and
  
  a display unit displaying a music information search result according to the calculated match score.
- - 9. The apparatus of claim 8, wherein the matching unit matches the candidate phoneme sequence with the reference phoneme pattern in the pronunciation dictionary, with respect to the music information, using a phoneme confusion matrix and language boundary information.
  - 10. The apparatus of claim 8, wherein the display unit displays only music information search results having the match score greater than a predetermined reference value.
  - 11. The apparatus of claim 8, wherein the display unit arranges and displays music information search results according to a predetermined criteria when the match score of the music information search result is the same as another match score of another search.
  - 12. The apparatus of claim 8, further comprising a music information generation unit extracting a recognition target vocabulary from a predetermined music file, and generating the music information with respect to the extracted recognition target vocabulary.
  - 13. The apparatus of claim 8, wherein the matching unit converts a pronunciation sequence of a part of the candidate phoneme sequence exhibiting an effect of palatalization into an original pronunciation sequence in an isolated speech form and matches the converted pronunciation sequence with the reference phoneme pattern of the pronunciation dictionary.
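
Claims 10 and 11 describe a display unit that shows only results above a reference value and orders equal-scoring results by a predetermined criterion. A minimal sketch of such ranking follows. The claims do not say what the criterion is, so alphabetical order by title is used here purely as an assumption, and `rank_results` is a hypothetical name.

```python
def rank_results(results, threshold):
    """Filter and order search results for display.

    `results` is a list of (title, match_score) pairs. Results at or below
    `threshold` are dropped; the rest are sorted by score, highest first.
    Ties are broken alphabetically by title (an assumed criterion).
    """
    kept = [r for r in results if r[1] > threshold]
    return sorted(kept, key=lambda r: (-r[1], r[0]))
```

Any other deterministic tie-breaker, such as play count or date added, could replace the title in the sort key.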

14. A music search apparatus comprising:
- a music database storing a pronunciation dictionary with respect to music and music information;
  
  a phoneme decoding unit decoding a speech signal into a candidate phoneme sequence;
  
  a matching unit matching the candidate phoneme sequence with a reference phoneme pattern in the pronunciation dictionary with respect to the music information;
  
  a calculation unit calculating a match score according to a result of the matching; and
  
  a display unit displaying a music information search result according to the calculated match score,
  
  wherein the matching unit converts a pronunciation sequence of a part of the candidate phoneme sequence exhibiting an effect of palatalization into an original pronunciation sequence in an isolated speech form and matches the converted pronunciation sequence with the reference phoneme pattern of the pronunciation dictionary.
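
Claims 13 and 14 describe converting a palatalized part of the candidate phoneme sequence back to its isolated-speech form before matching. The sketch below illustrates one way this could look, under loud assumptions: the two reverse rules, the informal phoneme symbols, and the names `REVERSE_RULES` and `undo_palatalization` are hypothetical and are not taken from the patent. As a familiar illustration of the phenomenon, the Korean word 같이 is commonly pronounced like 가치, so an underlying "t i" surfaces as "ch i".

```python
# Hypothetical reverse rules: palatalized surface pair -> isolated-form pair.
REVERSE_RULES = {
    ("ch", "i"): ("t", "i"),
    ("j", "i"): ("d", "i"),
}


def undo_palatalization(phonemes, start=0, end=None):
    """Return a copy of `phonemes` with palatalized pairs inside
    [start, end) rewritten to their assumed isolated-speech form.

    Restricting the span matters: only the part flagged as palatalized
    should be converted, since "ch i" can also be a genuine sequence.
    """
    out = list(phonemes)
    if end is None:
        end = len(out)
    for k in range(start, min(end, len(out)) - 1):
        pair = (out[k], out[k + 1])
        if pair in REVERSE_RULES:
            out[k], out[k + 1] = REVERSE_RULES[pair]
    return out
```

The converted sequence would then be compared with the reference phoneme pattern in the same way as any other candidate.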

15. A music search apparatus comprising:
- a feature extraction unit extracting a feature vector sequence of a speech signal of an input speech query;
  
  a phoneme decoding unit decoding the extracted feature vector sequence into at least one candidate phoneme sequences;
  
  a matching unit partially matching a candidate phoneme sequence with a reference pattern included in a stored lexicon by matching the candidate phoneme sequence with the reference pattern using a phoneme confusion matrix and linguistic constraints and, after the partial matching, matching a converted pronunciation sequence with a reference phoneme pattern of the lexicon so as to overcome an inconsistency due to a difference in pronunciation caused by palatalization; and
  
  a calculation unit calculating a match score according to the match score using a probability value of the phoneme confusion matrix and considering probabilities of insertion and deletion of the phoneme.
- - 16. The apparatus of claim 15, further comprising a music database storing music, music information, and the lexicon, the lexicon being for the music information and corresponding to a reference pronunciation pattern for comparing a speech query with a recognized phoneme sequence.
  - 17. The apparatus of claim 15, wherein the feature extraction unit extracts a feature vector sequence of a speech signal of an input speech query by reducing background noise of the speech signal of the speech query, extracting a speech interval from the speech signal, and extracting a feature vector sequence usable in speech recognition from the detected speech interval.
  - 18. The apparatus of claim 15, wherein the phoneme decoding unit decodes the extracted feature vector sequence into the at least one candidate phoneme sequence using a phoneme or a tri-phoneme acoustic model and applies connectivity between contexts when using the tri-phoneme acoustic model.
  - 19. The apparatus of claim 15, wherein the phoneme decoding unit applies a phoneme-level grammar when converting the extracted feature vector sequence into the at least one candidate phoneme sequence.
  - 20. The apparatus of claim 15, wherein the matching unit obtains the converted pronunciation sequence used to overcome an inconsistency due to a difference in pronunciation caused by palatalization by converting a pronunciation sequence of a part of the candidate phoneme sequence exhibiting an effect of palatalization into an original pronunciation sequence in an isolated speech form.
  - 21. The apparatus of claim 15, wherein the conversion of the pronunciation sequence into the original pronunciation sequence enables regularization by back-tracking from a pronunciation rule.
  - 22. The apparatus of claim 15, wherein the matching a converted pronunciation sequence with a reference phoneme pattern of the lexicon is achieved by Viterbi alignment with respect to a matched phoneme segment of a candidate recognition list obtained from the partial matching.
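
Claim 15 describes a match score built from confusion-matrix probabilities together with probabilities of phoneme insertion and deletion, and claim 22 mentions Viterbi alignment. The following is a minimal sketch in that spirit: a max-probability dynamic program that aligns the whole query to any contiguous part of a reference. It is an assumed formulation, not the patent's; `align_score`, `simple_sub_prob`, and all probability values are hypothetical.

```python
import math


def simple_sub_prob(ref, rec):
    """Stand-in for a confusion matrix lookup (values are assumptions)."""
    return 0.9 if ref == rec else 0.02


def align_score(query, reference, sub_prob=simple_sub_prob,
                p_ins=0.05, p_del=0.05):
    """Highest log-probability of aligning all of `query` to any contiguous
    part of `reference`, allowing substitutions, insertions, and deletions."""
    n, m = len(query), len(reference)
    neg_inf = float("-inf")
    # dp[i][j]: best log-prob after consuming query[:i], ending at reference[:j]
    dp = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        dp[0][j] = 0.0  # the match may start anywhere in the reference
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + math.log(p_ins)
        for j in range(1, m + 1):
            # query phoneme recognized in place of a reference phoneme
            sub = dp[i - 1][j - 1] + math.log(sub_prob(reference[j - 1], query[i - 1]))
            # extra phoneme in the query with no reference counterpart
            ins = dp[i - 1][j] + math.log(p_ins)
            # reference phoneme missing from the query
            dele = dp[i][j - 1] + math.log(p_del)
            dp[i][j] = max(sub, ins, dele)
    return max(dp[n])  # the match may end anywhere in the reference
```

Leaving the start and end of the reference unconstrained is what makes the match partial: a user can say only a fragment of a title and still align cleanly with it.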

Current Assignee: Samsung Electronics Co. Ltd.
Original Assignee: Samsung Electronics Co. Ltd.
Inventors: Choi, In Jeong; Kim, Nam Hoon; Han, Ick Sano; Jeong, Sang Bae
Primary Examiner(s): SKED, MATTHEW J
Application Number: US11/704,271
Publication Number: US 20070208561A1
Time in Patent Office: 1,950 Days
Field of Search: None
US Class Current: 704/252
CPC Class Codes:
G10L 15/26 Speech to text systems G10L...
G10L 2015/025 Phonemes, fenemes or fenone...
