Method and apparatus for searching multimedia data using speech recognition in mobile device
Abstract
A method of searching music using speech recognition in a mobile device, the method including: recognizing a speech signal uttered by a user as a phoneme sequence; and searching music information by performing partial symbol matching between the recognized phoneme sequence and a standard pronunciation sequence.
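The partial symbol matching named in the abstract can be pictured as aligning the recognized phoneme sequence against any contiguous part of a stored pronunciation sequence. The sketch below is only an illustration under stated assumptions, not the patented procedure: it uses a plain unit-cost edit distance, and the function name `partial_match_cost`, the catalog, and the phoneme labels are all hypothetical.

```python
def partial_match_cost(query, reference):
    """Minimum edit cost of aligning `query` to any contiguous part of `reference`."""
    # prev[j] is the best cost of aligning the query prefix processed so far
    # to a reference segment that ends at position j. The first row is all
    # zeros, so a match may begin anywhere in the reference at no cost.
    prev = [0] * (len(reference) + 1)
    for i, q in enumerate(query, start=1):
        curr = [i] + [0] * len(reference)
        for j, r in enumerate(reference, start=1):
            curr[j] = min(
                prev[j - 1] + (q != r),  # match (0) or substitution (1)
                prev[j] + 1,             # query phoneme with no reference counterpart
                curr[j - 1] + 1,         # reference phoneme missing from the query
            )
        prev = curr
    # Taking the minimum over the last row lets the match end anywhere.
    return min(prev)


# Hypothetical catalog: title -> standard pronunciation sequence.
catalog = {
    "Yesterday": ["y", "eh", "s", "t", "er", "d", "ey"],
    "Let It Be": ["l", "eh", "t", "ih", "t", "b", "iy"],
}

# The user utters only part of a title.
query = ["t", "er", "d", "ey"]
ranked = sorted(catalog, key=lambda title: partial_match_cost(query, catalog[title]))
print(ranked[0])
```

Leaving the start and end of the reference free is what makes the match partial: a query covering only part of a title is not penalized for the phonemes it leaves out.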
22 Claims
1. A method of searching music using speech recognition, the method comprising:
recognizing as a phoneme sequence a speech signal uttered by a user; and
searching music information by performing partial symbol matching between the recognized phoneme sequence and a standard pronunciation sequence, considering pronunciation differences between a pronunciation of sequenced partial symbols in the standard pronunciation sequence and a pronunciation of a single partial symbol within the sequenced partial symbols or a pronunciation of a sequence of partial symbols less than all of the sequenced partial symbols within the sequenced partial symbols,
wherein the recognizing comprises:
extracting a feature vector sequence of the speech signal uttered by the user; and
converting the extracted feature vector sequence to the phoneme sequence, so that the speech signal is recognized as the phoneme sequence.
Dependent claims: 2, 3, 4, 5, 6.

7. A non-transitory computer-readable recording medium in which a program to execute a method of searching music using speech recognition is recorded, the method comprising:
recognizing as a phoneme sequence a speech signal uttered by a user; and
searching music information by performing partial symbol matching between the recognized phoneme sequence and a standard pronunciation sequence, using a phoneme confusion matrix based on pronunciation differences between a pronunciation of sequenced partial symbols in the standard pronunciation sequence and a pronunciation of a single partial symbol within the sequenced partial symbols or a pronunciation of a sequence of partial symbols less than all of the sequenced partial symbols within the sequenced partial symbol,
wherein the recognizing comprises:
extracting a feature vector sequence of the speech signal uttered by the user; and
converting the extracted feature vector sequence to the phoneme sequence, so that the speech signal is recognized as the phoneme sequence.

8. A music search apparatus comprising:
a music database storing a pronunciation dictionary with respect to music and music information;
a feature extraction unit extracting a feature vector sequence from a speech signal;
a phoneme decoding unit decoding the feature vector sequence into a candidate phoneme sequence;
a matching unit matching the candidate phoneme sequence with a reference phoneme pattern in the pronunciation dictionary with respect to the music information, with the pronunciation dictionary relating pronunciation differences between a pronunciation of sequenced partial symbols and a pronunciation of a single partial symbol within the sequenced partial symbols or a pronunciation of a sequence of partial symbols less than all of the sequenced partial symbols within the sequenced partial symbols;
a calculation unit calculating a match score according to a result of the matching; and
a display unit displaying a music information search result according to the calculated match score.
Dependent claims: 9, 10, 11, 12, 13.

14. A music search apparatus comprising:
a music database storing a pronunciation dictionary with respect to music and music information;
a phoneme decoding unit decoding a speech signal into a candidate phoneme sequence;
a matching unit matching the candidate phoneme sequence with a reference phoneme pattern in the pronunciation dictionary with respect to the music information;
a calculation unit calculating a match score according to a result of the matching; and
a display unit displaying a music information search result according to the calculated match score,
wherein the matching unit converts a pronunciation sequence of a part of the candidate phoneme sequence exhibiting an effect of palatalization into an original pronunciation sequence in an isolated speech form and matches the converted pronunciation sequence with the reference phoneme pattern of the pronunciation dictionary.

15. A music search apparatus comprising:
a feature extraction unit extracting a feature vector sequence of a speech signal of an input speech query;
a phoneme decoding unit decoding the extracted feature vector sequence into at least one candidate phoneme sequences;
a matching unit partially matching a candidate phoneme sequence with a reference pattern included in a stored lexicon by matching the candidate phoneme sequence with the reference pattern using a phoneme confusion matrix and linguistic constraints and, after the partial matching, matching a converted pronunciation sequence with a reference phoneme pattern of the lexicon so as to overcome an inconsistency due to a difference in pronunciation caused by palatalization; and
a calculation unit calculating a match score according to the match score using a probability value of the phoneme confusion matrix and considering probabilities of insertion and deletion of the phoneme.
Dependent claims: 16, 17, 18, 19, 20, 21, 22.

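Claims 7 and 15 recite a match score built from phoneme confusion matrix probabilities together with insertion and deletion probabilities. One way such a score might be computed is a log-probability alignment, sketched below. This is an assumption-laden illustration rather than the claimed implementation: the function `match_score`, the dictionary layout of the confusion matrix, the probability values, and the `floor` fallback for unseen phoneme pairs are all hypothetical, and the linguistic constraints and palatalization conversion recited in the claims are not modeled.

```python
import math


def match_score(candidate, reference, confusion, p_ins=0.05, p_del=0.05, floor=1e-4):
    """Best log-probability of aligning `candidate` to any contiguous part of `reference`.

    confusion[(ref, cand)] is an assumed probability that reference phoneme
    `ref` is recognized as `cand`; pairs not listed fall back to `floor`.
    """
    log_ins = math.log(p_ins)  # recognizer inserted a phoneme absent from the reference
    log_del = math.log(p_del)  # recognizer dropped a phoneme present in the reference

    def log_sub(ref_ph, cand_ph):
        return math.log(confusion.get((ref_ph, cand_ph), floor))

    # Row 0 is all zeros: the alignment may start anywhere in the reference.
    prev = [0.0] * (len(reference) + 1)
    for c in candidate:
        curr = [prev[0] + log_ins] + [0.0] * len(reference)
        for j, r in enumerate(reference, start=1):
            curr[j] = max(
                prev[j - 1] + log_sub(r, c),  # reference r heard as candidate c
                prev[j] + log_ins,            # c treated as an insertion
                curr[j - 1] + log_del,        # r treated as a deletion
            )
        prev = curr
    # The alignment may also end anywhere in the reference.
    return max(prev)


# Hypothetical confusion probabilities for a handful of phonemes.
confusion = {
    ("d", "d"): 0.9, ("d", "t"): 0.1,
    ("a", "a"): 0.95,
}
reference = ["s", "d", "a", "n"]

exact = match_score(["d", "a"], reference, confusion)
confused = match_score(["t", "a"], reference, confusion)
print(exact > confused)
```

Working in log space keeps the product of many small probabilities numerically stable, and a higher (less negative) score indicates a more plausible match, so candidate titles could be ranked by this value.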
Specification