Speech retrieval method, speech retrieval apparatus, and program for speech retrieval apparatus

US 9,626,957 B2
Filed: 05/27/2016
Issued: 04/18/2017
Est. Priority Date: 04/21/2014
Status: Active Grant

Abstract

A method for speech retrieval includes acquiring a keyword designated by a character string, and a phoneme string or a syllable string, detecting one or more coinciding segments by comparing a character string that is a recognition result of word speech recognition with words as recognition units performed for speech data to be retrieved and the character string of the keyword, calculating an evaluation value of each of the one or more segments by using the phoneme string or the syllable string of the keyword to evaluate a phoneme string or a syllable string that is recognized in each of the detected one or more segments and that is a recognition result of phoneme speech recognition with phonemes or syllables as recognition units performed for the speech data, and outputting a segment in which the calculated evaluation value exceeds a predetermined threshold.
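The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `Word` and `Phone` records, the function names, the `margin` and `threshold` defaults, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for the example. The abstract does not fix a particular scoring function.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

@dataclass
class Word:            # hypothetical word-recognition output
    text: str          # recognized character string
    start: float       # start time in seconds
    end: float         # end time in seconds

@dataclass
class Phone:           # hypothetical phoneme-recognition output
    symbol: str
    time: float        # time stamp in seconds

def detect_segments(words, keyword_text):
    """Segments whose recognized character string coincides with the keyword's."""
    return [(w.start, w.end) for w in words if w.text == keyword_text]

def phones_in(phones, start, end, margin):
    """Phoneme symbols inside the segment, widened by `margin` on both sides."""
    lo, hi = start - margin, end + margin
    return [p.symbol for p in phones if lo <= p.time <= hi]

def score(recognized, keyword_phones):
    """Stand-in evaluation value in [0, 1]; higher means a closer match."""
    return SequenceMatcher(None, recognized, keyword_phones).ratio()

def retrieve(words, phones, keyword_text, keyword_phones,
             margin=0.2, threshold=0.7):
    """Return (start, end, value) for segments whose value exceeds the threshold."""
    hits = []
    for start, end in detect_segments(words, keyword_text):
        value = score(phones_in(phones, start, end, margin), keyword_phones)
        if value > threshold:
            hits.append((start, end, value))
    return hits
```

The word-level pass proposes candidate segments cheaply, and the phoneme-level pass then filters out candidates whose pronunciation does not match the keyword.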

20 Claims

1. A speech retrieval apparatus comprising:
- a segment detection unit configured to detect one or more coinciding segments for speech data by comparing a character string of a recognition result of word speech recognition and a character string of a keyword, the keyword being designated by the character string and a phoneme string or a syllable string stored in a non-transitory computer readable storage medium;
  
  an evaluation value calculation unit configured to calculate an evaluation value of each of the one or more coinciding segments using the phoneme string or the syllable string of the keyword to evaluate a phoneme string or a syllable string recognized in each of the one or more coinciding segments and that is a recognition result of phoneme speech recognition, wherein the phoneme string or the syllable string associated with each of the one or more coinciding segments is a phoneme string or a syllable string associated with a segment in which a start and an end of the segment is expanded by a predetermined time; and
  
  a segment output unit configured to output a segment in which the calculated evaluation value exceeds a predetermined threshold.
  - 2. The apparatus according to claim 1, wherein the recognition result of word speech recognition includes words as recognition units performed for the speech data.
  - 3. The apparatus according to claim 1, wherein the recognition result of phoneme speech recognition includes phonemes or syllables as recognition units performed for the speech data.
  - 4. The apparatus according to claim 1, wherein the evaluation value calculation unit is further configured to compare a phoneme string or a syllable string that is an N-best recognition result of phoneme speech recognition with phonemes or syllables as recognition units performed for speech data associated with each of the detected one or more coinciding segments and the phoneme string of the keyword to set a rank of the coinciding N-best recognition result as the evaluation value.
  - 5. The apparatus according to claim 1, wherein the evaluation value calculation unit is further configured to set, as the evaluation value, an edit distance between a phoneme string or a syllable string that is a 1-best recognition result of phoneme speech recognition with phonemes or syllables as recognition units performed for speech data associated with each of the detected one or more coinciding segments and the phoneme string or the syllable string of the keyword.
  - 6. The apparatus according to claim 5, wherein the edit distance is a distance matched by matching based on dynamic programming.
  - 7. The apparatus according to claim 1, further comprising a word speech recognition unit configured to perform word speech recognition of the speech data to be retrieved, with words as recognition units.
  - 8. The apparatus according to claim 1, further comprising a phoneme speech recognition unit configured to perform phoneme speech recognition of the speech data associated with each of the detected one or more coinciding segments, with phonemes or syllables as recognition units.
  - 9. The apparatus according to claim 1, further comprising a phoneme speech recognition unit configured to perform phoneme speech recognition of the speech data to be retrieved, with phonemes or syllables as recognition units.
  - 10. The apparatus according to claim 1, wherein the evaluation value calculation unit is further configured to calculate the evaluation value of each of the one or more coinciding segments using the character string of the keyword to evaluate the character string in each of the detected one or more coinciding segments.
  - 11. The apparatus according to claim 1, wherein the segment output unit is further configured to adjust the predetermined threshold to alter at least one of a precision value and a recall value of the output segment, the precision value being positively correlated with the predetermined threshold and the recall value being negatively correlated with the predetermined threshold.
  - 12. The apparatus according to claim 11, wherein the precision value is a ratio of retrieval results satisfying a retrieval request to all documents satisfying the retrieval request.
  - 13. The apparatus according to claim 11, wherein the recall value is a ratio of retrieval results satisfying a retrieval request to all retrieval results.
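Claims 4 to 6 name two concrete evaluation values: the rank of the coinciding N-best hypothesis, and an edit distance obtained by dynamic-programming matching. A minimal sketch of both follows. The function names and the list-of-symbols representation are assumptions for illustration, and the edit distance shown is the standard Levenshtein distance with unit costs, which the claims do not mandate.

```python
def edit_distance(a, b):
    """Levenshtein distance between two symbol sequences (dynamic programming)."""
    prev = list(range(len(b) + 1))          # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete x
                curr[j - 1] + 1,            # insert y
                prev[j - 1] + (x != y),     # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]

def nbest_rank(nbest, keyword_phones):
    """1-based rank of the first hypothesis equal to the keyword's phoneme string.

    Returns None when no hypothesis in the N-best list coincides.
    """
    target = list(keyword_phones)
    for rank, hypothesis in enumerate(nbest, start=1):
        if list(hypothesis) == target:
            return rank
    return None
```

For both measures a smaller number indicates a better match, so comparing them against a threshold that must be exceeded requires some sign or scaling convention; the claims as listed here do not spell one out.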

14. A non-transitory computer readable storage medium comprising a computer readable program for a speech retrieval apparatus, wherein the program causes the speech retrieval apparatus to:
- detect one or more coinciding segments for speech data by comparing a character string of a recognition result of word speech recognition and a character string of a keyword, the keyword being designated by the character string and a phoneme string or a syllable string;
  
  calculate an evaluation value of each of the one or more coinciding segments using the phoneme string or the syllable string of the keyword to evaluate a phoneme string or a syllable string recognized in each of the one or more coinciding segments and that is a recognition result of phoneme speech recognition, wherein the phoneme string or the syllable string associated with each of the one or more coinciding segments is a phoneme string or a syllable string associated with a segment in which a start and an end of the segment is expanded by a predetermined time; and
  
  output a segment in which the calculated evaluation value exceeds a predetermined threshold.
  - 15. The non-transitory computer readable storage medium of claim 14, wherein the program further causes the speech retrieval apparatus to set, as the evaluation value, an edit distance between a phoneme string or a syllable string that is a 1-best recognition result of phoneme speech recognition with phonemes or syllables as recognition units performed for speech data associated with each of the detected one or more coinciding segments and the phoneme string or the syllable string of the keyword.
  - 16. The non-transitory computer readable storage medium of claim 15, wherein the edit distance is a distance matched by matching based on dynamic programming.
  - 17. The non-transitory computer readable storage medium of claim 14, wherein the program further causes the speech retrieval apparatus to calculate the evaluation value of each of the one or more coinciding segments using the character string of the keyword to evaluate the character string in each of the detected one or more coinciding segments.
  - 18. The non-transitory computer readable storage medium of claim 14, wherein the program further causes the speech retrieval apparatus to adjust the predetermined threshold to alter at least one of a precision value and a recall value of the output segment, the precision value being positively correlated with the predetermined threshold and the recall value being negatively correlated with the predetermined threshold.
  - 19. The non-transitory computer readable storage medium of claim 18, wherein the precision value is a ratio of retrieval results satisfying a retrieval request to all documents satisfying the retrieval request.
  - 20. The non-transitory computer readable storage medium of claim 18, wherein the recall value is a ratio of retrieval results satisfying a retrieval request to all retrieval results.
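The threshold trade-off in claims 11 and 18 can be illustrated with a small sketch. It assumes the conventional information-retrieval definitions (precision is correct results over all retrieved results, recall is correct results over all correct items), which assign the denominators differently from the wording of claims 12, 13, 19 and 20. The function name, the dictionary of scores, and the value 1.0 returned for an empty set are illustrative conventions only.

```python
def precision_recall(scored, relevant, threshold):
    """Precision and recall of the segments whose value exceeds `threshold`.

    scored:   dict mapping a segment id to its evaluation value
    relevant: set of segment ids that truly contain the keyword
    """
    retrieved = {seg for seg, value in scored.items() if value > threshold}
    correct = len(retrieved & relevant)
    # Convention assumed here: an empty denominator yields 1.0.
    precision = correct / len(retrieved) if retrieved else 1.0
    recall = correct / len(relevant) if relevant else 1.0
    return precision, recall

# Hypothetical evaluation values for four candidate segments.
scored = {"a": 0.9, "b": 0.8, "c": 0.6, "d": 0.4}
relevant = {"a", "b", "d"}

for threshold in (0.3, 0.5, 0.7):
    print(threshold, precision_recall(scored, relevant, threshold))
```

On this toy data, raising the threshold from 0.3 to 0.7 lifts precision from 0.75 to 1.0 while recall falls from 1.0 to about 0.67, which is the direction of correlation the claims describe.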

Current Assignee: SINOEAST CONCEPT LIMITED (Tencent Holdings Limited)
Original Assignee: SINOEAST CONCEPT LIMITED (Tencent Holdings Limited)
Inventors: Nishimura, Masafumi; Kurata, Gakuto; Nagano, Tohru
Primary Examiner(s): ALBERTALLI, BRIAN LOUIS
Application Number: US15/167,522
Publication Number: US 20160275939A1
Time in Patent Office: 326 Days

CPC Class Codes

G10L 15/02   Feature extraction for spee...

G10L 15/04   Segmentation; Word boundary...

G10L 15/08   Speech classification or se...

G10L 15/187   Phonemic context, e.g. pron...

G10L 2015/025   Phonemes, fenemes or fenone...

G10L 2015/027   Syllables being the recogni...

G10L 2015/088   Word spotting

G10L 25/51   for comparison or discrimin...
