Spoken term detection apparatus, method, program, and storage medium

US 8,731,926 B2
Filed: 03/03/2011
Issued: 05/20/2014
Est. Priority Date: 03/04/2010
Status: Expired due to Fees

Abstract

A spoken term detection apparatus in which processing performed by a processor includes: a feature extraction process that extracts an acoustic feature from speech data accumulated in an accumulation part and stores the extracted acoustic feature in an acoustic feature storage; a first calculation process that calculates a standard score from a similarity between an acoustic feature stored in the acoustic feature storage and an acoustic model stored in an acoustic model storage part; a second calculation process that compares an acoustic model corresponding to an input keyword with the acoustic feature stored in the acoustic feature storage to calculate a score of the keyword; and a retrieval process that retrieves speech data including the keyword from the speech data accumulated in the accumulation part, based on the score of the keyword calculated by the second calculation process and the standard score stored in a standard score storage part.
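The flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes each frame already carries a log-likelihood per phoneme, takes the standard score as the best phoneme score in each frame, and gives every keyword phoneme a fixed duration instead of performing a proper alignment. The names `standard_scores`, `keyword_score`, `retrieve`, and `frames_per_phoneme`, along with all numbers, are hypothetical.

```python
from typing import Dict, List

# One frame maps each phoneme label to a log-likelihood under the acoustic
# model. Labels and values here are illustrative assumptions only.
Frame = Dict[str, float]


def standard_scores(frames: List[Frame]) -> List[float]:
    """First calculation: best phoneme log-likelihood in each frame."""
    return [max(frame.values()) for frame in frames]


def keyword_score(frames: List[Frame], phonemes: List[str],
                  start: int, frames_per_phoneme: int) -> float:
    """Second calculation: sum the keyword's phoneme scores from `start`,
    assuming a fixed duration per phoneme (a simplification)."""
    total = 0.0
    for i, phoneme in enumerate(phonemes):
        first = start + i * frames_per_phoneme
        for t in range(first, first + frames_per_phoneme):
            total += frames[t][phoneme]
    return total


def retrieve(frames: List[Frame], standard: List[float], phonemes: List[str],
             frames_per_phoneme: int, threshold: float) -> List[int]:
    """Retrieval: keep start frames where the keyword score is within
    `threshold` (averaged per frame) of the stored standard score."""
    span = len(phonemes) * frames_per_phoneme
    if span == 0:
        return []
    hits = []
    for start in range(len(frames) - span + 1):
        kw = keyword_score(frames, phonemes, start, frames_per_phoneme)
        ref = sum(standard[start:start + span])
        if (ref - kw) / span <= threshold:
            hits.append(start)
    return hits


frames = [
    {"a": -1.0, "b": -5.0, "c": -5.0},
    {"a": -5.0, "b": -1.0, "c": -5.0},
    {"a": -5.0, "b": -5.0, "c": -1.0},
    {"a": -1.0, "b": -5.0, "c": -5.0},
]
standard = standard_scores(frames)  # computed once, before any keyword arrives
print(retrieve(frames, standard, ["b", "c"], 1, 0.5))  # prints [1]
```

In this sketch the standard scores are computed once when the speech data is stored, so a later keyword search only has to score the keyword itself and compare it against the stored values.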

11 Claims

1. A spoken term detection apparatus, comprising:
- a storage unit and a processor, wherein
  - the storage unit includes
    - an accumulation part to accumulate speech data of a retrieval target,
    - an acoustic model storage section to store an acoustic model retaining a characteristic in an acoustic feature space for each unit of speech recognition,
    - an acoustic feature storage to store an acoustic feature extracted from the speech data, and
    - a standard score storage part to store a standard score calculated from a similarity between the acoustic feature and the acoustic model, wherein
  - processing performed by the processor includes
    - a feature extraction process to extract an acoustic feature from speech data accumulated in the accumulation part and store an extracted acoustic feature in the acoustic feature storage,
    - a first calculation process to calculate the standard score from a similarity between an acoustic feature stored in the acoustic feature storage and an acoustic model stored in the acoustic model storage part,
    - an acceptance process to accept an input keyword,
    - a second calculation process to compare an acoustic model corresponding to an accepted keyword with the acoustic feature stored in the acoustic feature storage part to calculate a score of the keyword, and
    - a retrieval process to retrieve speech data including the keyword from speech data accumulated in the accumulation part based on the score of the keyword calculated by the second calculation process and the standard score stored in the standard score storage part, wherein
  - the standard score equates to the highest-likelihood phoneme series.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
  - 2. The spoken term detection apparatus according to claim 1, wherein
    - the feature extraction process divides speech data by specified time T, and extract an acoustic feature of speech data for each time T,
    - the first calculation process calculates the similarity for each time T, and
    - the standard score storage part stores the similarity calculated by the first calculation process for each time T, and the retrieval process compares the score calculated by the second calculation process with the standard score stored in the standard score storage part by the time T so as to retrieve the speech data.
  - 3. The spoken term detection apparatus according to claim 1, wherein
    - the acoustic model storage part stores a plurality of acoustic models, and
    - the first calculation process and the second calculation process are able to use different acoustic models, respectively.
  - 4. The spoken term detection apparatus according to claim 1, further comprising:
    - a language determination process to determine a language that corresponds to the accepted keyword, wherein
    - the accumulation part accumulates speech data including a plurality of languages,
    - the acoustic model storage part stores an acoustic model corresponding to each of the plurality of languages,
    - the feature extraction process extracts an acoustic feature of the speech data for each language,
    - the first calculation process calculates a score of the speech data for each language by using each acoustic model,
    - the standard score storage part stores top N (N > 1) scores among scores calculated for each language by the first calculation process, and
    - the retrieval process is stored in the standard score storage part and performs retrieval using a score corresponding to the language determined by the language determination process.
  - 5. The spoken term detection apparatus according to claim 1, wherein when a difference between the score of the keyword calculated by the second calculation process and the score stored in the standard score storage part in an arbitrary section of the speech data is not more than a threshold or less than the threshold, the section is retrieved as a keyword-existing section.
  - 6. The spoken term detection apparatus according to claim 5, further comprising:
    - an adjusting process to adjust the threshold in response to a phoneme of the stored score.
  - 7. The spoken term detection apparatus according to claim 5, further comprising:
    - an adjusting process to adjust the threshold in response to a phoneme of the score calculated by the second calculation process.
  - 8. The spoken term detection apparatus according to claim 1, wherein the feature extraction process extracts an acoustic feature from newly accumulated speech data every time the new speech data is accumulated in the accumulation part.
  - 9. The spoken term detection apparatus according to claim 1, further including:
    - a determination process to determine whether speech data from which the feature extraction process does not extract an acoustic feature is accumulated in the accumulation part when the retrieval process performs retrieval, and
    - a request process to request extraction of an acoustic feature to the feature extraction process when it is determined that there is no accumulation.
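Claims 5 to 7 describe retrieving a section when the gap between the stored standard score and the keyword score stays within a threshold, and adjusting that threshold according to the phonemes involved. The following Python sketch shows one way that could look; the claims do not specify how the adjustment is computed, so the per-phoneme offset table and the averaging rule are assumptions, and the names `adjusted_threshold` and `is_keyword_section` are hypothetical.

```python
from typing import Dict, List


def adjusted_threshold(base: float, phonemes: List[str],
                       offsets: Dict[str, float]) -> float:
    """Shift the base threshold by the mean per-phoneme offset.

    `offsets` is an assumed lookup table; phonemes missing from it
    contribute no adjustment.
    """
    if not phonemes:
        return base
    mean_offset = sum(offsets.get(p, 0.0) for p in phonemes) / len(phonemes)
    return base + mean_offset


def is_keyword_section(keyword_score: float, standard_score: float,
                       threshold: float) -> bool:
    """A section counts as containing the keyword when the keyword score
    falls short of the standard score by no more than the threshold."""
    return (standard_score - keyword_score) <= threshold


threshold = adjusted_threshold(1.0, ["s", "a"], {"s": 0.5})
print(threshold)                                    # prints 1.25
print(is_keyword_section(-10.0, -9.0, threshold))   # prints True
print(is_keyword_section(-12.0, -9.0, threshold))   # prints False
```

Loosening the threshold for phonemes that tend to score poorly is one plausible motivation for the adjustment; the listing itself does not state the rationale.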

10. A spoken term detection method of retrieving speech data including an accepted keyword using an acoustic model holding a characteristic in an acoustic feature space for each unit of speech recognition, comprising:
- extracting an acoustic feature from accumulated speech data;
- storing an extracted acoustic feature in an acoustic features storing device;
- calculating a standard score from a similarity between a stored acoustic feature and an acoustic feature defined by a stored acoustic model;
- storing the calculated standard score;
- accepting a keyword;
- calculating a score of a keyword by comparing an acoustic model corresponding to the keyword with the acoustic feature stored in the acoustic features storing device; and
- executing a process for retrieving speech data including the keyword from the accumulated speech data, based on a calculated score of the keyword and the standard scored, wherein
- the standard score equates to the highest-likelihood phoneme series.

11. A computer-readable storage medium storing a program to be executed by a computer, wherein the program is a program to be executed by a computer in which speech data is accumulated by an accumulation device and an acoustic model retaining a characteristic in an acoustic feature space for each unit of speech recognition is stored in an acoustic features storing device, and the program allows the computer to execute:
- an extraction process for extracting an acoustic feature from the accumulated speech data;
- a first calculation process for calculating a standard score from a similarity between the extracted acoustic feature and an acoustic feature defined by the stored acoustic model;
- a second calculation process for comparing an acoustic model corresponding with the acoustic feature stored in the acoustic features storing device to calculate an accepted keyword to calculate a score of the keyword; and
- a retrieval process for retrieving speech data including the keyword from speech data accumulated in the accumulation device based on the score of the keyword calculated by the second calculation process and the calculated standard scored, wherein
- the standard score equates to the highest-likelihood phoneme series.

Current Assignee: Fujitsu Limited
Original Assignee: Fujitsu Limited
Inventors: Washio, Nobuyuki; Harada, Shouji
Primary Examiner(s): GUERRA-ERAZO, EDGAR X
Application Number: US13/039,495
Publication Number: US 20110218805A1
Time in Patent Office: 1,174 Days
Field of Search: 704/251, 704/252, 704/253, 704/254, 704/235, 704/243, 704/270, 704/270.1, 704/275, 704/246, 704/255
US Class Current: 704/251
CPC Class Codes:
- G10L 15/00   Speech recognition G10L17/0...
- G10L 15/08   Speech classification or se...
- G10L 15/26   Speech to text systems G10L...
- G10L 2015/088   Word spotting
