Voice recognition device and voice recognition method

US 10,665,227 B2
Filed: 08/10/2017
Issued: 05/26/2020
Est. Priority Date: 09/15/2016
Status: Active Grant

Abstract

A voice recognition device extracts, from a first voice signal of a user, a first string of phonemes included in the first voice signal, extracts, from a second voice signal of the user, a second string of phonemes included in the second voice signal, extracts a string of common phonemes from the first string and the second string, calculates, for each of a plurality of registered keywords, a degree of similarity between a string of phonemes corresponding to the keyword and the string of common phonemes, and selects, among the plurality of keywords, a prescribed number of keywords based on the degree of similarity for each keyword.
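The flow in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the "string of common phonemes" is read here as a longest common subsequence of the two phoneme strings, the degree of similarity is stood in for by `difflib.SequenceMatcher.ratio()`, and the function names `common_phonemes` and `select_keywords` and the sample keywords are hypothetical.

```python
from difflib import SequenceMatcher


def common_phonemes(first, second):
    """Longest common subsequence of two phoneme lists.

    Assumption: LCS is one plausible reading of "string of common phonemes".
    """
    rows, cols = len(first), len(second)
    # table[i][j] = LCS length of first[i:] and second[j:]
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if first[i] == second[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the start to recover one LCS.
    common, i, j = [], 0, 0
    while i < rows and j < cols:
        if first[i] == second[j]:
            common.append(first[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return common


def select_keywords(common, keywords, count):
    """Return the `count` keyword names most similar to `common`.

    Assumption: SequenceMatcher.ratio() stands in for the degree of similarity.
    """
    scored = [
        (SequenceMatcher(None, phonemes, common).ratio(), name)
        for name, phonemes in keywords.items()
    ]
    # Highest similarity first; ties broken by name for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:count]]


# Hypothetical registered keywords, each given as a list of phonemes.
KEYWORDS = {
    "jitaku": ["j", "i", "t", "a", "k", "u"],
    "genzaichi": ["g", "e", "N", "z", "a", "i", "ch", "i"],
    "rajio": ["r", "a", "j", "i", "o"],
}

first_string = ["j", "i", "t", "a", "k", "u"]
second_string = ["z", "i", "t", "a", "k", "o"]
candidates = select_keywords(
    common_phonemes(first_string, second_string), KEYWORDS, 2
)
```

Working from the phonemes shared by both utterances means that recognition errors occurring in only one of them do not influence the ranking.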

13 Citations

23 Claims

1. A non-transitory computer-readable recording medium having recorded thereon a computer program for voice recognition that causes a computer to execute a process comprising:
- extracting, from a first voice signal of a user, a first string of phonemes included in the first voice signal;
  
  determining whether or not any keyword among a plurality of registered keywords stored in a memory is detected in the first string;
  
  when any keyword is detected in the first string, outputting information representing the detected keyword;
  
  when any keyword is not detected, storing the first string;
  
  extracting, from a second voice signal of the user, a second string of phonemes included in the second voice signal;
  
  determining whether or not any keyword among the plurality of registered keywords is detected in the second string;
  
  storing the second string when any keyword is not detected in the second string;
  
  extracting a string of common phonemes from the first string and the second string;
  
  calculating, for each of the plurality of registered keywords, a first degree of similarity between a string of phonemes corresponding to the keyword and the string of common phonemes; and
  
  selecting, among the plurality of keywords, a prescribed number of keywords based on the first degree of similarity for each keyword, wherein determination of whether or not any keyword is detected in the first string includes:
  
  calculating, for each of the plurality of registered keywords, a second degree of similarity between a string of phonemes corresponding to the keyword and the first string of phonemes based on a number of coincident phonemes between the first string of phonemes and the string of phonemes corresponding to the keyword, a number of phonemes that are included in the string of phonemes corresponding to the keyword but not included in the first string of phonemes, and a number of phonemes that are included in the string of phonemes corresponding to the keyword and are different from phonemes at corresponding positions in the first string of phonemes; and
  
  determining that, when a maximum value among the second degrees of similarity is larger than a predetermined threshold value, the keyword corresponding to the maximum value is detected in the first string.
  - 2. The non-transitory computer-readable recording medium having recorded thereon the computer program for voice recognition according to claim 1, wherein selection of the predetermined number of keywords includes selecting the prescribed number of keywords among the plurality of keywords in descending order of the first degree of similarity for each keyword.
  - 3. The non-transitory computer-readable recording medium having recorded thereon the computer program for voice recognition according to claim 1, wherein extraction of the string of common phonemes includes extracting the string of common phonemes after deleting a phoneme representing silence from each of the first string and the second string.
  - 4. The non-transitory computer-readable recording medium having recorded thereon the computer program for voice recognition according to claim 1, wherein extraction of the string of common phonemes includes extracting the string of common phonemes after deleting a phoneme included in only one of the first string and the second string from each of the first string and the second string.
  - 5. The non-transitory computer-readable recording medium having recorded thereon the computer program for voice recognition according to claim 1, wherein extraction of the string of common phonemes includes extracting the string of common phonemes after substituting, for each of the first string and the second string, a phoneme that is included in the string and which belongs to a phoneme group whose phonemes can be substituted with one another with a representative phoneme associated with the phoneme group.
  - 6. The non-transitory computer-readable recording medium having recorded thereon the computer program for voice recognition according to claim 1, the process further comprises detecting a first voice section in which the user utters in the first voice signal and detecting a second voice section in which the user utters in the second voice signal, wherein extraction of the first string includes extracting a string of phonemes included in the first voice section as the first string, and extraction of the second string includes extracting a string of phonemes included in the second voice section as the second string.
  - 7. The non-transitory computer-readable recording medium having recorded thereon the computer program for voice recognition according to claim 1, wherein calculation of the first degree of similarity includes calculating, for each of the plurality of keywords, an edit distance between a string of phonemes corresponding to the keyword and the string of common phonemes and, based on the edit distance, calculating the first degree of similarity.
  - 8. The non-transitory computer-readable recording medium having recorded thereon the computer program for voice recognition according to claim 7, wherein calculation of the first degree of similarity includes calculating, for each of the plurality of keywords, a minimum value of the edit distance using dynamic programming matching and, based on the minimum value, calculating the first degree of similarity.
  - 9. The non-transitory computer-readable recording medium having recorded thereon the computer program for voice recognition according to claim 7, wherein calculation of the first degree of similarity includes calculating, for each of the plurality of keywords, a minimum value of the edit distance using dynamic programming matching and, based on a degree of coincidence between a string of phonemes corresponding to the keyword when the edit distance takes the minimum value and the string of common phonemes, calculating the first degree of similarity.
  - 10. The non-transitory computer-readable recording medium having recorded thereon the computer program for voice recognition according to claim 1, the process further comprises extracting, from a third voice signal representing a voice of the user, a third string of phonemes included in the third voice signal, wherein extraction of the string of common phoneme includes extracting a string of common phonemes to the first string, the second string, and the third string.
  - 11. The non-transitory computer-readable recording medium having recorded thereon the computer program for voice recognition according to claim 1, the process further comprises presenting the selected prescribed number of keywords to the user.
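
Claims 7 and 8 derive the first degree of similarity from an edit distance computed by dynamic programming. A minimal sketch follows, assuming a plain Levenshtein distance with unit costs and a normalization by the longer string's length; neither the cost scheme nor the normalization is stated in the claims, and the function names are hypothetical.

```python
def edit_distance(keyword, common):
    """Levenshtein distance between two phoneme lists (unit costs assumed)."""
    # prev[j] = distance between the keyword prefix processed so far
    # and the first j phonemes of `common`.
    prev = list(range(len(common) + 1))
    for i, kp in enumerate(keyword, 1):
        cur = [i]
        for j, cp in enumerate(common, 1):
            cost = 0 if kp == cp else 1
            cur.append(min(
                prev[j] + 1,         # keyword phoneme has no counterpart
                cur[j - 1] + 1,      # extra phoneme in `common`
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = cur
    return prev[-1]


def first_similarity(keyword, common):
    """Map the edit distance to [0, 1]; this normalization is an assumption."""
    longest = max(len(keyword), len(common))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(keyword, common) / longest
```

With this normalization, a keyword identical to the common phoneme string scores 1.0 and a keyword sharing no phonemes with it scores 0.0, so ranking keywords in descending order of the score (claim 2) puts the closest candidates first.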

12. A voice recognition device comprising:
- a memory configured to store a plurality of registered keywords; and
  
  a processor configured to:
  
  extract, from a first voice signal of a user, a first string of phonemes included in the first voice signal;
  
  determine whether or not any keyword among a plurality of registered keywords is detected in the first string;
  
  when any keyword is detected in the first string, output information representing the detected keyword;
  
  when any keyword is not detected in the first string, store the first string;
  
  extract, from a second voice signal of the user, a second string of phonemes included in the second voice signal;
  
  determine whether or not any keyword among a plurality of registered keywords is detected in the second string;
  
  store the second string when any keyword is not detected in the second string;
  
  extract a string of common phonemes from the first string and the second string;
  
  calculate, for each of the plurality of registered keywords, a first degree of similarity between a string of phonemes corresponding to the keyword and the string of common phonemes; and
  
  select, among the plurality of keywords, a prescribed number of keywords based on the first degree of similarity for each keyword, wherein the processor for the determination of whether or not any keyword is detected executes to:
  
  calculate, for each of the plurality of registered keywords, a second degree of similarity between a string of phonemes corresponding to the keyword and the first string of phonemes based on a number of coincident phonemes between the first string of phonemes and the string of phonemes corresponding to the keyword, a number of phonemes that are included in the string of phonemes corresponding to the keyword but not included in the first string of phonemes, and a number of phonemes that are included in the string of phonemes corresponding to the keyword and are different from phonemes at corresponding positions in the first string of phonemes; and
  
  determine that, when a maximum value among the second degrees of similarity is larger than a predetermined threshold value, the keyword corresponding to the maximum value is detected in the first string.
  - 13. The voice recognition device according to claim 12, wherein selection of the predetermined number of keywords includes selecting the prescribed number of keywords among the plurality of keywords in descending order of the first degree of similarity for each keyword.
  - 14. The voice recognition device according to claim 12, wherein extraction of the string of common phonemes includes extracting the string of common phonemes after deleting a phoneme representing silence from each of the first string and the second string.
  - 15. The voice recognition device according to claim 12, wherein extraction of the string of common phonemes includes extracting the string of common phonemes after deleting a phoneme included in only one of the first string and the second string from each of the first string and the second string.
  - 16. The voice recognition device according to claim 12, wherein extraction of the string of common phonemes includes extracting the string of common phonemes after substituting, for each of the first string and the second string, a phoneme that is included in the string and which belongs to a phoneme group whose phonemes can be substituted with one another with a representative phoneme associated with the phoneme group.
  - 17. The voice recognition device according to claim 12, wherein the processor is further configured to detect a first voice section in which the user utters in the first voice signal and detect a second voice section in which the user utters in the second voice signal, wherein extraction of the first string includes extracting a string of phonemes included in the first voice section as the first string, and extraction of the second string includes extracting a string of phonemes included in the second voice section as the second string.
  - 18. The voice recognition device according to claim 12, wherein calculation of the first degree of similarity includes calculating, for each of the plurality of keywords, an edit distance between a string of phonemes corresponding to the keyword and the string of common phonemes and, based on the edit distance, calculating the first degree of similarity.
  - 19. The voice recognition device according to claim 18, wherein calculation of the first degree of similarity includes calculating, for each of the plurality of keywords, a minimum value of the edit distance using dynamic programming matching and, based on the minimum value, calculating the first degree of similarity.
  - 20. The voice recognition device according to claim 18, wherein calculation of the first degree of similarity includes calculating, for each of the plurality of keywords, a minimum value of the edit distance using dynamic programming matching and, based on a degree of coincidence between a string of phonemes corresponding to the keyword when the edit distance takes the minimum value and the string of common phonemes, calculating the first degree of similarity.
  - 21. The voice recognition device according to claim 12, wherein the processor is further configured to extract, from a third voice signal representing a voice of the user, a third string of phonemes included in the third voice signal, wherein extraction of the string of common phoneme includes extracting a string of common phonemes to the first string, the second string, and the third string.
  - 22. The voice recognition device according to claim 12, wherein the processor is further configured to present the selected prescribed number of keywords to the user.
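
The preprocessing options in claims 14 to 16 (deleting silence, deleting phonemes found in only one string, and substituting a representative phoneme for each phoneme group) can be illustrated as below. The silence symbol `"sil"`, the contents of `PHONEME_GROUPS`, and the function names are hypothetical; the claims present these steps as separate options, and chaining them here is only for demonstration.

```python
SILENCE = "sil"  # hypothetical symbol for the silence phoneme

# Hypothetical phoneme groups: each member maps to its group's representative.
PHONEME_GROUPS = {
    "b": "b", "p": "b",
    "m": "m", "n": "m",
}


def normalize(phonemes):
    """Delete silence, then replace group members with their representative."""
    return [PHONEME_GROUPS.get(p, p) for p in phonemes if p != SILENCE]


def drop_unshared(first, second):
    """Delete from each string every phoneme that occurs in only one of them."""
    shared = set(first) & set(second)
    return (
        [p for p in first if p in shared],
        [p for p in second if p in shared],
    )


first_string = normalize(["sil", "p", "a", "n", "sil"])
second_string = normalize(["b", "a", "m", "u"])
first_clean, second_clean = drop_unshared(first_string, second_string)
```

After these steps the two strings differ only where the utterances genuinely disagree on shared phonemes, which simplifies the subsequent extraction of the common phoneme string.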

23. A voice recognition method comprising:
- extracting, from a first voice signal of a user, a first string of phonemes included in the first voice signal;
  
  determining whether or not any keyword among a plurality of registered keywords stored in a memory is detected in the first string;
  
  when any keyword is detected in the first string, outputting information representing the detected keyword;
  
  when any keyword is not detected, storing the first string;
  
  extracting, from a second voice signal of the user, a second string of phonemes included in the second voice signal;
  
  determining whether or not any keyword among a plurality of registered keywords is detected in the second string;
  
  storing the second string when any keyword is not detected in the second string;
  
  extracting a string of common phonemes from the first string and the second string; and
  
  calculating, with respect to each of the plurality of registered keywords, a first degree of similarity between a string of phonemes corresponding to the keyword and the string of common phonemes and, among the plurality of keywords, selecting a prescribed number of keywords based on the first degree of similarity for each keyword, wherein determination of whether or not any keyword is detected in the first string includes:
  
  calculating, for each of the plurality of registered keywords, a second degree of similarity between a string of phonemes corresponding to the keyword and the first string of phonemes based on a number of coincident phonemes between the first string of phonemes and the string of phonemes corresponding to the keyword, a number of phonemes that are included in the string of phonemes corresponding to the keyword but not included in the first string of phonemes, and a number of phonemes that are included in the string of phonemes corresponding to the keyword and are different from phonemes at corresponding positions in the first string of phonemes; and
  
  determining that, when a maximum value among the second degrees of similarity is larger than a predetermined threshold value, the keyword corresponding to the maximum value is detected in the first string.
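
The detection step in the independent claims bases the second degree of similarity on three counts: coincident phonemes (C), keyword phonemes missing from the utterance (D), and keyword phonemes that differ from the phoneme at the corresponding position (S). The sketch below assumes the counts come from a minimum-edit alignment and that the score is C / (C + D + S); the claims do not state either choice, and all names are hypothetical.

```python
def alignment_counts(keyword, spoken):
    """Return (coincident, missing, substituted) for a minimum-edit alignment."""
    # Each cell holds (cost, coincident, missing, substituted).
    prev = [(j, 0, 0, 0) for j in range(len(spoken) + 1)]
    for i, kp in enumerate(keyword, 1):
        cur = [(i, 0, i, 0)]  # all keyword phonemes so far are missing
        for j, sp in enumerate(spoken, 1):
            diag = prev[j - 1]
            if kp == sp:
                best = (diag[0], diag[1] + 1, diag[2], diag[3])
            else:
                best = (diag[0] + 1, diag[1], diag[2], diag[3] + 1)
            up = prev[j]  # keyword phoneme missing from the utterance
            candidate = (up[0] + 1, up[1], up[2] + 1, up[3])
            if candidate[0] < best[0]:
                best = candidate
            left = cur[j - 1]  # extra spoken phoneme, not counted
            candidate = (left[0] + 1, left[1], left[2], left[3])
            if candidate[0] < best[0]:
                best = candidate
            cur.append(best)
        prev = cur
    _, coincident, missing, substituted = prev[-1]
    return coincident, missing, substituted


def second_similarity(keyword, spoken):
    """Assumed score: C / (C + D + S)."""
    c, d, s = alignment_counts(keyword, spoken)
    total = c + d + s
    return c / total if total else 0.0


def detect_keyword(spoken, keywords, threshold):
    """Return the best keyword if its score exceeds `threshold`, else None."""
    best_name, best_score = None, float("-inf")
    for name, phonemes in keywords.items():
        score = second_similarity(phonemes, spoken)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score > threshold else None
```

When `detect_keyword` returns `None`, the claimed process stores the phoneme string so that it can later be compared with the string from the user's next utterance.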

Current Assignee: Fujitsu Limited
Original Assignee: Fujitsu Limited
Inventors: Hayakawa, Shoji
Primary Examiner(s): Shah, Paras D
Application Number: US15/673,830
Publication Number: US 20180075843A1
Time in Patent Office: 1,020 Days

CPC Class Codes

G10L 15/02   Feature extraction for spee...
G10L 15/08   Speech classification or se...
G10L 15/22   Procedures used during a sp...
G10L 2015/025   Phonemes, fenemes or fenone...
G10L 2015/088   Word spotting