Speech recognition system and method for speech recognition

US 8,346,553 B2
Filed: 02/21/2008
Issued: 01/01/2013
Est. Priority Date: 03/16/2007
Status: Expired due to Fees

Abstract

A recognition result extraction unit and an agreement determination unit are provided. The recognition result extraction unit extracts, from a recognition result storage unit, N best solutions A and N best solutions B, the latter obtained from an utterance B. Utterance B follows utterance A, which corresponds to the N best solutions A, and is made by a speaker b who is different from the speaker of utterance A. When a repeat utterance determination unit determines that the N best solutions B were obtained from a repeat utterance B made in response to utterance A, and the best solutions A and B differ from each other, the agreement determination unit determines that some or all of the N best solutions A can be replaced with some or all of the N best solutions B.
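The agreement check described in the abstract can be illustrated with a minimal Python sketch. Everything named here (`NBest`, `can_replace`, `correct`) is hypothetical and not taken from the patent; the repeat-utterance decision is passed in as a flag, and overwriting only the top entry is one assumed reading of "some or all".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NBest:
    """N best solutions for one utterance, best first (hypothetical type)."""
    speaker: str
    solutions: List[str]


def can_replace(preceding: NBest, following: NBest, is_repeat: bool) -> bool:
    """True when the preceding result is a candidate for replacement."""
    # The following utterance must come from a different speaker.
    if preceding.speaker == following.speaker:
        return False
    # It must have been judged a repeat of the preceding utterance.
    if not is_repeat:
        return False
    # Replacement is only considered when the two best solutions disagree.
    return preceding.solutions[0] != following.solutions[0]


def correct(preceding: NBest, following: NBest, is_repeat: bool) -> NBest:
    """Assumed correction policy: overwrite only the preceding best solution."""
    if can_replace(preceding, following, is_repeat):
        updated = [following.solutions[0]] + preceding.solutions[1:]
        return NBest(preceding.speaker, updated)
    return preceding
```

For example, if speaker a's utterance was recognized as `["PC", "PT", "GC"]` and speaker b repeats it with best solution `"PT"`, `correct` returns a result whose best solution is `"PT"`.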

17 Claims

1. A speech recognition system comprising:
- an identifier for adding an identifying code to utterance data corresponding to signals generated by utterances of each of a plurality of users, the identifying code being available for identifying each of the users;
  
  a calculator for rating the utterance data by a value for each of the identifying codes, the value being determined on the basis of comparison of characteristics of the utterance data with characteristics of word information selected from a plurality of sets of word information stored;
  
  storage for storing N pieces of vocabulary information corresponding to N sets of the utterance data, the utterance data having a same identifying code, the N sets of utterance data having the value within top N, N being an integer equal to one or more;
  
  a selector for selecting posterior N pieces of word information posterior in time to prior N pieces of word information, the identifying codes of the utterance data relative to the posterior and prior N pieces of word information being spoken by the users that are different from each other;
  
  a relational calculator for calculating a degree of relationship between the prior and posterior N pieces of word information, the degree of relationship being capable of rating a fact of the utterance relative to the posterior N pieces of word information being performed later than the utterance relative to the prior N pieces of word information;
  
  a first determiner for determining the posterior N pieces of word information corresponding to an utterance performed later than the utterance relative to the prior N pieces of word information; and
  
  a second determiner for determining the posterior N pieces of word relative to an utterance as a response to the utterance relating to the prior N pieces of word information on the basis of a predetermined condition.
- - 2. A speech recognition system according to claim 1, further comprising:
    - a third determiner for determining whether a first prior word information of the prior N pieces of word information agrees with a first posterior word of the posterior N pieces of word information, the first prior word information corresponding to the utterance data having a highest value within data relative to the prior N pieces of word information, the first posterior word information corresponding to the utterance data having a highest value within data relative to the posterior N pieces of word information.
  - 3. A speech recognition system according to claim 2, further comprising:
    - a replacer for replacing the first prior word information of the prior N pieces of word information with the first posterior word of the posterior N pieces of word information in the case that the first prior and posterior vocabularies information disagree.
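As a rough illustration of the calculator, storage, and selector elements of claim 1, the sketch below rates an utterance against a stored vocabulary, keeps the top N entries, and picks the next stored result from a different speaker. The Jaccard overlap used as the rating value, the set-of-features representation, and all names (`StoredResult`, `rate`, `top_n`, `select_posterior`) are assumptions made for the example, not details from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class StoredResult:
    speaker_id: str                   # identifying code added by the identifier
    time: float                       # when the utterance was recognized
    n_best: List[Tuple[str, float]]   # (word, value), best first


def rate(features: Set[str], vocabulary: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Toy rating: Jaccard overlap between utterance and word features."""
    scores = []
    for word, word_features in vocabulary.items():
        union = features | word_features
        value = len(features & word_features) / len(union) if union else 0.0
        scores.append((word, value))
    return scores


def top_n(scores: List[Tuple[str, float]], n: int) -> List[Tuple[str, float]]:
    """Keep the N highest-valued entries, best first."""
    return sorted(scores, key=lambda ws: ws[1], reverse=True)[:n]


def select_posterior(store: List[StoredResult], prior: StoredResult) -> Optional[StoredResult]:
    """Earliest stored result later than `prior` and from a different speaker."""
    later = [r for r in store
             if r.time > prior.time and r.speaker_id != prior.speaker_id]
    return min(later, key=lambda r: r.time, default=None)
```

Keeping the speaker identifier and a timestamp on each stored result is what lets the selector skip later utterances by the same speaker.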

4. A speech recognition system comprising:
- an input identification means for identifying each of a plurality of users of received signals of utterance;
  
  recognition result storage for storing top N recognition vocabularies having high recognition scores starting from the best solution as N best solutions, N being an integer equal to one or more, the recognition scores being calculated by comparing data corresponding to the utterance with a plurality of recognition vocabularies, a recognition word having the highest recognition score being the best solution;
  
  a recognition result extraction means for extracting N best solutions extracted as following N best solutions from the recognition result storage, the following N best solutions following chronologically the utterance corresponding to a preceding N best solutions, the following N best solutions having been made by one of the users different from the user of the utterance corresponding to the preceding N best solutions;
  
  a degree of association calculation means for calculating a degree of association representing a likelihood that the following N best solutions are N best solutions obtained by a response utterance in response to the utterance corresponding to the preceding N best solutions;
  
  a response utterance determination means for determining that the following N best solutions are N best solutions obtained by a response utterance in response to the utterance corresponding to the preceding N best solutions in the case of the degree of association being equal to or more than a threshold value;
  
  a repeat utterance determination means for determining whether the following N best solutions are N best solutions obtained by a repeat utterance in response to the utterance corresponding to the preceding N best solutions, in the case that the following N best solutions are N best solutions obtained by a response utterance in response to the utterance corresponding to the preceding N best solutions; and
  
  an agreement determination means for:
  
  determining whether a preceding best solution and a following best solution agree with each other in the case of the following N best solutions being best solutions obtained by a repeat utterance in response to the utterance corresponding to the preceding N best solutions, the preceding best solution being a best solution of the preceding N best solutions, the following best solution being a best solution of the following N best solutions; and
  
  determining that some or all of the preceding N best solutions can be replaced with some or all of the following N best solutions in the case that the preceding best solution and the following best solution do not agree with each other.
- - 5. The speech recognition system according to claim 4, further comprising:
    - a recognition result correction means for updating the preceding best solution in the recognition result storage to the following best solution, the recognition storage storing the preceding N best solutions, in the case that the agreement determination means determines that the preceding best solution and the following best solution do not agree with each other; and
      
      a result output means for outputting the following best solution updated by the recognition result correction means.
  - 6. The speech recognition system according to claim 5, further comprising:
    - a correction history generating means for generating history data representing a history of updates applied by the recognition result correction means; and
      
      an output presentation means for presenting the history data generated by the correction history generating means.
  - 7. The speech recognition system according to claim 4, wherein, in the case where the response utterance determination means determines that the following N best solutions are N best solutions obtained by a response utterance in response to the utterance corresponding to the preceding N best solutions, when the recognition score of the following best solution of the following N best solutions is equal to or more than a predetermined recognition score and when both a first and second conditions are satisfied, the repeat utterance determination means determines that the following N best solutions are N best solutions obtained by a repeat utterance in response to the utterance corresponding to the preceding N best solutions, the first condition being that a solution in the preceding N best solutions agrees with the following best solution of the following N best solutions, the second condition being that the recognition score of the aforementioned solution in the preceding N best solutions, which agrees with the following best solution, is equal to or more than a predetermined recognition score, or the aforementioned solution in the preceding N best solutions is placed in a preset rank relative to the preceding best solution or higher, the aforementioned solution agreeing with the following best solution.
  - 8. The speech recognition system according to claim 4, the system further comprising:
    - a co-occurrence information storage that stores co-occurrence information representing co-occurrence relationships between recognition vocabularies and/or a semantic attribute storage that stores semantic attributes representing the meanings of recognition vocabularies, and
      
      a comparison process changing means for changing a method for comparing an utterance with a plurality of recognition vocabularies on the basis of the co-occurrence information and/or the semantic attributes in the case of the preceding best solution and the following best solution being coincident with each other.
  - 9. The speech recognition system according to claim 4, wherein the degree of association calculation means calculates a degree of association on the basis of at least one of:
    - the number of solutions in which individual solutions in the preceding N best solutions agree with individual solutions in the following N best solutions;
      
      differences between the ranks based on the recognition scores in the preceding N best solutions and the ranks based on the recognition scores in the following N best solutions, individual solutions in the preceding N best solutions being coincident with individual solutions in the following N best solutions;
      
      a time difference between time at which the preceding N best solutions have been output and time at which the following N best solutions have been output;
      
      differences between positions on the time series at which the plurality of groups of the preceding N best solutions appear and the positions on the time series at which the plurality of groups of the following N best solutions appear, in a case that a plurality of groups of the preceding N best solutions are obtained by comparing a first utterance with a plurality of recognition vocabularies, and a plurality of groups of the following N best solutions are obtained by comparing a second utterance made by a user who is different from a user of the first utterance with the plurality of recognition vocabularies.
  - 10. The speech recognition system according to claim 9, wherein, the larger the number of solutions, in which the individual solutions in the preceding N best solutions agree with the individual solutions in the following N best solutions, and the smaller the differences between, regarding the solutions, in which the individual solutions in the preceding N best solutions agree with the individual solutions in the following N best solutions, the ranks based on the recognition scores in the preceding N best solutions and the ranks based on the recognition scores in the following N best solutions, the higher the degree of association calculated by the degree of association calculation means.
  - 11. The speech recognition system according to claim 9, further comprising:
    - a time information control means for assigning time information representing a current time to the N best solutions, and for writing the N best solutions including the time information assigned to the recognition result storage, wherein, the smaller the time difference between the current time represented by time information assigned to the preceding N best solutions and the current time represented by time information assigned to the following N best solutions, the higher the degree of association calculated by the degree-of-association calculation means.
  - 12. The speech recognition system according to claim 9, wherein, in a case where a plurality of groups of the preceding N best solutions are obtained by comparing a first utterance with a plurality of recognition vocabularies, and a plurality of groups of the following N best solutions are obtained by comparing a second utterance made by a user who is different from a user of the first utterance with the plurality of recognition vocabularies, the smaller the differences between the positions, on the time series, at which the plurality of groups of the preceding N best solutions appear and the positions, on the time series, at which the plurality of groups of the following N best solutions appear, the higher the degree of association calculated by the degree of association calculation means.
  - 13. The speech recognition system according to claim 12, further comprising:
    - a function word dictionary for storing function words representing the positions at which utterances appear in association with the positions, wherein, when the following best solution in any one group of the following N best solutions out of the plurality of groups of the following N best solutions agrees with a function word, the degree of association calculation means sets the position represented by the function word as the position at which a group of the following N best solutions appear, the group of the following N best solutions being chronologically next to the one group of the following N best solutions including the following best solution, which agrees with the function word.
  - 14. The speech recognition system according to claim 13, further comprising:
    - a function word extraction means that extracts, from the function word dictionary, function words corresponding to the positions, on the time series, at which the plurality of groups of the preceding N best solutions appear, wherein the output presentation means presents the function words extracted by the function word extraction means in association with the individual preceding best solutions of the plurality of groups of the preceding N best solutions.
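Claims 7 and 9 to 11 describe the repeat-utterance test and the monotonic behavior of the degree of association without fixing a formula. The Python sketch below is one possible reading under stated assumptions: the three factors (shared solutions, rank differences, time gap) are multiplied together, `max_gap`, `min_score`, and `max_rank` are hypothetical tuning parameters, and words within one N best list are assumed to be unique.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NBest:
    speaker: str
    time: float                         # time the N best solutions were output
    solutions: List[Tuple[str, float]]  # (word, recognition score), best first


def degree_of_association(prev: NBest, foll: NBest, max_gap: float = 10.0) -> float:
    """Assumed scoring: more shared words, closer ranks, smaller time gap -> higher."""
    prev_rank = {w: i for i, (w, _) in enumerate(prev.solutions)}
    foll_rank = {w: i for i, (w, _) in enumerate(foll.solutions)}
    shared = prev_rank.keys() & foll_rank.keys()
    if not shared:
        return 0.0
    n = max(len(prev.solutions), len(foll.solutions))
    overlap = len(shared) / n                                  # claim 10, first factor
    mean_diff = sum(abs(prev_rank[w] - foll_rank[w]) for w in shared) / len(shared)
    rank_term = 1.0 - mean_diff / n                            # claim 10, second factor
    gap = max(0.0, foll.time - prev.time)
    time_term = max(0.0, 1.0 - gap / max_gap)                  # claim 11
    return overlap * rank_term * time_term


def is_repeat(prev: NBest, foll: NBest,
              min_score: float = 0.75, max_rank: int = 1) -> bool:
    """Repeat-utterance test in the spirit of claim 7 (thresholds are assumptions)."""
    best_word, best_score = foll.solutions[0]
    if best_score < min_score:
        return False
    for rank, (word, score) in enumerate(prev.solutions):
        if word == best_word:
            # Second condition: high enough score OR high enough rank (0-based).
            return score >= min_score or rank <= max_rank
    return False  # first condition failed: no preceding solution agrees
```

In a full pipeline, `is_repeat` would only be evaluated once `degree_of_association` has met the response-utterance threshold of claim 4.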

15. A speech recognition method comprising:
- adding an identifying code to utterance data corresponding to signals generated by utterances of each of a plurality of users, the identifying code being available for identifying each of the users;
  
  rating the utterance data by a value for each of the identifying codes, the value being determined on the basis of comparison of characteristics of the utterance data with characteristics of word information selected from a plurality of sets of word information stored;
  
  storing N pieces of word information corresponding to N sets of the utterance data, the utterance data having a same identifying code, the N sets of utterance data having the value within top N, N being an integer equal to one or more;
  
  selecting posterior N pieces of word information posterior in time to prior N pieces of word information, the identifying codes of the utterance data relative to the posterior and prior N pieces of word information being spoken by the users that are different from each other;
  
  calculating a degree of relationship between the prior and posterior N pieces of word information, the degree of relationship being capable of rating a fact of the utterance relative to the posterior N pieces of word information being performed later than the utterance relative to the prior N pieces of word information;
  
  determining the posterior N pieces of word information corresponding to an utterance performed later than the utterance relative to the prior N pieces of word information; and
  
  determining the posterior N pieces of word relative to an utterance as a response to the utterance relating to the prior N pieces of word information on the basis of a predetermined condition.
- - 16. A speech recognition method according to claim 15, further comprising:
    - determining whether a first prior word information of the prior N pieces of word information agrees with a first posterior word of the posterior N pieces of word information, the first prior word information corresponding to the utterance data having a highest value within data relative to the prior N pieces of word information, the first posterior word information corresponding to the utterance data having a highest value within data relative to the posterior N pieces of word information.
  - 17. A speech recognition method according to claim 16, further comprising:
    - replacing the first prior word information of the prior N pieces of word information with the first posterior word of the posterior N pieces of word information in the case that the first prior and posterior vocabularies information disagree.

Current Assignee: Fujitsu Limited
Original Assignee: Fujitsu Limited
Inventors: Abe, Kenji
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): ADESANYA, OLUJIMI A

Application Number: US12/034,978
Publication Number: US 20080228482A1
Time in Patent Office: 1,776 Days
Field of Search: 704/249
US Class Current: 704/249
CPC Class Codes:

G10L 15/1815   Semantic context, e.g. disa...

G10L 17/00   Speaker identification or v...

G10L 2015/228   of application context
