Speech recognition method and apparatus thereof

US 5,003,601 A
Filed: 03/07/1989
Issued: 03/26/1991
Est. Priority Date: 05/25/1984
Status: Expired due to Term

Abstract

A speech recognition method and apparatus in which a plurality of time-sequential acoustic parameters are derived from a segmented input speech signal and are used to form a first trajectory that is time normalized. The time-normalized trajectory is sampled at predetermined lengths therealong and the sampling results used to form a new time-normalized trajectory with equally spaced points ("dots" on the graph), which is compared with a plurality of previously registered trajectories until a match is found between the new time-normalized trajectory and one of the registered trajectories, at which time an indication of the results of the matching is produced. A silence acoustic parameter can also be added to the time-sequential acoustic parameters so that the time-normalized trajectory can start and end from a point of silence. In addition, the length of the first trajectory formed from the acoustic parameters derived from the segmented speech signals is determined and used in the matching operation with the registered trajectories, so that the lengths of the trajectories as well as the distances between the formed trajectory and the stored trajectory are also taken into account to increase the recognition ratio.
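
The resampling step described in the abstract can be illustrated with a short Python sketch. It treats the acoustic parameter frames as points on a polyline and re-samples that polyline at equal spacing along its length, so that two utterances tracing the same path at different speaking rates yield the same points. The function names, the straight-line interpolation between frames, and the choice to fix the number of samples (so the spacing scales with total length) are assumptions made for illustration and are not taken from the patent specification.

```python
import math


def trajectory_length(points):
    """Sum of Euclidean distances between consecutive parameter points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def resample_by_arc_length(points, n_samples):
    """Return n_samples points equally spaced along the polyline through `points`.

    Hypothetical helper: uses straight-line interpolation between frames.
    """
    if len(points) < 2 or n_samples < 2:
        raise ValueError("need at least two points and two samples")
    total = trajectory_length(points)
    if total == 0.0:
        # Degenerate trajectory: every frame is the same point.
        return [tuple(points[0])] * n_samples

    step = total / (n_samples - 1)   # spacing along the trajectory
    resampled = [tuple(points[0])]
    seg = 0                          # index of the current polyline segment
    seg_start = 0.0                  # arc length at the start of that segment
    seg_len = math.dist(points[0], points[1])

    for k in range(1, n_samples - 1):
        target = k * step
        # Advance to the segment that contains the target arc length.
        while seg < len(points) - 2 and seg_start + seg_len < target:
            seg_start += seg_len
            seg += 1
            seg_len = math.dist(points[seg], points[seg + 1])
        t = 0.0 if seg_len == 0.0 else (target - seg_start) / seg_len
        t = min(max(t, 0.0), 1.0)    # guard against rounding error
        a, b = points[seg], points[seg + 1]
        resampled.append(tuple(x + t * (y - x) for x, y in zip(a, b)))

    resampled.append(tuple(points[-1]))
    return resampled
```

In this sketch, a slowly spoken segment (many closely spaced frames) and a quickly spoken one (few widely spaced frames) that follow the same path produce the same resampled points, which is the sense in which the trajectory is time normalized.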

22 Claims

1. A method of speech recognition comprising the steps of:
- segmenting an input speech signal;
  
  deriving a plurality of time-sequential acoustic parameters from the segmented speech signal having characteristics time-normalized at every segment;
  
  forming a first trajectory using a plurality of dots, each dot corresponding to one of said plurality of time-sequential, time-normalized acoustic parameters, to form a time-normalized trajectory of said segmented speech signal;
  
  providing a plurality of registered trajectories representing known speech segments, each of said registered trajectories being represented by a plurality of data;
  
  matching a formed trajectory with one of said registered trajectories; and
  
  producing an indication of the results of the matching; and
  
  characterized by the further steps of:
  
  sampling said formed first trajectory at a predetermined length therealong, thereby producing new data representing said formed trajectory to be matched with said registered trajectories;
  
  determining a length of said formed trajectory;
  
  determining lengths of said registered trajectories;
  
  comparing the length of said formed trajectory with the registered trajectory lengths; and
  
  using the comparison results in said step of matching.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
- - 2. A method of speech recognition according to claim 1, in which said new time-normalized trajectory is formed on the basis of more than two adjacent time-sequential dots of said new dots.
  - 3. A method of speech recognition according to claim 2, in which said new time-normalized trajectory is formed to pass through each dot in excess of two of said new dots.
  - 4. A method of speech recognition according to claim 3, in which said new time-normalized trajectory is formed by interpolating a distance between said new dots by straight-line interpolation.
  - 5. A method of speech recognition according to claim 3, in which said new time-normalized trajectory is formed by interpolating a distance between said new dots by curved-line interpolation.
  - 6. A method of speech recognition according to claim 1, in which said step of forming a first trajectory includes the step of adding a silence acoustic parameter to said plurality of time-sequential acoustic parameters derived from said segmented speech signal.
  - 7. A method of speech recognition according to claim 6, in which said step of forming a first trajectory includes the step of beginning said first trajectory at a point indicative of silence.
  - 8. A method of speech recognition according to claim 6, in which said step of forming a first trajectory includes the steps of beginning said first trajectory at a point indicative of silence, and ending said first trajectory at said point indicative of silence.
  - 9. A method of speech recognition according to claim 6, in which said step of forming a first trajectory includes the step of ending said first trajectory at a point indicative of silence.
  - 10. A method of speech recognition according to claim 1, in which said predetermined lengths used for sampling said trajectory are varied in accordance with a total length of said first trajectory formed using said segmented speech signal.
  - 11. A method of speech recognition according to claim 1, further comprising the step of comparing a trajectory length of said new time-normalized trajectory formed by said segmented speech signal with a trajectory length of said registered trajectory before carrying out the step of matching, thereby to decrease the number of said registered trajectories to be matched with said new time-normalized trajectory.
  - 12. A method of speech recognition according to claim 1, further comprising the steps of determining a length of said new time-normalized trajectory formed from said segmented speech signal, determining the lengths of said registered trajectories, comparing the length of the new time-normalized trajectory with the registered trajectory length and using the comparison results in the matching step.
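
Two of the dependent ideas above, adding a silence parameter at the start and end of the trajectory (claims 6 to 9) and using trajectory length to reduce the number of registered trajectories before matching (claim 11), can be sketched in Python as follows. The function names, the representation of silence as a fixed point in parameter space, and the relative tolerance used for pruning are all hypothetical choices for illustration; the claims do not specify them.

```python
import math


def path_length(points):
    """Total length of the polyline through the parameter points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def add_silence(points, silence, at_start=True, at_end=True):
    """Return a copy of `points` that begins and/or ends at the silence point."""
    padded = list(points)
    if at_start:
        padded.insert(0, silence)
    if at_end:
        padded.append(silence)
    return padded


def preselect_by_length(input_length, registered_lengths, tolerance):
    """Keep only registered entries whose length is close to the input length.

    `registered_lengths` maps a label to a stored trajectory length.
    `tolerance` is an assumed relative bound, e.g. 0.15 for 15 percent.
    """
    return [
        label
        for label, length in registered_lengths.items()
        if abs(length - input_length) <= tolerance * input_length
    ]
```

Only the labels returned by `preselect_by_length` would then go through the more expensive point-by-point matching.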

13. Apparatus for speech recognition, comprising:
- means for segmenting an input speech signal;
  
  means for producing a plurality of time-sequential acoustic parameters from the segmented speech signal, said parameters lying along a trajectory;
  
  means for producing a plurality of speech recognition parameters from said plurality of time-sequential acoustic parameters;
  
  means for registering a plurality of speech recognition parameters having respective known speech segment correlations;
  
  means for matching said calculated time-sequential speech recognition parameters with previously registered speech recognition parameters;
  
  output means for producing an output representing the results of the matching processing in said means for matching, wherein said means for producing a plurality of speech recognition parameters comprises:
  
  first calculating means receiving said time-sequential acoustic parameter produced at a first sample time and respective acoustic parameters produced at adjacent sampling times;
  
  second calculating means receiving said calculated intervals for calculating said speech recognition parameters by sampling at a series of said calculated intervals for a predetermined length over the trajectory; and
  
  further comprising:
  
  means for correcting a matching distance between said time-sequential speech recognition parameter produced from said segmented speech signal and said previously registered speech recognition parameter during operation of said means for matching based on the difference between a series of intervals of said plurality of time-sequential acoustic parameters produced from said segmented speech signal and a series of intervals of said previously registered speech recognition parameters during operation of said means for matching.
- View Dependent Claims (14, 15, 16, 17, 18, 19, 20, 21, 22)
- - 14. Apparatus according to claim 13, in which said first calculating means includes third calculating means for calculating the interval between said adjacent acoustic parameters at each of a plurality of time-sequential acoustic parameters and for calculating a distance between a parameter at a selected time and a parameter at an adjacent time by summing all of said calculated intervals.
  - 15. Apparatus according to claim 14, in which said third calculating means comprises curved-line interpolation means for calculating the distance between said adjacent acoustic parameters.
  - 16. Apparatus according to claim 13, further comprising means for adding to said plurality of time-sequential acoustic parameters an acoustic parameter indicative of silence.
  - 17. Apparatus according to claim 16, in which said means for adding said silence acoustic parameter includes means for arranging said silence acoustic parameter at a start point of each of said plurality of time-sequential acoustic parameters produced from said segmented speech signal.
  - 18. Apparatus according to claim 16, in which said means for adding said silence acoustic parameter includes means for arranging said silence acoustic parameter at an end point of each of said plurality of time-sequential acoustic parameters produced from said segmented speech signal.
  - 19. Apparatus according to claim 16, in which said means for adding said silence acoustic parameter includes means for arranging said silence parameter at both start and end points of each of said plurality of time-sequential acoustic parameters produced from said segmented speech signal.
  - 20. Apparatus according to claim 13, in which said time-sequential speech recognition parameters lie along a second trajectory and said second calculating means includes means for sampling said second trajectory at a series of intervals therealong for producing further time-sequential speech recognition parameters.
  - 21. Apparatus according to claim 13, further comprising means for selecting a previously registered time-sequential parameter for use in said means for matching on the basis of a length formed of a series of intervals of said plurality of time-sequential acoustic parameters produced from said segmented speech signal.
  - 22. Apparatus according to claim 14, in which said third calculating means comprises straight-line interpolation means for calculating the interval between said adjacent acoustic parameters.
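
The correction of the matching distance recited in claim 13 can be sketched in Python as below. The sketch assumes that the input and each registered template have already been resampled to the same number of points and that each carries its original trajectory length. The additive form of the correction, the `weight` parameter, and all function names are assumptions for illustration; the claim states only that the matching distance is corrected on the basis of the difference between the two series of intervals.

```python
import math


def matching_distance(a, b):
    """Sum of point-to-point Euclidean distances between two equal-length trajectories."""
    if len(a) != len(b):
        raise ValueError("trajectories must have the same number of points")
    return sum(math.dist(p, q) for p, q in zip(a, b))


def corrected_distance(sample, sample_length, template, template_length, weight=1.0):
    """Matching distance plus an assumed penalty proportional to the length difference."""
    return matching_distance(sample, template) + weight * abs(sample_length - template_length)


def recognize(sample, sample_length, templates, weight=1.0):
    """Return (label, score) of the registered template with the lowest corrected distance.

    `templates` maps a label to a (points, length) pair.
    """
    scores = {
        label: corrected_distance(sample, sample_length, points, length, weight)
        for label, (points, length) in templates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores[best]
```

With `weight=0.0` the sketch reduces to plain shape matching; a positive weight lets a template with a similar shape but a very different trajectory length lose to one that matches in both respects.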

Current Assignee
Sony Corporation (Sony Group Corp.)
Original Assignee
Sony Corporation (Sony Group Corp.)
Inventors
Akabane, Makoto; Watari, Masao; Sako, Yoichiro; Hiraiwa, Atsunobu
Primary Examiner(s)
KEMENY, EMANUEL

Application Number

US07/323,098
Time in Patent Office

749 Days
Field of Search

381/41, 381/43, 364/513.5
US Class Current

704/255
CPC Class Codes

G10L 15/00 Speech recognition G10L17/0...
