Data process unit and data process unit control program

US 20090138263A1
Filed: 12/30/2008
Published: 05/28/2009
Est. Priority Date: 10/03/2003
Status: Active Grant

Abstract

Provided are a data process unit and a data process unit control program that are suitable for generating acoustic models for unspecified speakers, taking into consideration the distribution of diversifying feature parameters under specific conditions such as the type of speaker, speech lexicons, speech styles, and speech environment, and that are suitable for providing acoustic models intended for unspecified speakers and adapted to the speech of a specific person.

A data process unit 1 comprises a data classification section 1a, data storing section 1b, pattern model generating section 1c, data control section 1d, mathematical distance calculating section 1e, pattern model converting section 1f, pattern model display section 1g, region dividing section 1h, division changing section 1i, region selecting section 1j, and specific pattern model generating section 1k.

143 Citations

75 Claims

1-37. (canceled)

38. A data process unit comprising:
- acoustic space storing means for storing an acoustic space composed of a plurality of pattern models generated from speech data of a plurality of speakers;
  
  speech data acquiring means for acquiring speech data of a target speaker;
  
  position calculating means for calculating position of the speech data of the target speaker in the acoustic space based on the speech data of the target speaker acquired by the speech data acquiring means and the plurality of pattern models in the acoustic space stored by the acoustic space storing means;
  
  speech data evaluating means for evaluating value of the speech data of the target speaker based on the position calculated by the position calculating means;
  
  evaluation result display means for displaying evaluation results produced by the speech data evaluating means; and
  
  positional relationship information display means for displaying information about positional relationship between the speech data and pattern models around the speech data in the acoustic space based on the calculated position.
- View Dependent Claims (39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 52, 53, 54, 55, 56)
- - 39. The data process unit according to claim 38, wherein the speech data evaluating means evaluates the value of the speech data based on the number of pattern models existing within a predetermined distance from the position of the speech data of the target speaker calculated by the position calculating means.
  - 40. The data process unit according to claim 39, wherein:
    - the predetermined distance is set stepwise; and
      
      the speech data evaluating means evaluates the value of the speech data based on the number of pattern models existing within each distance range set stepwise.
  - 41. The data process unit according to claim 38, wherein the speech data evaluating means uses a pattern model similar in features to the speech data of the target speaker out of the plurality of pattern models as the pattern model of the target speaker for the evaluation based on the position calculated by the position calculating means.
  - 42. The data process unit according to claim 41, wherein the speech data evaluating means uses the top few pattern models similar in features to the speech data of the target speaker as the pattern models of the target speaker for the evaluation out of the plurality of pattern models.
  - 43. The data process unit according to claim 41, wherein the position calculating means converts the speech data acquired by the speech data acquiring means into high dimensional feature data, calculates likelihood of match between the feature data and each of the plurality of the pattern models of the plurality of speakers, selects a specific pattern model from the plurality of pattern models of the plurality of speakers based on the calculated likelihood, calculates mathematical distance between the selected specific pattern model and other pattern models, and calculates the position of the acquired speech data in the acoustic space based on the calculated mathematical distance.
  - 44. The data process unit according to claim 38, wherein the position calculating means converts the speech data acquired by the speech data acquiring means into high dimensional feature data, generates a pattern model of the target speaker based on the feature data, calculates mathematical distance between the generated pattern model and the plurality of pattern models of the plurality of speakers, and calculates the position of the acquired speech data in the acoustic space based on the calculated mathematical distance.
  - 45. The data process unit according to claim 43, wherein:
    - the pattern models consist of 4-dimensional or higher dimensional elements; and
      
      the positional relationship information display means converts a plurality of pattern models in the acoustic space including a plurality of pattern models corresponding to speech data of the target speaker into lower dimensional pattern models while maintaining the distance relationship and displays the pattern models after the conversion as coordinate points in a low dimensional space.
  - 46. The data process unit according to claim 38, wherein the pattern models are generated using HMMs (Hidden Markov Models).
  - 47. The data process unit according to claim 38, wherein:
    - the speech data evaluating means evaluates value of speech data of the target speaker on a phoneme-by-phoneme basis; and
      
      the evaluation result display means displays evaluation results of the speech data of the target speaker on a phoneme-by-phoneme basis.
  - 48. The data process unit according to claim 38, wherein the evaluation result display means displays supplementary information of the speech data when the speech data evaluating means evaluates the speech data of the target speaker as having low value.
  - 49. The data process unit according to claim 38, comprising:
    - negotiating means for negotiating with the target speaker on whether the speech data can be provided; and
      
      speech data storing means for storing the speech data over which negotiations are completed successfully by the negotiating means.
  - 50. A data process system comprising:
    - an information processing terminal which is under the control of a target speaker; and
      
      the data process unit according to claim 38, wherein:
      
      the information processing terminal and the data process unit are communicably connected with each other,
      
      the information processing terminal comprises speech data sending means for acquiring speech data of the target speaker and sending the acquired speech data to the data process unit, and evaluation information display means for displaying information about evaluation results of the speech data of the target speaker acquired from the data process unit, and
      
      the data process unit comprises evaluation information sending means for sending the information about the evaluation results to the information processing terminal.
  - 52. A data process unit control program which is a computer-executable program for controlling the data process unit according to claim 38, comprising:
    - an acoustic space storing step of storing an acoustic space composed of a plurality of pattern models generated from speech data of a plurality of speakers;
      
      a speech data acquiring step of acquiring speech data of a target speaker;
      
      a position calculating step of calculating position of the speech data of the target speaker in the acoustic space based on the speech data acquired in the speech data acquiring step and the plurality of pattern models in the acoustic space stored in the acoustic space storing step;
      
      a speech data evaluating step of evaluating value of the speech data of the target speaker based on the position calculated in the position calculating step; and
      
      an evaluation result display step of displaying evaluation results produced in the speech data evaluating step.
  - 53. The data process unit applicable to the data process system according to claim 50, comprising:
    - acoustic space storing means for storing an acoustic space composed of a plurality of pattern models generated from speech data of a plurality of speakers;
      
      speech data acquiring means for acquiring speech data of a target speaker;
      
      position calculating means for calculating position of the speech data of the target speaker in the acoustic space based on the speech data acquired by the speech data acquiring means and the plurality of pattern models in the acoustic space stored by the acoustic space storing means;
      
      speech data evaluating means for evaluating value of the speech data of the target speaker based on the position calculated by the position calculating means;
      
      evaluation result display means for displaying evaluation results produced by the speech data evaluating means;
      
      positional relationship information display means for displaying information about positional relationship between the speech data and pattern models around the speech data in the acoustic space based on the calculated position; and
      
      evaluation information sending means for sending the information about the evaluation results to the information processing terminal.
  - 54. An information processing terminal applicable to the data process system according to claim 50, comprising:
    - speech data sending means for acquiring speech data of the target speaker and sending the acquired speech data to the data process unit; and
      
      evaluation information display means for displaying information about evaluation results of the speech data of the target speaker acquired from the data process unit.
  - 55. A data process unit control program which is a computer-executable program for controlling the data process unit according to claim 53, wherein the data process unit comprises an acoustic space composed of a plurality of pattern models generated from speech data of a plurality of speakers, the program comprising:
    - a speech data acquiring step of acquiring speech data of a target speaker;
      
      a position calculating step of calculating position of the speech data of the target speaker in the acoustic space based on the speech data acquired in the speech data acquiring step and the plurality of pattern models in the acoustic space;
      
      a speech data evaluating step of evaluating value of the speech data of the target speaker based on the position calculated in the position calculating step;
      
      an evaluation result display step of displaying evaluation results produced in the speech data evaluating step;
      
      a positional relationship information display step of displaying information about positional relationship between the speech data and pattern models around the speech data in the acoustic space based on the calculated position; and
      
      an evaluation information sending step of sending the information about the evaluation results to the information processing terminal.
  - 56. An information processing terminal control program which is a computer-executable program for controlling the information processing terminal according to claim 54, comprising:
    - a speech data sending step of acquiring speech data of the target speaker and sending the acquired speech data to the data process unit; and
      
      an evaluation information display step of displaying information about evaluation results of speech data of the target speaker acquired from the data process unit.
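Claims 39 and 40 evaluate the value of speech data from the number of pattern models found within stepwise distance ranges around the target speaker's position. A minimal sketch of that counting step follows. It assumes positions are plain coordinate tuples, that the distance is Euclidean, and that speech data in a sparsely populated region is rated "high" in value; the claims fix none of these. The names `count_within`, `evaluate_value`, and `sparse_threshold` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def count_within(target, models, radii):
    """For each radius, count the pattern models no farther than it from target."""
    dists = [euclidean(target, m) for m in models]
    return [sum(1 for d in dists if d <= r) for r in radii]


def evaluate_value(target, models, radii, sparse_threshold):
    """Assumed rule: few neighbours inside the smallest radius means high value."""
    counts = count_within(target, models, radii)
    label = "high" if counts[0] < sparse_threshold else "low"
    return label, counts


# Illustrative acoustic space: four pattern-model positions in two dimensions.
models = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
radii = [1.5, 3.0, 10.0]  # stepwise distance ranges, smallest first

label_dense, counts_dense = evaluate_value((0.2, 0.2), models, radii, sparse_threshold=2)
label_sparse, counts_sparse = evaluate_value((9.0, 9.0), models, radii, sparse_threshold=2)
print(label_dense, counts_dense)
print(label_sparse, counts_sparse)
```

Reporting the count per radius, rather than a single number, mirrors the stepwise ranges of claim 40 and lets the display step show how quickly the neighbourhood fills in.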

51. A data process method comprising the steps of:
- preparing an acoustic space composed of a plurality of pattern models generated from speech data of a plurality of speakers;
  
  acquiring speech data of a target speaker;
  
  calculating position of the speech data of the target speaker in the acoustic space based on the acquired speech data and the plurality of pattern models in the acoustic space;
  
  evaluating value of the speech data of the target speaker based on the calculated position; and
  
  displaying the evaluation results.
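The position-calculating step can follow the likelihood-based route of claim 43: score the target speaker's feature data against every pattern model, select the best-matching model, and use its distances to the other models as the position. The sketch below is illustrative only. It replaces each HMM with a single diagonal-covariance Gaussian and uses the Euclidean distance between mean vectors as the "mathematical distance"; both are simplifying assumptions, and the function names are hypothetical.

```python
import math


def diag_gauss_loglik(frames, mean, var):
    """Total log-likelihood of feature frames under a diagonal Gaussian."""
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, mean, var):
            total += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return total


def select_model(frames, models):
    """Return the name of the pattern model with the highest log-likelihood."""
    return max(models, key=lambda name: diag_gauss_loglik(frames, *models[name]))


def distances_from(selected, models):
    """Distance from the selected model's mean to every other model's mean."""
    ref = models[selected][0]
    return {
        name: math.dist(ref, mean)
        for name, (mean, _var) in models.items()
        if name != selected
    }


# Illustrative pattern models: name -> (mean vector, variance vector).
models = {
    "spk_a": ((0.0, 0.0), (1.0, 1.0)),
    "spk_b": ((3.0, 4.0), (1.0, 1.0)),
    "spk_c": ((6.0, 8.0), (1.0, 1.0)),
}
frames = [(2.9, 4.2), (3.1, 3.9)]  # feature frames of the target speaker

selected = select_model(frames, models)
position = distances_from(selected, models)
print(selected, position)
```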

57. A data process unit comprising:
- acoustic space storing means for storing an acoustic space composed of a plurality of pattern models generated from speech data of a plurality of speakers;
  
  speech data acquiring means for acquiring speech data of a target speaker;
  
  position calculating means for calculating position of the speech data of the target speaker in the acoustic space based on the speech data of the target speaker and the plurality of pattern models in the acoustic space;
  
  similar-speaker detecting means for detecting similar speakers who resemble the target speaker in speech out of the plurality of speakers based on the position of the speech data and the plurality of pattern models; and
  
  positional relationship information display means for displaying information about positional relationship between the speech data of the target speaker and pattern models of the similar speakers in the acoustic space based on the position of the speech data and the pattern models of the similar speakers.
- View Dependent Claims (58, 60, 64, 66, 67, 68, 69, 70)
- - 58. The data process unit according to claim 57, comprising:
    - specific speaker specifying means for specifying a specific speaker among the plurality of speakers;
      
      similarity evaluating means for evaluating similarity in speech between the specific speaker and the target speaker based on the position of the speech data and the pattern model of the specific speaker in the acoustic space; and
      
      evaluation result display means for displaying evaluation results produced by the similarity evaluating means, wherein the positional relationship information display means displays information about positional relationship between the speech data of the target speaker and pattern model of the specific speaker in the acoustic space based on the position of the speech data and the pattern model of the specific speaker.
  - 60. The data process unit according to claim 58, comprising:
    - correction information generating means for generating correction information which indicates corrections to be made to the speech of the target speaker in order to enhance similarity in speech between the target speaker and the specific speaker based on the evaluation results produced by the similarity evaluating means; and
      
      correction information display means for displaying the correction information.
  - 64. The data process unit according to claim 57, wherein the position calculating means converts the speech data acquired by the speech data acquiring means into high dimensional feature data, generates a pattern model of the target speaker based on the feature data, calculates mathematical distance between the generated pattern model and the plurality of pattern models of the plurality of speakers, and calculates the position of the acquired speech data in the acoustic space based on the calculated mathematical distance.
  - 66. The data process unit according to claim 58, wherein the similarity evaluating means evaluates the similarity of the speech data of the target speaker on a phoneme-by-phoneme basis.
  - 67. The data process unit according to claim 58, wherein:
    - the acoustic space is composed of a plurality of pattern models generated from speech data of the plurality of speakers in a plurality of speech styles; and
      
      the similarity evaluating means evaluates the similarity in each of the plurality of speech styles.
  - 68. The data process unit according to claim 67, wherein the positional relationship information display means establishes a coordinate axis of the low dimensional space based on the speech styles for the plurality of pattern models.
  - 69. The data process unit according to claim 57, wherein the pattern models are generated using HMMs (Hidden Markov Models).
  - 70. A data process system comprising:
    - an information processing terminal which is under the control of a target speaker; and
      
      the data process unit according to claim 57, wherein:
      
      the information processing terminal and the data process unit are communicably connected with each other,
      
      the information processing terminal comprises speech data sending means for acquiring speech data of the target speaker and sending the acquired speech data to the data process unit, and information display means for displaying information about processing results of speech data acquired from the data process unit, and
      
      the data process unit comprises information sending means for sending the information about the processing results of the speech data to the information processing terminal.
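The similar-speaker detecting means of claim 57 can be pictured as a nearest-neighbour search in the acoustic space. The sketch below assumes each speaker's pattern model has already been reduced to a coordinate tuple and that similarity is ranked by Euclidean distance; the function name `similar_speakers` and the parameter `k` are hypothetical, and the claim prescribes neither a metric nor a fixed number of speakers.

```python
import heapq
import math


def similar_speakers(target_pos, speaker_pos, k):
    """Return the names of the k speakers closest to the target, nearest first."""
    return heapq.nsmallest(
        k,
        speaker_pos,  # iterating a dict yields the speaker names
        key=lambda name: math.dist(target_pos, speaker_pos[name]),
    )


# Illustrative positions of stored speakers in a two-dimensional acoustic space.
speaker_pos = {
    "s1": (0.0, 0.0),
    "s2": (1.0, 1.0),
    "s3": (4.0, 4.0),
    "s4": (0.5, 0.0),
}
nearest = similar_speakers((0.4, 0.1), speaker_pos, k=2)
print(nearest)
```

The returned names, together with their stored positions, are what a positional relationship display would plot around the target speaker.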

59. A data process unit comprising:
- acoustic space storing means for storing an acoustic space composed of a plurality of pattern models generated from speech data of a plurality of speakers;
  
  specific speaker specifying means for specifying a specific speaker among the plurality of speakers;
  
  speech data acquiring means for acquiring speech data of a target speaker;
  
  position calculating means for calculating position of the speech data of the target speaker based on the speech data of the target speaker and the plurality of pattern models in the acoustic space;
  
  similarity evaluating means for evaluating similarity in speech between the specific speaker and the target speaker based on the position of the speech data and the pattern model of the specific speaker;
  
  evaluation result display means for displaying evaluation results produced by the similarity evaluating means; and
  
  positional relationship information display means for displaying information about positional relationship between the speech data of the target speaker and pattern model of the specific speaker in the acoustic space based on the position of the speech data and the pattern model of the specific speaker.
- View Dependent Claims (61, 62, 63, 65)
- - 61. The data process unit according to claim 59, wherein:
    - the similar-speaker detecting means uses a pattern model which is similar in features to the speech data of the target speaker out of the plurality of pattern models as the pattern model of the target speaker based on the position of the speech data; and
      
      the similarity evaluating means uses a pattern model which is similar in features to the speech data of the target speaker out of the plurality of pattern models as the pattern model of the target speaker based on the position of the speech data.
  - 62. The data process unit according to claim 59, wherein:
    - the similar-speaker detecting means uses the top few pattern models which are similar in features to the speech data of the target speaker out of the plurality of pattern models as the pattern model of the target speaker based on the position of the speech data; and
      
      the similarity evaluating means uses the top few pattern models which are similar in features to the speech data of the target speaker out of the plurality of pattern models as the pattern model of the target speaker based on the position of the speech data.
  - 63. The data process unit according to claim 61, wherein the position calculating means converts the speech data acquired by the speech data acquiring means into high dimensional feature data, calculates likelihood between the feature data and each of the plurality of the pattern models of the plurality of speakers, selects a specific pattern model from the pattern models of the plurality of speakers based on the calculated likelihood, calculates mathematical distance between the selected specific pattern model and other pattern models, and calculates the position of the acquired speech data in the acoustic space based on the calculated mathematical distance.
  - 65. The data process unit according to claim 63, wherein:
    - the pattern models consist of 4-dimensional or higher dimensional elements; and
      
      the positional relationship information display means converts a plurality of pattern models in the acoustic space including a plurality of pattern models corresponding to speech data of the target speaker into lower dimensional pattern models while maintaining the distance relationship and displays the pattern models after the conversion as coordinate points in a low dimensional space.
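Claim 65 calls for converting pattern models of four or more dimensions into lower dimensional ones "while maintaining the distance relationship", without naming a method. One well-known technique with that goal is classical multidimensional scaling (MDS), sketched below in pure Python using power iteration with deflation. This is a stand-in chosen for illustration, not necessarily the conversion used in the patent, and `classical_mds` is a hypothetical name.

```python
import math


def classical_mds(points, dims=2, iters=500):
    """Embed points in `dims` dimensions with classical MDS."""
    n = len(points)
    # Squared Euclidean distances between every pair of pattern models.
    d2 = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]
    row_mean = [sum(r) / n for r in d2]
    grand_mean = sum(row_mean) / n
    # Double centering turns squared distances into a Gram (inner-product) matrix.
    gram = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Fixed, non-constant start vector, normalised to unit length.
        v = [float(i + 1) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # nothing left to capture along a further axis
            v = [x / norm for x in w]
        gv = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(sum(a * b for a, b in zip(v, gv)), 0.0)  # Rayleigh quotient
        scale = math.sqrt(lam)
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate so the next pass finds the next-largest eigenvector.
        for i in range(n):
            for j in range(n):
                gram[i][j] -= lam * v[i] * v[j]
    return coords


# Four 4-dimensional points that happen to lie on a plane (a 3-by-4 rectangle),
# so a 2-dimensional embedding can keep every pairwise distance.
points = [
    (0.0, 0.0, 0.0, 0.0),
    (1.5, 1.5, 1.5, 1.5),
    (2.0, -2.0, 2.0, -2.0),
    (3.5, -0.5, 3.5, -0.5),
]
low = classical_mds(points)
for i in range(len(points)):
    for j in range(i + 1, len(points)):
        print(i, j, round(math.dist(points[i], points[j]), 6),
              round(math.dist(low[i], low[j]), 6))
```

For data that does not lie on a plane, a two-dimensional embedding can only approximate the original distances; the example data was chosen so that the preservation can be checked exactly.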

71. A data process method comprising the steps of:
- preparing an acoustic space composed of a plurality of pattern models generated from speech data of a plurality of speakers;
  
  acquiring speech data of a target speaker;
  
  calculating position of the speech data of the target speaker in the acoustic space based on the speech data of the target speaker and the plurality of pattern models in the acoustic space;
  
  detecting similar speakers who resemble the target speaker in speech out of the plurality of speakers based on the position of the speech data and the plurality of pattern models; and
  
  displaying information about positional relationship between the speech data of the target speaker and pattern models of the similar speakers in the acoustic space based on the position of the speech data and the pattern models of the similar speakers.
- View Dependent Claims (72)
- - 72. The data process method according to claim 71, comprising the steps of:
    - specifying a specific speaker among the plurality of speakers;
      
      evaluating similarity in speech between the specific speaker and the target speaker based on the position of the speech data and the pattern model of the specific speaker in the acoustic space; and
      
      displaying the evaluation results.

73. A data process method comprising:
- preparing an acoustic space composed of a plurality of pattern models generated from speech data of a plurality of speakers;
  
  specifying a specific speaker among the plurality of speakers;
  
  acquiring speech data of a target speaker;
  
  calculating position of the speech data of the target speaker based on the speech data of the target speaker and the plurality of pattern models in the acoustic space;
  
  evaluating similarity in speech between the specific speaker and the target speaker based on the position of the speech data and the pattern model of the specific speaker;
  
  displaying evaluation results; and
  
  displaying information about positional relationship between the speech data of the target speaker and pattern model of the specific speaker in the acoustic space based on the position of the speech data and the pattern model of the specific speaker.
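The similarity-evaluating step compares the target speaker's position with the pattern model of the specific speaker, and claim 66 allows this on a phoneme-by-phoneme basis. A minimal sketch follows, assuming per-phoneme positions are coordinate tuples and mapping distance d to a score of 1 / (1 + d). That scoring formula and the function names are assumptions made for illustration, not taken from the claims.

```python
import math


def similarity(target_pos, specific_pos):
    """Map distance to a score in (0, 1]; 1.0 means identical positions."""
    return 1.0 / (1.0 + math.dist(target_pos, specific_pos))


def per_phoneme_similarity(target, specific):
    """Score every phoneme present for both the target and the specific speaker."""
    return {
        ph: similarity(target[ph], specific[ph])
        for ph in target
        if ph in specific
    }


# Illustrative per-phoneme positions in a two-dimensional acoustic space.
target = {"a": (0.0, 0.0), "i": (3.0, 4.0)}
specific = {"a": (0.0, 0.0), "i": (0.0, 0.0)}

scores = per_phoneme_similarity(target, specific)
print(scores)
```

Phonemes with low scores are natural candidates for the correction information described in claim 60.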

74. A data process unit control program comprising:
- an acoustic space storing step of storing an acoustic space composed of a plurality of pattern models generated from speech data of a plurality of speakers;
  
  a speech data acquiring step of acquiring speech data of a target speaker;
  
  a position calculating step of calculating position of the speech data of the target speaker in the acoustic space based on the speech data of the target speaker and the plurality of pattern models in the acoustic space;
  
  a similar-speaker detecting step of detecting similar speakers who resemble the target speaker in speech out of the plurality of speakers based on the position of the speech data and the plurality of pattern models;
  
  a positional relationship information display step of displaying information about positional relationship between the speech data of the target speaker and pattern models of the similar speakers in the acoustic space based on the position of the speech data and the pattern models of the similar speakers;
  
  a speaker specifying step of specifying a specific speaker;
  
  a similarity evaluating step of evaluating similarity in speech between the specific speaker and the target speaker based on the position of the speech data and the pattern model of the specific speaker in the acoustic space; and
  
  an evaluation result display step of displaying evaluation results produced by the similarity evaluating step, wherein the positional relationship information display step displays information about positional relationship between the speech data of the target speaker and pattern model of the specific speaker in the acoustic space based on the position of the speech data and the pattern model of the specific speaker.

75. A data process unit control program comprising:
- an acoustic space storing step of storing an acoustic space composed of a plurality of pattern models generated from speech data of a plurality of speakers;
  
  a specific speaker specifying step of specifying a specific speaker among the plurality of speakers;
  
  a speech data acquiring step of acquiring speech data of a target speaker;
  
  a position calculating step of calculating position of the speech data of the target speaker based on the speech data of the target speaker and the plurality of pattern models in the acoustic space;
  
  a similarity evaluating step of evaluating similarity in speech between the specific speaker and the target speaker based on the position of the speech data and the pattern model of the specific speaker;
  
  an evaluation result display step of displaying evaluation results produced by the similarity evaluating step; and
  
  a positional relationship information display step of displaying information about positional relationship between the speech data of the target speaker and pattern model of the specific speaker in the acoustic space based on the position of the speech data and the pattern model of the specific speaker.

Current Assignee
Asahi Kasei Kabushiki Kaisha
Original Assignee
Asahi Kasei Kabushiki Kaisha
Inventors
Shozakai, Makoto; Nagino, Goshu

Granted Patent

US 8,606,580 B2
Field of Search
US Class Current

704/243
CPC Class Codes

G06F 18/2163   Partitioning the feature space

G06V 10/765   using rules for classificat...

G06V 40/20   Movements or behaviour, e.g...

G10L 15/063   Training

G10L 15/07   to the speaker

G10L 15/144   Training of HMMs

G10L 21/06   Transformation of speech in...
