Speaker identification method and speaker identification device

US 9,947,324 B2
Filed: 04/16/2016
Issued: 04/17/2018
Est. Priority Date: 04/22/2015
Status: Active Grant

Abstract

A first similarity degree processor calculates first similarity degrees between a feature value in a voice signal of each of first speakers and each feature value in a plurality of unspecified speaker models of a plurality of unspecified speakers. The processor specifies a plurality of the unspecified speaker models for which the first similarity degrees are equal to or greater than a prescribed value. The processor also associates and stores each of the first speakers and the specified unspecified speaker models. Additionally, the processor calculates, for each of the first speakers, a plurality of second similarity degrees between a feature value in a voice signal of a second speaker and each feature value in the unspecified speaker models associated with the first speakers and stored in a second speaker model storage. The processor further specifies the second speaker based on the second similarity degrees.
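The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: feature values are assumed to be fixed-length vectors, cosine similarity stands in for the similarity measure (the abstract does not name one), and the names `cosine`, `learn`, `identify`, `threshold` and the dictionary layouts are hypothetical. The abstract also does not say how the second similarity degrees are combined, so the sketch simply sums them per first speaker.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def learn(first_db, enrolment, threshold):
    """Learning mode: associate each first speaker with the unspecified speaker
    models whose first similarity degree is >= threshold.

    first_db:  {model_id: feature_vector} for the unspecified speakers
    enrolment: {first_speaker: feature_vector} from the acquired voice signals
    Returns the second database as {first_speaker: {model_id: first_similarity}}.
    """
    second_db = {}
    for speaker, feature in enrolment.items():
        scores = {m: cosine(feature, model) for m, model in first_db.items()}
        second_db[speaker] = {m: s for m, s in scores.items() if s >= threshold}
    return second_db


def identify(first_db, second_db, feature):
    """Identification mode: compute second similarity degrees against each first
    speaker's associated models, sum them (an assumption), and pick the best."""
    totals = {
        speaker: sum(cosine(feature, first_db[m]) for m in models)
        for speaker, models in second_db.items()
    }
    return max(totals, key=totals.get), totals
```

With four toy unspecified models and two enrolled first speakers, `learn` keeps only the models close to each speaker, and `identify` then compares a new utterance against those models rather than against a model trained on the speaker's own voice.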

10 Claims

1. A speaker identification method, comprising:
- executing, using a processor, learning mode processing using a first database, in which a plurality of unspecified speakers and a plurality of unspecified speaker models obtained by modeling features of voices of the plurality of unspecified speakers are associated and stored, is used to create a second database, in which first speakers who are not stored in the first database and a plurality of the unspecified speaker models are associated and stored; and
- executing, using the processor, identification mode processing in which the second database is used to identify a second speaker,
- wherein, in the executing of the learning mode processing,
  - a voice signal of each of the first speakers is acquired,
  - first similarity degrees between a feature value in the acquired voice signal of each of the first speakers and each feature value in the plurality of unspecified speaker models stored in the first database are calculated,
  - a plurality of the unspecified speaker models for which the calculated first similarity degrees are equal to or greater than a prescribed value are specified, and
  - each of the first speakers and the specified plurality of unspecified speaker models are associated and stored in the second database, and
- in the executing of the identification mode processing,
  - a voice signal of the second speaker is acquired,
  - a plurality of second similarity degrees between a feature value in the acquired voice signal of the second speaker and each feature value in the plurality of unspecified speaker models associated with the first speakers and stored in the second database are calculated for each of the first speakers,
  - the calculated plurality of second similarity degrees are corrected by multiplying each of the plurality of second similarity degrees by a weighting value that corresponds to a ranking of the first similarity degrees,
  - a total value obtained by totaling the corrected plurality of second similarity degrees is calculated for each of the first speakers, and
  - one of the first speakers stored in the second database who corresponds to the second speaker is specified based on the calculated total values.

2. The speaker identification method according to claim 1, wherein the weighting value increases as the first similarity degrees increase.

3. The speaker identification method according to claim 1, wherein a total value obtained by totaling the plurality of second similarity degrees that are equal to or greater than a prescribed value from among the calculated plurality of second similarity degrees is calculated for each of the first speakers, and the one of the first speakers stored in the second database who corresponds to the second speaker is specified based on the calculated total values.

4. The speaker identification method according to claim 1, wherein the second speaker is specified as the one of the first speakers stored in the second database having the highest calculated total value.

5. The speaker identification method according to claim 1, wherein, in the executing of the learning mode processing, speaker models corresponding to the first speakers are newly created based on the specified plurality of unspecified speaker models and the acquired voice signals of the first speakers, and the created speaker models are associated with the first speakers and stored in a third database, and in the executing of the identification mode processing, for each first speaker, a third similarity degree between a feature value in the acquired voice signal of the second speaker and a feature value in the speaker model associated with the first speaker and stored in the third database is calculated, and one of the first speakers stored in the third database who corresponds to the second speaker is specified based on the calculated third similarity degrees.

6. The speaker identification method according to claim 5, wherein, in a case where the second speaker is not specified as being any of the first speakers stored in the third database, the plurality of second similarity degrees between the feature value in the acquired voice signal of the second speaker and each feature value in the plurality of unspecified speaker models associated with the first speakers and stored in the second database are calculated for each of the first speakers, and the one of the first speakers stored in the second database who corresponds to the second speaker is specified based on the calculated plurality of second similarity degrees.

7. The speaker identification method according to claim 1, wherein, after the identification mode processing has been performed, the first similarity degrees corresponding to each of the unspecified speaker models calculated in the learning mode processing and the second similarity degrees corresponding to each of the unspecified speaker models calculated in the identification mode processing are compared, and in a case where there is a prescribed number or more of the unspecified speaker models for which a difference between the first similarity degrees and the second similarity degrees is equal to or greater than a prescribed value, the learning mode processing is performed again.

8. The speaker identification method according to claim 1, wherein, after the identification mode processing has been performed, the first similarity degrees corresponding to each of the unspecified speaker models calculated in the learning mode processing and the second similarity degrees corresponding to each of the unspecified speaker models calculated in the identification mode processing are compared, and in a case where there is a prescribed number or more of the unspecified speaker models for which a difference between the first similarity degrees and the second similarity degrees is equal to or greater than a prescribed value, the first similarity degrees corresponding to the unspecified speaker models stored in the second database for which the difference is equal to or greater than the prescribed value are amended to the second similarity degrees calculated in the identification mode processing.
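
The rank-weighted totals of claims 1, 2 and 4 and the comparison step of claims 7 and 8 can be illustrated with a short sketch. It is one possible reading under stated assumptions, not the claimed implementation: similarity degrees are taken as already computed and passed in as plain dictionaries, the weight for rank r among n models is assumed to be (n - r) / n (the claims only require that the weight correspond to the ranking and, per claim 2, grow with the first similarity degree), the "difference" is assumed to be an absolute difference, and every function and parameter name is hypothetical.

```python
def rank_weights(first_sims):
    """Weight each model by the rank of its first similarity degree.
    Assumed scheme: the best of n models gets 1.0, the next (n - 1) / n, and so on."""
    ordered = sorted(first_sims, key=first_sims.get, reverse=True)
    n = len(ordered)
    return {model: (n - rank) / n for rank, model in enumerate(ordered)}


def weighted_total(first_sims, second_sims):
    """Claim 1: correct each second similarity degree by its weight, then total."""
    weights = rank_weights(first_sims)
    return sum(weights[m] * second_sims[m] for m in first_sims)


def identify_weighted(second_db, second_sims_by_speaker):
    """Claim 4: choose the first speaker with the highest corrected total.

    second_db:              {first_speaker: {model_id: first_similarity}}
    second_sims_by_speaker: {first_speaker: {model_id: second_similarity}}
    """
    totals = {
        spk: weighted_total(first_sims, second_sims_by_speaker[spk])
        for spk, first_sims in second_db.items()
    }
    return max(totals, key=totals.get), totals


def drifted_models(first_sims, second_sims, min_diff):
    """Claims 7 and 8: models whose first and second similarity degrees
    differ by at least min_diff (absolute difference is an assumption)."""
    return [m for m in first_sims if abs(first_sims[m] - second_sims[m]) >= min_diff]


def needs_relearning(first_sims, second_sims, min_diff, min_count):
    """Claim 7: run learning mode again when at least min_count models drifted."""
    return len(drifted_models(first_sims, second_sims, min_diff)) >= min_count
```

Under claim 8, the list returned by `drifted_models` would instead identify the entries of the second database whose stored first similarity degrees are overwritten with the newly calculated second similarity degrees.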

9. A speaker identification device comprising:
- a processor; and
- a memory that stores a program,
- wherein the program causes the processor to
  - execute a learning mode processing using a first database, in which a plurality of unspecified speakers and a plurality of unspecified speaker models obtained by modeling features of voices of the plurality of unspecified speakers are associated and stored, to create a second database, in which first speakers who are not stored in the first database and a plurality of the unspecified speaker models are associated and stored and
  - execute an identification mode processing using the second database to identify a second speaker,
- wherein, in the execution of the learning mode processing,
  - a voice signal of each of the first speakers is acquired,
  - first similarity degrees between a feature value in the voice signal of each of the first speakers acquired by the first voice acquirer and each feature value in the plurality of unspecified speaker models stored in the first database are calculated,
  - a plurality of the unspecified speaker models for which the first similarity degrees calculated by the first similarity degree calculator are equal to or greater than a prescribed value are specified, and
  - each of the first speakers and the plurality of unspecified speaker models specified by the first specifier in the second database are associated and stored in the second database, and
- in the execution of the identification mode processing,
  - a voice signal of the second speaker is acquired,
  - a plurality of second similarity degrees between a feature value in the voice signal of the second speaker acquired by the second voice acquirer and each feature value in the plurality of unspecified speaker models associated with the first speakers and stored in the second database are calculated for each of the first speakers,
  - the calculated plurality of second similarity degrees are corrected by multiplying each of the plurality of second similarity degrees by a weighting value that corresponds to a ranking of the first similarity degrees,
  - a total value obtained by totaling the corrected plurality of second similarity degrees is calculated for each of the first speakers, and
  - one of the first speakers stored in the second database who corresponds to the second speaker is specified based on the calculated total values.

10. A speaker identification method, comprising:
- executing, using a processor, learning mode processing using a first database, in which a plurality of unspecified speakers and a plurality of unspecified speaker models obtained by modeling features of voices of the plurality of unspecified speakers are associated and stored, is used to create a second database, in which first speakers who are not stored in the first database and a plurality of the unspecified speaker models are associated and stored; and
- executing, using the processor, identification mode processing in which the second database is used to identify a second speaker,
- wherein, in the executing of the learning mode processing,
  - a voice signal of each of the first speakers is acquired,
  - first similarity degrees between a feature value in the acquired voice signal of each of the first speakers and each feature value in the plurality of unspecified speaker models of a plurality of the unspecified speakers who are different from the first speakers and are stored in the first database are calculated,
  - a plurality of the unspecified speaker models for which the calculated first similarity degrees are equal to or greater than a prescribed value are specified,
  - a speaker model corresponding to each of the first speakers is newly created based on the specified plurality of the unspecified speaker models and the acquired voice signals of the first speakers, and
  - the created speaker model is associated with the first speakers and stored in the second database, and
- in the executing of the identification mode processing,
  - a voice signal of the second speaker is acquired,
  - a plurality of second similarity degrees between a feature value in the acquired voice signal of the second speaker and feature values in the speaker models associated with the first speakers and stored in the second database are calculated for each of the first speakers,
  - the calculated plurality of second similarity degrees are corrected by multiplying each of the plurality of second similarity degrees by a weighting value that corresponds to a ranking of the first similarity degrees,
  - a total value obtained by totaling the corrected plurality of second similarity degrees is calculated for each of the first speakers, and
  - one of the first speakers stored in the second database who corresponds to the second speaker is specified based on the calculated total values.

Current Assignee: Panasonic Corporation (Panasonic Holdings Corporation)
Original Assignee: Panasonic Corporation (Panasonic Holdings Corporation)
Inventors: Tsujikawa, Misaki; Matsui, Tomoko
Primary Examiner(s): Saint Cyr, Leonard
Application Number: US15/130,944
Publication Number: US 20160314790A1
Time in Patent Office: 731 Days
Field of Search: 704/239, 704/246, 704/247, 704/251, 704/252
CPC Class Codes:
- G10L 17/04 Training, enrolment or mode...
- G10L 17/06 Decision making techniques;...
