Information processing method, information processing device, and recording medium for determining registered speakers as target speakers in speaker recognition
First Claim
1. An information processing method performed by a computer, the information processing method comprising:
detecting at least one speech segment from speech utterances that are sequentially input to a speech input unit;
extracting, from each of the at least one speech segment, a first feature quantity identifying a speaker whose voice is contained in the speech segment;
performing a comparison between the first feature quantity extracted and each of second feature quantities stored in a second storage as targets in speaker recognition for identifying respective voices of registered speakers, the second feature quantities being among second feature quantities pre-stored in a first storage and identifying respective voices of registered speakers;
performing a parsing and management of the registered speakers in the second storage, based on results of the comparison, which is performed for each consecutive speech segment of the at least one speech segment, of:
deleting, from the second storage, at least one second feature quantity among the second feature quantities when a degree of similarity between the first feature quantity in the consecutive speech segments, which is present for a fixed period of time or for a fixed number of times, and the at least one second feature quantity stored in the second storage is less than or equal to a threshold and a predetermined condition is satisfied, to remove at least one registered speaker identified by the at least one second feature quantity from the registered speakers stored in the second storage and reduce a total number of registered speakers as target speakers for speaker recognition; and
when a first feature quantity having a degree of similarity between the first feature quantity and each of the second feature quantities stored in the second storage, which is less than or equal to a threshold, appears among first feature quantities in speech segments that follow the consecutive speech segments, storing, in the second storage, a second feature quantity having a degree of similarity between the first feature quantity that appeared among the first feature quantities and the second feature quantities stored in the first storage that is greater than a threshold, based on comparing the first feature quantity that appeared among the first feature quantities and each of the second feature quantities stored in the first storage; and
adding, to the second storage, the first feature quantity that appeared among the first feature quantities as a feature quantity identifying a voice of a new registered speaker when a degree of similarity between the first feature quantity that appeared among the first feature quantities and each of the second feature quantities stored in the first storage is less than or equal to a threshold based on a result of comparing the first feature quantity that appeared among the first feature quantities and each of the second feature quantities stored in the first storage, to increase the total number of registered speakers who are target speakers for speaker recognition.
Abstract
The information processing method in the present disclosure is performed as below. At least one speech segment is detected from speech input to a speech input unit. A first feature quantity is extracted from each speech segment detected, the first feature quantity identifying a speaker whose voice is contained in the speech segment. The first feature quantity extracted is compared with each of second feature quantities stored in storage and identifying the respective voices of registered speakers who are target speakers in speaker recognition. The comparison is performed for each of consecutive speech segments, and under a predetermined condition, among the second feature quantities stored in the storage, at least one second feature quantity whose similarity with the first feature quantity is less than or equal to a threshold is deleted, thereby removing the at least one registered speaker identified by the at least one second feature quantity.
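The deletion step summarized in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the disclosure: feature quantities are modeled as plain lists of floats, the degree of similarity is taken to be cosine similarity, the "predetermined condition" is taken to be "the low similarity held for `min_segments` consecutive segments", and the names `cosine_similarity`, `prune_inactive_speakers`, `threshold`, and `min_segments` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def prune_inactive_speakers(active, segment_features, threshold=0.5, min_segments=3):
    """Delete target speakers that did not match any recent segment.

    active: dict mapping speaker name -> stored (second) feature quantity.
    segment_features: first feature quantities, one per consecutive segment.
    A speaker is removed when its similarity to every one of the last
    `min_segments` features is <= threshold (assumed condition).
    Returns the list of removed speaker names.
    """
    if len(segment_features) < min_segments:
        return []  # not enough consecutive segments observed yet
    recent = segment_features[-min_segments:]
    removed = [
        name
        for name, stored in active.items()
        if all(cosine_similarity(f, stored) <= threshold for f in recent)
    ]
    for name in removed:
        del active[name]
    return removed


if __name__ == "__main__":
    targets = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
    segments = [[1.0, 0.1], [0.9, 0.0], [1.0, 0.0]]  # only alice is speaking
    print(prune_inactive_speakers(targets, segments))
    print(sorted(targets))
```

Shrinking the set of target speakers in this way means each later segment is compared against fewer stored feature quantities, which is the reduction in "total number of registered speakers" that the claims refer to.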
14 Citations
11 Claims
1. An information processing method performed by a computer, the information processing method comprising:
detecting at least one speech segment from speech utterances that are sequentially input to a speech input unit;
extracting, from each of the at least one speech segment, a first feature quantity identifying a speaker whose voice is contained in the speech segment;
performing a comparison between the first feature quantity extracted and each of second feature quantities stored in a second storage as targets in speaker recognition for identifying respective voices of registered speakers, the second feature quantities being among second feature quantities pre-stored in a first storage and identifying respective voices of registered speakers;
performing a parsing and management of the registered speakers in the second storage, based on results of the comparison, which is performed for each consecutive speech segment of the at least one speech segment, of:
deleting, from the second storage, at least one second feature quantity among the second feature quantities when a degree of similarity between the first feature quantity in the consecutive speech segments, which is present for a fixed period of time or for a fixed number of times, and the at least one second feature quantity stored in the second storage is less than or equal to a threshold and a predetermined condition is satisfied, to remove at least one registered speaker identified by the at least one second feature quantity from the registered speakers stored in the second storage and reduce a total number of registered speakers as target speakers for speaker recognition; and
when a first feature quantity having a degree of similarity between the first feature quantity and each of the second feature quantities stored in the second storage, which is less than or equal to a threshold, appears among first feature quantities in speech segments that follow the consecutive speech segments, storing, in the second storage, a second feature quantity having a degree of similarity between the first feature quantity that appeared among the first feature quantities and the second feature quantities stored in the first storage that is greater than a threshold, based on comparing the first feature quantity that appeared among the first feature quantities and each of the second feature quantities stored in the first storage; and
adding, to the second storage, the first feature quantity that appeared among the first feature quantities as a feature quantity identifying a voice of a new registered speaker when a degree of similarity between the first feature quantity that appeared among the first feature quantities and each of the second feature quantities stored in the first storage is less than or equal to a threshold based on a result of comparing the first feature quantity that appeared among the first feature quantities and each of the second feature quantities stored in the first storage, to increase the total number of registered speakers who are target speakers for speaker recognition.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
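The last two steps of claim 1 (restoring a previously registered speaker from the first storage, or adding a new speaker) can be sketched as follows. This is an illustrative reading under stated assumptions, not the claimed implementation: cosine similarity stands in for the degree of similarity, a single `threshold` value is reused for every comparison although the claim does not say the thresholds are equal, and the names `update_targets`, `active` (second storage), `registry` (first storage), and the generated `speaker_N` labels are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def update_targets(feature, active, registry, threshold=0.5):
    """Handle one first feature quantity against both storages.

    active:   second storage, current target speakers (name -> feature).
    registry: first storage, all pre-registered speakers (name -> feature).
    Returns a (status, name) pair where status is one of
    "recognized", "restored", or "added".
    """
    # Case 0: the voice matches a current target speaker.
    for name, stored in active.items():
        if cosine_similarity(feature, stored) > threshold:
            return ("recognized", name)

    # Case 1: no target matches, so look the voice up in the first storage
    # and copy the best match above the threshold into the second storage.
    best_name, best_sim = None, threshold
    for name, stored in registry.items():
        sim = cosine_similarity(feature, stored)
        if sim > best_sim:
            best_name, best_sim = name, sim
    if best_name is not None:
        active[best_name] = registry[best_name]
        return ("restored", best_name)

    # Case 2: unknown in both storages, so register the feature itself
    # as a new target speaker under a generated (hypothetical) label.
    n = len(registry) + len(active) + 1
    while f"speaker_{n}" in active or f"speaker_{n}" in registry:
        n += 1
    new_name = f"speaker_{n}"
    active[new_name] = list(feature)
    return ("added", new_name)


if __name__ == "__main__":
    registry = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
    active = {"alice": [1.0, 0.0, 0.0]}
    print(update_targets([0.0, 0.9, 0.1], active, registry))
    print(update_targets([0.0, 0.0, 1.0], active, registry))
```

In this sketch the new speaker is written only to the second storage, following the wording of the claim; whether the first storage is also updated is left open here.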
10. An information processing device comprising:
a non-transitory computer-readable recording medium configured to store a program thereon; and
a hardware processor configured to execute the program and cause the information processing device to:
detect at least one speech segment from speech utterances that are sequentially input;
extract, from each of the at least one speech segment, a first feature quantity identifying a speaker whose voice is contained in the speech segment;
perform a comparison between the first feature quantity extracted and each of second feature quantities stored in a second storage as targets in speaker recognition for identifying respective registered speakers, the second feature quantities being among second feature quantities pre-stored in a first storage and identifying respective voices of registered speakers;
perform a parsing and management of the registered speakers in the second storage, based on results of the comparison, which is performed for each consecutive speech segment of the at least one speech segment, and which includes to:
remove at least one registered speaker identified by at least one second feature quantity from the registered speakers stored in the second storage when a degree of similarity between the first feature quantity in the consecutive speech segments, which is present for a fixed period of time or for a fixed number of times, and the at least one second feature quantity stored in the second storage is less than or equal to a threshold and a predetermined condition is satisfied, to reduce a total number of registered speakers as target speakers for speaker recognition; and
when a first feature quantity having a degree of similarity between the first feature quantity and each of the second feature quantities stored in the second storage, which is less than or equal to a threshold, appears among first feature quantities in speech segments that follow the consecutive speech segments, storing, in the second storage, a second feature quantity having a degree of similarity between the first feature quantity that appeared among the first feature quantities and the second feature quantities stored in the first storage that is greater than a threshold, based on comparing the first feature quantity that appeared among the first feature quantities and each of the second feature quantities stored in the first storage; and
add, to the second storage, the first feature quantity that appeared among the first feature quantities as a feature quantity identifying a voice of a new registered speaker when a degree of similarity between the first feature quantity and each of the second feature quantities stored in the storage is less than or equal to a threshold based on a result of comparing the first feature quantity that appeared among the first feature quantities and each of the second feature quantities stored in the first storage, to increase a total number of registered speakers who are target speakers for speaker recognition.
11. A non-transitory computer-readable recording medium for use in a computer, the recording medium having a program recorded thereon for causing the computer to perform an information processing method, the information processing method comprising:
detecting at least one speech segment from speech utterances that are sequentially input to a speech input unit;
extracting, from each of the at least one speech segment, a first feature quantity identifying a speaker whose voice is contained in the speech segment;
performing a comparison between the first feature quantity extracted and each of second feature quantities stored in a second storage as targets in speaker recognition for identifying respective voices of registered speakers, the second feature quantities being among second feature quantities pre-stored in a first storage and identifying respective voices of registered speakers;
performing a parsing and management of the registered speakers in the second storage, based on results of the comparison, which is performed for each consecutive speech segment of the at least one speech segment, of:
deleting, from the second storage, at least one second feature quantity among the second feature quantities when a degree of similarity between the first feature quantity in the consecutive speech segments, which is present for a fixed period of time or for a fixed number of times, and the at least one second feature quantity stored in the second storage is less than or equal to a threshold and a predetermined condition is satisfied, to remove at least one registered speaker identified by the at least one second feature quantity from the registered speakers stored in the second storage and reduce a total number of registered speakers as target speakers for speaker recognition; and
when a first feature quantity having a degree of similarity between the first feature quantity and each of the second feature quantities stored in the second storage, which is less than or equal to a threshold, appears among first feature quantities in speech segments that follow the consecutive speech segments, storing, in the second storage, a second feature quantity having a degree of similarity between the first feature quantity that appeared among the first feature quantities and the second feature quantities stored in the first storage that is greater than a threshold, based on comparing the first feature quantity that appeared among the first feature quantities and each of the second feature quantities stored in the first storage; and
adding, to the second storage, the first feature quantity that appeared among the first feature quantities as a feature quantity identifying a voice of a new registered speaker when a degree of similarity between the first feature quantity that appeared among the first feature quantities and each of the second feature quantities stored in the first storage is less than or equal to a threshold based on a result of comparing the first feature quantity that appeared among the first feature quantities and each of the second feature quantities stored in the first storage, to increase the total number of registered speakers who are target speakers for speaker recognition.
Specification