Speaker model adaptation via network of similar users
Abstract
A speech recognition system, method and program product for recognizing speech input from computer users connected together over a network of computers. Speech recognition computer users on the network are clustered into classes of similar users according to their similarities, including characteristics such as nationality, profession, sex and age. Each computer in the speech recognition network includes at least one user-based acoustic model trained for a particular user. The acoustic models include an acoustic model domain, with similar acoustic models being clustered according to an identified domain. User characteristics are collected from databases over the network and from users using the speech recognition system, and are then distributed over the network during or after user activities. Existing acoustic models are modified in response to user production activities. As recognition progresses, similar language models among similar users are identified on the network. Update information, including information about user activities and user acoustic model data, is transmitted over the network and identified similar language models are updated. Acoustic models improve for users that are connected over the network as similar users use their respective speech recognition systems.
26 Claims
1. A speech recognition system for recognizing speech input from computer users connected together over a network of computers, a plurality of said computers each including at least one acoustic model trained for a particular user, said system comprising:
means for comparing acoustic models of one or more computer users, each of said computer users using one of a plurality of computers;
means for clustering users on a network of said plurality of computers into clusters of similar users responsive to said comparison of acoustic models;
means for modifying each of said acoustic models responsive to user production activities;
means for comparing identified similar acoustic models and, responsive to modification of one or more of said acoustic models, modifying one or more compared said identified similar acoustic models; and
means for transmitting acoustic model data over said network to other computers connected to said network.
2. A speech recognition system as in claim 1, further comprising:
means for identifying an acoustic model domain, similar acoustic models being clustered according to said identified domain.
3. A speech recognition system as in claim 2, wherein the means for identifying said acoustic model domain comprises means for identifying a domain selected from the group of domains consisting of a telephone speech domain, a speaker independent speech domain, a gender related speech domain, an age related speech domain, a broadcasting speech domain, a noise mixed with speech domain, a music mixed with speech domain, a discrete speech domain and a continuous speech domain.
4. A speech recognition system as in claim 2, further comprising:
means for converting speech input from a user into an acoustic model.
5. A speech recognition system as in claim 4, wherein the means for converting speech into an acoustic model being selected from the group consisting of:
means for converting speech into an acoustic prototype;
means for converting speech into a Hidden Markov Model (HMM) for words;
means for converting speech into a HMM for phones;
means for converting speech into an acoustic rank;
means for converting speech into an acoustic decision tree;
means for converting speech into a weighted mixture of decoding scores;
means for converting speech into a decoding stack threshold;
means for converting speech into a phone duration;
means for converting speech into a word duration;
means for converting speech into a decoding alternative list size; and
means for converting speech into a plurality of signal processing control parameters.
6. A speech recognition system as in claim 2, further comprising means for receiving user production activities, said means for receiving user production activities being capable of receiving activity selected from the group consisting of dictation, conversation, error correction, sound generation, noise generation and music generation.
7. A speech recognition system as in claim 6, further comprising means for identifying and issuing commands, queries and text from said received user production activities.
8. A speech recognition system as in claim 7, further comprising:
means for converting said commands and queries into textual data; and
means for providing said text and said converted textual data to a supervisor.
9. A speech recognition system as in claim 2, further comprising:
means for maintaining a plurality of user profiles; and
means for extracting acoustic features.
10. A speech recognition system as in claim 9, wherein the means for maintaining a plurality of user profiles is a server.
11. A speech recognition system as in claim 9, wherein the means for extracting acoustic features comprises:
means for extracting acoustic features selected from the group of features consisting of accent, vocal tract characteristics, voice source characteristics, fundamental frequency, running average pitch, running pitch variance, pitch jitter, running energy variance, speech rate, shimmer, fundamental frequency, variation in fundamental frequency and MEL cepstra.
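Three of the features named in claim 11 (running average pitch, running pitch variance and pitch jitter) can be illustrated with a minimal Python sketch. The claim does not say how these features are computed; the function names, the use of Welford's incremental update, and the definition of jitter as the mean absolute difference between consecutive pitch periods relative to the mean period are all assumptions made for illustration.

```python
def running_pitch_stats(pitch_hz):
    """Yield (running mean, running population variance) after each pitch sample.

    Uses Welford's incremental update; this choice is an assumption,
    not something the claim specifies.
    """
    count, mean, m2 = 0, 0.0, 0.0
    for f0 in pitch_hz:
        count += 1
        delta = f0 - mean
        mean += delta / count
        m2 += delta * (f0 - mean)
        yield mean, m2 / count


def pitch_jitter(pitch_hz):
    """Mean absolute difference of consecutive pitch periods, relative to the mean period.

    This is one common definition of jitter, assumed here for illustration.
    """
    periods = [1.0 / f0 for f0 in pitch_hz]
    if len(periods) < 2:
        return 0.0
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))


# Hypothetical fundamental-frequency track in Hz, one value per voiced frame.
track = [100.0, 110.0, 120.0]
stats = list(running_pitch_stats(track))
jitter = pitch_jitter(track)
```

Here `stats[-1]` holds the mean and variance over the whole track, and `jitter` is zero only when every pitch period is identical.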
12. A speech recognition system as in claim 1, wherein the means for comparing acoustic models comprises means for measuring the distance between acoustic model components, acoustic models having components separated by less than a threshold being identified as similar.
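The threshold test of claim 12 can be sketched as follows. The claim fixes neither a distance measure nor a model representation, so this sketch assumes that each acoustic model is a mapping from a component name to a numeric vector, that distance is Euclidean, and that two models count as similar only when every shared component is closer than the threshold. The names `component_distance` and `models_similar` are hypothetical.

```python
import math


def component_distance(a, b):
    """Euclidean distance between two equal-length component vectors (assumed measure)."""
    if len(a) != len(b):
        raise ValueError("component vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def models_similar(model_a, model_b, threshold):
    """Return True when every shared component is separated by less than threshold."""
    shared = model_a.keys() & model_b.keys()
    if not shared:
        return False  # nothing to compare, so the models are not treated as similar
    return all(
        component_distance(model_a[name], model_b[name]) < threshold
        for name in shared
    )


# Hypothetical two-dimensional components keyed by phone label.
alice = {"AA": [1.0, 2.0], "IY": [0.5, 0.5]}
bob = {"AA": [1.1, 2.1], "IY": [0.4, 0.6]}
carol = {"AA": [4.0, 6.0], "IY": [0.5, 0.5]}

alice_bob = models_similar(alice, bob, threshold=0.5)
alice_carol = models_similar(alice, carol, threshold=0.5)
```

With these values, `alice` and `bob` fall within the threshold on both components, while `alice` and `carol` differ too much on the `"AA"` component.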
13. A speech recognition system as in claim 2, wherein the plurality of computers comprises:
at least one server;
at least one personal computer; and
at least one embedded device.
14. A speech recognition system as in claim 13, wherein at least one embedded device includes at least one personal digital assistant.
15. A speech recognition method for recognizing speech from each of a plurality of computer users, said method comprising the steps of:
a) clustering computer users coupled together over a network of connected computers into classes of similar users, at least one acoustic model being maintained on a corresponding one of said connected computers for each of said computer users;
b) for each of said classes, identifying similar acoustic models being used by clustered users;
c) modifying one user acoustic model responsive to user production activities by a corresponding clustered user;
d) comparing and adapting all said identified similar acoustic models responsive to modification of said one user acoustic model; and
e) transmitting user data over said network, said transmitted user data including information about user activities and user acoustic model data.
16. [...]
a telephone speech domain;
a speaker independent speech domain;
a gender related speech domain;
an age related speech domain;
a broadcasting speech domain;
a speech mixed with noise domain;
a speech mixed with music domain;
a discrete speech domain; and
a continuous speech domain.
17. A speech recognition method as in claim 15, wherein the step (a) of clustering users comprises comparing acoustic profile data for connected said users.
18. A speech recognition method as in claim 17 wherein said comparison is supervised, said users being classed into a plurality of established classes, identifying users having common speaker domains.
19. A speech recognition method as in claim 17 wherein said acoustic profile data includes user sex, age and nationality.
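Claims 17 to 19 describe clustering by comparing profile data such as sex, age and nationality, with users assigned to established classes. A minimal sketch of that idea is shown below. It assumes that an established class is simply a tuple of sex, age band and nationality; the age bands, field names and function names are hypothetical and do not come from the claims.

```python
from collections import defaultdict

# Hypothetical age bands: (lowest age, highest age, label).
AGE_BANDS = [(0, 17, "under-18"), (18, 39, "18-39"), (40, 64, "40-64"), (65, 200, "65+")]


def age_band(age):
    """Map an age in years to the label of its predefined band."""
    for low, high, label in AGE_BANDS:
        if low <= age <= high:
            return label
    raise ValueError(f"age out of range: {age}")


def class_key(profile):
    """Build the established-class key from a user's profile data."""
    return (profile["sex"], age_band(profile["age"]), profile["nationality"])


def cluster_users(profiles):
    """Group user names by class key, keeping the input order within each class."""
    classes = defaultdict(list)
    for user, profile in profiles.items():
        classes[class_key(profile)].append(user)
    return dict(classes)


# Hypothetical profiles for three networked users.
profiles = {
    "ann": {"sex": "F", "age": 34, "nationality": "FR"},
    "bea": {"sex": "F", "age": 29, "nationality": "FR"},
    "carl": {"sex": "M", "age": 52, "nationality": "DE"},
}
classes = cluster_users(profiles)
```

Exact-match grouping is the simplest possible comparison; a real system could instead score partial matches between profiles, which the claims leave open.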
20. A speech recognition method as in claim 16, wherein the step (d) of comparing user acoustic models, similar users are identified as users having models with features falling within a specified threshold of each other.
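Steps (c) and (d) of claim 15, combined with the threshold test of claim 20, can be sketched as below. The claims do not specify an adaptation rule, so this sketch assumes that a model is a flat list of numeric features, that adaptation is linear interpolation toward a target, and that a peer is similar when every feature lies within the threshold of the user's model as it stood before the update. All names and rates are hypothetical, and the network transmission of step (e) is not modeled.

```python
def adapt(model, target, rate):
    """Move each model feature a fraction `rate` of the way toward the target value."""
    return [m + rate * (t - m) for m, t in zip(model, target)]


def within_threshold(a, b, threshold):
    """True when every pair of corresponding features differs by at most threshold."""
    return all(abs(x - y) <= threshold for x, y in zip(a, b))


def update_cluster(models, user, observation, threshold, rate=0.5, peer_rate=0.1):
    """Adapt one user's model, then nudge similar peers toward the updated model.

    Similarity is judged against the user's model before the update.
    Returns a new mapping and leaves `models` unchanged.
    """
    before = models[user]
    updated = dict(models)
    updated[user] = adapt(before, observation, rate)
    for peer, model in models.items():
        if peer != user and within_threshold(before, model, threshold):
            updated[peer] = adapt(model, updated[user], peer_rate)
    return updated


# Hypothetical two-feature models for three users in one class.
models = {"u1": [1.0, 1.0], "u2": [1.2, 0.9], "u3": [5.0, 5.0]}
result = update_cluster(models, "u1", observation=[2.0, 2.0], threshold=0.5)
```

In this example `u2` starts within the threshold of `u1` and is pulled slightly toward the updated model, while `u3` is too far away and is left unchanged.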
21. A computer program product for recognizing speech from each of a plurality of computer users, said computer users using computers coupled together over a network, said computer program product comprising a computer usable medium having computer readable program code thereon, said computer readable program code comprising:
computer readable program code means for clustering computer users coupled together over a network of connected computers into classes of similar users, at least one acoustic model being maintained on a corresponding one of said connected computers for each of said computer users;
computer readable program code means for identifying similar acoustic models being used by clustered users for each of said classes;
computer readable program code means for modifying one user acoustic model responsive to user production activities by a corresponding clustered user;
computer readable program code means for comparing and adapting all said identified similar acoustic models responsive to modification of said one user acoustic model; and
computer readable program code means for transmitting user data over said network, said transmitted user data including information about user activities and user acoustic model data.
22. [...]
a telephone speech domain;
a speaker independent speech domain;
a gender related speech domain;
an age related speech domain;
a broadcasting speech domain;
a speech mixed with noise domain;
a speech mixed with music domain;
a discrete speech domain; and
a continuous speech domain.
23. A computer program product as in claim 21, wherein the computer readable code means for clustering users comprises computer readable code means for comparing acoustic profile data for connected said users.
24. A computer program product as in claim 23 wherein said comparison is supervised, said users being classed into a plurality of established classes, identifying users having common speaker domains.
25. A computer program product as in claim 23, wherein said acoustic profile data includes user sex, age and nationality.
26. A computer program product as in claim 22, wherein the computer readable code means for comparing individual user acoustic models, compares similar users having models with features falling within a specified threshold of each other.
Specification