Speech data collection over the world wide web
Abstract
In a computerized method for collecting speech data, Web pages of client computers connected to the Internet are enabled to acquire speech signals and information characterizing the speech. The addresses of the enabled Web pages are stored in a list in a memory of a Web server computer. Based on predetermined criteria and the list, some of the enabled client computers are selected to acquire the speech signals and information. The acquired speech signals and information are transmitted to the server computer to generate, train, and evaluate acoustic-phonetic models.
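The flow summarized in the abstract (store addresses in a list, select some clients by predetermined criteria, receive the acquired speech and its characterizing information) can be sketched as follows. This is only an illustrative sketch, not the patent's implementation: the names `ClientRecord`, `SpeechServer`, `register`, `select`, and `receive`, the example host names, and the ".com" criterion are all hypothetical, and model training is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ClientRecord:
    """One enabled client, as kept in the server-side list (hypothetical type)."""
    address: str          # host name or network address of the client
    enabled: bool = True  # the client's Web page is able to acquire speech


@dataclass
class SpeechServer:
    """In-memory stand-in for the Web server computer (hypothetical type)."""
    clients: List[ClientRecord] = field(default_factory=list)
    samples: List[dict] = field(default_factory=list)

    def register(self, address: str) -> None:
        """Store the address of an enabled client in the list."""
        self.clients.append(ClientRecord(address))

    def select(self, criterion: Callable[[ClientRecord], bool]) -> List[ClientRecord]:
        """Return the enabled clients that satisfy a predetermined criterion."""
        return [c for c in self.clients if c.enabled and criterion(c)]

    def receive(self, address: str, signal: bytes, info: dict) -> None:
        """Accept an acquired speech signal plus the information characterizing it."""
        self.samples.append({"address": address, "signal": signal, "info": info})


server = SpeechServer()
for addr in ["host1.example.com", "host2.example.org", "host3.example.com"]:
    server.register(addr)

# Assumed criterion for illustration only: keep clients whose address ends in ".com".
chosen = server.select(lambda c: c.address.endswith(".com"))

# Each selected client transmits its (placeholder) signal and information.
for client in chosen:
    server.receive(client.address, b"\x00\x01", {"transcript": "hello"})
```

Passing the criterion as a callable keeps the selection step independent of any particular rule, which matches the abstract's open-ended "predetermined criteria".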
12 Claims
1. A computerized method for collecting speech processing model training data using the Internet, comprising the steps of:
enabling client computers connected to the Internet to acquire speech signals and information characterizing the speech signals using Web pages;
storing addresses of the client computers in a list in a memory of a Web server computer;
selecting from the list, based upon predetermined criteria, some of the enabled client computers to acquire the speech signals and information characterizing the speech signals using the Web pages; and
transmitting from at least one of the selected client computers, the acquired speech signals and information to the Web server computer, said Web server computer using the acquired and transmitted speech signals and information to generate and train speech processing models;
the client computers are selected on the basis of Web domains, the Web domains are associated with specific linguistic groupings.
Dependent claims: 2, 3, 4
5. Computer method for training acoustic-phonetic models using speech data collected over the Internet, comprising the steps of:
using Web pages, enabling client computers connected to the Internet to acquire speech signals and information characterizing the speech signals;
storing addresses of the client computers in a list in a memory of a Web server computer;
selecting from the list, based upon predetermined criteria, some of the enabled client computers to acquire the speech signals and information characterizing the speech signals using the Web pages;
transmitting from at least one of the selected client computers, the acquired speech signals and information to the Web server computer; and
using the acquired and transmitted speech signals and information collected at the Web server computer, to generate and train acoustic-phonetic models of a speech processing system;
selecting client computers on the basis of at least one of Web domain and linguistic groupings.
Dependent claims: 6, 7
8. Computer apparatus for collecting speech data over the Internet and training speech processing models with said collected speech data, comprising:
a plurality of client computers connected to the Internet, each client computer having a respective Web Page enabled to acquire speech signals and information characterizing the speech signals; and
a Web server computer coupled across the Internet for communicating with the client computers, said Web server computer making requests of certain client computers for speech signals and information characterizing the speech signals, in response to each request from the Web server computer, said respective certain client computers transmitting acquired speech signals and information to the Web server computer for use in training speech processing models;
the Web server computer selects the certain client computers on the basis of Web domains, the Web domains are associated with specific linguistic groupings.
Dependent claims: 9, 10, 11, 12
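The independent claims all select client computers on the basis of Web domains associated with linguistic groupings. One way such a selection might look is sketched below. The claims do not say how a domain is tied to a grouping, so everything here is an assumption made for illustration: the use of the top-level domain of a host name, the `DOMAIN_TO_GROUP` table and its entries, and the helper names `linguistic_group` and `select_by_group`.

```python
from typing import Dict, List, Optional

# Hypothetical table: top-level Web domain -> linguistic grouping.
# The patent text does not supply this mapping; these entries are examples only.
DOMAIN_TO_GROUP: Dict[str, str] = {
    "fr": "French",
    "de": "German",
    "jp": "Japanese",
}


def linguistic_group(address: str) -> Optional[str]:
    """Return the grouping for a host name's top-level domain, or None if unknown."""
    # Take the text after the last dot, ignoring a trailing dot and letter case.
    tld = address.rstrip(".").rsplit(".", 1)[-1].lower()
    return DOMAIN_TO_GROUP.get(tld)


def select_by_group(addresses: List[str], group: str) -> List[str]:
    """Select, from the stored list, the clients whose domain maps to `group`."""
    return [a for a in addresses if linguistic_group(a) == group]


# Example stored list of client addresses (made-up host names).
client_list = [
    "lab.example.fr",
    "pc7.example.de",
    "www.example.fr",
    "host.example.com",
]

french_clients = select_by_group(client_list, "French")
```

Addresses whose domain is not in the table (such as `host.example.com` here) map to no grouping and are never selected, which is a simplification of this sketch rather than something the claims require.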
Specification