Creating statistical language models for spoken CAPTCHAs
First Claim
1. A method of constructing and training a statistical language model (SLM) to identify machine utterances of text in a system for audio Completely Automated Public Turing Tests to Tell Computers and Humans Apart (CAPTCHA), comprising:
automatically preparing a plurality of candidate challenge items with a computing system, each of the candidate challenge items including one or more words or phrases selected from a document corpus;
causing selected ones of the plurality of candidate challenge items to be articulated by at least one machine text-to-speech (TTS) system as candidate articulations;
ranking the candidate articulations based on a human listener score attributed to such candidate articulations, which human listener score identifies at least whether a candidate articulation originated from a machine; and
training the SLM to recognize machine TTS articulations based on selecting candidate articulations according to said ranking, such that a subset of said plurality of candidate challenge items identified as originating from a machine are used as a seed set in spoken CAPTCHA.
Abstract
Methods for creating statistical language models (SLMs) for spoken Completely Automated Turing Tests for Telling Computers and Humans Apart (CAPTCHAs) are disclosed. In these methods, candidate challenge items including one or more words are automatically selected from a document corpus. Selected ones of the challenge items are articulated by a machine text-to-speech (TTS) system as candidate articulations. Those articulations are ranked based on a human listener score indicating whether a candidate articulation originated from a machine. The SLM is then trained to recognize machine TTS articulations according to those rankings, by using a subset of the plurality of candidate challenge items identified as machine articulations as a seed set.
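The selection steps summarized above (prepare candidates from a corpus, articulate them, rank by human listener score, keep the machine-identified subset as a seed set) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: the `Candidate` class, the function names, the vote format (one boolean per listener, `True` meaning "sounded like a machine"), and the 0.5 threshold are assumptions made for illustration. The TTS step is a stub, and SLM training itself is not shown.

```python
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate challenge item (hypothetical structure)."""
    text: str
    machine_score: float = 0.0  # fraction of listeners who judged it machine-made


def prepare_candidates(corpus, n_words=3, count=5, seed=0):
    """Sample short contiguous word sequences from a document corpus."""
    rng = random.Random(seed)
    words = [w for doc in corpus for w in doc.split()]
    items = []
    for _ in range(count):
        start = rng.randrange(0, len(words) - n_words + 1)
        items.append(Candidate(" ".join(words[start:start + n_words])))
    return items


def articulate(candidate):
    """Stub for the TTS step: a real system would return synthesized audio."""
    return f"<audio for: {candidate.text}>"


def rank_by_listener_score(candidates, listener_votes):
    """Score each item by the share of 'machine' votes, highest first.

    listener_votes maps item text to a list of booleans (assumed format).
    """
    for c in candidates:
        votes = listener_votes.get(c.text, [])
        c.machine_score = sum(votes) / len(votes) if votes else 0.0
    return sorted(candidates, key=lambda c: c.machine_score, reverse=True)


def select_seed_set(ranked, threshold=0.5):
    """Keep items that listeners identified as machine-originated."""
    return [c for c in ranked if c.machine_score >= threshold]


if __name__ == "__main__":
    cands = [Candidate("a b"), Candidate("c d"), Candidate("e f")]
    votes = {
        "a b": [True, False, False],
        "c d": [True, True, True],
        "e f": [True, True, False],
    }
    ranked = rank_by_listener_score(cands, votes)
    seed_set = select_seed_set(ranked)
    print([c.text for c in seed_set])
```

In this sketch the seed set would then serve as training material for the SLM; how the patent's method actually trains the model is described in the specification, not here.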
108 Citations
13 Claims
Claim 1 is the independent claim, reproduced in full under First Claim. Claims 2 through 13 depend from claim 1.
Specification