Audio human interactive proof based on text-to-speech and semantics
First Claim
1. A computer-implemented process for providing an automatic human interactive proof, comprising:
selecting a single text sentence from a plurality of text sentences, and a question, wherein the selected single text sentence by itself provides information that is employed by an unknown user to formulate an answer to the question, wherein the single text sentence is different from the question;
creating an audio challenge comprising the selected single text sentence and the question, the audio challenge requiring semantic knowledge of the question to respond;
sending an oral rendition of the audio challenge to the unknown user;
receiving audio data representing a response to the audio challenge from the unknown user; and
analyzing the audio data to determine whether the unknown user is a human or a bot.
Abstract
The text-to-speech audio HIP technique described herein, in some embodiments, uses different correlated or uncorrelated words or sentences generated via a text-to-speech engine as audio HIP challenges. The technique can apply different effects as the text-to-speech synthesizer speaks a sentence to be used as a HIP challenge string. The different effects can include, for example, spectral frequency warping; vowel duration warping; background addition; echo addition; and varying the time duration between words, among others. In some embodiments the technique varies the set of parameters to prevent Automated Speech Recognition tools from using previously used audio HIP challenges to learn a model which can then be used to recognize future audio HIP challenges generated by the technique. Additionally, in some embodiments the technique introduces the requirement of semantic understanding in HIP challenges.
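The parameter variation described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `DistortionParams` fields, the numeric ranges, and the representation of each word as a plain list of float samples are all assumptions made for illustration. Only two of the listed effects are shown (echo addition and varying the time between words), and a fresh parameter set is drawn for every challenge.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DistortionParams:
    """Hypothetical parameter set; field names and ranges are illustrative."""
    echo_delay: int    # echo delay, in samples
    echo_gain: float   # echo amplitude relative to the original signal
    word_gaps: tuple   # silence (in samples) inserted after each word


def sample_params(num_words, rng):
    """Draw a fresh parameter set so challenges do not share settings."""
    return DistortionParams(
        echo_delay=rng.randint(400, 2400),
        echo_gain=rng.uniform(0.2, 0.6),
        word_gaps=tuple(rng.randint(800, 4000) for _ in range(num_words)),
    )


def join_words(word_signals, gaps):
    """Concatenate per-word sample lists with variable-length silence."""
    out = []
    for signal, gap in zip(word_signals, gaps):
        out.extend(signal)
        out.extend([0.0] * gap)
    return out


def add_echo(signal, delay, gain):
    """Mix a delayed, attenuated copy of the signal into itself."""
    out = list(signal) + [0.0] * delay
    for i, sample in enumerate(signal):
        out[i + delay] += gain * sample
    return out


def distort(word_signals, rng):
    """Apply randomly parameterized spacing and echo to one challenge."""
    params = sample_params(len(word_signals), rng)
    spaced = join_words(word_signals, params.word_gaps)
    return add_echo(spaced, params.echo_delay, params.echo_gain), params
```

Calling `distort(words, random.Random())` twice on the same word signals would normally yield two differently spaced and echoed renditions, which is the property the abstract relies on to make previously collected challenges less useful as training data.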
20 Claims
1. A computer-implemented process for providing an automatic human interactive proof, comprising:
selecting a single text sentence from a plurality of text sentences, and a question, wherein the selected single text sentence by itself provides information that is employed by an unknown user to formulate an answer to the question, wherein the single text sentence is different from the question;
creating an audio challenge comprising the selected single text sentence and the question, the audio challenge requiring semantic knowledge of the question to respond;
sending an oral rendition of the audio challenge to the unknown user;
receiving audio data representing a response to the audio challenge from the unknown user; and
analyzing the audio data to determine whether the unknown user is a human or a bot.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
15. A system for generating an audio-based challenge for an automated human interactive proof, comprising:
a general purpose computing device;
a computer program comprising program modules executable by the general purpose computing device, wherein the computing device is directed by the program modules of the computer program to,
select a single text sentence and a question to be used as a challenge string for which semantic understanding of the challenge string is necessary to respond to the question, wherein the selected single text sentence by itself provides information that is employed by an unknown computer user to formulate an answer to the question, wherein the single text sentence is different from the question;
distort parameters of a speech model of the selected single text sentence and the question with one or more distortions so that the selected single text sentence and the question are distorted when read by a text-to-speech engine;
using the distorted parameters and the speech model, read the selected single text sentence and then the question such that together they form an audio challenge to the unknown computer user using a text-to-speech synthesizer; and
automatically determine if a response received from the unknown computer user matches an expected response, said response being an answer to the audio challenge that a human with semantic knowledge provides and which is not a repeat of the audio challenge itself.
16. A computer-implemented process for determining whether an unknown computer user is a human, comprising:
using a computer to perform following process actions;
creating an audio challenge comprising a single text sentence and an instruction concerning the single text sentence, said single text sentence by itself providing information that is employed by the unknown computer user to formulate a response to the audio challenge, the audio challenge requiring semantic knowledge thereof to respond, wherein the single text sentence is different from the instruction;
sending an oral rendition of the audio challenge to the unknown computer user;
receiving audio data representing the response to the audio challenge from the unknown computer user;
analyzing the audio data to determine if the response was generated by a human, said response being an answer to the audio challenge that a human with semantic knowledge provides and is not a repeat of the audio challenge itself.
Dependent claims: 17, 18, 19, 20
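The flow recited in the independent claims (select a sentence and a question, form the challenge, then check that the response is an answer and not a repeat of the challenge) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the `CHALLENGES` entries, the function names, and the keyword-matching rule are hypothetical, and the received audio data is assumed to have already been transcribed to text by some speech recognition step that is not shown.

```python
import random
import re

# Hypothetical (sentence, question, accepted answers) triples for illustration.
CHALLENGES = [
    ("Carol bought three apples at the market.",
     "What fruit did Carol buy?", {"apple", "apples"}),
    ("The train to Boston leaves at nine.",
     "Where is the train going?", {"boston"}),
]


def normalize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def select_challenge(rng):
    """Pick one sentence together with its question and accepted answers."""
    sentence, question, answers = rng.choice(CHALLENGES)
    return {"sentence": sentence, "question": question, "answers": answers}


def challenge_text(challenge):
    """Text to hand to a text-to-speech engine: sentence first, then question."""
    return f"{challenge['sentence']} {challenge['question']}"


def is_human_response(challenge, transcript):
    """Accept an answer to the question; reject a repeat of the challenge.

    `transcript` is assumed to be the recognized text of the audio response.
    """
    words = normalize(transcript)
    spoken = " " + " ".join(words) + " "
    sentence = " " + " ".join(normalize(challenge["sentence"])) + " "
    if sentence in spoken:
        # The response merely echoes the challenge sentence.
        return False
    return any(answer in words for answer in challenge["answers"])
```

For the first hypothetical entry, a transcript of "Apples." would be accepted, while a transcript that repeats the sentence and question would be rejected even though it contains the word "apples". A deployed system would need a more robust comparison than keyword matching.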
Specification