Audio human interactive proof based on text-to-speech and semantics

US 10,319,363 B2
Filed: 02/17/2012
Issued: 06/11/2019
Est. Priority Date: 02/17/2012
Status: Active Grant

Abstract

The text-to-speech audio human interactive proof (HIP) technique described herein, in some embodiments, uses different correlated or uncorrelated words or sentences generated via a text-to-speech engine as audio HIP challenges. The technique can apply different effects in the text-to-speech synthesizer when it speaks a sentence to be used as a HIP challenge string. These effects can include, for example, spectral frequency warping, vowel duration warping, background addition, echo addition, and variation of the time duration between words. In some embodiments the technique varies the set of parameters to prevent Automated Speech Recognition tools from using previously used audio HIP challenges to learn a model that could then recognize future audio HIP challenges generated by the technique. Additionally, in some embodiments the technique introduces the requirement of semantic understanding in HIP challenges.
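
Two of the effects named in the abstract, echo addition and varying the time duration between words, can be illustrated with a minimal sketch. It assumes audio is held as plain Python lists of float samples and that each word is already available as its own sample list; the function names `add_echo` and `join_with_varied_gaps` and all parameter names are hypothetical and do not come from the patent.

```python
import random

def add_echo(samples, delay, decay):
    """Mix a delayed, attenuated copy of the signal back into itself.

    delay is in samples; decay is the gain applied to the delayed copy.
    """
    out = list(samples) + [0.0] * delay  # room for the echo tail
    for i, s in enumerate(samples):
        out[i + delay] += decay * s
    return out

def join_with_varied_gaps(words, min_gap, max_gap, rng):
    """Concatenate per-word sample lists with a random-length silence between words."""
    out = []
    for i, word in enumerate(words):
        if i > 0:
            # Silence length (in samples) is drawn afresh for every word boundary.
            out.extend([0.0] * rng.randint(min_gap, max_gap))
        out.extend(word)
    return out
```

A caller might build a challenge waveform with `join_with_varied_gaps(word_clips, 800, 4000, random.Random())` and then pass the result through `add_echo`; the gap and delay values here are placeholders, not figures from the patent.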

20 Claims

1. A computer-implemented process for providing an automatic human interactive proof, comprising:
- selecting a single text sentence from a plurality of text sentences, and a question, wherein the selected single text sentence by itself provides information that is employed by an unknown user to formulate an answer to the question, wherein the single text sentence is different from the question;
  
  creating an audio challenge comprising the selected single text sentence and the question, the audio challenge requiring semantic knowledge of the question to respond;
  
  sending an oral rendition of the audio challenge to the unknown user;
  
  receiving audio data representing a response to the audio challenge from the unknown user; and
  
  analyzing the audio data to determine whether the unknown user is a human or a bot.
  - 2. The computer-implemented process of claim 1, wherein the received response is spoken by the unknown user, and wherein voice recognition is applied to recognize the received response and compare the received response with a correct answer.
  - 3. The computer-implemented process of claim 2, wherein the correct answer is automatically generated.
  - 4. The computer-implemented process of claim 1, wherein the oral rendition of the audio challenge is generated by a text-to-speech engine, and wherein one or more distortions are applied during or post generation of the oral rendition of the audio challenge.
  - 5. The computer-implemented process of claim 4, wherein the one or more distortions are applied during the generation of the oral rendition of the audio challenge and comprises applying spectral frequency warping.
  - 6. The computer-implemented process of claim 4, wherein the one or more distortions are applied during the generation of the oral rendition of the audio challenge and comprises adjusting durations between spoken words.
  - 7. The computer-implemented process of claim 4, wherein the one or more distortions are applied during the generation of the oral rendition of the audio challenge and comprises changing volumes of pronounceable sounds of the oral rendition of the audio challenge, said volume of each pronounceable sound being distorted varying between a maximum volume and a minimum volume.
  - 8. The computer-implemented process of claim 4, wherein the one or more distortions are applied after the generation of the oral rendition of the audio challenge and comprises adding echo.
  - 9. The computer-implemented process of claim 4, wherein the one or more distortions are applied after generation of the oral rendition of the audio challenge and comprises adding sounds as background.
  - 10. The computer-implemented process of claim 9, wherein the added background sounds are an additional speech generated by a text-to-speech synthesizer.
  - 11. The computer-implemented process of claim 10, wherein the additional speech added to the background is speech in a different language as that of the selected text.
  - 12. The computer-implemented process of claim 10, wherein the one or more distortions are added as additional speech are in a form of meaningless speech or recorded audio.
  - 13. The computer-implemented process of claim 4, wherein the one or more distortions applied during the generation comprises adjusting durations of pronounceable sounds during the generation of the voice of the selected text to create the audio challenge.
  - 14. The computer-implemented process of claim 1, wherein the audio challenge question is automatically generated based on a pre-set set of rules.
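
The flow of claims 1 to 3 and 14 (pair a sentence with a question, generate the correct answer automatically, then compare a recognized response against it) might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the sentence template, the name list, the `Challenge` type, and the helpers `make_challenge` and `is_correct` are all hypothetical, and the transcript is assumed to come from a separate speech recognizer that is not shown.

```python
import random
import re
from dataclasses import dataclass

@dataclass
class Challenge:
    sentence: str   # carries the information needed to answer
    question: str   # differs from the sentence
    answer: str     # generated automatically from the same rule

NAMES = ["Anna", "Ben", "Carla"]  # hypothetical rule inputs
NUMBER_WORDS = {"four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8"}

def make_challenge(rng):
    """Rule-based generation: the answer is a sum, so it never appears in the sentence."""
    name = rng.choice(NAMES)
    apples, pears = rng.randint(2, 4), rng.randint(2, 4)
    return Challenge(
        sentence=f"{name} bought {apples} apples and {pears} pears.",
        question=f"How many pieces of fruit did {name} buy in total?",
        answer=str(apples + pears),
    )

def is_correct(challenge, transcript):
    """Compare a recognized transcript with the expected answer."""
    tokens = re.findall(r"[a-z0-9]+", transcript.lower())
    tokens = [NUMBER_WORDS.get(t, t) for t in tokens]  # "seven" -> "7"
    return challenge.answer in tokens
```

Because the expected answer is derived from the sentence rather than quoted from it, a response that merely repeats the sentence does not pass in this sketch.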

15. A system for generating an audio-based challenge for an automated human interactive proof, comprising:
- a general purpose computing device;
  
  a computer program comprising program modules executable by the general purpose computing device, wherein the computing device is directed by the program modules of the computer program to,
  
  select a single text sentence and a question to be used as a challenge string for which semantic understanding of the challenge string is necessary to respond to the question, wherein the selected single text sentence by itself provides information that is employed by an unknown computer user to formulate an answer to the question, wherein the single text sentence is different from the question;
  
  distort parameters of a speech model of the selected single text sentence and the question with one or more distortions so that the selected single text sentence and the question are distorted when read by a text-to-speech engine;
  
  using the distorted parameters and the speech model, read the selected single text sentence and then the question such that together they form an audio challenge to the unknown computer user using a text-to-speech synthesizer; and
  
  automatically determine if a response received from the unknown computer user matches an expected response, said response being an answer to the audio challenge that a human with semantic knowledge provides and which is not a repeat of the audio challenge itself.
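
Claim 15 distorts the parameters of a speech model before synthesis, and the abstract notes that the parameter set can be varied so that earlier challenges are of limited use for training a recognizer. A minimal sketch of drawing a fresh parameter set per challenge is shown below. The three fields and their ranges are hypothetical stand-ins chosen for illustration; the patent excerpt does not specify parameter names or values, and a real text-to-speech engine would consume such values through its own interface.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class DistortionParams:
    warp_factor: float  # hypothetical spectral warping amount
    gap_scale: float    # hypothetical multiplier on pauses between words
    echo_decay: float   # hypothetical gain of the echo copy

# Hypothetical ranges; real limits would be tuned so humans can still understand the audio.
RANGES = {
    "warp_factor": (0.9, 1.1),
    "gap_scale": (0.5, 2.0),
    "echo_decay": (0.1, 0.5),
}

def sample_params(rng):
    """Draw one independent value per parameter for a single challenge."""
    return DistortionParams(**{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()})
```

Calling `sample_params` once per challenge, rather than once per session, keeps consecutive challenges from sharing the same distortion settings.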

16. A computer-implemented process for determining whether an unknown computer user is a human, comprising:
- using a computer to perform following process actions;
  
  creating an audio challenge comprising a single text sentence and an instruction concerning the single text sentence, said single text sentence by itself providing information that is employed by the unknown computer user to formulate a response to the audio challenge, the audio challenge requiring semantic knowledge thereof to respond, wherein the single text sentence is different from the instruction;
  
  sending an oral rendition of the audio challenge to the unknown computer user;
  
  receiving audio data representing the response to the audio challenge from the unknown computer user;
  
  analyzing the audio data to determine if the response was generated by a human, said response being an answer to the audio challenge that a human with semantic knowledge provides and is not a repeat of the audio challenge itself.
  - 17. The computer-implemented process of claim 16, further comprising applying one or more distortions to the oral rendition of the audio challenge.
  - 18. The computer-implemented process of claim 17, wherein the one or more distortions comprise spectral frequency warping, warping of durations between spoken words, adding echo, adding background sounds comprising noise, music, or another speech of the same or different language as of that of the oral rendition of the audio challenge.
  - 19. The computer-implemented process of claim 17, wherein the one or more distortions comprises adjusting durations of pronounceable sounds of the oral rendition of the audio challenge.
  - 20. The computer-implemented process of claim 17, wherein the one or more distortions comprises changing volumes of pronounceable sounds of the oral rendition of the audio challenge, said volume of each pronounceable sound of the oral rendition being distorted varying between a maximum volume and a minimum volume.
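
The volume distortion of claims 7 and 20, in which each pronounceable sound is scaled to a volume between a minimum and a maximum, could look roughly like the sketch below. It assumes the rendition has already been split into one list of float samples per pronounceable sound; that segmentation, the function name `vary_volume`, and the gain bounds are assumptions made for illustration only.

```python
import random

def vary_volume(segments, min_gain, max_gain, rng):
    """Scale each segment by its own random gain drawn from [min_gain, max_gain]."""
    out = []
    for seg in segments:
        gain = rng.uniform(min_gain, max_gain)  # one gain per pronounceable sound
        out.append([gain * s for s in seg])
    return out
```

For example, `vary_volume(sound_segments, 0.5, 1.5, random.Random())` would leave the length of every segment unchanged while making neighbouring sounds differ in loudness.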

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors: Qian, Yao; Soong, Frank Kao-Ping; Zhu, Bin Benjamin
Primary Examiner(s): Le, Thuykhanh
Application Number: US13/399,496
Publication Number: US 20130218566A1
Time in Patent Office: 2,671 Days
Field of Search: 704/258, 704/260

CPC Class Codes
G06F 2221/2133   Verifying human interaction...
G10L 13/033   Voice editing, e.g. manipul...
G10L 15/00   Speech recognition G10L17/0...
G10L 21/003   Changing voice quality, e.g...
