Method and system for simulated interactive conversation
1 Assignment
0 Petitions
Abstract
A method of simulating interactive communication between a user and a human subject. The method comprises: assigning at least one phrase to a stored content sequence, wherein the content sequence comprises a content clip of the subject; parsing the at least one phrase to produce at least one phonetic clone; associating the at least one phonetic clone with the stored content sequence; receiving an utterance from the user; matching the utterance to the at least one phonetic clone; and displaying the stored content sequence associated with the at least one phonetic clone.
109 Citations
33 Claims
1. A computer-implemented method of simulating interactive communication between a user and a human subject, comprising:

- assigning at least one phrase to a stored content sequence, wherein the content sequence comprises a content clip of the subject, the subject being a human recorded on video, the content clip including a contemporaneously-recorded head and mouth of the subject and contemporaneously-recorded audio of the subject, wherein the content clip is free of any superimposed facial features;
- parsing the at least one phrase to produce at least one phonetic clone;
- associating the at least one phonetic clone with the stored content sequence;
- creating a transition between the content clip and a second content sequence by frame-matching a frame of the stored content sequence, the content sequence including the human subject speaking, with a frame of the second content sequence, the frame-matching being performed with respect to the recorded video of the entire head and facial features of the human subject;
- receiving an utterance from the user;
- matching the utterance to the at least one phonetic clone; and
- in response to matching the utterance, displaying the stored content sequence associated with the at least one phonetic clone in succession with the second content sequence.

View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19)
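The parse-and-match steps of claim 1 (phrase → phonetic clone → utterance lookup) can be illustrated with a toy sketch. The patent does not disclose a particular phonetic encoding; simplified Soundex is used here only as a stand-in, and the phrases and clip filenames are invented for the example:

```python
# Toy sketch of claim 1's matching steps: each target phrase is parsed
# into a "phonetic clone" (here, word-by-word Soundex codes), and a user
# utterance is matched by comparing clones rather than raw text, so
# near-homophones still resolve to the same stored content sequence.

def soundex(word: str) -> str:
    """Simplified Soundex: first letter plus up to three digit codes."""
    table = {c: str(d)
             for d, letters in enumerate(
                 ("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1)
             for c in letters}
    word = word.upper()
    code, prev = word[0], table.get(word[0])
    for ch in word[1:]:
        digit = table.get(ch)
        if digit and digit != prev:
            code += digit
        prev = digit  # vowels (absent from the table) reset the run
    return (code + "000")[:4]

def phonetic_clone(phrase: str) -> str:
    """Parse a phrase into its phonetic clone, one code per word."""
    return " ".join(soundex(w) for w in phrase.split())

# Hypothetical phrase -> content-clip assignment (the "assigning at least
# one phrase to a stored content sequence" step); filenames are invented.
CLIPS = {"hello there": "greeting.mp4",
         "tell me about yourself": "bio.mp4"}
CLONE_INDEX = {phonetic_clone(p): clip for p, clip in CLIPS.items()}

def match_utterance(utterance: str):
    """Return the stored content sequence whose clone matches, else None."""
    return CLONE_INDEX.get(phonetic_clone(utterance))

print(match_utterance("hello their"))  # near-homophone of "hello there"
```

Matching on phonetic codes rather than raw transcripts is what lets a misrecognized but similar-sounding utterance still trigger the intended clip.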
20. A system for simulating interactive communication between a user and a human subject, the system comprising:

- a display for displaying the subject, the subject being a human recorded on video;
- a memory; and
- a processor, coupled to the memory and the display, the processor operable to:
  - assign at least one phrase to a stored content sequence of the subject, wherein the content sequence comprises a content clip of the subject, the content sequence including contemporaneously-recorded audio of the human that is played simultaneously with the video of the human, wherein the video of the human includes a contemporaneously-recorded head and mouth of the human, wherein the content clip is free of any superimposed facial features;
  - parse the at least one phrase to produce at least one phonetic clone of the at least one phrase;
  - associate the at least one phonetic clone with the stored content sequence;
  - create a transition between the content clip and a second content sequence by frame-matching a frame of the stored content sequence, the content sequence including the human subject speaking, with a frame of the second content sequence, the frame-matching being performed with respect to the recorded video of the entire head and facial features of the human subject;
  - receive an utterance from the user;
  - match the utterance to the at least one phonetic clone; and
  - in response to the match, display the stored content sequence associated with the at least one phonetic clone in succession with the second content sequence.
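The frame-matching transition recited in the claims can be sketched as a search for the most similar cut point between two clips. Frames are reduced here to tiny grayscale pixel lists, and the mean-absolute-difference metric is an illustrative assumption; the patent does not specify a particular similarity measure:

```python
# Toy frame-matching sketch: pick the (outgoing, incoming) frame pair with
# the smallest mean absolute pixel difference, and cut the transition there
# so the subject's head appears continuous across the two clips.

def frame_distance(a, b):
    """Mean absolute difference between two same-size grayscale frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def best_transition(tail_frames, head_frames):
    """Return (i, j): cut after tail_frames[i], resume at head_frames[j]."""
    pairs = ((i, j) for i in range(len(tail_frames))
                    for j in range(len(head_frames)))
    return min(pairs, key=lambda ij: frame_distance(tail_frames[ij[0]],
                                                    head_frames[ij[1]]))

# Last frames of the current clip and first frames of the next clip
# (invented 3-pixel "frames" standing in for real video frames).
tail = [[0, 0, 0], [10, 12, 10], [50, 50, 50]]
head = [[52, 50, 49], [100, 100, 100]]
print(best_transition(tail, head))  # the 50-ish frames line up best
```

In a real system the comparison would run over the full head-and-face region of decoded video frames, but the cut-point selection logic is the same shape.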
21. A computer-implemented method of simulating interactive communication between a user and a human subject, comprising:

- receiving a voice input from the user;
- matching the voice input to one of a plurality of stored phonetic clones, the phonetic clones each corresponding to a target speech phrase associated with a stored content sequence file depicting the subject, the number of stored phonetic clones being greater than the number of stored content sequence files, the subject being a human recorded on video, the content sequence including contemporaneously-recorded audio of the human that is played simultaneously with the video of the human, wherein the video of the human includes a contemporaneously-recorded head and mouth of the human, wherein the content clip is free of any superimposed facial features;
- creating a transition between the content clip and a second content sequence by frame-matching a frame of the stored content sequence, the content sequence including the human subject speaking, with a frame of the second content sequence, the frame-matching being performed with respect to the recorded video of the entire head and facial features of the human subject; and
- in response to the matching, displaying the stored content sequence file matched to the phonetic clone and the second content sequence in succession.

View Dependent Claims (22, 23, 24, 25, 26, 27, 28, 29, 30)
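Claim 21's requirement that the stored phonetic clones outnumber the stored content sequence files amounts to a many-to-one index: several phrase variants resolve to the same clip. A minimal sketch, with invented phrases and filenames, using normalized text as a stand-in for the phonetic encoding:

```python
# Many-to-one lookup: more phonetic clones (phrase variants) than content
# sequence files, so different wordings of a question play the same clip.
# Real keys would be phonetic codes rather than normalized text.

CLONE_TO_CLIP = {
    "what is your name":   "name_answer.mp4",
    "who are you":         "name_answer.mp4",
    "tell me your name":   "name_answer.mp4",
    "where were you born": "birthplace.mp4",
}

def respond(voice_input: str):
    """Return the clip for the matched clone, or None if nothing matches."""
    return CLONE_TO_CLIP.get(voice_input.lower().strip())

# The claim's cardinality condition: clones outnumber distinct clip files.
assert len(CLONE_TO_CLIP) > len(set(CLONE_TO_CLIP.values()))
print(respond("Who are you"))
```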
31. A conversation system for simulating interactive communication between a user and a first human subject and a second human subject, comprising:

- a display for displaying the first and second subjects, the first and second subjects being humans recorded on video;
- a memory; and
- a processor, coupled to the memory and the display, the processor operable to:
  - receive a voice input from the user;
  - match the voice input to one of a plurality of stored phonetic clones, a first portion of the phonetic clones each corresponding to a target speech phrase associated with a stored content sequence file depicting the first subject and a second portion of the phonetic clones each corresponding to a target speech phrase associated with a stored content sequence file depicting the second subject, the number of stored phonetic clones being greater than the number of stored content sequence files, the content sequence including a contemporaneously-recorded head and mouth of the second subject and contemporaneously-recorded audio of the second subject, wherein the content clip is free of any superimposed facial features;
  - in response to the match, display the stored content sequence file matched to the phonetic clone in succession with a second content sequence; and
  - frame-match a frame of the stored content sequence with a frame of the second content sequence, the stored content sequence including a human subject speaking, the frame-matching being performed with respect to the recorded video of the entire head and facial features of the second human subject to create a transition for the stored content clip.

View Dependent Claims (32, 33)
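Claim 31 partitions the clone index between two recorded subjects, so a single match step also selects which person answers. A sketch under the same assumptions as above (invented phrases and filenames, normalized text standing in for the phonetic encoding):

```python
# Two-subject routing: each clone maps to (subject, clip), so matching the
# voice input simultaneously picks the responder and the content sequence.

CLONE_INDEX = {
    # first portion: clones answered by subject 1
    "how did you two meet":  ("subject1", "s1_meeting.mp4"),
    "what do you do":        ("subject1", "s1_job.mp4"),
    # second portion: clones answered by subject 2
    "where did you grow up": ("subject2", "s2_hometown.mp4"),
    "do you have siblings":  ("subject2", "s2_family.mp4"),
}

def route(voice_input: str):
    """Match the input and return (subject, clip), or None on no match."""
    return CLONE_INDEX.get(voice_input.lower().strip())

print(route("Where did you grow up"))
```

Storing the subject alongside the clip keeps the two portions of the clone index in one lookup, which is one plausible way to satisfy the claim's first-portion/second-portion split.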
Specification