COMPUTER GENERATED EMULATION OF A SUBJECT
Abstract
A system for emulating a subject, to allow a user to interact with a computer generated talking head with the subject's face and voice;
- said system comprising a processor, a user interface and a personality storage section,
- the user interface being configured to emulate the subject by displaying a talking head which comprises the subject's face and outputting speech from the mouth of the face with the subject's voice, the user interface further comprising a receiver for receiving a query from the user, the emulated subject being configured to respond to the query received from the user,
- the processor comprising a dialogue section and a talking head generation section,
- wherein said dialogue section is configured to generate a response to a query inputted by a user from the user interface, the response to be outputted by the talking head, the response being generated by retrieving information from said personality storage section, said personality storage section comprising content created by or about the subject,
- and said talking head generation section is configured to:
- convert said response into a sequence of acoustic units, the talking head generation section further comprising a statistical model, said statistical model comprising a plurality of model parameters, said model parameters being derived from said personality storage section, the model parameters describing probability distributions which relate an acoustic unit to an image vector and a speech vector, said image vector comprising a plurality of parameters which define the subject's face and said speech vector comprising a plurality of parameters which define the subject's voice, the talking head generation section being further configured to output a sequence of speech vectors and image vectors which are synchronised such that the head appears to talk.
20 Claims
1. A system for emulating a subject, to allow a user to interact with a computer generated talking head with the subject's face and voice;

said system comprising a processor, a user interface and a personality storage section,
the user interface being configured to emulate the subject by displaying a talking head which comprises the subject's face and outputting speech from the mouth of the face with the subject's voice, the user interface further comprising a receiver for receiving a query from the user, the emulated subject being configured to respond to the query received from the user,
the processor comprising a dialogue section and a talking head generation section,
wherein said dialogue section is configured to generate a response to a query inputted by a user from the user interface, the response to be outputted by the talking head, the response being generated by retrieving information from said personality storage section, said personality storage section comprising content created by or about the subject,
and said talking head generation section is configured to:
convert said response into a sequence of acoustic units, the talking head generation section further comprising a statistical model, said statistical model comprising a plurality of model parameters, said model parameters being derived from said personality storage section, the model parameters describing probability distributions which relate an acoustic unit to an image vector and a speech vector, said image vector comprising a plurality of parameters which define the subject's face and said speech vector comprising a plurality of parameters which define the subject's voice, the talking head generation section being further configured to output a sequence of speech vectors and image vectors which are synchronised such that the head appears to talk.

Dependent claims: 2, 3, 4, 5, 6, 11, 12, 13, 14, 15, 16, 17, 18, 19.
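Read as an algorithm, the generation step of claim 1 maps each acoustic unit to probability distributions over joint speech and image parameters and emits synchronised vector streams. The sketch below illustrates this with made-up Gaussian parameters and phone labels; `MODEL_PARAMS`, the frame values, and the one-frame-per-unit simplification are all illustrative assumptions, not taken from the patent:

```python
import math
import random

# Hypothetical model parameters: each acoustic unit (here, a phone label)
# maps to the mean and variance of a Gaussian over a speech vector and an
# image vector. Real parametric synthesis systems train such distributions
# from data; these numbers are illustrative only.
MODEL_PARAMS = {
    "h":  {"speech_mean": [0.2, 0.7], "image_mean": [0.1, 0.4], "var": 0.01},
    "eh": {"speech_mean": [0.5, 0.1], "image_mean": [0.6, 0.2], "var": 0.01},
    "l":  {"speech_mean": [0.3, 0.3], "image_mean": [0.2, 0.8], "var": 0.01},
    "ow": {"speech_mean": [0.8, 0.4], "image_mean": [0.9, 0.5], "var": 0.01},
}

def generate_trajectories(acoustic_units):
    """Map a sequence of acoustic units to synchronised speech and image
    vectors by sampling each unit's Gaussian (one frame per unit here)."""
    speech_vectors, image_vectors = [], []
    for unit in acoustic_units:
        p = MODEL_PARAMS[unit]
        sd = math.sqrt(p["var"])
        speech_vectors.append([random.gauss(m, sd) for m in p["speech_mean"]])
        image_vectors.append([random.gauss(m, sd) for m in p["image_mean"]])
    return speech_vectors, image_vectors

speech, images = generate_trajectories(["h", "eh", "l", "ow"])
```

Because both streams are generated from the same unit sequence, they come out frame-aligned, which is what lets the rendered face move in time with the audio.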
7. A system for creating a response to an inputted user query, said system comprising:

a personality file storage section, said personality file storage section comprising a plurality of documents stored in an unstructured form;
a query conversion section configured to convert said query into a word vector;
a first comparison section configured to compare said word vector generated from said query with word vectors generated from the documents in said personality file storage section and output identified documents;
a second comparison section configured to compare said word vector generated from said query with passages from said identified documents and to rank said passages, said ranking being based on the number of matches between a selected passage and said query; and
a concatenation section adapted to concatenate selected passages together using sentence connectors, wherein said sentence connectors are chosen from a plurality of sentence connectors, said sentence connectors being chosen on the basis of a statistical model.

Dependent claims: 8, 9, 10.
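The retrieval pipeline of claim 7, word-vector conversion, two comparison stages, and connector-based concatenation, can be sketched in a few lines. Everything concrete here (the bag-of-words representation, the sample documents, the fixed connector scores standing in for the claim's statistical model) is a hypothetical simplification:

```python
import re
from collections import Counter

def word_vector(text):
    """Bag-of-words Counter: a simplified stand-in for the claim's
    word-vector representation."""
    return Counter(re.findall(r"\w+", text.lower()))

def match_count(query_vec, passage):
    """Number of matches between a passage and the query, the ranking
    criterion recited in the claim."""
    pv = word_vector(passage)
    return sum(min(n, pv[w]) for w, n in query_vec.items())

# Hypothetical unstructured personality file: documents as lists of passages.
DOCUMENTS = {
    "doc1": ["The subject was born in 1952.", "She studied physics at university."],
    "doc2": ["Her favourite colour is blue.", "She enjoyed hiking in 1952 and later."],
}

# Fixed scores standing in for the claim's statistical model over connectors.
CONNECTOR_SCORES = {"Also,": 0.6, "In addition,": 0.3, "Moreover,": 0.1}

def respond(query, top_n=2):
    qv = word_vector(query)
    # First comparison: identify passages with any word-vector overlap.
    identified = [p for doc in DOCUMENTS.values() for p in doc
                  if match_count(qv, p) > 0]
    # Second comparison: rank identified passages by number of matches.
    ranked = sorted(identified, key=lambda p: match_count(qv, p), reverse=True)
    selected = ranked[:top_n]
    # Concatenate selected passages with the highest-scoring connector.
    connector = max(CONNECTOR_SCORES, key=CONNECTOR_SCORES.get)
    return (" " + connector + " ").join(selected)

print(respond("When was the subject born 1952"))
```

A trained model would score each connector in context rather than using fixed weights, but the control flow, identify, rank by match count, then join with a chosen connector, follows the claim's three sections.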
20. A method for emulating a subject, to allow a user to interact with a computer generated talking head with the subject's face and voice;

the method comprising:
receiving a user inputted query;
generating a response to the query inputted by the user, the response to be outputted by the talking head, the response being generated by retrieving information from a personality storage section, said personality storage section comprising content created by or about the subject; and
outputting said response by displaying a talking head which comprises the subject's face and outputting speech from the mouth of the face with the subject's voice, wherein said talking head outputs said response by:
converting said response into a sequence of acoustic units using a statistical model, said statistical model comprising a plurality of model parameters, the model parameters describing probability distributions which relate an acoustic unit to an image vector and a speech vector, said image vector comprising a plurality of parameters which define the subject's face and said speech vector comprising a plurality of parameters which define the subject's voice, the talking head appearing to talk by outputting a sequence of speech vectors and image vectors which are synchronised.
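The final synchronisation requirement, speech vectors and image vectors output so the head appears to talk, typically means aligning streams produced at different frame rates. A minimal sketch, assuming illustrative rates of 200 speech frames per second and 25 video frames per second (neither figure is from the patent):

```python
# Illustrative frame rates; real systems choose these from the vocoder and
# video renderer, not from anything recited in the claims.
SPEECH_RATE = 200  # speech vectors per second (e.g. 5 ms frames)
IMAGE_RATE = 25    # image vectors per second (e.g. 25 fps video)

def synchronise(speech_vectors, image_vectors):
    """Pair each image vector with the speech vector nearest in time,
    so the rendered mouth shape matches the audio being played."""
    pairs = []
    for i, image in enumerate(image_vectors):
        t = i / IMAGE_RATE
        j = min(round(t * SPEECH_RATE), len(speech_vectors) - 1)
        pairs.append((speech_vectors[j], image))
    return pairs

speech = [[s / 10.0] for s in range(400)]  # 2 seconds of speech frames
images = [[i / 10.0] for i in range(50)]   # 2 seconds of video frames
pairs = synchronise(speech, images)
```

With these rates each video frame indexes one of eight speech frames covering the same 40 ms window; any drift between the two clocks would show up as lip-sync error.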
Specification