
System and method for triphone-based unit selection for visual speech synthesis

  • US 9,583,098 B1
  • Filed: 10/25/2007
  • Issued: 02/28/2017
  • Est. Priority Date: 05/10/2002
  • Status: Active Grant
First Claim

1. A method comprising:

    receiving text for conversion to speech;

    calculating a target cost of a target sequence of tri-phones associated with the speech based on a phonetic distance, a coarticulation parameter, and a speech rate of the speech, to yield a calculated target cost;

    identifying, based on the calculated target cost and following phonemes associated with a plurality of tri-phones, a plurality of candidate tri-phones;

    sampling each candidate tri-phone in the plurality of candidate tri-phones to identify how many frames are associated with the each candidate tri-phone;

    adding, where necessary, at least one frame to frames in the each candidate tri-phone of the plurality of candidate tri-phones to reach a same number of frames as in a corresponding tri-phone in the target sequence of tri-phones, to yield an updated candidate tri-phone;

    building a video frame lattice of candidate video frames, wherein each candidate video frame in the candidate video frames is associated with a tri-phone comprising one of the updated candidate tri-phone or another tri-phone from the plurality of candidate tri-phones;

    determining image coefficients for each frame in the video frame lattice of candidate video frames, wherein the image coefficients for the each frame are based on a turning point of the updated candidate tri-phone, the turning point being a change of direction in a mouth of a speaker pronouncing the updated candidate tri-phone;

    assigning a joint cost to each pair of adjacent video frames in the video frame lattice, where the joint cost is based on the image coefficients and geometric features of the each pair of adjacent video frames in the video frame lattice; and

    constructing a video sequence of the mouth of the speaker moving in synchronization with the speech by finding, using a Viterbi search, a path through the video frame lattice based on a minimum of a sum of the calculated target cost and the joint cost over the video sequence.
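The frame-padding step of the claim says only that frames are added "where necessary" until a candidate tri-phone has as many frames as the corresponding tri-phone in the target sequence. It does not say which frames are added. The Python sketch below makes a hypothetical choice: it duplicates the candidate's existing frames as evenly as possible and preserves their order. Because the claim mentions only adding frames, a candidate that already has enough frames is returned unchanged. `pad_to_length` is an illustrative name and does not come from the patent.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def pad_to_length(frames: Sequence[T], target_len: int) -> List[T]:
    """Hypothetical padding: repeat frames evenly until len == target_len.

    Candidates that already have target_len frames or more are returned
    unchanged, since the claim only describes adding frames.
    """
    n = len(frames)
    if n == 0:
        raise ValueError("candidate tri-phone has no frames")
    if n >= target_len:
        return list(frames)
    # Map output slot i to source frame floor(i * n / target_len). With
    # n < target_len, every source frame is used at least once, in order,
    # and the extra slots are spread evenly across the tri-phone.
    return [frames[(i * n) // target_len] for i in range(target_len)]
```

For example, `pad_to_length(["a", "b", "c"], 5)` returns `['a', 'a', 'b', 'b', 'c']`. A real system might instead interpolate new mouth images between neighboring frames. Duplication is used here only because it needs no image model.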
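The last step of the claim is a Viterbi search for the path through the video frame lattice that minimizes the summed target and joint costs. It can be sketched in Python as follows. This is a minimal illustration under stated assumptions. Each candidate frame is assumed to carry a precomputed target cost. The joint cost is a hypothetical weighted sum of Euclidean distances between the image coefficients and geometric features of adjacent frames. The claim does not specify how these features are combined. `CandidateFrame`, `joint_cost`, `w_img`, and `w_geo` are illustrative names, not taken from the patent.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class CandidateFrame:
    coeffs: Tuple[float, ...]    # image coefficients (assumed representation)
    geometry: Tuple[float, ...]  # geometric mouth features (assumed representation)
    target_cost: float           # precomputed target cost for this candidate


def joint_cost(a: CandidateFrame, b: CandidateFrame,
               w_img: float = 1.0, w_geo: float = 1.0) -> float:
    # Hypothetical joint cost: weighted Euclidean distances between the
    # image coefficients and geometric features of two adjacent frames.
    return w_img * math.dist(a.coeffs, b.coeffs) + w_geo * math.dist(a.geometry, b.geometry)


def viterbi(lattice: Sequence[Sequence[CandidateFrame]]) -> Tuple[List[int], float]:
    """Return the chosen candidate index per lattice column and the total cost."""
    if not lattice or any(len(col) == 0 for col in lattice):
        raise ValueError("every lattice column needs at least one candidate")
    # best[j]: cheapest accumulated cost of any path ending at candidate j
    best = [c.target_cost for c in lattice[0]]
    back: List[List[int]] = []  # back-pointers for columns 1..T-1
    for t in range(1, len(lattice)):
        prev_col, col = lattice[t - 1], lattice[t]
        new_best: List[float] = []
        pointers: List[int] = []
        for cur in col:
            # Pick the predecessor minimizing accumulated cost plus joint cost
            k = min(range(len(prev_col)),
                    key=lambda i: best[i] + joint_cost(prev_col[i], cur))
            new_best.append(best[k] + joint_cost(prev_col[k], cur) + cur.target_cost)
            pointers.append(k)
        best = new_best
        back.append(pointers)
    # Trace back from the cheapest candidate in the final column
    j = min(range(len(best)), key=best.__getitem__)
    total = best[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return path, total
```

Each column needs only the best accumulated cost for each candidate in the previous column. As a result, the search runs in O(T·K²) time for T frames and K candidates per frame, instead of enumerating all K^T possible paths. The returned total is the sum of the target costs of the chosen frames plus the joint costs between consecutive chosen frames.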
