Variable rate video playback with synchronized audio

US 5,893,062 A
Filed: 12/05/1996
Issued: 04/06/1999
Est. Priority Date: 12/05/1996
Status: Expired due to Term

Abstract

The invention enables the apparent display rate of an audiovisual display to be varied. The invention can modify an original set of audio data in accordance with a target display rate, then modify a related original set of video data to conform to the modifications made to the audio data set, such that the modified audio and video data sets are synchronized. When the modified audio and video data sets so produced are used to generate an audiovisual display, the audiovisual display has an apparent display rate that approximates the target display rate. The target display rate can be faster or slower than a normal display rate at which an audiovisual display system generates an audiovisual display from the original sets of audio and video data. The target display rate can be established solely by a user instruction, by analysis of the audiovisual data, or by modification of a user-specified nominal target display rate based upon analysis of the audiovisual data. Preferably, the method for modifying the original audio data set is one that produces a modified audio data set that can be used to generate an audio display having little or no distortion.

43 Claims

1. For use with an audiovisual display system in which an original set of audio data and a related original set of video data can be used to generate an audiovisual display at a normal display rate, a method for enabling the apparent display rate of the audiovisual display to be varied from the normal display rate, the method comprising the steps of:
- defining a correspondence between the original set of audio data and the original set of video data;
  
  determining a target display rate or rates for the audiovisual display by evaluating the audio and/or video data to automatically determine the value of the target display rate or rates, the evaluation comprising the steps of analyzing the original set of audio data and calculating the target display rate or rates based upon the analysis of the original set of audio data, wherein;
  
  the step of analyzing the original set of audio data further comprises the step of ascertaining the stress with which spoken portions of the audio data are uttered; and
  
  the step of calculating further comprises the step of calculating the target display rate or rates based upon the relative stresses of the spoken portions of the audio data;
  
  creating a modified set of audio data, based upon the target display rate or rates and an evaluation of the content of the original set of audio data, that corresponds to the original set of audio data; and
  
  creating a modified set of video data, based upon the modified set of audio data, the correspondence between the modified set of audio data and the original set of audio data, and the correspondence between the original set of audio data and the original set of video data.
  - 2. A method as in claim 1, wherein the step of ascertaining stress further comprises the step of computing energy terms for the spoken portions of the audio data.
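Claims 1 and 2 can be illustrated with a minimal sketch, assuming that an "energy term" is the mean squared amplitude of a short frame and that more heavily stressed frames should be played closer to the normal rate. The function names, the frame length, and the linear stress-to-rate mapping are hypothetical and are not taken from the claims.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame (one 'energy term' per frame)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def rates_from_stress(energies, slow_rate=1.0, fast_rate=2.0):
    """Map relative stress to a per-frame target rate.

    Assumption: stress is energy relative to the loudest frame, and the rate
    falls linearly from fast_rate (no stress) to slow_rate (maximum stress).
    """
    peak = max(energies)
    if peak == 0:
        return [fast_rate] * len(energies)
    rates = []
    for energy in energies:
        stress = energy / peak  # relative stress in [0, 1]
        rates.append(fast_rate - stress * (fast_rate - slow_rate))
    return rates


if __name__ == "__main__":
    audio = [0.0] * 4 + [1.0] * 4 + [0.5] * 4
    print(rates_from_stress(frame_energies(audio, 4)))
```

Under these assumptions a silent frame is played at the fastest rate and the loudest frame at the slowest, which is one way of making the rate depend on the relative stresses.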

3. For use with an audiovisual display system in which an original set of audio data and a related original set of video data can be used to generate an audiovisual display at a normal display rate, a method for enabling the apparent display rate of the audiovisual display to be varied from the normal display rate, the method comprising the steps of:
- defining a correspondence between the original set of audio data and the original set of video data;
  
  determining a target display rate or rates for the audiovisual display by evaluating the audio and/or video data to automatically determine the value of the target display rate or rates, the evaluation comprising the steps of analyzing the original set of audio data and calculating the target display rate or rates based upon the analysis of the original set of audio data, wherein;
  
  the step of analyzing the original set of audio data further comprises the step of ascertaining the speaking rate at which spoken portions of the audio data are uttered; and
  
  the step of calculating further comprises the step of calculating the target display rate or rates based upon the relative speeds of the spoken portions of the audio data;
  
  creating a modified set of audio data, based upon the target display rate or rates and an evaluation of the content of the original set of audio data, that corresponds to the original set of audio data; and
  
  creating a modified set of video data, based upon the modified set of audio data, the correspondence between the modified set of audio data and the original set of audio data, and the correspondence between the original set of audio data and the original set of video data.
  - 4. A method as in claim 3, wherein the step of ascertaining speaking rates further comprises the step of ascertaining spectral changes in the spoken portions of the audio data.
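The spectral changes named in claim 4 can be sketched as a frame-to-frame spectral difference. The measure below (often called spectral flux) is an assumption chosen for illustration, and the claims do not specify it. A naive DFT is used so the example needs only the standard library; `magnitude_spectrum` and `spectral_flux` are hypothetical names.

```python
import cmath


def magnitude_spectrum(frame):
    """Magnitudes of the DFT bins 0..n/2 of one frame (naive O(n^2) transform)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectral_flux(samples, frame_len):
    """Sum of absolute bin-wise differences between consecutive frame spectra.

    Assumption: larger values indicate faster spectral change and are treated
    here as a rough proxy for a higher speaking rate.
    """
    spectra = [
        magnitude_spectrum(samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    return [
        sum(abs(a - b) for a, b in zip(cur, prev))
        for prev, cur in zip(spectra, spectra[1:])
    ]
```

A steady tone yields flux values near zero, while a change of tone between frames yields a large value, so relative speeds of spoken portions could be compared by averaging the flux over each portion.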

5. For use with an audiovisual display system in which an original set of audio data and a related original set of video data can be used to generate an audiovisual display at a normal display rate, a method for enabling the apparent display rate of the audiovisual display to be varied from the normal display rate, the method comprising the steps of:
- defining a correspondence between the original set of audio data and the original set of video data;
  
  determining a target display rate or rates for the audiovisual display by evaluating the audio and/or video data to automatically determine the value of the target display rate or rates, the evaluation comprising the steps of analyzing the original set of audio data and calculating the target display rate or rates based upon the analysis of the original set of audio data, wherein;
  
  the step of analyzing the original set of audio data further comprises the steps of;
  
  ascertaining the stress with which spoken portions of the audio data are uttered;
  
  ascertaining the speaking rate at which spoken portions of the audio data are uttered; and
  
  combining corresponding stresses and speaking rates to produce audio tension values for the spoken portions; and
  
  the step of calculating further comprises the step of calculating the target display rate or rates based upon the audio tension values of the spoken portions of the audio data;
  
  creating a modified set of audio data, based upon the target display rate or rates and an evaluation of the content of the original set of audio data, that corresponds to the original set of audio data; and
  
  creating a modified set of video data, based upon the modified set of audio data, the correspondence between the modified set of audio data and the original set of audio data, and the correspondence between the original set of audio data and the original set of video data.
  - 6. A method as in claim 5, further comprising the step of ascertaining the value of a nominal target display rate specified by a user of the audiovisual display system, wherein the step of calculating further comprises the step of combining the audio tension values with the nominal target display rate to produce the target display rate.
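Claims 5 and 6 combine stress and speaking rate into audio tension values, then combine those with a user-specified nominal rate. The claims do not give the combining functions, so the weighted average and the deviation-from-mean scaling below are purely illustrative assumptions, as are the names `audio_tension`, `target_rates`, `stress_weight`, and `depth`.

```python
def audio_tension(stress, speaking_rate, stress_weight=0.5):
    """Weighted average of per-portion stress and speaking-rate values (assumed form)."""
    return [
        stress_weight * s + (1.0 - stress_weight) * r
        for s, r in zip(stress, speaking_rate)
    ]


def target_rates(tension, nominal_rate, depth=0.5):
    """Scale the nominal rate down where tension is above average and up where it is below.

    Assumption: `depth` controls how strongly tension modulates the nominal rate.
    """
    mean_tension = sum(tension) / len(tension)
    return [nominal_rate * (1.0 + depth * (mean_tension - t)) for t in tension]


if __name__ == "__main__":
    tension = audio_tension([0.0, 1.0], [0.4, 0.6])
    print(target_rates(tension, nominal_rate=2.0))
```

With this form the rates stay centered on the nominal rate the user asked for, while high-tension portions are slowed relative to low-tension ones.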

7. For use with an audiovisual display system in which an original set of audio data and a related original set of video data can be used to generate an audiovisual display at a normal display rate, a method for enabling the apparent display rate of the audiovisual display to be varied from the normal display rate, the method comprising the steps of:
- defining a correspondence between the original set of audio data and the original set of video data;
  
  determining a target display rate or rates for the audiovisual display by evaluating the audio and/or video data to automatically determine the value of the target display rate or rates, the evaluation comprising the steps of analyzing the original set of audio data and calculating the target display rate or rates based upon the analysis of the original set of audio data;
  
  creating a modified set of audio data, based upon the target display rate or rates and an evaluation of the content of the original set of audio data, that corresponds to the original set of audio data, wherein the step of creating a modified set of audio data further comprises the steps of;
  
  (i) dividing the original set of audio data into a plurality of segments, each segment representing a contiguous portion of the original set of audio data that occurs during a specified duration of time, each segment being adjacent to one or two other segments such that there are no gaps between segments and adjacent segments do not overlap;
  
  (ii) selecting a first segment;
  
  (iii) selecting a second segment, the second segment temporally adjacent to the first segment;
  
  (iv) overlapping an end portion of the first segment with an end portion of the second segment that is adjacent to the first segment, the end portion of the first segment including first segment overlap data and the end portion of the second segment including second segment overlap data;
  
  (v) identifying as part of the modified set of audio data the audio data from the first segment that is not part of the first segment overlap data;
  
  (vi) blending corresponding first segment overlap data and second segment overlap data; and
  
  (vii) determining whether there are additional segments in the original set of audio data that have not been overlapped with an adjacent segment, wherein;
  
  if there are additional segments, the following steps are further performed;
  
  (viii) combining the blended overlap data with the audio data from the second segment that is not part of the second segment overlap data;
  
  (ix) selecting the combined data as a new first segment;
  
  (x) selecting a new second segment that is temporally adjacent to the new first segment and that has not previously been selected as a segment; and
  
  (xi) repeating steps (i) through (vii); and
  
  if there are not additional segments, the following step is further performed;
  
  (xii) identifying as part of the modified set of audio data the blended data and the audio data from the second segment that is not part of the second segment overlap data; and
  
  creating a modified set of video data, based upon the modified set of audio data, the correspondence between the modified set of audio data and the original set of audio data, and the correspondence between the original set of audio data and the original set of video data.
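The final step shared by claims 1, 3, 5, and 7 derives the modified video from two correspondences: modified audio to original audio, and original audio to original video. A minimal sketch follows, assuming the first correspondence is available as a list `audio_map` in which entry `i` is the index of the original audio sample that modified sample `i` came from, and assuming a fixed number of audio samples per video frame. Both representations are assumptions made for illustration.

```python
def select_video_frames(audio_map, sample_rate, frame_rate):
    """Pick the original video frame to show for each output frame period.

    audio_map[i] -> index of the original audio sample behind modified sample i
    (hypothetical representation of the audio-to-audio correspondence).
    """
    samples_per_frame = sample_rate / frame_rate
    output_frame_count = int(len(audio_map) / samples_per_frame)
    frames = []
    for frame_index in range(output_frame_count):
        modified_sample = int(frame_index * samples_per_frame)
        original_sample = audio_map[modified_sample]
        # Original audio-to-video correspondence: sample index -> frame index.
        frames.append(int(original_sample / samples_per_frame))
    return frames


if __name__ == "__main__":
    # Modified audio that keeps every second original sample (2x speed-up).
    print(select_video_frames(list(range(0, 80, 2)), sample_rate=40, frame_rate=10))
```

In the 2x example every second original video frame is selected, so the video stays aligned with the shortened audio.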

8. A method for modifying an original set of audio data to produce a modified set of audio data, comprising the steps of:
- (i) dividing the original set of audio data into a plurality of segments, each segment representing a contiguous portion of the original set of audio data that occurs during a specified duration of time, each segment being adjacent to one or two other segments such that there are no gaps between segments and adjacent segments do not overlap;
  
  (ii) selecting a first segment;
  
  (iii) selecting a second segment, the second segment temporally adjacent to the first segment;
  
  (iv) overlapping an end portion of the first segment with an end portion of the second segment that is adjacent to the first segment, the end portion of the first segment including first segment overlap data and the end portion of the second segment including second segment overlap data;
  
  (v) identifying as part of the modified set of audio data the audio data from the first segment that is not part of the first segment overlap data;
  
  (vi) blending corresponding first segment overlap data and second segment overlap data; and
  
  (vii) determining whether there are additional segments in the original set of audio data that have not been overlapped with an adjacent segment, wherein;
  
  if there are additional segments, the following steps are further performed;
  
  (viii) combining the blended overlap data with the audio data from the second segment that is not part of the second segment overlap data;
  
  (ix) selecting the combined data as a new first segment;
  
  (x) selecting a new second segment that is temporally adjacent to the new first segment and that has not previously been selected as a segment; and
  
  (xi) repeating steps (i) through (vii); and
  
  if there are not additional segments, the following step is further performed;
  
  (xii) identifying as part of the modified set of audio data the blended data and the audio data from the second segment that is not part of the second segment overlap data.
  - 9. A method as in claim 8, wherein the step of selecting a first segment further comprises the step of selecting the first temporally occurring segment in the original audio data set.
  - 10. A method as in claim 8, wherein the step of selecting a first segment further comprises the step of selecting the last temporally occurring segment in the original audio data set.
  - 11. A method as in claim 8, wherein the step of overlapping further comprises the steps of:
    - determining a duration of time to be used as a target overlap;
      
      selecting a set of trial overlaps based upon the target overlap, each trial overlap representing a different duration of time, wherein if one or more of the set of trial overlaps is negative, data from the adjacent ends of the first and second segments are added to the ends of the second and first segments, respectively, to create extended first and second segments that encompass the negative overlap;
      
      for each trial overlap, identifying as the first segment overlap data the audio data in the first segment or extended first segment that occurs during a period of time that is equal to the trial overlap and that is temporally adjacent to the second segment or extended second segment;
      
      for each trial overlap, identifying as the second segment overlap data the audio data in the second segment or extended second segment that occurs during a period of time that is equal to the trial overlap and that is temporally adjacent to the first segment or extended first segment;
      
      for each trial overlap, calculating the correlation between the corresponding first and second segment overlap data;
      
      selecting the trial overlap for which the calculated correlation is highest as the overlap of the first and second segments or extended first and second segments.
  - 12. A method as in claim 11, wherein the step of calculating further comprises the step of calculating the mean cross product of corresponding data from the first segment overlap data and the second segment overlap data, after the mean value of the first segment overlap data has been subtracted from each of the first segment overlap data and the mean value of the second segment overlap data has been subtracted from each of the second segment overlap data.
  - 13. A method as in claim 12, wherein the step of selecting as the overlap of the first and second segments or extended first and second segments further comprises the step of selecting the trial overlap having the highest mean cross product.
  - 14. A method as in claim 11, wherein the magnitude of the target overlap is dependent upon a target display rate of the display of the audio data set.
  - 15. A method as in claim 14, wherein the magnitude of the target overlap is calculated from the equation t = [(s-1)/s] * d, where t is the magnitude of the target overlap, s is the target display rate, and d is the length of each segment.
  - 16. A method as in claim 11, wherein the step of selecting a set of trial overlaps further comprises the step of selecting a set of trial overlaps that are centered about the target overlap.
  - 17. A method as in claim 11, wherein the step of selecting a set of trial overlaps further comprises the step of selecting a set of trial overlaps such that the duration of time from the largest overlap to the smallest overlap is at least large enough to include one pitch pulse of the lowest frequency pitch expected to be encountered in the audio data.
  - 18. A method as in claim 8, wherein the step of blending further comprises the step of performing a linear cross fade of the first segment overlap data with the corresponding data from the second segment overlap data.
  - 19. A method as in claim 8, further comprising the step of generating a display from the data identified as part of the modified audio data set.
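The segment-overlap procedure of claim 8, together with the correlation search of claims 11 to 13, the target overlap of claim 15, and the linear cross fade of claim 18, can be sketched as follows. This is a simplified illustration and not the patented implementation: it handles only speed-up (positive overlaps), works in samples rather than time, and the names `compress`, `best_overlap`, `seg_len`, and `search` are hypothetical.

```python
def mean_cross_product(a, b):
    """Mean cross product of two equal-length runs after removing each run's mean."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def best_overlap(first, second, target, search):
    """Try overlaps centered on `target` and keep the one with the highest correlation."""
    hi = min(len(first), len(second), target + search)
    lo = max(1, min(target - search, hi))
    best_k, best_score = lo, None
    for k in range(lo, hi + 1):
        score = mean_cross_product(first[-k:], second[:k])
        if best_score is None or score > best_score:
            best_k, best_score = k, score
    return best_k


def cross_fade(a, b):
    """Linear cross fade from run `a` into run `b`."""
    n = len(a)
    return [a[i] * (1 - (i + 1) / (n + 1)) + b[i] * ((i + 1) / (n + 1)) for i in range(n)]


def compress(samples, seg_len, rate, search=2):
    """Overlap adjacent segments so that playback is roughly `rate` times faster."""
    if not samples:
        return []
    segments = [samples[i:i + seg_len] for i in range(0, len(samples), seg_len)]
    target = round((rate - 1) / rate * seg_len)  # t = [(s-1)/s] * d, in samples
    output = []
    first = segments[0]
    for second in segments[1:]:
        k = best_overlap(first, second, target, search)
        output.extend(first[:-k])  # part of the first segment outside the overlap
        first = cross_fade(first[-k:], second[:k]) + second[k:]  # new first segment
    output.extend(first)  # no segments left: keep the blended data and the tail
    return output
```

Calling `compress(samples, seg_len=10, rate=2.0, search=0)` on 40 samples removes 5 samples at each of the 3 joins and returns 25 samples. The final segment keeps its full tail, so the result is slightly longer than half the input.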

20. A system for modifying an original set of audio data to produce a modified set of audio data, comprising:
- (i) means for dividing the original set of audio data into a plurality of segments, each segment representing a contiguous portion of the original set of audio data that occurs during a specified duration of time, each segment being adjacent to one or two other segments such that there are no gaps between segments and adjacent segments do not overlap;
  
  (ii) means for selecting a first segment;
  
  (iii) means for selecting a second segment, the second segment temporally adjacent to the first segment;
  
  (iv) means for overlapping an end portion of the first segment with an end portion of the second segment that is adjacent to the first segment, the end portion of the first segment including first segment overlap data and the end portion of the second segment including second segment overlap data;
  
  (v) means for identifying as part of the modified set of audio data the audio data from the first segment that is not part of the first segment overlap data;
  
  (vi) means for blending corresponding first segment overlap data and second segment overlap data;
  
  (vii) means for determining whether there are additional segments in the original set of audio data that have not been overlapped with an adjacent segment;
  
  (viii) means for performing, if there are additional segments, the following functions;
  
  combining the blended overlap data with the audio data from the second segment that is not part of the second segment overlap data;
  
  selecting the combined data as a new first segment;
  
  selecting a new second segment that is temporally adjacent to the new first segment and that has not previously been selected as a segment; and
  
  repeating steps (i) through (vii); and
  
  means for performing, if there are not additional segments, the function of identifying as part of the modified set of audio data the blended data and the audio data from the second segment that is not part of the second segment overlap data.
  - 22. A system as in claim 20, wherein the means for selecting a first segment further comprises means for selecting the first temporally occurring segment in the original audio data set.
  - 23. A system as in claim 20, wherein the means for selecting a first segment further comprises means for selecting the last temporally occurring segment in the original audio data set.
  - 24. A system as in claim 20, wherein the means for overlapping further comprises:
    - means for determining a duration of time to be used as a target overlap;
      
      means for selecting a set of trial overlaps based upon the target overlap, each trial overlap representing a different duration of time, wherein if one or more of the set of trial overlaps is negative, data from the adjacent ends of the first and second segments are added to the ends of the second and first segments, respectively, to create extended first and second segments that encompass the negative overlap;
      
      means for identifying as the first segment overlap data, for each trial overlap, the audio data in the first segment or extended first segment that occurs during a period of time that is equal to the trial overlap and that is temporally adjacent to the second segment or extended second segment;
      
      means for identifying as the second segment overlap data, for each trial overlap, the audio data in the second segment or extended second segment that occurs during a period of time that is equal to the trial overlap and that is temporally adjacent to the first segment or extended first segment;
      
      means for calculating, for each trial overlap, the correlation between the corresponding first and second segment overlap data;
      
      means for selecting the trial overlap for which the calculated correlation is highest as the overlap of the first and second segments or extended first and second segments.
  - 25. A system as in claim 24, wherein the means for calculating further comprises means for calculating the mean cross product of corresponding data from the first segment overlap data and the second segment overlap data, after the mean value of the first segment overlap data has been subtracted from each of the first segment overlap data and the mean value of the second segment overlap data has been subtracted from each of the second segment overlap data.
  - 26. A system as in claim 25, wherein the means for selecting as the overlap of the first and second segments or extended first and second segments further comprises means for selecting the trial overlap having the highest mean cross product.
  - 27. A system as in claim 24, wherein the magnitude of the target overlap is dependent upon a target display rate of the display of the audio data set.
  - 28. A system as in claim 27, wherein the magnitude of the target overlap is calculated from the equation t = [(s-1)/s] * d, where t is the magnitude of the target overlap, s is the target display rate, and d is the length of each segment.
  - 29. A system as in claim 24, wherein the means for selecting a set of trial overlaps further comprises means for selecting a set of trial overlaps that are centered about the target overlap.
  - 30. A system as in claim 24, wherein the means for selecting a set of trial overlaps further comprises means for selecting a set of trial overlaps such that the duration of time from the largest overlap to the smallest overlap is at least large enough to include one pitch pulse of the lowest frequency pitch expected to be encountered in the audio data.
  - 31. A system as in claim 20, wherein the means for blending further comprises means for performing a linear cross fade of the first segment overlap data with the corresponding data from the second segment overlap data.
  - 32. A system as in claim 20, further comprising means for generating a display from the data identified as part of the modified audio data set.
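The target overlap equation recited in claims 15, 28, and 39 can be worked through as follows. The 20 ms segment length is an assumed value used only for illustration. Removing an overlap of t from each segment of length d leaves d/s of audio per segment, which is what a display rate of s requires; a rate below 1 gives a negative overlap, the case addressed by the extended segments of claims 11, 24, and 35.

```latex
\[
t = \frac{s-1}{s}\, d
\qquad\Longrightarrow\qquad
d - t = \frac{d}{s}
\]
\[
s = 2,\quad d = 20\ \text{ms}:\qquad
t = \frac{2-1}{2}\cdot 20\ \text{ms} = 10\ \text{ms},\qquad d - t = 10\ \text{ms}
\]
\[
s = 0.5,\quad d = 20\ \text{ms}:\qquad
t = \frac{0.5-1}{0.5}\cdot 20\ \text{ms} = -20\ \text{ms},\qquad d - t = 40\ \text{ms}
\]
```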

21. A computer readable storage medium encoded with one or more computer programs for modifying an original set of audio data to produce a modified set of audio data, comprising:
- (i) instructions for dividing the original set of audio data into a plurality of segments, each segment representing a contiguous portion of the original set of audio data that occurs during a specified duration of time, each segment being adjacent to one or two other segments such that there are no gaps between segments and adjacent segments do not overlap;
  
  (ii) instructions for selecting a first segment;
  
  (iii) instructions for selecting a second segment, the second segment temporally adjacent to the first segment;
  
  (iv) instructions for overlapping an end portion of the first segment with an end portion of the second segment that is adjacent to the first segment, the end portion of the first segment including first segment overlap data and the end portion of the second segment including second segment overlap data;
  
  (v) instructions for identifying as part of the modified set of audio data the audio data from the first segment that is not part of the first segment overlap data;
  
  (vi) instructions for blending corresponding first segment overlap data and second segment overlap data; and
  
  (vii) instructions for determining whether there are additional segments in the original set of audio data that have not been overlapped with an adjacent segment, wherein;
  
  if there are additional segments, the following instructions are further performed;
  
  (viii) instructions for combining the blended overlap data with the audio data from the second segment that is not part of the second segment overlap data;
  
  (ix) instructions for selecting the combined data as a new first segment;
  
  (x) instructions for selecting a new second segment that is temporally adjacent to the new first segment and that has not previously been selected as a segment; and
  
  (xi) instructions for repeating steps (i) through (vii); and
  
  if there are not additional segments, the following instructions are further performed;
  
  (xii) instructions for identifying as part of the modified set of audio data the blended data and the audio data from the second segment that is not part of the second segment overlap data.
  - 33. A computer readable storage medium as in claim 21, wherein the instructions for selecting a first segment further comprise instructions for selecting the first temporally occurring segment in the original audio data set.
  - 34. A computer readable storage medium as in claim 21, wherein the instructions for selecting a first segment further comprise instructions for selecting the last temporally occurring segment in the original audio data set.
  - 35. A computer readable storage medium as in claim 21, wherein the instructions for overlapping further comprise:
    - instructions for determining a duration of time to be used as a target overlap;
      
      instructions for selecting a set of trial overlaps based upon the target overlap, each trial overlap representing a different duration of time, wherein if one or more of the set of trial overlaps is negative, data from the adjacent ends of the first and second segments are added to the ends of the second and first segments, respectively, to create extended first and second segments that encompass the negative overlap;
      
      instructions for identifying as the first segment overlap data, for each trial overlap, the audio data in the first segment or extended first segment that occurs during a period of time that is equal to the trial overlap and that is temporally adjacent to the second segment or extended second segment;
      
      instructions for identifying as the second segment overlap data, for each trial overlap, the audio data in the second segment or extended second segment that occurs during a period of time that is equal to the trial overlap and that is temporally adjacent to the first segment or extended first segment;
      
      instructions for calculating, for each trial overlap, the correlation between the corresponding first and second segment overlap data;
      
      instructions for selecting the trial overlap for which the calculated correlation is highest as the overlap of the first and second segments or extended first and second segments.
  - 36. A computer readable storage medium as in claim 35, wherein the instructions for calculating further comprise instructions for calculating the mean cross product of corresponding data from the first segment overlap data and the second segment overlap data, after the mean value of the first segment overlap data has been subtracted from each of the first segment overlap data and the mean value of the second segment overlap data has been subtracted from each of the second segment overlap data.
  - 37. A computer readable storage medium as in claim 36, wherein the instructions for selecting as the overlap of the first and second segments or extended first and second segments further comprise instructions for selecting the trial overlap having the highest mean cross product.
  - 38. A computer readable storage medium as in claim 35, wherein the magnitude of the target overlap is dependent upon a target display rate of the display of the audio data set.
  - 39. A computer readable storage medium as in claim 38, wherein the magnitude of the target overlap is calculated from the equation t = [(s-1)/s] * d, where t is the magnitude of the target overlap, s is the target display rate, and d is the length of each segment.
  - 40. A computer readable storage medium as in claim 35, wherein the instructions for selecting a set of trial overlaps further comprise instructions for selecting a set of trial overlaps that are centered about the target overlap.
  - 41. A computer readable storage medium as in claim 35, wherein the instructions for selecting a set of trial overlaps further comprise instructions for selecting a set of trial overlaps such that the duration of time from the largest overlap to the smallest overlap is at least large enough to include one pitch pulse of the lowest frequency pitch expected to be encountered in the audio data.
  - 42. A computer readable storage medium as in claim 21, wherein the instructions for blending further comprise instructions for performing a linear cross fade of the first segment overlap data with the corresponding data from the second segment overlap data.
  - 43. A computer readable storage medium as in claim 21, further comprising instructions for generating a display from the data identified as part of the modified audio data set.
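The trial-overlap selection described in claims 16 and 17 (and their system and storage-medium counterparts) can be sketched as a range of overlaps centered on the target and wide enough to span one pitch period of the lowest expected pitch. Overlaps are expressed in samples here, and the function name, the `step` parameter, and the example pitch floor of 80 Hz are assumptions made for illustration.

```python
import math


def trial_overlaps(target, sample_rate, lowest_pitch_hz, step=1):
    """Trial overlaps (in samples) centered on `target`.

    The span from the smallest to the largest trial is at least one pitch
    period of the lowest pitch expected in the audio. Negative values are
    possible and correspond to the extended-segment case in the claims.
    """
    period = math.ceil(sample_rate / lowest_pitch_hz)  # samples in one pitch pulse
    half_span = math.ceil(period / 2)
    return list(range(target - half_span, target + half_span + 1, step))


if __name__ == "__main__":
    trials = trial_overlaps(target=100, sample_rate=8000, lowest_pitch_hz=80)
    print(trials[0], trials[-1], len(trials))
```

At 8000 samples per second and an 80 Hz pitch floor, one pitch period is 100 samples, so the trials for a target of 100 run from 50 to 150.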

Current Assignee
Interval Research Corporation
Original Assignee
Interval Research Corporation
Inventors
Ahmad, Subutai; Covell, Michele; Bhadkamkar, Neal A.
Primary Examiner(s)
Hudspeth, David R.
Assistant Examiner(s)
Zintel, Harold Albert

Application Number

US08/760,769
Time in Patent Office

852 Days
Field of Search

386/66, 386/54, 704/503, 704/270, 364/514
US Class Current

704/270
CPC Class Codes

G11B 27/10   Indexing; Addressing; Timin...

G11B 27/34   Indicating arrangements in...

G11B 33/10   Indicating arrangements; Wa...

H04N 21/426   Internal components of the ...

H04N 21/43072   of multiple content streams...

H04N 21/4341   Demultiplexing of audio and...

H04N 21/439   Processing of audio element...

H04N 21/47217   for controlling playback fu...

H04N 5/04   Synchronising for televisio...

H04N 5/602   for digital sound signals

H04N 5/783   Adaptations for reproducing...
