
Systems and methods for providing a multi-modal evaluation of a presentation

  • US 10,311,743 B2
  • Filed: 04/08/2014
  • Issued: 06/04/2019
  • Est. Priority Date: 04/08/2013
  • Status: Active Grant
First Claim

1. A computer-implemented system for providing a multi-modal evaluation of a presentation, comprising:

  • a motion capture device configured to generate motion data representing motion of an examinee giving a presentation, the motion data generated by the motion capture device representing three dimensional depth information, motion based on anchor points at respective positions of the examinee, or video frames;

  • an audio recording device configured to generate audio data representing audio of the examinee giving the presentation; and

  • a processing system configured to:

    generate a plurality of non-verbal metrics of the presentation based on the motion data, the non-verbal metrics selected from the group consisting of a metric of gesticulation, a metric of posture, a metric of eye contact, and a metric of facial expression,

    wherein the metric of gesticulation is generated based on the depth measurements indicating an amount of hand gesturing and based on a magnitude or a rate of pixel value changes between the video frames;

    wherein the metric of posture is generated based on changes in relative distances among the anchor points;

    wherein the metric of eye contact or facial expression is generated based on analysis of the video frames;

    generate a plurality of audio metrics of the presentation based on the audio data, wherein the audio metrics are selected from the group consisting of a content metric, a non-content transcript based metric, and a non-content metric,

    wherein the content metric is generated based on generating a first transcript based on the audio data, and then comparing the first transcript to a model transcript or to a presentation topic prompt;

    wherein the non-content transcript based metric is generated based on the first transcript and comparing sounds produced by the examinee at points in the first transcript to proper pronunciation of words at the points in the first transcript;

    wherein the non-content metric is generated based on one or more of stresses, accents, and discontinuities in the audio data; and

    generate and output a presentation score indicating an evaluation of the presentation based on inputting the non-verbal metrics and the audio metrics to a model comprising weights for a plurality of the non-verbal and audio metrics, the weights being based on correlations between human scores and the non-verbal and audio metrics within a collection of human-scored presentations.
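
The wherein clauses above describe how the gesticulation and posture metrics are derived from the motion data. The sketch below is an editorial illustration only, assuming video frames arrive as grayscale NumPy arrays and anchor points as per-frame (x, y, z) coordinates; the function names, the equal weighting of the two gesticulation cues, and the use of NumPy are assumptions added for clarity, not the patented implementation.

    # Illustrative sketch of the claimed non-verbal metrics (not the patented method).
    # Assumes frames are grayscale numpy arrays and anchor_tracks has shape
    # (num_frames, num_anchors, 3) with one (x, y, z) position per anchor point.
    import numpy as np

    def gesticulation_metric(frames, depth_hand_activity):
        """Blend inter-frame pixel change with a depth-derived hand-activity value."""
        # Magnitude of pixel value changes between consecutive video frames.
        diffs = [np.abs(b.astype(float) - a.astype(float)).mean()
                 for a, b in zip(frames, frames[1:])]
        pixel_change = float(np.mean(diffs)) if diffs else 0.0
        # depth_hand_activity: e.g., mean hand displacement taken from the depth stream.
        return 0.5 * pixel_change + 0.5 * float(depth_hand_activity)

    def posture_metric(anchor_tracks):
        """Temporal variation in relative distances among body anchor points."""
        tracks = np.asarray(anchor_tracks, dtype=float)
        n_frames = tracks.shape[0]
        # Pairwise distances among anchor points for each frame.
        dists = np.linalg.norm(
            tracks[:, :, None, :] - tracks[:, None, :, :], axis=-1
        ).reshape(n_frames, -1)
        # Larger variation in relative distances over time -> less stable posture.
        return float(dists.std(axis=0).mean())
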
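The content metric is recited as a comparison between a first transcript generated from the audio data and a model transcript or presentation topic prompt. A minimal stand-in for such a comparison, assuming a plain bag-of-words cosine similarity rather than whatever comparison the specification discloses, could look like this:

    # Hedged sketch of a transcript-to-reference comparison (bag-of-words cosine
    # similarity). The actual content metric in the patent may differ substantially.
    import math
    from collections import Counter

    def content_metric(first_transcript: str, reference_text: str) -> float:
        """Cosine similarity between word-count vectors of the two texts."""
        a = Counter(first_transcript.lower().split())
        b = Counter(reference_text.lower().split())
        dot = sum(a[w] * b[w] for w in set(a) & set(b))
        norm = (math.sqrt(sum(v * v for v in a.values()))
                * math.sqrt(sum(v * v for v in b.values())))
        return dot / norm if norm else 0.0

Here reference_text would be either the model transcript or the topic prompt; a value near 1.0 indicates close lexical overlap with the reference.
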
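The final limitation scores the presentation by feeding the metrics into a model whose weights reflect correlations between human scores and the metrics in a human-scored collection. The sketch below assumes Pearson correlation as the weighting statistic and a simple weighted sum as the combining model; both choices are illustrative assumptions.

    # Sketch of correlation-derived weights plus a weighted-sum score (illustrative).
    import numpy as np

    def fit_weights(metric_matrix, human_scores):
        """metric_matrix: (num_presentations, num_metrics); human_scores: (num_presentations,)."""
        X = np.asarray(metric_matrix, dtype=float)
        y = np.asarray(human_scores, dtype=float)
        # One weight per metric: its Pearson correlation with the human scores.
        weights = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
        return np.nan_to_num(weights)  # guard against zero-variance metrics

    def presentation_score(metrics, weights):
        """Weighted combination of one presentation's non-verbal and audio metrics."""
        return float(np.dot(np.asarray(metrics, dtype=float), np.asarray(weights)))

With metric vectors such as [gesticulation, posture, eye contact, content, pronunciation, prosody], fit_weights would be run once over the human-scored collection and presentation_score applied to each new examinee's metrics.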
