Public speaking trainer with 3-D simulation and real-time feedback

US 10,446,055 B2
Filed: 08/11/2015
Issued: 10/15/2019
Est. Priority Date: 08/13/2014
Status: Active Grant

Abstract

A public speaking trainer has a computer system including a display monitor. A microphone is coupled to the computer system. A video capture device is coupled to the computer system. A biometric device is coupled to the computer system. A simulated environment including a simulated audience member is rendered on the display monitor using the computer system. A presentation is recorded onto the computer system using the microphone and video capture device. A first feature of the presentation is extracted based on data from the microphone and video capture device while recording the presentation. A metric is calculated based on the first feature. The simulated audience member is animated in response to a change in the metric. A score is generated based on the metric. The score is displayed on the display monitor of the computer system after recording the presentation. A training video is suggested based on the score.
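
The abstract speaks of extracting a feature and calculating a metric from it, and claim 1 names pace, pace variability, and length and timing of pauses among the extracted features. As a minimal sketch of how such features might be computed, assuming word-level timestamps are already available from a speech-to-text step, the following Python uses a hypothetical `pace_and_pauses` helper; the window size and pause cutoff are illustrative values, not figures from the patent.

```python
from statistics import pstdev


def pace_and_pauses(words, window_s=10.0, pause_s=0.5):
    """Compute pace features from word timings (illustrative sketch only).

    words: list of (start_s, end_s) tuples, one per spoken word, sorted by start.
    window_s and pause_s are assumed defaults, not values from the patent.
    """
    if not words:
        return {"pace_wpm": 0.0, "pace_variability": 0.0, "pauses": []}

    t0 = words[0][0]
    total_s = words[-1][1] - t0
    # Overall pace in words per minute.
    pace_wpm = 60.0 * len(words) / total_s if total_s > 0 else 0.0

    # Pauses: gaps between consecutive words of at least pause_s seconds,
    # reported as (time the pause starts, pause length).
    pauses = [
        (prev_end, start - prev_end)
        for (_, prev_end), (start, _) in zip(words, words[1:])
        if start - prev_end >= pause_s
    ]

    # Pace variability: population standard deviation of the per-window
    # words-per-minute rate, using fixed windows of window_s seconds.
    counts = {}
    for start, _ in words:
        idx = int((start - t0) // window_s)
        counts[idx] = counts.get(idx, 0) + 1
    rates = [60.0 * counts.get(i, 0) / window_s for i in range(max(counts) + 1)]

    return {
        "pace_wpm": pace_wpm,
        "pace_variability": pstdev(rates),
        "pauses": pauses,
    }
```

A call such as `pace_and_pauses([(0.0, 0.5), (0.5, 1.0), (2.5, 3.0), (3.0, 4.0)], window_s=2.0)` would report one pause starting at 1.0 s and lasting 1.5 s. The final window is treated as full length here, which understates the rate when the recording ends mid-window; a real system would likely correct for that.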

11 Claims

1. A method of public speaking training, comprising:
- providing a speech analysis engine;
  
  using the speech analysis engine executing on a first computer system to extract a plurality of features from a plurality of prerecorded speeches;
  
  providing manual ratings from public speaking experts for an overall quality of each of the plurality of prerecorded speeches;
  
  using a machine learning algorithm executing on the first computer system to compare the manual ratings of the prerecorded speeches to the plurality of features extracted from the prerecorded speeches, wherein the machine learning algorithm generates a predictive model defining correlations between the plurality of features and the manual ratings, and wherein the predictive model includes a plurality of rating scales with thresholds for the plurality of features, wherein a first rating scale for a first feature of the plurality of features includes a plurality of thresholds for rating the first feature and a first threshold of the plurality of thresholds is above a minimum and below a maximum of the first rating scale;
  
  providing a second computer system including a display monitor, a microphone, and a video capture device;
  
  presenting a first interface on the display monitor of the second computer system allowing entry of an environment configuration, an audience configuration, and a presentation configuration by a user, wherein the environment configuration, audience configuration, and presentation are separately configurable by the user independent from each other, wherein the audience configuration includes a size of the audience, a type of audience, and a technical expertise of the audience that are independently configurable by the user, wherein the type of audience includes executives, upper management, technical professionals, or students, wherein the presentation configuration includes a desired presentation length and a presentation topic that are independently configurable by the user;
  
  providing a prompt allowing input of presentation materials;
  
  receiving a presentation material onto the second computer system through the prompt, wherein the presentation material includes a plurality of slides;
  
  rendering a simulated environment on the display monitor using the second computer system in accordance with the environment configuration and including a number of simulated audience members in the simulated environment determined from the size of the audience of the audience configuration;
  
  recording a presentation by the user onto the second computer system using the microphone and the video capture device;
  
  extracting, by the second computer system executing the speech analysis engine while recording the presentation, the plurality of features from the presentation, wherein the plurality of features includes a pitch variability, volume variability, pace, pace variability, and length and timing of pauses;
  
  providing an audio signal from the microphone to a speech-to-text application executing on the second computer system to generate a transcript of the presentation in real time;
  
  performing natural language processing on the transcript of the presentation, using the second computer system, to determine a linguistic complexity of the presentation, wherein the linguistic complexity is included in the plurality of features;
  
  analyzing the transcript of the presentation, by the second computer system, to provide a metric of proper use of domain specific terms for the technical expertise of the audience, wherein the proper use of domain specific terms is included in the plurality of features;
  
  analyzing, by the second computer system, usage of the presentation material while recording the presentation, including an amount of eye contact with the simulated audience members versus eye contact with the presentation material, wherein the amount of eye contact is included in the plurality of features;
  
  analyzing the presentation by comparing the plurality of features against the thresholds of the rating scales of the predictive model, wherein a rating for the first feature is determined by comparing the first feature extracted from the presentation against the plurality of thresholds associated with the first feature in the predictive model, and wherein the analyzing is tailored to the technical expertise of the audience, the desired presentation length, and the presentation topic entered by the user;
  
  animating the simulated audience member in response to at least one of the plurality of features;
  
  drawing a real-time metric graph on the display monitor for the first feature while recording the presentation; and
  
  presenting a second interface on the display monitor of the second computer system after recording the presentation to play the recording of the presentation along with a graph illustrating a second feature of the plurality of features on a scale obtained from the predictive model, wherein the second feature is selected based on the type of audience, and wherein the graph moves along with the recording of the presentation.
- Dependent claims (2, 3, 4):
  - 2. The method of claim 1, further including:
    - providing a biometric device coupled to the second computer system; and
      
      extracting a third feature of the presentation based on a data from the biometric device.
  - 3. The method of claim 1, further including:
    - recording presentations for a plurality of users within an organization; and
      
      presenting a dashboard that lists the plurality of users and a summary of activity of the plurality of users.
  - 4. The method of claim 1, further including determining the amount of eye contact with the simulated audience members versus eye contact with the presentation material by:
    - providing a button to toggle between displaying the simulated audience and displaying the presentation material; and
      
      recording an amount of time that the presentation material is displayed.
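
Claim 4 estimates eye contact with the audience versus the presentation material by timing a toggle button. One way this bookkeeping could look is sketched below; the `ViewToggleTimer` class and its method names are hypothetical, and the sketch assumes timestamps in seconds measured from the start of the recording, with the audience view shown first.

```python
from dataclasses import dataclass


@dataclass
class ViewToggleTimer:
    """Accumulates how long the presentation material is displayed.

    Hypothetical helper: times are seconds since the recording started,
    and the audience view is assumed to be shown first.
    """
    showing_material: bool = False
    material_s: float = 0.0
    _last_t: float = 0.0

    def _advance(self, t):
        # Credit the elapsed interval to the material view if it was showing.
        if self.showing_material:
            self.material_s += t - self._last_t
        self._last_t = t

    def toggle(self, t):
        """Record a button press at time t that switches the displayed view."""
        self._advance(t)
        self.showing_material = not self.showing_material

    def audience_fraction(self, t_end):
        """Fraction of the recording spent on the audience view up to t_end."""
        self._advance(t_end)
        return 1.0 - self.material_s / t_end if t_end > 0 else 0.0


timer = ViewToggleTimer()
timer.toggle(10.0)  # switch to the slides at 10 s
timer.toggle(25.0)  # switch back to the audience at 25 s
print(timer.audience_fraction(60.0))  # slides shown for 15 of 60 seconds
```

The resulting fraction could then be treated as the eye-contact feature and rated like any other feature.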

5. A method of public speaking training, comprising:
- using a speech analysis engine to extract a plurality of features from a plurality of prerecorded speeches;
  
  providing manual ratings from public speaking experts for an overall quality of each of the plurality of prerecorded speeches;
  
  using a machine learning algorithm to generate a predictive model defining correlations between the plurality of features and the manual ratings, wherein the predictive model includes a plurality of rating scales for the plurality of features, and wherein a first rating scale for a first feature of the plurality of features includes a plurality of thresholds for rating the first feature and a first threshold of the plurality of thresholds is above a minimum and below a maximum of the rating scale;
  
  separately receiving entry of a presentation configuration, an environment configuration, and an audience configuration from a user, wherein the audience configuration includes a type of audience and the type of audience is selected from a list consisting of executives, upper management, technical professionals, and students;
  
  providing a simulated audience member in accordance with the audience configuration;
  
  receiving a presentation by the user after generating the predictive model;
  
  extracting the first feature from the presentation;
  
  analyzing the presentation by comparing the feature against the plurality of thresholds on the first rating scale of the predictive model;
  
  animating the simulated audience member based on a result of analyzing the presentation; and
  
  displaying the presentation and a graph of the feature for review after receiving the presentation, wherein the feature and first threshold are illustrated on the rating scale from the predictive model.
- Dependent claims (6, 7, 8, 9, 10, 11):
  - 6. The method of claim 5, further including providing the simulated audience member using a virtual reality headset.
  - 7. The method of claim 5, further including:
    - receiving a presentation material for the presentation;
      
      providing a button to toggle between displaying the simulated audience member and displaying the presentation material; and
      
      recording an amount of time that the presentation material is displayed.
  - 8. The method of claim 5, further including:
    - importing a picture of a person; and
      
      rendering the simulated audience member with a face from the picture.
  - 9. The method of claim 5, further including receiving a room configuration from the user, wherein the room configuration includes a board room or an auditorium.
  - 10. The method of claim 5, wherein animating the simulated audience member includes using an application programming interface to set the simulated audience member to engaged, neutral, or bored, wherein a software engine automatically animates the audience member based on the setting.
  - 11. The method of claim 5, further including receiving an environment configuration from the user, wherein the environment configuration, presentation configuration, and audience configuration are entered by the user separately from each other.
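
Claim 5 rates a feature by comparing it against a plurality of thresholds on a rating scale, and claim 10 sets the simulated audience member to engaged, neutral, or bored. The sketch below shows one way those two steps might fit together. The threshold values, band labels, and the mapping from rating to audience state are all hypothetical; in the claimed method the thresholds would come from the predictive model rather than being hard-coded.

```python
from bisect import bisect_right

# Hypothetical rating scale for pace in words per minute. Each threshold
# lies strictly between the scale's minimum and maximum.
PACE_THRESHOLDS = [110.0, 140.0, 170.0, 200.0]
PACE_LABELS = ["too slow", "slow", "good", "fast", "too fast"]


def rate_feature(value, thresholds, labels):
    """Return the label of the band that value falls into.

    thresholds must be sorted ascending, and labels must have exactly one
    more entry than thresholds. A value equal to a threshold is placed in
    the band above it.
    """
    if len(labels) != len(thresholds) + 1:
        raise ValueError("need one more label than thresholds")
    return labels[bisect_right(thresholds, value)]


def audience_state(rating):
    """Map a rating to an audience setting (assumed mapping, for illustration)."""
    if rating == "good":
        return "engaged"
    if rating in ("slow", "fast"):
        return "neutral"
    return "bored"


rating = rate_feature(150.0, PACE_THRESHOLDS, PACE_LABELS)
print(rating, audience_state(rating))
```

Using a sorted threshold list with a binary search keeps the lookup simple and makes it easy to swap in thresholds learned for a different audience type or feature.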

Current Assignee: Pitchvantage, LLC
Original Assignee: Pitchvantage, LLC
Inventors: Gupta, Anindya; Makhiboroda, Yegor; Story, Brad H.
Primary Examiner(s): Saint-Vil, Eddy
Assistant Examiner(s): Ermlick, William D

Application Number: US14/823,780
Publication Number: US 20160049094A1
Time in Patent Office: 1,526 Days
Field of Search: None
US Class Current
CPC Class Codes

G06T 13/205   driven by audio data

G06T 13/40   of characters, e.g. humans,...

G09B 19/04   Speaking with audible prese...

G09B 5/06   with both visual and audibl...

G09B 7/02   of the type wherein the stu...

G09B 9/00   Simulators for teaching or ...

G10L 15/02   Feature extraction for spee...

G10L 15/04   Segmentation; Word boundary...

G10L 15/187   Phonemic context, e.g. pron...

G10L 25/03   characterised by the type o...

G10L 25/27   characterised by the analys...

H04N 5/76   Television signal recording
