Extracting and displaying key points of a video conference

US 9,672,829 B2
Filed: 03/23/2015
Issued: 06/06/2017
Est. Priority Date: 03/23/2015
Status: Active Grant

Abstract

Embodiments of the present invention disclose a method, system, and computer program product for speech summarization. A computer receives audio and video components from a video conference. The computer determines which participant is speaking based on comparing images of the participants with template images of speaking and non-speaking faces. The computer determines the voiceprint of the speaking participant by applying a Hidden Markov Model to a brief recording of the voice waveform of the participant and associates the determined voiceprint with the face of the speaking participant. The computer recognizes and transcribes the content of statements made by the speaker, determines the key points, and displays them over the face of the participant in the video conference.
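
The template comparison described in the abstract can be illustrated with a minimal sketch. It assumes small grayscale face crops stored as nested lists and uses mean squared pixel difference as the similarity measure. The abstract does not name a metric, so that choice, the labels `speaking` and `non_speaking`, and the function names are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence

# Assumption: a face crop is a grayscale image stored as rows of pixel values.
Image = Sequence[Sequence[float]]


def mean_squared_difference(a: Image, b: Image) -> float:
    """Average squared per-pixel difference between two same-sized images."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have identical dimensions")
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += (pa - pb) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    return total / count


def classify_face(face: Image, templates: Dict[str, List[Image]]) -> str:
    """Return the label of the template closest to the given face crop."""
    best_label: Optional[str] = None
    best_score = float("inf")
    for label, images in templates.items():
        for template in images:
            score = mean_squared_difference(face, template)
            if score < best_score:
                best_label, best_score = label, score
    if best_label is None:
        raise ValueError("no templates supplied")
    return best_label


# Hypothetical 2x2 templates: bright bottom row stands in for an open mouth.
templates = {
    "speaking": [[[0, 0], [9, 9]]],
    "non_speaking": [[[0, 0], [0, 0]]],
}
label = classify_face([[0, 0], [8, 9]], templates)
```

A real system would more likely compare normalized features than raw pixels. The sketch only shows the nearest-template decision.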

9 Claims

1. A method for summarizing speech, the method comprising:
- receiving data corresponding to a video conference, including an audio component and a video component;
  
  determining a first participant is speaking based on comparing one or more images of the first participant contained in the video component with one or more template images;
  
  determining a voiceprint of the first participant based on the received audio component, wherein the voiceprint of the first participant includes information detailing one or more unique parameters of a voice waveform of the first participant;
  
  associating the determined voiceprint of the first participant with at least one of the one or more images of the first participant;
  
  determining one or more key points within content spoken by the first participant;
  
  based on detecting the first participant within the video component, overlaying one or more most recent key points of the one or more key points within the video component, wherein the one or more most recent key points are displayed in close spatial proximity to the first participant to indicate an association between the one or more key points and the first participant;
  
  receiving a user input to the overlaid one or more most recent key points; and
  
  based on receiving the user input to the overlaid one or more most recent key points, expanding the overlay to include both the one or more most recent key points and the one or more key points, wherein one or more steps of the above method are performed using one or more computers.
- Dependent claims (2, 3):
  - 2. The method of claim 1, wherein determining one or more key points within content spoken by the first participant further comprises:
    - detecting a change in a speaking speed of the first participant based on a deviation from an average spoken words per second of the first participant.
  - 3. The method of claim 1, wherein determining one or more key points within content spoken by the first participant further comprises:
    - detecting one or more terms used frequently.
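
The speaking-speed test of claim 2 might be sketched as follows. The claim states only that a change is detected from a deviation from the participant's average spoken words per second. The segmentation into `(word_count, duration_seconds)` pairs, the relative `threshold` of 25%, and the function name are assumptions made for illustration.

```python
from typing import List, Tuple


def rate_deviations(
    segments: List[Tuple[int, float]], threshold: float = 0.25
) -> List[int]:
    """Return indices of segments whose words-per-second rate deviates from
    the speaker's overall average by more than `threshold` (relative).

    Each segment is (word_count, duration_seconds). The threshold value is
    an assumption; the claim does not specify one.
    """
    total_words = sum(words for words, _ in segments)
    total_seconds = sum(seconds for _, seconds in segments)
    if total_seconds <= 0 or total_words == 0:
        return []
    average = total_words / total_seconds  # average spoken words per second
    flagged = []
    for index, (words, seconds) in enumerate(segments):
        if seconds <= 0:
            continue  # skip degenerate segments
        rate = words / seconds
        if abs(rate - average) / average > threshold:
            flagged.append(index)
    return flagged


# Three segments at 3 words/s and one slow segment at 1 word/s.
segments = [(30, 10.0), (30, 10.0), (10, 10.0), (30, 10.0)]
flagged = rate_deviations(segments)
```

Here the average is 2.5 words per second, so only the slow third segment exceeds the 25% deviation and would be treated as a candidate key point.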

4. A computer program product for a speech summarization system, the computer program product comprising:
- one or more computer-readable storage media and program instructions stored on the one or more computer-readable storage media, the program instructions comprising:
  
  program instructions to receive data corresponding to a video conference, including an audio component and a video component;
  
  program instructions to determine a first participant is speaking based on comparing one or more images of the first participant contained in the video component with one or more template images;
  
  program instructions to determine a voiceprint of the first participant based on the received audio component, wherein the voiceprint of the first participant includes information detailing one or more unique parameters of a voice waveform of the first participant;
  
  program instructions to associate the determined voiceprint of the first participant with at least one of the one or more images of the first participant;
  
  program instructions to determine one or more key points within content spoken by the first participant;
  
  based on detecting the first participant within the video component, program instructions to overlay one or more most recent key points of the one or more key points within the video component, wherein the one or more most recent key points are displayed in close spatial proximity to the first participant to indicate an association between the one or more key points and the first participant;
  
  program instructions to receive a user input to the overlaid one or more most recent key points; and
  
  based on receiving the user input to the overlaid one or more most recent key points, program instructions to expand the overlay to include both the one or more most recent key points and the one or more key points.
- Dependent claims (5, 6):
  - 5. The computer program product of claim 4, wherein determining one or more key points within the content spoken by the first participant further comprises:
    - program instructions to detect a change in a speaking speed of the first participant based on a deviation from an average spoken words per second of the first participant.
  - 6. The computer program product of claim 4, wherein determining one or more key points within the content spoken by the first participant further comprises:
    - detecting one or more terms used frequently.
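
Claims 3, 6, and 9 recite detecting terms used frequently without saying how frequency is measured. One plausible reading, sketched below, counts words in a transcript after dropping common function words. The stop-word list, the `min_count` and `top_n` parameters, and the function name are assumptions, not part of the claims.

```python
import re
from collections import Counter
from typing import List

# Assumption: a tiny illustrative stop-word list, not an exhaustive one.
STOP_WORDS = frozenset(
    {"a", "an", "and", "the", "to", "of", "in", "is", "it",
     "we", "that", "this", "for", "on"}
)


def frequent_terms(transcript: str, min_count: int = 2, top_n: int = 3) -> List[str]:
    """Return up to `top_n` non-stop-word terms that occur at least
    `min_count` times, most frequent first."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return [term for term, count in counts.most_common(top_n) if count >= min_count]


transcript = (
    "The budget is late. We need the budget approved, and the deadline moved. "
    "Budget first, deadline second."
)
terms = frequent_terms(transcript)
```

In this transcript "budget" occurs three times and "deadline" twice, so those two terms are returned and every single-occurrence word is filtered out.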

7. A computer system for a speech summarization system, the computer system comprising:
- one or more computer processors, one or more computer-readable storage media, and program instructions stored on one or more of the computer-readable storage media for execution by at least one of the one or more processors, the program instructions comprising:
  
  program instructions to receive data corresponding to a video conference, including an audio component and a video component;
  
  program instructions to determine a first participant is speaking based on comparing one or more images of the first participant contained in the video component with one or more template images;
  
  program instructions to determine a voiceprint of the first participant based on the received audio component, wherein the voiceprint of the first participant includes information detailing one or more unique parameters of a voice waveform of the first participant;
  
  program instructions to associate the determined voiceprint of the first participant with at least one of the one or more images of the first participant;
  
  program instructions to determine one or more key points within content spoken by the first participant;
  
  based on detecting the first participant within the video component, program instructions to overlay one or more most recent key points of the one or more key points within the video component, wherein the one or more most recent key points are displayed in close spatial proximity to the first participant to indicate an association between the one or more key points and the first participant;
  
  program instructions to receive a user input to the overlaid one or more most recent key points; and
  
  based on receiving the user input to the overlaid one or more most recent key points, program instructions to expand the overlay to include both the one or more most recent key points and the one or more key points.
- Dependent claims (8, 9):
  - 8. The computer system of claim 7, wherein determining one or more key points within the content spoken by the first participant further comprises:
    - program instructions to detect a change in a speaking speed of the first participant based on a deviation from an average spoken words per second of the first participant.
  - 9. The computer system of claim 7, wherein determining one or more key points within the content spoken by the first participant further comprises:
    - detecting one or more terms used frequently.
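
The overlay behavior shared by claims 1, 4, and 7 (show the most recent key points next to the participant, then expand to all key points on user input) can be modeled as a small state object. This is a sketch only: the class name, the `recent_count` default of 2, and the method names are hypothetical, and rendering the overlay near the participant's face is out of scope.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KeyPointOverlay:
    """Tracks which key points are shown for one participant."""

    participant: str
    recent_count: int = 2  # assumption: how many points count as "most recent"
    key_points: List[str] = field(default_factory=list)
    expanded: bool = False

    def add(self, point: str) -> None:
        """Record a newly determined key point."""
        self.key_points.append(point)

    def on_user_input(self) -> None:
        """User interacted with the overlay: expand it to show every point."""
        self.expanded = True

    def visible(self) -> List[str]:
        """Key points currently displayed in the overlay."""
        if self.expanded:
            return list(self.key_points)
        if self.recent_count <= 0:
            return []
        return self.key_points[-self.recent_count:]


overlay = KeyPointOverlay("Alice")
for point in ["budget late", "deadline moved", "hire approved"]:
    overlay.add(point)
collapsed = overlay.visible()
overlay.on_user_input()
expanded = overlay.visible()
```

Before the user input only the two newest points are visible. After it, the full list is, which corresponds to the claimed expansion covering both the most recent key points and the full set.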

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Chen, Ye Q.; Nie, Wen J.; Wu, Ting; Yang, Zhao
Primary Examiner(s): He, Jialong

Application Number: US14/665,592
Publication Number: US 20160284354A1
Time in Patent Office: 806 Days
CPC Class Codes

- G10L 15/26: Speech to text systems G10L...
- G10L 17/00: Speaker identification or v...
- G10L 17/02: Preprocessing operations, e...
- G10L 21/10: Transforming into visible i...
- G10L 25/57: for processing of video sig...
- G10L 25/87: Detection of discrete point...
- H04L 12/1827: Network arrangements for co...
- H04L 12/1831: Tracking arrangements for l...
- H04N 7/147: Communication arrangements,...
- H04N 7/15: Conference systems
