Speech recognition and summarization

US 8,612,211 B1
Filed: 01/17/2013
Issued: 12/17/2013
Est. Priority Date: 09/10/2012
Status: Active Grant



Abstract

The subject matter of this specification can be embodied in, among other things, a method that includes receiving two or more data sets each representing speech of a corresponding individual attending an internet-based social networking video conference session, decoding the received data sets to produce corresponding text for each individual attending the internet-based social networking video conference, and detecting characteristics of the session from a coalesced transcript produced from the decoded text of the attending individuals for providing context to the internet-based social networking video conference session.
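The "coalesced transcript" in the abstract can be pictured as an interleaving of each attendee's decoded text in time order. The following is a minimal sketch under stated assumptions, not the patented implementation: the `Utterance` record, its `start` timestamp, and the `coalesce` helper are hypothetical names, and the sketch assumes the recognizer reports a start time for each decoded utterance.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    start: float   # seconds since session start (assumed to be reported by the recognizer)
    speaker: str   # attendee whose data set produced this text
    text: str      # decoded text for this stretch of speech


def coalesce(per_speaker):
    """Merge per-attendee utterance lists (each already sorted by start time)
    into one transcript ordered by start time."""
    merged = heapq.merge(*per_speaker.values(), key=lambda u: u.start)
    return [f"{u.speaker}: {u.text}" for u in merged]


# Hypothetical decoded text for two attendees.
decoded = {
    "alice": [
        Utterance(0.0, "alice", "shall we plan the trip"),
        Utterance(9.5, "alice", "flights look cheap in may"),
    ],
    "bob": [
        Utterance(4.2, "bob", "yes I want to see kyoto"),
    ],
}

transcript = coalesce(decoded)
```

Merging sorted streams keeps each attendee's own ordering intact while interleaving speakers, which is one plausible reading of "coalesced"; the record itself does not say how the transcript is assembled.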


20 Claims

1. A method comprising:
- in one or more processing devices, executing instructions to perform operations comprising:
  
  receiving two or more data sets, each data set representing speech of a corresponding individual attending a social networking video conference session;
  
  decoding the received data sets to produce corresponding text for each individual attending the social networking video conference session;
  
  detecting one or more topics of the social networking video conference session from a transcript produced from the text for each individual attending the social networking video conference session; and
  
  providing, to one or more of the attending individuals of the social networking video conference session, context relating to the one or more topics of the social networking video conference session detected from the transcript produced from the text for each individual attending the social networking video conference session.
- - 2. The method of claim 1, wherein detecting one or more topics of the social networking video conference session from a transcript produced from the text for each individual attending the social networking video conference session further comprises:
    - identifying one or more keywords from the transcript produced from the text; and
      
      detecting one or more topics of the social networking video conference session based on the one or more keywords identified from the transcript produced from the text.
  - 3. The method of claim 1, wherein detecting one or more topics of the social networking video conference session from the transcript produced from the text for each individual attending the social networking video conference session includes at least one of detecting a temporal length of the social networking video conference session and detecting a repetitive use of one or more words.
  - 4. The method of claim 1, further comprising:
    - detecting characteristics of the social networking video conference session from the two or more received data sets; and
      
      detecting one or more topics of the social networking video conference session based on the transcript produced from the text for each individual attending the social networking video conference session and the detected characteristics.
  - 5. The method of claim 4, wherein detecting characteristics of the social networking video conference session includes monitoring at least one of the volume of the speech represented in the two or more received data sets and the presented speed of the speech represented in the two or more received data sets.
  - 6. The method of claim 1, further comprising:
    - detecting characteristics of the social networking video conference session attending individuals from other corresponding data sets; and
      
      detecting one or more topics of the social networking video conference session based on the transcript produced from the text for each individual attending the social networking video conference session and the detected characteristics.
  - 7. The method of claim 6, wherein detecting characteristics of the social networking video conference session attending individuals from the other corresponding data sets includes detecting physical features of the attending individuals.
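Claims 2 and 3 describe topic detection driven by keywords identified in the transcript and by repetitive use of words. One way this could look is sketched below. The claims do not specify an algorithm, so everything here is an assumption made for illustration: the stopword list, the `TOPIC_KEYWORDS` lookup table, the `min_count` threshold, and the function names are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "to", "we", "i", "in", "and", "of", "is", "for", "on"}

# Hypothetical keyword-to-topic table, not taken from the patent.
TOPIC_KEYWORDS = {
    "travel": {"flight", "flights", "hotel", "trip"},
    "food": {"dinner", "restaurant", "sushi"},
}


def keywords(transcript, min_count=2):
    """Return words used at least min_count times across all speakers."""
    text = " ".join(line for _speaker, line in transcript).lower()
    words = re.findall(r"[a-z']+", text)
    counts = Counter(w for w in words if w not in STOPWORDS)
    return {w: n for w, n in counts.items() if n >= min_count}


def detect_topics(transcript, min_count=2):
    """Score each topic by how often its keywords repeat; best score first."""
    found = keywords(transcript, min_count)
    scores = {
        topic: sum(n for w, n in found.items() if w in vocab)
        for topic, vocab in TOPIC_KEYWORDS.items()
    }
    return sorted((t for t, s in scores.items() if s > 0), key=lambda t: -scores[t])


# Hypothetical transcript as (speaker, text) pairs.
transcript = [
    ("alice", "the flights to kyoto look cheap"),
    ("bob", "book the flights and a hotel"),
    ("alice", "which hotel is near the station"),
]

topics = detect_topics(transcript)
```

In this sketch only repeated words count as keywords, so a single passing mention of "sushi" would not by itself surface the "food" topic.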

8. A system comprising:
- one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  
  receiving two or more data sets, each data set representing speech of a corresponding individual attending a social networking video conference session;
  
  decoding the received data sets to produce corresponding text for each individual attending the social networking video conference session;
  
  detecting one or more topics of the social networking video conference session from a transcript produced from the text for each individual attending the social networking video conference session; and
  
  providing, to one or more of the attending individuals of the social networking video conference session, context relating to the one or more topics of the social networking video conference session detected from the transcript produced from the text for each individual attending the social networking video conference session.
- - 9. The system of claim 8, wherein detecting one or more topics of the social networking video conference session from a transcript produced from the text for each individual attending the social networking video conference session further comprises:
    - identifying one or more keywords from the transcript produced from the text; and
      
      detecting one or more topics of the social networking video conference session based on the one or more keywords identified from the transcript produced from the text.
  - 10. The system of claim 8, wherein detecting one or more topics of the social networking video conference session from the transcript produced from the text for each individual attending the social networking video conference session includes at least one of detecting a temporal length of the social networking video conference session and detecting a repetitive use of one or more words.
  - 11. The system of claim 8, the operations further comprising:
    - detecting characteristics of the social networking video conference session from the two or more received data sets; and
      
      detecting one or more topics of the social networking video conference session based on the transcript produced from the text for each individual attending the social networking video conference session and the detected characteristics.
  - 12. The system of claim 11, wherein detecting characteristics of the social networking video conference session includes monitoring at least one of the volume of the speech represented in the two or more received data sets and the presented speed of the speech represented in the two or more received data sets.
  - 13. The system of claim 8, the operations further comprising:
    - detecting characteristics of the social networking video conference session attending individuals from other corresponding data sets; and
      
      detecting one or more topics of the social networking video conference session based on the transcript produced from the text for each individual attending the social networking video conference session and the detected characteristics.
  - 14. The system of claim 13, wherein detecting characteristics of the social networking video conference session attending individuals from the other corresponding data sets includes detecting physical features of the attending individuals.
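Claims 5 and 12 mention monitoring the volume and the speed of the speech in the received data sets. The claims do not say how either is measured; the sketch below assumes root-mean-square amplitude as a volume measure and words per minute as a speed measure. Both function names and both choices of measure are illustrative assumptions.

```python
import math


def rms_volume(samples):
    """Root-mean-square amplitude of audio samples (assumed normalized to [-1, 1])."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def words_per_minute(text, duration_seconds):
    """Speaking rate of decoded text over the time span it was spoken in."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return len(text.split()) * 60.0 / duration_seconds


# Hypothetical inputs: a short run of samples and one decoded utterance.
volume = rms_volume([0.5, -0.5, 0.5, -0.5])
rate = words_per_minute("one two three four five six", 3.0)
```

Characteristics like these could then be combined with the transcript when detecting topics, as claims 4 and 11 recite, for example by weighting text spoken loudly or quickly; that weighting is left out of the sketch because the record gives no detail on it.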

15. One or more non-transitory machine-readable media storing instructions that are executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- receiving two or more data sets, each data set representing speech of a corresponding individual attending a social networking video conference session;
  
  decoding the received data sets to produce corresponding text for each individual attending the social networking video conference session;
  
  detecting one or more topics of the social networking video conference session from a transcript produced from the text for each individual attending the social networking video conference session; and
  
  providing, to one or more of the attending individuals of the social networking video conference session, context relating to the one or more topics of the social networking video conference session detected from the transcript produced from the text for each individual attending the social networking video conference session.
- - 16. The machine-readable media of claim 15, wherein detecting one or more topics of the social networking video conference session from a transcript produced from the text for each individual attending the social networking video conference session further comprises:
    - identifying one or more keywords from the transcript produced from the text; and
      
      detecting one or more topics of the social networking video conference session based on the one or more keywords identified from the transcript produced from the text.
  - 17. The machine-readable media of claim 15, wherein detecting one or more topics of the social networking video conference session from the transcript produced from the text for each individual attending the social networking video conference session includes at least one of detecting a temporal length of the social networking video conference session and detecting a repetitive use of one or more words.
  - 18. The machine-readable media of claim 15, the operations further comprising:
    - detecting characteristics of the social networking video conference session attending individuals from other corresponding data sets; and
      
      detecting one or more topics of the social networking video conference session based on the transcript produced from the text for each individual attending the social networking video conference session and the detected characteristics.
  - 19. The machine-readable media of claim 15, the operations further comprising:
    - detecting characteristics of the social networking video conference session from the two or more received data sets; and
      
      detecting one or more topics of the social networking video conference session based on the transcript produced from the text for each individual attending the social networking video conference session and the detected characteristics.
  - 20. The machine-readable media of claim 19, wherein detecting characteristics of the social networking video conference session includes monitoring at least one of the volume of the speech represented in the two or more received data sets and the presented speed of the speech represented in the two or more received data sets.


Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Shires, Glen; Swigart, Sterling; Zolla, Jonathan; Gauci, Jason J.
Primary Examiner(s): GUERRA-ERAZO, EDGAR X
Application Number: US13/743,838
Time in Patent Office: 334 Days
Field of Search: 704/235, 704/246, 704/270, 704/270.1, 704/275, 704/9, 704/10, 704/243
US Class Current: 704/9
CPC Class Codes

G06F 40/279   Recognition of textual enti...

G06F 40/30   Semantic analysis

G10L 15/1815   Semantic context, e.g. disa...

G10L 15/1822   Parsing for meaning underst...

G10L 15/26   Speech to text systems G10L...

G10L 21/10   Transforming into visible i...

H04N 7/15   Conference systems
