Speech recognition using a personal vocabulary and language model

US 8,532,994 B2
Filed: 08/27/2010
Issued: 09/10/2013
Est. Priority Date: 08/27/2010
Status: Active Grant

Abstract

In one implementation, speech or audio is converted to a searchable format by a speech recognition system. The speech recognition system uses a language model including probabilities of certain words occurring, which may depend on the occurrence of other words or sequences of words. The language model is partially built from personal vocabularies. Personal vocabularies are determined by known text from network traffic, including emails and Internet postings. The speech recognition system may incorporate the personal vocabulary of one user into the language model of another user based on a connection between the two users. The connection may be triggered by an email, a phone call, or an interaction in a social networking service. The speech recognition system may remove or add personal vocabularies to the language model based on a calculated confidence score from the resulting language model.
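As a minimal sketch of the idea in the abstract, the example below builds a bigram language model (the probability of a word given the preceding word) for two users and blends them by linear interpolation. The function names, the sample sentences, and the `weight` mixing factor are assumptions made for illustration; the abstract does not say how the personal vocabulary is weighted within the language model.

```python
from collections import Counter, defaultdict


def bigram_counts(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, word in zip(words, words[1:]):
            counts[prev][word] += 1
    return counts


def bigram_prob(counts, prev, word):
    """Maximum-likelihood P(word | prev); 0.0 when prev was never seen."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return followers[word] / total if total else 0.0


def blended_prob(own, other, prev, word, weight=0.3):
    """Linearly interpolate the speaker's model with a connected user's model.

    `weight` is a hypothetical mixing factor, not a value from the patent.
    """
    return (1 - weight) * bigram_prob(own, prev, word) + weight * bigram_prob(other, prev, word)


# Hypothetical known text for the speaker and for a connected user.
own = bigram_counts(["the router is down", "the meeting is late"])
other = bigram_counts(["the router firmware", "router firmware update"])
print(blended_prob(own, other, "router", "firmware"))
```

With these sample sentences the speaker never wrote "router firmware", so the speaker's own model gives that pair a probability of zero, while the blended model assigns it a nonzero probability taken from the connected user's text.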

20 Claims

1. A method comprising:
- monitoring network traffic from a plurality of users including a first user and a second user;
- extracting words from the network traffic;
- building a personal vocabulary for at least the second user from the words;
- identifying a connection between the first user and the second user, wherein the connection is created from a trigger that includes an email including one or more subject matter keywords and the first and the second user as one or more of a recipient of the email, a sender of the email, or a part of text in the email;
- receiving audio of the first user originating from audio content that does not involve the second user; and
- converting the audio of the first user into text using a language model based at least partially on the personal vocabulary of the second user and the connection between the first user and the second user, where the audio of the first user includes at least part of the one or more subject matter keywords.

2. The method of claim 1, comprising:
- building a personal vocabulary for at least the first user from the words, wherein the language model is further based at least partially on the personal vocabulary of the first user.

3. The method of claim 1, wherein the connection is defined by interaction between the first user and the second user in a social networking service.

4. The method of claim 1, wherein the connection is defined by a voice over internet protocol (VoIP) phone call between the first user and the second user.

5. The method of claim 1, wherein the audio of the first user originates with an uploaded video, a teleconference, or a videoconference.

6. The method of claim 1, further comprising:
- saving the text in a searchable database.

7. The method of claim 1, further comprising:
- developing a folksonomy system based on the text.

8. The method of claim 1, where the audio content includes audio from a teleconference or a videoconference.
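The first four steps of claim 1 can be illustrated with a small sketch. It assumes that the monitored traffic has already been parsed into simple email records, that a personal vocabulary is a word count per sender, and that a set of subject matter keywords is known in advance. The `Email` class, the function names, and the sample messages are hypothetical and are not taken from the patent.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Email:
    """Hypothetical, already-parsed email record."""
    sender: str
    recipients: list
    body: str


def extract_words(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabularies(emails):
    """Map each sender to a Counter of the words that sender wrote."""
    vocab = defaultdict(Counter)
    for mail in emails:
        vocab[mail.sender].update(extract_words(mail.body))
    return dict(vocab)


def find_connections(emails, keywords):
    """Connect sender and recipient when an email body contains a keyword.

    Returns {frozenset({user_a, user_b}): set of matched keywords}.
    """
    connections = defaultdict(set)
    for mail in emails:
        hits = keywords & set(extract_words(mail.body))
        if not hits:
            continue
        for recipient in mail.recipients:
            connections[frozenset((mail.sender, recipient))] |= hits
    return dict(connections)


emails = [
    Email("bob", ["alice"], "The MPLS failover test is on Friday"),
    Email("carol", ["dave"], "Lunch on Friday?"),
]
vocabularies = build_vocabularies(emails)
connections = find_connections(emails, {"mpls", "failover"})
print(connections)
```

In this sketch, later audio from alice that mentions "failover" could be decoded with a language model that also draws on bob's vocabulary, because the email links the two users through that keyword. How the vocabulary then enters the language model is outside the scope of this example.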

9. An apparatus comprising:
- a collector interface configured to monitor network traffic from a plurality of users including a first user and a second user and extract n-grams from the network traffic;
- a memory configured to store a personal vocabulary for at least the second user from the n-grams; and
- a controller configured to:
  - identify a connection between the first user and the second user, wherein the connection is created from a trigger that includes a message including one or more subject matter keywords and the first and the second user as one or more of a recipient of the message, a sender of the message, or a part of a body or a header in the message;
  - receive audio of the first user originating from audio content that does not involve the second user; and
  - convert the audio of the first user into text using a language model based at least partially on the personal vocabulary of the second user and the connection between the first user and the second user, where the audio of the first user includes at least part of the one or more subject matter keywords.

10. The apparatus of claim 9, wherein the memory is further configured to store a personal vocabulary for at least the first user from the n-grams, wherein the language model is further based at least partially on the personal vocabulary of the first user.

11. The apparatus of claim 9, wherein the audio of the first user originates with an uploaded video, a teleconference, or a videoconference.

12. The apparatus of claim 9, further comprising:
- a database configured to store the text in a searchable format.

13. The apparatus of claim 9, wherein the n-grams are a sequence having n words, syllables, phonemes, or phones, wherein n is configurable as an integer.

14. The apparatus of claim 9, further comprising a searchable database configured to store the text.

15. The apparatus of claim 9, where the audio content includes audio from a teleconference or a videoconference.
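Claim 13 describes n-grams as sequences of n words, syllables, phonemes, or phones, with n configurable as an integer. A minimal sketch of such an extractor follows; the function name `ngrams` and the sample tokens are assumptions for illustration, and the tokens are expected to be split into the chosen unit beforehand.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple.

    The tokens may be words, syllables, phonemes, or phones; this
    function only slides a window of width n over the sequence.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Word-level bigrams.
word_bigrams = ngrams("voice over internet protocol".split(), 2)
print(word_bigrams)

# The same function applied to a hypothetical phoneme sequence.
phoneme_trigrams = ngrams(["K", "AE", "T", "S"], 3)
print(phoneme_trigrams)
```

A sequence shorter than n yields an empty list, so callers do not need to special-case very short messages.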

16. Logic encoded in one or more non-transitory tangible media, the logic executable by a processor and operable to:
- monitor network traffic from a plurality of users including a first user and a second user;
- extract words from the network traffic;
- build a personal vocabulary from the words for each of the plurality of users;
- identify a connection between the first user and the second user, wherein the connection is created from a trigger that includes an email, call, or social media interaction including one or more subject matter keywords and the first and the second user as one or more of a recipient, a sender, or a part of content of the email, call, or social media interaction;
- receive audio of the first user originating from audio content that does not involve the second user; and
- convert the audio of the first user into text using a language model based on the personal vocabulary of the second user and the connection between the first user and the second user, where the audio of the first user includes at least part of the one or more subject matter keywords.

17. The logic of claim 16, further operable to:
- identify a connection between the first user and the second user, wherein the language model is defined by the connection.

18. The logic of claim 17, wherein the connection is defined by an email, a social networking service, or a voice over internet protocol (VoIP) phone call.

19. The logic of claim 16, further operable to:
- store and retrieve the text to and from a searchable database.

20. The logic of claim 16, where the audio content includes audio from a teleconference or a videoconference.
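Claim 19 refers to storing and retrieving the converted text to and from a searchable database. One possible sketch uses an in-memory SQLite database from the Python standard library with a substring search. The table layout, the function names, and the sample transcripts are assumptions for illustration; the claims do not name a database engine or a schema.

```python
import sqlite3


def open_store():
    """Create an in-memory database with a hypothetical transcripts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transcripts (speaker TEXT, text TEXT)")
    return conn


def save_transcript(conn, speaker, text):
    """Store one piece of converted text together with its speaker."""
    conn.execute("INSERT INTO transcripts VALUES (?, ?)", (speaker, text))
    conn.commit()


def search(conn, term):
    """Return (speaker, text) rows whose text contains the term.

    LIKE treats % and _ in the term as wildcards, so this sketch is
    only suitable for plain search words.
    """
    rows = conn.execute(
        "SELECT speaker, text FROM transcripts WHERE text LIKE ? ORDER BY rowid",
        (f"%{term}%",),
    )
    return rows.fetchall()


store = open_store()
save_transcript(store, "alice", "we rerun the failover test on friday")
save_transcript(store, "carol", "lunch is at noon")
print(search(store, "failover"))
```

A production system would more likely use a full-text index than a substring scan; the substring query is used here only to keep the example self-contained.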

Current Assignee: Cisco Technology, Inc. (Cisco Systems, Inc.)
Original Assignee: Cisco Technology, Inc. (Cisco Systems, Inc.)
Inventors: Malegaonkar, Ashutosh A.; Kumar, Gannu Satish; Jouret, Guido K. M.
Primary Examiner(s): PULLIAS, JESSE SCOTT
Application Number: US12/870,480
Publication Number: US 20120053935A1
Time in Patent Office: 1,110 Days
Field of Search: 704/9, 704/231-257, 704/270, 704/270.1
US Class Current: 704/257
CPC Class Codes:
- G10L 15/07   to the speaker
- G10L 15/183   using context dependencies,...
- G10L 15/30   Distributed recognition, e....
- H04M 3/42221   Conversation recording syst...
- H04M 3/56   Arrangements for connecting...