Communicating metadata that identifies a current speaker

US 9,704,488 B2
Filed: 03/20/2015
Issued: 07/11/2017
Est. Priority Date: 03/20/2015
Status: Active Grant

Abstract

A computer system may communicate metadata that identifies a current speaker. The computer system may receive audio data that represents speech of the current speaker, generate an audio fingerprint of the current speaker based on the audio data, and perform automated speaker recognition by comparing the audio fingerprint of the current speaker against stored audio fingerprints contained in a speaker fingerprint repository. The computer system may communicate data indicating that the current speaker is unrecognized to a client device of an observer and receive tagging information that identifies the current speaker from the client device of the observer. The computer system may store the audio fingerprint of the current speaker and metadata that identifies the current speaker in the speaker fingerprint repository and communicate the metadata that identifies the current speaker to at least one of the client device of the observer or a client device of a different observer.
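
The recognize-then-tag flow summarized in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes an audio fingerprint is a plain feature vector compared by cosine similarity, and the names `FingerprintRepository`, `identify_speaker`, `ask_observer`, and the 0.9 threshold are hypothetical choices that do not appear in the patent.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


@dataclass
class FingerprintRepository:
    """Hypothetical in-memory stand-in for the speaker fingerprint repository."""
    threshold: float = 0.9  # assumed match threshold, not taken from the patent
    entries: list = field(default_factory=list)  # (fingerprint, metadata) pairs

    def recognize(self, fingerprint):
        """Return the metadata of the best match at or above the threshold, else None."""
        best_score, best_meta = 0.0, None
        for stored, meta in self.entries:
            score = cosine_similarity(fingerprint, stored)
            if score > best_score:
                best_score, best_meta = score, meta
        return best_meta if best_score >= self.threshold else None

    def store(self, fingerprint, metadata):
        """Keep the fingerprint together with the metadata that identifies the speaker."""
        self.entries.append((fingerprint, metadata))


def identify_speaker(repo, fingerprint, ask_observer):
    """Recognize the current speaker, falling back to observer tagging."""
    metadata = repo.recognize(fingerprint)
    if metadata is None:
        # Unrecognized: notify the observer's client and wait for tagging information.
        tag = ask_observer("current speaker is unrecognized")
        metadata = {"name": tag}
        repo.store(fingerprint, metadata)
    return metadata
```

In this sketch, `ask_observer` stands in for the round trip to the observer's client device. Once a tag has been stored, a later fingerprint close to the stored one is recognized without prompting the observer again.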

20 Claims

1. A computer system for communicating metadata that identifies a current speaker, the computer system comprising:
- a processor configured to execute computer-executable instructions; and
  
  memory storing one or more computer-executable instructions that, when executed by the processor, perform operations including:
  
  receive audio data that represents speech of the current speaker;
  
  generate an audio fingerprint of the current speaker based on the audio data;
  
  perform automated speaker recognition including comparing the audio fingerprint of the current speaker against one or more stored audio fingerprints contained in a speaker fingerprint repository;
  
  communicate data indicating that the current speaker is unrecognized to a first client device of an observer;
  
  receive tagging information that identifies the current speaker from the first client device of the observer;
  
  store the audio fingerprint of the current speaker and metadata that identifies the current speaker in the speaker fingerprint repository, the metadata being at least partly based on the tagging information;
  
  communicate the metadata that identifies the current speaker to at least one of the first client device of the observer or a second client device of a different observer;
  
  receive a request that identifies a particular speaker from at least one of the first client device of the observer or the second client device of the different observer; and
  
  communicate an alert to at least one of the first client device of the observer or the second client device of the different observer when the particular speaker is currently speaking.
- - 2. The computer system of claim 1, wherein the memory further stores one or more computer-executable instructions that, when executed by the processor, perform operations including:
    - receive additional tagging information that identifies the current speaker from at least one other client device of at least one other observer; and
      
      resolve a conflict between the tagging information that identifies the current speaker and the additional tagging information that identifies the current speaker by identifying the current speaker based on an identity supplied by a majority of observers.
  - 3. The computer system of claim 1, wherein the memory further stores one or more computer-executable instructions that, when executed by the processor, perform operations including:
    - receive confirmation that the current speaker has been correctly identified.
  - 4. The computer system of claim 1, wherein the memory further stores one or more computer-executable instructions that, when executed by the processor, perform operations including:
    - retrieve additional information for the current speaker from an information source; and
      
      communicate the additional information in the metadata that identifies the current speaker.
  - 5. The computer system of claim 4, wherein the additional information includes one or more of:
    - a company of the current speaker, a department of the current speaker, a job title of the current speaker, or contact information for the current speaker.
  - 6. The computer system of claim 1, wherein the memory further stores one or more computer-executable instructions that, when executed by the processor, perform operations including:
    - generate augmented audio data that includes the audio data that represents speech of the current speaker and the metadata that identifies the current speaker.
  - 7. The computer system of claim 6, wherein the metadata that identifies the current speaker is communicated to the at least one of the first client device of the observer or the second client device of the different observer via the augmented audio data.
  - 8. The computer system of claim 6, wherein the memory further stores one or more computer-executable instructions that, when executed by the processor, perform operations including:
    - store the augmented audio data;
      
      receive a query indicating a recognized speaker;
      
      search the augmented audio data for metadata that identifies the recognized speaker; and
      
      output portions of the augmented audio data that represent speech of the recognized speaker.
  - 9. The computer system of claim 1, wherein the memory further stores one or more computer-executable instructions that, when executed by the processor, perform operations including:
    - generate a transcription of a conversation having multiple speakers, wherein text of speech spoken by a recognized speaker is associated with an identifier for the recognized speaker;
      
      store the transcription;
      
      receive a query indicating the recognized speaker;
      
      search the transcription for the identifier for the recognized speaker; and
      
      output portions of the transcription that include the text of speech spoken by the recognized speaker.
  - 10. The computer system of claim 1, wherein the memory further stores one or more computer-executable instructions that, when executed by the processor, perform operations including:
    - receive subsequent audio data representing speech of the current speaker;
      
      generate a new audio fingerprint of the current speaker based on the subsequent audio data;
      
      perform speaker recognition by comparing the new audio fingerprint of the current speaker against the stored audio fingerprint of the current speaker; and
      
      communicate the metadata that identifies the current speaker to the client device of the observer or the client device of the different observer.
  - 11. The computer system of claim 1, wherein the memory further stores one or more computer-executable instructions that, when executed by the processor, perform operations including:
    - operate on metadata included in augmented audio data that is being communicated in real time to determine that the particular speaker is currently speaking; and
      
      wherein communicate an alert to at least one of the first client device of the observer or the second client device of the different observer when the particular speaker is currently speaking includes transmit data to the at least one of the first client device of the observer or the second client device of the different observer for generating at least one of an audible or visual alert whenever the particular speaker talks.
  - 12. The computer system of claim 1, wherein the memory further stores one or more computer-executable instructions that, when executed by the processor, perform operations including:
    - provide an online meeting for participants;
      
      receive an audio fingerprint of a participant from at least one client device of at least one participant; and
      
      store the audio fingerprint of at least one participant and metadata that identifies at least one participant in the speaker fingerprint repository.
  - 13. The computer system of claim 1, wherein the memory further stores one or more computer-executable instructions that, when executed by the processor, perform operations including:
    - communicate the audio fingerprint of the current speaker to the first client device of the observer.
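
The speaker queries in claims 8 and 9 can be illustrated with a small sketch. It assumes the stored augmented audio or transcription is represented as a list of time-stamped segments, each carrying an identifier for its speaker. The `Segment` type and both function names are hypothetical and are not defined in the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One stretch of speech with its speaker metadata (hypothetical layout)."""
    start_s: float   # segment start, in seconds
    end_s: float     # segment end, in seconds
    speaker_id: str  # identifier for the recognized speaker
    text: str        # transcribed text of the segment


def segments_for_speaker(segments, speaker_id):
    """Return, in original order, the segments attributed to the queried speaker."""
    return [seg for seg in segments if seg.speaker_id == speaker_id]


def transcript_for_speaker(segments, speaker_id):
    """Render the queried speaker's portions of the transcription as text."""
    return "\n".join(
        f"[{seg.start_s:.1f}-{seg.end_s:.1f}] {seg.text}"
        for seg in segments_for_speaker(segments, speaker_id)
    )
```

A linear scan is enough for a single meeting. A real repository holding many recordings would more plausibly index segments by speaker identifier.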

14. A computer-implemented method for communicating metadata that identifies a current speaker performed by a computer system including one or more computing devices, the computer-implemented method comprising:
- generating an audio fingerprint of the current speaker based on audio data that represents speech of the current speaker;
  
  performing automated speaker recognition based on the audio fingerprint of the current speaker and one or more stored audio fingerprints;
  
  receiving tagging information that identifies the current speaker from a first client device of an observer when the current speaker is unrecognized;
  
  storing the audio fingerprint of the current speaker and metadata that identifies the current speaker, the metadata being at least partly based on the tagging information;
  
  communicating the metadata that identifies the current speaker to at least one of the first client device of the observer or a second client device of a different observer;
  
  receiving a request that identifies a particular speaker from the first client device of the observer or the second client device of the different observer; and
  
  communicating an alert to at least one of the first client device of the observer or the second client device of the different observer when the particular speaker is currently speaking.
- - 15. The computer-implemented method of claim 14, further comprising:
    - communicating data indicating that the current speaker is unrecognized to the client device of the observer.
  - 16. The computer-implemented method of claim 14, further comprising:
    - receiving additional tagging information that identifies the current speaker from at least one other client device of at least one other observer; and
      
      resolving a conflict between the tagging information that identifies the current speaker and the additional tagging information that identifies the current speaker by identifying the current speaker based on an identity supplied by a majority of observers.
  - 17. The computer-implemented method of claim 14, further comprising:
    - generating a new audio fingerprint of the current speaker based on subsequent audio data that represents speech of the current speaker; and
      
      performing speaker recognition based on the new audio fingerprint of the current speaker and the stored audio fingerprint of the current speaker.
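
The conflict resolution recited in claims 2, 16, and 19 (identifying the speaker "based on an identity supplied by a majority of observers") can be sketched as a simple vote count. The function name `resolve_by_majority` is hypothetical. Two behaviors are assumptions the claims do not settle: "majority" is read as more than half of the observers who supplied a tag, and the function returns `None` when no identity reaches that share.

```python
from collections import Counter


def resolve_by_majority(tags):
    """Pick the identity supplied by more than half of the tagging observers.

    `tags` maps an observer id to the identity that observer supplied.
    Returns None when there are no tags or no identity has a strict majority.
    """
    if not tags:
        return None
    identity, votes = Counter(tags.values()).most_common(1)[0]
    return identity if votes * 2 > len(tags) else None
```

For example, `resolve_by_majority({"o1": "Alice", "o2": "Alice", "o3": "Alicia"})` returns `"Alice"`, while a one-to-one split returns `None` and would leave the speaker untagged until more observers respond.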

18. A computer-readable storage medium storing computer-executable instructions that, when executed by a computing device, cause the computing device to implement:
- a speaker recognition component configured to generate an audio fingerprint of the current speaker based on audio data that represents speech of the current speaker and perform automated speaker recognition by comparing the audio fingerprint of the current speaker against stored audio fingerprints;
  
  a tagging component configured to receive tagging information that identifies the current speaker from a first client device of an observer when the automated speaker recognition is unsuccessful and store the audio fingerprint of the current speaker with the stored audio fingerprints;
  
  an audio data enrichment component configured to communicate metadata that identifies the current speaker to the first client device of the observer or a second client device of a different observer, the metadata being at least partly based on the tagging information; and
  
  an alert component configured to receive a request that identifies a particular speaker from at least one of the first client device of the observer or the second client device of the different observer, and communicate an alert to at least one of the first client device of the observer or the second client device of the different observer when the particular speaker is currently speaking.
- - 19. The computer-readable storage medium of claim 18, wherein the tagging component is further configured to receive additional tagging information that identifies the current speaker from at least one other client device of at least one other observer, and resolve a conflict between the tagging information that identifies the current speaker and the additional tagging information that identifies the current speaker by identifying the current speaker based on an identity supplied by a majority of observers.
  - 20. The computer-readable storage medium of claim 18, wherein the audio data enrichment component is configured to communicate the audio data that represents speech of the current speaker and the metadata that identifies the current speaker as synchronized streams of audio data and metadata.
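
The alert component of claim 18, operating on a metadata stream synchronized with the audio as in claim 20, might look like the sketch below. All names (`AlertComponent`, `request_alert`, `on_metadata`) are hypothetical. The sketch also assumes that metadata arrives as (timestamp, speaker) events and that an alert is raised once per speaking turn, which is one possible reading of "when the particular speaker is currently speaking".

```python
from collections import defaultdict


class AlertComponent:
    """Hypothetical alert component: observers ask to be alerted about a speaker."""

    def __init__(self):
        self._watchers = defaultdict(set)  # speaker -> ids of observers to alert
        self._current = None               # speaker of the most recent event

    def request_alert(self, observer_id, speaker):
        """Record a request that identifies a particular speaker."""
        self._watchers[speaker].add(observer_id)

    def on_metadata(self, timestamp_s, speaker):
        """Process one metadata event from the synchronized stream.

        Returns (observer_id, speaker, timestamp_s) alerts, emitted only when
        the speaker changes, so a long turn does not trigger repeated alerts.
        """
        if speaker == self._current:
            return []
        self._current = speaker
        return [
            (observer_id, speaker, timestamp_s)
            for observer_id in sorted(self._watchers.get(speaker, ()))
        ]
```

Each returned tuple would be transmitted to the corresponding client device, which decides whether to render an audible or visual alert.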

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors: Schlesinger, Benny; Kashtan, Guy; Fitoussi, Hen
Primary Examiner(s): He, Jialong
Application Number: US14/664,047
Publication Number: US 20160275952A1
Time in Patent Office: 844 Days
CPC Class Codes

G10L 17/00   Speaker identification or v...

G10L 17/04   Training, enrolment or mode...

G10L 17/22   Interactive procedures; Man...

G10L 19/018   Audio watermarking, i.e. em...

H04M 2201/41   using speaker recognition

H04M 2203/5081   Inform conference party of ...

H04M 2203/6045   Identity confirmation

H04M 3/56   Arrangements for connecting...

H04M 3/563   User guidance or feature se...

H04M 3/569   using the instant speaker's...
