REAL-TIME SPEAKER STATE ANALYTICS PLATFORM

US 20170084295A1
Filed: 06/10/2016
Published: 03/23/2017
Est. Priority Date: 09/18/2015
Status: Active Grant

Abstract

Disclosed are machine learning-based technologies that analyze an audio input and provide speaker state predictions in response to the audio input. The speaker state predictions can be selected and customized for each of a variety of different applications.

22 Claims

1. A speech analytics platform implemented in one or more computing devices, for providing speech-derived speaker state data as a service, the platform comprising:
- a speech data processing subsystem embodied in one or more non-transitory machine accessible storage media, the speech data processing subsystem configured to produce speech data corresponding to audio input captured from a human or synthetic speaker, the produced speech data being dynamically segmented for real-time speech-based speaker state determination; and
- a plurality of analytics engines embodied in one or more non-transitory machine accessible storage media, wherein each of the plurality of analytics engines is configured to receive the pre-processed speech data from the speech data processing subsystem and provide as output a speaker state indicator, the plurality of analytics engines comprising;
  - an automatic speech recognition module configured to perform a speech recognition operation on the speech data; and
  - a plurality of algorithms each configured for a different type of speaker state analytics, at least one of the algorithms extracting at least one non-word feature of the speech data and outputting speaker state data relating to the type of speaker state analytics for which the at least one algorithm has been configured.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
  - 2. The platform of claim 1, wherein the speech data processing subsystem is configured to perform at least one of:
    - speaker identification and speaker verification.
  - 3. The platform of claim 1, wherein at least one of the plurality of analytics engines is further configured to extract one or more speech features from the speech data based at least in part on a criterion specified by end user software.
  - 4. The platform of claim 1, wherein at least one of the plurality of analytics engines is configured to extract from the speech data lexical content, non-lexical acoustic features, or a combination of lexical content and non-lexical acoustic features.
  - 5. The platform of claim 1, wherein the extracted speech features comprise one or more of:
    - low-level features, static features, calculated features, dynamic features, derived features, and relative features.
  - 6. The platform of claim 1, wherein at least one of the plurality of analytics engines is further configured to use output of the automatic speech recognition module to determine one or more non-lexical speech features to be analyzed.
  - 7. The platform of claim 1, wherein at least one of the plurality of analytics engines is further configured to provide the speaker state data as input to the speech recognition operation.
  - 8. The platform of claim 1, wherein each of the plurality of analytics engines further comprises a synchronization mechanism to visually align the speaker state data with output corresponding to the audio signal input.
  - 9. The platform of claim 8, wherein the indicator output and speech signal output are visually presented to the speaker in real-time during a speech session by the speaker.
  - 10. The platform of claim 1, wherein the speech data processing subsystem is configured to capture audio input over a fixed window size.
  - 11. The platform of claim 1, wherein the platform is configured to identify a segment of the audio input spoken by a user calling a call center and to identify a segment of the audio input spoken by an agent of the call center with whom the user is speaking.
  - 12. The platform of claim 11, wherein the one or more algorithms are configured to determine a speaker state score for the user based on the segment of the audio input spoken by the user and to determine a speaker state score for the agent based on the segment of the audio input spoken by the agent.
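The arrangement recited in claim 1 (dynamically segmented speech data fed to several analytics engines, at least one of which works on non-word features) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the energy-threshold segmentation, and the two toy features (mean energy and zero-crossing rate) are hypothetical stand-ins and are not the algorithms disclosed in the patent. The automatic speech recognition module is omitted.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical names throughout; the claims do not prescribe any of them.
Segment = Tuple[int, int]


def segment_by_energy(samples: Sequence[float], frame: int = 4,
                      threshold: float = 0.1) -> List[Segment]:
    """Toy dynamic segmentation: keep runs of frames whose mean
    absolute amplitude exceeds the threshold."""
    segments: List[Segment] = []
    start = None
    for i in range(0, len(samples), frame):
        chunk = samples[i:i + frame]
        active = sum(abs(x) for x in chunk) / len(chunk) > threshold
        if active and start is None:
            start = i                      # a speech run begins
        elif not active and start is not None:
            segments.append((start, i))    # the run ended at this frame
            start = None
    if start is not None:
        segments.append((start, len(samples)))
    return segments


def mean_energy(seg: Sequence[float]) -> float:
    """Non-word feature: average squared amplitude of the segment."""
    return sum(x * x for x in seg) / len(seg)


def zero_crossing_rate(seg: Sequence[float]) -> float:
    """Non-word feature: fraction of adjacent sample pairs that change sign."""
    if len(seg) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(seg, seg[1:]) if (a < 0) != (b < 0))
    return crossings / (len(seg) - 1)


# Each "engine" is a stand-in for one type of speaker state analytics.
ENGINES: Dict[str, Callable[[Sequence[float]], float]] = {
    "mean_energy": mean_energy,
    "zero_crossing_rate": zero_crossing_rate,
}


def analyze(samples: Sequence[float]) -> List[Dict[str, object]]:
    """Run every engine on every detected segment."""
    results: List[Dict[str, object]] = []
    for start, end in segment_by_energy(samples):
        seg = samples[start:end]
        row: Dict[str, object] = {"span": (start, end)}
        row.update({name: fn(seg) for name, fn in ENGINES.items()})
        results.append(row)
    return results
```

Calling `analyze` on a buffer that contains silence, a short burst, and silence again yields one row for the burst, with its sample span and one value per engine. A real system would replace the toy features with trained models and would stream audio rather than hold it in a list.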

13. A system configured to provide output comprising speech-derived speaker state information, the system configured to:
- capture a speech signal from a speaker;
- convert the speech signal to a predetermined format configured to facilitate an interaction-time analysis of non-lexical features extracted from the speech signal by dynamically segmenting the speech signal using speech activity detection;
- select a plurality of analytics engines based at least partly on an application specification; and
- operate the selected analytics engines to, in an interaction time;
  - extract the non-lexical features from the speech signal, analyze the non-lexical features, and, based on the analyzing of the non-lexical features, provide as output a plurality of different speaker state indicators.
- Dependent Claims (14, 15, 16, 17, 18, 19, 20, 21, 22)
  - 14. The system of claim 13, further configured to perform speaker diarization based on the segmented speech signal in the interactive time.
  - 15. The system of claim 13, further comprising a display device configured to provide as output a temporal stream of one or more of the speaker state indicators in the interaction time.
  - 16. The system of claim 15, wherein the display device is configured to provide the one or more speaker state indicators as feedback to the speaker during a current speech session by the speaker.
  - 17. The system of claim 13, wherein at least one of the plurality of analytics engines is further configured to compare the one or more speaker state indicators to one or more speaker state indicators corresponding to a prior speaking event for the speaker.
  - 18. The system of claim 13, wherein at least one of the plurality of analytics engines is configured to output data indicative of a comparison of the one or more speaker state indicators to a reference model.
  - 19. The system of claim 18, wherein at least one of the plurality of analytics engines is configured to indicate one or more directional changes in a speaker state indicator based at least in part on the reference model.
  - 20. The system of claim 13, further comprising a model training subsystem configured to, by a combination of interacting with a user through a user interface and automated processing by the system, generate a trained model for use by at least one of the analytics engines.
  - 21. The system of claim 13, wherein the system is configured to parallelize the plurality of analytics engines.
  - 22. The system of claim 13, wherein the system is configured to select a speech capture channel and perform the dynamically segmenting of the speech signal based on the selected speech capture channel.
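Claims 13, 18, 19 and 21 together describe selecting engines from an application specification, running them in parallel, and reporting directional changes against a reference model. The sketch below shows one way such a flow could be wired up, assuming a hypothetical registry of engines, a hypothetical `app_spec` dictionary with an `engines` key, and a reference model reduced to one number per indicator. None of these names or data shapes come from the patent.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence

Engine = Callable[[Sequence[float]], float]

# Hypothetical registry of available engines (toy non-lexical features).
REGISTRY: Dict[str, Engine] = {
    "mean_energy": lambda seg: sum(x * x for x in seg) / len(seg),
    "peak": lambda seg: max(abs(x) for x in seg),
}


def select_engines(app_spec: Dict[str, list]) -> Dict[str, Engine]:
    """Pick engines named in the (assumed) application specification."""
    return {name: REGISTRY[name] for name in app_spec["engines"]}


def run_parallel(engines: Dict[str, Engine],
                 segment: Sequence[float]) -> Dict[str, float]:
    """Run each selected engine on the same segment in its own worker thread."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, segment) for name, fn in engines.items()}
        return {name: future.result() for name, future in futures.items()}


def direction(value: float, reference: float, tolerance: float = 1e-9) -> str:
    """Report whether an indicator sits above, below, or at its reference."""
    if value > reference + tolerance:
        return "up"
    if value < reference - tolerance:
        return "down"
    return "steady"


def compare_to_reference(indicators: Dict[str, float],
                         reference: Dict[str, float]) -> Dict[str, str]:
    """Directional change of each indicator relative to the reference model."""
    return {name: direction(value, reference[name])
            for name, value in indicators.items() if name in reference}
```

For example, `run_parallel(select_engines({"engines": ["mean_energy", "peak"]}), segment)` returns one indicator per selected engine, and `compare_to_reference` turns those values into "up", "down" or "steady" labels. Threads are used here only to keep the sketch short; CPU-bound engines would more plausibly run in separate processes or on dedicated hardware.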

Current Assignee: SRI International, Inc.
Original Assignee: SRI International, Inc.
Inventors: Andreas Tsiartas; Elizabeth Shriberg; Cory Albright; Michael W. Frandsen
Granted Patent: US 10,706,873 B2

CPC Class Codes

- G10L 15/1822: Parsing for meaning underst...
- G10L 15/26: Speech to text systems G10L...
- G10L 15/32: Multiple recognisers used i...
- G10L 17/00: Speaker identification or v...
- G10L 17/02: Preprocessing operations, e...
- G10L 17/08: Use of distortion metrics o...
- G10L 25/63: for estimating an emotional...
