Detecting an answering machine using speech recognition
Abstract
An answering machine detection module is used to determine whether a call recipient is an actual person or an answering machine. The answering machine detection module includes a speech recognizer and a call analysis module. The speech recognizer receives an audible response of the call recipient to a call. The speech recognizer processes the audible response and provides an output indicative of recognized speech. The call analysis module processes the output of the speech recognizer to generate an output indicative of whether the call recipient is a person or an answering machine.
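The two-stage arrangement described in the abstract can be sketched as follows. This is only an illustrative outline, not the patented implementation: the cue phrases, the `fake_recognizer` stand-in, and all class and function names are hypothetical, and a real system would plug in an actual speech recognizer and a trained classifier.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical cue phrases, chosen only for illustration.
MACHINE_CUES = ("leave a message", "after the tone", "not available")
PERSON_CUES = ("hello", "who is this", "speaking")


def call_analysis(recognized_text: str) -> str:
    """Toy call analysis module: compare cue-phrase hits in the recognized text."""
    text = recognized_text.lower()
    machine_hits = sum(cue in text for cue in MACHINE_CUES)
    person_hits = sum(cue in text for cue in PERSON_CUES)
    # Ties fall back to "person" (an arbitrary choice for this sketch).
    return "answering_machine" if machine_hits > person_hits else "person"


@dataclass
class AnsweringMachineDetectionModule:
    # Any callable that turns audio into text; a real recognizer is assumed here.
    speech_recognizer: Callable[[bytes], str]
    call_analysis_module: Callable[[str], str] = call_analysis

    def detect(self, audible_response: bytes) -> str:
        """Run the recognizer, then hand its output to the call analysis module."""
        recognized = self.speech_recognizer(audible_response)
        return self.call_analysis_module(recognized)


def fake_recognizer(audio: bytes) -> str:
    """Stand-in recognizer: treats the bytes as already-transcribed text."""
    return audio.decode("utf-8")


detector = AnsweringMachineDetectionModule(fake_recognizer)
```

Keeping the recognizer and the call analysis module as separate callables mirrors the separation the abstract describes, so either stage can be swapped without touching the other.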
20 Claims
1. A computer-implemented method of ascertaining whether a call recipient is an actual person or an answering machine, the method comprising, with a processor:
- receiving an audible response from a call recipient and processing the audible response with a speech recognizer having a language model to convert the audible response to an output indicative of recognized speech in a textual form; and
- processing the output indicative of recognized speech in the textual form with a statistical classifier trained on word phrases commonly used by actual persons and on word phrases commonly used by automated systems along with ascertaining non-word features associated with the audible response to provide an output indicative of whether the call recipient is an actual person or an answering machine, wherein said classifier is separate from said language model, said processing being based on a statistical analysis of the output indicative of recognized speech in the textual form along with the non-word features, the statistical analysis examining the content of the output indicative of recognized speech and based on that examination determining whether the output indicative of recognized speech is more statistically consistent with the word phrases commonly used by actual persons or with the word phrases commonly used by automated systems.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A computer readable storage medium being a hardware computer storage medium and having instructions which when implemented by a computer ascertain whether a call recipient is an actual person or an answering machine, the instructions comprising:
- receiving an audible response from a call recipient;
- accessing a language model and using speech recognition to convert the audible response to an output indicative of recognized word phrases in textual form; and
- processing the output indicative of recognized word phrases in textual form to provide an output indicative of whether the call recipient is an actual person or an answering machine, said processing being based on statistical analysis of the word phrases used by the call recipient in the audible response and apart from the language model, wherein each of said word phrases comprises a plurality of words, and wherein the statistical analysis includes calculating a confidence level that the output indicative of recognized word phrases corresponds to an actual person and calculating a confidence level that the output indicative of recognized word phrases corresponds to an answering machine, the output indicative of whether the call recipient is an actual person or an answering machine being based at least in part on the calculated confidence levels.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
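One way to picture the two confidence levels recited in claim 9 is a small naive Bayes model over two-word phrases. The claim does not name a particular statistical technique, so naive Bayes, the add-one smoothing, the equal class priors, and the tiny made-up training phrases below are all assumptions introduced purely for illustration.

```python
import math
from collections import Counter


def word_phrases(text, n=2):
    """Split text into overlapping n-word phrases (each phrase has several words)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


# Made-up training utterances; a real system would use collected call data.
TRAINING = {
    "person": [
        "hello who is this",
        "hello this is john speaking",
        "yes who is calling",
    ],
    "answering_machine": [
        "please leave a message after the tone",
        "we are not available please leave a message",
        "leave your name and number after the tone",
    ],
}


def train(training):
    """Count phrase occurrences per class and collect the shared phrase vocabulary."""
    counts = {
        label: Counter(p for text in texts for p in word_phrases(text))
        for label, texts in training.items()
    }
    vocab = set().union(*counts.values())
    return counts, vocab


def confidence_levels(text, counts, vocab):
    """Return one confidence level per class, normalized to sum to 1."""
    log_scores = {}
    for label, phrase_counts in counts.items():
        total = sum(phrase_counts.values())
        # Add-one smoothing, with one extra slot reserved for unseen phrases.
        log_scores[label] = sum(
            math.log((phrase_counts[p] + 1) / (total + len(vocab) + 1))
            for p in word_phrases(text)
        )
    peak = max(log_scores.values())
    weights = {label: math.exp(s - peak) for label, s in log_scores.items()}
    norm = sum(weights.values())
    return {label: w / norm for label, w in weights.items()}


counts, vocab = train(TRAINING)
levels = confidence_levels("please leave a message", counts, vocab)
decision = max(levels, key=levels.get)
```

Here the decision simply picks the class with the higher confidence level; a deployed system might instead require a minimum margin between the two levels before committing.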
20. A computer-implemented method of leaving a message for a call recipient on an answering machine, the method comprising:
- detecting whether a call recipient is an actual person or the answering machine wherein detecting comprises receiving an audible response from the call recipient and processing the audible response with a processor operating as a speech recognizer having access to a language model to provide an output indicative of recognized speech;
- processing the output indicative of recognized speech to provide an output indicative of whether the call recipient is an actual person or the answering machine, said processing using a statistical classifier trained on word phrases commonly used by actual persons and on word phrases commonly used by automated systems along with ascertaining non-word features associated with the audible response to provide the output indicative of whether the call recipient is an actual person or an answering machine, wherein said classifier is separate from said language model, said processing being based on a statistical analysis of the output indicative of recognized speech in the textual form along with the non-word features, the statistical analysis examining the content of the output indicative of recognized speech and based on that examination determining whether the output indicative of recognized speech is more statistically consistent with the word phrases commonly used by actual persons or with the word phrases commonly used by automated systems; and
- if the call recipient is the answering machine, then operating the speech recognizer to detect barge-in events by the answering machine, wherein a barge-in event is detection of a portion of a greeting from the answering machine following a period of silence;
- detecting multiple barge-in events from the answering machine within a single call session; and
- repeatedly restarting a message and playing the message from the beginning to the answering machine upon detection of each of the multiple barge-in events until the message is played in its entirety.
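The restart behaviour in the final elements of claim 20 can be sketched as a simple loop. Everything named here is hypothetical: the message is modelled as a list of chunks, `play_chunk` stands in for playing audio while the recognizer listens for a barge-in event, and the `max_restarts` cap is a safety assumption of this sketch that the claim does not recite.

```python
from typing import Callable, List


def leave_message(
    message: List[str],
    play_chunk: Callable[[str], bool],
    max_restarts: int = 10,
) -> int:
    """Play the message until it completes without interruption.

    play_chunk plays one chunk and returns True if a barge-in event was
    detected while it played. Returns the number of restarts that occurred.
    """
    restarts = 0
    while True:
        for chunk in message:
            if play_chunk(chunk):
                break  # barge-in detected: abandon this attempt
        else:
            return restarts  # every chunk played, message delivered in full
        restarts += 1
        if restarts > max_restarts:
            raise RuntimeError("too many barge-in events; giving up")


class SimulatedAnsweringMachine:
    """Test double that barges in on chosen playback calls (1-indexed)."""

    def __init__(self, barge_in_on_calls):
        self.barge_in_on_calls = set(barge_in_on_calls)
        self.calls = 0
        self.heard = []

    def play_chunk(self, chunk: str) -> bool:
        self.calls += 1
        self.heard.append(chunk)
        return self.calls in self.barge_in_on_calls


message = ["Hello,", "this is a reminder", "about your appointment."]
machine = SimulatedAnsweringMachine(barge_in_on_calls={2, 4})
restart_count = leave_message(message, machine.play_chunk)
```

In this simulation the greeting interrupts playback twice, so the message is restarted from its first chunk each time and only the third attempt runs to completion.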
Specification