Web-based speech recognition with scripting and semantic objects

US 8,024,422 B2
Filed: 04/03/2008
Issued: 09/20/2011
Est. Priority Date: 03/24/2000
Status: Expired due to Fees

Abstract

The present invention is a system and method for creating and implementing transactional speech applications (SAs) using Web technologies, without reliance on server-side standard or custom services. A transactional speech application may be any application that requires interpretation of speech in conjunction with a speech recognition (SR) system, such as, for example, consumer survey systems. A speech application in accordance with the present invention is represented within a Web page, as an application script that interprets semantic objects according to a context. Any commonly known scripting language can be used to write the application script, such as JavaScript (or ECMAScript), PerlScript, and VBscript. The present invention is “Web-based” to the extent that it implements Web technologies, but it need not include or access the World Wide Web.

17 Claims

1. A speech application system, comprising:
- A. a speech recognition (SR) system including a server and configured to receive an audio input and generate a context free semantic object representing a plurality of valid interpretations of said audio input;
  
  B. a set of speech application scripts, loaded at the SR system and configured to control said SR system, said set of application scripts defining a context;
  
  C. a semantic data evaluator, configured to receive said context free semantic object and said context and, as a function thereof, to generate a linguistic result corresponding to said audio input, and to return said linguistic result to said set of application scripts; and
  
  D. a set of reusable object oriented interfaces local to the SR system, said interfaces configured to interface said one or more of said set of application scripts with said SR system.
- - 2. The system of claim 1, wherein one or more of said set of application scripts is implemented on an extended HTML Web page.
  - 3. The system of claim 1, wherein one or more of said interfaces are objects exposed via scripting facilities.
  - 4. The system of claim 1, wherein said application script includes programming code written in a language chosen from a group of scripting languages comprising:
    - (1) JavaScript;
      
      (2) PerlScript;
      
      (3) VBscript; and
      
      (4) ECMAScript.
  - 5. The system of claim 1, wherein said context free semantic object is represented as a semantic tree instance.
  - 6. The system of claim 1, wherein said context free semantic object is represented in a semantic object.
  - 7. The system of claim 1, wherein said audio input is received from a device chosen from a group comprising:
    - A. a telephone;
      
      B. a cellular telephone;
      
      C. a personal computer;
      
      D. an application server;
      
      E. an audio receiver; and
      
      F. VoIP client.
  - 8. The system of claim 1, wherein said audio input is received via a network comprised of one or more wire or wireless networks from a group comprising:
    - A. a telephone network;
      
      B. a cellular telephone network;
      
      C. a LAN;
      
      D. a WAN;
      
      E. a virtual private network;
      
      F. the Internet; and
      
      G. the Web.
  - 9. The system of claim 1, wherein said plurality of valid interpretations of said audio input includes all valid interpretations of said audio input within said context.
  - 10. The system of claim 1, wherein speech application is chosen from a group of interactive speech applications comprising:
    - A. consumer survey applications;
      
      B. Web access applications;
      
      C. educational applications, including health education applications and computer-based lesson applications and testing applications;
      
      D. screening applications, including patient screening applications and consumer screening applications;
      
      E. health risk assessment applications;
      
      F. monitoring applications, including health data monitoring applications and consumer preference monitoring applications;
      
      G. compliance applications, including applications that generate notifications of compliance related activities, including notifications regarding health or product maintenance;
      
      H. test results applications, including applications that provide at least one of lab test results, standardized tests results, consumer product test results, and maintenance results; and
      
      I. linking applications, including applications that link two or more of the applications in parts A through H.
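
As a reading aid for claims 1, 5 and 9, the following Python sketch shows one way a context free semantic object could be held as a small tree and then narrowed by a context. It is not taken from the specification: the `Leaf`, `Seq` and `Alt` node types, the slot names, and the rule that a context is a mapping from slot name to allowed values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Leaf:
    """A single slot/value reading, e.g. number=4 (hypothetical node type)."""
    slot: str
    value: str


@dataclass(frozen=True)
class Seq:
    """Parts that all occur together in the utterance (hypothetical node type)."""
    parts: tuple


@dataclass(frozen=True)
class Alt:
    """Competing readings of the same stretch of audio (hypothetical node type)."""
    options: tuple


def interpretations(node):
    """Expand the tree into every reading it encodes, each as a slot -> value dict."""
    if isinstance(node, Leaf):
        return [{node.slot: node.value}]
    if isinstance(node, Alt):
        return [r for option in node.options for r in interpretations(option)]
    # Seq: take one reading from each part and merge them.
    readings = []
    for combo in product(*(interpretations(part) for part in node.parts)):
        merged = {}
        for piece in combo:
            merged.update(piece)
        readings.append(merged)
    return readings


def evaluate(node, context):
    """Return the one reading that fits the context, or None if zero or several fit.

    Assumption: a context maps each acceptable slot name to its allowed values.
    """
    valid = [
        reading
        for reading in interpretations(node)
        if all(slot in context and value in context[slot]
               for slot, value in reading.items())
    ]
    return valid[0] if len(valid) == 1 else None


# The recognizer heard something that could be the digit "4" or the word "for".
heard = Alt((Leaf("number", "4"), Leaf("word", "for")))
rating_context = {"number": {"1", "2", "3", "4", "5"}}
word_context = {"word": {"for", "to"}}
```

The same `heard` tree can be evaluated under `rating_context` or `word_context` without being rebuilt, which is the sense in which the object is context free in this sketch.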

11. A speech application system comprising:
- first computer and second computer;
  
  A. a speech recognition (SR) system hosted on said first computer and configured to receive an audio input from an input device and to generate one or more context free semantic objects representing a plurality of valid interpretations of said audio input;
  
  B. a Web page loaded on said first computer, from said second computer, said Web page including an application script comprising a set of speech application functionality and configured to interact with said input device via said SR system, wherein said speech application is configured to conduct speech application sessions without accessing said second computer, and wherein the application script is configured to control the SR system;
  
  C. a set of reusable object oriented interfaces local to said first computer, said interfaces including;
  
  (1) one or more interface objects configured to facilitate access by said application script to standard services of said first computer; and
  
  (2) an interface configured to facilitate access to and control of said SR system by said application script; and
  
  D. a semantic object evaluator, configured to generate from said context free semantic objects, as a function of said context, a single interpretation of said audio input and to return said single interpretation to said application script.
- - 12. The system of claim 11, wherein speech application is chosen from a group of interactive speech applications comprising:
    - A. consumer survey applications;
      
      B. Web access applications;
      
      C. educational applications, including health education applications and computer-based lesson applications and testing applications;
      
      D. screening applications, including patient screening applications and consumer screening applications;
      
      E. health risk assessment applications;
      
      F. monitoring applications, including health data monitoring applications and consumer preference monitoring applications;
      
      G. compliance applications, including applications that generate notifications of compliance related activities, including notifications regarding health or product maintenance;
      
      H. test results applications, including applications that provide at least one of lab test results, standardized tests results, consumer product test results, and maintenance results; and
      
      I. linking applications, including applications that link two or more of the applications in parts A through H.
  - 13. The system of claim 11, wherein said set of reusable object oriented interfaces and said semantic object evaluator are objects exposed via ActiveX facilities.
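
Claim 11 turns on where things run: the page arrives from the second computer once, and sessions then proceed against interfaces local to the first computer. A minimal Python sketch of that arrangement follows. `ScriptHost`, `FakeRecognizer`, the `sr` name handed to the script, the `run_session` entry point, and the example URL are hypothetical, and `exec` stands in for a page's scripting engine purely for illustration.

```python
class FakeRecognizer:
    """Hypothetical local SR interface object exposed to the script."""

    def recognize(self, audio):
        return ["yes"] if audio == b"audio-yes" else []


class ScriptHost:
    """Stands in for the first computer: hosts the interfaces and runs the script."""

    def __init__(self, fetch):
        self._fetch = fetch          # callable that reaches the second computer
        self._interfaces = {}
        self._entry_point = None

    def expose(self, name, obj):
        """Make a local interface object visible to the script under `name`."""
        self._interfaces[name] = obj

    def load_page(self, url):
        source = self._fetch(url)    # the only access to the second computer
        namespace = dict(self._interfaces)
        # Illustration only: real code should never exec untrusted input.
        exec(source, namespace)
        self._entry_point = namespace["run_session"]

    def run_session(self, audio):
        """Run one session using only the loaded script and local interfaces."""
        return self._entry_point(audio)


# Hypothetical application script, as it might be delivered inside a page.
PAGE_SCRIPT = '''
def run_session(audio):
    candidates = sr.recognize(audio)
    return candidates[0] if candidates else "reprompt"
'''

fetch_log = []


def fetch(url):
    """Fake network fetch that records every call to the second computer."""
    fetch_log.append(url)
    return PAGE_SCRIPT


host = ScriptHost(fetch)
host.expose("sr", FakeRecognizer())
host.load_page("http://example.invalid/survey")
results = [host.run_session(b"audio-yes"), host.run_session(b"noise")]
```

After both sessions, `fetch_log` holds a single entry: the page load. Neither session triggers another fetch.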

14. A non-transitory machine-readable storage medium;
- and executable program instructions embodied in the machine readable storage medium that when executed by a processor of a programmable computing device configures the programmable computing device to execute a speech application script included within a Web page, and configured to interact with a SR system hosted on a first computer and configured to receive an audio input and to generate one or more semantic objects representing a plurality of valid interpretations of said audio input, said first computer also including a plurality of interface objects and a semantic object evaluator configured to generate from said one or more semantic objects a single interpretation of said audio input as a function of a context, said speech application script comprising;
  
  A. a context definition;
  
  B. a link to said semantic object evaluator;
  
  C. a link to said SR system, via an interface object, from said plurality of interface objects;
  
  D. a set of control functionality comprising;
  
  (1) a session manager configured to generate user prompts and to determine a next action as a function of said single interpretation;
  
  (2) a SR system controller, configured to control said SR system; and
  
  (3) a communication manager, configured to manage interaction with said input device via said SR system, wherein said speech application script is loaded on said first computer from a second computer and said speech application is configured to conduct speech application sessions without accessing said second computer.
- - 15. The non-transitory machine-readable storage medium of claim 14, wherein said interface objects are objects exposed via scripting facilities.
  - 16. The non-transitory machine-readable storage medium of claim 14, wherein said speech application script is a speech application chosen from a group of interactive speech applications comprising:
    - A. consumer survey applications;
      
      B. Web access applications;
      
      C. educational applications, including health education applications and computer-based lesson applications and testing applications;
      
      D. screening applications, including patient screening applications and consumer screening applications;
      
      E. health risk assessment applications;
      
      F. monitoring applications, including health data monitoring applications and consumer preference monitoring applications;
      
      G. compliance applications, including applications that generate notifications of compliance related activities, including notifications regarding health or product maintenance;
      
      H. test results applications, including applications that provide at least one of lab test results, standardized tests results, consumer product test results, and maintenance results; and
      
      I. linking applications, including applications that link two or more of the applications in parts A through H.
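
Of the control functionality listed in claim 14, the session manager is the easiest to picture in code: it issues prompts and picks a next action from the single interpretation it is handed. The Python sketch below covers only that piece, using a two-question consumer survey. The `Question` and `SessionManager` classes, the question ids, and the `REPROMPT`/`ASK`/`END` action names are assumptions for illustration, not terms from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    prompt: str
    next_on: dict  # interpreted answer -> id of the next question, or "END"


@dataclass
class SessionManager:
    questions: dict
    current: str
    answers: dict = field(default_factory=dict)

    def prompt(self):
        """Text to play to the caller for the current question."""
        return self.questions[self.current].prompt

    def next_action(self, interpretation):
        """Choose the next step as a function of the single interpretation."""
        question = self.questions[self.current]
        if interpretation not in question.next_on:
            # Nothing usable came back (including None): ask again.
            return ("REPROMPT", self.current)
        self.answers[self.current] = interpretation
        self.current = question.next_on[interpretation]
        if self.current == "END":
            return ("END", None)
        return ("ASK", self.current)


# Hypothetical survey script data.
SURVEY = {
    "visited": Question(
        "Did you visit a clinic in the last month?",
        {"yes": "rating", "no": "END"},
    ),
    "rating": Question(
        "How would you rate the visit, from one to five?",
        {str(n): "END" for n in range(1, 6)},
    ),
}
```

A session would start with `SessionManager(questions=SURVEY, current="visited")` and call `next_action` once per caller response, passing whatever the semantic object evaluator returned.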

17. A method of performing a speech application session, wherein a SR system is hosted on a first computer and includes a means to receive an audio input, said method comprising:
- A. receiving said audio input by said SR system;
  
  B. loading a Web page including an application script on said first computer, said application script including a set of functionality configured to manage a speech application session and control said SR system, without accessing functionality from a second computer;
  
  C. establishing a set of standard interfaces between said SR system and said application script, including establishing a semantic evaluator;
  
  D. in response to tasking by said application script, generating by said SR system one or more context free semantic objects representing all possible interpretations of said audio input;
  
  E. in response to receiving a context defined by said application script, determining by said semantic evaluator a single semantic interpretation from said one or more context free semantic objects; and
  
  F. determining a next action by said application script as a function of said single semantic interpretation.
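
The ordering of steps D through F in claim 17 can be sketched as a single function that calls three local stages in sequence. Everything in the Python below is a stand-in: the canned `FAKE_RECOGNIZER` table replaces a real SR system, a context is assumed to be a plain set of acceptable words, and the action names are invented. Steps A through C (receiving audio, loading the page, establishing interfaces) are represented only by the `audio` argument and the function definitions themselves.

```python
# Hypothetical stand-in for the SR system: maps canned "audio" to every
# candidate reading, with no knowledge of what the script expects.
FAKE_RECOGNIZER = {
    b"audio-yes": ["yes", "yet"],
    b"audio-no": ["no", "know"],
}


def recognize(audio):
    """Step D: produce all candidate readings, independent of any context."""
    return FAKE_RECOGNIZER.get(audio, [])


def evaluate(candidates, context):
    """Step E: reduce the candidates to one reading using the script's context."""
    matches = [c for c in candidates if c in context]
    return matches[0] if len(matches) == 1 else None


def decide(interpretation):
    """Step F: the script chooses its next action from the single reading."""
    if interpretation is None:
        return "reprompt"
    return "schedule_callback" if interpretation == "yes" else "end_call"


def run_turn(audio, context):
    """One caller turn: every stage is a local call, none reaches another machine."""
    candidates = recognize(audio)
    single = evaluate(candidates, context)
    return decide(single)


YES_NO_CONTEXT = {"yes", "no"}
```

Calling `run_turn(b"audio-yes", YES_NO_CONTEXT)` walks the three stages for one utterance; unrecognized audio falls through to a reprompt.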

Current Assignee: Eliza Corporation (Gainwell Technologies)
Original Assignee: Eliza Corporation (Gainwell Technologies)
Inventors: Kroeker, John; Boulanov, Oleg
Primary Examiner(s): Colin, Carl
Assistant Examiner(s): Siddiqi, Mohammad A
Application Number: US12/062,144
Publication Number: US 20080183469A1
Time in Patent Office: 1,265 Days
Field of Search: 709/217, 709/218, 704/9, 704/251, 704/257, 704/270
US Class Current: 709/217
CPC Class Codes:
- G06F 40/211   Syntactic parsing, e.g. bas...
- G06F 40/279   Recognition of textual enti...
- G06F 40/30   Semantic analysis
- G06Q 30/02   Marketing; Price estimation...
- G10L 15/02   Feature extraction for spee...
- G10L 15/14   using statistical models, e...
- G10L 15/1815   Semantic context, e.g. disa...
- G10L 15/1822   Parsing for meaning underst...
- G10L 15/28   Constructional details of s...
- G10L 15/30   Distributed recognition, e....
- H04M 3/4938   comprising a voice browser ...
