Distributed speech recognition server system for mobile internet/intranet communication

US 20020091527A1
Filed: 01/08/2001
Published: 07/11/2002
Est. Priority Date: 01/08/2001
Status: Abandoned Application

Abstract

This invention is a speech recognition server system for implementation in a communications network having a plurality of clients, at least one site communication server, at least one contents server, and at least one communications gateway server, said speech recognition server system comprising a site map including a table of site address words; a speech server daemon, communicable with the communications gateway server and the site communication server, for managing speech information; a voice recognition server, communicable with said speech server daemon, for speech recognition of the speech information; a site map manager, communicable with said site map, for speech recognition of the site address words in said site map; a speaker model, communicable with said site map manager and said voice recognition server, for speech recognition of the site address words in said site map; and a site selector, communicable with said voice recognition server, said speech server daemon, and said site map, for selecting the site words responsive to words recognized by said voice recognition server.
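
The component wiring described in the abstract can be illustrated with a minimal sketch. It is not the applicant's implementation: the class names, the word-to-address dictionary layout, the example addresses, and the assumption that the "speech information" has already been reduced to a text transcript are all hypothetical, chosen only to show how a server daemon, a voice recognition server, a site map, and a site selector might hand data to one another.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SiteMap:
    """Table of site address words (hypothetical layout: word -> address)."""
    table: Dict[str, str] = field(default_factory=dict)

    def lookup(self, word: str) -> Optional[str]:
        return self.table.get(word.lower())


class VoiceRecognitionServer:
    """Stand-in recognizer; a real engine would decode audio features."""

    def recognize(self, speech_info: str) -> List[str]:
        # Assumption: the speech information is already a transcript.
        return speech_info.lower().split()


class SiteSelector:
    """Selects a site address responsive to the recognized words."""

    def __init__(self, site_map: SiteMap) -> None:
        self.site_map = site_map

    def select(self, recognized_words: List[str]) -> Optional[str]:
        # Return the address of the first recognized word found in the table.
        for word in recognized_words:
            address = self.site_map.lookup(word)
            if address is not None:
                return address
        return None


class ServerDaemon:
    """Manages client information and routes requests to the recognizer."""

    def __init__(self, recognizer: VoiceRecognitionServer,
                 selector: SiteSelector) -> None:
        self.recognizer = recognizer
        self.selector = selector

    def handle_request(self, client_id: str, speech_info: str) -> dict:
        words = self.recognizer.recognize(speech_info)
        return {
            "client": client_id,
            "words": words,
            "site": self.selector.select(words),
        }
```

A caller would build the site map once, for example `SiteMap({"weather": "http://weather.example/"})`, and pass each client request through `ServerDaemon.handle_request`. The speaker model and site map manager are left out of the sketch because the abstract does not say how they modify recognition.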

167 Citations

66 Claims

1. A speech recognition server system for implementation in a communications network having a plurality of clients, at least one site server, at least one gateway server, and at least one content server, said speech recognition server system comprising:
- a site map including a table of site address words;
  
  a server daemon, communicable with the gateway server and the site server, for managing client information and request parameters;
  
  a voice recognition server, communicable with said server daemon, for speech recognition of the speech information;
  
  a site map manager, communicable with said site map, for speech recognition of the site address words in said site map;
  
  a speaker model, communicable with said site map manager and said voice recognition server, for speech recognition of the site address words in said site map; and
  
  a site selector, communicable with said voice recognition server, said server daemon, and said site map, for selecting the site words responsive to words recognized by said voice recognition server.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51)
- - 2. The speech recognition server system of claim 1 wherein the clients comprise telephone handsets.
  - 3. The speech recognition server system of claim 2 wherein the telephone handsets comprise wireless mobile phones.
  - 4. The speech recognition server system of claim 1 wherein the clients include computers.
  - 5. The speech recognition server system of claim 1 wherein the clients include personal digital assistant devices.
  - 6. The speech recognition server system of claim 1 wherein the network communications system is a wireless system.
  - 7. The speech recognition server system of claim 1 wherein the gateway server is a wireless application protocol (WAP) gateway.
  - 8. The speech recognition server system of claim 1 wherein the site server is an HTTP server.
  - 9. The speech recognition server system of claim 1 wherein said site address table comprises URL website words.
  - 10. The speech recognition server system of claim 1 wherein said speaker model is speaker dependent.
  - 11. The speech recognition server system of claim 1 wherein said speaker model is speaker adaptive.
  - 12. The speech recognition server system of claim 1 wherein said server daemon comprises:
    - a request manager for receiving information requests and user addresses from the clients and transmitting the information requests to said voice recognition server for speech recognition;
      
      an ID manager, coupled to said request manager, for generating a user ID for each client and for transmitting a map page number to said sitemap manager;
      
      a profile manager, coupled to said request manager, for receiving the user ID and matching a voice profile created by said voice recognition server;
      
      a log manager, coupled to said request manager, for recording a log entry transmitted by said request manager;
      
      a site address verifier, coupled to said ID manager, for receiving a matched site address from said site map manager and verifying the matched site address;
      
      a reply manager, coupled to said request manager and to said site address verifier, for receiving the matched site address from said site address verifier and transmitting a fetch request to the site communications server responsive to the matched site address; and
      
      a sessions manager, coupled to said request manager, for recording and controlling the sequence of actions.
  - 13. The speech recognition server system of claim 12 wherein said site addresses are URLs.
  - 14. The speech recognition server system of claim 12 wherein said profile manager requests said voice recognition server to generate an adaptation acoustic profile responsive to the user ID and transmits the adaptation acoustic profile to said profile manager.
  - 15. The speech recognition server system of claim 1 wherein said voice recognition server comprises:
    - at least one voice recognition engine; and
      
      a syllable map having map entries, coupled to said voice recognition engine, for matching an incoming voice feature with said map entries in said syllable map.
  - 16. The speech recognition server system of claim 15 wherein said at least one voice recognition engine comprises a speaker-independent speech recognition program.
  - 17. The speech recognition server system of claim 16 wherein said speaker-independent speech recognition program comprises words in a Korean language.
  - 18. The speech recognition server system of claim 16 wherein said speaker-independent speech recognition program comprises words in a Japanese language.
  - 19. The speech recognition server system of claim 16 wherein said speaker-independent speech recognition program comprises words in a Chinese language.
  - 20. The speech recognition server system of claim 15 wherein said at least one voice recognition engine comprises an adaptive speech recognition program.
  - 21. The speech recognition server system of claim 20 wherein said adaptive speech recognition program comprises words in a Korean language.
  - 22. The speech recognition server system of claim 20 wherein said adaptive speech recognition program comprises words in a Japanese language.
  - 23. The speech recognition server system of claim 20 wherein said adaptive speech recognition program comprises words in a Chinese language.
  - 24. The speech recognition server system of claim 15 wherein said at least one voice recognition engine comprises a training speech recognition program.
  - 25. The speech recognition server system of claim 24 wherein said training speech recognition program comprises words in a Korean language.
  - 26. The speech recognition server system of claim 24 wherein said training speech recognition program comprises words in a Japanese language.
  - 27. The speech recognition server system of claim 24 wherein said training speech recognition program comprises words in a Chinese language.
  - 28. The speech recognition server system of claim 15 wherein said at least one voice recognition engine comprises a predetermined purpose speech recognition program.
  - 29. The speech recognition server system of claim 28 wherein said predetermined purpose speech recognition program comprises words in a Korean language.
  - 30. The speech recognition server system of claim 28 wherein said predetermined purpose speech recognition program comprises words in a Japanese language.
  - 31. The speech recognition server system of claim 28 wherein said predetermined purpose speech recognition program comprises words in a Chinese language.
  - 32. The speech recognition server system of claim 28 wherein said predetermined purpose speech recognition program includes site names on a communications network.
  - 33. The speech recognition server system of claim 28 wherein said predetermined purpose speech recognition program includes company names on a stock exchange.
  - 34. The speech recognition server system of claim 28 wherein said predetermined purpose speech recognition program includes transportation information related words.
  - 35. The speech recognition server system of claim 28 wherein said predetermined purpose speech recognition program includes entertainment information related words.
  - 36. The speech recognition server system of claim 28 wherein said predetermined purpose speech recognition program includes restaurant information words.
  - 37. The speech recognition server system of claim 28 wherein said predetermined purpose speech recognition program includes weather information words.
  - 38. The speech recognition server system of claim 28 wherein said predetermined purpose speech recognition program includes retail store name words.
  - 39. The speech recognition server system of claim 28 wherein said predetermined purpose speech recognition program includes banking services related words.
  - 40. The speech recognition server system of claim 28 wherein said predetermined purpose speech recognition program includes financial services related words.
  - 41. The speech recognition server system of claim 28 wherein said predetermined purpose speech recognition program includes e-commerce and e-business related words.
  - 42. The speech recognition server system of claim 28 wherein said predetermined purpose speech recognition program includes navigation aids words.
  - 43. The speech recognition server system of claim 1 wherein said sitemap manager comprises:
    - a syllable generator for generating speech syllables;
      
      a syllable map, coupled to said syllable generator, for storing site name words;
      
      a site address map for storing site addresses;
      
      a sitemap toolkit, coupled to said syllable generator, said sitemap toolkit including a user interface for interfacing with the contents server, a syllable map manager for managing the syllables transmitted from said syllable map and the syllables generated by said syllable generator, and a site address map manager for managing the site address words, said sitemap tool kit for matching the syllables from said syllable map and said syllables recognized by said voice recognition server.
  - 44. The speech recognition server system of claim 43 wherein said site addresses comprise URL words.
  - 45. The speech recognition server system of claim 43 wherein said syllable map comprises words in a Korean language.
  - 46. The speech recognition server system of claim 43 wherein said syllable map comprises words in a Japanese language.
  - 47. The speech recognition server system of claim 43 wherein said syllable map comprises words in a Chinese language.
  - 48. The speech recognition server system of claim 43 wherein said syllable generator generates Korean language syllables.
  - 49. The speech recognition server system of claim 43 wherein said syllable generator generates Korean language syllables.
  - 50. The speech recognition server system of claim 43 wherein said syllable generator generates Japanese language syllables.
  - 51. The speech recognition server system of claim 43 wherein said syllable generator generates Chinese language syllables.
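
Claims 15 and 43 describe matching recognized syllables against entries in a syllable map that is tied to site addresses. The following is a rough sketch of that idea only. The hyphen-delimited romanized names, the example addresses, the 0.5 threshold, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration; the claims do not specify a matching algorithm.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple


def generate_syllables(site_name: str) -> Tuple[str, ...]:
    """Hypothetical syllable generator.

    Assumption: site names arrive pre-segmented with hyphens ("nal-ssi").
    A real generator would segment the word itself.
    """
    return tuple(site_name.lower().split("-"))


class SyllableMap:
    """Maps syllable sequences of site name words to site addresses."""

    def __init__(self) -> None:
        self.entries: Dict[Tuple[str, ...], str] = {}

    def add(self, site_name: str, address: str) -> None:
        self.entries[generate_syllables(site_name)] = address

    def match(self, recognized: List[str],
              threshold: float = 0.5) -> Optional[str]:
        """Return the address of the most similar entry, or None."""
        best_address, best_score = None, 0.0
        for syllables, address in self.entries.items():
            score = SequenceMatcher(None, syllables, tuple(recognized)).ratio()
            if score > best_score:
                best_address, best_score = address, score
        return best_address if best_score >= threshold else None
```

With entries added through `add("nal-ssi", ...)` and `add("ju-sik", ...)`, calling `match(["nal", "ssi"])` returns the first address, while a sequence sharing no syllables with any entry returns `None`. The threshold trades false matches against rejected utterances and would need tuning against real recognizer output.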

52. A speech recognition server system for implementation in a communications network having at least one site server, at least one gateway server, at least one content server, and a plurality of clients each having a keypad and a micro-browser, said speech recognition server system comprising:
- a hotkey, disposed on the keypad, for initializing a voice session;
  
  a vocoder for generating voice frame data responsive to an input speech;
  
  a client speech subroutine, coupled to said vocoder, for performing speech feature extraction on said voice frame data and to generate digitized voice signals therefrom;
  
  a system-specific profile database for storing and transmitting system-specific client profiles;
  
  a payload formatter, communicable with said client speech subroutine and said system-specific profile database, for formatting a client payload data flow received from said client speech subroutine with data received from said system-specific profile database;
  
  a speech recognition server, communicable with the gateway server for speech recognition of the formatted client payload;
  
  a transaction protocol (TP) socket, communicable with said payload formatter and the gateway server, for receiving the formatted client payload from said payload formatter, converting the client payload to a wireless speech TP query, and transmitting the wireless speech TP query via the gateway server through the communications network to said speech recognition server, and further for receiving a recognized wireless speech TP query from said speech recognition server, converting the recognized wireless speech TP query to a resource identifier, and transmitting the resource identifier to the micro-browser for identifying the resource responsive to the resource identifier;
  
  a wireless transaction protocol socket, communicable with the micro-browser and gateway server, for receiving the resource query from the micro-browser, generating a wireless session resource query, and transmitting the resource query via the gateway server and through the communications network to the contents server, and further for receiving content from the content server via the site server, the communications network, and the gateway server, and transmitting the content via the micro-browser to the client for display; and
  
  an event handler, communicable with said hotkey, said client speech subroutine, said TP socket, the micro-browser, and said payload formatter, for transmitting event command signals and synchronizing the voice session thereamong.
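
The client-side data flow of claim 52 (speech features combined with a system-specific profile, converted to a query, and the recognized reply converted to a resource identifier) can be sketched as follows. The JSON encoding, the field names `profile`, `features`, and `site_address`, and the function names are hypothetical; the claim does not define a wire format, and an actual wireless transaction protocol query would be encoded differently.

```python
import json
from typing import Dict, List


def format_payload(features: List[float], profile: Dict[str, str]) -> dict:
    """Payload formatter: attach the client profile to the feature data."""
    return {"profile": profile, "features": [round(f, 4) for f in features]}


def to_tp_query(payload: dict) -> bytes:
    """TP socket, outbound: serialize the payload (assumed JSON encoding)."""
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def to_resource_identifier(recognized_reply: bytes) -> str:
    """TP socket, inbound: pull the resource identifier out of the reply.

    Assumption: the recognition server answers with a JSON object that
    carries the matched address under the key "site_address".
    """
    reply = json.loads(recognized_reply.decode("utf-8"))
    return reply["site_address"]
```

In this sketch the micro-browser would receive the string returned by `to_resource_identifier` and fetch it through its own socket. The hotkey, vocoder, and event handler are omitted because they depend on handset hardware.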

53. A speech recognition server system for implementation in a communications network having at least one site server, at least one gateway server, at least one content server, and a plurality of clients each having a keypad and a micro-browser, said speech recognition server system comprising:
- a hotkey, disposed on the keypad, for initializing a voice session;
  
  a vocoder for generating voice frame data responsive to an input speech;
  
  a client speech subroutine, coupled to said vocoder, for performing speech feature extraction on said voice frame data and to generate digitized voice signals therefrom;
  
  a system-specific profile database for storing and transmitting system-specific client profiles;
  
  a payload formatter, communicable with said client speech subroutine and said system-specific profile database, for formatting the client payload received from said client speech subroutine with data received from said system-specific profile database;
  
  a speech recognition server, communicable with the gateway server for speech recognition;
  
  a transaction protocol (TP) socket, communicable with said payload formatter and the gateway server, for receiving the client payload from said payload formatter, converting the client payload to a TP tag, and transmitting the TP tag via the gateway server through the communications network to said speech recognition server;
  
  a wireless transaction protocol socket, communicable with the micro-browser and the gateway server, for receiving a wireless push transmission from the gateway server responsive to a push access protocol transmission from said speech recognition server, and for receiving a resource transmission from the micro-browser and transmitting the resource transmission via the gateway server through the communications network to the site server, and further for receiving content from the content server via the site server, the communications network, and the gateway server, and transmitting the content via the micro-browser to the client for display; and
  
  an event handler, communicable with said hotkey, said client speech subroutine, the micro-browser, and said payload formatter, for transmitting event command signals and synchronizing the voice session thereamong.

54. A speech recognition server system for implementation in a communications network having at least one site server, at least one gateway server, at least one contents server, and a plurality of clients each having a keypad and a micro-browser, said speech recognition server system comprising:
- a hotkey, disposed on the keypad, for initializing a voice session;
  
  a vocoder for generating voice frame data responsive to an input speech;
  
  a client speech subroutine, coupled to said vocoder, for performing speech feature extraction on said voice frame data and to generate digitized voice signals therefrom;
  
  a system-specific profile database for storing and transmitting system-specific client profiles;
  
  a payload formatter, communicable with the micro-browser, said client speech subroutine and said system-specific profile database, for formatting a client payload received from said client speech subroutine with data received from said system-specific profile database;
  
  a speech recognition server, communicable with the gateway server for receiving the client payload hypertext TP transmissions from the gateway server and for performing speech recognition on the client payload, and further for transmitting a recognized client payload to the gateway server;
  
  a wireless transaction protocol socket, communicable with the micro-browser and the gateway server, for receiving a wireless query transmission from the micro-browser and transmitting a wireless session protocol transmission to the gateway server and thence to said speech recognition server, and further for receiving a wireless session protocol transmission from the gateway server responsive to a hypertext TP transmission from said speech recognition server, and for receiving a resource transmission from the micro-browser and transmitting the resource transmission via the gateway server through the communications network to the contents server, and further for receiving content from the content server via the site server, the communications network, and the gateway server, and transmitting the content via the micro-browser to the client for display; and
  
  an event handler, communicable with said hotkey, said client speech subroutine, the micro-browser, and said payload formatter, for transmitting event command signals and synchronizing the voice session thereamong.

55. A speech recognition server system for implementation in a communications network having at least one site server, at least one gateway server, at least one content server, and a plurality of clients each having a keypad and a micro-browser, said speech recognition server system comprising:
- a hotkey, disposed on the keypad, for initializing a voice session;
  
  a vocoder for generating voice frame data responsive to an input speech;
  
  a client speech subroutine, coupled to said vocoder, for performing speech feature extraction on said voice frame data and to generate digitized voice signals therefrom;
  
  a system-specific profile database for storing and transmitting system-specific client profiles;
  
  a payload formatter, communicable with the micro-browser, said client speech subroutine and said system-specific profile database, for formatting a client payload received from said client speech subroutine with data received from said system-specific profile database;
  
  a speech recognition server, communicable with the gateway server for receiving the client payload hypertext TP transmissions from the gateway server and for performing speech recognition on the client payload, and further for transmitting a recognized client payload to the gateway server;
  
  a wireless transaction protocol socket, communicable with the micro-browser, said payload formatter, and the gateway server, for receiving a wireless protocol query transmission from said payload formatter and transmitting a wireless session protocol transmission to the gateway server and thence to said speech recognition server, and further for receiving a wireless session protocol transmission from the gateway server responsive to a hypertext TP transmission from said speech recognition server, and for receiving a resource transmission from the micro-browser and transmitting the resource transmission via the gateway server through the communications network to the contents server, and further for receiving content from the content server via the site server, the communications network, and the gateway server, and transmitting the content via the micro-browser to the client for display; and
  
  an event handler, communicable with said hotkey, said client speech subroutine, the micro-browser, and said payload formatter, for transmitting event command signals and synchronizing the voice session thereamong.

56. A distributed speech recognition system for implementation in a wireless mobile communications system, communicable with the Internet, having at least one website server, at least one wireless gateway proxy server, a wireless telephony applications (WTA) server, and a plurality of mobile communication devices each having a micro-browser, said distributed speech recognition system comprising:
- a client speech processor, disposed in said mobile communication devices, for speech feature extraction; and
  
  a server speech processor, disposed in the WTA server, for recognizing the speech features.
- View Dependent Claims (57, 58)
- - 57. The distributed speech recognition system of claim 56 wherein said server speech processor is disposed in the wireless gateway proxy server.
  - 58. The distributed speech recognition system of claim 56 wherein said server speech processor is disposed in the website server.

59. A distributed speech recognition system for implementation in a wireless mobile communications system communicable with an intranet system having at least one web server, at least one intranet wireless communications gateway proxy server, a firewall, and a plurality of mobile communication devices, said distributed speech recognition system comprising:
- a client speech processor, disposed in said mobile communication devices, for speech feature extraction; and
  
  a server speech processor, disposed in the intranet wireless communications gateway proxy server for recognizing the speech features.
- View Dependent Claims (60)
- - 60. The distributed speech recognition system of claim 59 wherein said server speech processor is disposed in the web server.
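
The split in claims 56 and 59 between a client speech processor (feature extraction on the device) and a server speech processor (recognition of those features) can be illustrated with a toy example. Per-frame log energy is used here only as a stand-in feature, and nearest-template matching as a stand-in recognizer; both are assumptions, as are the function names and the frame size. Practical distributed recognition front ends extract richer features such as cepstral coefficients.

```python
import math
from typing import Dict, List, Sequence


def extract_features(samples: Sequence[float], frame_size: int = 4) -> List[float]:
    """Client side: reduce audio samples to one log-energy value per frame."""
    features = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energy = sum(s * s for s in frame) / frame_size
        features.append(math.log(energy + 1e-9))  # offset avoids log(0)
    return features


def _distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance over shared frames plus a length-mismatch penalty."""
    n = min(len(a), len(b))
    squared = sum((a[i] - b[i]) ** 2 for i in range(n))
    return math.sqrt(squared) + abs(len(a) - len(b))


def recognize(features: Sequence[float],
              templates: Dict[str, List[float]]) -> str:
    """Server side: return the word whose stored template is closest."""
    return min(templates, key=lambda word: _distance(features, templates[word]))
```

Only the short feature list crosses the network in this arrangement, not the raw samples, which is the usual motivation for placing extraction on the handset and recognition on a server.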

61. A speech recognition server system for implementation in a communications network having a plurality of sites each having a site map and a plurality of sub-sites, said speech recognition server system comprising:
- a site map table for mapping the site map at the plurality of sites;
  
  mirroring means, coupled to said site map table, for mirroring the site map at the plurality of sites to said site map table;
  
  speech recognition means for recognizing an input speech selecting one of said plurality of sites and sub-sites;
  
  first child process means, coupled to said speech recognition means, for launching one of the plurality of sites responsive to the input speech;
  
  second child process means, coupled to said speech recognition means, for launching one of the plurality of sub-sites responsive to the input speech; and
  
  third child process means, coupled to said speech recognition means, for launching information at the sub-site responsive to an input query.
- View Dependent Claims (62)
- - 62. The speech recognition server system of claim 61 wherein said speech recognition server system is disposed at the plurality of sites.

63. In a network communication system including a plurality of sites and sub-sites each providing content, a method for speech-accessing the sites, sub-sites, and content comprising the steps of:
- mirroring the sites and sub-sites onto a speech recognition system site map;
  
  speaking a selected site name for one of the plurality of mirrored sites and sub-sites;
  
  generating a first child process to launch a site responsive to said spoken site name;
  
  speaking a sub-site name for one of the plurality of mirrored sub-sites;
  
  generating a second child process to launch a sub-site responsive to said spoken sub-site name;
  
  speaking a query for one of the plurality of mirrored sub-sites; and
  
  generating a third child process to launch a content responsive to said spoken query.
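
The three-stage method of claim 63 (mirror the sites, then launch a site, a sub-site, and finally content from successive spoken inputs) might be sketched as below. This is an illustration under stated assumptions: the nested-dictionary site map, the class and method names, and the example data are hypothetical, spoken inputs are assumed to be already recognized as text, and each "child process" is modelled as an ordinary method call rather than an operating-system process.

```python
import copy
from typing import Dict, Optional

# Hypothetical layout: site -> sub-site -> query -> content
SiteTree = Dict[str, Dict[str, Dict[str, str]]]


def mirror_sites(sites: SiteTree) -> SiteTree:
    """Mirror every site and sub-site onto the recognizer's own site map."""
    return copy.deepcopy(sites)


class VoiceSession:
    """Walks the mirrored site map one spoken input at a time."""

    def __init__(self, site_map: SiteTree) -> None:
        self.site_map = site_map
        self.site: Optional[str] = None
        self.sub_site: Optional[str] = None

    def launch_site(self, spoken_site: str) -> str:
        """First stage: select a site by its spoken name."""
        if spoken_site not in self.site_map:
            raise KeyError(f"unknown site: {spoken_site}")
        self.site, self.sub_site = spoken_site, None
        return spoken_site

    def launch_sub_site(self, spoken_sub_site: str) -> str:
        """Second stage: select a sub-site within the current site."""
        if self.site is None:
            raise RuntimeError("no site selected")
        if spoken_sub_site not in self.site_map[self.site]:
            raise KeyError(f"unknown sub-site: {spoken_sub_site}")
        self.sub_site = spoken_sub_site
        return spoken_sub_site

    def launch_content(self, spoken_query: str) -> str:
        """Third stage: return the content answering the spoken query."""
        if self.site is None or self.sub_site is None:
            raise RuntimeError("no sub-site selected")
        return self.site_map[self.site][self.sub_site][spoken_query]
```

A session would call `launch_site`, `launch_sub_site`, and `launch_content` in that order; calling a later stage first raises an error, which mirrors the ordering of the claimed steps.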

64. In a network communication system including a plurality of sites and sub-sites, a method for charging a payment for speech-accessing the sites and sub-sites comprising the steps of:
- (a) mirroring the sites and sub-sites onto a speech recognition system site map;
  
  (b) speaking a site name for one of the plurality of mirrored sites and sub-sites;
  
  (c) generating a first child process to launch a site responsive to said spoken site name;
  
  (d) speaking a sub-site name for one of the plurality of mirrored sub-sites;
  
  (e) generating a second child process to launch a sub-site responsive to said spoken sub-site name;
  
  (f) speaking a query for one of the plurality of mirrored sub-sites;
  
  (g) generating a third child process to launch a content responsive to said spoken query; and
  
  (h) charging a payment for said steps (a) to (g).
- View Dependent Claims (65, 66)
- - 65. The method of claim 64 wherein said charging a payment for said steps (a) to (g) is done by a billing by the network communications system.
  - 66. The method of claim 65 wherein said billing by the network communications system is performed monthly.

Current Assignee: Verbaltek Incorporated
Original Assignee: Verbaltek Incorporated
Inventors: Shiau, Shyue-Chin
Application Number: US09/757,305
Publication Number: US 20020091527A1
US Class Current: 704/270.1
CPC Class Codes: G10L 15/30 Distributed recognition, e....
