Arrangement of speaker-independent speech recognition

US 7,392,184 B2
Filed: 04/15/2002
Issued: 06/24/2008
Est. Priority Date: 04/17/2001
Status: Expired due to Term

Abstract

A method for forming a pronunciation model needed in speech recognition, in a telecommunications system comprising at least one portable electronic device and a server. The electronic device is arranged to compare the user's speech information with pronunciation models comprising acoustic units and stored in the electronic device. A character sequence is transferred from the electronic device to the server. In the server, the character sequence is converted into a sequence of acoustic units. The sequence of acoustic units is sent from the server to the electronic device.

13 Claims

1. A method of forming a pronunciation model for speech recognition in a telecommunications system comprising at least one portable electronic device and a server, the electronic device being configured to compare the user's speech information with pronunciation models comprising acoustic units and stored in the electronic device, the method comprising:
- transferring a character sequence from the electronic device to the server, comparing the character sequence with a language resource to determine the language of the character sequence in a language selector of the electronic device, selecting a text-to-phoneme resource, stored in the server, according to the determined language;
  
  converting the character sequence in the server into at least one phoneme sequence in text format in accordance with the determined language; and
  
  transferring said at least one phoneme sequence in text format from the server to the electronic device;
  
  storing the phoneme sequence in association with the transferred character sequence in the electronic device;
  
  forming an audio model of the phoneme sequence at the electronic device;
  
  storing the audio model in the electronic device in association with the character sequence; and
  
  comparing received speech information with the stored phoneme sequence and if a match results, selecting the character sequence associated with the matching phoneme sequence.
- - 2. A method according to claim 1, the method further comprising:
    - selecting a character sequence according to said phoneme sequence from contact information; and
      
      activating a service in accordance with said character sequence.
  - 3. A method according to claim 1, the method further comprising:
    - searching in the server for information related to the character sequence, e.g. telephone numbers, on the basis of the received character sequence; and
      
      sending said information in addition to the phoneme sequence to the electronic device.
  - 4. A method according to claim 1, the method further comprising:
    - playing the audio model to the user of the electronic device as a response to the user's speech command being substantially matching the phoneme sequence received from the character sequence.
  - 5. A method according to claim 1, wherein the electronic device is a mobile station and the data transmission between the server and the electronic device is configured by messaging through a mobile network.
  - 6. A method according to claim 1, wherein the language is determined by means of decision trees.
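
The round trip recited in claims 1 to 6 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: the names `Server`, `Device`, `select_language`, and `to_phonemes`, the tiny marker-based language resource, and the toy grapheme-to-phoneme rules (loosely SAMPA-like symbols) are all hypothetical. The "transfer" is a plain method call, and the final match compares phoneme strings directly instead of scoring speech against acoustic models.

```python
from dataclasses import dataclass, field

# Hypothetical language resource held on the device: marker substrings per language.
LANGUAGE_RESOURCE = {
    "fi": ("ä", "ö", "kk", "nen"),
    "en": ("th", "sh", "ck"),
}

# Hypothetical text-to-phoneme resources held on the server, one per language.
TTP_RESOURCES = {
    "en": {"sh": "S", "th": "T", "ck": "k"},
    "fi": {"ä": "{", "ö": "2"},
}


def select_language(text: str) -> str:
    """Device side: choose the language whose markers occur most often."""
    lowered = text.lower()
    scores = {
        lang: sum(lowered.count(marker) for marker in markers)
        for lang, markers in LANGUAGE_RESOURCE.items()
    }
    return max(scores, key=scores.get)


def to_phonemes(text: str, language: str) -> str:
    """Server side: convert characters to a phoneme sequence in text format."""
    rules = TTP_RESOURCES[language]  # resource selected by determined language
    lowered = text.lower()
    phonemes = []
    i = 0
    while i < len(lowered):
        pair = lowered[i:i + 2]
        if pair in rules:                # two-letter grapheme, e.g. "th"
            phonemes.append(rules[pair])
            i += 2
        elif lowered[i] in rules:        # single-letter rule
            phonemes.append(rules[lowered[i]])
            i += 1
        else:
            if lowered[i].isalpha():     # fall back to the letter itself
                phonemes.append(lowered[i])
            i += 1
    return " ".join(phonemes)


class Server:
    def convert(self, character_sequence: str, language: str) -> str:
        return to_phonemes(character_sequence, language)


@dataclass
class Device:
    server: Server
    phonemes_by_name: dict = field(default_factory=dict)

    def add_entry(self, name: str) -> str:
        language = select_language(name)                # language selector on the device
        phonemes = self.server.convert(name, language)  # round trip to the server
        self.phonemes_by_name[name] = phonemes          # stored with the character sequence
        return phonemes

    def recognize(self, heard_phonemes: str):
        """Return the stored character sequence whose phonemes match, if any."""
        for name, phonemes in self.phonemes_by_name.items():
            if phonemes == heard_phonemes:
                return name
        return None


device = Device(Server())
print(device.add_entry("Smith"))
print(device.recognize("s m i T"))
```

In a real recognizer the device would build acoustic models from the returned phoneme sequence and score incoming audio against them; the exact string comparison here only stands in for that matching step.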

7. A telecommunications system comprising:
- at least one electronic device and a server, wherein the electronic device is configured to send a character sequence intended for speech recognition to the server;
  
  a language selector of the electronic device, adapted to compare the character sequence with a language resource to determine the language of the character sequence;
  
  wherein the server is configured to convert the character sequence into at least one phoneme sequence in accordance with the determined language and the server is configured to send said at least one phoneme sequence to the electronic device;
  
  wherein the electronic device is configured to form an audio model of the phoneme sequence and to store said audio model in association with the character sequence;
  
  a memory in the electronic device for storing the phoneme sequence in association with the transferred character sequence in the electronic device; and
  
  further wherein the electronic device is configured to compare the user's speech information with the stored phoneme sequences stored in the electronic device and if a match results, selecting the character sequence associated with the matching phoneme sequence.

8. An electronic device comprising:
- a language selector adapted to compare a character sequence with a language resource to determine a language of the character sequence intended for speech recognition;
  
  a transmitter adapted to send the character sequence intended for speech recognition and information on the language of the character sequence to a server;
  
  a receiver adapted to receive a phoneme sequence formed of the character sequence in accordance with the determined language from the server;
  
  a processor in the electronic device configured to form an audio model of the phoneme sequence and to store said audio model in association with said character sequence;
  
  a memory adapted to store the phoneme sequence in association with the character sequence; and
  
  wherein the processor is adapted to compare user's speech information with stored phoneme sequence and if a match results, select the character sequence associated with the matching phoneme sequence.
- - 9. An electronic device according to claim 8, wherein the processor is further adapted to:
    - associate a phoneme sequence received from the server to the character sequence stored in the memory of the electronic device or its tag;
      
      select a phoneme sequence substantially according to the user's speech information and further a character sequence according to said phoneme sequence; and
      
      activate a service in accordance with said character sequence.
  - 10. An electronic device according to claim 8, wherein the language selector is arranged to determine the language by means of decision trees.
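
Claims 6, 10, and 12 say only that the language may be determined by means of decision trees; the claims quoted here do not specify the questions asked or how the tree is built. As a rough illustration, a hand-written tree over simple spelling cues might look like the sketch below. The `Node` and `Leaf` types, the `classify` function, and every question in `TREE` are assumptions made for this example; a practical tree would normally be trained from data.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    language: str  # decision reached at this leaf


@dataclass
class Node:
    question: Callable[[str], bool]  # yes/no test on the lowercased text
    yes: Union["Node", Leaf]
    no: Union["Node", Leaf]


def classify(tree: Union[Node, Leaf], text: str) -> str:
    """Walk the tree from the root, answering one question per node."""
    lowered = text.lower()
    node = tree
    while isinstance(node, Node):
        node = node.yes if node.question(lowered) else node.no
    return node.language


# Hypothetical hand-built tree distinguishing Finnish from English names.
TREE = Node(
    question=lambda s: any(c in s for c in "äö"),
    yes=Leaf("fi"),
    no=Node(
        question=lambda s: "th" in s or "sh" in s,
        yes=Leaf("en"),
        no=Node(
            question=lambda s: s.endswith("nen"),
            yes=Leaf("fi"),
            no=Leaf("en"),
        ),
    ),
)

for name in ("Smith", "Virtanen", "Brown"):
    print(name, classify(TREE, name))
```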

11. A server comprising:
- a communication connection for receiving a character sequence from at least one electronic device;
  
  a language selector adapted to compare the character sequence with a language resource to determine a language for the character sequence intended for speech recognition;
  
  a processor for converting the character sequence into at least one phoneme sequence in accordance with the determined language; and
  
  wherein said processor is adapted to send the at least one phoneme sequence to the electronic device over the communication connection; and
  
  wherein the phoneme sequence is received and further processed and stored in association with the character sequence in the electronic device; and
  
  wherein a processor in the electronic device is adapted to compare a user's speech information with the stored phoneme sequence and if a match results, select the character sequence associated with the matching phoneme sequence; and
  
  further wherein the processor in the electronic device is configured to form an audio model of the phoneme sequence and to store said speech model in association with the character sequence.
- - 12. A server according to claim 11, wherein the language selector is arranged to determine the language by means of decision trees.

13. A computer program product comprising a computer readable program code stored in a memory for causing a computer to perform speech recognition, said computer program product further comprising:
- computer readable program code means for causing a computer server to receive a character sequence for speech recognition from an electronic device;
  
  computer readable program code means for causing the computer server to compare the character sequence with a language resource to determine a language of the character sequence;
  
  computer readable program code means for causing the computer server to convert the character sequence into at least one phoneme sequence in text format in accordance with the determined language; and
  
  computer readable program code means for causing a processor in said electronic device to receive the at least one phoneme sequence in text format from the server;
  
  computer readable program code means for causing the processor to store the phoneme sequence in association with the character sequence; and
  
  compare received speech information with the stored phoneme sequence and, if a match results, select the character sequence associated with the matching phoneme sequence; and
  
  computer readable program code means for causing the processor to form an audio model of the phoneme sequence and to store said speech model in association with the character sequence.

Current Assignee: HMD Global Oy
Original Assignee: Nokia Corporation
Inventors: Laurila, Kari; Viikki, Olli
Primary Examiner(s): Vo, Huyen X.
Application Number: US10/122,730
Publication Number: US 20020152067A1
Time in Patent Office: 2,262 Days
Field of Search: 704/256, 704/277, 704/260, 704/258, 704/207, 704/8, 704/9, 704/270, 704/1, 704/5, 704/7, 704/236, 704/250, 704/243, 704/244, 704/231, 704/246, 709/203, 709/260
US Class Current: 704/236
CPC Class Codes:
G10L 15/063 Training
G10L 15/30 Distributed recognition, e....
