Method, and a device for converting speech by replacing inarticulate portions of the speech before the conversion

US 9,123,343 B2
Filed: 04/27/2006
Issued: 09/01/2015
Est. Priority Date: 04/27/2006
Status: Expired due to Fees

Abstract

An arrangement for converting speech into text comprises a mobile device (202) and a server entity (208) configured to perform the conversion and additional optional processes in co-operation. The user of the mobile device (202) may locally edit the speech signal prior to or between the execution of the actual speech recognition tasks, by replacing an inarticulate portion of the speech signal with a newly recorded version of that portion. Task sharing details can be negotiated dynamically based on a number of parameters.
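
The local editing step described above can be illustrated with a minimal sketch. It assumes the digital speech signal is held as a plain list of integer samples and that the user-defined portion is given as sample indices; the function name `apply_edit` and the command strings are hypothetical and do not come from the patent. No speech recognition is involved at this stage.

```python
from typing import List, Optional, Sequence


def apply_edit(
    signal: Sequence[int],
    command: str,
    start: int,
    end: Optional[int] = None,
    new_samples: Sequence[int] = (),
) -> List[int]:
    """Return an edited copy of `signal` (hypothetical helper, not from the patent).

    `start` and `end` are sample indices of the user-defined portion, with
    `end` exclusive. The signal is edited directly; nothing is converted to text.
    """
    samples = list(signal)
    if end is None:
        end = start
    if not 0 <= start <= end <= len(samples):
        raise ValueError("portion lies outside the signal")

    if command == "delete":
        # Drop the selected portion.
        return samples[:start] + samples[end:]
    if command == "insert":
        # Add new speech at `start` without removing anything.
        return samples[:start] + list(new_samples) + samples[start:]
    if command in ("re-record", "replace"):
        # Swap the (for example inarticulate) portion for a new recording.
        return samples[:start] + list(new_samples) + samples[end:]
    raise ValueError(f"unknown edit command: {command}")
```

For example, `apply_edit(signal, "replace", 8000, 12000, new_take)` would swap half a second of 8 kHz audio for the samples in `new_take`, which may be shorter or longer than the portion it replaces.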

14 Claims

1. A mobile device operable in a wireless communications network comprising:
- a speech input device that receives speech and converts the speech into a representative digital speech signal, said speech input device converting the received speech into representative digital speech signal without any speech-to-text conversion of the representative digital speech signal;
  
  a control input device that, prior to any speech-to-text conversion of the representative digital speech signal, communicates an edit command inputted by a user, the edit command relating to and for editing audible speech content of a user-defined portion of the representative digital speech signal, the user input edit command is one of a group consisting of i) a deletion of a portion of the representative digital speech signal, ii) an insertion of a speech portion in the representative digital speech signal, iii) re-recording of a portion of the representative digital speech signal, and iv) replacement of a portion of the representative digital speech signal where the user-defined portion of the representative digital speech signal is an inarticulate portion in the representative digital speech signal being replaced with a new version recording of the portion of the representative digital speech signal;
  
  a processing device that performs a digital speech signal editing task of the representative digital speech signal portion to edit the audible speech content thereof responsive to the received edit command, said speech signal editing task being free of speech-to-text conversion and providing a user-directed edited digital speech signal portion;
  
  at least part of a speech recognition engine for carrying out tasks of the edited digital speech signal portion to text conversion; and
  
  a transceiver that exchanges information relating to the digital speech signal portion and speech to text conversion thereof with an external entity functionally connected to said wireless communications network (614), wherein said mobile device is configured to transmit text resulting from the speech to text conversion to another entity for at least one of a group consisting of i) storage, ii) archiving, and iii) a further processing task selected from a group consisting of spell-checking, machine translation, human translation, translation verification, and text to speech synthesis.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9):
  - 2. The mobile device according to claim 1, configured to share the execution of tasks required for carrying out the speech to text conversion with the external entity, said mobile device being further configured to share the execution of tasks so as to optimize a factor according to predetermined criteria, said factor being selected from a group consisting of:
    - execution time of the speech to text conversion, conversion costs, amount of required data transfer, processing load, and memory load.
  - 3. The mobile device according to claim 2, wherein the exchanged information includes at least one element selected from a group consisting of:
    - data for allocating or performing the tasks of the speech to text conversion, processing load, memory load, a battery status, a battery capacity, information about tasks running with higher priority, available transmission bandwidth, data transmission rate, external entity usage cost per speech data size or duration, size or duration of the digital speech signal, available encoding/decoding method, conversion status, task status, device unavailability notice, intermediary speech to text conversion result, digital speech, digital encoded speech, speech recognition parameter, and text.
  - 4. The mobile device according to claim 2, configured to utilize intermediary results of the speech to text conversion provided by both the device and the external entity in order to produce the text.
  - 5. The mobile device according to claim 2, configured to transmit intermediary results of the speech to text conversion to said external entity, so as to enable the another entity to perform at least one of the following:
    - to combine the intermediary results acquired from the mobile device with locally obtained results to produce the text, subject the intermediary results to additional processing in order to produce the text.
  - 6. The mobile device according to claim 1, further comprising a display device for visualizing at least part of the digital speech signal, including the digital speech signal portion, whereupon said control input device is configured to communicate an edit command relating to said visualized part, wherein the visualization of the signal comprises at least one element selected from a group consisting of:
    - a time-domain representation of the signal, a frequency-domain representation of the signal, a parameterization of the signal, a zoom or unzoom operation targeted to the visualized signal, a numeric value determined from a user-defined portion of the signal, a pointer to a user-defined location in the visualized signal, and highlighting of a user-defined sub-area of the visualized signal, and wherein said mobile device is further configured to visualize at least a portion of the text resulted from the conversion as aligned in relation to the corresponding visualized portion of the signal.
  - 7. The mobile device according to claim 1, wherein said at least part of the speech recognition engine comprises an element selected from a group consisting of:
    - preprocessor for dividing the digital speech signal into frames of a predetermined length, audio encoder for compressing the digital speech signal, cepstral analyser, acoustic classifier, neural network classifier, best path decoder, HMM (Hidden Markov Model) decoder, lexical language model, grammatical language model, context dependent lexical language model, context dependent grammatical language model, user-specific settings, and vocabulary.
  - 8. The mobile device according to claim 1, wherein said exchanged information includes an element selected from a group consisting of:
    - digital form speech, digital encoded speech, device status information, message acknowledgment, control information, edit command, task sharing negotiation data, parameter value related to task sharing, task status, service down notice, load figure, intermediary speech to text conversion result.
  - 9. The mobile device according to claim 1, wherein said information is exchanged by utilizing at least one communication practice selected from a group consisting of:
    - an SMS (Short Message Service) message, an MMS (Multimedia Message Service) message, an e-mail, a data call, a GPRS (Global Packet Radio Service) connection, and a voice call.
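
Claim 7 lists a preprocessor that divides the digital speech signal into frames of a predetermined length. A minimal sketch of such a step follows, assuming a list of integer samples, non-overlapping frames, and zero-padding of the final frame. These choices and the name `split_into_frames` are illustrative assumptions, since the claim does not specify them.

```python
from typing import List, Sequence


def split_into_frames(
    signal: Sequence[int], frame_length: int, pad_value: int = 0
) -> List[List[int]]:
    """Divide `signal` into consecutive frames of exactly `frame_length` samples.

    Hypothetical helper: frames do not overlap, and the last frame is padded
    with `pad_value` when the signal length is not a multiple of `frame_length`.
    """
    if frame_length <= 0:
        raise ValueError("frame_length must be positive")

    frames: List[List[int]] = []
    for offset in range(0, len(signal), frame_length):
        frame = list(signal[offset:offset + frame_length])
        # Pad only the final, possibly short, frame.
        frame.extend([pad_value] * (frame_length - len(frame)))
        frames.append(frame)
    return frames
```

Recognition front ends often use overlapping windows instead; that variant would step `offset` by a hop size smaller than `frame_length`.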

10. A method for converting speech into text having the steps of:
- receiving, in a mobile device operable in a wireless network, a speech source and converting the speech source into a representative digital speech signal without performing any speech-to-text conversion;
  
  prior to performing any speech-to-text conversion of the representative digital speech signal, receiving an edit command inputted by a user, the edit command relating to and for editing audible speech content of a user-defined portion of the digital speech signal by the mobile device, the user input edit command being one of a group consisting of i) a deletion of a portion of the representative digital speech signal, ii) an insertion of a speech portion in the representative digital speech signal, iii) re-recording of a portion of the representative digital speech signal, and iv) replacement of a portion of the representative digital speech signal where the user-defined portion of the representative digital speech signal is an inarticulate portion in the representative digital speech signal being replaced with a new version recording of the portion of the representative digital speech signal;
  
  processing the digital speech signal portion in accordance with the edit command to edit the audible speech content thereof said processing being free of any speech-to-text conversion and providing a user-directed edited digital speech signal portion;
  
  exchanging information relating to the edited digital speech signal portion and speech to text conversion thereof;
  
  executing on the basis of the exchanged information at least part of the tasks required for carrying out a speech to text conversion of the digital speech signal portion; and
  
  visualizing at least part of the digital speech signal on a display of the mobile device, whereupon the received edit command further relates to said visualized part.
- Dependent claims (11, 12):
  - 11. A computer executable program stored on a computer-readable code device adapted, when run on a computer, to carry out the method steps as defined by claim 10.
  - 12. A non-transitory carrier medium comprising the computer executable program of claim 11, wherein said non-transitory carrier medium includes at least one element selected from a group consisting of:
    - a memory card, a floppy disc, a CD-ROM, and a hard drive.
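
The visualizing step of claim 10 ties the edit command to a displayed part of the signal. One possible way to build a time-domain view and map a selection back to samples is sketched below, under the assumption that the display shows one peak amplitude per pixel column. The function names are hypothetical and the patent does not prescribe this method.

```python
from typing import List, Sequence, Tuple


def column_bounds(column: int, columns: int, total_samples: int) -> Tuple[int, int]:
    """Sample range [start, end) covered by one display column."""
    return (
        column * total_samples // columns,
        (column + 1) * total_samples // columns,
    )


def peak_envelope(signal: Sequence[int], columns: int) -> List[int]:
    """Reduce the signal to one peak absolute amplitude per display column."""
    if columns <= 0:
        raise ValueError("columns must be positive")
    total = len(signal)
    if total == 0:
        return []
    columns = min(columns, total)  # never more columns than samples
    peaks = []
    for column in range(columns):
        start, end = column_bounds(column, columns, total)
        peaks.append(max(abs(sample) for sample in signal[start:end]))
    return peaks


def selection_to_samples(
    first_column: int, last_column: int, columns: int, total_samples: int
) -> Tuple[int, int]:
    """Map an inclusive column selection to a sample range [start, end)."""
    start, _ = column_bounds(first_column, columns, total_samples)
    _, end = column_bounds(last_column, columns, total_samples)
    return start, end
```

The sample range returned by `selection_to_samples` is the kind of user-defined portion that an edit command would then act on.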

13. A server operable in a communications network comprising:
- a data input device that i) receives a digital data signal sent by a mobile device, said digital data signal representing speech or at least part thereof free of any speech to text conversion, and ii) receives a speech edit command inputted by a user, the speech edit command to edit audible speech content of a user-defined portion of said digital data signal representing said speech via the mobile device, the speech edit command providing a user-directed edit of the digital speech signal portion prior to any speech to text conversion of said digital data signal representing speech or at least part thereof, the user input edit command is one of a group consisting of i) a deletion of a portion of the representative digital speech signal, ii) an insertion of a speech portion in the representative digital speech signal, iii) re-recording of a portion of the representative digital speech signal, and iv) replacement of a portion of the representative digital speech signal where the user-defined portion of the representative digital speech signal is an inarticulate portion in the representative digital speech signal being replaced with a new version recording of the portion of the representative digital speech signal;
  
  at least part of a speech recognition engine for carrying out tasks of the edited digital data signal to text conversion;
  
  a controlling unit for exchanging control information with the mobile device, performing a digital speech signal editing task to edit the audible speech content of the user-defined digital speech signal portion responsive to the received edit command, and determining, based on the control information, the tasks to be performed on the received digital data signal by said at least part of the speech recognition engine, the digital speech signal editing task being performed prior to any speech signal to text conversion of the user-defined portion of said digital data signal representing said speech and providing a user-directed edit of the digital speech signal portion; and
  
  a data output device for communicating at least part of the output of the performed tasks to an external entity.

14. A system for converting speech into text comprising a mobile device operable in a wireless communications network and a server functionally connected to said wireless communications network, wherein said mobile device is configured to:
- receive speech from a user and convert the speech into a representative digital speech signal without performing any speech-to-text conversion,
  
  prior to performing any speech-to-text conversion of the representative digital speech signal, to receive user-inputted edit command from the user, the edit command relating to and for editing audible speech content of a user-defined portion of the digital speech signal and being one of a group consisting of i) a deletion of a portion of the representative digital speech signal, ii) an insertion of a speech portion in the representative digital speech signal, iii) re-recording of a portion of the representative digital speech signal, and iv) replacement of a portion of the representative digital speech signal where the user-defined portion of the representative digital speech signal is an inarticulate portion in the representative digital speech signal being replaced with a new version recording of the portion of the representative digital speech signal,
  
  to process the digital speech signal portion in accordance with the edit command to edit the audible speech content thereof, to exchange information relating to the digital speech signal and speech to text conversion thereof with the server, and to execute part of the tasks required for carrying out an edited digital speech signal to text conversion, said processing step being prior of speech-to-text conversion and providing a user-directed edit of the digital speech signal portion, and
  
  said server is configured to receive information relating to the digital speech signal and speech to text conversion thereof, and to execute, based on the exchanged information, the remaining part of the tasks required for carrying out a digital speech signal to text conversion.
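
The division of conversion tasks between device and server, together with the optimization of execution time named in claim 2, can be sketched as a search over split points in a linear task pipeline. Everything in this model is an assumption made for illustration: the `Task` fields, the single hand-over point, and the cost formula are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Task:
    """One pipeline stage with assumed cost estimates (hypothetical model)."""
    name: str
    device_seconds: float   # estimated run time on the mobile device
    server_seconds: float   # estimated run time on the server
    output_bytes: int       # size of the data this stage hands to the next


def best_split(
    tasks: Sequence[Task], input_bytes: int, bytes_per_second: float
) -> Tuple[int, float]:
    """Return (k, seconds): the device runs tasks[:k], the server runs tasks[k:].

    The estimate adds device time, one transfer of the hand-over data, and
    server time. With k == len(tasks) nothing is handed over.
    """
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")

    best_k, best_time = 0, float("inf")
    for k in range(len(tasks) + 1):
        local = sum(task.device_seconds for task in tasks[:k])
        remote = sum(task.server_seconds for task in tasks[k:])
        if k == len(tasks):
            transfer = 0.0
        else:
            handover = input_bytes if k == 0 else tasks[k - 1].output_bytes
            transfer = handover / bytes_per_second
        total = local + transfer + remote
        if total < best_time:
            best_k, best_time = k, total
    return best_k, best_time
```

The same search could minimize another factor from claim 2, such as the amount of data transferred, by swapping the cost expression. The splits `k == 0` and `k == len(tasks)` place all tasks on one side and are included only for comparison.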

Current Assignee: Dicta-Direct LLC (2S Ventures, LLC)
Original Assignee: Mobiter Dicta Oy
Inventors: Kurki-Suonio, Risto
Primary Examiner(s): Kazeminezhad, Farzad
Application Number: US12/298,697
Publication Number: US 20090319267A1
Time in Patent Office: 3,414 Days
Field of Search: 704/2, 704/270, 704/270.1
US Class Current: 1/1
CPC Class Codes:
G10L 15/22   Procedures used during a sp...
G10L 15/26   Speech to text systems G10L...
G10L 15/30   Distributed recognition, e....
