Method, and a device for converting speech by replacing inarticulate portions of the speech before the conversion
Abstract
An arrangement for converting speech into text comprises a mobile device (202) and a server entity (208) configured to perform the conversion, and optional additional processes, in co-operation. The user of the mobile device (202) may locally edit the speech signal before or between the actual speech recognition tasks by replacing an inarticulate portion of the speech signal with a newly recorded version of that portion. Task-sharing details can be negotiated dynamically based on a number of parameters.
14 Claims
1. A mobile device operable in a wireless communications network comprising:
a speech input device that receives speech and converts the speech into a representative digital speech signal, said speech input device converting the received speech into representative digital speech signal without any speech-to-text conversion of the representative digital speech signal;
a control input device that, prior to any speech-to-text conversion of the representative digital speech signal, communicates an edit command inputted by a user, the edit command relating to and for editing audible speech content of a user-defined portion of the representative digital speech signal, the user input edit command is one of a group consisting of i) a deletion of a portion of the representative digital speech signal, ii) an insertion of a speech portion in the representative digital speech signal, iii) re-recording of a portion of the representative digital speech signal, and iv) replacement of a portion of the representative digital speech signal where the user-defined portion of the representative digital speech signal is an inarticulate portion in the representative digital speech signal being replaced with a new version recording of the portion of the representative digital speech signal;
a processing device that performs a digital speech signal editing task of the representative digital speech signal portion to edit the audible speech content thereof responsive to the received edit command, said speech signal editing task being free of speech-to-text conversion and providing a user-directed edited digital speech signal portion;
at least part of a speech recognition engine for carrying out tasks of the edited digital speech signal portion to text conversion; and
a transceiver that exchanges information relating to the digital speech signal portion and speech to text conversion thereof with an external entity functionally connected to said wireless communications network (614), wherein said mobile device is configured to transmit text resulting from the speech to text conversion to another entity for at least one of a group consisting of i) storage, ii) archiving, and iii) a further processing task selected from a group consisting of spell-checking, machine translation, human translation, translation verification, and text to speech synthesis.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
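The four edit commands recited in claim 1 all act on the digital signal itself, before any speech-to-text conversion. A minimal sketch, assuming the signal is held as a list of integer PCM samples and that the user-defined portion is addressed by a half-open sample range [start, end); `EditKind`, `EditCommand` and `apply_edit` are hypothetical names:

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class EditKind(Enum):
    DELETE = "delete"      # i) deletion of a portion
    INSERT = "insert"      # ii) insertion of a speech portion
    RERECORD = "rerecord"  # iii) re-recording of a portion
    REPLACE = "replace"    # iv) replacement of an inarticulate portion with a new recording


@dataclass
class EditCommand:
    kind: EditKind
    start: int                          # first sample of the user-defined portion
    end: int                            # one past the last sample (ignored for INSERT)
    audio: Optional[List[int]] = None   # new samples for INSERT / RERECORD / REPLACE


def apply_edit(signal: List[int], cmd: EditCommand) -> List[int]:
    """Return an edited copy of the sample list; no speech-to-text step is involved."""
    if not 0 <= cmd.start <= len(signal):
        raise ValueError("start outside signal")
    if cmd.kind is EditKind.INSERT:
        if cmd.audio is None:
            raise ValueError("insert needs audio")
        return signal[:cmd.start] + cmd.audio + signal[cmd.start:]
    if not cmd.start <= cmd.end <= len(signal):
        raise ValueError("end outside signal")
    if cmd.kind is EditKind.DELETE:
        return signal[:cmd.start] + signal[cmd.end:]
    # RERECORD and REPLACE both splice a fresh recording over [start, end);
    # the new take may be longer or shorter than the portion it replaces.
    if cmd.audio is None:
        raise ValueError(f"{cmd.kind.value} needs audio")
    return signal[:cmd.start] + cmd.audio + signal[cmd.end:]
```

In this sketch re-recording (iii) and replacement of an inarticulate portion (iv) reduce to the same splice; it does not try to model any further distinction the claims may draw between them.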
10. A method for converting speech into text having the steps of:
receiving, in a mobile device operable in a wireless network, a speech source and converting the speech source into a representative digital speech signal without performing any speech-to-text conversion;
prior to performing any speech-to-text conversion of the representative digital speech signal, receiving an edit command inputted by a user, the edit command relating to and for editing audible speech content of a user-defined portion of the digital speech signal by the mobile device, the user input edit command being one of a group consisting of i) a deletion of a portion of the representative digital speech signal, ii) an insertion of a speech portion in the representative digital speech signal, iii) re-recording of a portion of the representative digital speech signal, and iv) replacement of a portion of the representative digital speech signal where the user-defined portion of the representative digital speech signal is an inarticulate portion in the representative digital speech signal being replaced with a new version recording of the portion of the representative digital speech signal;
processing the digital speech signal portion in accordance with the edit command to edit the audible speech content thereof said processing being free of any speech-to-text conversion and providing a user-directed edited digital speech signal portion;
exchanging information relating to the edited digital speech signal portion and speech to text conversion thereof;
executing on the basis of the exchanged information at least part of the tasks required for carrying out a speech to text conversion of the digital speech signal portion; and
visualizing at least part of the digital speech signal on a display of the mobile device, whereupon the received edit command further relates to said visualized part.
Dependent claims: 11, 12.
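Claim 10 ties the edit command to a part of the signal visualized on the mobile device's display. One way to picture this, assuming the waveform is drawn as a per-column min/max envelope and that the user selects a portion by highlighting a pixel span, is the following sketch; `waveform_columns` and `selection_to_samples` are hypothetical helpers, not names from the patent:

```python
from typing import List, Tuple


def waveform_columns(signal: List[int], width_px: int) -> List[Tuple[int, int]]:
    """Min/max envelope per display column, a common way to draw long audio on a small screen."""
    n = len(signal)
    cols: List[Tuple[int, int]] = []
    for x in range(width_px):
        lo = x * n // width_px
        hi = max((x + 1) * n // width_px, lo + 1)  # every column covers at least one sample
        chunk = signal[lo:min(hi, n)]
        cols.append((min(chunk), max(chunk)) if chunk else (0, 0))
    return cols


def selection_to_samples(x0: int, x1: int, width_px: int, n_samples: int) -> Tuple[int, int]:
    """Map a highlighted pixel span on the display to a half-open sample range [start, end)."""
    a, b = sorted((x0, x1))                 # accept right-to-left drags
    a = min(max(a, 0), width_px)            # clamp both ends to the visible area
    b = min(max(b, 0), width_px)
    return a * n_samples // width_px, b * n_samples // width_px
```

The resulting (start, end) pair identifies the user-defined portion on which an edit command such as a deletion or replacement then acts.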
13. A server operable in a communications network comprising:
a data input device that i) receives a digital data signal sent by a mobile device, said digital data signal representing speech or at least part thereof free of any speech to text conversion, and ii) receives a speech edit command inputted by a user, the speech edit command to edit audible speech content of a user-defined portion of said digital data signal representing said speech via the mobile device, the speech edit command providing a user-directed edit of the digital speech signal portion prior to any speech to text conversion of said digital data signal representing speech or at least part thereof, the user input edit command is one of a group consisting of i) a deletion of a portion of the representative digital speech signal, ii) an insertion of a speech portion in the representative digital speech signal, iii) re-recording of a portion of the representative digital speech signal, and iv) replacement of a portion of the representative digital speech signal where the user-defined portion of the representative digital speech signal is an inarticulate portion in the representative digital speech signal being replaced with a new version recording of the portion of the representative digital speech signal;
at least part of a speech recognition engine for carrying out tasks of the edited digital data signal to text conversion;
a controlling unit for exchanging control information with the mobile device, performing a digital speech signal editing task to edit the audible speech content of the user-defined digital speech signal portion responsive to the received edit command, and determining, based on the control information, the tasks to be performed on the received digital data signal by said at least part of the speech recognition engine, the digital speech signal editing task being performed prior to any speech signal to text conversion of the user-defined portion of said digital data signal representing said speech and providing a user-directed edit of the digital speech signal portion; and
a data output device for communicating at least part of the output of the performed tasks to an external entity.
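In claim 13 the server's controlling unit determines, from control information exchanged with the mobile device, which tasks its speech recognition engine must perform. The sketch below assumes a hypothetical control message listing the stages the device has already completed and any stages to skip, plus a hypothetical four-stage pipeline; neither format nor stage names are defined in the claim:

```python
from typing import Dict, List

# Hypothetical stage names; the claims do not enumerate the recognition tasks.
PIPELINE: List[str] = ["feature_extraction", "acoustic_scoring", "decoding", "post_processing"]


def tasks_for_server(control: Dict[str, List[str]]) -> List[str]:
    """Return the recognition tasks the server still has to run.

    `control` is a hypothetical control-information message from the mobile
    device, e.g. {"done": ["feature_extraction"], "skip": ["post_processing"]}.
    Stages completed on the device are expected to form a prefix of PIPELINE.
    """
    done = control.get("done", [])
    skip = set(control.get("skip", []))
    if done != PIPELINE[: len(done)]:
        raise ValueError("device-side stages must be a prefix of the pipeline")
    return [t for t in PIPELINE[len(done):] if t not in skip]
```

The prefix requirement is a simplifying assumption of the sketch; it mirrors the device/server split of claim 14, in which the server executes "the remaining part of the tasks".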
14. A system for converting speech into text comprising a mobile device operable in a wireless communications network and a server functionally connected to said wireless communications network, wherein
said mobile device is configured to:
receive speech from a user and convert the speech into a representative digital speech signal without performing any speech-to-text conversion,
prior to performing any speech-to-text conversion of the representative digital speech signal, to receive a user-inputted edit command from the user, the edit command relating to and for editing audible speech content of a user-defined portion of the digital speech signal and being one of a group consisting of i) a deletion of a portion of the representative digital speech signal, ii) an insertion of a speech portion in the representative digital speech signal, iii) re-recording of a portion of the representative digital speech signal, and iv) replacement of a portion of the representative digital speech signal where the user-defined portion of the representative digital speech signal is an inarticulate portion in the representative digital speech signal being replaced with a new version recording of the portion of the representative digital speech signal,
to process the digital speech signal portion in accordance with the edit command to edit the audible speech content thereof,
to exchange information relating to the digital speech signal and speech to text conversion thereof with the server, and
to execute part of the tasks required for carrying out an edited digital speech signal to text conversion,
said processing step being prior to speech-to-text conversion and providing a user-directed edit of the digital speech signal portion, and
said server is configured to receive information relating to the digital speech signal and speech to text conversion thereof, and to execute, based on the exchanged information, the remaining part of the tasks required for carrying out a digital speech signal to text conversion.
Specification