Indexing digitized speech with words represented in the digitized speech
First Claim
1. A method for use with a multimodal digital audio editor operating on a multimodal device supporting multiple modes of user interaction with the multimodal digital audio editor, the modes of user interaction including a voice mode and one or more non-voice modes, the multimodal digital audio editor operatively coupled to an automatic speech recognition (ASR) engine, the method comprising:
receiving in the multimodal digital audio editor, recognized speech that the ASR engine generated from digitized speech, that includes a recognized word and information indicating where, in the digitized speech, representation of the recognized word appears;
inserting, by the multimodal digital audio editor, the recognized word into a speech recognition grammar; and
inserting, by the multimodal digital audio editor, into the speech recognition grammar in association with the recognized word, the information indicating where, in the digitized speech, representation of the recognized word appears.
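The three steps of claim 1 can be pictured with a minimal Python sketch. It assumes that the "information indicating where" a word appears is a sample offset at which the word begins (the abstract speaks of where the word "begins"). A plain dictionary stands in for a real speech recognition grammar. The names RecognizedWord, IndexedGrammar and index_recognized_speech, and the offsets, are hypothetical illustrations and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RecognizedWord:
    """One word from ASR output (hypothetical structure).

    start_sample is assumed to be the offset in the digitized speech
    at which the word's representation begins.
    """
    text: str
    start_sample: int


@dataclass
class IndexedGrammar:
    """Hypothetical stand-in for a grammar: word -> offsets where it appears."""
    entries: dict[str, list[int]] = field(default_factory=dict)

    def insert(self, word: RecognizedWord) -> None:
        # Insert the recognized word into the grammar (no-op if already present).
        offsets = self.entries.setdefault(word.text.lower(), [])
        # Store the location information in association with that word.
        offsets.append(word.start_sample)

    def where(self, spoken: str) -> list[int]:
        # Return every recorded offset for the word, or [] if it was never indexed.
        return self.entries.get(spoken.lower(), [])


def index_recognized_speech(words: list[RecognizedWord]) -> IndexedGrammar:
    """Receive recognized words and index each one into a fresh grammar."""
    grammar = IndexedGrammar()
    for w in words:
        grammar.insert(w)
    return grammar


# Hypothetical ASR output: "car" occurs twice in the recording.
recognized = [
    RecognizedWord("Car", 167243),
    RecognizedWord("Boat", 298374),
    RecognizedWord("car", 314159),
]
g = index_recognized_speech(recognized)
print(g.where("car"))  # [167243, 314159]
```

Keeping a list of offsets per word lets a repeated word remain one grammar entry while still pointing at every place it occurs in the digitized speech.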
2 Assignments
0 Petitions
Abstract
Indexing digitized speech with words represented in the digitized speech, with a multimodal digital audio editor operating on a multimodal device supporting modes of user interaction, the modes of user interaction including a voice mode and one or more non-voice modes, the multimodal digital audio editor operatively coupled to an ASR engine, including providing by the multimodal digital audio editor to the ASR engine digitized speech for recognition; receiving in the multimodal digital audio editor from the ASR engine recognized user speech including a recognized word, also including information indicating where, in the digitized speech, representation of the recognized word begins; and inserting by the multimodal digital audio editor the recognized word, in association with the information indicating where, in the digitized speech, representation of the recognized word begins, into a speech recognition grammar, the speech recognition grammar voice enabling user interface commands of the multimodal digital audio editor.
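The abstract adds that the speech recognition grammar voice-enables the editor's user interface commands. The following hedged sketch shows one way to picture that; it is not the patent's actual grammar. It emits a JSGF-style grammar in which each indexed word is an alternative tagged with the offsets where it begins, so a spoken command such as "play from car" can resolve to positions in the digitized speech. The command words (play, delete, select), the rule names, and the toy matcher resolve_command, which stands in for a real ASR engine, are all hypothetical.

```python
from typing import Optional

# Hypothetical editor command words; not taken from the patent.
ACTIONS = ("play", "delete", "select")


def build_jsgf(index: dict[str, list[int]]) -> str:
    """Emit a JSGF-style grammar whose <word> alternatives carry offset tags."""
    alternatives = []
    for word, offsets in sorted(index.items()):
        # The JSGF tag {...} holds every offset at which this word begins.
        tag = " ".join(str(o) for o in offsets)
        alternatives.append(word + " {" + tag + "}")
    return "\n".join([
        "#JSGF V1.0;",
        "grammar editor;",
        "public <command> = <action> [from] <word>;",
        "<action> = " + " | ".join(ACTIONS) + ";",
        "<word> = " + " | ".join(alternatives) + ";",
    ])


def resolve_command(utterance: str,
                    index: dict[str, list[int]]) -> Optional[tuple[str, list[int]]]:
    """Toy matcher for the <command> rule: returns (action, offsets) or None."""
    tokens = utterance.lower().split()
    if len(tokens) == 3 and tokens[1] == "from":
        tokens = [tokens[0], tokens[2]]  # drop the optional "from"
    if len(tokens) != 2:
        return None
    action, word = tokens
    if action not in ACTIONS or word not in index:
        return None
    return action, index[word]


index = {"car": [167243, 314159], "boat": [298374]}
print(build_jsgf(index))
print(resolve_command("play from car", index))  # ('play', [167243, 314159])
```

In a real system the ASR engine would return the tag contents together with the matched word. The toy matcher reads the same offsets from the dictionary instead.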
154 Citations
18 Claims
1. A method for use with a multimodal digital audio editor operating on a multimodal device supporting multiple modes of user interaction with the multimodal digital audio editor, the modes of user interaction including a voice mode and one or more non-voice modes, the multimodal digital audio editor operatively coupled to an automatic speech recognition (ASR) engine, the method comprising:
receiving in the multimodal digital audio editor, recognized speech that the ASR engine generated from digitized speech, that includes a recognized word and information indicating where, in the digitized speech, representation of the recognized word appears;
inserting, by the multimodal digital audio editor, the recognized word into a speech recognition grammar; and
inserting, by the multimodal digital audio editor, into the speech recognition grammar in association with the recognized word, the information indicating where, in the digitized speech, representation of the recognized word appears.
(Dependent claims 2, 3, 4, 5, 6.)
7. Apparatus for use with a multimodal digital audio editor operating on a multimodal device supporting multiple modes of user interaction with the multimodal digital audio editor, the modes of user interaction including a voice mode and one or more non-voice modes, the multimodal digital audio editor being operatively coupled to an automatic speech recognition (ASR) engine, the apparatus comprising:
at least one computer processor; and
a computer memory operatively coupled to the at least one computer processor, the at least one computer processor being programmed to:
receive, in the multimodal digital audio editor, recognized speech that the ASR engine generated from digitized speech, that includes a recognized word and information indicating where, in the digitized speech, representation of the recognized word appears;
insert, by the multimodal digital audio editor, the recognized word into a speech recognition grammar; and
insert, by the multimodal digital audio editor, into the speech recognition grammar in association with the recognized word, the information indicating where, in the digitized speech, representation of the recognized word appears.
(Dependent claims 8, 9, 10, 11, 12.)
13. A computer-readable, recordable medium having instructions encoded thereon which, when executed in a system comprising a multimodal digital audio editor operating on a multimodal device supporting multiple modes of user interaction with the multimodal digital audio editor, the modes of user interaction including a voice mode and one or more non-voice modes, the multimodal digital audio editor being operatively coupled to an automatic speech recognition (ASR) engine, perform a method comprising:
receiving, in the multimodal digital audio editor, recognized speech that the ASR engine generated from digitized speech, that includes a recognized word and information indicating where, in the digitized speech, representation of the recognized word appears;
inserting, by the multimodal digital audio editor, the recognized word into a speech recognition grammar; and
inserting, by the multimodal digital audio editor, into the speech recognition grammar in association with the recognized word, the information indicating where, in the digitized speech, representation of the recognized word appears.
(Dependent claims 14, 15, 16, 17, 18.)
Specification