Using speech recognition results based on an unstructured language model with a music system
Abstract
Speech recorded by an audio capture facility of a music facility is processed by a speech recognition facility to generate results that are provided to the music facility. When information related to a music application running on the music facility is provided to the speech recognition facility, the results generated are based at least in part on the application-related information. The speech recognition facility uses an unstructured language model for generating results. The user of the music facility may optionally be allowed to edit the results being provided to the music facility. The speech recognition facility may also adapt speech recognition based on usage of the results.
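One way to picture results that are "based at least in part on the application related information" is a rescoring step over the recognizer's candidate list. The sketch below is illustrative only and is not taken from the specification: `AppContext`, `rescore`, and the fixed `boost` value are hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class AppContext:
    """Hypothetical container for what the music application reports."""
    favorites: set[str] = field(default_factory=set)
    playlist_items: set[str] = field(default_factory=set)


def rescore(hypotheses: list[tuple[str, float]], ctx: AppContext,
            boost: float = 0.2) -> list[tuple[str, float]]:
    """Add a fixed bonus to hypotheses that match favorites or playlist entries."""
    known = {s.lower() for s in ctx.favorites | ctx.playlist_items}
    rescored = [
        (text, score + boost if text.lower() in known else score)
        for text, score in hypotheses
    ]
    # Highest score first; the sort is stable, so ties keep recognizer order.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Assumed example data: two candidate transcriptions with recognizer scores.
ctx = AppContext(favorites={"The Who"})
n_best = [("the hu", 0.52), ("the who", 0.48)]
best_text, best_score = rescore(n_best, ctx)[0]
print(best_text)  # the who
```

A flat additive bonus is the simplest possible choice here; a real system could weight matches by usage history instead.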
25 Claims
1. A method of entering text into a music system using a processor, comprising:
recording speech presented by a user using a resident capture facility;
providing the speech as a recording to a speech recognition facility;
selecting at least one statistical language model, including a large vocabulary statistical language model, from a set of language models based at least in part on contextual information relating to the recording, wherein the at least one statistical language model selected includes a general language model for artists, a general language model for song titles, and a general language model for music types;
determining that the at least one statistical language model selected provides insufficient recognition output and requires an additional recognition pass of the recording;
conducting the additional recognition pass of the recording;
selecting at least one other statistical language model based at least in part on the additional recognition pass of the recording and client state information of the recording;
generating results utilizing the speech of the recording recognized by the speech recognition facility; and
using the results in the music system, wherein the music system provides information relating to a music application to the speech recognition facility, wherein generating the results is based at least in part on the information, wherein the information relating to the music application includes contextual information within the music application, and wherein the contextual information includes at least one of a usage history of the music application and information from at least one of a favorites list and playlists of the user.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
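The model-selection and additional-pass steps of claim 1 can be sketched as a small control flow. This is a minimal illustration under stated assumptions, not the patented implementation: the confidence `threshold`, the model names other than the three general models recited in the claim, the `ClientState` fields, and the rule inside `select_other_model` are all hypothetical. The claim does not say which model the additional pass uses; the sketch assumes a large vocabulary model.

```python
from dataclasses import dataclass
from typing import Callable

# Stand-in for the speech recognition facility: (audio, model name) -> (text, confidence).
Recognizer = Callable[[bytes, str], tuple[str, float]]

GENERAL_MODELS = ["artists", "song_titles", "music_types"]  # recited in the claim


@dataclass
class ClientState:
    """Hypothetical client state: which input field had focus."""
    active_field: str


def select_models(context: dict) -> list[str]:
    # Assumption: the context of the recording identifies the originating application.
    return list(GENERAL_MODELS) if context.get("app") == "music" else ["large_vocabulary"]


def select_other_model(pass_text: str, state: ClientState) -> str:
    # Assumption: a keyword from the additional pass, then the focused field, picks the model.
    if "album" in pass_text.lower():
        return "albums"
    return {"artist_search": "artists_extended"}.get(state.active_field, "large_vocabulary")


def recognize(audio: bytes, context: dict, state: ClientState,
              recognizer: Recognizer, threshold: float = 0.6) -> str:
    candidates = [recognizer(audio, m) for m in select_models(context)]
    best = max(candidates, key=lambda c: c[1])
    if best[1] >= threshold:
        return best[0]  # first pass was sufficient
    # Insufficient output: conduct an additional pass, then select another model.
    extra = recognizer(audio, "large_vocabulary")
    other = select_other_model(extra[0], state)
    candidates += [extra, recognizer(audio, other)]
    return max(candidates, key=lambda c: c[1])[0]


def fake_recognizer(audio: bytes, model: str) -> tuple[str, float]:
    """Canned outputs so the sketch runs without a real recognizer."""
    table = {
        "artists": ("the hu", 0.40),
        "song_titles": ("the hue", 0.30),
        "music_types": ("techno", 0.10),
        "large_vocabulary": ("the who", 0.55),
        "artists_extended": ("The Who", 0.90),
    }
    return table.get(model, ("", 0.0))


result = recognize(b"", {"app": "music"}, ClientState("artist_search"), fake_recognizer)
print(result)  # The Who
```

With the canned scores, no first-pass model reaches the threshold, so the additional pass runs and the model chosen from client state supplies the final result.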
11. A method for entering text into a music system using a processor, comprising:
recording speech presented by a user using a resident capture facility;
providing the speech as a recording to a speech recognition facility;
selecting at least one statistical language model from a set of language models based at least in part on contextual information relating to the recording, wherein the at least one statistical language model selected includes a general language model for artists, a general language model for song titles, and a general language model for music types;
determining that the at least one statistical language model selected provides insufficient recognition output and requires an additional recognition pass of the recording;
conducting the additional recognition pass of the recording;
selecting at least one other statistical language model based at least in part on the additional recognition pass of the recording and client state information of the recording;
generating results utilizing the speech recognition facility;
allowing the user to alter the results; and
using the results in the music system, wherein the music system provides information relating to a music application to the speech recognition facility, wherein generating the results is based at least in part on the information, wherein allowing the user to alter the results includes the user editing a text result using at least one of a set of buttons, other controls, and a screen-based text correction mechanism on the music system, wherein the information includes usage history of the music application and information from at least one of a favorites list and playlists of the user.
Dependent claims: 12, 13, 14, 15, 16, 17, 18
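The "allowing the user to alter the results" step of claim 11 can be illustrated with a text result that carries per-word alternates. The data structure below is an assumption made for the example: `TextResult`, `choose_alternate`, and `retype` are hypothetical names, and the claim does not prescribe how alternates are stored or presented.

```python
from dataclasses import dataclass


@dataclass
class TextResult:
    """Hypothetical recognition result: best words plus per-word alternates."""
    words: list[str]
    alternates: dict[int, list[str]]  # word index -> other candidates

    def text(self) -> str:
        return " ".join(self.words)

    def choose_alternate(self, index: int, choice: int) -> None:
        """Swap in an alternate, as a button or other control might."""
        self.words[index] = self.alternates[index][choice]

    def retype(self, index: int, typed: str) -> None:
        """Overwrite a word with text from a screen-based correction mechanism."""
        self.words[index] = typed


# Assumed example: the recognizer misheard the first word of a song title.
result = TextResult(words=["blew", "suede", "shoes"],
                    alternates={0: ["blue", "blues"]})
result.choose_alternate(0, 0)
print(result.text())  # blue suede shoes
```

The corrected text is what the music system would then use, for example as a search query.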
19. A system for entering text into a music system comprising:
a resident capture facility for recording speech presented by a user;
a speech recognition facility for receiving the speech as a recording, for determining that the at least one statistical language model selected provides insufficient recognition output and requires an additional recognition pass of the recording, for conducting the additional recognition pass of the recording, for selecting at least one other statistical language model based at least in part on the additional recognition pass of the recording and client state information of the recording, and for generating results by selecting at least one statistical language model from a set of language models based at least in part on contextual information relating to the recording, wherein the at least one selected statistical language model includes a general language model for artists, a general language model for song titles, and a general language model for music types; and
the music system for using the results, wherein the music system provides information relating to a music application to the speech recognition facility and wherein the results are generated based at least in part on the information, and wherein the contextual information includes usage history of the music application and information from at least one of a favorites list and playlists of the user.
Dependent claims: 20, 21, 22, 23, 24, 25
Specification