Using speech recognition results based on an unstructured language model with a music system

US 10,056,077 B2
Filed: 08/01/2008
Issued: 08/21/2018
Est. Priority Date: 03/07/2007
Status: Expired due to Fees


5 Assignments


0 Petitions


Abstract

Speech recorded by an audio capture facility of a music facility is processed by a speech recognition facility to generate results that are provided to the music facility. When information related to a music application running on the music facility is provided to the speech recognition facility, the results generated are based at least in part on the application-related information. The speech recognition facility uses an unstructured language model for generating results. The user of the music facility may optionally be allowed to edit the results being provided to the music facility. The speech recognition facility may also adapt speech recognition based on usage of the results.
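
The abstract's final sentence, adapting recognition based on usage of the results, can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the patented method: it assumes a unigram language model whose base probabilities are linearly interpolated with word frequencies taken from results the user accepted. The class name `AdaptiveUnigramModel`, the default weight of 0.3, and the vocabulary in the usage note are hypothetical.

```python
from collections import Counter


class AdaptiveUnigramModel:
    """Toy unigram language model that adapts to words in accepted results.

    Hypothetical illustration: the class, its interpolation weight, and the
    adaptation rule are assumptions, not the method described in the patent.
    """

    def __init__(self, base_probs: dict[str, float], weight: float = 0.3) -> None:
        self.base_probs = base_probs  # static background model P_base(w)
        self.weight = weight          # share of probability mass given to usage
        self.usage = Counter()        # counts of words from accepted results

    def record_accepted_result(self, text: str) -> None:
        # Treat a result the user kept (possibly after editing) as feedback.
        self.usage.update(text.lower().split())

    def prob(self, word: str) -> float:
        word = word.lower()
        total = sum(self.usage.values())
        if total == 0:
            # No feedback yet: fall back to the base model unchanged.
            return self.base_probs.get(word, 0.0)
        p_usage = self.usage[word] / total
        # Linear interpolation: P(w) = (1 - λ) P_base(w) + λ P_usage(w)
        return (1 - self.weight) * self.base_probs.get(word, 0.0) + self.weight * p_usage
```

For example, with base probabilities `{"play": 0.5, "jazz": 0.25, "blues": 0.25}` and one accepted result "play blues", `prob("blues")` rises from 0.25 to 0.7 × 0.25 + 0.3 × 0.5 = 0.325, and the three probabilities still sum to 1. Interpolating, rather than replacing the base model, keeps every base-model word reachable.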


25 Claims

1. A method of entering text into a music system using a processor, comprising:
- recording speech presented by a user using a resident capture facility;
  
  providing the speech as a recording to a speech recognition facility;
  
  selecting at least one statistical language model, including a large vocabulary statistical language model, from a set of language models based at least in part on contextual information relating to the recording, wherein the at least one statistical language model selected includes a general language model for artists, a general language model for song titles, and a general language model for music types;
  
  determining that the at least one statistical language model selected provides insufficient recognition output and requires an additional recognition pass of the recording;
  
  conducting the additional recognition pass of the recording;
  
  selecting at least one other statistical language model based at least in part on the additional recognition pass of the recording and client state information of the recording;
  
  generating results utilizing the speech of the recording recognized by the speech recognition facility; and
  
  using the results in the music system, wherein the music system provides information relating to a music application to the speech recognition facility, wherein generating the results is based at least in part on the information, wherein the information relating to the music application includes contextual information within the music application, and wherein the contextual information includes at least one of a usage history of the music application and information from at least one of a favorites list and playlists of the user.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
  - 2. The method of claim 1, further comprising using user feedback to adapt the at least one statistical language model selected.
  - 3. The method of claim 1, wherein the speech recognition facility is remotely located from the music system.
  - 4. The method of claim 1, wherein the information relating to the music application further includes at least one of an identity of the music application, an identity of a text box within the music application, an identity of the music system, and an identity of the user.
  - 5. The method of claim 4, wherein the contextual information further includes at least one of information about music currently stored on the music system, and information currently displayed in the music application.
  - 6. The method of claim 1, wherein generating the results based at least in part on the information relating to the music application includes selecting at least one of a plurality of recognition models as the at least one statistical language model based on the information relating to the music application and the recording.
  - 7. The method of claim 1, wherein the at least one statistical language model is selected based at least in part on the information relating to the music application.
  - 8. The method of claim 1, further comprising selecting a different statistical language model based on the results generated.
  - 9. The method of claim 1, wherein the speech of the recording recognized comprises at least one of a song title, an artist name and a music category.
  - 10. The method of claim 1, wherein the client state information includes one or more of application ID of the music application, user ID of the user, text field ID of the music application and current state of the music application.
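
The flow of claim 1 can be sketched as follows: a first pass with a large vocabulary model plus general models for artists, song titles, and music types, a sufficiency check, and an additional pass with other models chosen from client state. Everything here is an assumption for illustration. Using a confidence threshold as the test for "insufficient recognition output", the model identifiers, the context key `application`, and the recognizer callable are all hypothetical, and the claim's ordering of the second model selection relative to the additional pass is simplified.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Hypothesis:
    text: str
    confidence: float  # recognizer score in [0, 1]


# Hypothetical recognizer interface: (audio, language-model names) -> best hypothesis.
Recognizer = Callable[[bytes, list[str]], Hypothesis]

# Identifiers for the three general models recited in claim 1 (names are made up).
GENERAL_MUSIC_MODELS = ["general_artists", "general_song_titles", "general_music_types"]


def select_models(context: dict[str, str]) -> list[str]:
    """First pass: choose models from contextual information about the recording."""
    models = ["large_vocabulary"]
    if context.get("application") == "music":  # assumed context key and value
        models += GENERAL_MUSIC_MODELS
    return models


def select_other_models(client_state: dict[str, str]) -> list[str]:
    """Additional pass: choose models keyed on client state (user ID, application ID)."""
    return [f"user_{client_state['user_id']}_playlists",
            f"app_{client_state['app_id']}_catalog"]


def recognize(audio: bytes, context: dict[str, str], client_state: dict[str, str],
              recognizer: Recognizer, threshold: float = 0.6) -> Hypothesis:
    first = recognizer(audio, select_models(context))
    if first.confidence >= threshold:
        return first  # first-pass output judged sufficient
    # Insufficient output: run an additional pass with client-state models.
    second = recognizer(audio, select_other_models(client_state))
    return second if second.confidence > first.confidence else first
```

With a stub recognizer that scores the general models low and the user-specific models high, `recognize` falls through to the second pass; raising or lowering `threshold` trades extra passes against accuracy.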

11. A method for entering text into a music system using a processor, comprising:
- recording speech presented by a user using a resident capture facility;
  
  providing the speech as a recording to a speech recognition facility;
  
  selecting at least one statistical language model from a set of language models based at least in part on contextual information relating to the recording, wherein the at least one statistical language model selected includes a general language model for artists, a general language model for song titles, and a general language model for music types;
  
  determining that the at least one statistical language model selected provides insufficient recognition output and requires an additional recognition pass of the recording;
  
  conducting the additional recognition pass of the recording;
  
  selecting at least one other statistical language model based at least in part on the additional recognition pass of the recording and client state information of the recording;
  
  generating results utilizing the speech recognition facility;
  
  allowing the user to alter the results; and
  
  using the results in the music system, wherein the music system provides information relating to a music application to the speech recognition facility, wherein generating the results is based at least in part on the information, wherein allowing the user to alter the results includes the user editing a text result using at least one of a set of buttons, other controls, and a screen-based text correction mechanism on the music system, wherein the information includes usage history of the music application and information from at least one of a favorites list and playlists of the user.
- Dependent Claims (12, 13, 14, 15, 16, 17, 18)
  - 12. The method of claim 11 wherein the speech recognition facility is remotely located from the music system.
  - 13. The method of claim 11 wherein the information relating to the music application running on the music system is provided to the speech recognition facility.
  - 14. The method of claim 11 wherein allowing the user to alter the results includes selecting from among a plurality of alternate choices of words contained in the results from the speech recognition facility.
  - 15. The method of claim 11 wherein allowing the user to alter the results includes selecting from among a plurality of alternate actions related to the results from the speech recognition facility.
  - 16. The method of claim 11, further comprising combining words recognized from more than one set of language models to generate the results.
  - 17. The method of claim 11, wherein allowing the user to alter the results includes adapting the statistical language model selected using user feedback.
  - 18. The method of claim 11, wherein the speech recognized comprises at least one of a city, a state and a region.
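
Claims 14 to 16 describe correcting results by choosing among alternates and combining words recognized with more than one set of language models. A minimal sketch, assuming each model set returns a scored n-best list of whole-utterance hypotheses: `merge_alternates` keeps the best score for each distinct text, and `apply_user_choice` applies either a pick from the list or a free-form edit. Both function names and the max-score merge rule are hypothetical; the claims do not specify how results are combined.

```python
from typing import Optional


def merge_alternates(*nbest_lists: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Combine scored alternates from several language-model sets.

    Keeps the best score seen for each distinct text and returns the
    alternates ordered from most to least likely.
    """
    best: dict[str, float] = {}
    for nbest in nbest_lists:
        for text, score in nbest:
            if text not in best or score > best[text]:
                best[text] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


def apply_user_choice(alternates: list[tuple[str, float]],
                      choice_index: Optional[int] = None,
                      edited_text: Optional[str] = None) -> str:
    """Return the text the music system should use after user correction."""
    if edited_text is not None:   # free-form edit via buttons or on-screen correction
        return edited_text
    if choice_index is not None:  # user picked one of the listed alternates
        return alternates[choice_index][0]
    return alternates[0][0]       # no correction: keep the top hypothesis
```

A free-form edit takes precedence over a list pick, mirroring claim 11's text-editing path alongside claim 14's selection among alternates.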

19. A system for entering text into a music system comprising:
- a resident capture facility for recording speech presented by a user;
  
  a speech recognition facility for receiving the speech as a recording, for determining that the at least one statistical language model selected provides insufficient recognition output and requires an additional recognition pass of the recording, for conducting the additional recognition pass of the recording, for selecting at least one other statistical language model based at least in part on the additional recognition pass of the recording and client state information of the recording, and for generating results by selecting at least one statistical language model from a set of language models based at least in part on contextual information relating to the recording, wherein the at least one selected statistical language model includes a general language model for artists, a general language model for song titles, and a general language model for music types; and
  
  the music system for using the results, wherein the music system provides information relating to a music application to the speech recognition facility and wherein the results are generated based at least in part on the information, and wherein the contextual information includes usage history of the music application and information from at least one of a favorites list and playlists of the user.
- Dependent Claims (20, 21, 22, 23, 24, 25)
  - 20. The system of claim 19 wherein the speech recognition facility is remotely located from the music system.
  - 21. The system of claim 19 wherein the speech recognition facility generates the results based at least in part on the information relating to the music application that is received from the music system.
  - 22. The system of claim 21 wherein the information relating to the music application includes at least one of an identity of the music application, an identity of a text box within the music application, the contextual information within the music application, an identity of the music system, and an identity of the user.
  - 23. The system of claim 19, wherein the speech recognition facility allows the user an opportunity to alter the speech from the recording recognized by the speech recognition facility.
  - 24. The system of claim 19, wherein the system generates the results selected from at least one of selecting an output of the music system, adjusting a volume output of the music system and generating a playlist of the playlists for the music system.
  - 25. The system of claim 19, further comprising a database of music content accessible to the speech recognition facility.
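
The client state of claim 10 and the contextual information of claim 19 have to travel from the music system to a remotely located speech recognition facility (claims 20 and 22). A minimal sketch of one possible request format, assuming a single JSON message with base64-encoded audio; the class `ClientState`, its field names, and the functions `build_request` and `parse_request` are hypothetical and only mirror the items the claims list.

```python
import base64
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ClientState:
    """Client state fields listed in claim 10, plus claim 19's contextual information."""
    application_id: str
    user_id: str
    text_field_id: str
    application_state: str
    usage_history: list[str] = field(default_factory=list)
    favorites: list[str] = field(default_factory=list)
    playlists: dict[str, list[str]] = field(default_factory=dict)


def build_request(audio: bytes, state: ClientState) -> str:
    """Serialize one recognition request for a remote recognizer."""
    payload = {
        "audio": base64.b64encode(audio).decode("ascii"),  # assumed audio encoding
        "client_state": asdict(state),
    }
    return json.dumps(payload, sort_keys=True)


def parse_request(message: str) -> tuple[bytes, ClientState]:
    """Server-side counterpart: recover the audio and the client state."""
    payload = json.loads(message)
    return base64.b64decode(payload["audio"]), ClientState(**payload["client_state"])
```

Sending favorites and playlists with each request lets the recognizer pick user-specific models without keeping its own copy; a deployed system might instead send only identifiers and cache the lists server-side.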


Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Cerra, Joseph P.; Nguyen, John N.; Phillips, Michael S.; Shu, Han
Primary Examiner(s): Colucci, Michael
Application Number: US12/184,490
Publication Number: US 20090030698A1
Time in Patent Office: 3,672 Days
Field of Search: 704/257, 704/9, 704/270.1, 715/830
CPC Class Codes:
G10L 15/183 using context dependencies,...
G10L 15/30 Distributed recognition, e....
