Disambiguating a speech recognition grammar in a multimodal application

US 7,822,608 B2
Filed: 02/27/2007
Issued: 10/26/2010
Est. Priority Date: 02/27/2007
Status: Active Grant

Abstract

Disambiguating a speech recognition grammar in a multimodal application, the multimodal application including voice activated hyperlinks, the voice activated hyperlinks voice enabled by a speech recognition grammar characterized by ambiguous terminal grammar elements, including maintaining by the multimodal browser a record of visibility of each voice activated hyperlink, the record of visibility including current visibility and past visibility on a display of the multimodal device of each voice activated hyperlink, the record of visibility further including an ordinal indication, for each voice activated hyperlink scrolled off display, of the sequence in which each such voice activated hyperlink was scrolled off display; recognizing by the multimodal browser speech from a user matching an ambiguous terminal element of the speech recognition grammar; selecting by the multimodal browser a voice activated hyperlink for activation, the selecting carried out in dependence upon the recognized speech and the record of visibility.

18 Claims

1. A method of disambiguating a speech recognition grammar in a multimodal application, the multimodal application including voice activated hyperlinks, the voice activated hyperlinks being voice enabled by a speech recognition grammar comprising ambiguous terminal grammar elements, the multimodal application being operable in a multimodal browser on a multimodal device supporting multiple modes of user interaction with the multimodal device, the modes of user interaction including a voice mode and a visual mode, the multimodal browser being operatively coupled to a grammar interpreter, the method comprising:
- maintaining by the multimodal browser a record of visibility of each voice activated hyperlink, the record of visibility including current visibility and past visibility on a display of the multimodal device of each voice activated hyperlink, the record of visibility further including an ordinal indication, for each voice activated hyperlink scrolled off display, of the sequence in which each such voice activated hyperlink was scrolled off display;
  
  recognizing by the multimodal browser speech from a user matching an ambiguous terminal element of the speech recognition grammar; and
  
  selecting by the multimodal browser a voice activated hyperlink for activation, the selecting being carried out in dependence upon the recognized speech and the record of visibility.
- - 2. The method of claim 1 wherein:
    - the multimodal application comprises an X+V page that includes the voice activated hyperlinks; and
      
      each voice activated hyperlink further comprises an XHTML anchor element bound to a terminal element of the grammar by a value of an identifying attribute of the anchor element, the value of the identifying attribute being unique within the X+V page.
  - 3. The method of claim 1 wherein:
    - the record of past visibility further comprises a boolean data element having a value of TRUE or FALSE, TRUE indicating that a voice activated hyperlink was previously visible on the display, FALSE indicating that a voice activated hyperlink was not previously visible on the display; and
      
      the record of current visibility further comprises a data element having ordinal values, the value zero indicating that a voice activated hyperlink is visible, other values taken from a scrolled-off-display counter indicating the sequence in which a voice activated hyperlink was scrolled off display.
  - 4. The method of claim 1 wherein maintaining a record of visibility of each voice activated hyperlink, the record of visibility further including an ordinal indication, for each voice activated hyperlink scrolled off display, of the sequence in which each such voice activated hyperlink was scrolled off display, further comprises:
    - creating by the multimodal browser a scrolled-off-display counter, the scrolled-off-display counter being initialized to zero; and
      
      incrementing the scrolled-off-display counter when a visible voice activated hyperlink is scrolled off display.
  - 5. The method of claim 1 wherein maintaining a record of visibility further comprises carrying out the following steps for each voice activated hyperlink on each scroll of the display:
    - if the voice activated hyperlink scrolled into visibility on the display, recording that the voice activated hyperlink is currently visible and recording that the voice activated hyperlink was previously visible; and
      
      if the voice activated hyperlink scrolled out of visibility off the display, recording that the voice activated hyperlink is not visible and recording a current value of a scrolled-off-display counter as the ordinal indication of the sequence in which the voice activated hyperlink scrolled off display.
  - 6. The method of claim 1 wherein selecting by the multimodal browser a voice activated hyperlink for activation further comprises:
    - identifying as ambiguous hyperlinks all voice activated hyperlinks that are voice enabled by grammar elements that are ambiguous with respect to the matched ambiguous terminal element of the speech recognition grammar;
      
      if only one ambiguous hyperlink is visible, selecting for activation the only visible ambiguous hyperlink;
      
      if no ambiguous hyperlink is visible and only one ambiguous hyperlink was previously visible, selecting for activation the ambiguous hyperlink that was previously visible; and
      
      if no ambiguous hyperlink is visible and more than one ambiguous hyperlink was previously visible, selecting for activation the most recently visible ambiguous hyperlink.
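
The record keeping recited in claims 3 through 5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the names `VisibilityRecord`, `VisibilityTracker`, and `on_scroll` are hypothetical, the use of `None` for a hyperlink that has never been displayed is an assumption, and the counter is incremented before its value is recorded so that scrolled-off ordinals start at 1 and never collide with the value zero that claim 3 reserves for "visible".

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set


@dataclass
class VisibilityRecord:
    # Claim 3: 0 means currently visible; a positive value is the ordinal
    # taken from the scrolled-off-display counter.
    # None (never displayed so far) is an assumption of this sketch.
    current: Optional[int] = None
    # Claim 3: TRUE if the hyperlink was previously visible on the display.
    was_visible: bool = False


class VisibilityTracker:
    """Hypothetical holder for the record of visibility of each hyperlink."""

    def __init__(self, link_ids: Iterable[str]) -> None:
        self.records: Dict[str, VisibilityRecord] = {
            link_id: VisibilityRecord() for link_id in link_ids
        }
        # Claim 4: the counter is created and initialized to zero.
        self.scrolled_off_counter = 0

    def on_scroll(self, visible_now: Set[str]) -> None:
        """Claim 5: update every hyperlink's record on each scroll."""
        for link_id, record in self.records.items():
            if link_id in visible_now:
                # Scrolled into (or still within) view.
                record.current = 0
                record.was_visible = True
            elif record.current == 0:
                # Was visible and has now scrolled off the display.
                # Assumption: increment first, then record the new value.
                self.scrolled_off_counter += 1
                record.current = self.scrolled_off_counter
```

If several hyperlinks leave the display in the same scroll event, this sketch gives them consecutive ordinals in dictionary order; the claims do not say how such ties should be handled.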

7. Apparatus for disambiguating a speech recognition grammar in a multimodal application, the multimodal application including voice activated hyperlinks, the voice activated hyperlinks being voice enabled by a speech recognition grammar comprising ambiguous terminal grammar elements, the multimodal application being operable in a multimodal browser on a multimodal device supporting multiple modes of user interaction with the multimodal device, the modes of user interaction including a voice mode and a visual mode, the multimodal browser being operatively coupled to a grammar interpreter, the apparatus comprising a computer processor and a computer memory operatively coupled to the computer processor, the computer memory having disposed within it computer program instructions capable of:
- maintaining by the multimodal browser a record of visibility of each voice activated hyperlink, the record of visibility including current visibility and past visibility on a display of the multimodal device of each voice activated hyperlink, the record of visibility further including an ordinal indication, for each voice activated hyperlink scrolled off display, of the sequence in which each such voice activated hyperlink was scrolled off display;
  
  recognizing by the multimodal browser speech from a user matching an ambiguous terminal element of the speech recognition grammar; and
  
  selecting by the multimodal browser a voice activated hyperlink for activation, the selecting being carried out in dependence upon the recognized speech and the record of visibility.
- - 8. The apparatus of claim 7 wherein:
    - the multimodal application comprises an X+V page that includes the voice activated hyperlinks; and
      
      each voice activated hyperlink further comprises an XHTML anchor element bound to a terminal element of the grammar by a value of an identifying attribute of the anchor element, the value of the identifying attribute being unique within the X+V page.
  - 9. The apparatus of claim 7 wherein:
    - the record of past visibility further comprises a Boolean data element having a value of TRUE or FALSE, TRUE indicating that a voice activated hyperlink was previously visible on the display, FALSE indicating that a voice activated hyperlink was not previously visible on the display; and
      
      the record of current visibility further comprises a data element having ordinal values, the value zero indicating that a voice activated hyperlink is visible, other values taken from a scrolled-off-display counter indicating the sequence in which a voice activated hyperlink was scrolled off display.
  - 10. The apparatus of claim 7 wherein maintaining a record of visibility of each voice activated hyperlink, the record of visibility further including an ordinal indication, for each voice activated hyperlink scrolled off display, of the sequence in which each such voice activated hyperlink was scrolled off display, further comprises:
    - creating by the multimodal browser a scrolled-off-display counter, the scrolled-off-display counter being initialized to zero; and
      
      incrementing the scrolled-off-display counter when a visible voice activated hyperlink is scrolled off display.
  - 11. The apparatus of claim 7 wherein maintaining a record of visibility further comprises carrying out the following steps for each voice activated hyperlink on each scroll of the display:
    - if the voice activated hyperlink scrolled into visibility on the display, recording that the voice activated hyperlink is currently visible and recording that the voice activated hyperlink was previously visible; and
      
      if the voice activated hyperlink scrolled out of visibility off the display, recording that the voice activated hyperlink is not visible and recording a current value of a scrolled-off-display counter as the ordinal indication of the sequence in which the voice activated hyperlink scrolled off display.
  - 12. The apparatus of claim 7 wherein selecting by the multimodal browser a voice activated hyperlink for activation further comprises:
    - identifying as ambiguous hyperlinks all voice activated hyperlinks that are voice enabled by grammar elements that are ambiguous with respect to the matched ambiguous terminal element of the speech recognition grammar;
      
      if only one ambiguous hyperlink is visible, selecting for activation the only visible ambiguous hyperlink;
      
      if no ambiguous hyperlink is visible and only one ambiguous hyperlink was previously visible, selecting for activation the ambiguous hyperlink that was previously visible; and
      
      if no ambiguous hyperlink is visible and more than one ambiguous hyperlink was previously visible, selecting for activation the most recently visible ambiguous hyperlink.
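
The selection steps of claim 12 (the same steps appear in claims 6 and 18) can be read as a small decision function. The following Python sketch is illustrative only: `LinkState` and `select_link` are hypothetical names, and the function returns `None` for the two situations the claims leave open, namely more than one ambiguous hyperlink being visible and no ambiguous hyperlink ever having been visible. "Most recently visible" is interpreted here as the largest scrolled-off ordinal, which is an assumption consistent with a counter that only increases.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class LinkState:
    link_id: str
    current: int       # 0 = visible; positive = scrolled-off ordinal
    was_visible: bool  # True if the hyperlink was ever on the display


def select_link(ambiguous: Iterable[LinkState]) -> Optional[str]:
    """Pick one hyperlink among those matching the ambiguous terminal."""
    links = list(ambiguous)

    visible = [link for link in links if link.current == 0]
    if len(visible) == 1:
        # Only one ambiguous hyperlink is visible: select it.
        return visible[0].link_id
    if visible:
        # More than one visible: not covered by the claims (assumption).
        return None

    # No ambiguous hyperlink is visible: fall back to past visibility.
    previously = [link for link in links if link.was_visible]
    if not previously:
        # None was ever visible: not covered by the claims (assumption).
        return None
    # One candidate: that one. Several: the one scrolled off last.
    return max(previously, key=lambda link: link.current).link_id
```

For example, given two off-screen candidates with ordinals 1 and 4, the function selects the one with ordinal 4 because it left the display most recently.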

13. A computer program product for disambiguating a speech recognition grammar in a multimodal application, the multimodal application including voice activated hyperlinks, the voice activated hyperlinks being voice enabled by a speech recognition grammar comprising ambiguous terminal grammar elements, the multimodal application being operable in a multimodal browser on a multimodal device supporting multiple modes of user interaction with the multimodal device, the modes of user interaction including a voice mode and a visual mode, the multimodal browser being operatively coupled to a grammar interpreter, the computer program product disposed upon at least one recordable computer-readable medium, the computer program product comprising computer program instructions capable of:
- maintaining by the multimodal browser a record of visibility of each voice activated hyperlink, the record of visibility including current visibility and past visibility on a display of the multimodal device of each voice activated hyperlink, the record of visibility further including an ordinal indication, for each voice activated hyperlink scrolled off display, of the sequence in which each such voice activated hyperlink was scrolled off display;
  
  recognizing by the multimodal browser speech from a user matching an ambiguous terminal element of the speech recognition grammar; and
  
  selecting by the multimodal browser a voice activated hyperlink for activation, the selecting being carried out in dependence upon the recognized speech and the record of visibility.
- - 14. The computer program product of claim 13 wherein:
    - the multimodal application comprises an X+V page that includes the voice activated hyperlinks; and
      
      each voice activated hyperlink further comprises an XHTML anchor element bound to a terminal element of the grammar by a value of an identifying attribute of the anchor element, the value of the identifying attribute being unique within the X+V page.
  - 15. The computer program product of claim 13 wherein:
    - the record of past visibility further comprises a boolean data element having a value of TRUE or FALSE, TRUE indicating that a voice activated hyperlink was previously visible on the display, FALSE indicating that a voice activated hyperlink was not previously visible on the display; and
      
      the record of current visibility further comprises a data element having ordinal values, the value zero indicating that a voice activated hyperlink is visible, other values taken from a scrolled-off-display counter indicating the sequence in which a voice activated hyperlink was scrolled off display.
  - 16. The computer program product of claim 13 wherein maintaining a record of visibility of each voice activated hyperlink, the record of visibility further including an ordinal indication, for each voice activated hyperlink scrolled off display, of the sequence in which each such voice activated hyperlink was scrolled off display, further comprises:
    - creating by the multimodal browser a scrolled-off-display counter, the scrolled-off-display counter being initialized to zero; and
      
      incrementing the scrolled-off-display counter when a visible voice activated hyperlink is scrolled off display.
  - 17. The computer program product of claim 13 wherein maintaining a record of visibility further comprises carrying out the following steps for each voice activated hyperlink on each scroll of the display:
    - if the voice activated hyperlink scrolled into visibility on the display, recording that the voice activated hyperlink is currently visible and recording that the voice activated hyperlink was previously visible; and
      
      if the voice activated hyperlink scrolled out of visibility off the display, recording that the voice activated hyperlink is not visible and recording a current value of a scrolled-off-display counter as the ordinal indication of the sequence in which the voice activated hyperlink scrolled off display.
  - 18. The computer program product of claim 13 wherein selecting by the multimodal browser a voice activated hyperlink for activation further comprises:
    - identifying as ambiguous hyperlinks all voice activated hyperlinks that are voice enabled by grammar elements that are ambiguous with respect to the matched ambiguous terminal element of the speech recognition grammar;
      
      if only one ambiguous hyperlink is visible, selecting for activation the only visible ambiguous hyperlink;
      
      if no ambiguous hyperlink is visible and only one ambiguous hyperlink was previously visible, selecting for activation the ambiguous hyperlink that was previously visible; and
      
      if no ambiguous hyperlink is visible and more than one ambiguous hyperlink was previously visible, selecting for activation the most recently visible ambiguous hyperlink.
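
Claim 14 binds each anchor element to a grammar terminal through an identifying attribute value that is unique within the page, and claim 18 starts by identifying the hyperlinks whose grammar elements are ambiguous with respect to the matched terminal. A rough Python sketch of that identification step follows. It assumes, purely for illustration, that two terminals are ambiguous when their spoken phrases are identical after case folding; the claims do not define ambiguity this narrowly. The function name `ambiguous_terminals` and the sample identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def ambiguous_terminals(
    bindings: Iterable[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Map each ambiguous phrase to the anchor ids bound to it.

    `bindings` holds (anchor_id, phrase) pairs. Anchor ids must be unique,
    mirroring the uniqueness requirement of claim 14.
    """
    seen_ids = set()
    by_phrase: Dict[str, List[str]] = defaultdict(list)
    for anchor_id, phrase in bindings:
        if anchor_id in seen_ids:
            raise ValueError(f"duplicate anchor id: {anchor_id}")
        seen_ids.add(anchor_id)
        # Assumption: identical phrases (ignoring case) are ambiguous.
        by_phrase[phrase.casefold()].append(anchor_id)
    # Keep only phrases that voice enable more than one hyperlink.
    return {p: ids for p, ids in by_phrase.items() if len(ids) > 1}
```

The anchor ids returned for a recognized phrase are the candidates among which a visibility-based selection would then choose.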

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Cross, Charles W. Jr.; White, Marc T.
Primary Examiner(s): ALBERTALLI, BRIAN LOUIS
Application Number: US11/679,274
Publication Number: US 20080208590A1
Time in Patent Office: 1,337 Days
Field of Search: None
US Class Current: 704/270
CPC Class Codes: G10L 15/22 Procedures used during a sp...
