Speech-enabled web content searching using a multimodal browser

US 8,843,376 B2
Filed: 03/13/2007
Issued: 09/23/2014
Est. Priority Date: 03/13/2007
Status: Active Grant

3 Assignments

0 Petitions

Accused Products

Abstract

Speech-enabled web content searching using a multimodal browser implemented with one or more grammars in an automatic speech recognition (‘ASR’) engine, with the multimodal browser operating on a multimodal device supporting multiple modes of interaction including a voice mode and one or more non-voice modes, the multimodal browser operatively coupled to the ASR engine, includes: rendering, by the multimodal browser, web content; searching, by the multimodal browser, the web content for a search phrase, including yielding a matched search result, the search phrase specified by a first voice utterance received from a user and a search grammar; and performing, by the multimodal browser, an action in dependence upon the matched search result, the action specified by a second voice utterance received from the user and an action grammar.
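
As a rough illustration of the two-utterance flow described in the abstract, the following sketch simulates it in plain Python. It is not taken from the patent: plain text strings stand in for utterances that an ASR engine would recognize, and the names `search_content`, `ACTION_GRAMMAR` and `perform_action`, along with the "read it" and "look it up" commands, are hypothetical.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-in for rendered web content (plain text only).
RENDERED_CONTENT = "Multimodal browsers accept speech and keyboard input."


def search_content(content: str, search_phrase: str) -> Optional[str]:
    """Return the matched portion of the content, or None if absent."""
    start = content.lower().find(search_phrase.lower())
    if start < 0:
        return None
    return content[start:start + len(search_phrase)]


# Assumed action grammar: each entry (a spoken command) is associated with
# a different action that operates on the same matched search result.
ACTION_GRAMMAR: Dict[str, Callable[[str], str]] = {
    "read it": lambda match: f"SPEAK: {match}",
    "look it up": lambda match: f"LOOKUP: {match}",
}


def perform_action(utterance: str, match: str) -> str:
    """Recognize the utterance against the grammar and run its action."""
    action = ACTION_GRAMMAR.get(utterance.strip().lower())
    if action is None:
        raise ValueError(f"utterance not in action grammar: {utterance!r}")
    return action(match)


# First utterance (already recognized as text here) gives the search phrase.
matched = search_content(RENDERED_CONTENT, "speech")
# Second utterance selects one of several actions on that same result.
result = perform_action("Read it", matched)
```

Keeping the grammar as a mapping from entries to actions means one matched result can be paired with several distinct actions, and the second utterance only chooses among them.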

347 Citations

20 Claims

1. A method of speech-enabled searching of web content using a multimodal browser, the method implemented with one or more grammars in an automatic speech recognition (‘ASR’) engine, with the multimodal browser operating on a multimodal device supporting multiple modes of interaction including a voice mode and one or more non-voice modes, the multimodal browser operatively coupled to the ASR engine, the method comprising:

  rendering, by the multimodal browser, web content;

  searching, by the multimodal browser, the rendered web content for a search phrase, including matching the search phrase to at least one portion of the rendered web content, yielding a matched search result, the search phrase specified by a first voice utterance received from a user and a search grammar; and

  in response to a second voice utterance received from the user:

  using an action grammar comprising one or more entries to recognize the second voice utterance as corresponding to a first entry of the one or more entries, the action grammar specifying,

  for the first entry of the one or more entries, an associated first action to be taken in dependence upon the matched search result, and

  for a second entry of the one or more entries, an associated second action to be taken in dependence upon the same matched search result, the second action being different from the first action, and

  performing, by the multimodal browser, the first action in dependence upon the matched search result associated with the first entry.
- 2. The method of claim 1 wherein searching, by the multimodal browser, the web content for a search phrase, including yielding a matched search result further comprises:
  - creating the search grammar in dependence upon the web content;
  - receiving the first voice utterance from a user; and
  - determining, using the ASR engine, the search phrase in dependence upon the first voice utterance and the search grammar.
- 3. The method of claim 2 wherein matching the search phrase to at least one portion of the web content, yielding a matched search result further comprises identifying a node of a Document Object Model (‘DOM’) representing the web content that contains the search phrase.
- 4. The method of claim 1 wherein performing, by the multimodal browser, an action in dependence upon the matched search result further comprises:
  - creating the action grammar in dependence upon the matched search result;
  - receiving the second voice utterance from the user;
  - determining, using the ASR engine, an action identifier in dependence upon the second voice utterance and the action grammar; and
  - performing the specified action in dependence upon the action identifier.
- 5. The method of claim 1 further comprising augmenting, by the multimodal browser, the matched search result with additional web content.
- 6. The method of claim 5 wherein augmenting, by the multimodal browser, the matched search result with additional web content further comprises inserting the additional web content into a node of a Document Object Model (‘DOM’) representing the web content that contains the matched search result.
- 7. The method of claim 1 wherein the web content is not speech-enabled.
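
The DOM-related steps in claims 3, 5 and 6 (identifying the node that contains the search phrase, then inserting additional content into that node) might be sketched as follows. This is only an illustration using Python's standard `xml.dom.minidom`; the page markup, the helper names `find_node` and `augment`, and the choice of a `span` element for the added content are all assumptions, not details from the patent.

```python
from typing import Optional
from xml.dom import minidom

# Hypothetical, well-formed page markup standing in for rendered web content.
PAGE = "<html><body><p>Weather today</p><p>Stock prices rose</p></body></html>"


def find_node(node: minidom.Node, phrase: str) -> Optional[minidom.Node]:
    """Depth-first search for the element whose own text contains the phrase."""
    for child in node.childNodes:
        is_text = child.nodeType == minidom.Node.TEXT_NODE
        if is_text and phrase.lower() in child.data.lower():
            return node  # the element that directly holds the matching text
        found = find_node(child, phrase)
        if found is not None:
            return found
    return None


def augment(doc: minidom.Document, node: minidom.Node, extra_text: str) -> None:
    """Insert additional content into the node that holds the match."""
    span = doc.createElement("span")  # assumed wrapper element
    span.appendChild(doc.createTextNode(extra_text))
    node.appendChild(span)


doc = minidom.parseString(PAGE)
hit = find_node(doc.documentElement, "stock prices")
if hit is not None:
    augment(doc, hit, " (see related quotes)")
```

Returning the containing element rather than the text node gives the later augmentation step a natural place to attach new children.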

8. Apparatus for speech-enabled searching of web content using a multimodal browser operating on a multimodal device supporting multiple modes of interaction including a voice mode and one or more non-voice modes, the multimodal browser operatively coupled to an automatic speech recognition (‘ASR’) engine, the apparatus comprising:

  a computer processor; and

  a computer memory operatively coupled to the computer processor, the computer memory having stored thereon computer program instructions that, when executed by the computer processor, perform a method comprising acts of:

  rendering, by the multimodal browser, web content;

  searching, by the multimodal browser, the rendered web content for a search phrase, including matching the search phrase to at least one portion of the rendered web content, yielding a matched search result, the search phrase specified by a first voice utterance received from a user and a search grammar; and

  in response to a second voice utterance received from the user:

  using an action grammar comprising one or more entries to recognize the second voice utterance as corresponding to a first entry of the one or more entries, the action grammar specifying,

  for the first entry of the one or more entries, an associated first action to be taken in dependence upon the matched search result, and

  for a second entry of the one or more entries, an associated second action to be taken in dependence upon the same matched search result, the second action being different from the first action, and

  performing, by the multimodal browser, the first action in dependence upon the matched search result associated with the first entry.
- 9. The apparatus of claim 8 wherein searching, by the multimodal browser, the web content for a search phrase, including yielding a matched search result further comprises:
  - creating the search grammar in dependence upon the web content;
  - receiving the first voice utterance from a user; and
  - determining, using the ASR engine, the search phrase in dependence upon the first voice utterance and the search grammar.
- 10. The apparatus of claim 9 wherein matching the search phrase to at least one portion of the web content, yielding a matched search result further comprises identifying a node of a Document Object Model (‘DOM’) representing the web content that contains the search phrase.
- 11. The apparatus of claim 8 wherein performing, by the multimodal browser, an action in dependence upon the matched search result further comprises:
  - creating the action grammar in dependence upon the matched search result;
  - receiving the second voice utterance from the user;
  - determining, using the ASR engine, an action identifier in dependence upon the second voice utterance and the action grammar; and
  - performing the specified action in dependence upon the action identifier.
- 12. The apparatus of claim 8 further comprising computer program instructions capable of augmenting, by the multimodal browser, the matched search result with additional web content.
- 13. The apparatus of claim 12 wherein augmenting, by the multimodal browser, the matched search result with additional web content further comprises inserting the additional web content into a node of a Document Object Model (‘DOM’) representing the web content that contains the matched search result.
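
One way to picture "creating the search grammar in dependence upon the web content" and then determining the search phrase (claims 2, 9 and 15) is to derive the allowed phrases from the words on the page. The sketch below is a simplified assumption, not the patented implementation: it has no real ASR engine, so already-transcribed text stands in for the utterance, and the helper names and the JSGF-style output are chosen purely for illustration.

```python
import re
from typing import Optional, Set


def build_search_phrases(text: str, max_words: int = 3) -> Set[str]:
    """Collect every run of 1..max_words consecutive words in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    phrases: Set[str] = set()
    for size in range(1, max_words + 1):
        for start in range(len(words) - size + 1):
            phrases.add(" ".join(words[start:start + size]))
    return phrases


def to_grammar(phrases: Set[str]) -> str:
    """Render the phrases as one rule in a JSGF-style notation (illustrative)."""
    alternatives = " | ".join(sorted(phrases))
    return f"#JSGF V1.0;\ngrammar search;\npublic <phrase> = {alternatives};"


def determine_search_phrase(utterance_text: str, phrases: Set[str]) -> Optional[str]:
    """Stand-in for ASR: accept the utterance only if the grammar allows it."""
    candidate = " ".join(re.findall(r"[a-z0-9']+", utterance_text.lower()))
    return candidate if candidate in phrases else None


page_text = "Stock prices rose sharply"
phrases = build_search_phrases(page_text, max_words=2)
grammar = to_grammar(phrases)
phrase = determine_search_phrase("Stock prices", phrases)
```

Because the phrase set is built from the page itself, an utterance that names words absent from the page is simply rejected rather than searched for.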

14. A computer-readable recordable medium encoded with instructions that, when executed, perform a method for speech-enabled searching of web content using a multimodal browser operating on a multimodal device supporting multiple modes of interaction including a voice mode and one or more non-voice modes, the multimodal browser operatively coupled to an automatic speech recognition (‘ASR’) engine, the method comprising acts of:

  rendering, by the multimodal browser, web content;

  searching, by the multimodal browser, the rendered web content for a search phrase, including matching the search phrase to at least one portion of the rendered web content, yielding a matched search result, the search phrase specified by a first voice utterance received from a user and a search grammar; and

  in response to a second voice utterance received from the user:

  using an action grammar comprising one or more entries to recognize the second voice utterance as corresponding to a first entry of the one or more entries, the action grammar specifying,

  for the first entry of the one or more entries, an associated first action to be taken in dependence upon the matched search result, and

  for a second entry of the one or more entries, an associated second action to be taken in dependence upon the same matched search result, the second action being different from the first action, and

  performing, by the multimodal browser, the first action in dependence upon the matched search result associated with the first entry.
- 15. The computer-readable recordable medium of claim 14 wherein searching, by the multimodal browser, the web content for a search phrase, including yielding a matched search result further comprises:
  - creating the search grammar in dependence upon the web content;
  - receiving the first voice utterance from a user; and
  - determining, using the ASR engine, the search phrase in dependence upon the first voice utterance and the search grammar.
- 16. The computer-readable recordable medium of claim 15 wherein matching the search phrase to at least one portion of the web content, yielding a matched search result further comprises identifying a node of a Document Object Model (‘DOM’) representing the web content that contains the search phrase.
- 17. The computer-readable recordable medium of claim 14 wherein performing, by the multimodal browser, an action in dependence upon the matched search result further comprises:
  - creating the action grammar in dependence upon the matched search result;
  - receiving the second voice utterance from the user;
  - determining, using the ASR engine, an action identifier in dependence upon the second voice utterance and the action grammar; and
  - performing the specified action in dependence upon the action identifier.
- 18. The computer-readable recordable medium of claim 14 further comprising computer program instructions capable of augmenting, by the multimodal browser, the matched search result with additional web content.
- 19. The computer-readable recordable medium of claim 18 wherein augmenting, by the multimodal browser, the matched search result with additional web content further comprises inserting the additional web content into a node of a Document Object Model (‘DOM’) representing the web content that contains the matched search result.
- 20. The computer-readable recordable medium of claim 14 wherein the web content is not speech-enabled.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Cross, Charles W. Jr.
Primary Examiner(s): He, Jialong
Application Number: US11/685,350
Publication Number: US 20080228494A1
Time in Patent Office: 2,751 Days
Field of Search: 704/270, 704/270.1, 704/275
US Class Current: 704/275
CPC Class Codes:
G06F 16/957 Browsing optimisation, e.g....
G10L 15/22 Procedures used during a sp...
