Compounded Text Segmentation

US 20140372119A1
Filed: 09/28/2009
Published: 12/18/2014
Est. Priority Date: 09/26/2008
Status: Abandoned Application

Abstract

In general, the subject matter described in this specification can be embodied in methods, systems, and program products for performing compounded text segmentation. Compounded text that is extracted from one or more search queries submitted to a search engine is received. The compounded text includes a plurality of individual words that are joined together without intervening spaces. An electronic dictionary including words is accessed. A data structure representing possible segmentations of the compounded text is generated based on whether words in the possible segmentations occur in the electronic dictionary. A data store comprising data associated with a same field of usage as the compounded text is accessed to determine a frequency of occurrence for possible segmentations of the data structure. A segmentation of the compounded text that is most probable based on the data is determined. A language model is trained using the determined segmentation of the compounded text.
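The dictionary-driven enumeration and frequency-based selection described in the abstract can be illustrated with a minimal sketch. This is not the applicants' implementation: the function names (`segmentations`, `most_probable`), the toy dictionary, and the `phrase_counts` table standing in for the "data store" are all assumptions made for illustration, and a plain list of candidate splits stands in for the data structure of possible segmentations.

```python
from functools import lru_cache


def segmentations(text, dictionary):
    """Enumerate every split of `text` whose pieces all occur in `dictionary`.

    Hypothetical helper: the abstract only says a data structure of
    possible segmentations is generated; a list of tuples is assumed here.
    """
    @lru_cache(maxsize=None)
    def split_from(start):
        # Reaching the end of the text yields one empty segmentation.
        if start == len(text):
            return ((),)
        results = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in dictionary:
                for rest in split_from(end):
                    results.append((word,) + rest)
        return tuple(results)

    return list(split_from(0))


def most_probable(candidates, phrase_counts):
    """Pick the candidate whose space-joined form is most frequent.

    `phrase_counts` is an assumed stand-in for frequencies drawn from
    data in the same field of usage (for example, query logs).
    """
    if not candidates:
        return None
    return max(candidates, key=lambda seg: phrase_counts.get(" ".join(seg), 0))


# Toy data, invented for this example.
dictionary = {"cheap", "flights", "flight", "s"}
phrase_counts = {"cheap flights": 120, "cheap flight s": 1}

candidates = segmentations("cheapflights", dictionary)
best = most_probable(candidates, phrase_counts)
print(candidates)
print(best)
```

Memoizing on the start offset keeps repeated suffixes from being re-split, which matters once the dictionary admits many short words. A real system would likely score candidates with smoothed probabilities rather than raw counts.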

44 Claims

1-9. (canceled)

10. A computer-implemented method comprising:
- receiving, by a computing system, a textual uniform resource locator (URL) that was extracted from one or more text search queries that were submitted to a search engine, wherein the textual URL comprises a plurality of individual words that are joined together without intervening spaces;
  
  accessing, by the computing system, an electronic dictionary that includes a plurality of words;
  
  generating, by the computing system, a data structure that represents possible segmentations of the textual URL based on whether words in the possible segmentations occur in the electronic dictionary;
  
  determining, by the computing system, a segmentation of the textual URL that is a most probable segmentation of the textual URL based on a frequency of occurrence of each of the possible segmentations of the textual URL;
  
  receiving, by the computing system, audio data that includes a human spoken query and that was recorded by a microphone of a computing device;
  
  identifying, by the computing system and through use of a language model, a textual form of words in the spoken query;
  
  determining, by the computing system and in response to receiving the audio data, that the textual form of at least some of the words in the spoken query matches the determined segmentation of the textual URL; and
  
  transmitting, by the computing system and to a search engine system in response to determining that the textual form of at least some of the words in the spoken query matches the determined segmentation of the textual URL, a textual query that includes the textual URL.
- View Dependent Claims (26, 29, 30, 31, 32, 33, 34, 35)
- - 26. The computer-implemented method of claim 10, comprising:
    - sending, by the computing system and for receipt by the computing device, multiple search results that the search engine system determined were responsive to the textual query.
  - 29. The computer-implemented method of claim 10, wherein:
    - the textual form of the words in the spoken query includes a textual representation of a spoken word (dot), and the textual URL that is included in the textual query includes a character (.) and does not include the textual representation of the spoken word (dot).
  - 30. The computer-implemented method of claim 10, wherein:
    - the textual form of the words in the spoken query includes one or more words in addition to the at least some of the words in the spoken query that match the determined segmentation of the textual URL; and
      
      the textual query, that the computing system transmits to the search engine system, includes the one or more words in addition to the textual URL.
  - 31. The computer-implemented method of claim 10, further comprising:
    - identifying that substrings of the one or more text search queries are URLs, wherein the textual URL is one of the substrings that has been identified as a URL.
  - 32. The computer-implemented method of claim 10, further comprising:
    - determining whether the textual form of the at least some of the words in the spoken query is associated with any URLs by comparing the textual form of the at least some of the words in the spoken query with determined segmentations of multiple different URLs.
  - 33. The computer-implemented method of claim 10, wherein:
    - the language model has been trained using (i) the segmentation of the textual URL that has been determined to be the most probable segmentation of the textual URL, and (ii) segmentations of other URLs that have been determined to be the most probable segmentations of the other URLs.
  - 34. The computer-implemented method of claim 10, wherein users typed the one or more text search queries.
  - 35. The computer-implemented method of claim 10, further comprising:
    - identifying constituent words in the textual URL, comprising:
      
      (a) the receiving the textual URL, (b) the accessing the electronic dictionary, (c) the generating the data structure, and (d) the determining the segmentation of the textual URL; and
      
      generating the textual URL from the human spoken query, comprising:
      
      (e) the receiving the audio data, (f) the identifying the textual form of the words in the spoken query, (g) the determining that the textual form of the at least some of the words in the spoken query matches the determined segmentation of the textual URL, and (h) the transmitting the textual query.
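
The matching and rewriting steps recited in claims 10, 29, and 30 can be sketched as follows. The claims do not say how the match is performed, so the lookup table keyed by word tuples, the longest-span-first search, and the names `build_index` and `rewrite_query` are assumptions made for illustration only.

```python
def build_index(segmented_urls):
    """Map each URL's spoken-form word sequence to the compounded URL.

    `segmented_urls` is an assumed input: URL -> list of words from its
    determined segmentation, with "." represented by the word "dot".
    """
    return {tuple(words): url for url, words in segmented_urls.items()}


def rewrite_query(transcript, index):
    """Replace the longest run of transcript words that matches a known
    segmentation with the textual URL, keeping any other words."""
    words = transcript.lower().split()
    # Try longer spans first so a full URL wins over a partial match.
    for length in range(len(words), 0, -1):
        for start in range(len(words) - length + 1):
            span = tuple(words[start:start + length])
            if span in index:
                rewritten = words[:start] + [index[span]] + words[start + length:]
                return " ".join(rewritten)
    # No segmentation matched: pass the transcript through unchanged.
    return transcript


# Toy data, invented for this example.
index = build_index({"cheapflights.com": ["cheap", "flights", "dot", "com"]})

print(rewrite_query("cheap flights dot com deals", index))
print(rewrite_query("weather today", index))
```

In this sketch the spoken word "dot" disappears from the outgoing query because the stored URL already contains the "." character, and the extra word "deals" is carried through alongside the URL.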

11-25. (canceled)

27-28. (canceled)

36. One or more computer-readable media including instructions that, when executed by one or more programmable processors, perform operations that comprise:
- receiving, by a computing system, a textual uniform resource locator (URL) that was extracted from one or more text search queries that were submitted to a search engine, wherein the textual URL comprises a plurality of individual words that are joined together without intervening spaces;
  
  accessing, by the computing system, an electronic dictionary that includes a plurality of words;
  
  generating, by the computing system, a data structure that represents possible segmentations of the textual URL based on whether words in the possible segmentations occur in the electronic dictionary;
  
  determining, by the computing system, a segmentation of the textual URL that is a most probable segmentation of the textual URL based on a frequency of occurrence of each of the possible segmentations of the textual URL;
  
  receiving, by the computing system, audio data that includes a human spoken query and that was recorded by a microphone of a computing device;
  
  identifying, by the computing system and through use of a language model, a textual form of words in the spoken query;
  
  determining, by the computing system and in response to receiving the audio data, that the textual form of at least some of the words in the spoken query matches the determined segmentation of the textual URL; and
  
  transmitting, by the computing system and to a search engine system in response to determining that the textual form of at least some of the words in the spoken query matches the determined segmentation of the textual URL, a textual query that includes the textual URL that includes the plurality of individual words that are joined together without intervening spaces.
- View Dependent Claims (37, 38, 39, 40, 41, 42, 43, 44)
- - 37. The one or more computer-readable media of claim 36, wherein the operations further comprise:
    - sending, by the computing system and for receipt by the computing device, multiple search results that the search engine system determined were responsive to the textual query.
  - 38. The one or more computer-readable media of claim 36, wherein:
    - the textual form of the words in the spoken query includes a textual representation of a spoken word (dot), and the textual URL that is included in the textual query includes a character (.) and does not include the textual representation of the spoken word (dot).
  - 39. The one or more computer-readable media of claim 36, wherein:
    - the textual form of the words in the spoken query includes one or more words in addition to the at least some of the words in the spoken query that match the determined segmentation of the textual URL; and
      
      the textual query, that the computing system transmits to the search engine system, includes the one or more words in addition to the textual URL.
  - 40. The one or more computer-readable media of claim 36, wherein the operations further comprise:
    - identifying that substrings of the one or more text search queries are URLs, wherein the textual URL is one of the substrings that has been identified as a URL.
  - 41. The one or more computer-readable media of claim 36, wherein the operations further comprise:
    - determining whether the textual form of the at least some of the words in the spoken query is associated with any URLs by comparing the textual form of the at least some of the words in the spoken query with determined segmentations of multiple different URLs.
  - 42. The one or more computer-readable media of claim 36, wherein:
    - the language model has been trained using (i) the segmentation of the textual URL that has been determined to be the most probable segmentation of the textual URL, and (ii) segmentations of other URLs that have been determined to be the most probable segmentations of the other URLs.
  - 43. The one or more computer-readable media of claim 36, wherein users typed the one or more text search queries.
  - 44. The computer-implemented method of claim 36, wherein the operations further comprise:
    - identifying constituent words in the textual URL, comprising:
      
      (a) the receiving the textual URL, (b) the accessing the electronic dictionary, (c) the generating the data structure representing, and (d) the determining the segmentation of the textual URL; and
      
      generating the textual URL from the human spoken query, comprising:
      
      (e) the receiving the audio data, (f) the identifying the textual form of the words in the spoken query, (g) the determining that the textual form of the at least some of the words in the spoken query matches the determined segmentation of the textual URL, and (h) the transmitting the textual query.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Harb, Boulos; Schalkwyk, Johan; Parada, Carolina
Application Number: US12/568,014
Publication Number: US 20140372119A1
US Class Current: 704/246
CPC Class Codes: G06F 16/957 Browsing optimisation, e.g....
