Compounded Text Segmentation
Abstract
In general, the subject matter described in this specification can be embodied in methods, systems, and program products for performing compounded text segmentation. Compounded text that is extracted from one or more search queries submitted to a search engine is received. The compounded text includes a plurality of individual words that are joined together without intervening spaces. An electronic dictionary including words is accessed. A data structure representing possible segmentations of the compounded text is generated based on whether words in the possible segmentations occur in the electronic dictionary. A data store comprising data associated with a same field of usage as the compounded text is accessed to determine a frequency of occurrence for possible segmentations of the data structure. A segmentation of the compounded text that is most probable based on the data is determined. A language model is trained using the determined segmentation of the compounded text.
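The steps in the abstract can be illustrated with a minimal sketch. It is not the method claimed here: the function names, the toy dictionary, and the word counts are all hypothetical, and scoring a segmentation as a product of add-one-smoothed word frequencies is an assumed stand-in for whatever frequency data the data store actually provides.

```python
from functools import lru_cache


def segmentations(text, dictionary):
    """Return every split of `text` whose pieces all occur in `dictionary`."""

    @lru_cache(maxsize=None)
    def split_from(start):
        # One way to segment the empty remainder: no further words.
        if start == len(text):
            return [[]]
        results = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in dictionary:
                for rest in split_from(end):
                    results.append([word] + rest)
        return results

    return split_from(0)


def most_probable(text, dictionary, counts):
    """Pick the candidate with the highest smoothed word-frequency product.

    `counts` maps words to how often they occur in some reference data
    (an assumption standing in for the data store in the abstract).
    """
    total = sum(counts.values())
    vocab = len(dictionary)

    def score(words):
        p = 1.0
        for w in words:
            p *= (counts.get(w, 0) + 1) / (total + vocab)
        return p

    candidates = segmentations(text, dictionary)
    return max(candidates, key=score) if candidates else None


# Hypothetical dictionary and counts, for illustration only.
DICTIONARY = {"car", "sales", "cars", "ales"}
COUNTS = {"car": 900, "sales": 700, "cars": 800, "ales": 5}

print(segmentations("carsales", DICTIONARY))
print(most_probable("carsales", DICTIONARY, COUNTS))
```

With these toy counts the two dictionary-valid candidates are "car sales" and "cars ales", and the frequency data favors the first. Enumerating every candidate is exponential in the worst case; a lattice with dynamic programming over the best score per position would be the usual refinement.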
44 Claims
1-9. (canceled)
10. A computer-implemented method comprising:
receiving, by a computing system, a textual uniform resource locator (URL) that was extracted from one or more text search queries that were submitted to a search engine, wherein the textual URL comprises a plurality of individual words that are joined together without intervening spaces;
accessing, by the computing system, an electronic dictionary that includes a plurality of words;
generating, by the computing system, a data structure that represents possible segmentations of the textual URL based on whether words in the possible segmentations occur in the electronic dictionary;
determining, by the computing system, a segmentation of the textual URL that is a most probable segmentation of the textual URL based on a frequency of occurrence of each of the possible segmentations of the textual URL;
receiving, by the computing system, audio data that includes a human spoken query and that was recorded by a microphone of a computing device;
identifying, by the computing system and through use of a language model, a textual form of words in the spoken query;
determining, by the computing system and in response to receiving the audio data, that the textual form of at least some of the words in the spoken query matches the determined segmentation of the textual URL; and
transmitting, by the computing system and to a search engine system in response to determining that the textual form of at least some of the words in the spoken query matches the determined segmentation of the textual URL, a textual query that includes the textual URL.
(Dependent claims: 26, 29, 30, 31, 32, 33, 34, 35)
11-25. (canceled)
27-28. (canceled)
36. One or more computer-readable media including instructions that, when executed by one or more programmable processors, perform operations that comprise:
receiving, by a computing system, a textual uniform resource locator (URL) that was extracted from one or more text search queries that were submitted to a search engine, wherein the textual URL comprises a plurality of individual words that are joined together without intervening spaces;
accessing, by the computing system, an electronic dictionary that includes a plurality of words;
generating, by the computing system, a data structure that represents possible segmentations of the textual URL based on whether words in the possible segmentations occur in the electronic dictionary;
determining, by the computing system, a segmentation of the textual URL that is a most probable segmentation of the textual URL based on a frequency of occurrence of each of the possible segmentations of the textual URL;
receiving, by the computing system, audio data that includes a human spoken query and that was recorded by a microphone of a computing device;
identifying, by the computing system and through use of a language model, a textual form of words in the spoken query;
determining, by the computing system and in response to receiving the audio data, that the textual form of at least some of the words in the spoken query matches the determined segmentation of the textual URL; and
transmitting, by the computing system and to a search engine system in response to determining that the textual form of at least some of the words in the spoken query matches the determined segmentation of the textual URL, a textual query that includes the textual URL that includes the plurality of individual words that are joined together without intervening spaces.
(Dependent claims: 37, 38, 39, 40, 41, 42, 43, 44)
Specification