Natural language vocabulary generation and usage
Abstract
Natural language vocabulary generation and usage techniques are described. In one or more implementations, one or more search results are mined for a domain to determine a frequency at which words occur in the one or more search results, respectively. A set of the words is selected based on the determined frequency. A sense is assigned to each of the selected set of the words that identifies a part-of-speech for a respective word. A vocabulary is then generated that includes the selected set of the words and a respective said sense, the vocabulary configured for use in natural language processing associated with the domain.
77 Citations
38 Claims
1. A method implemented at least partially in hardware of a computing device, the method comprising:
mining, by the computing device, one or more search results returned from a search engine for a particular one of a plurality of domains to determine a frequency at which words occur in the one or more search results relating to the particular domain, respectively;
selecting, by the computing device, a set of the words from the search results of the particular domain based on the determined frequency;
assigning, by the computing device, a sense to each of the selected set of the words that identifies a part-of-speech for a respective said word, the sense based in part on the particular domain; and
generating, by the computing device, a vocabulary for the particular domain that includes the selected set of the words and a respective said sense, the vocabulary describing a term sense bias as applied to a term semantic distance, the vocabulary configured for use in natural language processing to disambiguate natural language input according to the term sense bias applied to the term semantic distance of the particular domain.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9

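The mining, selecting, assigning, and generating steps of claim 1 can be read as a small pipeline. The following is only a minimal sketch under stated assumptions: `RESULTS` stands in for text returned by a search engine, `DOMAIN_SENSES` is a hand-written table standing in for a domain-aware lexicon, and `STOPWORDS` and `build_vocabulary` are hypothetical names. None of these appear in the claims, and the sketch does not model the term sense bias or semantic distance.

```python
import re
from collections import Counter

# Hypothetical inputs: snippets standing in for search-engine results for an
# image editing domain, and a hand-written part-of-speech table.
RESULTS = [
    "Crop the photo and sharpen the edges",
    "How to crop a photo in seconds",
    "Sharpen a blurry photo",
]
DOMAIN_SENSES = {"crop": "verb", "sharpen": "verb", "photo": "noun"}
STOPWORDS = {"the", "a", "and", "how", "to", "in"}


def build_vocabulary(results, senses, top_n=3):
    """Count word frequency, keep the top_n words, attach a sense to each."""
    counts = Counter()
    for text in results:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    # Selection is based purely on the determined frequency.
    selected = [word for word, _ in counts.most_common(top_n)]
    # Each selected word gets the part of speech preferred in this domain.
    return {
        w: {"sense": senses.get(w, "unknown"), "frequency": counts[w]}
        for w in selected
    }


vocabulary = build_vocabulary(RESULTS, DOMAIN_SENSES)
```

In this toy data, "crop" is tagged as a verb because that is how the word is used in image editing, whereas a general-purpose lexicon might prefer the noun reading.
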
10. A method implemented at least partially in hardware of one or more computing devices, the method comprising:
using a spreading activation algorithm, by the one or more computing devices, to generate a set of words relating to a particular one of a plurality of domains to determine which words are most frequently used in the particular domain;
biasing, by the one or more computing devices, the set of words by assigning a sense based on one or more lexicon ontologies to each of the words that identifies a part-of-speech of a respective said word for the particular domain, the biasing configured to reflect a sense corresponding to how each of the words are used in the particular domain; and
configuring, by the one or more computing devices, the set of words and the assigned sense for each of the words to form a vocabulary that is configured for use in natural language processing for the particular domain, the vocabulary describing the assigned sense bias for the particular domain as applied to a term semantic distance.
Dependent claims: 11, 12, 13, 14

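Claim 10 names a spreading activation algorithm without fixing its details. As a rough illustration, the sketch below uses one common simplified variant: activation starts at seed terms, is passed along weighted association links, is attenuated by a decay factor at each hop, and stops spreading once it falls below a threshold. The graph `ASSOCIATIONS`, its weights, and the parameter values are all assumptions made for the example.

```python
# Hypothetical association graph: term -> {related term: link weight in [0, 1]}.
ASSOCIATIONS = {
    "image editing": {"crop": 0.9, "filter": 0.8, "harvest": 0.1},
    "crop": {"resize": 0.9, "harvest": 0.3},
    "filter": {"blur": 0.9},
}


def spread_activation(graph, seeds, decay=0.5, threshold=0.2, max_steps=3):
    """Return {term: activation} for every term reached from the seeds."""
    activation = {seed: 1.0 for seed in seeds}
    frontier = list(seeds)
    for _ in range(max_steps):
        next_frontier = []
        for node in frontier:
            for neighbor, weight in graph.get(node, {}).items():
                passed = activation[node] * weight * decay
                # Keep only sufficiently strong activation, and only if it
                # improves on what the neighbor already has.
                if passed >= threshold and passed > activation.get(neighbor, 0.0):
                    activation[neighbor] = passed
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return activation


domain_terms = spread_activation(ASSOCIATIONS, ["image editing"])
```

With these weights, "crop", "filter", and "resize" are activated, while "harvest" (an agricultural association of "crop") and "blur" fall below the threshold and are left out of the domain's word set.
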
15. One or more computer-readable hardware storage media comprising instructions stored thereon that, responsive to execution by a computing device, causes the computing device to perform operations comprising:
generating, at least partially in hardware of the computing device, a vocabulary for use in natural language processing for a particular one of a plurality of domains by:
building an N-gram tree to represent frequency of occurrence of words in search results involving the particular domain, the search results obtained using a spreading activation algorithm constrained to the particular domain;
selecting a set of the words based on the represented frequency of occurrence in the N-gram tree; and
assigning a sense to each of the words in the set that identifies a part-of-speech of a respective said word, the sense reflecting, for each of the words, how each of the words are used in the particular domain of the plurality of domains.
Dependent claims: 16, 17, 18

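The N-gram tree of claim 15 can be pictured as a prefix tree over word sequences in which each node stores how often its sequence occurred. The sketch below is one possible representation, not the one the patent necessarily uses; `NGramNode`, `build_ngram_tree`, `ngram_count`, `top_words`, and the sample phrases are hypothetical.

```python
class NGramNode:
    """One word in the tree; count is the frequency of the path ending here."""

    def __init__(self):
        self.count = 0
        self.children = {}


def build_ngram_tree(texts, n=2):
    """Insert every word sequence of length 1..n found in the texts."""
    root = NGramNode()
    for text in texts:
        words = text.lower().split()
        for start in range(len(words)):
            node = root
            for word in words[start:start + n]:
                node = node.children.setdefault(word, NGramNode())
                node.count += 1
    return root


def ngram_count(root, *words):
    """Frequency of the given word sequence, or 0 if it never occurred."""
    node = root
    for word in words:
        node = node.children.get(word)
        if node is None:
            return 0
    return node.count


def top_words(root, k):
    """Select the k most frequent single words from the first tree level."""
    ranked = sorted(root.children.items(), key=lambda kv: kv[1].count, reverse=True)
    return [word for word, _ in ranked[:k]]


# Hypothetical phrases standing in for domain-constrained search results.
tree = build_ngram_tree(["crop photo edges", "crop photo", "sharpen photo edges"])
```

Selection then reads frequencies directly off the tree, for example `top_words(tree, 1)` for the most frequent word or `ngram_count(tree, "crop", "photo")` for a two-word sequence.
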
19. A method implemented at least partially in hardware of a computing device comprising:
receiving a natural language input by the computing device;
processing the natural language input by the computing device to identify an image editing operation, the processing performed using a vocabulary that includes a set of words and a sense that identifies a part-of-speech for each of the words that is biased for a domain that includes image editing operations, the natural language input not being part of the set of words of the vocabulary, the processing comprising disambiguating the natural language input according to a term sense bias applied to a term semantic distance, the term sense bias relating to the sense that identifies the part-of-speech for each of the words based on the domain that includes image editing operations; and
performing the image editing operation by the computing device.
Dependent claims: 20, 21, 22, 23, 24, 25, 26, 27, 28

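Claim 19 does not say how the term sense bias is combined with the term semantic distance. One plausible reading, sketched below purely as an assumption, is to scale each candidate distance by a multiplier that depends on the part of speech of the reading being considered, so that readings favoured in the image editing domain win ties and near-ties. The tables `VOCABULARY`, `DISTANCES`, and `SENSE_BIAS` contain made-up values, and actually performing the operation is out of scope.

```python
# Hypothetical vocabulary: word -> name of the image editing operation.
VOCABULARY = {"crop": "crop_image", "frame": "add_frame"}

# Made-up semantic distances (0 means identical) from each possible reading
# of an out-of-vocabulary input word to the vocabulary words.
DISTANCES = {
    ("trim", "verb"): {"crop": 0.20, "frame": 0.60},
    ("trim", "noun"): {"crop": 0.80, "frame": 0.15},
}

# Assumed domain bias: commands are usually verbs, so noun readings are
# penalised by a larger multiplier.
SENSE_BIAS = {"verb": 1.0, "noun": 2.0}


def identify_operation(word):
    """Pick the vocabulary word with the smallest biased distance."""
    best_score, best_word = None, None
    for (term, part_of_speech), row in DISTANCES.items():
        if term != word:
            continue
        for vocab_word, distance in row.items():
            score = distance * SENSE_BIAS[part_of_speech]
            if best_score is None or score < best_score:
                best_score, best_word = score, vocab_word
    return VOCABULARY.get(best_word)


operation = identify_operation("trim")
```

Without the bias, the noun reading of "trim" (distance 0.15 to "frame") would win. With it, the verb reading wins and the input resolves to the crop operation.
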
29. A method implemented at least partially in hardware of one or more computing devices, the method comprising:
identifying, by the one or more computing devices, a word parsed from a natural language input that is not included in a vocabulary associated with an image editing operation domain; and
disambiguating, by the one or more computing devices, the parsed word using the vocabulary to identify non-matching words to accomplish image editing operations, the disambiguating based on:
semantic distance between the parsed word and a plurality of words included in the vocabulary, the semantic distance based at least in part on a distance matrix computed between the parsed word and one or more of the plurality of words included in the vocabulary, the distance matrix comprising semantic distances between terms for respective parts of speech; and
similarity of lexical functional groups of the parsed word and the plurality of words.
Dependent claims: 30, 31, 32, 33, 34

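The two disambiguation signals in claim 29, a per-part-of-speech distance matrix and the similarity of lexical functional groups, could be combined in many ways. The sketch below assumes the simplest one: look up the row for the parsed word in the matrix for its part of speech, then add a fixed penalty when a candidate belongs to a different functional group. The matrix values, the group labels ("geometry", "tone"), and the penalty are all hypothetical.

```python
# Hypothetical distance matrices, one per part of speech:
# part of speech -> parsed word -> {vocabulary word: semantic distance}.
DISTANCE_MATRIX = {
    "verb": {
        "trim": {"crop": 0.20, "rotate": 0.30, "brighten": 0.90},
        "lift": {"crop": 0.40, "rotate": 0.70, "brighten": 0.45},
    },
}

# Hypothetical lexical functional groups for parsed and vocabulary words.
FUNCTIONAL_GROUP = {
    "crop": "geometry", "rotate": "geometry", "trim": "geometry",
    "brighten": "tone", "lift": "tone",
}


def disambiguate(word, part_of_speech, group_penalty=0.5):
    """Return the vocabulary word closest to an out-of-vocabulary word."""
    row = DISTANCE_MATRIX.get(part_of_speech, {}).get(word)
    if not row:
        return None

    def score(vocab_word):
        same_group = FUNCTIONAL_GROUP.get(word) == FUNCTIONAL_GROUP.get(vocab_word)
        return row[vocab_word] + (0.0 if same_group else group_penalty)

    return min(row, key=score)


match_for_trim = disambiguate("trim", "verb")
match_for_lift = disambiguate("lift", "verb")
```

For "lift", raw distance alone would pick "crop" (0.40 against 0.45), but the shared "tone" group moves the match to "brighten", which is the kind of correction the functional-group signal is meant to provide.
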
35. One or more computer-readable hardware storage media comprising instructions stored thereon that, responsive to execution at least partially in hardware by a computing device, causes the computing device to perform operations comprising:
receiving a word via a natural language input, the received word not being included in an image editing operation vocabulary;
determining which of a plurality of words included in the image editing operation vocabulary is considered to be more similar to the received word than other said words in the image editing operation vocabulary, the determining based at least in part on:
semantic distance between the received word and a plurality of words included in the vocabulary, the semantic distance based at least in part on a distance matrix computed between the received word and one or more of the plurality of words included in the image editing operation vocabulary, the distance matrix comprising semantic distances between terms for respective parts of speech; and
similarity of lexical functional groups of the received word and the plurality of words; and
identifying an image editing operation based on the determining.
Dependent claims: 36, 37, 38

Specification