Automatic generation of domain models for virtual personal assistants
Abstract
Technologies for automatic domain model generation include a computing device that accesses an n-gram index of a web corpus. The computing device generates a semantic graph of the web corpus for a relevant domain using the n-gram index. The semantic graph includes one or more related entities that are related to a seed entity. The computing device performs similarity discovery to identify and rank contextual synonyms within the domain. The computing device maintains a domain model including intents representing actions in the domain and slots representing parameters of actions or entities in the domain. The computing device performs intent discovery to discover intents and intent patterns by analyzing the web corpus using the semantic graph. The computing device performs slot discovery to discover slots, slot patterns, and slot values by analyzing the web corpus using the semantic graph. Other embodiments are described and claimed.
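As an illustration only, and not as part of the patent text, the n-gram index mentioned in the abstract can be pictured as a frequency table of consecutive token runs plus a lookup from each entity to the n-grams that contain it. The sketch below assumes whitespace tokenization, a three-sentence toy corpus, and a hypothetical helper named `build_ngram_index`; none of these come from the patent.

```python
from collections import Counter, defaultdict


def build_ngram_index(sentences, n=3):
    """Count each run of n consecutive tokens and record which n-grams contain each token.

    Hypothetical helper for illustration; tokens stand in for the "entities" of the corpus.
    """
    freq = Counter()               # n-gram -> number of occurrences
    by_entity = defaultdict(set)   # token  -> set of n-grams containing it
    for sentence in sentences:
        tokens = sentence.lower().split()   # assumption: whitespace tokenization
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            freq[gram] += 1
            for tok in gram:
                by_entity[tok].add(gram)
    return freq, by_entity


# Toy corpus, invented for this example.
corpus = [
    "watch a movie tonight",
    "watch a movie online",
    "rent a movie online",
]
freq, by_entity = build_ngram_index(corpus, n=3)

print(freq[("watch", "a", "movie")])   # frequency of one 3-gram
print(sorted(by_entity["movie"]))      # every 3-gram that includes "movie"
```

Keeping a per-entity lookup next to the frequency table is one simple way to retrieve every n-gram that includes a given seed entity without scanning the whole index.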
25 Claims
1. A computing device for domain model creation, the computing device comprising:
a web corpus module to access an n-gram index of a web corpus, wherein the web corpus includes a plurality of entities, wherein the n-gram index is indicative of a plurality of n-grams, wherein each n-gram comprises a predetermined number n of consecutive entities in the web corpus, and wherein the n-gram index is further indicative of a plurality of entities of each n-gram and a frequency of each n-gram;
a semantic graph module to generate a semantic graph of the web corpus using the n-gram index of the web corpus, wherein the semantic graph is rooted by a predefined seed entity and includes a first plurality of related entities, wherein each of the first plurality of related entities is grammatically related to the seed entity and each of the first plurality of related entities is included in a corresponding n-gram of the web corpus that also includes the seed entity, and wherein to generate the semantic graph comprises to:
retrieve a first plurality of n-grams from the web corpus using the n-gram index, wherein each of the first plurality of n-grams includes the seed entity;
tag each entity of the first plurality of n-grams for part-of-speech; and
identify a grammatical relationship between the seed entity and each of the first plurality of related entities in response to tagging of each entity, wherein each of the first plurality of related entities is included in the first plurality of n-grams;
a similarity discovery module to analyze the web corpus using the semantic graph to identify and rank contextual synonyms for entities within a domain, wherein the semantic graph is further expanded using the ranked contextual synonyms;
an intent discovery module to analyze the web corpus using the semantic graph to identify intents and intent patterns in the domain, wherein each intent is associated with a domain action, and each intent pattern matches query features and a corresponding intent; and
a slot discovery module to analyze the web corpus using the semantic graph to identify slots, slot patterns, and slot values in the domain, wherein each slot is associated with a parameter of an intent or an entity, each slot pattern matches query features and a corresponding slot, and each slot value is associated with an entity.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
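For readers who want a concrete picture of the three semantic-graph steps recited in claim 1 (retrieve n-grams containing the seed, tag for part of speech, identify grammatical relationships), the following is a minimal sketch and not the claimed implementation. It assumes a hand-written tag lexicon in place of a real part-of-speech tagger, invented n-gram frequencies, and three hypothetical relation names (`verb`, `modifier`, `related_noun`) chosen purely by token position.

```python
from collections import defaultdict

# Hypothetical n-gram index: n-gram -> frequency (invented numbers).
NGRAMS = {
    ("watch", "a", "movie"): 120,
    ("rent", "a", "movie"): 45,
    ("a", "scary", "movie"): 30,
    ("movie", "about", "space"): 12,
    ("read", "a", "book"): 80,
}

# Hypothetical lexicon standing in for a part-of-speech tagger.
POS = {
    "watch": "VERB", "rent": "VERB", "read": "VERB",
    "scary": "ADJ",
    "movie": "NOUN", "book": "NOUN", "space": "NOUN",
    "a": "DET", "about": "ADP",
}


def build_semantic_graph(seed, ngrams, pos):
    """Return {relation: {related_entity: summed n-gram frequency}} rooted at `seed`."""
    graph = defaultdict(dict)
    for gram, count in ngrams.items():
        if seed not in gram:            # step 1: keep only n-grams that include the seed
            continue
        tagged = [(tok, pos.get(tok, "X")) for tok in gram]   # step 2: tag each entity
        seed_at = gram.index(seed)
        for i, (tok, tag) in enumerate(tagged):               # step 3: assign a relation
            if tok == seed:
                continue
            if tag == "VERB" and i < seed_at:
                rel = "verb"            # verb preceding the seed
            elif tag == "ADJ" and i < seed_at:
                rel = "modifier"        # adjective preceding the seed
            elif tag == "NOUN" and i > seed_at:
                rel = "related_noun"    # noun following the seed
            else:
                continue                # determiners, prepositions, unknown tokens
            graph[rel][tok] = graph[rel].get(tok, 0) + count
    return dict(graph)


graph = build_semantic_graph("movie", NGRAMS, POS)
print(graph)
```

The positional rules here are deliberately crude; they only show how tagging can drive relation labels and how n-gram frequency can serve as an edge weight.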
13. A method for domain model creation, the method comprising:
generating, by a computing device, a semantic graph of a web corpus using an n-gram index of the web corpus, wherein the web corpus includes a plurality of entities, wherein the n-gram index is indicative of a plurality of n-grams, wherein each n-gram comprises a predetermined number n of consecutive entities in the web corpus, and wherein the n-gram index is further indicative of a plurality of entities of each n-gram and a frequency of each n-gram, and wherein the semantic graph is rooted by a predefined seed entity and includes a first plurality of related entities, wherein each of the first plurality of related entities is grammatically related to the seed entity and each of the first plurality of related entities is included in a corresponding n-gram of the web corpus that also includes the seed entity, wherein generating the semantic graph comprises:
retrieving a first plurality of n-grams from the web corpus using the n-gram index, wherein each of the first plurality of n-grams includes the seed entity;
tagging each entity of the first plurality of n-grams for part-of-speech; and
identifying a grammatical relationship between the seed entity and each of the first plurality of related entities in response to tagging each entity, wherein each of the first plurality of related entities is included in the first plurality of n-grams;
analyzing, by the computing device, the web corpus using the semantic graph to identify and rank contextual synonyms for entities within a domain, wherein the semantic graph is further expanded using the ranked contextual synonyms;
analyzing, by the computing device, the web corpus using the semantic graph to identify intents and intent patterns in the domain, wherein each intent is associated with a domain action, and each intent pattern matches query features and a corresponding intent; and
analyzing, by the computing device, the web corpus using the semantic graph to identify slots, slot patterns, and slot values in the domain, wherein each slot is associated with a parameter of an intent or an entity, each slot pattern matches query features and a corresponding slot, and each slot value is associated with an entity.
Dependent claims: 14, 15, 16, 17, 18
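The claim does not say how contextual synonyms are scored, so the following is only one plausible sketch of the "identify and rank" step. It assumes that two entities are candidate synonyms when they fill the same position in otherwise identical n-grams, and it scores a candidate by the overlap in frequency with the seed across those shared contexts. The data, the function name `rank_contextual_synonyms`, and the min-overlap score are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical n-gram index: n-gram -> frequency (invented numbers).
NGRAMS = {
    ("watch", "a", "movie"): 120,
    ("watch", "a", "film"): 60,
    ("rent", "a", "movie"): 45,
    ("rent", "a", "film"): 20,
    ("rent", "a", "car"): 50,
    ("read", "a", "book"): 80,
}


def rank_contextual_synonyms(seed, ngrams):
    """Rank entities that occur in the same n-gram contexts as `seed`."""
    # A context is an n-gram with one position blanked out, e.g. ("watch", "a", "_").
    contexts = defaultdict(dict)          # context -> {filler: frequency}
    for gram, count in ngrams.items():
        for i, tok in enumerate(gram):
            ctx = gram[:i] + ("_",) + gram[i + 1:]
            contexts[ctx][tok] = contexts[ctx].get(tok, 0) + count

    scores = defaultdict(int)
    for fillers in contexts.values():
        if seed not in fillers:
            continue
        for tok, count in fillers.items():
            if tok != seed:
                # Assumed score: frequency mass shared with the seed in this context.
                scores[tok] += min(count, fillers[seed])
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


ranked = rank_contextual_synonyms("movie", NGRAMS)
print(ranked)
```

In this toy data "car" shares one context with "movie" but scores lower than "film", which shares two. That gap is the reason for ranking candidates rather than accepting every entity that shares a context.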
19. One or more non-transitory, computer-readable storage media comprising a plurality of instructions that in response to being executed cause a computing device to:
generate a semantic graph of a web corpus using an n-gram index of the web corpus, wherein the web corpus includes a plurality of entities, wherein the n-gram index is indicative of a plurality of n-grams, wherein each n-gram comprises a predetermined number n of consecutive entities in the web corpus, and wherein the n-gram index is indicative of a plurality of entities of each n-gram and a frequency of each n-gram, and wherein the semantic graph is rooted by a predefined seed entity and includes a first plurality of related entities, wherein each of the first plurality of related entities is grammatically related to the seed entity and each of the first plurality of related entities is included in a corresponding n-gram of the web corpus that also includes the seed entity, wherein to generate the semantic graph comprises to:
retrieve a first plurality of n-grams from the web corpus using the n-gram index, wherein each of the first plurality of n-grams includes the seed entity;
tag each entity of the first plurality of n-grams for part-of-speech; and
identify a grammatical relationship between the seed entity and each of the first plurality of related entities in response to tagging each entity, wherein each of the first plurality of related entities is included in the first plurality of n-grams;
analyze the web corpus using the semantic graph to identify and rank contextual synonyms for entities within a domain, wherein the semantic graph is further expanded using the ranked contextual synonyms;
analyze the web corpus using the semantic graph to identify intents and intent patterns in the domain, wherein each intent is associated with a domain action, and each intent pattern matches query features and a corresponding intent; and
analyze the web corpus using the semantic graph to identify slots, slot patterns, and slot values in the domain, wherein each slot is associated with a parameter of an intent or an entity, each slot pattern matches query features and a corresponding slot, and each slot value is associated with an entity.
Dependent claims: 20, 21, 22, 23, 24, 25
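To make the end product of intent and slot discovery more tangible, the sketch below shows how a finished domain model might be applied to a user query: intent patterns map query features to an intent, and slot patterns map query features to a slot and its value. The regular expressions, the `DOMAIN_MODEL` layout, and the `interpret` function are hypothetical and hand-written; the claims describe discovering such patterns from the web corpus, which this example does not attempt.

```python
import re

# Hypothetical, hand-written domain model for a "movie" domain.
DOMAIN_MODEL = {
    "intents": {
        # intent name -> intent patterns (each intent corresponds to a domain action)
        "watch_movie": [re.compile(r"\b(watch|see|stream)\b.*\b(movie|film)\b")],
        "rent_movie": [re.compile(r"\brent\b.*\b(movie|film)\b")],
    },
    "slots": {
        # slot name -> slot patterns; group 1 captures the slot value
        "genre": [re.compile(r"\b(scary|funny|romantic)\b")],
        "time": [re.compile(r"\b(tonight|tomorrow)\b")],
    },
}


def interpret(query, model):
    """Return (intent or None, {slot: value}) for a query, using simple pattern matching."""
    text = query.lower()
    intent = next(
        (name for name, patterns in model["intents"].items()
         if any(p.search(text) for p in patterns)),
        None,
    )
    slots = {}
    for name, patterns in model["slots"].items():
        for pattern in patterns:
            match = pattern.search(text)
            if match:
                slots[name] = match.group(1)
                break
    return intent, slots


print(interpret("I want to watch a scary movie tonight", DOMAIN_MODEL))
print(interpret("rent a film", DOMAIN_MODEL))
```

Regular expressions are used only because they are compact; the "query features" in the claims could equally be tagged tokens or graph nodes.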
Specification