Automated ontology development
Abstract
Systems and methods of automated ontology development include a corpus of communication data. The corpus of communication data includes communication data from a plurality of interactions and is processed. A plurality of terms are extracted from the corpus. Each term of the plurality is a plurality of words that identify a single concept within the corpus. An ontology is automatedly generated from the extracted terms.
20 Claims
1. A method of automated ontology development via a computer system for processing communication data, wherein the ontology is a structural representation of language elements and the relationships between those language elements within a domain stored in memory of the computer system, the method comprising:
processing a corpus of communication data by the computer system, the corpus comprising communication data from a plurality of interactions from multiple platforms;
extracting a plurality of terms from the corpus by the computer system, wherein each term of the plurality is a plurality of words that identify a single concept within the corpus;
automatedly generating an ontology by the computer system from the extracted terms by at least creating two context vectors for each of the plurality of terms and comparing the context vectors for each of the plurality of terms to one another to categorize the terms into a plurality of relations, wherein a first of the two context vectors of a given term is a first list of terms that predicts terms that will appear to the left of the given term, wherein a second of the two context vectors is a second list of terms that predicts terms that will appear to the right of the given term, wherein each of the context vectors includes up to a predetermined number of potential terms in the first or second list of terms taken in descending order based on a calculated score predictive of the likelihood that a specific term will appear adjacent to a given one of the plurality of terms within the meaning units; and
storing the automatedly generated ontology in an ontology database in the memory of the computer system.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 17, 18
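The left and right context vectors recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not define the "calculated score", so relative adjacency frequency is used here as an assumed stand-in, and the names `build_context_vectors`, `similarity`, and `max_len` are hypothetical. The comparison step is likewise assumed to be a simple overlap (Jaccard) measure.

```python
from collections import Counter, defaultdict

def build_context_vectors(meaning_units, max_len=3):
    """Return {term: (left_vector, right_vector)}.

    meaning_units: list of term sequences, one per meaning unit.
    Each vector is a list of (neighbor, score) pairs in descending
    score order, truncated to max_len entries.
    """
    left = defaultdict(Counter)   # term -> counts of terms seen immediately to its left
    right = defaultdict(Counter)  # term -> counts of terms seen immediately to its right
    for unit in meaning_units:
        for prev, cur in zip(unit, unit[1:]):
            right[prev][cur] += 1
            left[cur][prev] += 1

    def top(counter):
        # Assumed score: relative frequency of adjacency (not from the patent).
        total = sum(counter.values())
        ranked = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        return [(term, count / total) for term, count in ranked[:max_len]]

    terms = set(left) | set(right)
    return {t: (top(left[t]), top(right[t])) for t in terms}

def similarity(vec_a, vec_b):
    """Assumed comparison: mean Jaccard overlap of the left and right lists."""
    def jaccard(x, y):
        sx, sy = {t for t, _ in x}, {t for t, _ in y}
        return len(sx & sy) / len(sx | sy) if sx | sy else 0.0
    (left_a, right_a), (left_b, right_b) = vec_a, vec_b
    return (jaccard(left_a, left_b) + jaccard(right_a, right_b)) / 2

units = [["pay", "bill"], ["pay", "invoice"], ["cancel", "bill"], ["cancel", "invoice"]]
vectors = build_context_vectors(units)
score = similarity(vectors["bill"], vectors["invoice"])
```

In this toy corpus "bill" and "invoice" share identical left contexts, so a relation between them would be a plausible candidate under this assumed measure.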
10. A method of automated ontology development via a computer system for processing communication data, wherein the ontology is a structural representation of language elements and the relationships between those language elements within a domain stored in memory of the computer system, the method comprising:
processing a corpus of communication data by the computer system, the corpus comprising communication data from a plurality of interactions from multiple platforms, by zoning the communication data to segment the communication data into a plurality of meaning units;
extracting a plurality of terms from each of the plurality of meaning units by the computer system, wherein each term of the plurality is a plurality of words that identify a single concept within the corpus, wherein words in a given one of the meaning units are assigned to only one term;
automatedly generating an ontology by the computer system that comprises the extracted terms by at least creating two context vectors for each of the plurality of terms and comparing the context vectors for each of the plurality of terms to one another to categorize the terms into a plurality of relations, wherein a first of the two context vectors of a given term is a first list of terms that predicts terms that will appear to the left of the given term, wherein a second of the two context vectors is a second list of terms that predicts terms that will appear to the right of the given term, wherein each of the context vectors includes up to a predetermined number of potential terms in the first or second list of terms taken in descending order based on a calculated score predictive of the likelihood that a specific term will appear adjacent to a given one of the plurality of terms within the meaning units; and
storing the automatedly generated ontology in an ontology database.
Dependent claims: 11, 12, 13, 19, 20
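The zoning and per-meaning-unit extraction steps of claim 10 can be sketched as follows. This is an illustration under stated assumptions, not the claimed method: a meaning unit is approximated by a sentence-like span split on terminal punctuation, and terms are found by greedy longest match against a hypothetical set `known_terms`. The functions `zone` and `extract_terms` and the parameter `max_words` are invented for this example. The greedy scan consumes matched words, which is one simple way to ensure each word is assigned to only one term.

```python
import re

def zone(text):
    # Assumption: a meaning unit is approximated by a sentence-like span.
    return [u.strip() for u in re.split(r"[.?!]+", text) if u.strip()]

def extract_terms(unit, known_terms, max_words=3):
    """Greedy longest-match extraction of multi-word terms from one meaning unit.

    Words consumed by a match are skipped, so no word belongs to two terms.
    """
    words = unit.lower().split()
    terms, i = [], 0
    while i < len(words):
        # Try the longest candidate first; terms have at least two words.
        for n in range(min(max_words, len(words) - i), 1, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in known_terms:
                terms.append(candidate)
                i += n  # consume the matched words
                break
        else:
            i += 1  # no term starts here; move to the next word
    return terms

text = "I want to cancel my credit card. Please send a new credit card statement!"
known_terms = {"credit card", "new credit card", "credit card statement"}
units = zone(text)
extracted = [extract_terms(u, known_terms) for u in units]
```

In the second meaning unit, "new credit card" is matched first, so "credit card statement" is not also extracted: the words "credit" and "card" are already assigned.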
14. A system for automated ontology development, wherein the ontology is a structural representation of language elements and the relationships between those language elements within a domain stored in memory, the system comprising:
a communication data database populated with communication data from interactions from multiple platforms;
a processor communicatively connected to the database of communication data and communicatively connected to a computer readable medium programmed with computer readable code that upon execution by the processor causes the processor to:
process a corpus of communication data received from the database through application of a rank filter to select the communication data from raw communication data, wherein the rank filter selects data files from the raw communication data that include a threshold of identified related terms to the domain of the ontology that is to be developed;
zone the communication data to segment the communication data into a plurality of meaning units;
extract a plurality of terms from the corpus on a meaning unit-by-meaning unit basis, wherein each term of the plurality is a plurality of words that identify a single concept within the corpus, wherein words in a given one of the meaning units are assigned to only one term; and
automatedly generate the ontology from the extracted terms by at least creating two context vectors for each of the plurality of terms and comparing the context vectors for each of the plurality of terms to one another to categorize the terms into a plurality of relations, wherein a first of the two context vectors of a given term is a first list of terms that predicts terms that will appear to the left of the given term, wherein a second of the two context vectors is a second list of terms that predicts terms that will appear to the right of the given term, wherein each of the context vectors includes up to a predetermined number of potential terms in the first or second list of terms taken in descending order based on a calculated score predictive of the likelihood that a specific term will appear adjacent to a given one of the plurality of terms within the meaning units; and
an ontology database upon which the processor stores the automatedly generated ontology.
Dependent claims: 15, 16
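The rank filter recited in claim 14 can be pictured with a short Python sketch. The claim does not say how the "threshold of identified related terms" is measured, so this example assumes it is a minimum count of distinct domain terms found in a file. The function `rank_filter`, its parameters, and the sample file names are hypothetical.

```python
def rank_filter(files, domain_terms, threshold=2):
    """Select files that mention at least `threshold` distinct domain terms.

    files: mapping of file name -> raw text.
    Returns the selected file names, most domain-relevant first.
    """
    selected = []
    for name, text in files.items():
        lowered = text.lower()
        # Assumed relevance measure: number of distinct domain terms present.
        hits = sum(1 for term in domain_terms if term in lowered)
        if hits >= threshold:
            selected.append((hits, name))
    # Rank by hit count (descending), then by name for a stable order.
    selected.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in selected]

raw_files = {
    "call_1.txt": "I need to pay my phone bill and update my data plan",
    "chat_7.txt": "What time does the store open?",
    "email_3.txt": "My data plan changed",
}
domain_terms = {"phone bill", "data plan", "late fee"}
corpus_files = rank_filter(raw_files, domain_terms, threshold=2)
```

Lowering `threshold` admits more raw files into the corpus at the cost of including less domain-specific material.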
Specification