Semantic exploration and discovery
First Claim
1. A method for exploring and organizing a first electronic corpus of documents stored in a computer storage medium, the method comprising the steps of:
performing at least one of reviewing the text of the documents from the first electronic corpus of documents in a concordance form, collecting terms from the first electronic corpus of documents in order to build semantically related terms, or collecting documents from the first electronic corpus of documents in order to build semantically related document clusters;
creating a first set, the first set having at least one category applying to at least one of the words and phrases in gazetteers, or at least one document in the semantically related document clusters;
creating a second set, the second set having at least one of a candidate document cluster or a candidate words and phrases list;
evaluating the second set based upon a set of predetermined factors in order to create a third set, where the third set includes at least one document semantically related to the candidate clusters or at least one semantically related word and phrase related to the candidate words and phrases that meet at least one of the predetermined factors; and
selectively substituting the third set for the first set in a subsequent iteration of the method for exploring.
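The iterative loop recited in claim 1 (seed set, candidate set, evaluation against predetermined factors, substitution for the next iteration) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: terms are represented by co-occurrence context vectors, the only "predetermined factor" is a cosine-similarity threshold, and the third set is taken to be the current set plus the accepted candidates. The names `context_vectors`, `cosine`, and `explore` are hypothetical.

```python
from collections import Counter
from math import sqrt


def context_vectors(docs, window=2):
    """Map each lower-cased token to a Counter of the tokens seen within `window` positions of it."""
    vectors = {}
    for doc in docs:
        tokens = doc.lower().split()
        for i, tok in enumerate(tokens):
            ctx = vectors.setdefault(tok, Counter())
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    ctx[tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def explore(docs, seeds, threshold=0.8, max_iterations=5):
    """Hypothetical helper: iteratively grow a term list from `seeds`."""
    vectors = context_vectors(docs)
    first_set = set(seeds)
    for _ in range(max_iterations):
        # Second set: candidate terms from the corpus that are not yet in the first set.
        second_set = set(vectors) - first_set
        # Assumed predetermined factor: best similarity to any current member >= threshold.
        accepted = {
            term for term in second_set
            if max((cosine(vectors[term], vectors[s]) for s in first_set if s in vectors),
                   default=0.0) >= threshold
        }
        if not accepted:
            break  # nothing new met the factor, so the first set is kept as-is
        # Assumed third set: current members plus accepted candidates; it replaces the first set.
        first_set = first_set | accepted
    return first_set


docs = ["the cat sat on the mat", "the dog sat on the mat"]
print(sorted(explore(docs, {"cat"}, threshold=0.9)))  # ['cat', 'dog']
```

Stopping when no candidate is accepted is one way to read "selectively substituting"; a real system would likely let the user accept or reject candidates before each substitution.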
2 Assignments
0 Petitions
Abstract
A semantic discovery and exploration system is disclosed that provides an environment enabling a developer or user to uncover, navigate, and organize semantic patterns and structures in a document collection, with or without the aid of structured knowledge. The system provides techniques for searching document collections, categorizing documents, inducing lists of related concepts, and identifying clusters of related terms and documents. It operates both with and without infusions of structured knowledge such as gazetteers, thesauruses, taxonomies, and ontologies; its performance improves when such structured knowledge is incorporated. The semantic discovery and exploration system may be used as a first step in developing an information extraction system, for example to categorize or cluster documents in a particular domain or to develop gazetteers, and as part of a deployed run-time information extraction system. It may also be used as a standalone utility for searching, navigating, and organizing document collections and structured knowledge bases such as dictionaries or domain-specific reference works.
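The abstract's "identifying clusters of related ... documents" can be illustrated with a small sketch. This is an assumption-laden example, not the disclosed technique: documents are bag-of-words vectors, similarity is cosine, and clustering is single-link with a fixed threshold, implemented with union-find. The names `tf_vector`, `cosine`, and `cluster_documents` are hypothetical.

```python
from collections import Counter
from math import sqrt


def tf_vector(text):
    """Bag-of-words term-frequency vector for one document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_documents(docs, threshold=0.5):
    """Single-link clustering: any pair with similarity >= threshold joins the same cluster."""
    vectors = [tf_vector(d) for d in docs]
    parent = list(range(len(docs)))  # union-find forest over document indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(docs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


docs = [
    "protein binds receptor",
    "receptor binds protein ligand",
    "stock prices fell sharply",
    "stock prices rose",
]
print(cluster_documents(docs))  # [[0, 1], [2, 3]]
```

Single-link clustering is chosen here only for brevity; it can chain loosely related documents together, so a production system would more likely use weighted terms and a different linkage.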
25 Claims
1. A method for exploring and organizing a first electronic corpus of documents stored in a computer storage medium, the method comprising the steps of:
performing at least one of reviewing the text of the documents from the first electronic corpus of documents in a concordance form, collecting terms from the first electronic corpus of documents in order to build semantically related terms, or collecting documents from the first electronic corpus of documents in order to build semantically related document clusters;
creating a first set, the first set having at least one category applying to at least one of the words and phrases in gazetteers, or at least one document in the semantically related document clusters;
creating a second set, the second set having at least one of a candidate document cluster or a candidate words and phrases list;
evaluating the second set based upon a set of predetermined factors in order to create a third set, where the third set includes at least one document semantically related to the candidate clusters or at least one semantically related word and phrase related to the candidate words and phrases that meet at least one of the predetermined factors; and
selectively substituting the third set for the first set in a subsequent iteration of the method for exploring.
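The first step of claim 1 mentions reviewing document text "in a concordance form". A common concordance layout is keyword-in-context (KWIC), sketched below as an illustration only; the helper names `kwic` and `format_kwic`, the whitespace tokenization, and the punctuation stripping are assumptions, not details from the claim.

```python
def kwic(docs, keyword, width=3):
    """Return (left context, hit, right context) for each occurrence of `keyword`."""
    hits = []
    for doc in docs:
        tokens = doc.split()
        for i, tok in enumerate(tokens):
            # Compare case-insensitively, ignoring surrounding punctuation.
            if tok.lower().strip(".,;:!?\"'()") == keyword.lower():
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((left, tok, right))
    return hits


def format_kwic(hits, pad=30):
    """Right-align the left context so every keyword lines up in one column."""
    return [f"{left:>{pad}} [{hit}] {right}" for left, hit, right in hits]


docs = ["The enzyme binds the receptor.", "A receptor antagonist blocks binding."]
print(kwic(docs, "receptor", width=2))
# [('binds the', 'receptor.', ''), ('A', 'receptor', 'antagonist blocks')]
for line in format_kwic(kwic(docs, "receptor", width=2)):
    print(line)
```

Aligning every hit in one column is what makes a concordance useful for spotting recurring left and right contexts, which in turn suggests candidate related terms.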
22. A computer system for exploring and organizing an electronic corpus of documents stored in a computer storage medium, the computer system comprising:
an exploration resource module for managing the semantic exploration process;
an exploration engine in communication with the exploration resource module;
a document management module in communication with the exploration resource module, the document management module for feeding document sets into the exploration resource module;
an ontology management module in communication with the exploration resource module, the ontology management module for feeding and receiving sets of ontologies to and from the exploration resource module;
a resource definitions database in communication with the exploration resource module, the resource definitions database for feeding and receiving sets of definitions into and from the exploration resource module;
a query manager module in communication with the exploration engine, the query manager module having access to a query definitions database and a query results database, the query manager module for feeding and receiving query information to and from the exploration engine; and
an exploration interface in communication with the exploration engine, the exploration interface for displaying information related to the semantic exploration process.
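One way to picture how the modules of claim 22 connect is the skeleton below. It is a hypothetical mapping, not the claimed system: each claimed module becomes a small Python class, the databases are replaced by in-memory dictionaries, the resource definitions database is folded into a `definitions` dict on the resource module, and the engine's keyword filter is only a placeholder for real semantic exploration.

```python
from dataclasses import dataclass, field


@dataclass
class ExplorationResourceModule:
    """Holds the resources being explored: documents, ontologies, and resource definitions."""
    documents: list = field(default_factory=list)
    ontologies: dict = field(default_factory=dict)
    definitions: dict = field(default_factory=dict)  # stands in for the resource definitions database


class DocumentManager:
    """Feeds document sets into the resource module."""
    def __init__(self, resources):
        self.resources = resources

    def feed(self, docs):
        self.resources.documents.extend(docs)


class OntologyManager:
    """Sends ontologies (here: name -> list of terms) to, and reads them from, the resource module."""
    def __init__(self, resources):
        self.resources = resources

    def put(self, name, terms):
        self.resources.ontologies[name] = list(terms)

    def get(self, name):
        return self.resources.ontologies.get(name, [])


class ExplorationEngine:
    """Runs queries against the resources; a whole-word keyword filter as a placeholder."""
    def __init__(self, resources):
        self.resources = resources

    def run(self, keyword):
        k = keyword.lower()
        return [d for d in self.resources.documents if k in d.lower().split()]


class QueryManager:
    """Stores query definitions and results and exchanges queries with the engine."""
    def __init__(self, engine):
        self.engine = engine
        self.query_definitions = {}  # stands in for the query definitions database
        self.query_results = {}      # stands in for the query results database

    def define(self, name, keyword):
        self.query_definitions[name] = keyword

    def execute(self, name):
        results = self.engine.run(self.query_definitions[name])
        self.query_results[name] = results
        return results


class ExplorationInterface:
    """Displays exploration output; here it only formats result lines."""
    def render(self, name, results):
        return [f"{name}: {doc}" for doc in results]


resources = ExplorationResourceModule()
DocumentManager(resources).feed(["Aspirin inhibits COX enzymes", "The stock market fell"])
queries = QueryManager(ExplorationEngine(resources))
queries.define("drugs", "aspirin")
print(ExplorationInterface().render("drugs", queries.execute("drugs")))
# ['drugs: Aspirin inhibits COX enzymes']
```

Passing one shared resource object to each manager mirrors the claim's "in communication with the exploration resource module" wording; a deployed system would more plausibly use persistent stores and message passing between services.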
Specification