Corpus search systems and methods
First Claim
1. A computer-implemented method for searching a corpus of texts relating to a domain of knowledge, the method comprising:
determining, by said computer, a multiplicity of noun-pair proximity scores measuring associations between pairs of nouns that appear in said corpus of texts and that are semantically related to said domain of knowledge;
obtaining, by said computer, a search term related to said domain of knowledge;
identifying, by said computer based at least in part on said multiplicity of noun-pair proximity scores, a related noun that is strongly associated with said search term within said corpus of texts;
selecting, by said computer, a plurality of texts from said corpus of texts, wherein, in each of said plurality of texts, said search term and said related noun appear near each other in at least one place; and
providing, by said computer, data associated with said plurality of texts for presentation as search results; and
wherein determining said multiplicity of noun-pair proximity scores comprises, for a given text of said corpus of texts:
parsing said given text to identify an independent clause that appears in said given text and that includes at least a first noun and a second noun;
determining a measure of intra-clause proximity based at least in part on said first noun's relationship to said second noun within said independent clause; and
assigning said determined measure of intra-clause proximity to a noun-pair-score data structure corresponding to said first noun and said second noun.
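The scoring steps recited above can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the claim: clause boundaries are approximated by splitting on punctuation rather than by a real parser, nouns are recognized by membership in a hypothetical `domain_nouns` vocabulary, and the measure of intra-clause proximity is taken to be the reciprocal of the token distance between the two nouns. The claim itself leaves the parser, the measure, and the layout of the noun-pair-score data structure open.

```python
import re
from collections import defaultdict
from itertools import combinations


def split_clauses(text):
    """Crude stand-in for a parser: treat '.', ';', '!' and '?' as clause boundaries."""
    return [c.strip() for c in re.split(r"[.;!?]", text) if c.strip()]


def score_noun_pairs(texts, domain_nouns):
    """Accumulate an assumed proximity measure (1 / token distance) per noun pair.

    The returned dict plays the role of the noun-pair-score data structure;
    keys are alphabetically sorted (noun, noun) tuples.
    """
    scores = defaultdict(float)
    for text in texts:
        for clause in split_clauses(text):
            tokens = re.findall(r"[a-z]+", clause.lower())
            # Keep only tokens from the (hypothetical) domain vocabulary.
            positions = [(i, t) for i, t in enumerate(tokens) if t in domain_nouns]
            for (i, a), (j, b) in combinations(positions, 2):
                if a != b:
                    scores[tuple(sorted((a, b)))] += 1.0 / (j - i)
    return dict(scores)


texts = ["The pump feeds the valve. The valve controls pressure; the pump hums."]
scores = score_noun_pairs(texts, {"pump", "valve", "pressure"})
```

Because pairs are only formed inside a single clause, "pump" and "pressure" receive no score in this example even though both occur in the same text.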
Abstract
A corpus of texts relating to a domain of knowledge may be searched by determining noun-pair proximity scores measuring associations between pairs of nouns that appear in the corpus and that are semantically related to the domain of knowledge. When a search term is received, the noun-pair proximity scores may be used (at least in part) to identify one or more related nouns that are strongly associated with the search term within the corpus. One or more texts in which the search term and the related nouns appear near each other in one or more places may then be selected from the corpus. The selected texts may be categorized and/or clustered based on the related nouns before being returned for presentation as search results.
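The query-time flow summarized in the abstract can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the `scores` dict is assumed to hold precomputed noun-pair proximity scores keyed by alphabetically sorted noun tuples, "strongly associated" is approximated by taking the `top_k` highest-scoring partners, "near each other" is approximated by a fixed token `window`, and the categorization step is reduced to grouping matching texts under each related noun. All function and parameter names are hypothetical.

```python
import re


def related_nouns(scores, term, top_k=2):
    """Return up to top_k nouns with the highest pair score against term."""
    candidates = {}
    for (a, b), s in scores.items():
        if term == a:
            candidates[b] = s
        elif term == b:
            candidates[a] = s
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(candidates, key=lambda n: (-candidates[n], n))[:top_k]


def appear_near(text, term, noun, window=5):
    """True if term and noun occur within `window` tokens of each other."""
    tokens = re.findall(r"[a-z]+", text.lower())
    term_pos = [i for i, t in enumerate(tokens) if t == term]
    noun_pos = [i for i, t in enumerate(tokens) if t == noun]
    return any(abs(i - j) <= window for i in term_pos for j in noun_pos)


def search(corpus, scores, term, top_k=2, window=5):
    """Group ids of matching texts under each related noun."""
    results = {}
    for noun in related_nouns(scores, term, top_k):
        hits = [doc_id for doc_id, text in corpus.items()
                if appear_near(text, term, noun, window)]
        if hits:
            results[noun] = hits
    return results


corpus = {
    "a": "The pump feeds the valve.",
    "b": "Pressure rises when the valve closes.",
    "c": "The pump hums quietly.",
}
scores = {("pump", "valve"): 0.9, ("pressure", "valve"): 0.5, ("pressure", "pump"): 0.1}
results = search(corpus, scores, "valve")
```

Grouping the hits by related noun gives a simple form of the categorization the abstract mentions; a fuller system might instead cluster texts that share several related nouns.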
26 Citations
18 Claims
1. A computer-implemented method for searching a corpus of texts relating to a domain of knowledge, the method comprising:
determining, by said computer, a multiplicity of noun-pair proximity scores measuring associations between pairs of nouns that appear in said corpus of texts and that are semantically related to said domain of knowledge;
obtaining, by said computer, a search term related to said domain of knowledge;
identifying, by said computer based at least in part on said multiplicity of noun-pair proximity scores, a related noun that is strongly associated with said search term within said corpus of texts;
selecting, by said computer, a plurality of texts from said corpus of texts, wherein, in each of said plurality of texts, said search term and said related noun appear near each other in at least one place; and
providing, by said computer, data associated with said plurality of texts for presentation as search results; and
wherein determining said multiplicity of noun-pair proximity scores comprises, for a given text of said corpus of texts:
parsing said given text to identify an independent clause that appears in said given text and that includes at least a first noun and a second noun;
determining a measure of intra-clause proximity based at least in part on said first noun's relationship to said second noun within said independent clause; and
assigning said determined measure of intra-clause proximity to a noun-pair-score data structure corresponding to said first noun and said second noun.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A computing apparatus for searching a corpus of texts relating to a domain of knowledge, the apparatus comprising a processor and a memory storing instructions that, when executed by the processor, configure the apparatus to:
determine a multiplicity of noun-pair proximity scores measuring associations between pairs of nouns that appear in said corpus of texts and that are semantically related to said domain of knowledge;
obtain a search term related to said domain of knowledge;
identify, based at least in part on said multiplicity of noun-pair proximity scores, a related noun that is strongly associated with said search term within said corpus of texts;
select from said corpus of texts a plurality of texts, wherein, in each of said plurality of texts, said search term and said related noun appear near each other in at least one place; and
provide data associated with said plurality of texts for presentation as search results, and
wherein the instructions that configure the apparatus to determine said multiplicity of noun-pair proximity scores further comprise instructions configuring the apparatus to, for a given text of said corpus of texts:
parse said given text to identify an independent clause that appears in said given text and that includes at least a first noun and a second noun;
determine a measure of intra-clause proximity based at least in part on said first noun's relationship to said second noun within said independent clause; and
assign said determined measure of intra-clause proximity to a noun-pair-score data structure corresponding to said first noun and said second noun.
Dependent claims: 12, 13, 14
15. A non-transitory computer-readable storage medium having stored thereon instructions including instructions that, when executed by a processor, configure the processor to:
determine a multiplicity of noun-pair proximity scores measuring associations between pairs of nouns that appear in texts and that are semantically related to a domain of knowledge;
obtain a search term related to said domain of knowledge;
identify, based at least in part on said multiplicity of noun-pair proximity scores, a related noun that is strongly associated with said search term within a corpus of texts;
select from said corpus of texts a plurality of texts, wherein, in each of said plurality of texts, said search term and said related noun appear near each other in at least one place; and
provide data associated with said plurality of texts for presentation as search results, and
wherein the instructions that configure the processor to determine said multiplicity of noun-pair proximity scores further comprise instructions configuring the processor to, for a given text of said corpus of texts:
parse said given text to identify an independent clause that appears in said given text and that includes at least a first noun and a second noun;
determine a measure of intra-clause proximity based at least in part on said first noun's relationship to said second noun within said independent clause; and
assign said determined measure of intra-clause proximity to a noun-pair-score data structure corresponding to said first noun and said second noun.
Dependent claims: 16, 17, 18
Specification