Corpus search systems and methods

  • US 10,102,274 B2
  • Filed: 03/17/2014
  • Issued: 10/16/2018
  • Est. Priority Date: 03/17/2014
  • Status: Active Grant
First Claim

1. A computer-implemented method for searching a corpus of texts relating to a domain of knowledge, the method comprising:

    determining, by said computer, a multiplicity of noun-pair proximity scores measuring associations between pairs of nouns that appear in said corpus of texts and that are semantically related to said domain of knowledge;

    obtaining, by said computer, a search term related to said domain of knowledge;

    identifying, by said computer based at least in part on said multiplicity of noun-pair proximity scores, a related noun that is strongly associated with said search term within said corpus of texts;

    selecting, by said computer, a plurality of texts from said corpus of texts, wherein, in each of said plurality of texts, said search term and said related noun appear near each other in at least one place; and

    providing, by said computer, data associated with said plurality of texts for presentation as search results; and

    wherein determining said multiplicity of noun-pair proximity scores comprises, for a given text of said corpus of texts:

    parsing said given text to identify an independent clause that appears in said given text and that includes at least a first noun and a second noun;

    determining a measure of intra-clause proximity based at least in part on said first noun's relationship to said second noun within said independent clause; and

    assigning said determined measure of intra-clause proximity to a noun-pair-score data structure corresponding to said first noun and said second noun.
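
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and every name in it is hypothetical. It assumes that a small fixed vocabulary (NOUNS) stands in for real part-of-speech tagging, that splitting on sentence punctuation and semicolons approximates parsing independent clauses, that intra-clause proximity is the reciprocal of the token distance between two nouns, and that "near each other" means within a fixed token window. The claim itself does not specify any of these choices.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical domain vocabulary; stands in for a real part-of-speech tagger.
NOUNS = {"engine", "piston", "valve", "fuel", "garden"}

def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def split_clauses(text):
    """Crude stand-in for a parser that identifies independent clauses."""
    return [c for c in re.split(r"[.;!?]", text) if c.strip()]

def noun_pair_scores(corpus):
    """Accumulate an intra-clause proximity score for each pair of nouns."""
    scores = defaultdict(float)  # the noun-pair-score data structure
    for text in corpus:
        for clause in split_clauses(text):
            tokens = tokenize(clause)
            positions = [(i, t) for i, t in enumerate(tokens) if t in NOUNS]
            for (i, a), (j, b) in combinations(positions, 2):
                if a != b:
                    # Assumed measure: closer nouns contribute a higher score.
                    scores[frozenset((a, b))] += 1.0 / (j - i)
    return scores

def related_noun(term, scores):
    """Return the noun with the highest pair score against the search term."""
    candidates = {}
    for pair, score in scores.items():
        if term in pair:
            (other,) = pair - {term}
            candidates[other] = score
    return max(candidates, key=candidates.get) if candidates else None

def select_texts(corpus, term, related, window=5):
    """Keep texts where the two nouns occur within `window` tokens of each other."""
    hits = []
    for text in corpus:
        tokens = tokenize(text)
        term_pos = [i for i, t in enumerate(tokens) if t == term]
        rel_pos = [i for i, t in enumerate(tokens) if t == related]
        if any(abs(i - j) <= window for i in term_pos for j in rel_pos):
            hits.append(text)
    return hits

def search(corpus, term, window=5):
    """Run the full flow: score pairs, pick a related noun, select texts."""
    scores = noun_pair_scores(corpus)
    related = related_noun(term, scores)
    if related is None:
        return None, []
    return related, select_texts(corpus, term, related, window)
```

Calling search(corpus, "engine") on a list of strings returns the related noun together with the matching texts. A production system would presumably replace the vocabulary lookup and the punctuation-based clause splitter with a real tagger and parser.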
