Corpus clustering, confidence refinement, and ranking for geographic text search and information retrieval
Abstract
A computer-implemented method for processing a plurality of toponyms, the method involving: in a large corpus, identifying geo-textual correlations among readings of the toponyms within the plurality of toponyms; and for each toponym selected from the plurality of toponyms, using the identified geo-textual correlations to generate a value for a confidence that the selected toponym refers to a corresponding geographic location. Also a method of generating information useful for ranking a document that includes a plurality of toponyms for which there is a corresponding plurality of (toponym,place) pairs, there being associated with each (toponym,place) pair of said plurality of (toponym,place) pairs a corresponding value for a confidence that the toponym of that (toponym,place) pair refers to the place of that (toponym,place) pair. This further method includes, for a selected (toponym,place) pair of the plurality of (toponym,place) pairs, (1) determining if another toponym is present within the document that has an associated place that is geographically related to the place of the selected (toponym, place) pair; and (2) if a toponym is identified within the document that has an associated place that is geographically related to the place of the selected (toponym, place) pair, boosting the value of the confidence for the selected (toponym,place) pair.
18 Claims
1. A computer-implemented method for processing a plurality of toponyms, said method comprising:
in a large corpus, identifying geo-textual correlations among readings of the toponyms within the plurality of toponyms; and
for each toponym selected from the plurality of toponyms, using the identified geo-textual correlations to generate a value for a confidence that the selected toponym refers to a corresponding geographic location.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
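To make the two steps of claim 1 concrete, the following is a minimal sketch, not the method disclosed in the specification. It assumes a hypothetical gazetteer that maps each toponym to candidate readings (place, region), treats "geo-textual correlation" as co-occurrence in the same document with another toponym that has a reading in the same region, and turns the resulting counts into a per-reading confidence by simple normalization. The names GAZETTEER and reading_confidences, the region codes, and the smoothing parameter are all illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical gazetteer: toponym -> candidate readings as (place, region).
GAZETTEER = {
    "Cambridge": [("Cambridge, MA", "US-MA"), ("Cambridge, UK", "GB-ENG")],
    "Boston": [("Boston, MA", "US-MA"), ("Boston, UK", "GB-ENG")],
    "Somerville": [("Somerville, MA", "US-MA")],
    "London": [("London, UK", "GB-ENG")],
}


def reading_confidences(corpus, gazetteer, smoothing=1.0):
    """Return {(toponym, place): confidence} from corpus co-occurrence.

    corpus is a list of documents, each given as a list of toponym strings.
    A reading gains one unit of support for every other toponym in the same
    document that has some reading in the same region (an assumed notion of
    geo-textual correlation).
    """
    support = defaultdict(float)
    for doc_toponyms in corpus:
        names = set(doc_toponyms)
        for name in names:
            for place, region in gazetteer.get(name, []):
                for other in names - {name}:
                    if any(r == region for _, r in gazetteer.get(other, [])):
                        support[(name, place)] += 1.0

    # Normalize the smoothed support across the readings of each toponym.
    confidences = {}
    for name, readings in gazetteer.items():
        total = sum(support[(name, place)] + smoothing for place, _ in readings)
        for place, _ in readings:
            confidences[(name, place)] = (support[(name, place)] + smoothing) / total
    return confidences


corpus = [
    ["Cambridge", "Somerville"],
    ["Cambridge", "Somerville", "Boston"],
    ["Cambridge", "London"],
]
conf = reading_confidences(corpus, GAZETTEER)
```

In this toy corpus the Massachusetts reading of "Cambridge" ends up with a higher confidence than the UK reading, because it co-occurs more often with toponyms that have readings in the same region. A real system would likely use richer evidence than region equality.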
10. A computer-implemented method of generating information useful for ranking a document that includes a plurality of toponyms for which there is a corresponding plurality of (toponym,place) pairs, there being associated with each (toponym,place) pair of said plurality of (toponym,place) pairs a corresponding value for a confidence that the toponym of that (toponym,place) pair refers to the place of that (toponym,place) pair, said method comprising:
for a selected (toponym,place) pair of the plurality of (toponym,place) pairs, (1) determining if another toponym is present within the document that has an associated place that is geographically related to the place of the selected (toponym, place) pair; and
(2) if a toponym is identified within the document that has an associated place that is geographically related to the place of the selected (toponym, place) pair, boosting the value of the confidence for the selected (toponym,place) pair.
Dependent claims: 11, 12, 13, 14, 15, 16, 17
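Steps (1) and (2) of claim 10 can be illustrated with a short sketch under stated assumptions. The claim does not define "geographically related" or the size of the boost. Here two places are assumed to be related when they lie within a fixed great-circle distance of each other, and the boost is assumed to move the confidence part of the way toward 1.0. The function names, the record layout, and the radius_km and boost parameters are hypothetical.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def boost_confidences(pairs, radius_km=50.0, boost=0.5):
    """Return copies of the document's (toponym, place) records with boosted confidences.

    pairs is a list of dicts with keys 'toponym', 'place', 'coords', 'confidence'.
    A record is boosted when a different toponym in the same document has a
    place within radius_km (the assumed meaning of "geographically related").
    """
    boosted = []
    for i, pair in enumerate(pairs):
        related = any(
            j != i
            and other["toponym"] != pair["toponym"]
            and haversine_km(pair["coords"], other["coords"]) <= radius_km
            for j, other in enumerate(pairs)
        )
        confidence = pair["confidence"]
        if related:
            # Move part of the way toward 1.0 so the value stays in [0, 1].
            confidence = confidence + boost * (1.0 - confidence)
        boosted.append({**pair, "confidence": confidence})
    return boosted


document_pairs = [
    {"toponym": "Cambridge", "place": "Cambridge, MA", "coords": (42.3736, -71.1097), "confidence": 0.6},
    {"toponym": "Somerville", "place": "Somerville, MA", "coords": (42.3876, -71.0995), "confidence": 0.9},
    {"toponym": "Paris", "place": "Paris, TX", "coords": (33.6609, -95.5555), "confidence": 0.4},
]
result = boost_confidences(document_pairs)
```

With these inputs the Cambridge and Somerville records support each other and are boosted, while the Paris record has no nearby companion and keeps its original confidence.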
18. A method of evaluating relevance of a plurality of documents to a search query that includes both text and geographic place terms, said method comprising:
for a selected document among the plurality of documents, (1) computing a textual term relevance score corresponding to the text terms in the query;
(2) computing a geo-relevance score corresponding to the geographic terms in the query; and
(3) combining the computed textual term relevance score and the computed geo-relevance score to derive an overall relevance score for that document, wherein computing the geo-relevance for the selected document involves identifying a plurality of (toponym,place) pairs that is associated with the selected document, and for each identified (toponym,place) pair, obtaining and using a value for a confidence that the toponym of the (toponym,place) pair refers to the place.
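The three steps of claim 18 can be sketched as follows. This is an illustrative assumption, not the scoring disclosed in the specification: the textual score is taken to be the fraction of query terms present in the document, the geo-relevance score is the highest confidence among the document's (toponym, place) pairs whose place matches the query place, and the two are combined by a weighted sum. All function names, the document layout, and the text_weight parameter are hypothetical.

```python
def text_score(query_terms, doc_text):
    """Fraction of query terms that appear as words in the document (assumed measure)."""
    if not query_terms:
        return 0.0
    words = set(doc_text.lower().split())
    return sum(term.lower() in words for term in query_terms) / len(query_terms)


def geo_score(query_place, doc_pairs):
    """Highest confidence among the (toponym, place, confidence) triples matching the query place."""
    return max(
        (confidence for _toponym, place, confidence in doc_pairs if place == query_place),
        default=0.0,
    )


def overall_score(query_terms, query_place, doc, text_weight=0.5):
    """Weighted sum of the textual and geographic scores (assumed combination rule)."""
    t = text_score(query_terms, doc["text"])
    g = geo_score(query_place, doc["pairs"])
    return text_weight * t + (1.0 - text_weight) * g


doc_a = {
    "text": "pizza restaurants near Cambridge",
    "pairs": [("Cambridge", "Cambridge, MA", 0.8)],
}
doc_b = {
    "text": "pizza restaurants near Cambridge",
    "pairs": [("Cambridge", "Cambridge, UK", 0.8)],
}
score_a = overall_score(["pizza"], "Cambridge, MA", doc_a)
score_b = overall_score(["pizza"], "Cambridge, MA", doc_b)
```

Both documents match the text term equally, so the ranking is decided by the geo-relevance score: doc_a, whose toponym is resolved to the queried place, scores higher than doc_b.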
Specification