Retrieval using a generalized sentence collocation
First Claim
1. A method in a computing device for searching for sentences, the method comprising:
providing a collection of sentences;
for each sentence in the collection, identifying by the computing device pairs of collocated words of the sentence;
for each identified pair of collocated words, identifying by the computing device a part of speech of each word of the pair;
generating a first part of speech and word pair that includes the identified part of speech of the first word and the second word and a second part of speech and word pair that includes the first word and the identified part of speech of the second word; and
generating a mapping from the first part of speech and word pair and the second part of speech and word pair for the sentence;
receiving an input query that includes a first word, a part of speech, and a second word;
identifying from the mapping sentences that include the first word of the input query collocated with a word with the part of speech of the input query and a word with the part of speech collocated with the second word of the input query; and
displaying to a user the identified sentences.
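The indexing and lookup steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that two words are "collocated" when they are adjacent (the claim does not fix a definition) and that sentences arrive already tagged with parts of speech. The names `SENTENCES`, `collocated_pairs`, `build_mapping`, and `search`, and the tag labels, are hypothetical.

```python
from collections import defaultdict

# Hypothetical pre-tagged collection: each sentence is a list of (word, POS) tokens.
SENTENCES = [
    [("she", "PRON"), ("plays", "VERB"), ("a", "DET"), ("key", "ADJ"), ("role", "NOUN")],
    [("he", "PRON"), ("plays", "VERB"), ("the", "DET"), ("piano", "NOUN")],
    [("a", "DET"), ("minor", "ADJ"), ("role", "NOUN"), ("suits", "VERB"), ("him", "PRON")],
]


def collocated_pairs(tagged_sentence):
    """Assumption: two words are collocated when they are adjacent."""
    return zip(tagged_sentence, tagged_sentence[1:])


def build_mapping(sentences):
    """Map each part-of-speech-and-word pair to the ids of sentences containing it."""
    mapping = defaultdict(set)
    for sentence_id, tagged in enumerate(sentences):
        for (word1, pos1), (word2, pos2) in collocated_pairs(tagged):
            # First pair: part of speech of the first word plus the second word.
            mapping[(("pos", pos1), ("word", word2))].add(sentence_id)
            # Second pair: the first word plus part of speech of the second word.
            mapping[(("word", word1), ("pos", pos2))].add(sentence_id)
    return mapping


def search(mapping, first_word, pos, second_word):
    """Return ids of sentences matching a query of the form: first_word, POS, second_word."""
    left = mapping.get((("word", first_word), ("pos", pos)), set())
    right = mapping.get((("pos", pos), ("word", second_word)), set())
    return sorted(left & right)


mapping = build_mapping(SENTENCES)
for sentence_id in search(mapping, "a", "ADJ", "role"):
    print(" ".join(word for word, _ in SENTENCES[sentence_id]))
```

With this sample data, the query ("a", "ADJ", "role") matches the first and third sentences, where "key" and "minor" fill the adjective slot. The intersection does not check that the same word fills the slot on both sides of the query; a stricter variant would store word positions in the mapping.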
Abstract
A method and system for identifying documents relevant to a query that specifies a part of speech is provided. A retrieval system receives from a user an input query that includes a word and a part of speech. Upon receiving an input query that includes a word and a part of speech, the retrieval system identifies documents with a sentence that includes that word collocated with a word that is used as that part of speech. The retrieval system displays to the user an indication of the identified documents.
14 Claims
1. A method in a computing device for searching for sentences, as recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer-readable storage medium that is not a signal containing instructions for controlling a computing device to identify documents relevant to a query, by a method comprising:
receiving from a user an input query that includes a word and a part of speech, the part of speech representing a wildcard for any word that is that part of speech;
identifying documents with a sentence that includes the word collocated with any word of that part of speech, the document being identified based on mappings from part of speech and word pairs to documents, the mappings generated by:
identifying collocated words of sentences of the documents; and
for each identified pair of collocated words of a sentence, identifying a part of speech of each word of the pair;
generating a first part of speech and word pair that includes the identified part of speech of the first word and the second word and a second part of speech and word pair that includes the first word and the identified part of speech of the second word; and
generating a mapping from the first part of speech and word pair and the second part of speech and word pair to the document that contains the sentence;
ranking the identified documents; and
displaying to the user the identified documents in order of their rankings.
Dependent claims: 9, 10
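The document-level variant in claim 8, where the part of speech acts as a wildcard and results are ranked, might be sketched as follows. The claim does not say how documents are ranked; this sketch assumes ranking by the number of matching collocation pairs, with ties broken by document id. Adjacency again stands in for collocation, and `DOCUMENTS`, `build_document_mapping`, and `search_documents` are hypothetical names.

```python
from collections import Counter, defaultdict

# Hypothetical pre-tagged documents: document id -> sentences of (word, POS) tokens.
DOCUMENTS = {
    "doc_a": [
        [("strong", "ADJ"), ("tea", "NOUN")],
        [("strong", "ADJ"), ("coffee", "NOUN")],
    ],
    "doc_b": [[("strong", "ADJ"), ("wind", "NOUN")]],
    "doc_c": [[("drink", "VERB"), ("tea", "NOUN")]],
}


def build_document_mapping(documents):
    """Map each part-of-speech-and-word pair to per-document match counts."""
    mapping = defaultdict(Counter)
    for doc_id, sentences in documents.items():
        for tagged in sentences:
            # Assumption: adjacent words count as collocated.
            for (word1, pos1), (word2, pos2) in zip(tagged, tagged[1:]):
                mapping[(("pos", pos1), ("word", word2))][doc_id] += 1
                mapping[(("word", word1), ("pos", pos2))][doc_id] += 1
    return mapping


def search_documents(mapping, word, pos):
    """Return document ids where `word` is collocated with any word tagged `pos`."""
    scores = Counter()
    # The wildcard may sit on either side of the query word.
    scores.update(mapping.get((("word", word), ("pos", pos)), {}))
    scores.update(mapping.get((("pos", pos), ("word", word)), {}))
    # Assumed ranking: more matching pairs first, then document id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


doc_mapping = build_document_mapping(DOCUMENTS)
print(search_documents(doc_mapping, "strong", "NOUN"))
```

Here the query "strong" plus any noun returns `doc_a` ahead of `doc_b`, because `doc_a` contains two matching pairs. Any other relevance score could replace the count without changing the mapping.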
11. A computing device for identifying sentences having words of a designated part of speech, comprising:
a component that inputs from a user a query having a word and a part of speech, the part of speech representing a wildcard for any word that is that part of speech;
a component that identifies sentences that have the word of the query collocated with any word used as the part of speech of the query, the sentences being identified based on mappings from part of speech and word pairs to sentences, the mappings having been generated by a component that:
identifies collocated words of sentences; and
for each identified pair of collocated words of a sentence, identifies a part of speech of each word of the pair;
generates a first part of speech and word pair that includes the identified part of speech of the first word and the second word and a second part of speech and word pair that includes the first word and the identified part of speech of the second word; and
generates a mapping from the first part of speech and word pair and the second part of speech and word pair to the sentence; and
a component that displays to the user the identified sentences.
Dependent claims: 12, 13, 14
Specification