Word association method and apparatus
Abstract
A method for creating and using a cross-idea association database that includes a method for associating words and word strings in a language by analyzing word formations around a word or word string to identify other words or word strings that are equivalents or near equivalents semantically. One method for associating words and word strings includes querying a collection of documents with a user-supplied word or word string, determining a user-defined amount of words or word strings to the left and right of the query string, determining the frequency of occurrence of words or word strings located on the left and right of the query string, and ranking the located words.
137 Citations
4 Claims
1. A method for associating words and word strings in a language comprising:
providing a collection of documents, wherein said collection includes at least one document;
receiving from a user a word or word string query to be analyzed;
searching said collection of documents for the query to be analyzed and returning documents containing the query to be analyzed;
determining a user-defined amount of words or word strings or both to the left of said query to be analyzed in said returned documents based on their frequency and creating a Left Signature List comprising said word or word strings to the left of said query to be analyzed in said returned documents;
determining a user-defined amount of words or word strings or both to the right of said words or word strings comprising said Left Signature List and creating Left Anchor Lists comprising said word or word strings to the right of said Left Signature Lists based on their frequency in a collection of documents;
determining a user-defined number of words or word strings or both to the right of said query to be analyzed in said returned documents and creating a Right Signature List comprising said word or word strings to the right of said query to be analyzed in said returned documents based on their frequency;
determining a user-defined number of words or word strings or both to the left of said word or word strings comprising said Right Signature List and creating Right Anchor Lists comprising said word or word strings to the left of said Right Signature List based on their frequency; and
ranking results based on the frequency of each word or word string occurring in said Left Anchor Lists and the frequency of said word or word string occurring in said Right Anchor Lists.
Dependent claims: 2, 3
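The steps of claim 1 can be sketched in Python as follows. This is only an illustrative reading of the claim, not the patentee's implementation. The function names (`occurrences`, `neighbors`, `anchor_counts`, `associate`), the whitespace tokenization, the use of a single `width` for both signature and anchor strings, and the ranking by the product of the Left and Right Anchor List frequencies are all assumptions; the claim itself only says that the ranking is based on both frequencies.

```python
from collections import Counter

def occurrences(tokens, phrase):
    """Return start indices where the token list `phrase` occurs in `tokens`."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]

def neighbors(docs, phrase, side, width):
    """Count the `width`-token strings directly left or right of `phrase`."""
    counts = Counter()
    for tokens in docs:
        for i in occurrences(tokens, phrase):
            if side == "left":
                ctx = tokens[max(0, i - width):i]
            else:
                ctx = tokens[i + len(phrase):i + len(phrase) + width]
            if len(ctx) == width:  # skip contexts cut off by a document edge
                counts[" ".join(ctx)] += 1
    return counts

def anchor_counts(docs, signatures, side, width, top_n):
    """Build one anchor list per signature and sum their frequencies."""
    total = Counter()
    for sig in signatures:
        per_sig = neighbors(docs, sig.split(), side, width)
        total.update(dict(per_sig.most_common(top_n)))
    return total

def associate(documents, query, top_n=3, width=1):
    """Hypothetical end-to-end sketch of claim 1 (assumed scoring: product)."""
    docs = [d.lower().split() for d in documents]
    q = query.lower().split()
    # Search the collection and keep only documents containing the query.
    returned = [d for d in docs if occurrences(d, q)]
    # Left and Right Signature Lists: most frequent neighbors of the query.
    left_sig = [s for s, _ in neighbors(returned, q, "left", width).most_common(top_n)]
    right_sig = [s for s, _ in neighbors(returned, q, "right", width).most_common(top_n)]
    # Left Anchor Lists: strings to the right of each left signature.
    left_anchor = anchor_counts(docs, left_sig, "right", width, top_n)
    # Right Anchor Lists: strings to the left of each right signature.
    right_anchor = anchor_counts(docs, right_sig, "left", width, top_n)
    # Rank candidates found in both anchor lists, excluding the query itself.
    scores = {w: left_anchor[w] * right_anchor[w]
              for w in left_anchor if w in right_anchor and w != " ".join(q)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

documents = [
    "the red car drove fast",
    "the red truck drove fast",
    "a red car stopped",
    "a blue truck drove away",
]
print(associate(documents, "car"))  # [('truck', 2)]
```

In this toy collection, "truck" is returned for the query "car" because it follows the left signature "red" and precedes the right signature "drove", which is the kind of shared context the claim relies on.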
4. A method for associating words and word strings in a language comprising:
providing a collection of documents, wherein said collection includes at least one document;
receiving from a user a word or word string query to be analyzed;
searching said collection of documents for the query to be analyzed and returning documents containing the query to be analyzed;
determining a user-defined amount and size of words or word strings or both to the left and right of the query in said returned documents containing the query to be analyzed;
returning a list with an entry or plurality of entries, wherein said entry or said plurality of entries contain said determined amount of words to the left and right of the query in said returned documents;
searching said collection of documents for said entry or plurality of entries in said returned list; and
returning a list of words or word strings or both that occur most frequently between said determined amount of words to the left and right of said query in said returned documents.
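One possible reading of claim 4 is sketched below in Python: collect the left and right context pairs around the query, then search the collection for those pairs and count what appears between them. The function name `fillers_between`, the whitespace tokenization, the `max_gap` limit on how many tokens may sit between a pair, and the choice to drop the query itself from the output are assumptions made for illustration and are not stated in the claim.

```python
from collections import Counter

def fillers_between(documents, query, width=1, max_gap=3):
    """Count strings that occur between the query's left/right contexts."""
    docs = [d.lower().split() for d in documents]
    q = query.lower().split()
    n = len(q)

    # Collect (left, right) context pairs of `width` tokens around the query.
    frames = set()
    for t in docs:
        for i in range(width, len(t) - n - width + 1):
            if t[i:i + n] == q:
                frames.add((tuple(t[i - width:i]), tuple(t[i + n:i + n + width])))

    # Search the collection for each pair and count what lies between.
    counts = Counter()
    for t in docs:
        for left, right in frames:
            for i in range(len(t) - width + 1):
                if tuple(t[i:i + width]) != left:
                    continue
                start = i + width
                for gap in range(1, max_gap + 1):
                    end = start + gap
                    if tuple(t[end:end + width]) == right:
                        counts[" ".join(t[start:end])] += 1

    counts.pop(" ".join(q), None)  # assumed: the query itself is not reported
    return counts.most_common()

documents = [
    "the red car drove fast",
    "the red truck drove fast",
    "a red car stopped",
    "a blue truck drove away",
]
print(fillers_between(documents, "car"))  # [('truck', 1)]
```

Here the pair ("red", "drove") taken from around "car" also encloses "truck" in the second document, so "truck" is returned as a candidate association.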
Specification