Word Association Method and Apparatus
First Claim
1. A computer device including a processor, a memory coupled to the processor, and a program stored in the memory, wherein the computer is configured to execute the program and perform the steps of:
providing a collection of documents, wherein said collection includes at least one document;
receiving a word or word string query to be analyzed;
searching by a processor, said collection of documents for the query to be analyzed and returning documents containing the query to be analyzed;
determining a user-defined amount of words or word strings or both to the left of said query to be analyzed in said returned documents based on their frequency and creating a Left Signature List comprising each of said words and word strings to the left of said query to be analyzed in said returned documents;
searching said collection of documents for the words and word strings on the Left Signature List and returning documents containing said words or word strings on the Left Signature List;
determining a user-defined amount of words or word strings or both to the right of each of said words and word strings comprising said Left Signature List and creating a Left Anchor List comprising each of said words and word strings to the right of each of said words and word strings on the Left Signature List based on their frequency in a collection of documents;
determining a user-defined number of words or word strings or both to the right of said query to be analyzed in said returned documents and creating a Right Signature List comprising each of said words and word strings to the right of said query to be analyzed in said returned documents based on their frequency;
searching said collection of documents for each of said words and word strings on the Right Signature List and returning documents containing said words and word strings on the Right Signature List;
determining a user-defined number of words or word strings or both to the left of each of said words and word strings comprising said Right Signature List and creating a Right Anchor List comprising each of said words and word strings to the left of each of said words and word strings on the Right Signature List based on their frequency; and
ranking results based on the number of different Anchor Lists on which the result appears.
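The steps recited above can be sketched in a few lines of Python. This is only an illustrative reading of the claim, not the patented implementation: it assumes whitespace tokenization, lowercasing, single-word neighbors instead of word strings, and exclusion of the query itself from the results. The names `neighbors`, `associate`, and `top_n` are hypothetical.

```python
from collections import Counter

def neighbors(docs, term, side, top_n):
    """Most frequent single words immediately to the `side` of `term`."""
    counts = Counter()
    for doc in docs:
        toks = doc.lower().split()
        for i, tok in enumerate(toks):
            if tok != term:
                continue
            j = i - 1 if side == "left" else i + 1
            if 0 <= j < len(toks):
                counts[toks[j]] += 1
    return [w for w, _ in counts.most_common(top_n)]

def associate(docs, query, top_n=5):
    query = query.lower()
    # Signature Lists: frequent words to the left and right of the query.
    left_sig = neighbors(docs, query, "left", top_n)
    right_sig = neighbors(docs, query, "right", top_n)
    # One Anchor List per signature entry: what follows a left-signature
    # word, and what precedes a right-signature word.
    anchor_lists = [neighbors(docs, w, "right", top_n) for w in left_sig]
    anchor_lists += [neighbors(docs, w, "left", top_n) for w in right_sig]
    # Rank candidates by the number of different Anchor Lists they appear on.
    score = Counter()
    for anchor_list in anchor_lists:
        for cand in set(anchor_list):
            if cand != query:  # assumption: the query is not its own result
                score[cand] += 1
    return score.most_common()

docs = ["the big car stopped", "the big truck stopped", "a small car stopped"]
result = associate(docs, "car")
```

Here "truck" shares the left neighbor "big" and the right neighbor "stopped" with "car", so it appears on two Anchor Lists and is ranked as the closest association.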
Abstract
A method for creating and using a cross-idea association database that includes a method for associating words and word strings in a language by analyzing word formations around a word or word string to identify other words or word strings that are equivalents or near equivalents semantically. One method for associating words and word strings includes querying a collection of documents with a user-supplied word or word string, determining a user-defined amount of words or word strings to the left and right of the query string, determining the frequency of occurrence of words or word strings located on the left and right of the query string, and ranking the located words.
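The frequency-counting step described in the abstract might look like the following sketch. It assumes whitespace tokenization and lowercasing, and treats the "user-defined amount" as a maximum word-string length; the function name `context_counts` and the parameter `max_len` are hypothetical and do not come from the patent.

```python
from collections import Counter

def context_counts(docs, query, max_len=2):
    """Count word strings of 1..max_len words adjacent to each query hit."""
    q = query.lower().split()
    left, right = Counter(), Counter()
    for doc in docs:
        toks = doc.lower().split()
        for i in range(len(toks) - len(q) + 1):
            if toks[i:i + len(q)] != q:
                continue
            end = i + len(q)
            for n in range(1, max_len + 1):
                if i - n >= 0:  # n words immediately to the left
                    left[" ".join(toks[i - n:i])] += 1
                if end + n <= len(toks):  # n words immediately to the right
                    right[" ".join(toks[end:end + n])] += 1
    return left, right

docs = ["I went to New York last week", "She moved to New York last year"]
left, right = context_counts(docs, "New York")
```

Sorting either counter with `most_common()` gives the frequency-ranked lists of left and right contexts that the later steps build on.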
24 Claims
1. A computer device including a processor, a memory coupled to the processor, and a program stored in the memory, wherein the computer is configured to execute the program and perform the steps of:
providing a collection of documents, wherein said collection includes at least one document;
receiving a word or word string query to be analyzed;
searching by a processor, said collection of documents for the query to be analyzed and returning documents containing the query to be analyzed;
determining a user-defined amount of words or word strings or both to the left of said query to be analyzed in said returned documents based on their frequency and creating a Left Signature List comprising each of said words and word strings to the left of said query to be analyzed in said returned documents;
searching said collection of documents for the words and word strings on the Left Signature List and returning documents containing said words or word strings on the Left Signature List;
determining a user-defined amount of words or word strings or both to the right of each of said words and word strings comprising said Left Signature List and creating a Left Anchor List comprising each of said words and word strings to the right of each of said words and word strings on the Left Signature List based on their frequency in a collection of documents;
determining a user-defined number of words or word strings or both to the right of said query to be analyzed in said returned documents and creating a Right Signature List comprising each of said words and word strings to the right of said query to be analyzed in said returned documents based on their frequency;
searching said collection of documents for each of said words and word strings on the Right Signature List and returning documents containing said words and word strings on the Right Signature List;
determining a user-defined number of words or word strings or both to the left of each of said words and word strings comprising said Right Signature List and creating a Right Anchor List comprising each of said words and word strings to the left of each of said words and word strings on the Right Signature List based on their frequency; and
ranking results based on the number of different Anchor Lists on which the result appears.
Dependent claims: 2, 3, 4, 5
6. A computer device including a processor, a memory coupled to the processor, and a program stored in the memory, wherein the computer is configured to execute the program and perform the steps of:
providing a collection of documents, wherein said collection includes at least one document;
receiving a word or word string query to be analyzed;
searching by a processor, said collection of documents for the query to be analyzed and returning documents containing the query to be analyzed;
determining a user-defined number of words or word strings or both to the left and right of the query in said returned documents containing the query to be analyzed;
returning a list with an entry or plurality of entries, wherein said entry or said plurality of entries contain said determined words or word strings to the left and right of the query in said returned documents;
searching said collection of documents for said entry or plurality of entries in said returned list and returning documents containing said entry or said plurality of entries in said returned list; and
determining a user defined number of words or word strings that appear between words or word strings to the left and right of said query;
returning a list of words or word strings or both that occur between said determined words or word strings to the left and right of said query; and
ranking said returned list of words or word strings based on the number of different words or word strings to the left and right of said query that each returned word or word string appears between.
Dependent claims: 7, 8
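The "appears between" variant recited in claim 6 can be sketched as follows. This is one possible reading under stated assumptions, not the patented implementation: each list entry is taken to be an observed (left word, right word) pair around the query, contexts are single words, and fillers are limited to `max_fill` words. The names `frames_for`, `fillers`, `top_n`, and `max_fill` are hypothetical.

```python
from collections import Counter, defaultdict

def frames_for(docs, query, top_n=5):
    """Most frequent (left word, right word) pairs surrounding the query."""
    query = query.lower()
    counts = Counter()
    for doc in docs:
        toks = doc.lower().split()
        for i in range(1, len(toks) - 1):
            if toks[i] == query:
                counts[(toks[i - 1], toks[i + 1])] += 1
    return [frame for frame, _ in counts.most_common(top_n)]

def fillers(docs, query, top_n=5, max_fill=2):
    """Rank word strings by how many different frames they appear between."""
    query = query.lower()
    frames = set(frames_for(docs, query, top_n))
    seen = defaultdict(set)  # filler -> frames it was found between
    for doc in docs:
        toks = doc.lower().split()
        for i, left in enumerate(toks):
            for n in range(1, max_fill + 1):
                j = i + n + 1  # index of the candidate right-hand word
                if j >= len(toks):
                    break
                if (left, toks[j]) in frames:
                    filler = " ".join(toks[i + 1:j])
                    if filler != query:  # assumption: skip the query itself
                        seen[filler].add((left, toks[j]))
    ranked = sorted(seen.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(filler, len(found)) for filler, found in ranked]

docs = [
    "the red car stopped here",
    "the red truck stopped here",
    "a blue car drove away",
    "a blue van drove away",
    "the red van stopped here",
]
ranking = fillers(docs, "car")
```

In this toy collection "van" occurs between both frames seen around "car", while "truck" occurs between only one, so "van" ranks first.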
9. A computer readable storage medium having stored thereon a program executable by a computer processor for performing the steps of:
providing a collection of documents, wherein said collection includes at least one document;
receiving a word or word string query to be analyzed;
searching by a processor, said collection of documents for the query to be analyzed and returning documents containing the query to be analyzed;
determining a user-defined amount of words or word strings or both to the left of said query to be analyzed in said returned documents based on their frequency and creating a Left Signature List comprising each of said words and word strings to the left of said query to be analyzed in said returned documents;
searching said collection of documents for the words and word strings on the Left Signature List and returning documents containing said words or word strings on the Left Signature List;
determining a user-defined amount of words or word strings or both to the right of each of said words and word strings comprising said Left Signature List and creating a Left Anchor List comprising each of said words and word strings to the right of each of said words and word strings on the Left Signature List based on their frequency in a collection of documents;
determining a user-defined number of words or word strings or both to the right of said query to be analyzed in said returned documents and creating a Right Signature List comprising each of said words and word strings to the right of said query to be analyzed in said returned documents based on their frequency;
searching said collection of documents for each of said words and word strings on the Right Signature List and returning documents containing said words and word strings on the Right Signature List;
determining a user-defined number of words or word strings or both to the left of each of said words and word strings comprising said Right Signature List and creating a Right Anchor List comprising each of said words and word strings to the left of each of said words and word strings on the Right Signature List based on their frequency; and
ranking results based on the number of different Anchor Lists on which the result appears.
Dependent claims: 10, 11, 12, 13
14. A computer readable storage medium having stored thereon a program executable by a computer processor for performing the steps of:
providing a collection of documents, wherein said collection includes at least one document;
receiving a word or word string query to be analyzed;
searching by a processor, said collection of documents for the query to be analyzed and returning documents containing the query to be analyzed;
determining a user-defined number of words or word strings or both to the left and right of the query in said returned documents containing the query to be analyzed;
returning a list with an entry or plurality of entries, wherein said entry or said plurality of entries contain said determined words or word strings to the left and right of the query in said returned documents;
searching said collection of documents for said entry or plurality of entries in said returned list and returning documents containing said entry or said plurality of entries in said returned list; and
determining a user defined number of words or word strings that appear between words or word strings to the left and right of said query;
returning a list of words or word strings or both that occur between said determined words or word strings to the left and right of said query; and
ranking said returned list of words or word strings based on the number of different words or word strings to the left and right of said query that each returned word or word string appears between.
Dependent claims: 15, 16
17. A method for associating words and word strings in a language comprising:
providing a collection of documents, wherein said collection includes at least one document;
receiving a word or word string query to be analyzed;
searching by a processor, said collection of documents for the query to be analyzed and returning documents containing the query to be analyzed;
determining a user-defined amount of words or word strings or both to the left of said query to be analyzed in said returned documents based on their frequency and creating a Left Signature List comprising each of said words and word strings to the left of said query to be analyzed in said returned documents;
searching said collection of documents for the words and word strings on the Left Signature List and returning documents containing said words or word strings on the Left Signature List;
determining a user-defined amount of words or word strings or both to the right of each of said words and word strings comprising said Left Signature List and creating a Left Anchor List comprising each of said words and word strings to the right of each of said words and word strings on the Left Signature List based on their frequency in a collection of documents;
determining a user-defined number of words or word strings or both to the right of said query to be analyzed in said returned documents and creating a Right Signature List comprising each of said words and word strings to the right of said query to be analyzed in said returned documents based on their frequency;
searching said collection of documents for each of said words and word strings on the Right Signature List and returning documents containing said words and word strings on the Right Signature List;
determining a user-defined number of words or word strings or both to the left of each of said words and word strings comprising said Right Signature List and creating a Right Anchor List comprising each of said words and word strings to the left of each of said words and word strings on the Right Signature List based on their frequency; and
ranking results based on the number of different Anchor Lists on which the result appears.
Dependent claims: 18, 19, 20, 21
22. A method for associating words and word strings in a language comprising:
providing a collection of documents, wherein said collection includes at least one document;
receiving a word or word string query to be analyzed;
searching by a processor, said collection of documents for the query to be analyzed and returning documents containing the query to be analyzed;
determining a user-defined number of words or word strings or both to the left and right of the query in said returned documents containing the query to be analyzed;
returning a list with an entry or plurality of entries, wherein said entry or said plurality of entries contain said determined words or word strings to the left and right of the query in said returned documents;
searching said collection of documents for said entry or plurality of entries in said returned list and returning documents containing said entry or said plurality of entries in said returned list; and
determining a user defined number of words or word strings that appear between words or word strings to the left and right of said query;
returning a list of words or word strings or both that occur between said determined words or word strings to the left and right of said query; and
ranking said returned list of words or word strings based on the number of different words or word strings to the left and right of said query that each returned word or word string appears between.
Dependent claims: 23, 24
Specification