Content conversion method and apparatus
Abstract
A method and apparatus for analyzing documents and thereby determining the association between words in a language. The method includes providing a collection of documents and selecting a first word or word string and a second word or word string occurring in the documents. Ranges that include the first and second words or word strings are defined in the documents and searched for common words or word strings. The method further involves associating the first and second words or word strings with the common words or word strings based on the frequency of occurrence of the common words or word strings within the ranges.
9 Claims
1. A method for associating words in a language comprising:
providing a collection of documents, wherein said collection includes at least one document;
selecting a first word or word string, and a second word or word string;
locating a plurality of occurrences of the first word or word string and the second word or word string in said collection;
defining in said collection first ranges and second ranges, wherein the first ranges include the first word or word string and the second ranges include the second word or word string;
searching said first ranges and second ranges for common word or word strings, wherein said common word or word strings occur in a plurality of ranges; and
associating first word or word strings and second word or word strings with common word or word strings based on frequency of occurrence of the common word or word strings within the first ranges and second ranges respectively.
(Dependent claims: 2, 3, 4)
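
The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The tokenizer, the fixed `window` of tokens on each side of an occurrence, the `min_ranges` threshold standing in for "a plurality of ranges", and all function names are assumptions made for illustration. Only single common words are counted here, not word strings.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase runs of letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def ranges_for(docs, term, window):
    """Return one token range per occurrence of `term` in `docs`.

    A range is the `window` tokens before and after the occurrence,
    with the term itself left out.
    """
    phrase = tokenize(term)
    n = len(phrase)
    ranges = []
    for doc in docs:
        tokens = tokenize(doc)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == phrase:
                ranges.append(tokens[max(0, i - window):i] + tokens[i + n:i + n + window])
    return ranges


def common_words(ranges, min_ranges=2):
    """Count in how many ranges each word appears; keep words seen in several."""
    counts = Counter()
    for r in ranges:
        counts.update(set(r))  # count each word once per range
    kept = [(w, c) for w, c in counts.items() if c >= min_ranges]
    return sorted(kept, key=lambda wc: (-wc[1], wc[0]))


def associate(docs, first, second, window=3):
    """Associate each term with the common words of its own ranges."""
    return {
        first: common_words(ranges_for(docs, first, window)),
        second: common_words(ranges_for(docs, second, window)),
    }


docs = [
    "the doctor examined the patient",
    "a doctor treated one patient",
    "the nurse assisted the patient",
    "every nurse comforted a patient",
]
print(associate(docs, "doctor", "nurse"))
# prints {'doctor': [('patient', 2)], 'nurse': [('patient', 2)]}
```

In this toy collection both terms share the common word "patient", which is the kind of frequency-based link the claim describes between the first and second ranges.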
5. A method for associating words in a language comprising:
providing a collection of documents, wherein said collection includes at least one document;
selecting a first word or word string, and a second word or word string;
locating all documents having a plurality of occurrences of the first word or word string within a defined proximity range of the second word and/or word string, with said defined proximity range having an upper limit and a lower limit;
defining in the located documents a range, wherein the range includes the first word or word string and the second word or word string;
searching said ranges for common word or word strings; and
associating the first word or word string and the second word or word string with common word or word strings based on frequency of occurrence of the common word or word strings within the ranges.
(Dependent claims: 6, 7, 8)
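
The proximity-based variant of claim 5 can be sketched in the same way. This is only an illustration under stated assumptions: proximity is measured as the difference between token start positions, `lower` and `upper` are the limits of the proximity range, `pad` extends each range by a few tokens on both sides, and `min_hits` stands in for "a plurality of occurrences" per document. None of these names or defaults come from the patent.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase runs of letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def positions(tokens, phrase):
    """Start indices at which `phrase` (a token list) occurs in `tokens`."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


def pair_ranges(tokens, first, second, lower, upper, pad):
    """Ranges spanning co-occurrences whose token gap lies in [lower, upper]."""
    ranges = []
    for i in positions(tokens, first):
        for j in positions(tokens, second):
            if lower <= abs(j - i) <= upper:
                lo = max(0, min(i, j) - pad)
                hi = min(len(tokens), max(i + len(first), j + len(second)) + pad)
                ranges.append(tokens[lo:hi])
    return ranges


def associate_pair(docs, first_text, second_text, lower=1, upper=3, pad=2, min_hits=2):
    """Rank words that accompany the pair, by number of ranges containing them."""
    first, second = tokenize(first_text), tokenize(second_text)
    skip = set(first) | set(second)  # do not associate the pair with itself
    counts = Counter()
    for doc in docs:
        ranges = pair_ranges(tokenize(doc), first, second, lower, upper, pad)
        if len(ranges) < min_hits:
            continue  # document lacks a plurality of qualifying co-occurrences
        for r in ranges:
            counts.update(set(r) - skip)
    return sorted(counts.items(), key=lambda wc: (-wc[1], wc[0]))


docs = [
    "bank interest rate rose and the bank rate on interest fell",
    "interest rate alone",  # only one co-occurrence, so this document is skipped
]
print(associate_pair(docs, "interest", "rate")[0])
# prints ('bank', 2)
```

Unlike the claim 1 sketch, a single range here contains both terms, so the common words are associated with the pair as a whole.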
9. A method for creating an association database in a single language comprising the steps of:
providing a collection of documents, wherein said collection includes at least one document;
selecting a first word or word string;
locating a plurality of occurrences of the first word or word string;
defining in said collection ranges, wherein said ranges occur in relation to each of said plurality of occurrences of the first word or word string;
searching said ranges for common word or word strings, wherein said common word or word strings occur in a plurality of ranges; and
associating first word or word strings with common word or word strings based on frequency of occurrence of the common word or word strings within the ranges.
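
One way the association database of claim 9 could be populated is sketched below, using Python's standard `sqlite3` module with an in-memory database. The table layout (`association` with `term`, `word`, `range_count`), the `window` and `min_ranges` parameters, and the idea of looping over a list of terms are assumptions for illustration; the claim itself only describes the steps for a single first word or word string.

```python
import re
import sqlite3
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase runs of letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def context_ranges(docs, term, window):
    """Yield the tokens within `window` positions of each occurrence of `term`."""
    phrase = tokenize(term)
    n = len(phrase)
    for doc in docs:
        tokens = tokenize(doc)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == phrase:
                yield tokens[max(0, i - window):i] + tokens[i + n:i + n + window]


def build_association_db(docs, terms, window=3, min_ranges=2):
    """Store, per term, the words found in at least `min_ranges` of its ranges."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE association (term TEXT, word TEXT, range_count INTEGER)")
    for term in terms:
        counts = Counter()
        for r in context_ranges(docs, term, window):
            counts.update(set(r))  # count each word once per range
        rows = [(term, w, c) for w, c in counts.items() if c >= min_ranges]
        db.executemany("INSERT INTO association VALUES (?, ?, ?)", rows)
    db.commit()
    return db


docs = [
    "the doctor examined the patient",
    "a doctor treated one patient",
    "the nurse assisted the patient",
]
db = build_association_db(docs, ["doctor", "nurse"])
query = "SELECT term, word, range_count FROM association ORDER BY term, word"
print(db.execute(query).fetchall())
# prints [('doctor', 'patient', 2)]
```

The term "nurse" occurs only once in this collection, so no word reaches the threshold and no row is stored for it.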
Specification