Method and system for encoding and accessing linguistic frequency data
Abstract
Linguistic frequency data is encoded by identifying a plurality of sets of character strings in a source text, where each set comprises at least a first and a second character string. Frequency data is obtained for each set and stored at a memory position in a first memory array that is assigned to each first character string. A pointer pointing to a position in the first memory array that has been assigned to the corresponding first character string of the respective set and which has stored the frequency data of the respective set, is stored in a second memory array for each set comprising each character string that is a second character string. The encoded data is accessed by identifying regions in the memory arrays that are each assigned a search string and a pointer pointing to a position in the first memory array.
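By way of illustration only, the encoding summarized above might be sketched in Python as follows. The sketch assumes that each "set" is an ordered pair of adjacent whitespace-separated tokens (a bigram), that both memory arrays are plain lists, and that a dictionary maps each string to its region as a `(start, end)` index pair. The function name `encode` and all variable names are hypothetical and do not come from the patent text.

```python
from collections import Counter


def encode(tokens):
    """Encode pair frequencies into two flat arrays (illustrative layout only)."""
    # Assumption: a "set" is an ordered pair of adjacent tokens.
    counts = Counter(zip(tokens, tokens[1:]))

    # First array: one contiguous region per first string, holding the
    # frequency of every pair that has that string in first position.
    first_array, first_regions, position = [], {}, {}
    for pair in sorted(counts):  # sorting groups pairs by first string
        first = pair[0]
        start, _ = first_regions.get(first, (len(first_array), None))
        position[pair] = len(first_array)  # where this pair's frequency lives
        first_array.append(counts[pair])
        first_regions[first] = (start, len(first_array))

    # Second array: one contiguous region per second string, holding a
    # pointer (here: an index into first_array) for every pair that has
    # that string in second position.
    second_array, second_regions = [], {}
    for pair in sorted(counts, key=lambda p: (p[1], p[0])):
        second = pair[1]
        start, _ = second_regions.get(second, (len(second_array), None))
        second_array.append(position[pair])
        second_regions[second] = (start, len(second_array))

    return first_array, first_regions, second_array, second_regions


if __name__ == "__main__":
    arrays = encode("the cat sat on the mat the cat ran".split())
    for part in arrays:
        print(part)
```

In this layout each frequency is stored exactly once, in the first array. The second array holds only indices, so a pair is resolved by finding a pointer in the second string's region that falls inside the first string's region.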
26 Claims
1. Method of encoding linguistic frequency data, the method comprising:
identifying a plurality of sets of character strings in a source text, each set comprising at least a first and a second character string;
for each set, obtaining frequency data indicative of the frequency of the respective set in the source text;
for each character string that is a first character string in at least one of the sets, assigning a memory position in a first memory array to the respective character string and storing at said memory position the frequency data of each set comprising the respective character string as first character string; and
for each character string that is a second character string in at least one of the sets, assigning a memory position in a second memory array to the respective character string and storing at said memory position, for each set comprising the respective character string as second character string, a pointer pointing to a memory position in the first memory array assigned to the corresponding first character string of the respective set and having stored the frequency data of the respective set.
14. Method of accessing encoded linguistic frequency data for retrieving the frequency of a search key in a text, the search key comprising a first and a second search string, the encoded data being stored in a first memory array storing frequency data and a second memory array storing pointers to the first memory array, the frequency data being indicative of the frequencies of character sets in a source text, the character sets each including at least two character strings, the method comprising:
identifying a region in the first memory array that is assigned to the first search string;
identifying a region in the second memory array that is assigned to the second search string;
identifying a pointer stored in the region of the second memory array, pointing to a memory position within the region of the first memory array; and
reading the frequency data stored at said memory position.
25. A system for encoding linguistic frequency data, comprising:
a processing unit for identifying a plurality of sets of character strings in a source text, each set comprising at least a first and a second character string, and, for each set, obtaining frequency data indicative of the frequency of the respective set in the source text; and
an encoder that, for each character string that is a first character string in at least one of the sets, assigns a memory position in a first memory array to the respective character string and stores at said memory position the frequency data of each set comprising the respective character string as first character string, and that, for each character string that is a second character string in at least one of the sets, assigns a memory position in a second memory array to the respective character string and stores at said memory position, for each set comprising the respective character string as second character string, a pointer pointing to a memory position in the first memory array assigned to the corresponding first character string of the respective set and having stored the frequency data of the respective set.
26. A system for accessing encoded linguistic frequency data for retrieving the frequency of a search key in a text, the search key comprising a first and a second search string, the encoded data being stored in a first memory array storing frequency data and a second memory array storing pointers to the first memory array, the frequency data being indicative of the frequencies of character sets in a source text, the character sets each including at least two character strings, the system comprising:
an input device for inputting the search key; and
a search engine for identifying a region in the first memory array that is assigned to the first search string, identifying a region in the second memory array that is assigned to the second search string, identifying a pointer stored in the region of the second memory array, the pointer pointing to a memory position within the region of the first memory array, and reading the frequency data stored at said memory position.
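The access steps recited in claims 14 and 26 can be illustrated with a short Python sketch. It runs on small hand-built arrays that are purely hypothetical: the pair frequencies `("a","x")=3`, `("a","y")=1` and `("b","x")=2` are invented for the example. Regions are assumed to be half-open `(start, end)` index ranges kept in dictionaries, and returning 0 for a pair that is not stored is likewise an assumption, since the claims do not say how a miss is handled.

```python
def lookup(first, second, first_array, first_regions,
           second_array, second_regions):
    """Return the stored frequency of the key (first, second), or 0 if absent."""
    # Steps 1 and 2: identify the region assigned to each search string.
    if first not in first_regions or second not in second_regions:
        return 0
    f_start, f_end = first_regions[first]
    s_start, s_end = second_regions[second]

    # Step 3: find a pointer in the second string's region that points
    # into the first string's region of the first array.
    for pointer in second_array[s_start:s_end]:
        if f_start <= pointer < f_end:
            # Step 4: read the frequency stored at that position.
            return first_array[pointer]
    return 0


# Hypothetical encoded data for the three pairs named above.
first_array = [3, 1, 2]                       # frequencies, grouped by first string
first_regions = {"a": (0, 2), "b": (2, 3)}    # first string -> region in first_array
second_array = [0, 2, 1]                      # pointers into first_array
second_regions = {"x": (0, 2), "y": (2, 3)}   # second string -> region in second_array

data = (first_array, first_regions, second_array, second_regions)
freq_ax = lookup("a", "x", *data)
freq_by = lookup("b", "y", *data)   # pair not stored
```

The linear scan over the pointer region is chosen here only for brevity. If the pointers within each region were kept in ascending order, a binary search could replace it, but the claim text shown above does not specify either approach.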
Specification