Method and apparatus for implementing a dynamic collocation dictionary
Abstract
The present invention is a method and apparatus for implementing a dynamic collocation dictionary. The bigrams in a document, and their frequencies of occurrence, are ascertained. Bigrams that are potentially collocations are selected. Entries in the dynamic collocation dictionary are updated for selected bigrams already present in it, and selected bigrams not yet present are entered into it as new entries.
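The abstract does not say how a bigram is judged to be a potential collocation. As a minimal sketch, assuming a hypothetical stop-word filter and a minimum document count (neither comes from the patent), the selection step might look like this:

```python
from collections import Counter

# Hypothetical stop list; the abstract does not specify a selection criterion.
STOPWORDS = frozenset({"the", "a", "an", "of", "and", "to", "in"})

def select_candidates(doc_counts: Counter, min_count: int = 1) -> Counter:
    """Keep bigrams that pass an assumed collocation filter."""
    return Counter({
        (w1, w2): n
        for (w1, w2), n in doc_counts.items()
        # Assumed rule: frequent enough, and neither word is a stop word.
        if n >= min_count and w1 not in STOPWORDS and w2 not in STOPWORDS
    })
```

Statistical association measures could replace the stop list; the filter here is only a placeholder for whatever criterion an implementation actually uses.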
29 Claims
1. A computer-implemented method for updating a dynamic collocation dictionary having a plurality of bigram entries, each bigram entry identifying a bigram and associated frequency of occurrence, said method comprising the steps of:
identifying a selected bigram in a document;
determining a frequency of occurrence of the selected bigram in the document; and
updating the dynamic collocation dictionary with the selected bigram on the basis of the frequency of occurrence of the selected bigram and the existence, in the dynamic collocation dictionary, of a bigram entry corresponding to the selected bigram wherein the dynamic collocation dictionary is updated with at least a predetermined positive minimum number of bigrams selected in the document.
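The first two steps of claim 1, identifying bigrams and counting their occurrences in a document, can be sketched as follows. The regular-expression tokenizer and the function names are illustrative assumptions, and the sketch treats every pair of adjacent words as a bigram; the claim itself does not fix a tokenization scheme.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase the text and keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def bigram_frequencies(tokens: list[str]) -> Counter:
    # Each pair of adjacent tokens is one occurrence of a bigram.
    return Counter(zip(tokens, tokens[1:]))

counts = bigram_frequencies(tokenize("The stock market fell and the stock market rose"))
print(counts[("stock", "market")])  # 2
```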
determining that no selected bigram entry corresponding to the selected bigram exists in the dynamic collocation dictionary;
adding, to the dynamic collocation dictionary, a bigram entry corresponding to the selected bigram.
12. The method of claim 11 wherein the step of adding a bigram entry corresponding to the selected bigram comprises the steps of:
selecting an upper threshold representative of the number of new bigram entries that can be created in the dynamic collocation dictionary from selected bigrams in the document;
inspecting a new entry counter to determine how many new bigram entries have been created in the dynamic collocation dictionary;
incrementing the associated frequency of occurrence only if the new entry counter is below the upper threshold.
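The adding path, together with claim 12's upper threshold on how many new entries one document may create, might be sketched as below. This assumes the dictionary is a plain `dict` mapping word pairs to frequencies, that a new entry is seeded with the document frequency, and that more frequent bigrams are considered first; none of these choices is specified by the claims.

```python
from collections import Counter

def add_new_entries(dictionary: dict[tuple[str, str], int],
                    doc_counts: Counter,
                    upper_threshold: int) -> int:
    """Create entries for absent bigrams until the upper threshold is reached."""
    new_entry_counter = 0
    for bigram, freq in doc_counts.most_common():
        if bigram in dictionary:
            continue  # existing entries are handled by the update path (cf. claim 13)
        if new_entry_counter >= upper_threshold:
            break  # this document may not create any more entries
        dictionary[bigram] = freq  # assumed: seed the entry with the document frequency
        new_entry_counter += 1
    return new_entry_counter
```

One plausible reason for such a cap is to keep a single long or unusual document from flooding the dictionary with rare pairs; the claims themselves state only the threshold check.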
13. The method of claim 1 wherein the step of updating the dynamic collocation dictionary comprises the steps of:
determining that a selected bigram entry corresponding to the selected bigram exists in the dynamic collocation dictionary;
updating the selected bigram entry to be indicative of the frequency of occurrence of the selected bigram.
14. The method of claim 13 wherein the step of updating the selected bigram entry comprises the steps of:
determining that the frequency of occurrence of the selected bigram is greater than one; and
incrementing the associated frequency of occurrence of the selected bigram entry by an amount representative of the frequency of occurrence of the selected bigram.
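Claims 13 and 14 cover bigrams that already have an entry: when the bigram occurs more than once in the document, the stored frequency is increased by the document count. The claims do not say what happens to a single occurrence; this sketch, using the same assumed `dict` representation, leaves such entries unchanged.

```python
from collections import Counter

def update_existing_entries(dictionary: dict[tuple[str, str], int],
                            doc_counts: Counter) -> int:
    """Increment existing entries for bigrams seen more than once in the document."""
    updated = 0
    for bigram, freq in doc_counts.items():
        if bigram in dictionary and freq > 1:
            dictionary[bigram] += freq  # add the document frequency to the stored total
            updated += 1
    return updated
```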
15. The method of claim 13 wherein the step of updating the selected bigram entry comprises the steps of:
selecting a lower threshold representative of the minimum number of bigram entries that must be updated in the dynamic collocation dictionary using selected bigrams from the document;
selecting additional bigrams from the document to the extent necessary to ensure that the number of bigram entries updated in the dynamic collocation dictionary is in excess of the lower threshold.
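Claim 15 adds a floor: if too few entries would be updated, additional bigrams are drawn from the document until the number of updated entries exceeds the lower threshold. A sketch, assuming (hypothetically) that the primary selection is "occurs more than once" and that extra bigrams are taken in order of first occurrence:

```python
from collections import Counter

def update_with_lower_threshold(dictionary: dict[tuple[str, str], int],
                                doc_counts: Counter,
                                lower_threshold: int) -> int:
    """Update existing entries, pulling in extra bigrams until the floor is exceeded."""
    # Assumed primary selection: bigrams that occur more than once.
    selected = [b for b, f in doc_counts.items() if f > 1]
    # Assumed backup pool: the remaining bigrams, in order of first occurrence.
    extras = [b for b, f in doc_counts.items() if f <= 1]
    updated = set()
    for bigram in selected:
        if bigram in dictionary:
            dictionary[bigram] += doc_counts[bigram]
            updated.add(bigram)
    for bigram in extras:
        if len(updated) > lower_threshold:
            break  # strictly more than the lower threshold: stop selecting
        if bigram in dictionary:
            dictionary[bigram] += doc_counts[bigram]
            updated.add(bigram)
    return len(updated)
```

If the document runs out of usable bigrams, the returned count can remain at or below the threshold; how an implementation should handle that case is not addressed in the claim.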
5. A computer system for updating a dynamic collocation dictionary having a plurality of bigram entries, each bigram entry identifying a bigram and associated frequency of occurrence, said computer system comprising:
a first computer, said first computer including, a processor;
a memory, operatively coupled to the processor;
an identification process for enabling the processor to identify a selected bigram in a document;
a counting process for determining a frequency of occurrence of the selected bigram in the document; and
an updating process enabling the processor to update entries in a dynamic collocation dictionary with the selected bigram on the basis of the frequency of occurrence of the selected bigram and the existence, in the dynamic collocation dictionary, of a bigram entry corresponding to the selected bigram wherein the dynamic collocation dictionary is updated with at least a predetermined positive minimum number of bigrams selected in the document.
a second computer; and
a communications system operatively coupling the first computer and the second computer to form a computer network.
10. The computer system of claim 9, wherein the second computer is a client and the first computer is a server.
16. The computer system of claim 5 wherein the updating process comprises:
a second determining process for determining that a selected bigram entry corresponding to the selected bigram exists in the dynamic collocation dictionary;
a second updating process for updating the selected bigram entry to be indicative of the frequency of occurrence of the selected bigram.
17. The computer system of claim 18 wherein the collocation adding process comprises:
a first threshold selection process for selecting an upper threshold representative of the number of new bigram entries that can be created in the dynamic collocation dictionary from selected bigrams in the document;
an inspection process for inspecting a new entry counter to determine how many new bigram entries have been created in the dynamic collocation dictionary;
a conditional incrementing process for incrementing the associated frequency of occurrence only if the new entry counter is below the upper threshold.
18. The computer system of claim 5 wherein the updating process comprises:
a first determining process for determining that no selected bigram entry corresponding to the selected bigram exists in the dynamic collocation dictionary;
a collocation adding process to add, to the dynamic collocation dictionary, a bigram entry corresponding to the selected bigram.
19. The computer system of claim 16 wherein the second updating process comprises:
a third determining process for determining that the frequency of occurrence of the selected bigram is greater than one; and
an incrementing process for incrementing the associated frequency of occurrence of the selected bigram entry by an amount representative of the frequency of occurrence of the selected bigram.
20. The computer system of claim 16 wherein the second updating process comprises:
a second threshold selecting process for selecting a lower threshold representative of the minimum number of bigram entries that must be updated in the dynamic collocation dictionary using selected bigrams from the document;
a bigram selection process for selecting additional bigrams from the document to the extent necessary to ensure that the number of bigram entries updated in the dynamic collocation dictionary is in excess of the lower threshold.
21. A computer-readable medium having encoded thereon software for updating a dynamic collocation dictionary having a plurality of bigram entries, each bigram entry identifying a bigram and associated frequency of occurrence, said software comprising instructions for executing the steps of:
identifying a selected bigram in a document;
determining a frequency of occurrence of the selected bigram in the document; and
updating the dynamic collocation dictionary with the selected bigram on the basis of the frequency of occurrence of the selected bigram and the existence, in the dynamic collocation dictionary, of a bigram entry corresponding to the selected bigram wherein the dynamic collocation dictionary is updated with at least a predetermined positive minimum number of bigrams selected in the document.
determining that no selected bigram entry corresponding to the selected bigram exists in the dynamic collocation dictionary;
adding, to the dynamic collocation dictionary, a bigram entry corresponding to the selected bigram.
25. The computer-readable medium of claim 24 wherein the instructions for executing the step of adding a bigram entry corresponding to the selected bigram comprise instructions for executing the steps of:
selecting an upper threshold representative of the number of new bigram entries that can be created in the dynamic collocation dictionary from selected bigrams in the document;
inspecting a new entry counter to determine how many new bigram entries have been created in the dynamic collocation dictionary;
incrementing the associated frequency of occurrence only if the new entry counter is below the upper threshold.
26. The computer-readable medium of claim 21 wherein the instructions for executing the step of updating the dynamic collocation dictionary comprise instructions for executing the steps of:
determining that a selected bigram entry corresponding to the selected bigram exists in the dynamic collocation dictionary;
updating the selected bigram entry to be indicative of the frequency of occurrence of the selected bigram.
27. The computer-readable medium of claim 26 wherein the instructions for executing the step of updating the selected bigram entry comprise instructions for executing the steps of:
determining that the frequency of occurrence of the selected bigram is greater than one; and
incrementing the associated frequency of occurrence of the selected bigram entry by an amount representative of the frequency of occurrence of the selected bigram.
28. The computer-readable medium of claim 26 wherein the instructions for executing the step of updating the selected bigram entry comprise instructions for executing the steps of:
selecting a lower threshold representative of the minimum number of bigram entries that must be updated in the dynamic collocation dictionary using selected bigrams from the document;
selecting additional bigrams from the document to the extent necessary to ensure that the number of bigram entries updated in the dynamic collocation dictionary is in excess of the lower threshold.
29. The computer-readable medium of claim 21, wherein the instructions for executing the step of identifying a selected bigram comprise instructions for executing the step of identifying a bigram having a first constituent word and a second constituent word, said first and second constituent words being separated by no more than a selected plurality of intervening words.
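Claim 29 allows the two constituent words of a bigram to be separated by intervening words, up to a selected limit. A sketch of counting such windowed bigrams follows; the parameter name `max_gap` is an illustrative assumption, and with `max_gap=0` the function reduces to counting adjacent pairs.

```python
from collections import Counter

def windowed_bigrams(tokens: list[str], max_gap: int) -> Counter:
    """Count (first, second) pairs with at most max_gap words between them."""
    counts: Counter = Counter()
    for i, first in enumerate(tokens):
        # Positions i+1 .. i+1+max_gap leave 0 .. max_gap intervening words.
        for j in range(i + 1, min(i + 2 + max_gap, len(tokens))):
            counts[(first, tokens[j])] += 1
    return counts
```

For example, with `max_gap=1` the tokens "raise the interest rate" yield the pair ("raise", "interest") but not ("raise", "rate"), which has two intervening words.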
Specification