System, method, program product, and networking use for recognizing words and their parts of speech in one or more natural languages
Abstract
A system, method, and computer program are disclosed for recognizing one or more words not listed in a dictionary database. One or more sequences of characters in the word are checked to determine a probability that the word is valid. A prefix removal process removes any prefixes from a word, and obtains information about the removed prefix. A suffix removal process removes any suffixes from the word, and obtains information about the removed suffix. A root process obtains information about a root word from the dictionary database. A combination process then determines if the prefix, the root, and the suffix can be combined into a valid word as defined by one or more combination rules, obtains one or more of the possible parts of speech of the valid word, and stores the parts of speech with the valid word in the dictionary database.
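The prefix removal, suffix removal, root lookup, and combination steps described in the abstract can be sketched roughly as follows. This is a minimal illustration only, not the patented implementation: the toy `DICTIONARY`, the `PREFIX_RULES` and `SUFFIX_RULES` tables, and the function names `combine` and `analyze` are all assumptions introduced for this example, and real combination rules would also have to handle spelling changes at affix boundaries.

```python
# Hypothetical toy data; the patent's actual dictionary and rules are not shown here.
DICTIONARY = {
    "kind": {"adjective", "noun"},
    "read": {"verb"},
}
# Assumed rule format: prefix -> parts of speech of roots it may attach to.
PREFIX_RULES = {"un": {"adjective"}, "re": {"verb"}}
# Assumed rule format: suffix -> (required root part of speech, resulting part of speech).
SUFFIX_RULES = {"ness": ("adjective", "noun"), "able": ("verb", "adjective")}


def combine(prefix, root, suffix):
    """Apply the toy combination rules; return the parts of speech or None."""
    root_pos = set(DICTIONARY[root])
    if prefix:
        root_pos &= PREFIX_RULES[prefix]
        if not root_pos:
            return None
    if suffix:
        required, result = SUFFIX_RULES[suffix]
        if required not in root_pos:
            return None
        return {result}
    return root_pos


def analyze(word):
    """Strip an optional prefix and suffix, look up the root, and combine."""
    if word in DICTIONARY:
        return DICTIONARY[word]
    for prefix in [""] + [p for p in PREFIX_RULES if word.startswith(p)]:
        rest = word[len(prefix):]
        for suffix in [""] + [s for s in SUFFIX_RULES if rest.endswith(s)]:
            root = rest[:len(rest) - len(suffix)]
            # Skip the trivial split and roots the dictionary does not know.
            if not (prefix or suffix) or root not in DICTIONARY:
                continue
            pos = combine(prefix, root, suffix)
            if pos:
                # Store the newly recognized word with its parts of speech.
                DICTIONARY[word] = pos
                return pos
    return None
```

With these toy tables, `analyze("unkindness")` splits the word into `un` + `kind` + `ness` and records it as a noun, while a word whose root is not in the dictionary yields `None` and would be passed on to the statistical check.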
21 Claims
1. A computer-implemented system for recognizing one or more words not listed in a dictionary database, the system comprising:
at least one central processing unit;
a memory operably associated with the at least one processing unit; and
a dictionary augmentation system storable in memory and executable by the at least one processing unit, the dictionary augmentation system comprising:
a root process that searches the dictionary database to obtain root information about a root word, the root word being a word with no prefix and suffix; and
a statistical process that, if the root word is not found in the dictionary database, checks one or more proper substrings of the root word comprising two or more characters in the root word and every proper substring having fewer characters than the root word, against a complete database of each and every possible subset of individual valid words within the dictionary database, to determine, from the likelihood that the proper substring of the root word occurs in a sequence in the subsets of the individual valid words, a probability that the root word is a valid word that was previously unknown, wherein each character in the root word and in the individual valid words is an alphabet-based character and wherein the dictionary database is distinct from the complete database.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18.
19. A computer-implemented method for recognizing one or more words not listed in a dictionary database, the method comprising the steps of:
identifying a root word in a document, wherein the document is stored on one of a hard disk and a network, and wherein the root word is a word with no prefix and no suffix;
using at least one processing unit, searching the dictionary database to obtain root information about the root word; and
if the root word is not found in the dictionary database, checking one or more proper substrings of the root word comprising two or more characters in the root word, and every proper substring having fewer characters than the root word, against a complete database of each and every possible subset of individual valid words within the dictionary database, to determine, from the likelihood that the substrings of the root word occurs in a sequence in the subsets of the individual valid words, a probability that the root word is a valid word that was previously unknown, wherein each character in the root word and in the individual valid words is an alphabet-based character and wherein the dictionary database is distinct from the complete database.
20. A computer-implemented system for recognizing one or more words not listed in a dictionary database, the system comprising:
at least one central processing unit;
a memory operably associated with the at least one processing unit; and
a dictionary augmentation system storable in memory and executable by the at least one processing unit, the dictionary augmentation system comprising:
means for searching the dictionary database to obtain root information about a root word, the root word being a word with no prefix and suffix; and
means for checking one or more proper substrings of the root word comprising two or more characters in the root word, and every proper substring having fewer characters than the root word, against a complete database of each and every possible subset of individual valid words within the dictionary database, to determine, from the likelihood that the substrings of the root word occurs in a sequence in the subsets of the individual valid words, a probability that the root word is a valid word that was previously unknown, if the root word is not found in the dictionary database, wherein each character in the root word and in the individual valid words is an alphabet-based character and wherein the dictionary database is distinct from the complete database.
21. A computer memory storage device storing a dictionary augmentation system, the dictionary augmentation system comprising a computer program that causes a computer system to perform the steps of:
identifying a root word in a document, wherein the document is stored on one of a hard disk and a network, and wherein the root word is a word with no prefix and no suffix;
using at least one processing unit, searching the dictionary database to obtain root information about the root word; and
checking one or more proper substrings of the root word comprising two or more characters in the root word, and every proper substring having fewer characters than the root word, against a complete database of each and every possible subset comprising individual valid words within the dictionary database, to determine, from the likelihood that the subsets of the root word occurs in a sequence in the subsets of the individual valid words, a probability that the root word is a valid word that was previously unknown, if the root word is not found in the dictionary database, wherein each character in the root word and in the individual valid words is an alphabet-based character and wherein the dictionary database is distinct from the complete database.
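The statistical check recited in the independent claims compares proper substrings of an unknown root (two or more characters, shorter than the root itself) against a separate database built from the substrings of valid dictionary words. The sketch below shows one way such a check could look. The claims do not fix a scoring formula, so the score used here (the fraction of the root's proper substrings found in the substring database) is an assumption made for illustration, as are the names `substrings`, `build_substring_db`, and `validity_score`.

```python
def substrings(word, min_len=2):
    """All contiguous substrings of `word` with at least `min_len` characters."""
    return {
        word[i:j]
        for i in range(len(word))
        for j in range(i + min_len, len(word) + 1)
    }


def build_substring_db(valid_words):
    """Build the 'complete database': every substring of every valid word.

    This set is kept separate from the dictionary itself.
    """
    db = set()
    for word in valid_words:
        db |= substrings(word)
    return db


def validity_score(root, substring_db):
    """Assumed scoring rule: share of the root's proper substrings seen in valid words."""
    candidates = substrings(root) - {root}  # proper substrings only
    if not candidates:
        return 0.0
    return len(candidates & substring_db) / len(candidates)


# Hypothetical toy dictionary used only for this example.
valid_words = ["stream", "string", "strong", "ring"]
substring_db = build_substring_db(valid_words)
```

Under this assumed rule, an unknown root such as `strin`, whose proper substrings all occur inside `string`, scores 1.0, while a root made of character sequences never seen in valid words scores 0.0. A real system would more likely weight substrings by length or frequency before deciding whether to accept the root as a new word.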
Specification