Identifying word-senses based on linguistic variations
First Claim
1. A computer program product for identifying word-senses, the computer program product comprising:
- a non-transitory computer-readable storage medium having program code embodied therewith, the program code executable by a processor of a computer to perform a method comprising;
generating, by a computer, a plurality of arrays of aggregated statistical information of words, their corresponding word-senses, and temporal properties within different professional fields using an n-gram viewer, wherein the aggregated statistical information comprises frequency of usage of words, frequency of occurrence of words, frequency of co-occurrence of words with other words, and their respective corresponding word-senses;
generating, by the computer, a set of domain tables based on the generated plurality of arrays of aggregated statistical information, wherein each of the domain tables within the set of domain tables corresponds to a different professional field comprising medical, veterinary, legal, and engineering;
receiving, from a remote server through a network, a digital text stream comprising metadata and one or more words from a doctor, using the computer, the network being an internet connection;
selecting, using the metadata, a medical frequency domain table, veterinary frequency domain table, and a word-sense domain table from the set of domain tables;
determining a frequency of occurrence value for the received digital text stream within each of the selected domain tables;
receiving a threshold from the doctor;
associating the medical frequency domain table with the received digital text stream in response to the frequency of occurrence value satisfying the received threshold;
determining a word-sense of the received digital text stream, by determining a corresponding word sense to the received digital text stream within the medical frequency domain table;
assigning a confidence value to the word-sense based on a degree of frequency of occurrence of the received digital text stream within the medical domain, wherein the word-sense has a higher confidence value, when the frequency of occurrence of the received digital text stream is higher within the medical domain table; and
presenting the word-sense and the confidence value to the doctor.
Abstract
One or more words are received. A set of frequency of occurrence values of the received word(s) within a set of domain tables is determined. A domain table in the set of domain tables is associated to the received word(s), based on the set of frequency of occurrence values meeting a threshold value. A word-sense of the received word(s) is determined based on a corresponding word-sense in the associated domain table and/or corresponding domain dictionary.
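The flow described in the abstract and in claim 1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the `DomainTable` class, the function names, the use of a mean per-word frequency as the "frequency of occurrence value", and the confidence formula (the associated table's share of the total score) are all assumptions introduced here. The source states only that confidence is higher when the frequency of occurrence within the domain table is higher.

```python
from dataclasses import dataclass


@dataclass
class DomainTable:
    """Hypothetical domain table: per-word frequencies and word-senses for one field."""
    name: str
    frequency: dict  # word -> relative frequency of occurrence in this domain
    senses: dict     # word -> word-sense used in this domain


def frequency_of_occurrence(words, table):
    """Assumed scoring: mean per-word frequency of the received words in one table."""
    if not words:
        return 0.0
    return sum(table.frequency.get(w.lower(), 0.0) for w in words) / len(words)


def identify_word_senses(words, tables, threshold):
    """Associate the best-scoring table if it meets the threshold, then look up senses.

    Returns (table name, {word: sense}, confidence), or None when no table qualifies.
    """
    if not tables:
        return None
    scores = {t.name: frequency_of_occurrence(words, t) for t in tables}
    best = max(tables, key=lambda t: scores[t.name])
    if scores[best.name] < threshold:
        return None
    senses = {w: best.senses[w.lower()] for w in words if w.lower() in best.senses}
    # Assumed confidence: the associated table's share of the total score, so it
    # rises as the words become more frequent in that domain relative to the others.
    total = sum(scores.values())
    confidence = scores[best.name] / total if total else 0.0
    return best.name, senses, confidence


# Made-up example data, for illustration only.
medical = DomainTable(
    "medical",
    {"culture": 0.4, "stool": 0.2},
    {"culture": "laboratory growth of microorganisms", "stool": "fecal sample"},
)
legal = DomainTable(
    "legal",
    {"culture": 0.1},
    {"culture": "customs of a social group"},
)

result = identify_word_senses(["culture", "stool"], [medical, legal], threshold=0.2)
if result is not None:
    domain, senses, confidence = result
    print(domain, senses, round(confidence, 2))
```

Returning `None` when the threshold is not met keeps the "no domain associated" case explicit, so a caller can fall back to a general dictionary or ask the user for a different threshold.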
Specification