Recognizing chemical names in a Chinese document
First Claim
1. A method comprising:
a computer device receiving a Chinese document including chemical names;
the computer device recognizing chemical name segments in said document;
the computer device recognizing non-chemical name segments in said document, wherein the computer device recognizing said non-chemical name segments in said document comprises:
    segmenting said document into words;
    checking whether each segmented word is in a non-chemical name segment dictionary;
    provided that said segmented word is in said non-chemical name segment dictionary, determining said segmented word to be a non-chemical name segment; and
    recording position information of said non-chemical name segment; and
the computer device combining said chemical name segments to get said chemical names based on said recognized chemical name segments and non-chemical name segments to recognize said chemical names in Chinese documents, wherein the computer device recognizing said chemical name segments in said document comprises:
    segmenting said document into sentences;
    matching all of said chemical name segments appearing in sentences of said document based on a chemical name segment dictionary;
    recording position information of said chemical name segments; and
    reducing said chemical name segments in a same sentence, wherein reducing said chemical name segments in a same sentence is performed according to a principle of matching the most chemical name segments with the least number of chemical name segments; and
wherein the computer device combining said chemical name segments to get said chemical name based on said recognized chemical name segments and non-chemical name segments comprises:
    determining adjacent chemical name segments in a same sentence according to said position information of said chemical name segments;
    checking whether there are non-chemical name segments between said adjacent chemical name segments based on said position information of said chemical name segments and non-chemical name segments; and
    provided that there are no non-chemical name segments between said adjacent chemical name segments, combining said adjacent chemical name segments to get a chemical name.
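The chemical-name-segment recognition steps of claim 1 (sentence segmentation, dictionary matching with recorded positions, and reduction within a sentence) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: the sentence delimiters, the toy dictionary, and the names `split_sentences`, `match_all`, and `reduce_segments` are hypothetical. The reduction reads "matching the most ... with the least number of chemical name segments" as: choose non-overlapping matches that cover the most characters and, among equal coverage, use the fewest segments, solved here with dynamic programming.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    text: str
    start: int  # offset of the first character within the sentence
    end: int    # offset one past the last character


def split_sentences(document: str) -> list[str]:
    # Assumed sentence delimiters: Chinese full stop, exclamation mark, question mark, semicolon.
    return [s for s in re.findall(r"[^。！？；]+[。！？；]?", document) if s.strip()]


def match_all(sentence: str, dictionary: set[str]) -> list[Segment]:
    """Record every dictionary hit in the sentence, overlapping hits included."""
    max_len = max(map(len, dictionary), default=0)
    hits = []
    for i in range(len(sentence)):
        for j in range(i + 1, min(i + max_len, len(sentence)) + 1):
            if sentence[i:j] in dictionary:
                hits.append(Segment(sentence[i:j], i, j))
    return hits


def reduce_segments(sentence: str, hits: list[Segment]) -> list[Segment]:
    """Pick non-overlapping hits covering the most characters with the fewest segments."""
    by_start: dict[int, list[Segment]] = {}
    for seg in hits:
        by_start.setdefault(seg.start, []).append(seg)
    n = len(sentence)
    # best[i] = (characters covered, -segments used, chosen segments) for sentence[i:]
    best = [(0, 0, [])] * (n + 1)
    for i in range(n - 1, -1, -1):
        choice = best[i + 1]  # option: leave character i uncovered
        for seg in by_start.get(i, []):
            covered, neg_count, chosen = best[seg.end]
            option = (covered + len(seg.text), neg_count - 1, [seg, *chosen])
            if option[:2] > choice[:2]:  # more coverage first, then fewer segments
                choice = option
        best[i] = choice
    return best[0][2]
```

With the toy dictionary {"乙", "酸", "乙酸", "乙酯", "酯", "甲", "苯", "甲苯"}, the sentence "加入乙酸乙酯和甲苯。" ("add ethyl acetate and toluene") reduces to 乙酸, 乙酯 and 甲苯: the two-segment reading of 乙酸乙酯 wins over the four single-character reading because both cover the same characters.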
1 Assignment
0 Petitions
Abstract
A method and system for recognizing chemical names in a Chinese document. The method includes: receiving a Chinese document including chemical names; recognizing chemical name segments in the document; recognizing non-chemical name segments in the document; and combining the chemical name segments to get chemical names based on the recognized chemical name segments and non-chemical name segments. Specific embodiments of the present invention can effectively recognize chemical names from a chemical document.
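The abstract's third step, recognizing non-chemical name segments, is detailed in claim 1 as word segmentation followed by a dictionary lookup with recorded positions. The patent text does not name a word segmenter, so the sketch below swaps in simple forward maximum matching over a hypothetical general lexicon; `segment_words`, `recognize_non_chemical`, and both dictionaries are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    start: int  # offset of the first character
    end: int    # offset one past the last character


def segment_words(text: str, lexicon: set[str]) -> list[Word]:
    """Forward maximum matching (an assumed stand-in for the unspecified segmenter).

    At each position take the longest lexicon entry; a character that starts
    no entry becomes a one-character word.
    """
    max_len = max([1, *map(len, lexicon)])
    words: list[Word] = []
    i = 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if j == i + 1 or text[i:j] in lexicon:
                words.append(Word(text[i:j], i, j))
                i = j
                break
    return words


def recognize_non_chemical(text: str, lexicon: set[str], non_chemical: set[str]) -> list[Word]:
    """Keep segmented words found in the non-chemical name segment dictionary, with positions."""
    return [w for w in segment_words(text, lexicon) if w.text in non_chemical]
```

The recorded start and end offsets are what the combining step later compares against the positions of the chemical name segments.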
18 Citations
12 Claims
7. A system comprising:
one or more processors, one or more computer-readable memories and one or more computer-readable, tangible storage devices;
a receiving module configured to receive a Chinese document including chemical names;
a first recognizing module configured to recognize chemical name segments in said document;
a second recognizing module configured to recognize non-chemical name segments in said document, wherein said second recognizing module comprises:
    a word-segmenting module configured to segment said document into words;
    a first checking module configured to check whether each segmented word is in a non-chemical name segment dictionary;
    a first determining module configured to, provided that said segmented word is in said non-chemical name segment dictionary, determine said segmented word to be a non-chemical name segment; and
    a second recording module configured to record position information of said non-chemical name segment; and
a combining module configured to combine said chemical name segments to get said chemical names based on said recognized chemical name segments and non-chemical name segments to recognize said chemical names in Chinese documents, wherein said first recognizing module comprises:
    a sentence-segmenting module configured to segment said document into sentences;
    a matching module configured to match all of said chemical name segments appearing in sentences of said document based on a chemical name segment dictionary;
    a matching module configured to record position information of said chemical name segments; and
    a reducing module configured to reduce said chemical name segments in a same sentence, wherein reducing said chemical name segments in a same sentence is performed according to a principle of matching the most chemical name segments with the least number of chemical name segments, and
wherein said combining module further comprises:
    a second determining module configured to determine adjacent chemical name segments in a same sentence according to said position information of said chemical name segments;
    a second checking module configured to check whether there are non-chemical name segments between said adjacent chemical name segments based on said position information of said chemical name segments and non-chemical name segments; and
    a combination executing module configured to, provided that there are no non-chemical name segments between said adjacent chemical name segments, combine said adjacent chemical name segments to get a chemical name.
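The combining module of claim 7 (the same combining step recited in claim 1) can be sketched as follows. The sketch assumes the reduced chemical name segments and the non-chemical name segments of one sentence are already available as position-tagged spans; `Span` and `combine` are hypothetical names. "Adjacent" is read here as consecutive in positional order, and, following the claim wording literally, two such segments are merged whenever no non-chemical name segment lies between them. Sorting plays the role of the second determining module, the `separated` test that of the second checking module, and the slicing that of the combination executing module.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    text: str
    start: int  # offset of the first character within the sentence
    end: int    # offset one past the last character


def combine(sentence: str, chemical: list[Span], non_chemical: list[Span]) -> list[str]:
    """Merge runs of positionally adjacent chemical segments not separated by a non-chemical segment."""
    names: list[str] = []
    run_start = run_end = -1  # current run of mergeable chemical segments; -1 means no run yet
    for seg in sorted(chemical, key=lambda s: s.start):
        # A non-chemical segment separates the run from seg if it lies entirely between them.
        separated = any(run_end <= w.start and w.end <= seg.start for w in non_chemical)
        if run_end < 0 or separated:
            if run_end >= 0:
                names.append(sentence[run_start:run_end])  # close the previous run
            run_start = seg.start
        run_end = seg.end
    if run_end >= 0:
        names.append(sentence[run_start:run_end])
    return names
```

For "加入乙酸乙酯和甲苯。" with chemical segments 乙酸, 乙酯, 甲, 苯 and non-chemical segments 加入, 和, this returns 乙酸乙酯 and 甲苯, because 和 separates 乙酯 from 甲. Slicing from the first segment's start to the last segment's end keeps any unrecognized characters inside a run; a stricter reading could additionally require `run_end == seg.start` before merging.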
Specification