System and method for language extraction and encoding utilizing the parsing of text data in accordance with domain parameters
Abstract
A computerized method for extracting information from natural-language text data includes parsing the text data to determine the grammatical structure of the text data and regularizing the parsed text data to form structured word terms. The parsing step, which can be performed in one or more parsing modes, includes the step of referring to a domain parameter having a value indicative of a domain from which the text data originated, wherein the domain parameter corresponds to one or more rules of grammar within a knowledge base related to the domain to be applied for parsing the text data. Preferably, the structured output is mapped back to the words in the original sentences of the text data input using XML tags.
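The three steps named in the abstract (domain-driven parsing, regularization, and XML tagging mapped back onto the original words) can be illustrated with a small sketch. It is not the patented implementation: the names `GRAMMAR_BY_DOMAIN`, `REGULAR_FORMS`, `parse`, `regularize`, `tag`, and `extract`, the `finding` element with its `v` attribute, and the sample domains and phrases are all assumptions made for illustration, and simple phrase matching stands in for real rules of grammar.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical knowledge base: the domain parameter selects which rules apply.
# Plain phrase lists stand in for real rules of grammar.
GRAMMAR_BY_DOMAIN = {
    "radiology": ["enlarged heart", "infiltrate"],
    "discharge_summary": ["chest pain"],
}

# Hypothetical regularization table: surface phrase -> structured word term.
REGULAR_FORMS = {"enlarged heart": "cardiomegaly"}


def parse(text, domain):
    """Return sorted (start, end) spans matched by the rules for `domain`."""
    lowered = text.lower()
    spans = []
    for phrase in GRAMMAR_BY_DOMAIN[domain]:
        start = lowered.find(phrase)
        while start != -1:
            spans.append((start, start + len(phrase)))
            start = lowered.find(phrase, start + 1)
    return sorted(spans)


def regularize(text, spans):
    """Attach a regularized term to each parsed span."""
    terms = []
    for start, end in spans:
        phrase = text[start:end].lower()
        terms.append((start, end, REGULAR_FORMS.get(phrase, phrase)))
    return terms


def tag(text, terms):
    """Wrap each span of the original text in an XML element carrying its term."""
    out, pos = [], 0
    for start, end, term in terms:
        if start < pos:  # ignore spans that overlap one already tagged
            continue
        out.append(escape(text[pos:start]))
        out.append(f"<finding v={quoteattr(term)}>{escape(text[start:end])}</finding>")
        pos = end
    out.append(escape(text[pos:]))
    return "".join(out)


def extract(text, domain):
    return tag(text, regularize(text, parse(text, domain)))
```

In this sketch, calling `extract` on the same sentence with a different domain value applies a different rule set, which is the role the abstract assigns to the domain parameter.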
39 Claims
1. A method for extracting information from natural-language text data, comprising:
parsing the text data to determine the grammatical structure of the text data, said parsing step comprising the step of referring to a domain parameter having a value indicative of a domain from which the text data originated, the domain parameter corresponding to one or more rules of grammar within a knowledge base related to the domain to be applied for parsing the text data;
regularizing the parsed text data to form structured word terms; and
tagging the text data with a structured data component derived from the structured word terms.
Dependent claims: 2-18.
segmenting the text data by sentences; and
segmenting each of the sentences at identified words or phrases.
11. The method according to claim 1, wherein said parsing step further comprises:
segmenting the text data by sentences; and
segmenting each of the sentences at a prefix.
12. The method according to claim 1, wherein said parsing step further comprises skipping undefined words.
13. The method according to claim 1, wherein said parsing step further comprises:
identifying one or more primary findings in the text data; and
identifying one or more modifiers associated with the primary findings.
14. The method according to claim 1, further comprising performing error recovery when parsing of the text data is unsuccessful.
15. The method according to claim 14, wherein said error recovery step comprises:
segmenting the text data; and
analyzing the segmented text data to achieve at least a partial parsing of the unsuccessfully parsed text data.
16. The method according to claim 1, wherein said tagging step comprises providing the structured data component in a Standard Generalized Markup Language (SGML) compatible format.
17. The method according to claim 1, wherein said tagging step comprises providing the structured data component in Extensible Markup Language (XML).
18. The method according to claim 1, further comprising highlighting one or more primary findings in the natural-language text data.
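Claims 12, 14, and 15 describe relaxations of the parse: skipping undefined words, and recovering from a failed parse by segmenting the text and analyzing the segments to obtain at least a partial parse. The following is a minimal sketch of that control flow under stated assumptions. The lexicon, the all-words-defined test used as a stand-in for a full parse, the choice of commas and semicolons as segment boundaries, and every function name are hypothetical and are not taken from the claims.

```python
import re

# Hypothetical lexicon standing in for the words the grammar defines.
LEXICON = {"heart", "is", "enlarged", "no", "evidence", "of", "pneumonia"}


def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def strict_parse(words):
    """Toy full parse: succeeds only when every word is defined."""
    if words and all(w in LEXICON for w in words):
        return list(words)
    return None


def recover(sentence):
    """Error recovery: segment the sentence, then parse each segment,
    skipping undefined words where a segment still fails."""
    partial = []
    for segment in re.split(r"[,;]", sentence):
        words = tokenize(segment)
        parsed = strict_parse(words)
        if parsed is None:
            parsed = [w for w in words if w in LEXICON]  # skip undefined words
        if parsed:
            partial.append(parsed)
    return partial


def parse_with_recovery(text):
    """Parse sentence by sentence, falling back to recovery on failure."""
    results = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if not sentence:
            continue
        parsed = strict_parse(tokenize(sentence))
        if parsed is not None:
            results.append(("full", [parsed]))
        else:
            results.append(("partial", recover(sentence)))
    return results
```

Each result records whether the sentence parsed in full or only partially, so a caller could treat partial output with less confidence.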
19. A computer system for extracting information from natural-language text data, comprising:
means for parsing the natural-language text data, said parsing means comprising means for referring to a domain parameter having a value indicative of a domain from which the natural-language text data originated, and wherein the domain parameter corresponds to one or more rules of grammar within a knowledge base related to the domain to be applied for parsing the natural-language text data;
means for regularizing the parsed text data to form structured word terms; and
means for tagging the text data with a structured data component derived from the structured word terms.
Dependent claims: 20-39.
means for segmenting the text data by sentences; and
means for segmenting each of the sentences at identified words or phrases.
29. The system according to claim 19, wherein said parsing means further comprises:
means for segmenting the text data by sentences; and
means for segmenting each of the sentences at a prefix.
30. The system according to claim 19, wherein said parsing means further comprises means for skipping undefined words.
31. The system according to claim 19, wherein said parsing means further comprises:
means for identifying one or more primary findings in the text data; and
means for identifying one or more modifiers associated with the primary findings.
32. The system according to claim 19, further comprising means for performing error recovery when parsing of the text data is unsuccessful.
33. The system according to claim 32, wherein said error recovery means comprises:
means for segmenting the text data; and
means for analyzing the segmented text data to achieve at least a partial parsing of the unsuccessfully parsed text data.
34. The system according to claim 19, wherein said tagging means comprises means for providing the structured data component in a Standard Generalized Markup Language (SGML) compatible format.
35. The system according to claim 19, wherein said tagging step comprises means for providing the structured data component in Extensible Markup Language (XML).
36. The system according to claim 19, further comprising means for highlighting one or more primary findings in the original text data.
37. A combination of the system according to claim 19 with an interface module for enabling the system to receive input from and/or to produce standardized output for the World-Wide Web and/or a local network.
38. The combination according to claim 37, further comprising means for viewing the output using a standardized browser.
39. The combination according to claim 38, wherein the browser is a Web-browser.
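Claims 36 through 39 concern highlighting primary findings in the original text and viewing the output in a browser. One way this could look, assuming the tagged output is a well-formed XML fragment, is to convert it into HTML in which each finding is marked. The `sentence` and `finding` element names and the `highlight` function are hypothetical; the claims do not specify a tag vocabulary or a rendering method.

```python
import html
import xml.etree.ElementTree as ET


def highlight(tagged_xml):
    """Render one tagged sentence as an HTML paragraph, marking each finding.

    Assumes a root element whose children wrap spans of the original text.
    """
    root = ET.fromstring(tagged_xml)
    parts = [html.escape(root.text or "")]
    for child in root:
        inner = html.escape("".join(child.itertext()))
        if child.tag == "finding":
            parts.append(f"<mark>{inner}</mark>")  # highlight primary findings
        else:
            parts.append(inner)
        # Text that follows the element, up to the next sibling or the end.
        parts.append(html.escape(child.tail or ""))
    return "<p>" + "".join(parts) + "</p>"
```

Because the tags wrap the original words rather than replacing them, the rendered paragraph preserves the source sentence and only adds emphasis.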
Specification