System and method for extracting information from unstructured text
Abstract
This disclosure relates generally to natural language processing, and more particularly to a system and method for extracting subject-verb-object (SVO) chunked text from unstructured text. In one embodiment, a method is provided for extracting SVO chunked text from unstructured text. The method comprises identifying a plurality of part of speech (PoS) tokens in the unstructured text, and determining a plurality of SVO chunked texts directly from the plurality of PoS tokens using a machine learning chunker model. The machine learning chunker model is trained on SVO annotated training data.
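As an illustration of what "SVO chunked text" in BIO format could look like, the following is a minimal sketch that groups already-tagged tokens into subject, verb, and object chunks. It does not implement the machine learning chunker model itself. The tag names (`B-SUB`, `I-SUB`, `B-VERB`, `B-OBJ`, `O`), the Penn Treebank style PoS tags, the function name `bio_to_chunks`, and the example sentence are all assumptions made for this sketch and do not come from the disclosure.

```python
from typing import List, Optional, Tuple

# One row per token: (token, PoS tag, SVO tag in BIO format).
# The concrete tag strings are hypothetical; the source only states that the
# tags mark a subject, verb, object, or object-subject in BIO format.
Tagged = Tuple[str, str, str]


def bio_to_chunks(tagged: List[Tagged]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (role, chunk text) pairs."""
    chunks: List[Tuple[str, str]] = []
    role: Optional[str] = None
    words: List[str] = []
    for token, _pos, tag in tagged:
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != role):
            # A "B-" tag (or a stray "I-" tag with a new label) opens a chunk.
            if role is not None:
                chunks.append((role, " ".join(words)))
            role, words = label, [token]
        elif prefix == "I":
            # Continuation of the currently open chunk.
            words.append(token)
        else:
            # "O": the token belongs to no chunk, so close any open chunk.
            if role is not None:
                chunks.append((role, " ".join(words)))
            role, words = None, []
    if role is not None:
        chunks.append((role, " ".join(words)))
    return chunks


# Hypothetical chunker output for one sentence.
sentence: List[Tagged] = [
    ("The", "DT", "B-SUB"),
    ("engineer", "NN", "I-SUB"),
    ("fixed", "VBD", "B-VERB"),
    ("the", "DT", "B-OBJ"),
    ("server", "NN", "I-OBJ"),
    (".", ".", "O"),
]
chunks = bio_to_chunks(sentence)
```

Here `chunks` holds one `(role, text)` pair per contiguous tagged span, which is one plausible shape for the chunked text the abstract refers to.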
15 Claims
1. A method for extracting subject-verb-object (SVO) chunked text from unstructured text, the method comprising:
- identifying, by a SVO chunked text computing device, a plurality of part of speech (PoS) tokens in an unstructured text; and
- determining, by the SVO chunked text computing device, a SVO chunked text directly from the plurality of PoS tokens using a machine learning chunker model, wherein the machine learning chunker model is trained on an SVO annotated training data, wherein the SVO annotated training data comprises a plurality of tokens, a plurality of corresponding PoS tags, and a plurality of corresponding SVO tags, the plurality of corresponding SVO tags comprises one or more of a subject tag, a verb tag, an object tag, or an object-subject tag, and the plurality of corresponding SVO tags is in beginning-inside-other (BIO) format, and wherein the SVO annotated training data is generated based on a plurality of corresponding span information for the plurality of tokens by, for each of a plurality of PoS tokens in each of a plurality of sets of syntactically related PoS tokens in a sentence, detecting a span information for a PoS token and tagging the PoS token as a subject, a verb, an object, or an object-subject based on the span information and a previous tagging of the PoS token.
Dependent claims: 2, 3, 4, 5
6. A subject-verb-object (SVO) chunked computing device, comprising:
- at least one processor; and
- memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform operations comprising:
  - identify a plurality of part of speech (PoS) tokens in an unstructured text; and
  - determine a SVO chunked text directly from the plurality of PoS tokens using a machine learning chunker model, wherein the machine learning chunker model is trained on an SVO annotated training data, wherein the SVO annotated training data comprises a plurality of tokens, a plurality of corresponding PoS tags, and a plurality of corresponding SVO tags, the plurality of corresponding SVO tags comprises one or more of a subject tag, a verb tag, an object tag, or an object-subject tag, and the plurality of corresponding SVO tags is in beginning-inside-other (BIO) format, and wherein the SVO annotated training data is generated based on a plurality of corresponding span information for the plurality of tokens by, for each of a plurality of PoS tokens in each of a plurality of sets of syntactically related PoS tokens in a sentence, detecting a span information for a PoS token and tagging the PoS token as a subject, a verb, an object, or an object-subject based on the span information and a previous tagging of the PoS token.
Dependent claims: 7, 8, 9, 10
11. A non-transitory computer-readable medium having stored thereon instructions for extracting subject-verb-object (SVO) chunked text from unstructured text comprising executable code which, when executed by one or more processors, causes the one or more processors to:
- identify a plurality of part of speech (PoS) tokens in the unstructured text; and
- determine a plurality of SVO chunked text directly from the plurality of PoS tokens using a machine learning chunker model, wherein the machine learning chunker model is trained on a subject-verb-object (SVO) annotated training data, wherein the SVO annotated training data comprises a plurality of tokens, a plurality of corresponding PoS tags, and a plurality of corresponding SVO tags, the plurality of corresponding SVO tags comprises one or more of a subject tag, a verb tag, an object tag, or an object-subject tag, and the plurality of corresponding SVO tags is in beginning-inside-other (BIO) format, and wherein the SVO annotated training data is generated based on a plurality of corresponding span information for the plurality of tokens by, for each of a plurality of PoS tokens in each of a plurality of sets of syntactically related PoS tokens in a sentence, detecting a span information for a PoS token and tagging the PoS token as a subject, a verb, an object, or an object-subject based on the span information and a previous tagging of the PoS token.
Dependent claims: 12, 13, 14, 15
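The independent claims describe generating the SVO annotated training data from span information, with each token's tag depending on both its span and any previous tagging. The sketch below shows one possible reading of that step and is not the claimed method itself. It assumes that each set of syntactically related tokens is given as a dictionary of half-open token index ranges, and that a token already tagged as an object which later falls inside a subject span (or the reverse) receives the object-subject tag. The names `annotate`, `svo_sets`, the tag strings, and the example sentence are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Span = Tuple[int, int]  # half-open token index range [start, end)


def annotate(
    tokens: List[str],
    pos_tags: List[str],
    svo_sets: List[Dict[str, Span]],
) -> List[Tuple[str, str, str]]:
    """Return (token, PoS tag, SVO BIO tag) rows for one sentence."""
    roles: List[Optional[str]] = [None] * len(tokens)
    for svo in svo_sets:
        for role in ("SUB", "VERB", "OBJ"):
            span = svo.get(role)
            if span is None:
                continue
            for i in range(*span):
                previous = roles[i]
                if {previous, role} == {"SUB", "OBJ"}:
                    # Assumption: a token that is the object of one set and
                    # the subject of another gets the object-subject tag.
                    roles[i] = "OBJSUB"
                elif previous is None:
                    roles[i] = role
    rows = []
    for i, (token, pos) in enumerate(zip(tokens, pos_tags)):
        role = roles[i]
        if role is None:
            tag = "O"
        elif i > 0 and roles[i - 1] == role:
            tag = "I-" + role  # continues the chunk opened earlier
        else:
            tag = "B-" + role  # first token of a new chunk
        rows.append((token, pos, tag))
    return rows


tokens = ["The", "manager", "hired", "an", "engineer",
          "who", "fixed", "the", "server"]
pos_tags = ["DT", "NN", "VBD", "DT", "NN", "WP", "VBD", "DT", "NN"]
svo_sets = [
    {"SUB": (0, 2), "VERB": (2, 3), "OBJ": (3, 5)},  # manager hired engineer
    {"SUB": (3, 5), "VERB": (6, 7), "OBJ": (7, 9)},  # engineer fixed server
]
rows = annotate(tokens, pos_tags, svo_sets)
svo_tags = [tag for _token, _pos, tag in rows]
```

In this example "an engineer" is first tagged as an object and then retagged as object-subject when the second set is processed. One limitation of the sketch is that two adjacent chunks with the same role would be merged into one, since the BIO conversion only looks at the role of the preceding token.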
Specification