Methods for information extraction, search, and structured representation of text data
Abstract
Systems and methods for creating structured or semi-structured representations of information extracted from unstructured text data sources are described. In some embodiments, without requiring a predefined target data structure, the methods identify the grammatical and semantic attributes and context information in a text content, create object-property association data as knowledge and information extracted from the unstructured data, and represent such information in a structured or semi-structured format to facilitate search and trend analysis. In some other embodiments, the methods identify the types of information contained in the unstructured data, and for a pre-defined target information type, the methods identify the context and content of the portion of the text that represents the target information type, extract the text, attach a tag or label to the extracted text, and store or display the data in a database table format or XML format for further pattern and trend analysis. Applications of the present systems and methods include effectively analyzing user-generated content such as customer feedback, reviews, comments, technical support forum messages, resume or job description documents, and other types of text content.
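As a rough illustration of the final step the abstract describes (attaching a label to extracted text and emitting it in a table-like or XML format), the following Python sketch may help. It is not taken from the patent: the labels, the sample texts, and the helper names `to_xml` and `to_rows` are all hypothetical, and the extraction step itself is assumed to have already happened.

```python
import xml.etree.ElementTree as ET

# Hypothetical (label, extracted text) pairs. A real system would produce
# these from its own extraction step, which is not shown here.
tagged = [
    ("product_issue", "battery drains overnight"),
    ("feature_request", "add a dark mode"),
]

def to_xml(pairs):
    """Serialize (label, text) pairs as one <record> element per extracted text."""
    root = ET.Element("records")
    for label, text in pairs:
        rec = ET.SubElement(root, "record", {"label": label})
        rec.text = text
    return ET.tostring(root, encoding="unicode")

def to_rows(pairs):
    """Represent the same pairs as table-like rows (a list of dicts)."""
    return [{"label": label, "text": text} for label, text in pairs]

xml_out = to_xml(tagged)
rows = to_rows(tagged)
```

Either output could then feed a database load or a trend report; which format suits better depends on the downstream tool, which the abstract leaves open.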
19 Claims
1. A computer-implemented method for detecting hidden information in an unstructured data source, and generating a new data object representing the detected information, comprising:
defining a first attribute for identifying a target term in a text content, wherein the first attribute comprises a semantic or syntactic attribute, wherein the semantic attribute includes being or representing a name of an object or a property associated with an object;
defining a target term as a term that is associated with the first attribute;
identifying a second attribute as an additional attribute associated with the target term, wherein the second attribute comprises a semantic or syntactic attribute;
receiving a text content comprising a plurality of terms each comprising a word or a phrase;
identifying a first term in the text content;
identifying a second term in the context of the first term, wherein the context includes a term having a syntactic or semantic relation with the first term;
identifying whether the second term refers to or represents the second attribute;
if the second term is identified as referring to or representing the second attribute, then determining that the first term is the target term;
generating a first data object comprising a first label associated with the first term, wherein the first label is not an element extracted from the text content, wherein the first attribute and the second attribute are not explicitly indicated in the text content;
associating the first term to a search engine or a database; and
when the first term is associated to a search engine, using the first term as an automatically-generated query to perform a search and determine a relevance for a search result, and displaying the search result in a user interface, wherein the first term is displayed in the user interface for an enhanced user interface functionality of highlighting a key information item that is automatically discovered from an unstructured data source and used for determining the relevance of the search result, wherein the search engine comprises an enterprise search engine, a job-seeking search engine, a recruitment search engine, a text analytics tool, and a data mining search engine;
when the first term is associated to a database, compiling a data source comprising the first term, wherein the data source indicates that the first term is associated with the first label representing the first attribute that is hidden in the raw unstructured data source.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
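The core of claim 1 (deciding that a first term is a target term because a nearby second term represents a second attribute, then attaching a label that does not itself appear in the text) can be sketched in Python as below. This is only an illustrative reading under stated assumptions, not the patented implementation. The cue list `PROFICIENCY_CUES`, the label `"skill"`, the capitalization test for candidate terms, and the two-token window used as a stand-in for "context" are all hypothetical choices.

```python
import re
from dataclasses import dataclass

# Hypothetical cue terms assumed to represent the second attribute
# ("expresses proficiency"); the claim does not specify any such list.
PROFICIENCY_CUES = {"proficient", "experienced", "skilled"}

# First label: supplied by the system, never extracted from the text.
LABEL = "skill"

@dataclass
class LabeledTerm:
    term: str
    label: str

def find_target_terms(text, window=2):
    """Return capitalized terms preceded, within `window` tokens, by a cue term."""
    tokens = re.findall(r"[A-Za-z]+", text)
    found = []
    for i, tok in enumerate(tokens):
        # Assumption: candidate first terms are capitalized non-cue tokens.
        if not tok[0].isupper() or tok.lower() in PROFICIENCY_CUES:
            continue
        # Assumption: "context" is approximated by the preceding tokens.
        context = tokens[max(0, i - window):i]
        if any(c.lower() in PROFICIENCY_CUES for c in context):
            found.append(LabeledTerm(tok, LABEL))
    return found

def compile_data_source(terms):
    """Database branch of the claim: rows pairing each term with its label."""
    return [{"term": t.term, "label": t.label} for t in terms]

results = find_target_terms(
    "Proficient in Python and experienced with Kubernetes. We met in Paris."
)
table = compile_data_source(results)
```

Here "Python" and "Kubernetes" are labeled because a cue term appears just before them, while "Paris" is not. A fixed token window ignores sentence boundaries and real syntactic relations; a dependency parser would be a closer match to the "syntactic or semantic relation" wording, at the cost of an external library.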
10. A computer-implemented method for detecting hidden information and generating a structured data object from an unstructured data source, comprising:
defining a first information type for extracting a text unit from a text content, and creating a first label as a description of the first information type;
defining a second information type as a sub-type of the first information type, and creating a second label as a description of the second information type, wherein the second information type represents a subclass of instances that are associated with the first information type;
receiving a text content as an unstructured data source comprising a plurality of terms, each comprising a word or a phrase;
identifying a text unit comprising a first term in the text content;
identifying a first attribute associated with the first term;
determining whether the text unit is associated with the first information type based on the first attribute;
identifying a second attribute associated with a term in the text unit;
determining whether the text unit is associated with the second information type based on the second attribute;
if the text unit is determined to be associated with the first information type, generating a first data object comprising a first data field;
associating the first data field with the first label, wherein the first label is not an element extracted from the text unit, wherein the first information type is not explicitly indicated in the text unit; and
creating a structured data presentation format as a transformation from an unstructured data source, wherein the structured data presentation format comprises the first data field annotated by the first label;
associating the structured data presentation format to a text data analysis tool, wherein the text data analysis tool comprises an element selected from the group consisting of at least an enterprise call center data analysis tool, a customer support text data analysis tool, a text mining tool;
using the text data analysis tool to conduct a structured query, based on the structured data presentation format, for information originally hidden in an unstructured data source; and
displaying, in a user interface associated with the text data analysis tool, the structured data presentation format as an enhanced information presentation functionality of the user interface for information originally hidden in an unstructured data source.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18, 19
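The type and sub-type logic of claim 10 can be sketched in Python as follows: each sentence is treated as a text unit, tested against a first information type and then a sub-type, and stored in a field named by a label that is not in the text. This is a minimal sketch under assumptions, not the patent's method. The information types ("complaint", "billing complaint"), both term lists, and the function names `structure` and `query` are hypothetical.

```python
import re

# Hypothetical labels describing the first information type and its sub-type.
FIRST_LABEL = "complaint"
SECOND_LABEL = "billing complaint"

# Hypothetical term lists standing in for the first and second attributes.
NEGATIVE_TERMS = {"broken", "late", "overcharged", "rude"}
BILLING_TERMS = {"invoice", "overcharged", "refund"}

def structure(text):
    """Turn free text into records whose first field is named by FIRST_LABEL."""
    records = []
    # Assumption: a text unit is a sentence, split on end punctuation.
    for unit in re.split(r"(?<=[.!?])\s+", text.strip()):
        terms = set(re.findall(r"[a-z]+", unit.lower()))
        if not terms & NEGATIVE_TERMS:
            continue  # not an instance of the first information type
        record = {FIRST_LABEL: unit}
        record["subtype"] = SECOND_LABEL if terms & BILLING_TERMS else None
        records.append(record)
    return records

def query(records, subtype):
    """Structured query: return the labeled field of records with this sub-type."""
    return [r[FIRST_LABEL] for r in records if r["subtype"] == subtype]

text = "The agent was rude. I was overcharged on my invoice. Thanks for the quick reply."
records = structure(text)
billing = query(records, SECOND_LABEL)
```

In this run two of the three sentences become records, and only the invoice sentence matches the sub-type query. Keyword sets keep the sketch self-contained; any classifier that assigns a type and sub-type to each text unit could take their place.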
Specification