UNDERSTANDING TABLES FOR SEARCH
Abstract
The present invention extends to methods, systems, and computer program products for understanding tables for search. Aspects of the invention include identifying a subject column for a table, detecting a column header using other tables, and detecting a column header using a knowledge base. Implementations can be utilized in a structured data search system (SDSS) that indexes structured information, such as tables in a relational database or HTML tables extracted from web pages. The SDSS allows users to search over the structured information (tables) using different mechanisms, including keyword search and data finding data.
33 Claims
1. A method for detecting one or more subject columns of a table, the method comprising:
selecting a specified number of columns from the table as subject column candidates, each subject column candidate being a candidate for a subject column of the table, each subject column candidate including a plurality of values;
for each subject column candidate:
identifying occurrences of any value from among the plurality of values being paired with one or more column names across a plurality of other tables; and
calculating a score for the subject candidate column based on the identified occurrences, the calculated score indicating a likelihood of the candidate column being a subject column; and
selecting at least one of the subject column candidates as a subject column of the table in accordance with the calculated scores.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
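The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes each table is a dict mapping a column name to its list of cell values, that the candidates are the leftmost columns, and that the score is a plain count of (value, column name) pairings found in the other tables. The function names, the `num_candidates` and `top_k` parameters, and the scoring rule are all hypothetical.

```python
from collections import defaultdict


def build_value_name_index(other_tables):
    """Count, for each cell value, the column names it is paired with in other tables."""
    index = defaultdict(lambda: defaultdict(int))
    for table in other_tables:
        for name, values in table.items():
            for value in set(values):  # count each value once per column
                index[value][name] += 1
    return index


def score_candidate(values, index):
    """Assumed score: total number of (value, column name) pairings seen elsewhere."""
    return sum(sum(index[v].values()) for v in set(values) if v in index)


def detect_subject_columns(table, other_tables, num_candidates=2, top_k=1):
    """Return the top_k candidate column names and the score of every candidate."""
    index = build_value_name_index(other_tables)
    # Assumption: the leftmost columns are taken as the subject column candidates.
    candidates = list(table.items())[:num_candidates]
    scores = {name: score_candidate(values, index) for name, values in candidates}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], scores


if __name__ == "__main__":
    table = {"Country": ["France", "Japan"], "Population": ["67", "125"]}
    others = [
        {"Nation": ["France", "Japan", "Peru"], "Capital": ["Paris", "Tokyo", "Lima"]},
        {"Country": ["Japan"], "GDP": ["5"]},
    ]
    print(detect_subject_columns(table, others))
```

A count of pairings is only one possible scoring rule; the claim requires just that the score be based on the identified occurrences and indicate how likely the candidate is to be a subject column.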
9. At a computer system, a method for detecting a column header for a table including one or more rows, the method comprising:
constructing a set of candidate column names for the table from data defining the table;
for each candidate column name in the set of candidate column names:
calculating a candidate column name frequency for the candidate column name by identifying one or more other tables, from among a set of other tables, that also contain the candidate column name as a candidate column name; and
calculating a non-candidate column name frequency for the candidate column name by identifying a second one or more other tables, from among the set of other tables, that contain the candidate column name other than as a candidate column name; and
selecting a row of the table as a column header when at least a specified threshold of candidate column names contained in the row have a candidate column name frequency that is greater than a non-candidate column name frequency.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
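The frequency comparison in claim 9 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: each table is a list of rows (lists of strings), the candidate column names of any table are the cells of its first row, and the "specified threshold" is a fraction of the row's cells. The function names and the `threshold` parameter are hypothetical.

```python
def name_frequencies(name, other_tables):
    """Return (candidate frequency, non-candidate frequency) for one name.

    Candidate frequency: number of other tables whose first row contains the name.
    Non-candidate frequency: number of other tables containing it in a later row.
    """
    ccnf = sum(1 for t in other_tables if t and name in t[0])
    nccnf = sum(1 for t in other_tables if any(name in row for row in t[1:]))
    return ccnf, nccnf


def detect_header_row(table, other_tables, threshold=0.5):
    """Return 0 if the first row qualifies as a column header, otherwise None."""
    if not table or not table[0]:
        return None
    candidates = table[0]  # assumption: candidate names come from the first row
    hits = 0
    for name in candidates:
        ccnf, nccnf = name_frequencies(name, other_tables)
        if ccnf > nccnf:
            hits += 1
    # The row is selected when enough of its cells look like column names elsewhere.
    return 0 if hits / len(candidates) >= threshold else None


if __name__ == "__main__":
    others = [
        [["Name", "City"], ["Bob", "Oslo"]],
        [["Age", "Name"], ["40", "Cy"]],
        [["x", "y"], ["Name", "z"]],
    ]
    print(detect_header_row([["Name", "Age"], ["Ann", "31"]], others))
    print(detect_header_row([["Bob", "Oslo"], ["Ann", "Rome"]], others))
```

Counting tables rather than individual occurrences keeps a single large table from dominating either frequency; that choice is also an assumption of this sketch.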
17. At a computer system, a method for detecting a column header for a table including one or more rows, the method comprising:
constructing a set of candidate column names for the table;
inferring that a column included in the set of candidate column names is a hypernym of the cell values contained in the column based on the cell values contained in the column; and
selecting the row containing the column as a column header for the table.
Dependent claims: 18, 19, 20
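The hypernym inference in claim 17 can be sketched in Python with a toy knowledge base. This is an illustration only: the `is_a` dict (lower-cased instance mapped to a set of concepts), the `min_support` fraction, and the function names are hypothetical stand-ins for whatever knowledge base and decision rule an actual implementation uses. The sketch assumes the candidate column names are the cells of the first row.

```python
def is_hypernym_of_column(candidate, cell_values, is_a, min_support=0.5):
    """Check whether the candidate name is a known concept for enough cell values."""
    if not cell_values:
        return False
    concept = candidate.strip().lower()
    supported = sum(
        1 for v in cell_values if concept in is_a.get(v.strip().lower(), set())
    )
    return supported / len(cell_values) >= min_support


def detect_header_by_hypernym(table, is_a, min_support=0.5):
    """Return 0 if some first-row cell is a hypernym of its column, otherwise None."""
    if len(table) < 2:
        return None
    header, body = table[0], table[1:]
    for col, candidate in enumerate(header):
        # Collect the values below the candidate, skipping rows that are too short.
        column_values = [row[col] for row in body if col < len(row)]
        if is_hypernym_of_column(candidate, column_values, is_a, min_support):
            return 0
    return None


if __name__ == "__main__":
    # Toy knowledge base: instance -> concepts it belongs to (hypothetical data).
    is_a = {"paris": {"city", "capital"}, "tokyo": {"city"}, "lima": {"city"}}
    with_header = [["City", "Visitors"], ["Paris", "19"], ["Tokyo", "15"]]
    without_header = [["Paris", "19"], ["Tokyo", "15"]]
    print(detect_header_by_hypernym(with_header, is_a))
    print(detect_header_by_hypernym(without_header, is_a))
```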
21. A system, the system comprising:
one or more processors;
system memory; and
one or more computer storage media having stored thereon computer executable instructions representing a subject column detector, the subject column detector for detecting one or more subject columns of a table, the subject column detector configured to:
select a specified number of columns from the table as subject column candidates, each subject column candidate being a candidate for a subject column of the table, each subject column candidate including a plurality of values;
for each subject column candidate:
identify occurrences of any value from among the plurality of values being paired with one or more column names across a plurality of other tables; and
calculate a score for the subject candidate column based on the identified occurrences, the calculated score indicating a likelihood of the candidate column being a subject column; and
select at least one of the subject column candidates as a subject column of the table in accordance with the calculated scores.
Dependent claims: 22, 23, 24, 25
26. A system, the system comprising:
one or more processors;
system memory; and
one or more computer storage media having stored thereon computer executable instructions representing a column header detector, the subject column detector for detecting one or more subject columns of a table, the subject column detector configured to:
construct a set of candidate column names for the table from data defining the table;
for each candidate column name in the set of candidate column names:
calculate a candidate column name frequency for the candidate column name by identifying one or more other tables, from among a set of other tables, that also contain the candidate column name as a candidate column name; and
calculate a non-candidate column name frequency for the candidate column name by identifying a second one or more other tables, from among the set of other tables, that contain the candidate column name other than as a candidate column name; and
select a row of the table as a column header when at least a specified threshold of candidate column names contained in the row have a candidate column name frequency that is greater than a non-candidate column name frequency.
Dependent claims: 27, 28, 29
30. A system, the system comprising:
one or more processors;
system memory; and
one or more computer storage media having stored thereon computer executable instructions representing a column header detector, the subject column detector for detecting one or more subject columns of a table, the subject column detector configured to:
construct a set of candidate column names for the table;
infer that a column included in the set of candidate column names is a hypernym of the cell values contained in the column based on the cell values contained in the column; and
select the row containing the column as a column header for the table.
Dependent claims: 31, 32, 33
Specification