Date ambiguity resolution
Abstract
A system and method for resolving ambiguities in date values associated with an attribute in a memory of a computer system. If a first text string conforms to one or more date formats, a confidence value is assigned to each of those date formats for the first text string, based on how specifically the first text string conforms to that format. Similarly, if a second text string conforms to one or more date formats, a confidence value is assigned to each of those date formats for the second text string on the same basis. The date string expressed in the highest-confidence date format for the first text string and the date string expressed in the highest-confidence date format for the second text string are merged to obtain a date value for the attribute.
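For illustration only, the confidence assignment summarized above might be sketched as follows. The format table, the variable names (year, month, day, century), the plausibility check, and the scoring rule 1 / (1 + unknowns) are all assumptions made for this sketch; the record itself says only that confidence reflects how specifically a text string conforms to a format.

```python
import re

# Hypothetical format table: (name, pattern, variables the format leaves unknown).
FORMATS = [
    ("MM/DD/YYYY", re.compile(r"(?P<month>\d{1,2})/(?P<day>\d{1,2})/(?P<year>\d{4})"), ()),
    ("MM/DD", re.compile(r"(?P<month>\d{1,2})/(?P<day>\d{1,2})"), ("year",)),
    ("MM/YY", re.compile(r"(?P<month>\d{1,2})/(?P<yy>\d{2})"), ("day", "century")),
    ("YYYY", re.compile(r"(?P<year>\d{4})"), ("month", "day")),
]


def plausible(fields):
    """Reject matches whose month or day is out of range (assumed sanity check)."""
    month = fields.get("month")
    day = fields.get("day")
    month_ok = month is None or 1 <= int(month) <= 12
    day_ok = day is None or 1 <= int(day) <= 31
    return month_ok and day_ok


def score_formats(text):
    """Return {format name: confidence} for every format the text conforms to."""
    scores = {}
    for name, pattern, unknown in FORMATS:
        match = pattern.fullmatch(text.strip())
        if match is None or not plausible(match.groupdict()):
            continue
        # Assumed rule: fewer unknown variables means higher confidence.
        scores[name] = 1.0 / (1 + len(unknown))
    return scores
```

Under these assumptions, the string "12/25" conforms to both MM/DD and MM/YY, and MM/DD receives the higher confidence because it leaves one variable unknown rather than two.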
59 Claims
1. A computer-implemented method for resolving ambiguities in date values associated with an attribute of an entity, the method comprising:
at a computer system including one or more processors and memory storing one or more programs, the one or more processors executing the one or more programs to perform the operations of:
obtaining a first text string associated with an attribute of an entity, wherein the first text string is extracted from a first web document;
determining that the first text string conforms to one or more date formats;
assigning a first confidence value for each of the date formats for the first text string based on a first number of unknown variables that remain when interpreting the first text string using each of the date formats;
obtaining a second text string associated with the attribute of the entity, wherein the second text string is extracted from a second web document;
determining that the second text string conforms to one or more of the date formats;
assigning a second confidence value for each of the date formats for the second text string based on a second number of unknown variables that remain when interpreting the second text string using each of the date formats;
determining a first date string expressed in a date format with a highest first confidence value for the first text string;
determining a second date string expressed in a date format with a highest second confidence value for the second text string; and
merging a first subset of the first date string and a second subset of the second date string to obtain a date value for the attribute.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 25
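As a rough illustration of the final steps of claim 1, the sketch below picks the highest-confidence format for each of two text strings and merges a subset of each resulting date string into one date value. The format table, the scoring rule, the preference for the more confident string when both supply the same field, and the names best_interpretation and merge_dates are assumptions made for this example, not details taken from the claim.

```python
import re
from datetime import date

# Hypothetical formats: (name, pattern, variables the format leaves unknown).
FORMATS = [
    ("MM/DD/YYYY", re.compile(r"(?P<month>\d{1,2})/(?P<day>\d{1,2})/(?P<year>\d{4})"), ()),
    ("MM/DD", re.compile(r"(?P<month>\d{1,2})/(?P<day>\d{1,2})"), ("year",)),
    ("MM/YY", re.compile(r"(?P<month>\d{1,2})/(?P<yy>\d{2})"), ("day", "century")),
    ("YYYY", re.compile(r"(?P<year>\d{4})"), ("month", "day")),
]


def best_interpretation(text):
    """Return (confidence, format name, known fields) for the best format, or None."""
    best = None
    for name, pattern, unknown in FORMATS:
        match = pattern.fullmatch(text.strip())
        if match is None:
            continue
        confidence = 1.0 / (1 + len(unknown))  # assumed scoring rule
        fields = {key: int(value) for key, value in match.groupdict().items()}
        if best is None or confidence > best[0]:
            best = (confidence, name, fields)
    return best


def merge_dates(first_text, second_text):
    """Merge the known fields of two text strings into one date, if complete."""
    first = best_interpretation(first_text)
    second = best_interpretation(second_text)
    if first is None or second is None:
        return None
    merged = {}
    # Assumed tie-break: the more confident interpretation fills a field first.
    for _, _, fields in sorted((first, second), key=lambda item: item[0], reverse=True):
        for key in ("year", "month", "day"):
            if key in fields:
                merged.setdefault(key, fields[key])
    if not all(key in merged for key in ("year", "month", "day")):
        return None
    try:
        return date(merged["year"], merged["month"], merged["day"])
    except ValueError:
        return None  # fields matched a pattern but do not form a real date
```

With these assumptions, merge_dates("1987", "12/25") takes the year from the first string and the month and day from the second, yielding 25 December 1987.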
16. A computer system for resolving ambiguities in date values associated with an attribute of an entity, the computer system comprising:
one or more processors;
memory; and
one or more programs stored in the memory, the one or more programs comprising instructions to:
obtain a first text string associated with an attribute of an entity, wherein the first text string is extracted from a first web document;
determine that the first text string conforms to one or more date formats;
assign a first confidence value for each of the date formats for the first text string based on a first number of unknown variables that remain when interpreting the first text string using each of the date formats;
obtain a second text string associated with the attribute of the entity, wherein the second text string is extracted from a second web document;
determine that the second text string conforms to one or more date formats;
assign a second confidence value for each of the date formats for the second text string based on a second number of unknown variables that remain when interpreting the second text string using each of the date formats;
determine a first date string expressed in a date format with a highest first confidence value for the first text string;
determine a second date string expressed in a date format with a highest second confidence value for the second text string; and
merge a first subset of the first date string and a second subset of the second date string to obtain a date value for the attribute.
Dependent claims: 17, 18, 19, 20, 21, 22, 23, 24, 26, 27, 28, 29, 30, 34, 52
31. A computer program product comprising a computer-readable storage medium, for resolving ambiguities in date values associated with an attribute of an entity, the computer-readable storage medium comprising:
program code for obtaining a first text string associated with an attribute of an entity, wherein the first text string is extracted from a first web document;
program code for determining that the first text string conforms to one or more date formats;
program code for assigning a first confidence value for each of the date formats for the first text string based on a first number of unknown variables that remain when interpreting the first text string using each of the date formats;
program code for obtaining a second text string associated with the attribute of the entity, wherein the second text string is extracted from a second web document;
program code for determining that the second text string conforms to one or more of the date formats;
program code for assigning a second confidence value for each of the date formats for the second text string based on a second number of unknown variables that remain when interpreting the second text string using each of the date formats;
program code for determining a first date string expressed in a date format with a highest first confidence value for the first text string;
program code for determining a second date string expressed in a date format with a highest second confidence value for the second text string; and
program code for merging a first subset of the first date string and a second subset of the second date string to obtain a date value for the attribute.
Dependent claims: 32, 33, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45
46. A computer-implemented method for resolving ambiguities in date values associated with an attribute of an entity, the method comprising:
identifying a plurality of web documents associated with an attribute of an entity;
for each web document in the plurality of web documents, obtaining, from the web document, one or more text strings associated with the attribute of the entity;
identifying one or more date formats for at least one of the one or more text strings; and
assigning confidence values to each of the one or more date formats for the at least one of the one or more text strings based on a number of unknown variables that remain when interpreting the at least one of the one or more text strings using each of the one or more date formats;
determining date strings expressed in date formats with highest confidence values for the at least one of the one or more text strings; and
merging subsets of the date strings to obtain a date value for the attribute.
Dependent claims: 47, 48, 49, 50, 51
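The per-document loop of claim 46 could be sketched along the following lines. This is a minimal illustration under stated assumptions: the format table, the rule that a named group counts as a known variable, the conflict rule (strings with fewer unknowns win), and the names interpret and resolve are hypothetical and do not come from the claim.

```python
import re

# Hypothetical formats; each named group is a variable the format makes known.
FORMATS = {
    "YYYY-MM-DD": re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"),
    "YYYY-MM": re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})"),
    "YYYY": re.compile(r"(?P<year>\d{4})"),
    "--MM-DD": re.compile(r"--(?P<month>\d{2})-(?P<day>\d{2})"),
}
VARIABLES = ("year", "month", "day")


def interpret(text):
    """Return (unknown count, known fields) for the best-fitting format, or None."""
    candidates = []
    for pattern in FORMATS.values():
        match = pattern.fullmatch(text.strip())
        if match:
            fields = {key: int(value) for key, value in match.groupdict().items()}
            candidates.append((len(VARIABLES) - len(fields), fields))
    # Fewest unknown variables stands in for "highest confidence" here.
    return min(candidates, key=lambda item: item[0]) if candidates else None


def resolve(documents):
    """documents maps a source to its text strings; returns the merged fields."""
    interpretations = []
    for strings in documents.values():
        for text in strings:
            result = interpret(text)
            if result is not None:
                interpretations.append(result)
    merged = {}
    # Assumed conflict rule: more specific strings fill each field first.
    for _, fields in sorted(interpretations, key=lambda item: item[0]):
        for name, value in fields.items():
            merged.setdefault(name, value)
    return merged
```

For example, three hypothetical sources supplying "1987", "--12-25", and "1987-12" would merge into year 1987, month 12, day 25, while strings that match no format are skipped.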
53. A computer system for resolving ambiguities in date values associated with an attribute of an entity, the computer system comprising:
one or more processors;
memory; and
one or more programs stored in the memory, the one or more programs comprising instructions to:
identify a plurality of web documents associated with an attribute of an entity;
for each web document in the plurality of web documents, obtain, from the web document, at least two text strings associated with the attribute of the entity;
identify one or more date formats for the at least two text strings; and
assign confidence values to each of the one or more date formats for the at least two text strings based on a number of unknown variables that remain when interpreting the at least two text strings using each of the one or more date formats;
determine date strings expressed in date formats with highest confidence values for the at least two text strings; and
merge subsets of the date strings to obtain a date value for the attribute.
Dependent claims: 54, 55, 56, 57, 58, 59
Specification