Methods and apparatus for identification and analysis of temporally differing corpora
First Claim
1. A method for identifying n-grams about an object, comprising:
identifying, as a result of computing hardware and programmable memory, an object-specific corpus, that is a subset of a first corpus, where approximately all statements of the object-specific corpus are about a same first object;
identifying, as a result of computing hardware and programmable memory, statements of the object-specific corpus, for inclusion in a corpus of interest, upon a basis of a statement relating to a time period of interest;
identifying, as a result of computing hardware and programmable memory, statements of the object-specific corpus, for inclusion in a reference corpus, upon a basis of a statement relating to a reference time period that is different from the time period of interest;
identifying, as a result of computing hardware and programmable memory, n-grams of the corpus of interest, for inclusion in a corpus-of-interest list of n-grams;
identifying, as a result of computing hardware and programmable memory, n-grams of the reference corpus, for inclusion in a reference-corpus list of n-grams;
identifying, as a result of computing hardware and programmable memory, for each n-gram of the corpus-of-interest list of n-grams, for subsequent access in conjunction with an n-gram, a number of occurrences of the n-gram in the corpus of interest;
identifying, as a result of computing hardware and programmable memory, for each n-gram of the reference-corpus list of n-grams, for subsequent access in conjunction with an n-gram, a number of occurrences of the n-gram in the reference corpus;
determining, as a result of computing hardware and programmable memory, a selected list of n-grams from the corpus-of-interest list of n-grams or the reference-corpus list of n-grams;
determining, as a result of computing hardware and programmable memory, for each n-gram of the selected list of n-grams, for subsequent access in conjunction with an n-gram, an average number of occurrences of the n-gram, in the reference-corpus;
determining, as a result of computing hardware and programmable memory, for each n-gram of the selected list of n-grams, for subsequent access in conjunction with an n-gram, a difference value, between a number of occurrences of the n-gram in the corpus of interest and an average number of occurrences of the n-gram in the reference-corpus;
normalizing, as a result of computing hardware and programmable memory, for each n-gram of the selected list of n-grams, for subsequent access in conjunction with an n-gram, the difference value to produce a normalized difference value; and
determining, as a result of computing hardware and programmable memory, from the selected list of n-grams, a second selected list of n-grams, on a basis of the normalized difference value.
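The steps recited in claim 1 can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the claim: statements are modeled as (timestamp, text) pairs already limited to one object, tokenization is a lowercase whitespace split, the reference corpus is treated as several sub-periods over which the count is averaged, the selected list is taken from the corpus of interest, the normalization divides the difference by the average plus one, and the second selected list is simply the top-ranked n-grams. The function names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def ngrams(text, n_max=2):
    """Yield every word n-gram of length 1..n_max (whitespace tokenization assumed)."""
    tokens = text.lower().split()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def count_ngrams(statements, start, end, n_max=2):
    """Count n-gram occurrences in statements whose timestamp lies in [start, end)."""
    counts = Counter()
    for when, text in statements:
        if start <= when < end:
            counts.update(ngrams(text, n_max))
    return counts


def emerging_ngrams(statements, interest, reference_periods, top_k=3, n_max=2):
    """Rank n-grams of the period of interest by normalized difference value."""
    interest_counts = count_ngrams(statements, *interest, n_max=n_max)
    ref_counts = [count_ngrams(statements, s, e, n_max=n_max)
                  for s, e in reference_periods]
    scored = []
    # Selected list (assumed): every n-gram seen in the corpus of interest.
    for gram, occurrences in interest_counts.items():
        average = sum(c[gram] for c in ref_counts) / len(ref_counts)
        difference = occurrences - average
        # Assumed normalization; the claim does not fix a formula.
        normalized = difference / (average + 1.0)
        scored.append((gram, normalized))
    # Second selected list (assumed): highest scores first, ties broken alphabetically.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


statements = [
    (datetime(2024, 1, 1), "battery life is great"),
    (datetime(2024, 1, 8), "battery life is fine"),
    (datetime(2024, 1, 15), "battery overheats badly"),
    (datetime(2024, 1, 16), "battery overheats again"),
]
interest = (datetime(2024, 1, 14), datetime(2024, 1, 21))
reference_periods = [
    (datetime(2023, 12, 31), datetime(2024, 1, 7)),
    (datetime(2024, 1, 7), datetime(2024, 1, 14)),
]
top = emerging_ngrams(statements, interest, reference_periods, top_k=3, n_max=1)
```

In this toy data "battery" appears in every period and so scores low, while "overheats" is absent from both reference sub-periods and ranks first. A different normalization (for example, dividing by a standard deviation) would fit the claim language equally well.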
Abstract
Differences are identified, at the lexical unit and/or phrase level, between time-varying corpora. A corpus for a time period of interest is compared with a reference corpus. N-grams are generated for both the corpus of interest and reference corpus. Numbers of occurrences are counted. An average number of occurrences, for each n-gram of the reference corpus, is determined. A difference value, between number of occurrences in corpus of interest and average number of occurrences, is determined. Each difference value is normalized. N-grams can be selected for display, or for further processing, on the basis of the normalized difference value. Further processing can include selecting a sample period. A plurality of reference corpora are produced, where a begin time, for each sub-corpus of the plurality of reference corpora, differs, from a begin time for the corpus of interest, by an integer multiple of the sample period. Word Cloud visualization is shown.
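The sample-period idea at the end of the abstract can be sketched as follows. This is only an illustration under stated assumptions: the reference sub-corpora are taken to precede the corpus of interest, each reference window is given the same length as the period of interest, and the function name and parameters are hypothetical rather than drawn from the patent.

```python
from datetime import datetime, timedelta


def reference_windows(interest_start, interest_length, sample_period, count):
    """Return `count` (start, end) windows whose begin times precede
    interest_start by 1, 2, ..., count multiples of sample_period."""
    windows = []
    for k in range(1, count + 1):
        start = interest_start - k * sample_period
        # Assumed: each reference window spans the same length as the period of interest.
        windows.append((start, start + interest_length))
    return windows


# Compare 09:00-10:00 on one Monday with the same hour on the two prior Mondays.
windows = reference_windows(
    interest_start=datetime(2024, 1, 15, 9),
    interest_length=timedelta(hours=1),
    sample_period=timedelta(weeks=1),
    count=2,
)
```

Choosing a sample period that matches a natural cycle (a day or a week, say) means the reference counts come from comparable moments, so recurring patterns are less likely to register as differences.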
28 Claims
1. A method for identifying n-grams about an object, as recited in full under First Claim above.
Dependent claims: 2 through 22.
23. A system for identifying n-grams about an object, comprising:
one or more processors and programmable memory, wherein the system is configured:
to accomplish identifying an object-specific corpus, that is a subset of a first corpus, where approximately all statements of the object-specific corpus are about a same first object;
to accomplish identifying statements of the object-specific corpus, for inclusion in a corpus of interest, upon a basis of a statement relating to a time period of interest;
to accomplish identifying statements of the object-specific corpus, for inclusion in a reference corpus, upon a basis of a statement relating to a reference time period that is different from the time period of interest;
to accomplish identifying n-grams of the corpus of interest, for inclusion in a corpus-of-interest list of n-grams;
to accomplish identifying n-grams of the reference corpus, for inclusion in a reference-corpus list of n-grams;
to accomplish identifying, for each n-gram of the corpus-of-interest list of n-grams, for subsequent access in conjunction with an n-gram, a number of occurrences of the n-gram in the corpus of interest;
to accomplish identifying, for each n-gram of the reference-corpus list of n-grams, for subsequent access in conjunction with an n-gram, a number of occurrences of the n-gram in the reference corpus;
to accomplish determining a selected list of n-grams from the corpus-of-interest list of n-grams or the reference-corpus list of n-grams;
to accomplish determining, for each n-gram of the selected list of n-grams, for subsequent access in conjunction with an n-gram, an average number of occurrences of the n-gram, in the reference-corpus;
to accomplish determining, for each n-gram of the selected list of n-grams, for subsequent access in conjunction with an n-gram, a difference value, between a number of occurrences of the n-gram in the corpus of interest and an average number of occurrences of the n-gram in the reference-corpus;
to accomplish normalizing, for each n-gram of the selected list of n-grams, for subsequent access in conjunction with an n-gram, the difference value to produce a normalized difference value; and
to accomplish determining, from the selected list of n-grams, a second selected list of n-grams, on a basis of the normalized difference value.
Dependent claims: 24 through 28.
Specification