Computer implemented semantic search methodology, system and computer program product for determining information density in text
First Claim
1. A computer program product comprising a non-transitory computer readable medium having computer usable program code executable to perform operations for determining an informative score of textualized digital web media, the operations of the computer program product comprising:
compiling a list of web media sources for analysis in computer readable memory;
querying the web media sources for a block of text;
storing the block of text in volatile computer readable memory;
identifying sentences within the block of text;
storing the sentences as strings within an array;
parsing each sentence to quantify one or more of the following:
a number of words in the sentence;
a number of prepositions, postpositions, adjectives, adverbs, verbs, nouns, and grammatical conjunctions, by referencing words within the sentence with a dictionary in computer readable memory;
a number of dependent clauses in the sentence;
a number of independent clauses in the sentence;
a number of ellipses, a number of dashes (both en dashes and em dashes), and a number of commas, semicolons, and colons;
a number of subjects and predicates in the sentence;
a number of appositions in the sentence;
a number of syllables in each word of the sentence by cross-referencing each word with the dictionary in persistent storage; and
a number of alphanumeric characters in the sentence;
storing each quantified number in a persistent computer readable database with a time-stamp identifying the date the number(s) were quantified;
calculating a semantic density score for each web media source in the list of web media sources, wherein the semantic density score is a function of the quantified numbers for each sentence in the web media source; and
storing the semantic density score in a persistent computer readable database, the score exclusively associated with the web media content from which the score was derived.
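The parsing and scoring steps of the claim can be illustrated with a minimal Python sketch. It covers only surface features that need no dictionary (words, punctuation, alphanumeric characters). The claim does not specify how the score is computed from the quantified numbers, so `density_score` below is a hypothetical placeholder (clause-delimiting punctuation marks per word), and all function names are assumptions made for illustration.

```python
import re

def split_sentences(block):
    """Naive splitter (an assumption): break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", block.strip())
    return [p for p in parts if p]

def quantify(sentence):
    """Count a few of the surface features listed in the claim."""
    words = re.findall(r"[A-Za-z0-9']+", sentence)
    return {
        "words": len(words),
        "commas": sentence.count(","),
        "semicolons": sentence.count(";"),
        "colons": sentence.count(":"),
        # en dash (U+2013) and em dash (U+2014)
        "dashes": sentence.count("\u2013") + sentence.count("\u2014"),
        # three periods or the single ellipsis character (U+2026)
        "ellipses": sentence.count("...") + sentence.count("\u2026"),
        "alphanumeric": sum(ch.isalnum() for ch in sentence),
    }

def density_score(per_sentence):
    """Hypothetical placeholder score: punctuation marks per word.

    The source only says the score is "a function of the quantified
    numbers"; this particular formula is NOT taken from the patent.
    """
    total_words = sum(c["words"] for c in per_sentence)
    if total_words == 0:
        return 0.0
    marks = sum(
        c["commas"] + c["semicolons"] + c["colons"] + c["dashes"] + c["ellipses"]
        for c in per_sentence
    )
    return marks / total_words

block = "The cat sat. Dogs, cats, and birds play; they rest."
sentences = split_sentences(block)           # sentences stored as strings in a list
counts = [quantify(s) for s in sentences]    # one dict of counts per sentence
score = density_score(counts)
```

Part-of-speech, clause, apposition, and syllable counts would need a dictionary or a parser and are left out of this sketch.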
Abstract
A method, computer program product and system are disclosed for determining the semantic density of textualized digital media (a measure of how much information is conveyed in a sentence or clause relative to its length). The more semantically dense a text is, the more information it conveys in a given space. Users input a topic, a timeline, and one or more target web media sources for analysis. Text in the target media sources is deconstructed to determine density, and a density rating is assigned to the web media source. Over time, users can track trends in the density of text media relative to a given topic, such as a political campaign, and determine how much information is being conveyed in connection with that topic. Line graphs, pie charts, and other time-elapsed graphic representations of the semantic density are generated and rendered for the user.
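Tracking density over time implies storing each score with the date it was measured and reading the scores back in date order. The following is a rough sketch using Python's built-in `sqlite3` module. The table layout, column names, and function names are assumptions for illustration only; the source does not describe a schema.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) a small score store; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS density ("
        "source TEXT NOT NULL, measured_on TEXT NOT NULL, score REAL NOT NULL)"
    )
    return conn

def record_score(conn, source, measured_on, score):
    """Store one score with an ISO date stamp (YYYY-MM-DD)."""
    conn.execute(
        "INSERT INTO density (source, measured_on, score) VALUES (?, ?, ?)",
        (source, measured_on, score),
    )
    conn.commit()

def trend(conn, source):
    """Return (date, average score) pairs for one source, oldest first."""
    rows = conn.execute(
        "SELECT measured_on, AVG(score) FROM density "
        "WHERE source = ? GROUP BY measured_on ORDER BY measured_on",
        (source,),
    )
    return rows.fetchall()

conn = open_store()
record_score(conn, "example.com", "2024-01-01", 0.25)
record_score(conn, "example.com", "2024-01-01", 0.75)
record_score(conn, "example.com", "2024-01-02", 0.5)
points = trend(conn, "example.com")
```

The list returned by `trend` is the kind of series a line graph with elapsed time on one axis could be drawn from. ISO-formatted dates are used so that text ordering matches chronological ordering.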
32 Citations
20 Claims
12. A system to perform operations for determining a level of discussion of a topic, the system comprising:
a webserver;
a semantic indexing server;
a Relational Database Management System (RDBMS);
a topic prompter module configured to prompt a user with a graphic user interface to input a topic for semantic density determination;
a timeline prompter module configured to prompt the user with a graphic user interface to input a timeline for semantic density determination;
a Uniform Resource Locator (URL) prompter module configured to prompt the user with a graphic user interface to input URLs for one or more web media sources, the URLs compiled into a list of web media sources for analysis in computer readable memory;
a synonym determiner module configured to determine synonym(s) for the topic by referencing a thesaurus in computer readable memory;
a block storer module configured to store the blocks of text forming the web media sources in volatile computer readable memory;
a sentence identifier module configured to identify sentences within the block of text;
a sentence parser module configured to parse each sentence to quantify one or more of the following:
a number of words in the sentence;
a number of prepositions, postpositions, adjectives, adverbs, verbs, nouns, and grammatical conjunctions, by referencing words within the sentence with a dictionary in computer readable memory;
a number of dependent clauses in the sentence;
a number of independent clauses in the sentence;
a number of ellipses, a number of dashes (both en dashes and em dashes), and a number of commas, semicolons, and colons;
a number of subjects and predicates in the sentence;
a number of appositions in the sentence;
a number of syllables in each word of the sentence by cross-referencing each word with the dictionary in persistent storage; and
a number of alphanumeric characters in the sentence;
a value storer module configured to store each quantified number in a persistent computer readable database with a time-stamp identifying the date the number(s) were quantified;
a calculator module configured to calculate a semantic density score for each web media source in the list of web media sources, wherein the semantic density score is a function of the quantified numbers for each sentence in the web media source;
a density storer module configured to store the semantic density score in a persistent computer readable database, the score exclusively associated with the web media content from which the score was derived;
a text querier module configured to query textualized media on the web media source for instances of the topic;
a synonym query module configured to query textualized media on the web media source for instances of synonyms of the topic identified by referencing the dictionary; and
a render module configured to render a graph on a computer display showing elapsed time across one axis, the graph showing a plurality of semantic density ratings for the web media identified by the user.
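The topic and synonym queries in this claim can be sketched as a simple filter over the stored sentences. This is a minimal Python illustration: the `THESAURUS` dictionary stands in for the thesaurus in computer readable memory, its entries are made up, and the function names are hypothetical. Matching is whole-word and case-insensitive, which is an assumption; the claim does not say how instances are matched.

```python
import re

# Hypothetical stand-in for the thesaurus held in memory.
THESAURUS = {"election": ["vote", "ballot"]}

def topic_terms(topic, thesaurus):
    """Return the topic followed by its synonyms, all lower-cased."""
    key = topic.lower()
    return [key] + [s.lower() for s in thesaurus.get(key, [])]

def sentences_mentioning(sentences, terms):
    """Keep sentences containing any term as a whole word (case-insensitive)."""
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return [s for s in sentences if pattern.search(s)]

sentences = [
    "The election is near.",
    "Every ballot counts.",
    "It rained today.",
]
terms = topic_terms("Election", THESAURUS)
hits = sentences_mentioning(sentences, terms)
```

Because matching is whole-word, inflected forms such as "votes" are not caught in this sketch; a stemmer or a richer thesaurus would be needed for that.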
19. A computer program product comprising a non-transitory computer readable medium having computer usable program code executable to perform operations for determining a semantic density of textualized digital web media, the operations of the computer program product comprising:
prompting a user with a graphic user interface to input one or more keyword(s) for semantic density determination;
prompting the user with the graphic user interface to input a timeline for semantic density determination;
prompting the user with the graphic user interface to input domain names for one or more web media sources, the domain names compiled into a list of web media sources for analysis in computer readable memory;
storing the blocks of text in the web media sources in volatile computer readable memory;
identifying sentences within the block of text, wherein sentences are identified within the block of text as being a string of text following a period (.) immediately followed by a space ( ) which string of text also precedes a period (.) immediately followed by a space ( );
storing the sentences as strings within an array;
identifying clauses within the sentences satisfying one or more of the criteria from the group consisting of:
a string of text enclosed by ellipses;
a string of text enclosed by en dashes;
a string of text enclosed by em dashes;
a string of text within the sentence enclosed by one or more of commas, semicolons, and colons;
a string of text within the sentence comprising words following a subject-object-verb (SOV) word order; and
a string of text within the sentence comprising words following an agent-object-verb (AOV) order;
storing the identified clauses as strings within an array;
analyzing dependencies between the clauses and increasing the magnitude of a dependency identifier exclusively associated with the sentence, the dependency identifier increased for each dependency identified between a first clause and a second clause in the sentence, wherein a dependency comprises a clause with one or more characteristics from the group consisting of:
a modifier in the first clause modifying a verb in a second clause;
a modifier in a first clause modifying one of a noun, verb, and adverb in a second clause;
a complement in the first clause of one of a noun, adjective, adverb, or preposition in a second clause; and
an interjection referencing a noun in the second clause;
for each sentence, creating one of an abstract semantic graph (ASG) and an abstract syntax tree (AST);
calculating a semantic density score for each web media source in the list of web media sources, wherein the semantic density score is a function of the number of clauses in each sentence of the web media source, the dependency identifier, and one or more of the ASG and the AST;
storing the semantic density score in a persistent computer readable database, the score exclusively associated with the web media content from which the score was derived;
determining synonym(s) for the keyword by referencing a thesaurus in computer readable memory;
querying the sentences for instances of the keyword and synonyms;
storing semantic density ratings for sentences comprising the keyword and synonyms; and
rendering a graph on a computer display showing elapsed time across one axis, the graph showing a plurality of semantic density ratings for the web media identified by the user.
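Two of the steps in this claim are purely string-based and can be sketched directly: splitting a block into sentences at a period followed by a space, and splitting a sentence into clauses at the listed punctuation marks. The Python below is a simplified illustration with hypothetical function names. It relaxes the claim's wording slightly by also keeping the first and last sentences of the block, and it does not attempt the word-order criteria, the dependency analysis, or the ASG/AST construction, all of which would need a real parser.

```python
import re

def split_on_period_space(block):
    """Split a block wherever a period is immediately followed by a space."""
    pieces = block.split(". ")
    sentences = []
    for i, piece in enumerate(pieces):
        piece = piece.strip()
        if not piece:
            continue
        # str.split removed the period from every piece except the last one.
        if i < len(pieces) - 1:
            piece += "."
        sentences.append(piece)
    return sentences

# Comma, semicolon, colon, en dash, em dash, three periods, ellipsis character.
CLAUSE_DELIMS = re.compile(r"\s*(?:,|;|:|\u2013|\u2014|\.\.\.|\u2026)\s*")

def split_clauses(sentence):
    """Split a sentence into candidate clauses at the punctuation marks above."""
    body = sentence.rstrip(".")
    return [c for c in CLAUSE_DELIMS.split(body) if c]

sentences = split_on_period_space("A b. C d. E f.")
clauses = split_clauses("When it rains, we stay inside; otherwise we walk.")
```

Punctuation alone over-segments in some cases (for example, commas inside a list of nouns or colons inside times), so the result is best read as a list of candidate clauses rather than a grammatical analysis.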
Specification