COMPUTER IMPLEMENTED SEMANTIC SEARCH METHODOLOGY, SYSTEM AND COMPUTER PROGRAM PRODUCT FOR DETERMINING INFORMATION DENSITY IN TEXT
First Claim
1. A computer program product comprising a computer readable medium having computer usable program code executable to perform operations for determining an informative score of textualized digital web media, the operations of the computer program product comprising:
compiling a list of web media sources for analysis in computer readable memory;
querying the web media sources for a block of text;
storing the block of text in volatile computer readable memory;
identifying sentences within the block of text;
storing the sentences as strings within an array;
parsing each sentence to quantify one or more of the following;
a number of words in the sentence;
a number of prepositions, postpositions, adjectives, adverbs, verbs, nouns, and grammatical conjunctions, by referencing words within the sentence with a dictionary in computer readable memory;
a number of dependent clauses in the sentence;
a number of independent clauses in the sentence;
a number of ellipsis, a number of dashes (both en dashes and em dashes), and a number of commas, semicolons, and colons;
a number of subjects and predicates in the sentence;
a number of appositions in the sentence;
a number of syllables in each word of the sentence by cross-referencing each word with the dictionary in persistent storage; and
a number of alphanumeric characters in the sentence;
storing each quantified number in a persistent computer readable database with a time-stamp identifying the date the number(s) were quantified;
calculating a semantic density score for each web media source in the list of web media sources, wherein the semantic density score is a function of the quantified numbers for each sentence in the web media source; and
storing the semantic density score in a persistent computer readable database, the score exclusively associated with the web media content from which the score was derived.
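The parsing and scoring operations recited above can be illustrated with a minimal Python sketch. The claim does not define the scoring function, so the ratio used here (mean syllables per word) is purely an assumption, as are the names `SYLLABLES`, `quantify`, and `density_score` and the tiny in-memory table standing in for the claimed dictionary. Only a few of the listed quantities are counted.

```python
import re
from datetime import datetime, timezone

# Hypothetical stand-in for the claimed dictionary: word -> syllable count.
SYLLABLES = {"the": 1, "senator": 3, "spoke": 1, "briefly": 2}

def quantify(sentence):
    """Count a few of the per-sentence quantities listed in the claim."""
    words = re.findall(r"[A-Za-z0-9']+", sentence)
    return {
        "words": len(words),
        "commas": sentence.count(","),
        "semicolons": sentence.count(";"),
        "colons": sentence.count(":"),
        # En dashes (U+2013) and em dashes (U+2014) are counted together.
        "dashes": sentence.count("\u2013") + sentence.count("\u2014"),
        # Words missing from the dictionary contribute 0 syllables here.
        "syllables": sum(SYLLABLES.get(w.lower(), 0) for w in words),
        "alphanumeric": sum(ch.isalnum() for ch in sentence),
        # Time-stamp recording when the numbers were quantified.
        "quantified_at": datetime.now(timezone.utc).isoformat(),
    }

def density_score(counts_per_sentence):
    """Assumed scoring function: mean syllables per word across sentences."""
    total_words = sum(c["words"] for c in counts_per_sentence)
    total_syllables = sum(c["syllables"] for c in counts_per_sentence)
    return total_syllables / total_words if total_words else 0.0

sentences = [
    "The senator spoke briefly.",
    "The senator spoke, briefly; the senator spoke.",
]
counts = [quantify(s) for s in sentences]
score = density_score(counts)
```

Any other function of the stored counts could be substituted for `density_score`; the claim only requires that the score depend on the quantified numbers for each sentence.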
Abstract
A method, computer program product and system are disclosed for determining the semantic density of textualized digital media (a measure of how much information is conveyed in a sentence or clause relative to its length). The more semantically dense text is, the more information it conveys in a given space. Users input a topic, a timeline, and one or more target web media sources for analysis. Text in the target media sources is deconstructed to determine density, and a density rating is assigned to each web media source. Over time, users can track trends in the density of text media relative to a given topic, such as a political campaign, and determine how much information is being conveyed in connection with that topic. Line graphs, pie charts, and other time-elapsed graphic representations of the semantic density are generated and rendered for the user.
46 Citations
20 Claims
12. A system to perform operations for determining a level of discussion of a topic, the system comprising:
a webserver;
a semantic indexing server;
an RDBMS;
a topic prompter module configured to prompt a user with a graphic user interface to input a topic for semantic density determination;
a timeline prompter module configured to prompt the user with a graphic user interface to input a timeline for semantic density determination;
a URL prompter module configured to prompt the user with a graphic user interface to input URLs for one or more web media sources, the URLs compiled into a list of web media sources for analysis in computer readable memory;
a synonym determiner module configured to determine synonym(s) for the topic by referencing a thesaurus in computer readable memory;
a block storer module configured to store the blocks of text forming the web media sources in volatile computer readable memory;
a sentence identifier module configured to identify sentences within the block of text;
a sentence parser module configured to parse each sentence to quantify one or more of the following;
a number of words in the sentence;
a number of prepositions, postpositions, adjectives, adverbs, verbs, nouns, and grammatical conjunctions, by referencing words within the sentence with a dictionary in computer readable memory;
a number of dependent clauses in the sentence;
a number of independent clauses in the sentence;
a number of ellipsis, a number of dashes (both en dashes and em dashes), and a number of commas, semicolons, and colons;
a number of subjects and predicates in the sentence;
a number of appositions in the sentence;
a number of syllables in each word of the sentence by cross-referencing each word with the dictionary in persistent storage; and
a number of alphanumeric characters in the sentence;
a value storer module configured to store each quantified number in a persistent computer readable database with a time-stamp identifying the date the number(s) were quantified;
a calculator module configured to calculate a semantic density score for each web media source in the list of web media sources, wherein the semantic density score is a function of the quantified numbers for each sentence in the web media source;
a density storer module configured to store the semantic density score in a persistent computer readable database, the score exclusively associated with the web media content from which the score was derived;
a text querier module configured to query textualized media on the web media source for instances of the topic;
a synonym query module configured to query textualized media on the web media source for instances of synonyms of the topic identified by referencing the dictionary; and
a render module configured to render a graph on a computer display showing elapsed time across one axis, the graph showing a plurality of semantic density ratings for the web media identified by the user.
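The synonym determiner, text querier, and synonym query modules can be pictured with a small Python sketch. It assumes a hypothetical in-memory `THESAURUS` mapping and hypothetical helper names (`topic_terms`, `sentences_mentioning`); the claim itself does not specify how the thesaurus is stored or how matching is performed, so whole-word, case-insensitive matching is an illustrative choice.

```python
import re

# Hypothetical in-memory thesaurus: topic -> list of synonyms.
THESAURUS = {"election": ["vote", "ballot"]}

def topic_terms(topic):
    """Return the topic followed by any synonyms found in the thesaurus."""
    return [topic] + THESAURUS.get(topic, [])

def sentences_mentioning(sentences, topic):
    """Keep sentences containing the topic or a synonym as a whole word."""
    terms = topic_terms(topic)
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, terms)) + r")\b",
        re.IGNORECASE,
    )
    return [s for s in sentences if pattern.search(s)]

sentences = [
    "The election is next week.",
    "Every ballot will be counted.",
    "Rain is expected on Tuesday.",
]
matches = sentences_mentioning(sentences, "election")
```

Here `matches` holds the first two sentences, one matched on the topic itself and one on a synonym. Density ratings for those sentences could then be stored and plotted against time.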
19. A computer program product comprising a computer readable medium having computer usable program code executable to perform operations for determining a semantic density of textualized digital web media, the operations of the computer program product comprising:
prompting a user with a graphic user interface to input one or more keyword(s) for semantic density determination;
prompting the user with the graphic user interface to input a timeline for semantic density determination;
prompting the user with the graphic user interface to input domain names for one or more web media sources, the domain names compiled into a list of web media sources for analysis in computer readable memory;
storing the blocks of text in the web media sources in volatile computer readable memory;
identifying sentences within the block of text, wherein sentences are identified within the block of text as being a string of text following a period (.) immediately followed by a space ( ) which string of text also precedes a period (.) immediately followed by a space ( );
storing the sentences as strings within an array;
identifying clauses within the sentences satisfying one or more of the criteria from the group consisting of;
a string of text enclosed by ellipsis;
a string of text enclosed by en dashes;
a string of text enclosed by em dashes;
a string of text within the sentence enclosed by one or more of commas, semicolons, and colons;
a string of text within the sentence comprising words following a subject-object-verb (SOV) word order; and
a string of text within the sentence comprising words following an agent-object-verb (AOV) order;
storing the identified clauses as strings within an array;
analyzing dependencies between the clauses and increasing the magnitude of a dependency identifier exclusively associated with the sentence, the dependency identifier increased for each dependency identified between a first clause and a second clause in the sentence, wherein a dependency comprises a clause with one or more characteristics from the group consisting of;
a modifier in the first clause modifying a verb in a second clause;
a modifier in a first clause modifying one of a noun, verb, and adverb in a second clause;
a complement in the first clause of one of a noun, adjective, adverb, or preposition in a second clause; and
an interjection referencing a noun in the second clause;
for each sentence, creating one of an abstract semantic graph (ASG) and an abstract syntax tree (AST);
calculating a semantic density score for each web media source in the list of web media sources, wherein the semantic density score is a function of the number of clauses in each sentence of the web media source, the dependency identifier, and one or more of the ASG and the AST;
storing the semantic density score in a persistent computer readable database, the score exclusively associated with the web media content from which the score was derived;
determining synonym(s) for the keyword by referencing a thesaurus in computer readable memory;
querying the sentences for instances of the keyword and synonyms;
storing semantic density ratings for sentences comprising the keyword and synonyms; and
rendering a graph on a computer display showing elapsed time across one axis, the graph showing a plurality of semantic density ratings for the web media identified by the user.
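The sentence and clause identification steps of this claim lend themselves to a short Python sketch. It makes two simplifying assumptions: the start and end of the block are treated as sentence boundaries in addition to the period-plus-space rule, and clauses are approximated as the segments bounded by commas, semicolons, colons, en dashes, or em dashes. The function names are hypothetical, and the ellipsis, word-order, dependency, and ASG/AST steps are not attempted here.

```python
import re

def identify_sentences(block):
    """Split a block of text on a period immediately followed by a space."""
    parts = block.strip().split(". ")
    # Drop the final period left on the last sentence and skip empty pieces.
    return [p.rstrip(".").strip() for p in parts if p.strip()]

# Delimiters named in the claim: commas, semicolons, colons,
# en dashes (U+2013), and em dashes (U+2014).
CLAUSE_DELIMITERS = re.compile(r"[,;:\u2013\u2014]")

def identify_clauses(sentence):
    """Approximate clauses as the text segments between delimiters."""
    return [c.strip() for c in CLAUSE_DELIMITERS.split(sentence) if c.strip()]

block = (
    "Turnout rose sharply. "
    "The mayor, who won narrowly, thanked voters; critics were unmoved."
)
sentences = identify_sentences(block)                # array of sentence strings
clauses = [identify_clauses(s) for s in sentences]   # one clause array per sentence
```

The clause count per sentence (`len(clauses[i])`) is one of the inputs the claim names for the semantic density score.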
Specification