Method for identifying the key blog set in blog information propagation
 CN 102,262,681 B
 Filed: 08/19/2011
 Issued: 12/02/2015
 Est. Priority Date: 08/19/2011
 Status: Active Grant
Chinese PRB Reexamination
Abstract
The invention discloses a method that quickly and accurately identifies the key blog set in blog information propagation. The steps are: 1) collect, blog by blog, the follow relations and link relations between blogs; 2) build a blog network graph with blogs as nodes, the edges of the graph being the associations between blogs; 3) determine the weights of the associations (directed edges) between blogs according to an information propagation model; 4) compute, based on the blog network graph, the expected value of each blog's propagation influence on other blogs; 5) identify the set of key nodes with the greatest information propagation influence in the blog network graph. By combining an information propagation model with the associations between blogs and computing the expected information propagation, the invention quickly identifies the blog set that is key to blog information propagation, which facilitates the supervision of blog information.
4 Claims

1. A method for identifying the key blog set in blog information propagation, characterized by comprising the following steps:

1) Collect, blog by blog, the associations between bloggers;
The associations comprise the follow relations between bloggers and the link relations between blog articles;
The associations between blogs are collected and determined as follows:
First, obtain blog data from the blog website and assign each blog (that is, each blogger) a unique identifier;
Then obtain each blogger's friend list or follow list;
A friend list determines the bidirectional friend relations between bloggers;
A follow list determines the unidirectional follow relations between bloggers;
A friend relation can be expressed as two follow relations in opposite directions;
If one blogger follows another blogger, a follow relation is recorded between the two;
Next, obtain the articles each blog posted within a recent window of days; for each article in a blog, if the article links to an article in another blog, a link relation is considered to exist between the two blogs, and the relation is labeled with the difference between the article's posting date and the current date;
If a blog cites articles of the other blog multiple times, the minimum of these differences is used;
2) Build the blog network graph with blogs as nodes; the edges of the graph are the associations between blogs, corresponding to the link relations between blogs or the follow relations between bloggers;
3) Determine the weights of the associations between blogs in the blog network graph according to an information propagation model, that is, determine the weight of each directed edge in the blog network graph, in three cases;
For each directed edge in the edge set E, analyze the corresponding association:
Case 1: the association is a link relation; the independent cascade model is used to assign the edge weight, with the initial value of the link-relation weight set to 0.1 and the exponent parameter set to 0.5;
Case 2: the association is a follow relation; the weighted cascade model is used to assign the edge weight, based on the size of the blogger's follow set;
The maximum value of the follow-relation weight is set to 0.6;
Case 3: the association is both a link relation and a follow relation; the larger of the two weights determined above is taken as the weight of the edge;
4) Based on the blog network graph and the assigned association weights, compute the expected value of each blog's information propagation influence on every other blog;
5) According to the expected values of information propagation influence between blogs, identify the set of key nodes with the greatest information propagation influence in the blog network graph, that is, the key blog set.
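
As an illustration of steps 1) and 3), the following Python sketch collects follow and link relations and combines the two candidate weights by taking the maximum (Case 3). It is a minimal sketch under stated assumptions, not the claimed implementation: the function names, the input shapes, and the edge direction (follower to followed, citing blog to cited blog) are all hypothetical. Because the weight formulas themselves are not reproduced in the claim text above, `link_weight` and `follow_weight` are left as caller-supplied functions.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (source blog id, target blog id); direction is an assumption


def collect_follows(friend_lists: Dict[str, List[str]],
                    follow_lists: Dict[str, List[str]]) -> Set[Edge]:
    """Turn friend lists (bidirectional) and follow lists (unidirectional)
    into a set of directed follow relations."""
    follows: Set[Edge] = set()
    for blogger, friends in friend_lists.items():
        for friend in friends:
            # A friend relation is two follow relations in opposite directions.
            follows.add((blogger, friend))
            follows.add((friend, blogger))
    for blogger, followed in follow_lists.items():
        for other in followed:
            follows.add((blogger, other))
    return follows


def collect_links(citations: Iterable[Tuple[str, str, int]]) -> Dict[Edge, int]:
    """citations holds (citing blog, cited blog, age of the citing article in days).
    Repeated citations between the same pair keep the minimum age."""
    links: Dict[Edge, int] = {}
    for src, dst, age in citations:
        key = (src, dst)
        links[key] = min(age, links.get(key, age))
    return links


def edge_weights(follows: Set[Edge],
                 links: Dict[Edge, int],
                 link_weight: Callable[[Edge, int], float],
                 follow_weight: Callable[[Edge], float]) -> Dict[Edge, float]:
    """Assign one weight per directed edge. When an edge is both a link
    relation and a follow relation, the larger candidate weight is used."""
    weights: Dict[Edge, float] = {}
    for edge in follows | set(links):
        candidates = []
        if edge in links:
            candidates.append(link_weight(edge, links[edge]))
        if edge in follows:
            candidates.append(follow_weight(edge))
        weights[edge] = max(candidates)
    return weights
```

Passing the weight functions in keeps the sketch independent of the exact independent cascade and weighted cascade formulas, which would have to be taken from the full patent text.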


2. The method for identifying the key blog set in blog information propagation according to claim 1, characterized in that the blog network graph in step 2) is built as follows:
First, define the blog network graph as a directed graph whose node set is the set of blogs, each blog being a node of the graph, and whose edge set is the set of associations between blogs, that is, the set of directed edges of the graph;
Then, for any two blogs in the blog set, if a follow relation or a link relation exists from the first blog to the second, define a directed edge from the first to the second;
Likewise, if a follow relation or a link relation exists in the reverse direction, define the reverse directed edge.
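
The graph construction described in this claim could be sketched in Python as follows. This is only an illustration: `build_blog_graph` and its argument shapes are hypothetical names, and relations are assumed to be given as directed `(source, target)` pairs.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def build_blog_graph(blogs: Iterable[str],
                     follows: Iterable[Edge],
                     links: Iterable[Edge]):
    """Build a directed graph: nodes are blogs, and a directed edge (u, v)
    exists when a follow relation or a link relation runs from u to v."""
    nodes: Set[str] = set(blogs)
    edges: Set[Edge] = set()
    for u, v in list(follows) + list(links):
        # Ignore relations that point outside the collected blog set
        # and self-references (an assumption of this sketch).
        if u in nodes and v in nodes and u != v:
            edges.add((u, v))
    adjacency: Dict[str, Set[str]] = {node: set() for node in nodes}
    for u, v in edges:
        adjacency[u].add(v)
    return nodes, edges, adjacency
```

A pair that is related by both a follow relation and a link relation still yields a single directed edge, which matches the three-case weighting in claim 1.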

3. The method for identifying the key blog set in blog information propagation according to claim 2, characterized in that in step 4) the expected value of each blog's information propagation influence on other blogs is computed in three cases; for a pair of blog nodes, the expected value of the first node's information influence on the second is taken to be the probability that information propagated from the first node affects the second;
Where i and j are the indices of the two nodes, the three cases are:
Case 1: i = j; the expected value is 1, indicating that a node is certain to affect itself;
Case 2: i ≠ j and the second node is unreachable from the first; the expected value is 0;
Case 3: i ≠ j and the second node is reachable from the first; find the shortest path from the first node to the second in the network graph, where the shortest path is the path of maximum weight among all simple paths between the two nodes, a simple path being one whose nodes do not repeat;
The expected value equals the weight of this shortest path, that is, the product of the weights of the edges on the path.
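
One possible way to compute the three cases is a Dijkstra-style search that maximizes the product of edge weights. The sketch below assumes every edge weight lies in (0, 1], so that extending a path never increases its product; under that assumption the best path found is automatically simple. The names `influence_from` and `influence` are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def influence_from(source: str, weights: Dict[Edge, float]) -> Dict[str, float]:
    """Return the maximum path-weight product from source to each reachable
    node. Assumes all weights are in (0, 1]."""
    adjacency: Dict[str, List[Tuple[str, float]]] = {}
    for (u, v), w in weights.items():
        adjacency.setdefault(u, []).append((v, w))

    best = {source: 1.0}          # Case 1: a node affects itself with value 1
    heap = [(-1.0, source)]       # max-heap via negated products
    while heap:
        neg_product, node = heapq.heappop(heap)
        product = -neg_product
        if product < best.get(node, 0.0):
            continue              # stale heap entry
        for nxt, w in adjacency.get(node, []):
            candidate = product * w
            if candidate > best.get(nxt, 0.0):
                best[nxt] = candidate
                heapq.heappush(heap, (-candidate, nxt))
    return best


def influence(source: str, target: str, weights: Dict[Edge, float]) -> float:
    """Case 2: unreachable targets get 0. Case 3: product along the best path."""
    return influence_from(source, weights).get(target, 0.0)
```

For example, with edges a→b and b→c of weight 0.5 and a direct edge a→c of weight 0.2, the two-hop path (product 0.25) is preferred over the direct edge.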

4. The method for identifying the key blog set in blog information propagation according to claim 3, characterized in that in step 5) the set of key nodes with the greatest information propagation influence in the blog network graph is identified as follows:
First, determine the size of the set, and define the expected value of the number of nodes in the network graph that are influenced by the blog nodes in the set;
Then, select the single node with the greatest influence range in the network graph to form the initial set, the corresponding expected value being calculated as follows;
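
The selection process can be illustrated with a greedy sketch in Python. Only the first two steps (fixing the set size and starting from the single most influential node) come from the claim text above; the greedy extension to further nodes, and the definition of the expected spread as the sum over all nodes of the strongest influence exerted by any selected node, are assumptions of this sketch and may differ from the claimed formula. The names `expected_spread` and `select_key_blogs` are hypothetical.

```python
from typing import Dict, List


def expected_spread(seed_set: List[str],
                    influence: Dict[str, Dict[str, float]]) -> float:
    """Assumed spread measure: each node counts by the strongest influence
    any seed has on it. influence[s][n] is the influence of s on n;
    every node must appear as a key of influence."""
    total = 0.0
    for node in influence:
        total += max((influence[s].get(node, 0.0) for s in seed_set), default=0.0)
    return total


def select_key_blogs(influence: Dict[str, Dict[str, float]], k: int) -> List[str]:
    """Greedy selection of up to k nodes. The first pick is the single node
    with the largest spread; later picks maximize the spread of the grown set."""
    selected: List[str] = []
    candidates = set(influence)
    while len(selected) < k and candidates:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(candidates),
                   key=lambda c: expected_spread(selected + [c], influence))
        selected.append(best)
        candidates.remove(best)
    return selected
```

The `influence` table could be filled from the pairwise expected values of claim 3, one row per blog.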
Specification(s)