popularity prediction of online news based on radial basis...



Page 1: Popularity Prediction of Online News Based on Radial Basis ...ir.idsse.ac.cn/bitstream/183446/3949/1/Popularity Prediction of Onlin… · Popularity Prediction of Online News Based

Popularity Prediction of Online News Based on Radial Basis Function

Neural Networks with Factor Methodology

WU Wei1,2, DU Wencai*2,3, XU Hongzhou1, ZHOU Hui2, HUANG Mengxing2

1 Institute of Deep-sea Science and Engineering, Chinese Academy of Sciences, Sanya 572000, China 2 College of Information Science and Technology, Hainan University, Haikou 570228, China

3 Faculty of International Tourism and Management, City University of Macau, Macau 999078, China

[email protected], [email protected], [email protected], [email protected], [email protected]

Abstract

Online news reflects the dramatically increasing trend

of social network use. Understanding what type of online

news is popular and easy to spread to the public is a

valuable focus for media influence analysis and social

marketing. By abstracting detailed characteristics of online

news, important influential factors are selected from

diverse variables according to the principal component

method and function approximation. In consideration of

the high dimensionality of the popularity ranking model,

a back-propagation neural network (BPNN) was employed

to predict popularity. The

simulation results compare various forecasting methods

based on factors achieved in previous work. This provides

an effective prediction model according to real situations,

with an accuracy level of 95%.

Keywords: online news popularity; back-propagation

neural networks; factor analysis; model identification;

neural network prediction.

1 Introduction

Online news dominates web resources and continues to

consolidate its superior status. It is easy to produce and to

distribute online news in user-friendly ways at low cost; as

a result, the internet is full of articles written by numerous

authors, whose works are subject to an overall preference

ranking by viewers, i.e., a popularity rating. These rating

systems draw on comment columns and other

feedback channels used via personal computers, tablets,

and mobile phones, including text messages, emoticons,

and sharing mechanisms [1]. The number of article hits is

not well accepted as a measure of real popularity because it

can be inflated by web crawlers or

search engines. A widely accepted alternative is to focus on

the number of times the article is shared, because this represents

its general influence. This statistic is widely

accepted by mass media and advertising companies. For

website managers, blogs, Twitter, or news applications, it

is important to understand what type of online news is

popular and easy to disseminate to the public, particularly

in order to execute proper advertising activities and

distribute specific content in more effective ways [2].

Therefore, the primary aim of this research is to determine

what type of online news enjoys the greatest popularity

and is shared most by the public (or specific groups), in order to capture

the characteristics of news and to build a connection model

between these factors and popularity ratings.

The remainder of this paper is organized as follows: in

Section 2, research progress relevant to this issue is

introduced, including several inspiring achievements and

models. Section 3 presents the proposed dataset

description and model structure. The simulation results

and discussion are presented in Section 4. Conclusions and

some possible future research directions are provided in

Section 5.

2 Related Works

The internet represents an industrialized concept

growing as a result of impetus from an industrial chain.

Online news is quickly delivered between parties via this

capable carrier. Facebook users share more than

2,600,000 i/min (items per minute), video links on

VINE are shared at approximately 8,333 i/min, and

links on Twitter are shared at approximately 300,000 i/min.

Online news tracking with real-time coverage prevails

over traditional physical media. Understanding how the

public responds to specific issues (the popularity of a

specific article) is a burgeoning research branch.

Many encouraging media tracking and prediction

models have been achieved from a time-based analysis

perspective, and some researchers have posited that

popularity rating is time-sensitive: numbers of YouTube

video viewers fluctuate within 24 hours, and share counts

vary constantly [3]. U.S. online newspapers exhibit a

degree of hysteresis relative to user activities, and

structural equations corresponding to this phenomenon

can be determined [4]. A survival model has been applied to the

lifetime analysis of online news by setting a threshold for

comparison over one week (an observation

period of less than seven days) [5].

Characteristics analysis is another exciting direction

for social news analysis, which focuses on the news itself

(represented by global sharing) and eliminates other

possible influential factors such as time, comments, and

emoticons. It is a reasonable approach because sharing is

continuously increasing, its impact on the public fluctuates

day-by-day, and it will exceed certain time limits (several

hours or a couple of days). Moreover, this research aspect

is diverse due to the multiplicity of media formats. Video

clips, music albums, and text news are spreading at various

speeds [6, 7]. Particularly in regard to online news prediction,

an article is comprised of many characteristics, leading

some experts to forecast trends by dividing it into smaller

units such as subject, content sensitivity, and semantic

networks, so that the news contains more local variables

for prediction. This method is valuable because it can

capture detailed information in one piece of news.

Mathematical methods of online news popularity

prediction are evolving from simple function regressions

to intelligent identification. Variable prediction accuracy

is not easy to control due to system characteristics and

sampling datasets. For a stable media company, variable

prediction accuracy indicates gradual data trends, which

provide an effective basis for function approximation;

typical examples are shown in [8]. However, in most

circumstances, due to the uncertain system status of users

and operations, online media prediction models are

nonlinear; thus, there is a need to develop better

identification mechanisms, including exponential

functions, differential algorithms, and support vector

regressions [9]. However, a significant problem is that the

complexity of advanced prediction methods requires

advanced computing facilities, which may put such

solutions out of reach [10, 11].

Based on the discussion above, it is clear that online

news prediction is a complex compromise between

variable selection and computing modes. Achieving

acceptable prediction results requires elaborate efforts

including: a) data preparation and data cleaning, which

require experimental and data processing techniques; b)

selection of representative independent variables and removal of

irrelevant ones; c) confirmation of suitable models or

algorithms; and d) adjustment of simulation algorithm

adaptability for different circumstances.

3 Popularity Prediction Modeling

In this section, the dataset is introduced and the overall

structure of a popularity prediction model with neural

networks based on factor methodology is proposed.

3.1 Dataset Description

This dataset summarizes a heterogeneous set of

features of articles published by Mashable over a period of

two years. The dataset is an open-source donation to the

UC Irvine (UCI) Machine

Learning Repository. The goal is to predict the number of

shares in social networks (popularity). A rough description

is provided in Table 1, and some variables are illustrated in

Figs. 1 through 4.

Fig. 1. Variable descriptions (1st to 13th variables, 39,797 samples): (a) n_tokens_title#1 represents the number of words in the title; (b) n_tokens_content#2 represents the number of words in the content; (c) n_unique_tokens#3 represents the rate of unique words in the content; (d) n_non_stop_words#4 represents the rate of non-stop words in the content, and n_non_stop_unique_tokens#5 represents the rate of unique non-stop words in the content; (e) num_hrefs#6 represents the number of links; (f) num_self_hrefs#7 represents the number of links to other articles published by Mashable; (g) num_imgs#8 represents the number of images; (h) num_videos#9 represents the number of videos; (i) average_token_length#10 represents the average length of the words in the content; (j) num_keywords#11 represents the number of keywords in the metadata; (k) rate_positive_words#12 represents the rate of positive words among non-neutral tokens; (l) rate_negative_words#13 represents the rate of negative words among non-neutral tokens.

Fig. 2. Variable descriptions (14th to 32nd variables, 39,797 samples): (a) Data channel: lifestyle#14, entertainment#15, business#16, socmed#17, tech#18, and world#19; (b) Article publication day: Monday#20, Tuesday#21, Wednesday#22, Thursday#23, Friday#24, Saturday#25, Sunday#26, and Weekend#27. The answer is negative/no when the x value is 0 and positive/yes when the x value is 1; (c) The article's closeness to LDA topics LDA_00#28, LDA_01#29, LDA_02#30, LDA_03#31, and LDA_04#32.

Fig. 3. Variable descriptions (33rd to 48th variables, 39,797 samples): (a) kw_min_min#33 represents the worst keyword (minimum shares); (b) kw_max_min#34 represents the worst keyword (maximum shares); (c) kw_avg_min#35 represents the worst keyword (average shares); (d) kw_min_max#36 represents the best keyword (minimum shares); (e) kw_max_max#37 represents the best keyword (maximum shares); (f) kw_avg_max#38 represents the best keyword (average shares); (g) kw_min_avg#39 represents the average keyword (minimum shares); (h) kw_max_avg#40 represents the average keyword (maximum shares); (i) kw_avg_avg#41 represents the average keyword (average shares); (j) self_reference_min_shares#42 represents the minimum shares of referenced articles in Mashable; (k) self_reference_max_shares#43 represents the maximum shares of referenced articles in Mashable; (l) self_reference_avg_sharess#44 represents the average shares of referenced articles in Mashable; (m) global_subjectivity#45 represents text subjectivity; (n) global_sentiment_polarity#46 represents text sentiment polarity; (o) global_rate_positive_words#47 represents the rate of positive words in the content; (p) global_rate_negative_words#48 represents the rate of negative words in the content.

Fig. 4. Variable descriptions (49th to 58th variables, 39,797 samples): (a) avg_positive_polarity#49 represents the average polarity of positive words; (b) min_positive_polarity#50 represents the minimum polarity of positive words; (c) max_positive_polarity#51 represents the maximum polarity of positive words; (d) avg_negative_polarity#52 represents the average polarity of negative words; (e) min_negative_polarity#53 represents the minimum polarity of negative words; (f) max_negative_polarity#54 represents the maximum polarity of negative words; (g) title_subjectivity#55 and abs_title_subjectivity#56 represent the title subjectivity and absolute subjectivity level of the article, respectively; (h) title_sentiment_polarity#57 and abs_title_sentiment_polarity#58 represent the title polarity and absolute polarity level of the article, respectively.

The dataset in Table 1 contains 61 attributes: 58

predictive attributes, 2 non-predictive attributes, and 1 target value. The dataset is an

open-source file provided by the UCI Machine Learning

Repository. The file contains no missing or incorrect

values, so little data cleaning

is required. Although the variables encoding the publication date are

sparse (0 or 1),

each of these data points is retained. As shown in

Fig. 1, attributes such as n_tokens_title#1, average_token_length#10,

num_keywords#11, rate_positive_words#12, and rate_negative_words#13

indicate that the maximum number of shares is achieved when these

factors reach specific appropriate values, neither too large

nor too small. Some variables such as n_tokens_content#2,

num_hrefs#6, num_self_hrefs#7, num_imgs#8, and num_videos#9 can be

modeled as F-distributions. As shown in Fig. 2, the

published date and data channel are discrete attributes; this

type of data is acceptable for artificial model simulation.

3.2 Problem Formulation & System Structure

As demonstrated in the previous section, let the 1st to

58th variables form a parameter matrix P(39,769×58), and let

shares form the target matrix T(39,769×1). The system

function is then determined as follows:

$$T = F(P) \tag{1}$$

System F is the true mapping from P to

T (given as data pairs). The primary research goal is to build an

approximate and robust Fζ that replaces the unknown real

system within an acceptable error ε; this error can be

calculated according to a measure M such as the Euclidean distance or the

mean-squared error, and all

processing is expected to finish within a limited time t0.

(2)
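As a minimal sketch of the two error measures named above, the following Python functions compute the Euclidean distance and the mean-squared error between predicted and observed share counts. The function names and the toy numbers are illustrative assumptions and do not come from the paper.

```python
import math

def euclidean_distance(pred, target):
    """Euclidean distance between predicted and observed share counts."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pred, target)))

def mean_squared_error(pred, target):
    """Mean of the squared prediction errors."""
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(target)

# Hypothetical share counts, for illustration only.
pred = [1200.0, 900.0, 3100.0]
target = [1000.0, 900.0, 3500.0]
distance = euclidean_distance(pred, target)
mse = mean_squared_error(pred, target)
```

Either quantity could play the role of ε when comparing an approximate model against the recorded shares; which one the authors used is not stated in this section.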

Let a temporal data sequence Sζ be produced by system

Yζ, in which Yζ is measurable and bounded, Yζ∈ Ω, Ω is a

compact set, and the system status is represented by

regression deterministic tracking [12].

(3)

With regard to the aforementioned recorded time series,

news sharing T is a bounded series, and all variables vary

within limited bounds, i.e., let Tζ → T, T = [t(1),t(2),...,t(n)]

∈ Rm×n, where Tζ=[tζ(1), tζ(2),..., tζ(m)], i = 1,2,...,m.

Model prediction based on offline training

is a strong basis for online forecasting [13]. Online

prediction models are often derived from offline models,

and rely heavily on training and preliminary results. The

research goal is to predict the popularity of news with a

given dataset in an offline model design, which provides a

useful structure for further advancements. In regard to the

collection of 39,797 samples with 61 attributes, they can

be divided into two parts: let the majority serve as training

items and the remaining samples be used as testing

instances.
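The split described above can be sketched as follows. The 80%/20% proportion and the two selection modes (first 80% versus a random 80%) are taken from the Fig. 8 caption later in the paper; the helper name `split_samples`, the seed, and the toy data are assumptions made for illustration.

```python
import random

def split_samples(rows, train_fraction=0.8, shuffle=False, seed=0):
    """Split rows into (train, test); optionally shuffle the order first."""
    indices = list(range(len(rows)))
    if shuffle:
        random.Random(seed).shuffle(indices)  # reproducible random selection
    cut = int(len(rows) * train_fraction)
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test

rows = list(range(10))  # stand-in for ten samples
train_seq, test_seq = split_samples(rows)                        # first 80% / last 20%
train_rnd, test_rnd = split_samples(rows, shuffle=True, seed=42)  # random 80% / 20%
```

The sequential mode preserves the original order of the records, while the random mode guards against any ordering effect in the file.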

3.3 Principal Component Extraction from the Dataset

The matrix P contains 58 variables related to online

news, but some may not be necessary for system

simulation. As shown in Figs. 1(c) and 1(d),

n_unique_tokens#3, n_non_stop_words#4, and

n_non_stop_unique_tokens#5 represent sharing with identical

trends, therefore indicating similar features in most

circumstances. This phenomenon is not rare in this matrix:

the kw_max_min#34 and kw_avg_min#35 in Figs. 3(b) and 3(c);

self_reference_min_shares#42, self_reference_max_shares#43, and

self_reference_avg_sharess#44 in Figs. 3(j), 3(k), and 3(l). For

modeling systems, these variables convey specific

influences on shares, but there is no need to include them

all at the expense of algorithm convergence speed.

Removal of repetitive variables from P(39,769×58) is a

preliminary procedure before further investigation and

modeling, which requires elimination of irrelevant and

similar parameters.
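One simple way to flag such near-duplicate variables, offered here only as a sketch and not as the authors' procedure, is to compute pairwise Pearson correlations and list the pairs above a chosen threshold. The threshold of 0.95 and the toy columns are assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def redundant_pairs(columns, threshold=0.95):
    """Return pairs of column names whose |correlation| reaches the threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= threshold:
                pairs.append((a, b))
    return pairs

# Hypothetical values; only the column names come from the dataset.
toy = {
    "n_unique_tokens": [0.50, 0.60, 0.70, 0.80],
    "n_non_stop_unique_tokens": [0.55, 0.64, 0.76, 0.85],
    "num_imgs": [3.0, 0.0, 7.0, 1.0],
}
pairs = redundant_pairs(toy)
```

Pairs reported this way are candidates for merging or removal; PCA, described next, handles the same redundancy more systematically.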

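Principal component analysis (PCA) is a mathematical

method designed to reassemble independent variables x1,

x2, …, xp from a full matrix X, effective for dimensionality

reduction. For an observation dataset with n samples and p variables, X

can be arranged as follows:

$$X = (x_1, x_2, \ldots, x_p) = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix} \tag{4}$$

where xj = (x1j, x2j, …, xnj)T, j = 1, 2, …, p. The principal component

transform can be expressed as follows:

$$F_i = a_{i1} x_1 + a_{i2} x_2 + \cdots + a_{ip} x_p, \quad i = 1, 2, \ldots, p \tag{5}$$

In all linear combinations of xj, F1 is the first principal

component because it conveys information

with the maximum variance; it requires the following: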

(6)

The operation details of the current research include the

following: let F0 represent knowledge from P(39,769×58) fully,

so that:

(7)

Step 1: Normalize the dataset by

(8)

Where

(9)

Step 2: Calculate the correlation matrix of the dataset

(10)

Step 3: Calculate eigenvalues λ1, λ2, …, λp and

eigenvectors ai1, ai2, …, aip of R according to the Jacobi

method, and select the principal components for further

investigation.
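The three steps above can be sketched in plain Python. Because Eqs. (8) to (10) are not reproduced in this transcript, the sketch assumes the usual z-score normalization with the sample standard deviation and the usual correlation matrix R = ZᵀZ/(n − 1); the cyclic Jacobi rotation scheme, the tolerances, and the toy data are likewise assumptions.

```python
import math

def standardize(rows):
    """Step 1: subtract each column mean and divide by the sample std."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
            for j in range(p)]  # a constant column would give std 0 here
    return [[(r[j] - means[j]) / stds[j] for j in range(p)] for r in rows]

def correlation_matrix(rows):
    """Step 2: R = Z^T Z / (n - 1), with Z the standardized data."""
    z = standardize(rows)
    n, p = len(z), len(z[0])
    return [[sum(z[k][i] * z[k][j] for k in range(n)) / (n - 1)
             for j in range(p)] for i in range(p)]

def jacobi_eigen(matrix, tol=1e-18, max_sweeps=50):
    """Step 3: cyclic Jacobi rotations on a symmetric matrix."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    values = [a[i][i] for i in order]
    vectors = [[v[k][i] for k in range(n)] for i in order]
    return values, vectors

# Toy data (hypothetical): the first two columns are nearly collinear.
rows = [[1.0, 2.0, 5.0], [2.0, 4.1, 3.0], [3.0, 5.9, 6.0], [4.0, 8.2, 1.0]]
R = correlation_matrix(rows)
eigenvalues, eigenvectors = jacobi_eigen(R)
```

The eigenvalues come back in descending order, so the leading entries correspond to the principal components that carry the most variance. For the full 58-variable matrix a numerical library would normally be used instead of a hand-written eigen-solver.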

3.4 Prediction Based on Neural Networks

A nonlinear system realization requires identification

models with more elaborate structures, in which artificial

intelligence is promising. It provides an effective

computing mode derived from biology and neuroscience,

and has recently been embraced by the field of computer

science. Achievements in engineering areas such as

nonlinear control, pattern recognition, optimization, signal

analysis and processing, aerospace, and intelligent

monitoring have provided inspiring results which impact

daily life, companies, and projects [14].

A mixed structure is necessary to describe the relationship

between influential factors due to the interconnection

among neurons; identification modeling based on dynamic

neural networks can be adopted as an appropriate

simulation; the basic form of neural networks is shown in

Fig. 5.

Fig. 5. Structure of artificial neural networks: (a) neuron connection and activation signal; (b) neural networks with three layers (with M nodes in

the input layer, K nodes in the hidden layer, and L nodes in the output layer).

As shown in Fig. 5, the perception layer can receive input

signals from real circumstances, and these real data are

transmitted to the hidden layer via nonlinear mapping.

The linear weighting procedure is executed in the output

layer, computing results from the hidden layer. The final

stage is the application level, which introduces the results

from the output layer as control/prediction attributes for

real applications.

Let input x=(x1,x2,…,xM)T, x∈ RM. Data from real

systems is accepted; this dataset is sent to the hidden layer

for nonlinear mapping, and the output is y=(y1,y2,…,yL)T , y

∈ RL. The activation functions adopted here include

sigmoid, Gaussian, piecewise linear, and threshold forms

[15]:

(11)

Where this logistic function is determined by the slope

coefficient a. Here, ci represents the center nodes set.

Neural networks are considered to approach the

continuous function u(x) in the bounded compact set Ω if

the center nodes are reasonably distributed, and the

approximation error is arbitrarily small. The output of

node j can be expressed as follows:

(12)

where σj is the normalization constant of node j, cj is the center

vector of node j, and c=(c1,c2,…,cM)T, c∈RM.

The linear mapping process uj(x) → yk is achieved from

the hidden layer to the output layer, i.e., the output of the

output layer node k is represented as follows:

$$y_k = \sum_{j=1}^{K} w_{kj}\, u_j(x) + b_k, \quad k = 1, 2, \ldots, L \tag{13}$$

where wkj represents the adjusted weights from the hidden

layer to the output layer, and bk is the bias. Here, yk is a

response signal to the corresponding input; these data are

transferred to the workspace depending on the application

background.
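The two-stage mapping described above (a nonlinear hidden layer followed by a linear output layer) can be sketched as below. Eq. (12) is not reproduced in this transcript, so the Gaussian form exp(−‖x − cj‖² / (2σj²)) is an assumption; some texts omit the factor 2. All names and numbers are illustrative.

```python
import math

def rbf_hidden(x, centers, sigmas):
    """Gaussian response u_j(x) of each hidden node (assumed 2*sigma^2 scaling)."""
    outputs = []
    for c, s in zip(centers, sigmas):
        dist2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        outputs.append(math.exp(-dist2 / (2.0 * s ** 2)))
    return outputs

def rbf_forward(x, centers, sigmas, weights, biases):
    """Linear output layer: y_k = sum_j w_kj * u_j(x) + b_k."""
    u = rbf_hidden(x, centers, sigmas)
    return [sum(w * uj for w, uj in zip(w_k, u)) + b_k
            for w_k, b_k in zip(weights, biases)]

# Hypothetical network: M = 2 inputs, K = 2 hidden nodes, L = 1 output.
centers = [[0.0, 0.0], [1.0, 1.0]]
sigmas = [1.0, 1.0]
weights = [[2.0, 0.0]]  # one row of K weights per output node
biases = [0.5]
y = rbf_forward([0.0, 0.0], centers, sigmas, weights, biases)
```

An input that coincides with a center drives that node's response to 1, and the response decays smoothly with distance, which is what makes the hidden layer a local, nonlinear feature map.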

A multilayer perceptron is appropriate given the

nonlinear parameter structure; back-propagation

neural networks (BPNNs) and radial basis function neural

networks (RBFNNs) are widely applied, with several

advantages including massively parallel distributed

architecture and self-learning adaptive ability [16].

4 Simulation and Discussion

The prediction model is shown in Fig. 6; it consists of

three portions: (1) Reduction of data scale to decrease

irrelevant and secondary parameters by factor analysis; (2)

Building of a prediction model based on neural networks

using factors achieved in the previous stage; (3)

Evaluation of system performance by dataset validation,

and improvement of the accuracy and adaptability of the

algorithm by adjusting the scale of factors and the network

structure.

4.1 Prediction Based on Neural Networks

Factor abstraction is a compulsory means of system

modeling; the reasoning is provided in Section 3.3. For all

labeled variables of matrix P(39,769 × 58), the factors are

abstracted in Table 2.

The information contained in Fi decreases with the

decreasing variance of each successive principal component. As

shown in Table 2, variables can be sorted by their

contribution rate Ci, calculated as follows:

$$C_i = \frac{\lambda_i}{\sum_{k=1}^{p} \lambda_k} \tag{14}$$

Fig. 6. Popularity prediction model of online news with three interconnected parts: factor abstraction, identification with artificial intelligence and model evaluation (note: the identification model in the third stage results from stage 2).

The number of principal components is determined by

Ci. In most cases, the accumulated Ci must equal or exceed 80% to

ensure that the combined variables can represent the

majority of information from the original variables; the

accumulative total variance is summarized in Table 3.

According to C22=81.923% accumulated from Table 3,

one possible option is to select these 22 variables as PCA

parameters Pd from Table 2.

$$P_d = (p_1, p_2, \ldots, p_{22}) \tag{15}$$

where pj=(p1j, p2j, … , pnj)T, j = 1, 2, …, 22; n = 1, 2, … ,

39767. Specifically, the parameter Pd is achieved by the

22 variables described in Table 4. This result indicates that

the popularity level is affected by the published date, data

channel, LDA closeness, and other variables. This

connected relationship is vague and beyond subjective

judgments; meanwhile, a tree model is provided in Fig. 7

to visually indicate these results. Certain differences exist

between the tree model and the PCA method; however, the

variables selected are similar.
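The 80% rule used to settle on 22 components can be sketched as a cumulative sum over the contribution rates. The eigenvalues below are hypothetical and are not the values of Table 3; the function names are assumptions.

```python
def contribution_rates(eigenvalues):
    """Share of total variance carried by each component."""
    total = sum(eigenvalues)
    return [lam / total for lam in eigenvalues]

def components_needed(eigenvalues, threshold=0.80):
    """Smallest m whose leading m components reach the cumulative threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    running = 0.0
    for m, rate in enumerate(contribution_rates(ordered), start=1):
        running += rate
        if running >= threshold:
            return m
    return len(ordered)

eigs = [4.0, 3.0, 2.0, 1.0]  # hypothetical eigenvalues
rates = contribution_rates(eigs)
m = components_needed(eigs)
```

With these toy values the first two components reach 70% and the third brings the total to 90%, so three components are kept. The same logic applied to Table 3 is what yields 22 components at 81.923%.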

As shown in Fig. 7, news popularity can be roughly

determined by these variables. For example, if the news

is published on a weekend, it may receive more

shares, with an average of 4186.155 shares; however, most

articles (91.6% of instances) are not published at this

prime time and receive an average of 2351.337 shares.

Based on node 3, if the news is

published on social media channels, it receives nearly

twice as many shares (4310.918) as news on other channels

(2211.531). Similarly, lifestyle channels are more popular

than other channels.

Fig. 7. Decision tree representing shares (by Chi-Square Automatic Interaction Detection increasing method).

4.2 Prediction Model Based on Neural Networks

4.2.1 Modeling Based on Neural Networks

Training samples with correct outputs (goals) are

indispensable for identification with neural networks.

Fortunately, matrices Pd and T can serve as

supervisors to guide the offline learning process. This

adjusting process includes two primary phases [17, 18].

The first step is unsupervised learning, which

determines the center vectors cj (from the hidden layer)

and the normalization constant by clustering all samples,

in which the K-means algorithm is executed as follows:

Step 1: Initialize cj(0)=[c1j(0),c2j(0),…,cMj(0)]T,

(j=1,2,…,K), learning rate β (0), and the termination

condition of error calculation ε.

Step 2: Calculate the Euclidean distance to confirm

node r with minimum distance

(16)

where p is the sample index, and r is the node with the

minimum distance between cj(p-1) and x(p).

Step 3: Center adjustment

(17)

Where β(p) is the learning rate, and int() indicates a

number-rounding operation.

Step 4: Evaluation of clustering quality: for all samples

p(1,2,…,N), execute steps 2 and 3 until

(18)
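The clustering stage can be sketched as an online K-means loop: find the nearest center for each sample (Step 2), move only that center toward the sample (Step 3), and repeat until the centers settle (Step 4). Eqs. (16) to (18) are not reproduced in this transcript, so the geometric decay of β and the stopping test on center movement are assumptions standing in for the paper's own rules.

```python
import math

def nearest_center(x, centers):
    """Step 2: index r of the center closest to x (Euclidean distance)."""
    distances = [math.dist(x, c) for c in centers]
    return distances.index(min(distances))

def online_kmeans(samples, centers, beta=0.5, decay=0.9, eps=1e-6, max_epochs=100):
    """Steps 2 to 4 repeated over the samples until the centers stop moving."""
    centers = [list(c) for c in centers]
    for _ in range(max_epochs):
        largest_move = 0.0
        for x in samples:
            r = nearest_center(x, centers)
            # Step 3: move only the winning center toward the sample.
            updated = [c + beta * (xi - c) for c, xi in zip(centers[r], x)]
            largest_move = max(largest_move, math.dist(updated, centers[r]))
            centers[r] = updated
        beta *= decay  # assumed schedule; the paper's beta(p) rule is not shown
        if largest_move < eps:  # assumed stopping test standing in for Eq. (18)
            break
    return centers

# Hypothetical two-cluster data.
samples = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centers = online_kmeans(samples, [[0.0, 0.0], [10.0, 10.0]])
```

Each resulting center would then serve as cj for one hidden node, with the normalization constant derived from the spread of the samples assigned to it.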

The second stage consists of supervised learning. The

goal is to train the weights wkj by the least mean square method

or the delta rule.

Step 1: Initialize weights wkj(0), j=1,2,…,K; k=1,2,…,L.

Step 2: Define the input-output data pairs; the desired

output is yk* (k=1,2,…,L).

Step 3: The output from the hidden layer node j is

expressed as follows (current input is group p):

(19)

The output of node k in the output layer is expressed as

follows:

Step 4: Delta rule (weight adjustment rule):

(20)

where u(x(p)) = [u1(x(p)),u2 (x(p)), … ,uK (x(p))]T, and

η is the learning rate.

Let the desired output be yk*(x(p)) (k=1,2,…,L; p=1,2,…,N); then,

the local error function and global total error function are

expressed as follows:

(22)

The iteration terminates once J converges to within ε.
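The supervised stage can be sketched as a delta-rule (least mean square) loop over a single output node. The update w ← w + η·e·u and the half-sum-of-squares total error are the standard textbook forms and are assumed here, since Eqs. (19) to (22) are not reproduced in this transcript. The bias is omitted for brevity, and η = 0.1 is chosen for the toy data rather than the η = 0.001 reported with Fig. 8.

```python
def train_output_weights(hidden_outputs, targets, eta=0.1, eps=1e-6, max_epochs=10000):
    """Delta-rule training of one linear output node on fixed hidden outputs."""
    w = [0.0] * len(hidden_outputs[0])
    total_error = float("inf")
    for _ in range(max_epochs):
        total_error = 0.0
        for u, y_star in zip(hidden_outputs, targets):
            y = sum(wj * uj for wj, uj in zip(w, u))         # current output
            e = y_star - y                                    # local error
            w = [wj + eta * e * uj for wj, uj in zip(w, u)]   # delta rule
            total_error += 0.5 * e * e                        # global error J
        if total_error <= eps:
            break
    return w, total_error

# Hypothetical hidden-layer outputs and targets generated by weights (2, 3).
hidden_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [2.0, 3.0, 5.0]
w, J = train_output_weights(hidden_outputs, targets)
```

Because the hidden-layer centers are fixed after clustering, this stage is a linear least-squares problem, which is why a simple gradient rule suffices.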

4.2.2 Simulation Results and Discussion

Identifying the relationship between decisional

variables of news and shares with BPNN is well

established. In consideration of 22 independent variables,

this model does not provide linear function expressions,

and thus variables cannot be separated from one another

because some are interconnected (e.g., the number of

statistics).

Let the input matrix be Pd. It is sent to the hidden layer

for the mapping operation, with adjusted weights. Finally,

popularity (represented by shares) is derived from the

output layer. Prediction results are shown in Fig. 8.

Fig. 8. Prediction based on back propagation neural networks (model parameters: ε = 10⁻⁶, η = 0.001).

Case 1: (a) share comparison between predictions and expectations, K = 20, first 80% of samples used for training, the remaining 20% for testing; (b) errors; (c) relative error coefficient.

Case 2: (d) share comparison between predictions and expectations, K = 20, randomly selected 80% of samples used for training, the remaining 20% for testing; (e) errors; (f) relative error coefficient.

Case 3: (g) share comparison between predictions and expectations, K = 30, first 80% of samples used for training, the remaining 20% for testing; (h) errors; (i) relative error coefficient.

Case 4: (j) share comparison between predictions and expectations, K = 30, randomly selected 80% of samples used for training, the remaining 20% for testing; (k) errors; (l) relative error coefficient.

The simulation results in Fig. 8 show acceptable share predictions under the various conditions: the relative error coefficient in panel (c) shows that only five points exceed 10%. The overall model accuracy is 95%, with detailed data given in Table 5.
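The two sampling schemes in Fig. 8 and the count of points above 10% can be illustrated with a short Python sketch. The relative error coefficient is assumed here to be |prediction − expectation| / |expectation|; that definition and all function names are illustrative assumptions, not taken from the paper.

```python
import random


def split_first(samples, train_fraction=0.8):
    """Cases 1 and 3: first 80% for training, remaining 20% for testing."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


def split_random(samples, train_fraction=0.8, seed=0):
    """Cases 2 and 4: randomly selected 80% for training, rest for testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def relative_errors(predicted, expected):
    """Assumed definition: |predicted - expected| / |expected| per point."""
    return [abs(p - e) / abs(e) for p, e in zip(predicted, expected)]


def count_exceeding(rel_errors, threshold=0.10):
    """Number of test points whose relative error exceeds the threshold."""
    return sum(1 for r in rel_errors if r > threshold)
```

Comparing the two splits is a useful check when the samples are ordered, since the first-80% split can separate training and test data by position in the dataset, while the random split mixes them.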

5 Conclusions

Online news popularity prediction with neural network modeling is feasible and valuable. The PCA method can be used as an effective tool for factor analysis and selection. The supervised training of radial basis function neural networks requires a proper dataset with known target outputs (i.e., the answers are given), so this nonlinear system is well suited to data forecasting and self-evolution. As shown by the simulation results in the previous section, this identification method (influential factor selection and network simulation) can be generalized to other simulation scenarios. Parallel computing and distributed file systems can be adopted in data processing; however, factor abstraction is a preliminary procedure that requires human guidance.

Acknowledgements

This research was supported by the following grants: the National Science Foundation of China (Grant Nos. 61162010, 61440019, 61462022, and 71161007), the projects of the Ministry of Science and Technology of China (Grant No. S2013HR0034L), the 100 Talents Project of the Chinese Academy of Sciences (Grant No. SIDSSE-BR-201304), and the Hainan Science Foundation (Grant No. 614228).

*Corresponding author.

References

[1] Tatar A, Antoniadis P, De Amorim M D, et al. From popularity prediction to ranking online news. Social Network Analysis and Mining, Vol. 4, No. 1, pp. 1-12, April, 2014.

[2] Liu Q, Zhou M, Zhao X. Understanding News 2.0: A framework for explaining the number of comments from readers on online news. Information & Management, Vol. 52, No. 7, pp. 764-776, April, 2015.

[3] Pinto H, Almeida J M, Gonçalves M A. Using early view patterns to predict the popularity of YouTube video. The sixth ACM international conference on Web search and data mining. Rome, Italy, 2013, pp. 365-374.

[4] Lee J G, Moon S, Salamatian K. An approach to model and predict the popularity of online contents with explanatory factors. IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology. Toronto, Canada, 2010, pp. 623-630.

[5] Lee A M, Lewis S C, Powers M. Audience Clicks and News Placement: A Study of Time-Lagged Influence in Online Journalism. Communication Research, Vol. 41, No. 4, pp. 505-530, November, 2014.

[6] Bhaskar A, Gyani J, Narsimha G. A novel approach to predict the popularity of the video. IEEE Region 10 Symposium. Kuala Lumpur, Malaysia, 2014, pp. 578-583.

[7] Ren Y, Shen J, Wang J, et al. Mutual verifiable provable data auditing in public cloud storage. Journal of Internet Technology, Vol. 16, No. 2, pp. 317-323, March, 2015.

[8] Nuutinen T, Ray C, Roos E. Do computer use, TV viewing, and the presence of the media in the bedroom predict school-aged children's sleep habits in a longitudinal study. BMC Public Health, Vol. 13, No. 1, pp. 684-685, March, 2013.

[9] Figueiredo F, Almeida J M, Gonçalves M A, et al. On the dynamics of social media popularity: a YouTube case study. ACM Transactions on Internet Technology (TOIT), Vol. 14, No. 4, pp. 1-22, December, 2014.

[10] Shen C C. Maximum Likelihood DOA Estimation Using Particle Swarm Optimization under Sensor Perturbation Conditions. Journal of Internet Technology, Vol. 16, No. 5, pp. 847-855, September, 2015.

[11] Du C, Zhou Z B, Ying S, et al. An efficient indexing and query mechanism for ubiquitous IoT services. International Journal of Ad Hoc and Ubiquitous Computing, Vol. 18, No. 4, pp. 245-255, June, 2015.

[12] Wang C, Hill D J. Deterministic learning and rapid dynamical pattern recognition. IEEE Transactions on Neural Networks, Vol. 18, No. 3, pp. 617-630, May, 2007.

[13] Mohanty S, Chattopadhyay A, Peralta P, et al. Bayesian statistic based multivariate Gaussian process approach for offline/online fatigue crack growth prediction. Experimental Mechanics, Vol. 51, No. 6, pp. 833-843, July, 2011.

[14] Hornik K, Stinchcombe M, White H. Multilayer feedforward networks are universal approximators. Neural Networks, Vol. 2, No. 5, pp. 359-366, May, 1989.

[15] Seshagiri S, Khalil H K. Output feedback control of nonlinear systems using RBF neural networks. IEEE Transactions on Neural Networks, Vol. 11, No. 1, pp. 69-79, January, 2000.

[16] Shen B, Hu B W, Zhang H. Method for the analysis of the preferences of network users. IET Networks, Vol. 5, No. 1, pp. 8-12, January, 2016.

[17] Karia D C, Lande B K, Daruwala R D. Performance analysis of HMM- and ANN-based spectrum vacancy predictor behavior for cognitive radios. International Journal of Ad Hoc and Ubiquitous Computing, Vol. 11, No. 4, pp. 206-213, May, 2012.

[18] Schmidhuber J. Deep learning in neural networks: An overview. Neural Networks, Vol. 61, pp. 85-117, January, 2015.

Biographies

Wei Wu received the B.Sc. degree from Hubei University of Science and Technology and the Master's degree from the College of Information Science and Technology, Hainan University. He is now a full-time faculty member at the Institute of Deep-sea Science and Engineering, Chinese Academy of Sciences. His research interests cover artificial intelligence and big data theory and applications.

Wencai Du received the B.Sc. degree from Peking University, China, two Master's degrees from Twente University (ITC), The Netherlands, and Hohai University, China, respectively, and the Ph.D. degree from South Australia University, Australia. He was a postdoctoral fellow at the Israel Institute of Technology, Haifa, Israel. He is a Professor of ICT at Hainan University and City University of Macau. His expertise covers broad areas of information and communication technologies, social networking, and e-service. His research interests are in the areas of maritime communication, information management, and marketing, with a particular focus on the tourism industry in the domains of social media marketing, e-commerce, and e-education.

Hongzhou Xu received the B.Sc. degree from Ocean University of China and the Master's and Ph.D. degrees from the South China Sea Institute of Oceanology, Chinese Academy of Sciences. He is now a full-time faculty member at the Institute of Deep-sea Science and Engineering, Chinese Academy of Sciences. His research interests cover massive data modeling and simulation, ocean circulation observation, and numerical simulation.

Hui Zhou received the B.S. degree in computer science from the University of Science and Technology of China in 2002 and the Ph.D. degree in computer software and technology from the Graduate University of Chinese Academy of Sciences (GUCAS) in 2008. He worked at the IBM Research & Development Center (Beijing) from July 2008 and joined Hainan University as a staff member in May 2011. His research interests include computer networks, digital tourism, and cluster file systems.

Mengxing Huang received the Ph.D. degree from the School of Automation, Northwestern Polytechnic University, and was a postdoctoral fellow in Computer Science and Technology at Tsinghua University. He is now a full-time faculty member in the College of Information Science and Technology, Hainan University. His research interests cover data and knowledge engineering, big data and cloud computing, and the Internet of Things.