research article a novel adaptive conditional probability...

15
Research Article A Novel Adaptive Conditional Probability-Based Predicting Model for User’s Personality Traits Mengmeng Wang, 1,2 Wanli Zuo, 1,2 and Ying Wang 1,2,3 1 College of Computer Science and Technology, Jilin University, Changchun 130012, China 2 Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun 130012, China 3 College of Mathematics, Jilin University, Changchun 130012, China Correspondence should be addressed to Ying Wang; [email protected] Received 13 March 2015; Accepted 21 June 2015 Academic Editor: Antonino Laudani Copyright © 2015 Mengmeng Wang et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. With the pervasive increase in social media use, the explosion of users’ generated data provides a potentially very rich source of information, which plays an important role in helping online researchers understand user’s behaviors deeply. Since user’s personality traits are the driving force of user’s behaviors, hence, in this paper, along with social network features, we first extract linguistic features, emotional statistical features, and topic features from user’s Facebook status updates, followed by quantifying importance of features via Kendall correlation coefficient. And then, on the basis of weighted features and dynamic updated thresholds of personality traits, we deploy a novel adaptive conditional probability-based predicting model which considers prior knowledge of correlations between user’s personality traits to predict user’s Big Five personality traits. In the experimental work, we explore the existence of correlations between user’s personality traits which provides a better theoretical support for our proposed method. Moreover, on the same Facebook dataset, compared to other methods, our method can achieve an F1-measure of 80.6% when taking into account correlations between user’s personality traits, and there is an impressive improvement of 5.8% over other approaches. 1. Introduction As a new medium for information dissemination, social network has become a novel means of social interactions. Hence, user’s individual behaviors have gradually turned into the key factors in social networks analysis. Besides, although some users post their desirable images and lives onto social media to achieve self-presentation which reflect some sort of “untrue” self, users’ contributions and activities, which can be instantly made available to entire social network [1], still provide a valuable insight into individual behaviors. Psychologists believed that user’s personality traits were the driving force of user’s behaviors, and individual differ- ences in personality traits may have an impact on user’s online activities [2, 3]. Due to the accessibility of user’s personality traits features, in order to avoid basic rights violation or discrimination, some researchers carried out experiments with users’ psychological features and profiles which were recorded for research purposes with users’ consent to make a deep understanding of online social networks via user’s personality traits. User’s personality traits can be used to predict early adoption about Facebook [4]: a conscientious person has sparing use of Facebook, an extroverted person has long sessions and abundant friendships, and a neurotic person has high frequency of sessions. Moreover, user’s per- sonality traits may help optimize search results [5], manifest social influence [6], and distinguish individuals who have some common properties in the crowd [7]. It also plays an important role in relationship outcomes (customer trust, satisfaction, and loyalty) [8]. To sum up, user’s personality traits recognition has an important theoretical significance to mine user’s behavior patterns and get user’s potential needs under different contexts. Hence, analyzing and forecasting user’s personality traits through mining data from online social networking sites have become a research focus. Our work on predicting user’s personality traits is motivated by its broad application prospect. Hindawi Publishing Corporation Mathematical Problems in Engineering Volume 2015, Article ID 472917, 14 pages http://dx.doi.org/10.1155/2015/472917

Upload: others

Post on 30-Apr-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Research Article A Novel Adaptive Conditional Probability ...downloads.hindawi.com/journals/mpe/2015/472917.pdf · Research Article A Novel Adaptive Conditional Probability-Based

Research ArticleA Novel Adaptive Conditional Probability-Based PredictingModel for Userrsquos Personality Traits

Mengmeng Wang12 Wanli Zuo12 and Ying Wang123

1College of Computer Science and Technology Jilin University Changchun 130012 China2Key Laboratory of Symbolic Computation and Knowledge Engineering Ministry of Education Jilin UniversityChangchun 130012 China3College of Mathematics Jilin University Changchun 130012 China

Correspondence should be addressed to Ying Wang 726854768qqcom

Received 13 March 2015 Accepted 21 June 2015

Academic Editor Antonino Laudani

Copyright copy 2015 Mengmeng Wang et al This is an open access article distributed under the Creative Commons AttributionLicense which permits unrestricted use distribution and reproduction in any medium provided the original work is properlycited

With the pervasive increase in social media use the explosion of usersrsquo generated data provides a potentially very rich source ofinformation which plays an important role in helping online researchers understand userrsquos behaviors deeply Since userrsquos personalitytraits are the driving force of userrsquos behaviors hence in this paper along with social network features we first extract linguisticfeatures emotional statistical features and topic features from userrsquos Facebook status updates followed by quantifying importanceof features via Kendall correlation coefficient And then on the basis of weighted features and dynamic updated thresholds ofpersonality traits we deploy a novel adaptive conditional probability-based predicting model which considers prior knowledgeof correlations between userrsquos personality traits to predict userrsquos Big Five personality traits In the experimental work we explorethe existence of correlations between userrsquos personality traits which provides a better theoretical support for our proposed methodMoreover on the same Facebook dataset compared to othermethods ourmethod can achieve an F1-measure of 806when takinginto account correlations between userrsquos personality traits and there is an impressive improvement of 58 over other approaches

1 Introduction

As a new medium for information dissemination socialnetwork has become a novel means of social interactionsHence userrsquos individual behaviors have gradually turned intothe key factors in social networks analysis Besides althoughsome users post their desirable images and lives onto socialmedia to achieve self-presentation which reflect some sort ofldquountruerdquo self usersrsquo contributions and activities which canbe instantly made available to entire social network [1] stillprovide a valuable insight into individual behaviors

Psychologists believed that userrsquos personality traits werethe driving force of userrsquos behaviors and individual differ-ences in personality traitsmay have an impact on userrsquos onlineactivities [2 3] Due to the accessibility of userrsquos personalitytraits features in order to avoid basic rights violation ordiscrimination some researchers carried out experimentswith usersrsquo psychological features and profiles which wererecorded for research purposes with usersrsquo consent to make

a deep understanding of online social networks via userrsquospersonality traits Userrsquos personality traits can be used topredict early adoption about Facebook [4] a conscientiousperson has sparing use of Facebook an extroverted personhas long sessions and abundant friendships and a neuroticperson has high frequency of sessions Moreover userrsquos per-sonality traits may help optimize search results [5] manifestsocial influence [6] and distinguish individuals who havesome common properties in the crowd [7] It also playsan important role in relationship outcomes (customer trustsatisfaction and loyalty) [8] To sum up userrsquos personalitytraits recognition has an important theoretical significance tomine userrsquos behavior patterns and get userrsquos potential needsunder different contexts Hence analyzing and forecastinguserrsquos personality traits through mining data from onlinesocial networking sites have become a research focus Ourwork on predicting userrsquos personality traits is motivated byits broad application prospect

Hindawi Publishing CorporationMathematical Problems in EngineeringVolume 2015 Article ID 472917 14 pageshttpdxdoiorg1011552015472917

2 Mathematical Problems in Engineering

Nevertheless confronted with the same problem as statedin [9] accessibility of userrsquos personality traits features mayvary Consequently userrsquos personality traits are by naturedifficult to predict Generally psychologists regarded person-ality traits which were reflected in userrsquos attitudes towardsthings and actions taken by user [10] as a personrsquos uniquepattern of long-term thoughts emotions and behaviors [1112] Therefore leveraging long-term mode of expression andemotion behind numerous contents that a user posts topredict hisher personality traits can be a feasible measureFurthermore users who have diverse personality traits tendto pay attention to different types of things in such ascenario we expect that topics a user is concerned aboutwould enhance the performance of userrsquos personality traitsprediction Based on the research train of thought in thispaper we propose a new adaptive conditional probability-based framework for userrsquos personality traits prediction Ourmain contributions are summarized next

(1) Demonstrate the existence of interdependenciesbetween userrsquos personality traits

(2) On account of social network features linguistic fea-tures emotional statistical features and topic featureswe put forward a novel unsupervised adaptive condi-tional probability-based framework for the problemof predicting userrsquos personality traits through takingprior knowledge of correlations between userrsquos per-sonality traits into consideration

(3) Exploit correlations between features and userrsquos per-sonality traits via Kendall correlation coefficient so asto quantify importance of each feature

(4) Update threshold of each personality trait dynami-cally rather than adopt a unified threshold

The rest of the paper is organized as follows Section 2describes the related work Section 3 defines the method wepropose details of the experimental results and dataset whichis used in this study are given in Section 4 finally conclusionappears in Section 5

2 Related Work

As research on userrsquos behaviors in social networks has becomea hot spot userrsquos personality recognition has received asignificant amount of attention in both theory and practiceArgamon et al [13] and Mairesse et al [14] were first dedi-cated to this research field Generally there were two mainapproaches adopted for studying userrsquos personality traits insocial networks Merely based on social network activitiesone was using machine learning algorithms to capture userrsquospersonality traits Moore and McElroy [15] explored userrsquospersonality traits through questionnaires and log data of 219college students and scale reliabilities for the five personalitydimensions of the IPIPwere acceptablewithCronbachrsquos alphavalues of 090 for extraversion 081 for agreeableness 082for conscientiousness 083 for emotional stability and 079for openness to experience which revealed that mining userrsquospersonality traits in Facebook was feasible Kosinski et al

[5] first presented that there were psychologically meaningfullinks between usersrsquo personalities their website preferencesand Facebook profile features and then predicted individ-ualrsquos personality traits via multivariate linear regression Theexperimental results indicated that extroversion was mosthighly expressed by Facebook features followed by neuroti-cism conscientiousness and openness Agreeableness wasthe hardest trait to predict (005 in terms of accuracy) usingFacebook profile features and the simple model

The other one extended personality-related features withlinguistic cues On the basis of the corpus which was derivedfrom essays written by students at the University of Texasat Austin [16] Argamon et al [13] utilized SMO [17] todetermine whether each author had high or low neuroticismor extraversion respectively For neuroticism their experi-mental results have shown clearly the usefulness of functionallexical features in particular the appraisal lexical taxonomywhile in the case of extraversion experimental results wereless clear but examination of indicative function wordspointed the way to developing more effective features byfocusing on expressions related to norms (in)completenessand (un)certainty Golbeck et al [18] explored whether thepublicly available information on usersrsquo Facebook profile canpredict personality traits In order to predict personality traitsof 279 Facebook users based on linguistic structural andsemantic features they used the profile data as a featureset and trained two machine learning algorithms m5suprsquoRules [19] and Gaussian Processes [20] to predict each ofthe five personality traits within 11 of their actual valueIn addition the experimental results have shown that userrsquosBig Five personality traits can be predicted from the publicinformation they shared on Facebook

Mairesse et al [14] leveraged classification regressionand ranking models to recognize personality traits via LIWCand MRC features automatically And experiments werecarried out with the essays corpus and the EAR corpusrespectively The results revealed that the LIWC featuresoutperformed the MRC features for every trait and theLIWC features on their own always performed slightly betterthan the full feature set Concerning the algorithms itcan be found that AdaboostM1 [17] performed the bestfor extraversion (563 correct classifications) while SMOproduced the best models for all other traits They alsopointed out that features were likely to vary dependingon the source of language and the method of assessmentof personality through analyzing the correlations betweendifferent characters In order to forecast RenRenrsquos 335 usersrsquopersonality traits according to the number of userrsquos friendsand a state recently released Bai et al [21] used manyclassification algorithms such as Naive Bayesian (NB) [17]Support VectorMachine (SVM) and Decision Tree [17] Andthey found out that C45 Decision Tree can get the bestresults (0697 for agreeableness 0749 for neuroticism 0824for conscientiousness 0838 for extraversion and 0811 foropenness in terms of 1198651-measure) Oberlander and Nowson[22] achieved better results (ranking on raw accuracy agree-ableness gt conscientiousness gt neuroticism gt extraversionthe best agreeableness accuracy was 304 absolute overthe baseline (772 relative)) on classification of personality

Mathematical Problems in Engineering 3

traits through leveraging Native Bayes model on account ofdiffering sets of n-gram features They also demonstratedthat with respect to feature selection policies automaticselection generally outperformed ldquomanualrdquo selection On thebasis of automatically derived psycholinguistic and mood-based features of a userrsquos textual messages Nguyen et al[23] utilized SVM classifier for examining two two-classclassification problems influential versus noninfluential andextraversion versus introversion They experimented withthree subcorpora of 10000 users each and presented the mosteffective predictors for each category The best classificationresult at 80 was achieved using psycholinguistic featuresHowever they did not predict personality traits in finer grainBai et al [24] proposed a multitask regression algorithm andan incremental regression algorithm to predict usersrsquo BigFive personality traits from their usages of Sina microblogobjectively The results indicated that personality traits canbe predicted in a high accuracy through online microblogusages Besides the average mean absolute error of multitaskregression model was 1384 which got about 5 percentagepoints reduction compared to incremental regression Sunand Wilson [25] demonstrated that without any significantaddition or modification a cognitive architecture can serveas a generic model of personality traits Besides integratingpersonality modeling with generic computational cognitivemodeling was shown to be feasible and useful

However some drawbacks can be pointed out in pre-vious work on userrsquos personality traits prediction (1) someresearchers have made an assumption that there had beenlittle or no correlations between userrsquos personality traits[26] however the massive approaches have been exploredto examine personality psychology and have revealed thecorrelations between the Big Five dimensions instead of littleor no correlations [27ndash30] (2) Although different featuresplayed different roles in predicting userrsquos personality traits[18 31 32] only a few researchers considered the correlationsbetween features and personality traits [33] (3) In multilabellearning task such as PT5 method proposed by Tsoumakasand Katakis [34] thresholds were usually unified to the samevalue which was not appropriate In this light we propose anadaptive conditional probability-based model to improve theperformance of userrsquos personality traits prediction task

3 Adaptive Conditional Probability-BasedPredicting Model for Userrsquos PersonalityTraits

In this section we present predicting model adopted inour work Initially we make a definition of preliminary fea-tures (Section 31) followed by measurement of distributingweights to characteristics (Section 32) And then we outlineframework of adaptive conditional probability-based userrsquospersonality traits prediction model (Section 33) Finallyin Section 34 we depict our algorithm and analyze timecomplexity of our proposed model

31 Definition of Features As a personrsquos unique pattern oflong-term thoughts emotions and behaviors personality

traits are reflected in userrsquos attitudes towards things andactions taken by user Therefore aside from social networkfeatures which were provided in Facebook dataset [35]directly we introduced linguistic features emotional statis-tical features and topic features to predict userrsquos personalitytraits Linguistic features emotional statistical features andtopic features can bemeasured via analysis of userrsquos Facebookstatus updates

311 Social Network Features Facebook dataset includesseven social network features namely date of userrsquos registernetwork size ego betweenness centrality normalized egobetweenness centrality density brokerage normalized bro-kerage and transitivity which reflect userrsquos behavior patternsjust through userrsquos network structure Similar to the clusterassumption [36] in this work we took the above features intoconsideration to complete prediction task and assumed thatthe more similar were the network structures between twousers the more likely they shared the same label

312 Linguistic Features Since each user showed a particularmode of expression some researchers held the view thatcorrelations between personality traits and spoken or writtenlinguistic cues were significant [14 16] so language-basedassessments can constitute valid personality measures [28]Hence in this paper we regarded linguistic cues as factors soas to mine userrsquos personality traits through userrsquos means ofexpression

A natural language parser is used to work out gram-matical structure of sentences such as grouping wordstogether as ldquophrasesrdquo and obtaining subject or object of averb Probabilistic parsers try to produce the most likelyanalysis of new sentences via leveraging knowledge of lan-guage gained from hand-parsed sentences Stanford Parser(httpnlpstanfordedusoftwarelex-parsershtmlAbout) isa probabilistic natural language open source parser whichimplements a factored product model with separate PCFGphrase structure and lexical dependency experts whosepreferences are combined by efficient exact inference usingan Alowast algorithm Either of these yields a good performancestatistical parsing system [37] A GUI is provided for utilizingit simply as an accurate unlexicalized stochastic context-freegrammar parser and viewing the phrase structure tree outputof the parser Thus in order to learn traits of contents thatuser yields we got word frequency statistics of 35 kinds ofparts of speech with Stanford Parser that were conjunctionnumeral determiner existential there foreign word modalauxiliary singular or mass noun plural noun proper nounplural proper noun predeterminer genitive marker personalpronoun plural personal pronoun ordinal adverb compara-tive adverb superlative adverb particle symbol interjectionverb in base form verb in past tense gerund verb in pastparticiple verb in present tense (not 3rd person singular)verb in present tense (3rd person singular) subordinat-ing conjunction ordinal adjective comparative adjectivesuperlative adjective list itemmarkerWH-determinerWH-pronoun WH-plural pronoun and WH-adverb Besideswe defined another six linguistic features the total numberof words and frequency statistics of punctuation comma

4 Mathematical Problems in Engineering

period exclamation and question Nevertheless we filteredhyperlinks in usersrsquo contents as blog services tended to pro-duce many hyperlinks for navigation and advertisement [38]which had no relationship with personality traits prediction

313 Emotional Statistical Features Userrsquos attitudes towardsthings which show userrsquos different personality traits reflectuserrsquos unique pattern of long-term emotions For instancea neurotic person may have a tendency to experienceunpleasant emotions easily such as anger anxiety depressionand vulnerability Hence userrsquos statistics of emotion can becharacteristics in userrsquos personality traits predicting modelIn this paper userrsquos emotional statistical characteristicsincluded proportion of positive words and negative wordsused in userrsquos posts On the basis of adjectives and theirvariants obtained in Section 312 userrsquos emotional statisticalcharacteristics were calculated with the corpus of HowNetKnowledge (httpwwwkeenagecomdownloadsentimentrar) HowNet Knowledge which includes 8945 words andphrases consists of six files positive emotional words list filenegative emotional words list file positive review words listfile negative review words list file degree level words list fileand proposition words list file

As userrsquos emotional statistical features userrsquos positive andnegative emotional statistical characteristics are defined as

PT (119894) =119901119899119894

sum119894

NT (119894) =119899119899119894

sum119894

(1)

where PT(119894) andNT(119894) represent proportion of positivewordsand negative words used in user 119894rsquos posts respectively119901119899

119894and

119899119899119894represent the number of positive emotional words and

the number of negative emotional words user 119894 used whichare included in HowNet Knowledge and sum

119894represents the

total number of words in user 119894rsquos contents

314 Topic Features The things user focuses on may have animpact on actions that user has taken Take openness which isone of the Big Five personality traits as an example it reflectsdegree of intellectual curiosity creativity and a preference fornovelty and variety a person hasTherefore wemined a seriesof userrsquos concerned themes from userrsquos status via LDA (LatentDirichlet Allocation) [39] for predicting userrsquos personalitytraits

Since our purpose is to extract all concerned themes of auser rather than to extract specific themes of each post wemerged all microblogs of a user into one document and thenextracted userrsquos concerned themes namely each documentcorresponded to a user The results of LDA model are shownas follows

DT =[[[[[[

[

DT11 DT12 sdot sdot sdot DT1119899

DT21 DT21 sdot sdot sdot DT2119899

d

DT1198981 DT

1198982 sdot sdot sdot DT119898119899

]]]]]]

]

(2)

whereDT stands for an119898times119899matrix which ismainly used forstorage of distribution of themes and documents 119898 standsfor the number of documents 119899 stands for the number ofthemes and element DT

119894119895(119894 = 1 2 119898 and 119895 = 1 2 119899)

in matrix DT stands for probability that the 119894th documentbelongs to the 119895th theme that is degree of attention whichuser 119894 pays to the 119895th theme

32 Weight Distribution of Features Every feature has adifferent impact on userrsquos personality traits prediction It isof great importance to allocate featuresrsquo weights reasonablyso as to be able to perform good prediction based onlyon scant knowledge of personality traits Kendall test is anonparametric hypothesis test which calculates correlationcoefficient to test statistical dependence of two randomvariables Since values of features and scores of personalitytraits in the dataset we used were numeric therefore inorder to quantify importance of each feature we analyzedrelevance between userrsquos personality traits and characteristicsvia Kendall correlation coefficient in which values of featuresand scores of personality traits were treated as two randomvariables Kendall correlation coefficient is calculated as

120591 (119883 119884) =119862 minus 119863

radic(1198733 minus 1198731) times (1198733 minus 1198732) (3)

where 120591(119883 119884) isin [minus1 1] and if random variables 119883 and 119884have a positive correlation then 120591(119883 119884) = 1 else if randomvariables119883 and 119884 have a negative correlation then 120591(119883 119884) =minus1 else if random variables119883 and 119884 are independent of eachother then 120591(119883 119884) = 0 119862 represents the number of tupleswhose two random variables have consistency 119863 representsthe number of tupleswhose two randomvariables donot haveconsistency and 119873

1and 119873

2represent the total number of

repeated elements in random variables119883 and 119884 respectivelyThe calculation of1198731 is shown as

1198731 =119904

sum

119894=1

12119880119894(119880119894minus 1) (4)

where 119904 denotes the number of elements that have repeatedelements in random variable 119883 and 119880

119894denotes the number

of repeated elements of the 119894th element Similarly 1198732is

calculated as

1198732 =1199041015840

sum

119895=1

12119880119895(119880119895minus 1) (5)

where 1199041015840 denotes the number of elements that have repeatedelements in random variable 119884 and 119880

119895denotes the number

of repeated elements of the 119895th element1198733denotes the total

number of merged sequences which is calculated as

1198733 =12119873(119873minus 1) (6)

Mathematical Problems in Engineering 5

Facebook

Social networkfeatures set

Userrsquosstatus set

Analyze correlations

betweenfeatures

and personality

traits

Linguistic features set

Emotional statistical features set

Topic features set

Assign weights

tofeatures

neighbors set

Analyzecorrelations

betweenpersonality

traits

Initializethresholds of personality

traits

Userrsquos personality traits set

Updatethresholds of personality traits

Calculateerror rates of personality traits

Predict userrsquos personality traits set

Minimum error rates

No

Yes

Get k nearest

Figure 1 The architecture of adaptive conditional probability-based predicting model for userrsquos personality traits

where119873 represents dimensions of tuples Thus according toKendall correlation coefficient between features and person-ality traits importance of the 119894th feature 119891

119894is calculated as

119868 (119891119894) =

sum119901119895isin119875120591 (119891119894 119901119895)

5

(7)

where 120591(119891119894 119901119895) stands for Kendall correlation coefficient

between the 119894th feature 119891119894and the 119895th personality trait 119901

119895and

119875 stands for set of Big Five personality traits which will beintroduced in Section 34 After normalizing importance ofthe 119894th feature 119891

119894 weight of 119891

119894is calculated as follows

119882(119891119894) =

119868 (119891119894)

sum119891119895isin119865119895 =119894

119868 (119891119895)

(8)

where 119865 denotes set of personality traits predicting features

33 A Framework of Predicting Userrsquos Personality TraitsFigure 1 shows the architecture of adaptive conditionalprobability-based predicting model for userrsquos personalitytraits An analysis of correlations between userrsquos personalitytraits and features is conducted before assigning weights tolinguistic emotional statistical and topic features which are

extracted from contents that user produces as well as socialnetwork features Meanwhile correlation analysis of userrsquospersonality traits is carried on Then according to weightsof features 119896 nearest neighbors set is obtained Finallygiven 119896 nearest neighbors set dynamic updated thresholds ofpersonality traits and correlations between personality traitsuserrsquos personality traits set is predicted

34 Algorithm of Adaptive Conditional Probability-BasedUserrsquos Personality Traits Prediction Before we proposed ourpredicting algorithm we conducted experiments to analyzecorrelations between userrsquos personality traits which will beshown in detail later in Section 42 And the results ofcorrelation analysis revealed that there were interdependentrelationships between userrsquos personality traits Thus givendynamic updated thresholds of personality traits as wellas considering interdependencies between personality traitsa novel adaptive correlation-based predicting model wasproposed

In psychology the Big Five personality traits are fivebroad domains or dimensions of personality traits that areused to describe human personality The theory based onthe Big Five factors is called Five Factor Model (FFM) [40]The Big Five factors are extraversion (denoted by EXT)

6 Mathematical Problems in Engineering

neuroticism (denoted by NEU) agreeableness (denoted byAGR) conscientiousness (denoted by CON) and openness(denoted by OPN)

Our method aimed to predict user 119894rsquos Big Five personalitytraits set 119875119879

119894 As stated in [41] here each personality trait

was classified into two degrees namely extraversion wasmapped onto extravert and shy neuroticism was mappedonto neurotic and secure agreeableness was mapped ontofriendly and uncooperative conscientiousness was mappedonto precise and careless and openness was mapped ontoinsightful and unimaginative For convenience we denotedthe two degrees of each personality trait by positive leveland negative level in short respectively First we set initialthreshold of each personality trait as 05 and threshold ofthe 120579th personality trait 119901

120579is denoted by 119879(119901

120579) followed by

calculating distance between user 119894 and other users with (9)as follows

Dis (119894 119895) =119898

sum

ℎ=1119882(119891ℎ) times1003816100381610038161003816119865 (119894 119891ℎ) minus 119865 (119895 119891ℎ)

1003816100381610038161003816 (9)

where Dis(119894 119895) presents distance between user 119894 and user 119895119865(119894 119891

ℎ) and 119865(119895 119891

ℎ) present the ℎth feature 119891

ℎrsquos value of user

119894 and user 119895 and119898 presents the number of featuresSecondly we sorted user 119894rsquos distance set in ascending

order and selected top 119896 users as user 119894rsquos neighbors (denotedas119873(119894)) Then for each 119901

120579 we recorded the number of users

who had positive level of 119901120579in119873(119894) as 119899

119901120579

Thirdly we selected a 119901

120579from unvisited personality traits

set UP if user 119894rsquos personality traits set PT119894was empty and then

we calculated probability that user 119894 had positive level of 119901120579as

follows

119891 (119894 119901120579) =

119875 (119867119901120579

| 119862119899119901120579

119901120579

)

119875 (sim 119867119901120579

| 119862119899119901120579

119901120579

)

(10)

where119862119899119901120579

119901120579

presents event that there are exactly119862119899119901120579

119901120579

instanceshaving positive level of 119901

120579in119873(119894) whose 119896 nearest neighbors

also have 119899119901120579

users with positive level of 119901120579 119867119901120579

and sim 119867119901120579

present event that user 119894 has positive level of 119901120579and event that

user 119894 has negative level of 119901120579 respectively And if user 119894rsquos per-

sonality traits set PT119894was not empty considering correlations

between personality traits we calculated probability that user119894 had positive level of 119901

120579on the basis of prior knowledge

about correlations between119901120579and personality traits that have

already been visited in UP as follows

119891 (119894 119901120579) =

119875 (119867119901120579

| 1199011198971 1199011198972 ) 119875 (119862

119899119901120579

119901120579

| 119867119901120579

)

119875 (sim 119867119901120579

| 1199011198971 1199011198972 ) 119875 (119862

119899119901120579

119901120579

| sim 119867119901120579

)

(11)

where 1199011198971 1199011198972 isin PT

119894present personality traits which have

already been visited in UP If 119891(119894 119901120579) was greater than or

equal to 119879(119901120579) then positive level of 119901

120579was added to PT

119894

else negative level of 119901120579was added to PT

119894

And then we calculated error rate of each personalitytrait and error rate of 119901

120579is denoted by Err(119901

120579) which is

calculated as

Err (119901120579) = 1minus

119903120579

to (12)

where 119903120579denotes the number of instances which are classified

correctly to denote the total number of instancesFinally if unvisited personality traits set UP was empty

and error rate of each personality trait fell within a specifiedrange then algorithm was terminated else if unvisited per-sonality traits set UP was not empty then we continued topredict another personality trait in UP else if there was apersonality trait 119901

120579whose error rate did not reach defined

limits then we updated 119879(119901120579) with (12) and added it to

personality traits set UP to predict it again

119879 (119901120579) = 119879 (119901

120579) + 120576 timesErr (119901

120579) (13)

where 120576 denotes a monotonic decreasing learning rate Ouralgorithm can now be defined as Algorithm 1 Assumethat size of dataset is 119879119903 dimension of features is 119889 andsize of the nearest neighbors set is 119896 The complexity ofconditional probability-based predicting model for userrsquospersonality traits is analyzed as follows From step (10) tostep (12) computation time of Kendall correlation coefficientbetween features and personality traits is taking119874(1198791199032) timecalculating weights of features takes 119874(119889) time and thendistributing weights to features can be computed in 119874(119889119879119903)time the total time is119874(1198791199032) as dimension of features 119889 is farless than size of dataset119879119903 Calculating distance between user119894 and other users from step (13) to step (15) will take 119874(119889119879119903)time In step (16) it will take 119874(119879119903log2119879119903) time for sortingset of distances between user 119894 and other users Step (18) tostep (20) take 119874(119896) time to select 119896 nearest neighbors of user119894 If algorithm goes from step (23) to step (25) it will take119874(1198962

) time Else if algorithm goes from step (26) to step (28)it will take 119874(1198791199031198962) time It takes 119874(119879119903) time from step (37)to step (45) Assume that the number of iterations is 119905 Sincesteps (10) (11) and (12) can be done offline and do not needto be calculated repeatedly every time hence from step (13)to step (52) the overall online complexity of our proposedmethod is 119874(119879119903log

2119879119903) as 119896 and 119905 are far less than 119879119903 From

here we see that our proposed method is feasible in a big dataenvironment as in Facebook monitoring

4 Experimental Evaluation

In this section we first describe dataset used in our experi-ments And then we analyze the interdependent relationshipsbetween userrsquos personality traits Finally we conduct experi-ments on different kinds of features and make a comparisonwith other methods based on the same dataset

41 Dataset MyPersonality (httpmypersonalityorg) is apopular Facebook application which is a 336-question testmeasuring the 30 facets underlying the Big Five traits thatallows users to take real psychometric tests The respondentswho come from various age groups backgrounds and cul-tures are highly motivated to answer honestly and carefullyAdditionally usersrsquo psychological and Facebook profilesare ldquoverifiedrdquo by individualsrsquo social circles hence it is hardto lie to the people that know you best Also there is nopressure and one does not have to share hisher profileinformation thus myPersonality avoids deliberate faking

Mathematical Problems in Engineering 7

Input user 119894 users set 119880 size of the nearest neighbors set 119896 unvisited personality traits set UPOutput user 119894rsquos personality traits set PT

119894

(1) 119891 larr 119891119886119897119904119890

lowast 119891 stands for a flag if any of the thresholds of personality traits is updated then 119891 equals to trueotherwise 119891 equals to false lowast

(2) For each personality trait 119901120579isin UP do

(3) set temporary error rate Temp(119901120579) as 0

(4) End for(5) For each personality trait 119901

120579isin UP do

(6) set initial threshold 119879(119901120579) as 05

(7) End for(8) PT

119894larr 0

(9) UPlarr all of the Big Five personality traits(10) For each feature 119891

ℎisin user 119894rsquos feature set do

(11) calculate 119891ℎrsquos weight according to (8)

(12) End for(13) For user 119895 isin 119880 and 119895 = 119894 do(14) calculate Dis(119894 119895) according to (9)(15) End for(16) sort Dis(119894 1)Dis(119894 2) in ascending order(17) 119873(119894) larr select top 119896 users in Dis(119894 1)Dis(119894 2) (18) For each personality trait 119901

120579isin UP do

(19) 119899119901120579

larr the number of users who have positive level of personality trait 119901120579in119873(119894)

(20) End for(21) While UP = 0 do(22) 119901

120579larr UPpoll()

(23) If PT119894= 0 then

(24) calculate 119891(119894 119901120579) according to (10)

(25) End if(26) Else if PT

119894= 0 then

(27) calculate 119891(119894 119901120579) according to (11)

(28) End if(29) If 119891(119894 119901

120579) ge 119879(119901

120579) then

(30) 119875119879119894larr positive level of 119901

120579

(31) End if(32) Else if 119891(119894 119901

120579) lt 119879(119901

120579) then

(33) PT119894larrnegative level of 119901

120579

(34) End if(35) End while(36) 119891 larr 119891119886119897119904119890

(37) For each personality trait 119901120579isin PT119894do

(38) calculate 119901120579rsquos error rate Err(119901

120579) according to (12)

(39) If |Err(119901120579) minus Temp(119901

120579)| gt 120575 then 120575 denotes a constant which is small enough

(40) 119891 larr 119905119903119906119890

(41) Temp(119901120579) larr Err(119901

120579)

(42) update 119901120579rsquos threshold 119879(119901

120579) according to (13)

(43) UPlarr 119901120579

(44) End if(45) End for(46) If 119891 == 119905119903119906119890 then(47) Go to step (21)(48) End if(49) Else if 119891 == 119891119886119897119904119890 then(50) Go to step (52)(51) End if(52) Return PT

119894

Algorithm 1 Adaptive conditional probability-based predicting model for userrsquos personality traits

8 Mathematical Problems in Engineering

of data With consent usersrsquo psychological and Facebookprofiles are recorded for research purposes Currently thedatabase contains more than 6000000 test results togetherwith more than 4000000 individual Facebook profilesIn this paper we adopted the dataset that was provided inthe Workshop on Computational Personality Recognition(Shared Task) (httpmypersonalityorgwikilibexefetchphpmedia=wikimypersonality finalzip) [35] The datasetwas a subset of myPersonality database including 250users which had both information about personality traits(annotated with usersrsquo personality scores and gold standardpersonality labelswhichwere self-assessments obtained usinga 100-item long version of the IPIP personality questionnaire(httpipiporiorgnewFinding Labeling IPIP Scaleshtm))and social network structure (network size betweennesscentrality density brokerage and transitivity) along withtheir 9900 status updates in raw text With the aid of thisstandard dataset we can compare the performance of ourproposed method with othersrsquo personality recognitionsystems on a common benchmark

42 Analysis of Correlations between Personality Traits Pre-vious works were usually predicting userrsquos personality traitswithout considering interdependencies between them In thiscontext we investigatedwhether there had been relationshipsbetween userrsquos personality traits Since not all users had thesame level of interactions consequently we first groupedusers according to different degrees of userrsquos personalitytraits As an example if a user has positive level of agree-ableness another user has positive level of agreeableness aswell then they will be divided into a group else if a user haspositive level of agreeableness another user has negative levelof agreeableness and they will not be divided into a groupThen we explored cooccurrences between personality traitsin each group simultaneously which is shown in Figure 2

It can be observed intuitively that a significant proportionof users have negative level of extraversion positive levelof openness or positive level of agreeableness It is alsonoteworthy that positive level of extraversion has less overlapwith negative level of openness negative level of consci-entiousness and negative level of agreeableness Besidespositive level of neuroticism has less overlap with positivelevel of conscientiousness positive level of agreeableness andnegative level of openness Furthermore there are bits ofusers that have positive level of extraversion and positive levelof neuroticism simultaneously

However the above cooccurrences may be due to the apriori statistics of each trait For instance if there are moreusers with negative level of neuroticism positive level ofopenness and positive level of agreeableness than othersit is more likely to have more users with cooccurrencebetween negative level of neuroticism and positive level ofopenness negative level of neuroticism and positive level ofagreeableness and positive level of openness and positivelevel of agreeableness than others as it is shown in Figure 2Therefore since users were grouped according to differentdegrees of userrsquos personality traits in order to investigatewhether the above phenomenon meant that there werepositive or negative correlations between different degrees

of personality traits in each group we treated scores ofa certain personality trait according to which users weredivided into the group and scores of another personalitytrait with a certain degree as two random variables followedby employing Kendall correlation coefficient which wascalculated in (3) to analyze real correlations between themThe results are presented in Table 1

As it can be seen from Table 1 we can observe anegative correlation between negative level of extraversionand positive level of neuroticism in other words people thatscore high on extraversion have less possibility to score highon neuroticism as well In psychology a person with negativelevel of extraversion can be explained as solitary and a personwith negative level of conscientiousnessmeans being carelessConsequently negative level of extraversion and negativelevel of conscientiousness are not in line with positive level ofneuroticism which tends to experience unpleasant emotionseasily Another factor is social desirability Moreover agree-ableness extraversion conscientiousness and openness aremore or less ldquodesirablerdquo whereas neuroticism is quite clearlynegative Thus generally there are negative correlationsbetween neuroticism and other personality traits

What is more it is inconsistent with Figure 2 that thereis a positive correlation between positive level of neuroticismand positive level of extraversion It may be explained thata person with positive level of extraversion tends to beenthusiastic and talkative which is compatible with positivelevel of neuroticism In addition in line with Figure 2since negative level of openness shows cautiousness it hasa positive correlation with negative level of extraversionand positive level of agreeableness which is a tendency tobe compassionate However there is a positive correlationbetween negative level of openness and positive level ofneuroticism which is not in keeping with Figure 2

Since there are some contradictions between Figure 2and Table 1 we leveraged Jensen-Shannon divergence [45]to further analyze correlations between userrsquos personalitytraits Jensen-Shannon divergence between two probabilitydistributions is calculated as

119863JS (119875 119876) =12(119863KL (119875 119877) +119863KL (119876 119877)) (14)

where 119875 and 119876 denote two probability distributions 119877denotes an average distribution of 119875 and 119876 and 119863KL(119875 119877)denotes Kullback-Leibler divergence [46] between 119875 and 119877which is calculated with

119863KL (119875 119877) = sum119894

119875 (119894) log 119875 (119894)119877 (119894)

(15)

where 119875(119894) and 119877(119894) denote the 119894th value of 119875 and 119877 Theresults are presented in Table 2

In keeping with Figure 2 and Table 1 Jensen-Shannondivergences between negative level of neuroticism and posi-tive level of agreeableness positive level of conscientiousnessand positive level of openness are larger It is because ofthe fact that a person with negative level of neuroticismshows confidence on everything and positive level of open-ness reflects degree of intellectual curiosity creativity and

Mathematical Problems in Engineering 9

Table 1 Kendall correlation coefficients between personality traits 119901 and 119899 stand for positive level and negative level of a certain personalitytrait The bold number is the highest Kendall correlation coefficients and the italic number is the lowest Kendall correlation coefficients Thebigger Kendall correlation coefficient is the better the consistency is

EXT NEU AGR CON OPN119901 119899 119901 119899 119901 119899 119901 119899 119901 119899

EXT 119901 022 minus003 009 minus015 minus006 minus013 007 001119899 minus023 minus018 004 004 004 018 007 022

NEU 119901 minus009 minus017 010 minus021 minus003 027119899 minus014 minus013 002 minus001 minus006 minus010

AGR 119901 minus007 007 016 029119899 minus008 016 006 002

CON 119901 minus001 minus010119899 007 minus007

OPN 119901

119899

PEXT and PNEUPEXT and PAGR

PEXT and PCONPEXT and POPNPEXT and NNEUPEXT and NAGRPEXT and NCONPEXT and NOPNPNEU and PAGR

PNEU and PCONPNEU and POPNPNEU and NEXTPNEU and NAGRPNEU and NCONPNEU and NOPNPAGR and PCONPAGR and POPNPAGR and NEXTPAGR and NNEUPAGR and NCONPAGR and NOPNPCON and POPNPCON and NEXTPCON and NNEUPCON and NAGRPCON and NOPNPOPN and NEXTPOPN and NNEUPOPN and NAGRPOPN and NCONNEXT and NNEUNEXT and NAGRNEXT and NCONNEXT and NOPNNNEU and NAGRNNEU and NCONNNEU and NOPNNAGR and NCONNAGR and NOPNNCON and NOPN

Proportion ()

000 500 1000 1500 2000 2500 3000 3500 4000 4500 5000()

Figure 2The proportion of users who have a certain pair of personality traits simultaneously PEXT PNEU PAGR PCON and POPN standfor positive level of extraversion neuroticism agreeableness conscientiousness and openness NEXT NNEU NAGR NCON and NOPNstand for negative level of extraversion neuroticism agreeableness conscientiousness and openness

10 Mathematical Problems in Engineering

Table 2 Jensen-Shannon divergence between personality traits 119901and 119899 stand for positive level and negative level of a certain personal-ity traitThe bold number is the highest Jensen-Shannon divergenceand the italic number is the lowest Jensen-Shannon divergenceSmaller Jensen-Shannon divergence means better consistency

EXT NEU AGR CON OPN119901 119899 119901 119899 119901 119899 119901 119899 119901 119899

EXT 119901 058 1479 044 219 070 236 073 084119899 479 449 521 236 561 191 1043 211

NEU 119901 080 212 110 246 298 040119899 1978 294 1697 305 2473 361

AGR 119901 077 329 082 091119899 340 086 463 125

CON 119901 115 117119899 616 140

OPN 119901

119899

a preference for novelty and variety a person has alongwith positive level of agreeableness and positive level ofconscientiousness which are compatible Furthermore it isin conformity with Table 1 that positive level of extraversionhas smaller Jensen-Shannon divergences with positive levelof neuroticism and positive level of agreeableness besidespositive level of neuroticism has a smaller Jensen-Shannondivergence with negative level of openness

In addition Soto et al [29] demonstrated convergent anddiscriminate correlations for the Big Five Inventory Soto andJohn [30] also illustrated that there was strong convergencebetween each BFI facet scale and corresponding NEO PI-Rfacet Moreover Park et al [28] demonstrated convergencewith self-reports of personality at the domain- and facet-levelIn summary although utilizing different Big Five measuresit can be found that correlations in myPersonality datasetare also in keeping with the findings from other samplesHence correlations between personality traits are expectedand actually prove that myPersonality results are validSo leveraging prior knowledge about relationships betweenpersonality traits may contribute to a better prediction onpersonality traits The results provide a better theoreticalsupport for our proposed method

43 Evaluation Metric Since most researchers exploringon personality recognition leveraged different measures toevaluate their experiments on various datasets it was hard tocompletely appraise their performance and quality Howevera common dataset which was annotated with gold standardpersonality labels was available in the Workshop on Com-putational Personality Recognition (Shared Task) users werefree to split the training and test sets as they wish andprecision recall and 1198651-measure were suggested to be usedto evaluate results of predictions So in this paper for morereliable results the proposed method was also evaluated withstratified 5-fold cross-validation Since there was no trainingset in unsupervised learning the common dataset was splitinto folds and in order to avoid overfitting the same author

Table 3 Precision recall and 1198651-measure of adaptive conditionalprobability-based predicting model (in bold best performance)

Precision Recall 1198651-measureEXT 093 076 083NEU 086 067 075AGR 096 074 083CON 089 070 078OPN 091 079 084Mean 091 0732 0806

did not appear twofold at the same timeThemean precisionrecall and 1198651-measure are calculated as follows

Precision = tptp + fp

Recall = tptp + fn

1198651-measure = 2 times 119875 times 119877119875 + 119877

(16)

where tp stands for the number of positive samples thatare classified correctly fp stands for the number of positivesamples that are classified incorrectly and fn stands for thenumber of negative samples that are classified incorrectly

44 Analysis of Impacts of Different Factors First we carriedon experiments with our proposed method Scores of preci-sion recall and 1198651-measure are shown in Table 3 includingmean scores of all personality traits

441 Impacts of Different Categories of Feature Sets In thissection we conducted experiments on social network fea-tures linguistic features emotional statistical features topicfeatures and all features respectively Due to space restric-tions Figure 3 only illustrates 1198651-measure of all personalitytraits along with mean 1198651-measure of them

It can be observed that experiments which are conductedon linguistic features emotional statistical features and topicfeatures result in unsatisfactory performance with respectto social network features and social network features havethe best classification performance for extraversion which isconsistent with the conclusion in [33] Since social networkfeatures reflect a userrsquos pattern of social status and habitswhich are relatively important decisive factors in userrsquospersonality traits as a result they provide more informationthan other types of features Nevertheless whether more orless each kind of characteristic provides unique informationin userrsquos personality traits prediction Therefore carryingon experiment with combination of social network featureslinguistic features emotional statistical features and topicfeatures can achieve better performance

442 Impact of Weighted Features Additionally we con-ducted experiments on unweighted features Here we onlypresented the most successful results in Table 4 includingmean scores of all personality traits

Mathematical Problems in Engineering 11

1

featuresEssential

featuresLinguistic

featuresEmotional

featuresTopic

featuresAll

Categories of features

EXTNEUAGR

CONOPNMEAN

09

08

07

06

05

04

03

F-s

core

s

Figure 3 1198651-measures of five personality traits along with mean1198651-measures of them with different categories of feature sets

Table 4 Precision recall and 1198651-measure of predictingmodel withunweighted features (in bold best performance)

Precision Recall 1198651-measureEXT 085 076 082NEU 092 068 076AGR 094 064 076CON 091 057 068OPN 087 078 083Mean 090 0686 077

Table 5 Precision recall and 1198651-measure of predictingmodel withunified thresholds (in bold best performance)

Precision Recall 1198651-measureEXT 062 054 057NEU 080 016 025AGR 057 073 063CON 057 077 060OPN 070 100 082Mean 0652 064 0574

From Tables 3 and 4 we can observe that the best resultsfor myPersonality dataset are achieved after distributingweights to features It illustrates that considering correlationsbetween features and userrsquos personality traits can bring per-formance gain to the prediction task since there are limitedfeatures that may remain useful for userrsquos personality traitsprediction

443 Impact of Dynamic Updated Thresholds of PersonalityTraits In addition we predicted userrsquos personality traits withunified thresholds as well The results are summarized inTable 5

Multilabel learning task is to predict one or more cat-egories for each instance In the existing algorithms suchas PT5 method proposed by Tsoumakas and Katakis [34]

Table 6 Precision of different methods on Facebook dataset Thebest performance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 093 086 096 089 091 0910Verhoevenet al [42] SVM 079 071 067 072 087 0752

Farnadi etal [33]

SVM kNNNB 058 054 050 055 060 0554

Alam et al[43]

SVM BLRmNB 058 059 059 059 060 0590

Tomlinsonet al [44] LR NA NA NA NA NA NA

generally thresholdswere unified to the same valueHoweverall categories employing a unified minimum threshold arenot appropriate If threshold is set too high not all categoriestags will be predicted on the other hand if threshold is settoo low too many classes will be predicted in results Hencein this paper we updated thresholds of personality traitsdynamically according to error rates of them Comparedto Table 3 based on thresholds obtained through updatingdynamically our method can achieve better results

45 Performance EvaluationwithDifferentMethods Asmen-tioned in Section 41 in the Workshop on ComputationalPersonality Recognition (Shared Task) different systems forpersonality recognition from text and network features ona common benchmark have been compared Verhoeven etal [42] proposed a meta-learning approach which can beextended to certain component classifiers from other genreswith other class systems or even from other languagesFarnadi et al [33] leveraged SVM 119896 nearest neighbors(kNN) [17] and Naıve Bayes (NB) for automatic recognitionof personality traits from usersrsquo Facebook statues updatesrespectively even with a small set of training dataset it couldachieve better results than most baseline algorithms On thebasis of a set of features extracted from Facebook datasetAlam et al [43] explored suitability and performance ofseveral classification techniques which were SVM BayesianLogistic Regression (BLR) [47] andMultinomial Naıve Bayes(mNB) [48] Tomlinson et al [44] used ranking algorithmsfor feature selection and Logistic Regression (LR) [49] aslearning algorithms which achieved a high performance Inthis paper we tested our proposed method with the sameset than the participants in the Workshop on ComputationalPersonality Recognition (Shared Task) In terms of precisionrecall and 1198651-measure a comparison of the works describedabove as well as our proposed method (denoted by CP) issummarized in Tables 6 7 and 8

From Tables 6 7 and 8 the experimental results indicatethat openness is the easiest personality trait to be predictedby Facebook features with the above methods because thereare more users with positive level of openness than otherpersonality traits in Facebook dataset Since Verhoeven et al[42] trained an ensemble SVM model based on both Face-book and Essays personality traits dataset as a consequence

12 Mathematical Problems in Engineering

Table 7 Recall of different methods on Facebook dataset The bestperformance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 076 067 074 070 079 0732Verhoevenet al [42] SVM 079 072 068 072 087 0756

Farnadi etal [33]

SVMkNN NB 061 053 050 054 070 0576

Alam et al[43]

SVMBLR mNB 058 058 059 059 060 0588

Tomlinsonet al [44] LR NA NA NA NA NA NA

Table 8 1198651-measure of different methods on Facebook datasetThebest performance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 083 075 083 078 084 0806Verhoevenet al [42] SVM 079 070 067 072 086 0748

Farnadi etal [33]

SVMkNN NB 056 052 050 054 061 0546

Alam et al[43]

SVMBLR mNB 058 058 058 059 060 0586

Tomlinsonet al [44] LR NA NA NA NA NA 0630

when tested on the same Facebook dataset it can achievebetter results than our proposed method Furthermorewhen trained and tested on the same Facebook dataset ourmethod outperforms the methods in [33 43 44] which donot take weighted features dynamic updated thresholds ofpersonality traits and correlations between personality traitsinto consideration

5 Conclusion

In this paper we studied the problem of exploiting interde-pendencies between userrsquos personality traits for predictinguserrsquos personality traits First after analyzing importance offeatures we conducted experiments on Facebook datasetto demonstrate the existence of correlations between userrsquospersonality traits Bearing this in mind an unsupervisedframework adaptive conditional probability-based predict-ing model was then proposed to predict userrsquos Big Fivepersonality traits based on importance of features dynamicupdated thresholds of personality traits and prior knowledgeabout correlations between personality traits Furthermorewe compared our results with the ones achieved by othersin the Workshop on Computational Personality Recognition(Shared Task) on the same dataset In general the experimen-tal results demonstrated the effectiveness of our proposedframework

In future work we will speculate on what directions canbe undertaken to ameliorate its performance with respect

to time complexity so as to better apply it to a big dataenvironment as in Facebook monitoring Besides in orderto make our framework applicable to dynamic networksbetter we will explore combining time series analysis withpersonality traits predicting algorithm to capture dynamicevolution process of information and network structureFurthermore since social network features (described inSection 311) are specific for our dataset they may be difficult(or even impossible) to obtain in another dataset hencewe will conduct experiments on other datasets such as theTweeter corpus of PAN shared task (httppanwebisde)on author profiling where also personality is considered tofurther validate the effectiveness of the proposed frameworkbased on different features

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the National Natural ScienceFoundation of China under Grant no 61300148 the Scientificand Technological Break-Through Program of Jilin ProvinceunderGrant no 20130206051GX the Science andTechnologyDevelopment Program of Jilin Province under Grant no20130522112JH the Science Foundation for China Postdoc-tor under Grant no 2012M510879 and the Basic Scien-tific Research Foundation for the Interdisciplinary Researchand Innovation Project of Jilin University under Grant no201103129

References

[1] M Oussalah F Bhat K Challis and T Schnier ldquoA softwarearchitecture for Twitter collection search and geolocationservicesrdquo Knowledge-Based Systems vol 37 pp 105ndash120 2013

[2] R R McCrae and P T Costa ldquoValidation of the 5-factor modelof personality across instruments and observersrdquo Journal ofPersonality and Social Psychology vol 52 no 1 pp 81ndash90 1987

[3] G Saucier and L R Goldberg ldquoThe structure of personalityattributesrdquo in Personality and Work Reconsidering the Role ofPersonality in Organizations M R Barrick and A M RyanEds pp 1ndash29 Jossey-Bass 2003

[4] B Caci M Cardaci M E Tabacchi and F Scrima ldquoPersonalityvariables as predictors of facebook usagerdquoPsychological Reportsvol 114 no 2 pp 528ndash539 2014

[5] M Kosinski Y Bachrach P Kohli D Stillwell and T GraepelldquoManifestations of user personality in website choice andbehaviour on online social networksrdquoMachine Learning vol 95no 3 pp 357ndash380 2014

[6] M-G Cojocaru H Thille E Thommes D Nelson and SGreenhalgh ldquoSocial influence and dynamic demand for newproductsrdquoEnvironmentalModellingamp Software vol 50 pp 169ndash185 2013

[7] F Durupinar N Pelechano J Allbeck U Gudukbay andN Badler ldquoThe impact of the OCEAN personality modelon the perception of crowdsrdquo IEEE Computer Graphics andApplications vol 31 no 3 pp 22ndash31 2011

Mathematical Problems in Engineering 13

[8] X Feng ldquoStudy on customer personality characteristics andrelationship outcomes using SEM analysisrdquo in Proceedings ofthe International Conference on Mechatronics and InformationTechnology pp 841ndash844 2013

[9] J G Lee SMoon andK Salamatian ldquoModeling and predictingthe popularity of online contents with Cox proportional hazardregression modelrdquo Neurocomputing vol 76 no 1 pp 134ndash1452012

[10] J W Stoughton L F Thompson and A W Meade ldquoBigfive personality traits reflected in job applicantsrsquo social mediapostingsrdquo Cyberpsychology Behavior and Social Networkingvol 16 no 11 pp 800ndash805 2013

[11] D A Cobb-Clark and S Schurer ldquoThe stability of big-fivepersonality traitsrdquo Economics Letters vol 115 no 1 pp 11ndash152012

[12] W Mischel ldquoToward an integrative science of the personrdquoAnnual Review of Psychology vol 55 pp 1ndash22 2004

[13] S Argamon S Dhawle M Koppel and J Pennebaker ldquoLexicalpredictors 19 of personality typerdquo in Proceedings of the JointAnnual Meeting of the Interface and the Classification Society ofNorth America 2005

[14] F Mairesse M AWalker M R Mehl and R K Moore ldquoUsinglinguistic cues for the automatic recognition of personality inconversation and textrdquo Journal of Artificial Intelligence Researchvol 30 pp 457ndash500 2007

[15] K Moore and J C McElroy ldquoThe influence of personalityon Facebook usage wall postings and regretrdquo Computers inHuman Behavior vol 28 no 1 pp 267ndash274 2012

[16] J W Pennebaker and L A King ldquoLinguistic styles languageuse as an individual differencerdquo Journal of Personality and SocialPsychology vol 77 no 6 pp 1296ndash1312 1999

[17] J Han and M Kamber Data Mining Concepts and TechniquesThe Morgan Kaufmann Series in Data Management Systemsedited by J Gray Morgan Kaufmann Boston Mass USA 3rdedition 2011

[18] J Golbeck C Robles and K Turner ldquoPredicting personalitywith social mediardquo in Proceedings of the 29th Annual CHIConference on Human Factors in Computing Systems (CHI rsquo11)pp 253ndash262 ACM May 2011

[19] J RQuinlan ldquoLearningwith continuous classesrdquo inProceedingsof the 5thAustralian Joint Conference onArtificial Intelligence (Airsquo92) pp 343ndash348 Hobart Australia November 1992

[20] S Roweis and Z Ghahramani ldquoA unifying review of lineargaussian modelsrdquo Neural Computation vol 11 no 2 pp 305ndash345 1999

[21] S Bai T Zhu and L Cheng ldquoBig-five personality pre-diction based on user behaviors at social network sitesrdquohttparxivorgpdf12044809v1pdf

[22] J Oberlander and S Nowson ldquoWhose thumb is it anywayclassifying author personality from weblog textrdquo in Proceedingsof the 44th Annual Meeting of the Association for ComputationalLinguistics pp 627ndash634 2006

[23] T Nguyen D Q Phung B Adams and S Venkatesh ldquoTowardsdiscovery of influence and personality traits through social linkpredictionrdquo in Proceedings of the International Conference onWeblogs and Social Media pp 566ndash569 Barcelona Spain July2011

[24] S T Bai B B Hao A Li S Yuan R Gao and T S ZhuldquoPredicting big five personality traits of microblog usersrdquo inProceedings of the 12th IEEEWICACM International JointConferences on Web Intelligence and Intelligent Agent Technolog(WI rsquo13) pp 501ndash508 November 2013

[25] R Sun and N Wilson ldquoA model of personality should be acognitive architecture itselfrdquo Cognitive Systems Research vol29-30 no 1 pp 1ndash30 2014

[26] A Ortigosa R M Carro and J I Quiroga ldquoPredicting userpersonality by mining social interactions in Facebookrdquo Journalof Computer and System Sciences vol 80 no 1 pp 57ndash71 2014

[27] O P John L P Naumann and C J Soto ldquoParadigm shift tothe integrative Big Five trait taxonomy history measurementand conceptualissuesrdquo in Handbook of Personality Theory andResearch O P John R W Robins and L A Pervin Eds pp114ndash158 Guilford Press New York NY USA 3rd edition 2008

[28] G Park H A Schwartz J C Eichstaedt et al ldquoAutomaticpersonality assessment through social media languagerdquo Journalof Personality and Social Psychology vol 108 no 6 pp 934ndash9522015

[29] C J Soto O P John S D Gosling and J Potter ldquoAge differencesin personality traits from 10 to 65 big five domains and facets ina large cross-sectional samplerdquo Journal of Personality and SocialPsychology vol 100 no 2 pp 330ndash348 2011

[30] C J Soto and O P John ldquoTen facet scales for the BigFive Inventory convergence with NEO PI-R facets self-peeragreement and discriminant validityrdquo Journal of Research inPersonality vol 43 no 1 pp 84ndash90 2009

[31] M V Kosti R Feldt and L Angelis ldquoPersonality emotionalintelligence and work preferences in software engineering anempirical studyrdquo Information and Software Technology vol 56no 8 pp 973ndash990 2014

[32] H A Schwartz J C Eichstaedt M L Kern et al ldquoPersonalitygender and age in the language of social media the open-vocabulary approachrdquo PLoS ONE vol 8 no 9 Article IDe73791 2013

[33] G Farnadi S ZoghbiM FMoens andMD Cock ldquoRecognis-ing personality traits using facebook status updatesrdquo in Proceed-ings of the Workshop on Computational Personality RecognitionJuly 2013 httpcliccimecunitnitfabiowcpr13farnadi wcpr-13pdf

[34] G Tsoumakas and I Katakis ldquoMulti-label classification anoverviewrdquo International Journal of Data Warehousing and Min-ing vol 3 no 3 pp 1ndash13 2007

[35] M Kosinski D Stillwell and T Graepel ldquoPrivate traits andattributes are predictable from digital records of human behav-iorrdquo Proceedings of the National Academy of Sciences of theUnited States of America vol 110 no 15 pp 5802ndash5805 2013

[36] M Belkin and P Niyogi ldquoLaplacian eigenmaps for dimension-ality reduction and data representationrdquo Neural Computationvol 15 no 6 pp 1373ndash1396 2003

[37] R Socher J Bauer C D Manning and Y N Andrew ldquoParsingwith compositional vector grammarsrdquo in Proceedings of the 51stAnnual Meeting of the Association for Computational Linguistics(ACL rsquo13) pp 455ndash465 Sofia Bulgaria August 2013

[38] K Kazama M Imada and K Kashiwagi ldquoCharacteristics ofinformation diffusion in blogs in relation to information sourcetyperdquo Neurocomputing vol 76 no 1 pp 84ndash92 2012

[39] M B David Y N Andrew and I J Michael ldquoLatent Dirichletallocationrdquo Journal ofMachine Learning Research vol 3 no 4-5pp 993ndash1022 2003

[40] P T Costa andR RMcCraeRevisedNEOPersonality Inventory(NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI) Profes-sional Manual Psychological Assessment Resources 1992

[41] R L Atkinson C A Richard E S Edward J B Daryland S Nolen-Hoeksema Hilgardrsquos Introduction to Psychology

14 Mathematical Problems in Engineering

Harcourt College Publishers Orlando Fla USA 13th edition2000

[42] B Verhoeven W Daelemans and T D Smedt ldquoEnsemblemethods for personality recognitionrdquo in Proceedings of theWorkshop on Computational Personality Recognition pp 35ndash38Boston Mass USA July 2013

[43] F Alam A Stepanov and G Riccardi ldquoPersonality traitsrecognition on social networkmdashfacebookrdquo in Proceedings of theWorkshop on Computational Personality Recognition pp 6ndash9Boston Mass USA July 2013

[44] M T Tomlinson D Hinote and D B Bracewell ldquoPredictingconscientiousness through semantic analysis of facebook postsrdquoin Proceedings of the Workshop on Computational PersonalityRecognition pp 31ndash34 July 2013

[45] D M Endres and J E Schindelin ldquoA newmetric for probabilitydistributionsrdquo IEEETransactions on InformationTheory vol 49no 7 pp 1858ndash1860 2003

[46] S Kullback and R A Leibler ldquoOn information and sufficiencyrdquoThe Annals of Mathematical Statistics vol 22 no 1 pp 79ndash861951

[47] A Genkin D D Lewis and D Madigan ldquoLarge-scale Bayesianlogistic regression for text categorizationrdquo Technometrics vol49 no 3 pp 291ndash304 2007

[48] A K McCallum and K Nigam ldquoA comparison of event modelsfor naive bayes text classificationrdquo in Proceedings of the AAAI-98 Workshop on Learning for Text Categorization pp 137ndash142Madison Wis USA July 1998

[49] I Ruczinski C Kooperberg and M LeBlanc ldquoLogic regres-sionrdquo Journal of Computational and Graphical Statistics vol 12no 3 pp 475ndash511 2003

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 2: Research Article A Novel Adaptive Conditional Probability ...downloads.hindawi.com/journals/mpe/2015/472917.pdf · Research Article A Novel Adaptive Conditional Probability-Based

2 Mathematical Problems in Engineering

Nevertheless confronted with the same problem as statedin [9] accessibility of userrsquos personality traits features mayvary Consequently userrsquos personality traits are by naturedifficult to predict Generally psychologists regarded person-ality traits which were reflected in userrsquos attitudes towardsthings and actions taken by user [10] as a personrsquos uniquepattern of long-term thoughts emotions and behaviors [1112] Therefore leveraging long-term mode of expression andemotion behind numerous contents that a user posts topredict hisher personality traits can be a feasible measureFurthermore users who have diverse personality traits tendto pay attention to different types of things in such ascenario we expect that topics a user is concerned aboutwould enhance the performance of userrsquos personality traitsprediction Based on the research train of thought in thispaper we propose a new adaptive conditional probability-based framework for userrsquos personality traits prediction Ourmain contributions are summarized next

(1) Demonstrate the existence of interdependenciesbetween userrsquos personality traits

(2) On account of social network features linguistic fea-tures emotional statistical features and topic featureswe put forward a novel unsupervised adaptive condi-tional probability-based framework for the problemof predicting userrsquos personality traits through takingprior knowledge of correlations between userrsquos per-sonality traits into consideration

(3) Exploit correlations between features and userrsquos per-sonality traits via Kendall correlation coefficient so asto quantify importance of each feature

(4) Update threshold of each personality trait dynami-cally rather than adopt a unified threshold

The rest of the paper is organized as follows Section 2describes the related work Section 3 defines the method wepropose details of the experimental results and dataset whichis used in this study are given in Section 4 finally conclusionappears in Section 5

2 Related Work

As research on userrsquos behaviors in social networks has becomea hot spot userrsquos personality recognition has received asignificant amount of attention in both theory and practiceArgamon et al [13] and Mairesse et al [14] were first dedi-cated to this research field Generally there were two mainapproaches adopted for studying userrsquos personality traits insocial networks Merely based on social network activitiesone was using machine learning algorithms to capture userrsquospersonality traits Moore and McElroy [15] explored userrsquospersonality traits through questionnaires and log data of 219college students and scale reliabilities for the five personalitydimensions of the IPIPwere acceptablewithCronbachrsquos alphavalues of 090 for extraversion 081 for agreeableness 082for conscientiousness 083 for emotional stability and 079for openness to experience which revealed that mining userrsquospersonality traits in Facebook was feasible Kosinski et al

[5] first presented that there were psychologically meaningfullinks between usersrsquo personalities their website preferencesand Facebook profile features and then predicted individ-ualrsquos personality traits via multivariate linear regression Theexperimental results indicated that extroversion was mosthighly expressed by Facebook features followed by neuroti-cism conscientiousness and openness Agreeableness wasthe hardest trait to predict (005 in terms of accuracy) usingFacebook profile features and the simple model

The other one extended personality-related features withlinguistic cues On the basis of the corpus which was derivedfrom essays written by students at the University of Texasat Austin [16] Argamon et al [13] utilized SMO [17] todetermine whether each author had high or low neuroticismor extraversion respectively For neuroticism their experi-mental results have shown clearly the usefulness of functionallexical features in particular the appraisal lexical taxonomywhile in the case of extraversion experimental results wereless clear but examination of indicative function wordspointed the way to developing more effective features byfocusing on expressions related to norms (in)completenessand (un)certainty Golbeck et al [18] explored whether thepublicly available information on usersrsquo Facebook profile canpredict personality traits In order to predict personality traitsof 279 Facebook users based on linguistic structural andsemantic features they used the profile data as a featureset and trained two machine learning algorithms m5suprsquoRules [19] and Gaussian Processes [20] to predict each ofthe five personality traits within 11 of their actual valueIn addition the experimental results have shown that userrsquosBig Five personality traits can be predicted from the publicinformation they shared on Facebook

Mairesse et al [14] leveraged classification regressionand ranking models to recognize personality traits via LIWCand MRC features automatically And experiments werecarried out with the essays corpus and the EAR corpusrespectively The results revealed that the LIWC featuresoutperformed the MRC features for every trait and theLIWC features on their own always performed slightly betterthan the full feature set Concerning the algorithms itcan be found that AdaboostM1 [17] performed the bestfor extraversion (563 correct classifications) while SMOproduced the best models for all other traits They alsopointed out that features were likely to vary dependingon the source of language and the method of assessmentof personality through analyzing the correlations betweendifferent characters In order to forecast RenRenrsquos 335 usersrsquopersonality traits according to the number of userrsquos friendsand a state recently released Bai et al [21] used manyclassification algorithms such as Naive Bayesian (NB) [17]Support VectorMachine (SVM) and Decision Tree [17] Andthey found out that C45 Decision Tree can get the bestresults (0697 for agreeableness 0749 for neuroticism 0824for conscientiousness 0838 for extraversion and 0811 foropenness in terms of 1198651-measure) Oberlander and Nowson[22] achieved better results (ranking on raw accuracy agree-ableness gt conscientiousness gt neuroticism gt extraversionthe best agreeableness accuracy was 304 absolute overthe baseline (772 relative)) on classification of personality

Mathematical Problems in Engineering 3

traits through leveraging Native Bayes model on account ofdiffering sets of n-gram features They also demonstratedthat with respect to feature selection policies automaticselection generally outperformed ldquomanualrdquo selection On thebasis of automatically derived psycholinguistic and mood-based features of a userrsquos textual messages Nguyen et al[23] utilized SVM classifier for examining two two-classclassification problems influential versus noninfluential andextraversion versus introversion They experimented withthree subcorpora of 10000 users each and presented the mosteffective predictors for each category The best classificationresult at 80 was achieved using psycholinguistic featuresHowever they did not predict personality traits in finer grainBai et al [24] proposed a multitask regression algorithm andan incremental regression algorithm to predict usersrsquo BigFive personality traits from their usages of Sina microblogobjectively The results indicated that personality traits canbe predicted in a high accuracy through online microblogusages Besides the average mean absolute error of multitaskregression model was 1384 which got about 5 percentagepoints reduction compared to incremental regression Sunand Wilson [25] demonstrated that without any significantaddition or modification a cognitive architecture can serveas a generic model of personality traits Besides integratingpersonality modeling with generic computational cognitivemodeling was shown to be feasible and useful

However some drawbacks can be pointed out in pre-vious work on userrsquos personality traits prediction (1) someresearchers have made an assumption that there had beenlittle or no correlations between userrsquos personality traits[26] however the massive approaches have been exploredto examine personality psychology and have revealed thecorrelations between the Big Five dimensions instead of littleor no correlations [27ndash30] (2) Although different featuresplayed different roles in predicting userrsquos personality traits[18 31 32] only a few researchers considered the correlationsbetween features and personality traits [33] (3) In multilabellearning task such as PT5 method proposed by Tsoumakasand Katakis [34] thresholds were usually unified to the samevalue which was not appropriate In this light we propose anadaptive conditional probability-based model to improve theperformance of userrsquos personality traits prediction task

3 Adaptive Conditional Probability-BasedPredicting Model for Userrsquos PersonalityTraits

In this section we present predicting model adopted inour work Initially we make a definition of preliminary fea-tures (Section 31) followed by measurement of distributingweights to characteristics (Section 32) And then we outlineframework of adaptive conditional probability-based userrsquospersonality traits prediction model (Section 33) Finallyin Section 34 we depict our algorithm and analyze timecomplexity of our proposed model

31 Definition of Features As a personrsquos unique pattern oflong-term thoughts emotions and behaviors personality

traits are reflected in userrsquos attitudes towards things andactions taken by user Therefore aside from social networkfeatures which were provided in Facebook dataset [35]directly we introduced linguistic features emotional statis-tical features and topic features to predict userrsquos personalitytraits Linguistic features emotional statistical features andtopic features can bemeasured via analysis of userrsquos Facebookstatus updates

311 Social Network Features Facebook dataset includesseven social network features namely date of userrsquos registernetwork size ego betweenness centrality normalized egobetweenness centrality density brokerage normalized bro-kerage and transitivity which reflect userrsquos behavior patternsjust through userrsquos network structure Similar to the clusterassumption [36] in this work we took the above features intoconsideration to complete prediction task and assumed thatthe more similar were the network structures between twousers the more likely they shared the same label

312 Linguistic Features Since each user showed a particularmode of expression some researchers held the view thatcorrelations between personality traits and spoken or writtenlinguistic cues were significant [14 16] so language-basedassessments can constitute valid personality measures [28]Hence in this paper we regarded linguistic cues as factors soas to mine userrsquos personality traits through userrsquos means ofexpression

A natural language parser is used to work out gram-matical structure of sentences such as grouping wordstogether as ldquophrasesrdquo and obtaining subject or object of averb Probabilistic parsers try to produce the most likelyanalysis of new sentences via leveraging knowledge of lan-guage gained from hand-parsed sentences Stanford Parser(httpnlpstanfordedusoftwarelex-parsershtmlAbout) isa probabilistic natural language open source parser whichimplements a factored product model with separate PCFGphrase structure and lexical dependency experts whosepreferences are combined by efficient exact inference usingan Alowast algorithm Either of these yields a good performancestatistical parsing system [37] A GUI is provided for utilizingit simply as an accurate unlexicalized stochastic context-freegrammar parser and viewing the phrase structure tree outputof the parser Thus in order to learn traits of contents thatuser yields we got word frequency statistics of 35 kinds ofparts of speech with Stanford Parser that were conjunctionnumeral determiner existential there foreign word modalauxiliary singular or mass noun plural noun proper nounplural proper noun predeterminer genitive marker personalpronoun plural personal pronoun ordinal adverb compara-tive adverb superlative adverb particle symbol interjectionverb in base form verb in past tense gerund verb in pastparticiple verb in present tense (not 3rd person singular)verb in present tense (3rd person singular) subordinat-ing conjunction ordinal adjective comparative adjectivesuperlative adjective list itemmarkerWH-determinerWH-pronoun WH-plural pronoun and WH-adverb Besideswe defined another six linguistic features the total numberof words and frequency statistics of punctuation comma

4 Mathematical Problems in Engineering

period exclamation and question Nevertheless we filteredhyperlinks in usersrsquo contents as blog services tended to pro-duce many hyperlinks for navigation and advertisement [38]which had no relationship with personality traits prediction

313 Emotional Statistical Features Userrsquos attitudes towardsthings which show userrsquos different personality traits reflectuserrsquos unique pattern of long-term emotions For instancea neurotic person may have a tendency to experienceunpleasant emotions easily such as anger anxiety depressionand vulnerability Hence userrsquos statistics of emotion can becharacteristics in userrsquos personality traits predicting modelIn this paper userrsquos emotional statistical characteristicsincluded proportion of positive words and negative wordsused in userrsquos posts On the basis of adjectives and theirvariants obtained in Section 312 userrsquos emotional statisticalcharacteristics were calculated with the corpus of HowNetKnowledge (httpwwwkeenagecomdownloadsentimentrar) HowNet Knowledge which includes 8945 words andphrases consists of six files positive emotional words list filenegative emotional words list file positive review words listfile negative review words list file degree level words list fileand proposition words list file

As userrsquos emotional statistical features userrsquos positive andnegative emotional statistical characteristics are defined as

PT (119894) =119901119899119894

sum119894

NT (119894) =119899119899119894

sum119894

(1)

where PT(119894) andNT(119894) represent proportion of positivewordsand negative words used in user 119894rsquos posts respectively119901119899

119894and

119899119899119894represent the number of positive emotional words and

the number of negative emotional words user 119894 used whichare included in HowNet Knowledge and sum

119894represents the

total number of words in user 119894rsquos contents

314 Topic Features The things user focuses on may have animpact on actions that user has taken Take openness which isone of the Big Five personality traits as an example it reflectsdegree of intellectual curiosity creativity and a preference fornovelty and variety a person hasTherefore wemined a seriesof userrsquos concerned themes from userrsquos status via LDA (LatentDirichlet Allocation) [39] for predicting userrsquos personalitytraits

Since our purpose is to extract all concerned themes of auser rather than to extract specific themes of each post wemerged all microblogs of a user into one document and thenextracted userrsquos concerned themes namely each documentcorresponded to a user The results of LDA model are shownas follows

DT =[[[[[[

[

DT11 DT12 sdot sdot sdot DT1119899

DT21 DT21 sdot sdot sdot DT2119899

d

DT1198981 DT

1198982 sdot sdot sdot DT119898119899

]]]]]]

]

(2)

whereDT stands for an119898times119899matrix which ismainly used forstorage of distribution of themes and documents 119898 standsfor the number of documents 119899 stands for the number ofthemes and element DT

119894119895(119894 = 1 2 119898 and 119895 = 1 2 119899)

in matrix DT stands for probability that the 119894th documentbelongs to the 119895th theme that is degree of attention whichuser 119894 pays to the 119895th theme

32 Weight Distribution of Features Every feature has adifferent impact on userrsquos personality traits prediction It isof great importance to allocate featuresrsquo weights reasonablyso as to be able to perform good prediction based onlyon scant knowledge of personality traits Kendall test is anonparametric hypothesis test which calculates correlationcoefficient to test statistical dependence of two randomvariables Since values of features and scores of personalitytraits in the dataset we used were numeric therefore inorder to quantify importance of each feature we analyzedrelevance between userrsquos personality traits and characteristicsvia Kendall correlation coefficient in which values of featuresand scores of personality traits were treated as two randomvariables Kendall correlation coefficient is calculated as

120591 (119883 119884) =119862 minus 119863

radic(1198733 minus 1198731) times (1198733 minus 1198732) (3)

where 120591(119883 119884) isin [minus1 1] and if random variables 119883 and 119884have a positive correlation then 120591(119883 119884) = 1 else if randomvariables119883 and 119884 have a negative correlation then 120591(119883 119884) =minus1 else if random variables119883 and 119884 are independent of eachother then 120591(119883 119884) = 0 119862 represents the number of tupleswhose two random variables have consistency 119863 representsthe number of tupleswhose two randomvariables donot haveconsistency and 119873

1and 119873

2represent the total number of

repeated elements in random variables119883 and 119884 respectivelyThe calculation of1198731 is shown as

1198731 =119904

sum

119894=1

12119880119894(119880119894minus 1) (4)

where 119904 denotes the number of elements that have repeatedelements in random variable 119883 and 119880

119894denotes the number

of repeated elements of the 119894th element Similarly 1198732is

calculated as

1198732 =1199041015840

sum

119895=1

12119880119895(119880119895minus 1) (5)

where 1199041015840 denotes the number of elements that have repeatedelements in random variable 119884 and 119880

119895denotes the number

of repeated elements of the 119895th element1198733denotes the total

number of merged sequences which is calculated as

1198733 =12119873(119873minus 1) (6)

Mathematical Problems in Engineering 5

Facebook

Social networkfeatures set

Userrsquosstatus set

Analyze correlations

betweenfeatures

and personality

traits

Linguistic features set

Emotional statistical features set

Topic features set

Assign weights

tofeatures

neighbors set

Analyzecorrelations

betweenpersonality

traits

Initializethresholds of personality

traits

Userrsquos personality traits set

Updatethresholds of personality traits

Calculateerror rates of personality traits

Predict userrsquos personality traits set

Minimum error rates

No

Yes

Get k nearest

Figure 1 The architecture of adaptive conditional probability-based predicting model for userrsquos personality traits

where119873 represents dimensions of tuples Thus according toKendall correlation coefficient between features and person-ality traits importance of the 119894th feature 119891

119894is calculated as

119868 (119891119894) =

sum119901119895isin119875120591 (119891119894 119901119895)

5

(7)

where 120591(119891119894 119901119895) stands for Kendall correlation coefficient

between the 119894th feature 119891119894and the 119895th personality trait 119901

119895and

119875 stands for set of Big Five personality traits which will beintroduced in Section 34 After normalizing importance ofthe 119894th feature 119891

119894 weight of 119891

119894is calculated as follows

119882(119891119894) =

119868 (119891119894)

sum119891119895isin119865119895 =119894

119868 (119891119895)

(8)

where 119865 denotes set of personality traits predicting features

33 A Framework of Predicting Userrsquos Personality TraitsFigure 1 shows the architecture of adaptive conditionalprobability-based predicting model for userrsquos personalitytraits An analysis of correlations between userrsquos personalitytraits and features is conducted before assigning weights tolinguistic emotional statistical and topic features which are

extracted from contents that user produces as well as socialnetwork features Meanwhile correlation analysis of userrsquospersonality traits is carried on Then according to weightsof features 119896 nearest neighbors set is obtained Finallygiven 119896 nearest neighbors set dynamic updated thresholds ofpersonality traits and correlations between personality traitsuserrsquos personality traits set is predicted

34 Algorithm of Adaptive Conditional Probability-BasedUserrsquos Personality Traits Prediction Before we proposed ourpredicting algorithm we conducted experiments to analyzecorrelations between userrsquos personality traits which will beshown in detail later in Section 42 And the results ofcorrelation analysis revealed that there were interdependentrelationships between userrsquos personality traits Thus givendynamic updated thresholds of personality traits as wellas considering interdependencies between personality traitsa novel adaptive correlation-based predicting model wasproposed

In psychology the Big Five personality traits are fivebroad domains or dimensions of personality traits that areused to describe human personality The theory based onthe Big Five factors is called Five Factor Model (FFM) [40]The Big Five factors are extraversion (denoted by EXT)

6 Mathematical Problems in Engineering

neuroticism (denoted by NEU) agreeableness (denoted byAGR) conscientiousness (denoted by CON) and openness(denoted by OPN)

Our method aimed to predict user 119894rsquos Big Five personalitytraits set 119875119879

119894 As stated in [41] here each personality trait

was classified into two degrees namely extraversion wasmapped onto extravert and shy neuroticism was mappedonto neurotic and secure agreeableness was mapped ontofriendly and uncooperative conscientiousness was mappedonto precise and careless and openness was mapped ontoinsightful and unimaginative For convenience we denotedthe two degrees of each personality trait by positive leveland negative level in short respectively First we set initialthreshold of each personality trait as 05 and threshold ofthe 120579th personality trait 119901

120579is denoted by 119879(119901

120579) followed by

calculating distance between user 119894 and other users with (9)as follows

Dis (119894 119895) =119898

sum

ℎ=1119882(119891ℎ) times1003816100381610038161003816119865 (119894 119891ℎ) minus 119865 (119895 119891ℎ)

1003816100381610038161003816 (9)

where Dis(119894 119895) presents distance between user 119894 and user 119895119865(119894 119891

ℎ) and 119865(119895 119891

ℎ) present the ℎth feature 119891

ℎrsquos value of user

119894 and user 119895 and119898 presents the number of featuresSecondly we sorted user 119894rsquos distance set in ascending

order and selected top 119896 users as user 119894rsquos neighbors (denotedas119873(119894)) Then for each 119901

120579 we recorded the number of users

who had positive level of 119901120579in119873(119894) as 119899

119901120579

Thirdly we selected a 119901

120579from unvisited personality traits

set UP if user 119894rsquos personality traits set PT119894was empty and then

we calculated probability that user 119894 had positive level of 119901120579as

follows

119891 (119894 119901120579) =

119875 (119867119901120579

| 119862119899119901120579

119901120579

)

119875 (sim 119867119901120579

| 119862119899119901120579

119901120579

)

(10)

where119862119899119901120579

119901120579

presents event that there are exactly119862119899119901120579

119901120579

instanceshaving positive level of 119901

120579in119873(119894) whose 119896 nearest neighbors

also have 119899119901120579

users with positive level of 119901120579 119867119901120579

and sim 119867119901120579

present event that user 119894 has positive level of 119901120579and event that

user 119894 has negative level of 119901120579 respectively And if user 119894rsquos per-

sonality traits set PT119894was not empty considering correlations

between personality traits we calculated probability that user119894 had positive level of 119901

120579on the basis of prior knowledge

about correlations between119901120579and personality traits that have

already been visited in UP as follows

119891 (119894 119901120579) =

119875 (119867119901120579

| 1199011198971 1199011198972 ) 119875 (119862

119899119901120579

119901120579

| 119867119901120579

)

119875 (sim 119867119901120579

| 1199011198971 1199011198972 ) 119875 (119862

119899119901120579

119901120579

| sim 119867119901120579

)

(11)

where 1199011198971 1199011198972 isin PT

119894present personality traits which have

already been visited in UP If 119891(119894 119901120579) was greater than or

equal to 119879(119901120579) then positive level of 119901

120579was added to PT

119894

else negative level of 119901120579was added to PT

119894

And then we calculated error rate of each personalitytrait and error rate of 119901

120579is denoted by Err(119901

120579) which is

calculated as

Err (119901120579) = 1minus

119903120579

to (12)

where 119903120579denotes the number of instances which are classified

correctly to denote the total number of instancesFinally if unvisited personality traits set UP was empty

and error rate of each personality trait fell within a specifiedrange then algorithm was terminated else if unvisited per-sonality traits set UP was not empty then we continued topredict another personality trait in UP else if there was apersonality trait 119901

120579whose error rate did not reach defined

limits then we updated 119879(119901120579) with (12) and added it to

personality traits set UP to predict it again

119879 (119901120579) = 119879 (119901

120579) + 120576 timesErr (119901

120579) (13)

where 120576 denotes a monotonic decreasing learning rate Ouralgorithm can now be defined as Algorithm 1 Assumethat size of dataset is 119879119903 dimension of features is 119889 andsize of the nearest neighbors set is 119896 The complexity ofconditional probability-based predicting model for userrsquospersonality traits is analyzed as follows From step (10) tostep (12) computation time of Kendall correlation coefficientbetween features and personality traits is taking119874(1198791199032) timecalculating weights of features takes 119874(119889) time and thendistributing weights to features can be computed in 119874(119889119879119903)time the total time is119874(1198791199032) as dimension of features 119889 is farless than size of dataset119879119903 Calculating distance between user119894 and other users from step (13) to step (15) will take 119874(119889119879119903)time In step (16) it will take 119874(119879119903log2119879119903) time for sortingset of distances between user 119894 and other users Step (18) tostep (20) take 119874(119896) time to select 119896 nearest neighbors of user119894 If algorithm goes from step (23) to step (25) it will take119874(1198962

) time Else if algorithm goes from step (26) to step (28)it will take 119874(1198791199031198962) time It takes 119874(119879119903) time from step (37)to step (45) Assume that the number of iterations is 119905 Sincesteps (10) (11) and (12) can be done offline and do not needto be calculated repeatedly every time hence from step (13)to step (52) the overall online complexity of our proposedmethod is 119874(119879119903log

2119879119903) as 119896 and 119905 are far less than 119879119903 From

here we see that our proposed method is feasible in a big dataenvironment as in Facebook monitoring

4 Experimental Evaluation

In this section we first describe dataset used in our experi-ments And then we analyze the interdependent relationshipsbetween userrsquos personality traits Finally we conduct experi-ments on different kinds of features and make a comparisonwith other methods based on the same dataset

41 Dataset MyPersonality (httpmypersonalityorg) is apopular Facebook application which is a 336-question testmeasuring the 30 facets underlying the Big Five traits thatallows users to take real psychometric tests The respondentswho come from various age groups backgrounds and cul-tures are highly motivated to answer honestly and carefullyAdditionally usersrsquo psychological and Facebook profilesare ldquoverifiedrdquo by individualsrsquo social circles hence it is hardto lie to the people that know you best Also there is nopressure and one does not have to share hisher profileinformation thus myPersonality avoids deliberate faking

Mathematical Problems in Engineering 7

Input user 119894 users set 119880 size of the nearest neighbors set 119896 unvisited personality traits set UPOutput user 119894rsquos personality traits set PT

119894

(1) 119891 larr 119891119886119897119904119890

lowast 119891 stands for a flag if any of the thresholds of personality traits is updated then 119891 equals to trueotherwise 119891 equals to false lowast

(2) For each personality trait 119901120579isin UP do

(3) set temporary error rate Temp(119901120579) as 0

(4) End for(5) For each personality trait 119901

120579isin UP do

(6) set initial threshold 119879(119901120579) as 05

(7) End for(8) PT

119894larr 0

(9) UPlarr all of the Big Five personality traits(10) For each feature 119891

ℎisin user 119894rsquos feature set do

(11) calculate 119891ℎrsquos weight according to (8)

(12) End for(13) For user 119895 isin 119880 and 119895 = 119894 do(14) calculate Dis(119894 119895) according to (9)(15) End for(16) sort Dis(119894 1)Dis(119894 2) in ascending order(17) 119873(119894) larr select top 119896 users in Dis(119894 1)Dis(119894 2) (18) For each personality trait 119901

120579isin UP do

(19) 119899119901120579

larr the number of users who have positive level of personality trait 119901120579in119873(119894)

(20) End for(21) While UP = 0 do(22) 119901

120579larr UPpoll()

(23) If PT119894= 0 then

(24) calculate 119891(119894 119901120579) according to (10)

(25) End if(26) Else if PT

119894= 0 then

(27) calculate 119891(119894 119901120579) according to (11)

(28) End if(29) If 119891(119894 119901

120579) ge 119879(119901

120579) then

(30) 119875119879119894larr positive level of 119901

120579

(31) End if(32) Else if 119891(119894 119901

120579) lt 119879(119901

120579) then

(33) PT119894larrnegative level of 119901

120579

(34) End if(35) End while(36) 119891 larr 119891119886119897119904119890

(37) For each personality trait 119901120579isin PT119894do

(38) calculate 119901120579rsquos error rate Err(119901

120579) according to (12)

(39) If |Err(119901120579) minus Temp(119901

120579)| gt 120575 then 120575 denotes a constant which is small enough

(40) 119891 larr 119905119903119906119890

(41) Temp(119901120579) larr Err(119901

120579)

(42) update 119901120579rsquos threshold 119879(119901

120579) according to (13)

(43) UPlarr 119901120579

(44) End if(45) End for(46) If 119891 == 119905119903119906119890 then(47) Go to step (21)(48) End if(49) Else if 119891 == 119891119886119897119904119890 then(50) Go to step (52)(51) End if(52) Return PT

119894

Algorithm 1 Adaptive conditional probability-based predicting model for userrsquos personality traits

8 Mathematical Problems in Engineering

of data With consent usersrsquo psychological and Facebookprofiles are recorded for research purposes Currently thedatabase contains more than 6000000 test results togetherwith more than 4000000 individual Facebook profilesIn this paper we adopted the dataset that was provided inthe Workshop on Computational Personality Recognition(Shared Task) (httpmypersonalityorgwikilibexefetchphpmedia=wikimypersonality finalzip) [35] The datasetwas a subset of myPersonality database including 250users which had both information about personality traits(annotated with usersrsquo personality scores and gold standardpersonality labelswhichwere self-assessments obtained usinga 100-item long version of the IPIP personality questionnaire(httpipiporiorgnewFinding Labeling IPIP Scaleshtm))and social network structure (network size betweennesscentrality density brokerage and transitivity) along withtheir 9900 status updates in raw text With the aid of thisstandard dataset we can compare the performance of ourproposed method with othersrsquo personality recognitionsystems on a common benchmark

42 Analysis of Correlations between Personality Traits Pre-vious works were usually predicting userrsquos personality traitswithout considering interdependencies between them In thiscontext we investigatedwhether there had been relationshipsbetween userrsquos personality traits Since not all users had thesame level of interactions consequently we first groupedusers according to different degrees of userrsquos personalitytraits As an example if a user has positive level of agree-ableness another user has positive level of agreeableness aswell then they will be divided into a group else if a user haspositive level of agreeableness another user has negative levelof agreeableness and they will not be divided into a groupThen we explored cooccurrences between personality traitsin each group simultaneously which is shown in Figure 2

It can be observed intuitively that a significant proportionof users have negative level of extraversion positive levelof openness or positive level of agreeableness It is alsonoteworthy that positive level of extraversion has less overlapwith negative level of openness negative level of consci-entiousness and negative level of agreeableness Besidespositive level of neuroticism has less overlap with positivelevel of conscientiousness positive level of agreeableness andnegative level of openness Furthermore there are bits ofusers that have positive level of extraversion and positive levelof neuroticism simultaneously

However the above cooccurrences may be due to the apriori statistics of each trait For instance if there are moreusers with negative level of neuroticism positive level ofopenness and positive level of agreeableness than othersit is more likely to have more users with cooccurrencebetween negative level of neuroticism and positive level ofopenness negative level of neuroticism and positive level ofagreeableness and positive level of openness and positivelevel of agreeableness than others as it is shown in Figure 2Therefore since users were grouped according to differentdegrees of userrsquos personality traits in order to investigatewhether the above phenomenon meant that there werepositive or negative correlations between different degrees

of personality traits in each group we treated scores ofa certain personality trait according to which users weredivided into the group and scores of another personalitytrait with a certain degree as two random variables followedby employing Kendall correlation coefficient which wascalculated in (3) to analyze real correlations between themThe results are presented in Table 1

As it can be seen from Table 1 we can observe anegative correlation between negative level of extraversionand positive level of neuroticism in other words people thatscore high on extraversion have less possibility to score highon neuroticism as well In psychology a person with negativelevel of extraversion can be explained as solitary and a personwith negative level of conscientiousnessmeans being carelessConsequently negative level of extraversion and negativelevel of conscientiousness are not in line with positive level ofneuroticism which tends to experience unpleasant emotionseasily Another factor is social desirability Moreover agree-ableness extraversion conscientiousness and openness aremore or less ldquodesirablerdquo whereas neuroticism is quite clearlynegative Thus generally there are negative correlationsbetween neuroticism and other personality traits

What is more it is inconsistent with Figure 2 that thereis a positive correlation between positive level of neuroticismand positive level of extraversion It may be explained thata person with positive level of extraversion tends to beenthusiastic and talkative which is compatible with positivelevel of neuroticism In addition in line with Figure 2since negative level of openness shows cautiousness it hasa positive correlation with negative level of extraversionand positive level of agreeableness which is a tendency tobe compassionate However there is a positive correlationbetween negative level of openness and positive level ofneuroticism which is not in keeping with Figure 2

Since there are some contradictions between Figure 2and Table 1 we leveraged Jensen-Shannon divergence [45]to further analyze correlations between userrsquos personalitytraits Jensen-Shannon divergence between two probabilitydistributions is calculated as

119863JS (119875 119876) =12(119863KL (119875 119877) +119863KL (119876 119877)) (14)

where 119875 and 119876 denote two probability distributions 119877denotes an average distribution of 119875 and 119876 and 119863KL(119875 119877)denotes Kullback-Leibler divergence [46] between 119875 and 119877which is calculated with

119863KL (119875 119877) = sum119894

119875 (119894) log 119875 (119894)119877 (119894)

(15)

where 119875(119894) and 119877(119894) denote the 119894th value of 119875 and 119877 Theresults are presented in Table 2

In keeping with Figure 2 and Table 1 Jensen-Shannondivergences between negative level of neuroticism and posi-tive level of agreeableness positive level of conscientiousnessand positive level of openness are larger It is because ofthe fact that a person with negative level of neuroticismshows confidence on everything and positive level of open-ness reflects degree of intellectual curiosity creativity and

Mathematical Problems in Engineering 9

Table 1 Kendall correlation coefficients between personality traits 119901 and 119899 stand for positive level and negative level of a certain personalitytrait The bold number is the highest Kendall correlation coefficients and the italic number is the lowest Kendall correlation coefficients Thebigger Kendall correlation coefficient is the better the consistency is

EXT NEU AGR CON OPN119901 119899 119901 119899 119901 119899 119901 119899 119901 119899

EXT 119901 022 minus003 009 minus015 minus006 minus013 007 001119899 minus023 minus018 004 004 004 018 007 022

NEU 119901 minus009 minus017 010 minus021 minus003 027119899 minus014 minus013 002 minus001 minus006 minus010

AGR 119901 minus007 007 016 029119899 minus008 016 006 002

CON 119901 minus001 minus010119899 007 minus007

OPN 119901

119899

PEXT and PNEUPEXT and PAGR

PEXT and PCONPEXT and POPNPEXT and NNEUPEXT and NAGRPEXT and NCONPEXT and NOPNPNEU and PAGR

PNEU and PCONPNEU and POPNPNEU and NEXTPNEU and NAGRPNEU and NCONPNEU and NOPNPAGR and PCONPAGR and POPNPAGR and NEXTPAGR and NNEUPAGR and NCONPAGR and NOPNPCON and POPNPCON and NEXTPCON and NNEUPCON and NAGRPCON and NOPNPOPN and NEXTPOPN and NNEUPOPN and NAGRPOPN and NCONNEXT and NNEUNEXT and NAGRNEXT and NCONNEXT and NOPNNNEU and NAGRNNEU and NCONNNEU and NOPNNAGR and NCONNAGR and NOPNNCON and NOPN

Proportion ()

000 500 1000 1500 2000 2500 3000 3500 4000 4500 5000()

Figure 2The proportion of users who have a certain pair of personality traits simultaneously PEXT PNEU PAGR PCON and POPN standfor positive level of extraversion neuroticism agreeableness conscientiousness and openness NEXT NNEU NAGR NCON and NOPNstand for negative level of extraversion neuroticism agreeableness conscientiousness and openness

10 Mathematical Problems in Engineering

Table 2 Jensen-Shannon divergence between personality traits 119901and 119899 stand for positive level and negative level of a certain personal-ity traitThe bold number is the highest Jensen-Shannon divergenceand the italic number is the lowest Jensen-Shannon divergenceSmaller Jensen-Shannon divergence means better consistency

EXT NEU AGR CON OPN119901 119899 119901 119899 119901 119899 119901 119899 119901 119899

EXT 119901 058 1479 044 219 070 236 073 084119899 479 449 521 236 561 191 1043 211

NEU 119901 080 212 110 246 298 040119899 1978 294 1697 305 2473 361

AGR 119901 077 329 082 091119899 340 086 463 125

CON 119901 115 117119899 616 140

OPN 119901

119899

a preference for novelty and variety a person has alongwith positive level of agreeableness and positive level ofconscientiousness which are compatible Furthermore it isin conformity with Table 1 that positive level of extraversionhas smaller Jensen-Shannon divergences with positive levelof neuroticism and positive level of agreeableness besidespositive level of neuroticism has a smaller Jensen-Shannondivergence with negative level of openness

In addition Soto et al [29] demonstrated convergent anddiscriminate correlations for the Big Five Inventory Soto andJohn [30] also illustrated that there was strong convergencebetween each BFI facet scale and corresponding NEO PI-Rfacet Moreover Park et al [28] demonstrated convergencewith self-reports of personality at the domain- and facet-levelIn summary although utilizing different Big Five measuresit can be found that correlations in myPersonality datasetare also in keeping with the findings from other samplesHence correlations between personality traits are expectedand actually prove that myPersonality results are validSo leveraging prior knowledge about relationships betweenpersonality traits may contribute to a better prediction onpersonality traits The results provide a better theoreticalsupport for our proposed method

43 Evaluation Metric Since most researchers exploringon personality recognition leveraged different measures toevaluate their experiments on various datasets it was hard tocompletely appraise their performance and quality Howevera common dataset which was annotated with gold standardpersonality labels was available in the Workshop on Com-putational Personality Recognition (Shared Task) users werefree to split the training and test sets as they wish andprecision recall and 1198651-measure were suggested to be usedto evaluate results of predictions So in this paper for morereliable results the proposed method was also evaluated withstratified 5-fold cross-validation Since there was no trainingset in unsupervised learning the common dataset was splitinto folds and in order to avoid overfitting the same author

Table 3 Precision recall and 1198651-measure of adaptive conditionalprobability-based predicting model (in bold best performance)

Precision Recall 1198651-measureEXT 093 076 083NEU 086 067 075AGR 096 074 083CON 089 070 078OPN 091 079 084Mean 091 0732 0806

did not appear twofold at the same timeThemean precisionrecall and 1198651-measure are calculated as follows

Precision = tptp + fp

Recall = tptp + fn

1198651-measure = 2 times 119875 times 119877119875 + 119877

(16)

where tp stands for the number of positive samples thatare classified correctly fp stands for the number of positivesamples that are classified incorrectly and fn stands for thenumber of negative samples that are classified incorrectly

44 Analysis of Impacts of Different Factors First we carriedon experiments with our proposed method Scores of preci-sion recall and 1198651-measure are shown in Table 3 includingmean scores of all personality traits

441 Impacts of Different Categories of Feature Sets In thissection we conducted experiments on social network fea-tures linguistic features emotional statistical features topicfeatures and all features respectively Due to space restric-tions Figure 3 only illustrates 1198651-measure of all personalitytraits along with mean 1198651-measure of them

It can be observed that experiments which are conductedon linguistic features emotional statistical features and topicfeatures result in unsatisfactory performance with respectto social network features and social network features havethe best classification performance for extraversion which isconsistent with the conclusion in [33] Since social networkfeatures reflect a userrsquos pattern of social status and habitswhich are relatively important decisive factors in userrsquospersonality traits as a result they provide more informationthan other types of features Nevertheless whether more orless each kind of characteristic provides unique informationin userrsquos personality traits prediction Therefore carryingon experiment with combination of social network featureslinguistic features emotional statistical features and topicfeatures can achieve better performance

442 Impact of Weighted Features Additionally we con-ducted experiments on unweighted features Here we onlypresented the most successful results in Table 4 includingmean scores of all personality traits

Mathematical Problems in Engineering 11

1

featuresEssential

featuresLinguistic

featuresEmotional

featuresTopic

featuresAll

Categories of features

EXTNEUAGR

CONOPNMEAN

09

08

07

06

05

04

03

F-s

core

s

Figure 3 1198651-measures of five personality traits along with mean1198651-measures of them with different categories of feature sets

Table 4 Precision recall and 1198651-measure of predictingmodel withunweighted features (in bold best performance)

Precision Recall 1198651-measureEXT 085 076 082NEU 092 068 076AGR 094 064 076CON 091 057 068OPN 087 078 083Mean 090 0686 077

Table 5 Precision recall and 1198651-measure of predictingmodel withunified thresholds (in bold best performance)

Precision Recall 1198651-measureEXT 062 054 057NEU 080 016 025AGR 057 073 063CON 057 077 060OPN 070 100 082Mean 0652 064 0574

From Tables 3 and 4 we can observe that the best resultsfor myPersonality dataset are achieved after distributingweights to features It illustrates that considering correlationsbetween features and userrsquos personality traits can bring per-formance gain to the prediction task since there are limitedfeatures that may remain useful for userrsquos personality traitsprediction

443 Impact of Dynamic Updated Thresholds of PersonalityTraits In addition we predicted userrsquos personality traits withunified thresholds as well The results are summarized inTable 5

Multilabel learning task is to predict one or more cat-egories for each instance In the existing algorithms suchas PT5 method proposed by Tsoumakas and Katakis [34]

Table 6 Precision of different methods on Facebook dataset Thebest performance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 093 086 096 089 091 0910Verhoevenet al [42] SVM 079 071 067 072 087 0752

Farnadi etal [33]

SVM kNNNB 058 054 050 055 060 0554

Alam et al[43]

SVM BLRmNB 058 059 059 059 060 0590

Tomlinsonet al [44] LR NA NA NA NA NA NA

generally thresholdswere unified to the same valueHoweverall categories employing a unified minimum threshold arenot appropriate If threshold is set too high not all categoriestags will be predicted on the other hand if threshold is settoo low too many classes will be predicted in results Hencein this paper we updated thresholds of personality traitsdynamically according to error rates of them Comparedto Table 3 based on thresholds obtained through updatingdynamically our method can achieve better results

45 Performance EvaluationwithDifferentMethods Asmen-tioned in Section 41 in the Workshop on ComputationalPersonality Recognition (Shared Task) different systems forpersonality recognition from text and network features ona common benchmark have been compared Verhoeven etal [42] proposed a meta-learning approach which can beextended to certain component classifiers from other genreswith other class systems or even from other languagesFarnadi et al [33] leveraged SVM 119896 nearest neighbors(kNN) [17] and Naıve Bayes (NB) for automatic recognitionof personality traits from usersrsquo Facebook statues updatesrespectively even with a small set of training dataset it couldachieve better results than most baseline algorithms On thebasis of a set of features extracted from Facebook datasetAlam et al [43] explored suitability and performance ofseveral classification techniques which were SVM BayesianLogistic Regression (BLR) [47] andMultinomial Naıve Bayes(mNB) [48] Tomlinson et al [44] used ranking algorithmsfor feature selection and Logistic Regression (LR) [49] aslearning algorithms which achieved a high performance Inthis paper we tested our proposed method with the sameset than the participants in the Workshop on ComputationalPersonality Recognition (Shared Task) In terms of precisionrecall and 1198651-measure a comparison of the works describedabove as well as our proposed method (denoted by CP) issummarized in Tables 6 7 and 8

From Tables 6 7 and 8 the experimental results indicatethat openness is the easiest personality trait to be predictedby Facebook features with the above methods because thereare more users with positive level of openness than otherpersonality traits in Facebook dataset Since Verhoeven et al[42] trained an ensemble SVM model based on both Face-book and Essays personality traits dataset as a consequence

12 Mathematical Problems in Engineering

Table 7 Recall of different methods on Facebook dataset The bestperformance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 076 067 074 070 079 0732Verhoevenet al [42] SVM 079 072 068 072 087 0756

Farnadi etal [33]

SVMkNN NB 061 053 050 054 070 0576

Alam et al[43]

SVMBLR mNB 058 058 059 059 060 0588

Tomlinsonet al [44] LR NA NA NA NA NA NA

Table 8 1198651-measure of different methods on Facebook datasetThebest performance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 083 075 083 078 084 0806Verhoevenet al [42] SVM 079 070 067 072 086 0748

Farnadi etal [33]

SVMkNN NB 056 052 050 054 061 0546

Alam et al[43]

SVMBLR mNB 058 058 058 059 060 0586

Tomlinsonet al [44] LR NA NA NA NA NA 0630

when tested on the same Facebook dataset it can achievebetter results than our proposed method Furthermorewhen trained and tested on the same Facebook dataset ourmethod outperforms the methods in [33 43 44] which donot take weighted features dynamic updated thresholds ofpersonality traits and correlations between personality traitsinto consideration

5 Conclusion

In this paper we studied the problem of exploiting interde-pendencies between userrsquos personality traits for predictinguserrsquos personality traits First after analyzing importance offeatures we conducted experiments on Facebook datasetto demonstrate the existence of correlations between userrsquospersonality traits Bearing this in mind an unsupervisedframework adaptive conditional probability-based predict-ing model was then proposed to predict userrsquos Big Fivepersonality traits based on importance of features dynamicupdated thresholds of personality traits and prior knowledgeabout correlations between personality traits Furthermorewe compared our results with the ones achieved by othersin the Workshop on Computational Personality Recognition(Shared Task) on the same dataset In general the experimen-tal results demonstrated the effectiveness of our proposedframework

In future work we will speculate on what directions canbe undertaken to ameliorate its performance with respect

to time complexity so as to better apply it to a big dataenvironment as in Facebook monitoring Besides in orderto make our framework applicable to dynamic networksbetter we will explore combining time series analysis withpersonality traits predicting algorithm to capture dynamicevolution process of information and network structureFurthermore since social network features (described inSection 311) are specific for our dataset they may be difficult(or even impossible) to obtain in another dataset hencewe will conduct experiments on other datasets such as theTweeter corpus of PAN shared task (httppanwebisde)on author profiling where also personality is considered tofurther validate the effectiveness of the proposed frameworkbased on different features

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the National Natural ScienceFoundation of China under Grant no 61300148 the Scientificand Technological Break-Through Program of Jilin ProvinceunderGrant no 20130206051GX the Science andTechnologyDevelopment Program of Jilin Province under Grant no20130522112JH the Science Foundation for China Postdoc-tor under Grant no 2012M510879 and the Basic Scien-tific Research Foundation for the Interdisciplinary Researchand Innovation Project of Jilin University under Grant no201103129

References

[1] M Oussalah F Bhat K Challis and T Schnier ldquoA softwarearchitecture for Twitter collection search and geolocationservicesrdquo Knowledge-Based Systems vol 37 pp 105ndash120 2013

[2] R R McCrae and P T Costa ldquoValidation of the 5-factor modelof personality across instruments and observersrdquo Journal ofPersonality and Social Psychology vol 52 no 1 pp 81ndash90 1987

[3] G Saucier and L R Goldberg ldquoThe structure of personalityattributesrdquo in Personality and Work Reconsidering the Role ofPersonality in Organizations M R Barrick and A M RyanEds pp 1ndash29 Jossey-Bass 2003

[4] B Caci M Cardaci M E Tabacchi and F Scrima ldquoPersonalityvariables as predictors of facebook usagerdquoPsychological Reportsvol 114 no 2 pp 528ndash539 2014

[5] M Kosinski Y Bachrach P Kohli D Stillwell and T GraepelldquoManifestations of user personality in website choice andbehaviour on online social networksrdquoMachine Learning vol 95no 3 pp 357ndash380 2014

[6] M-G Cojocaru H Thille E Thommes D Nelson and SGreenhalgh ldquoSocial influence and dynamic demand for newproductsrdquoEnvironmentalModellingamp Software vol 50 pp 169ndash185 2013

[7] F Durupinar N Pelechano J Allbeck U Gudukbay andN Badler ldquoThe impact of the OCEAN personality modelon the perception of crowdsrdquo IEEE Computer Graphics andApplications vol 31 no 3 pp 22ndash31 2011

Mathematical Problems in Engineering 13

[8] X Feng ldquoStudy on customer personality characteristics andrelationship outcomes using SEM analysisrdquo in Proceedings ofthe International Conference on Mechatronics and InformationTechnology pp 841ndash844 2013

[9] J G Lee SMoon andK Salamatian ldquoModeling and predictingthe popularity of online contents with Cox proportional hazardregression modelrdquo Neurocomputing vol 76 no 1 pp 134ndash1452012

[10] J W Stoughton L F Thompson and A W Meade ldquoBigfive personality traits reflected in job applicantsrsquo social mediapostingsrdquo Cyberpsychology Behavior and Social Networkingvol 16 no 11 pp 800ndash805 2013

[11] D A Cobb-Clark and S Schurer ldquoThe stability of big-fivepersonality traitsrdquo Economics Letters vol 115 no 1 pp 11ndash152012

[12] W Mischel ldquoToward an integrative science of the personrdquoAnnual Review of Psychology vol 55 pp 1ndash22 2004

[13] S Argamon S Dhawle M Koppel and J Pennebaker ldquoLexicalpredictors 19 of personality typerdquo in Proceedings of the JointAnnual Meeting of the Interface and the Classification Society ofNorth America 2005

[14] F Mairesse M AWalker M R Mehl and R K Moore ldquoUsinglinguistic cues for the automatic recognition of personality inconversation and textrdquo Journal of Artificial Intelligence Researchvol 30 pp 457ndash500 2007

[15] K Moore and J C McElroy ldquoThe influence of personalityon Facebook usage wall postings and regretrdquo Computers inHuman Behavior vol 28 no 1 pp 267ndash274 2012

[16] J W Pennebaker and L A King ldquoLinguistic styles languageuse as an individual differencerdquo Journal of Personality and SocialPsychology vol 77 no 6 pp 1296ndash1312 1999

[17] J Han and M Kamber Data Mining Concepts and TechniquesThe Morgan Kaufmann Series in Data Management Systemsedited by J Gray Morgan Kaufmann Boston Mass USA 3rdedition 2011

[18] J Golbeck C Robles and K Turner ldquoPredicting personalitywith social mediardquo in Proceedings of the 29th Annual CHIConference on Human Factors in Computing Systems (CHI rsquo11)pp 253ndash262 ACM May 2011

[19] J RQuinlan ldquoLearningwith continuous classesrdquo inProceedingsof the 5thAustralian Joint Conference onArtificial Intelligence (Airsquo92) pp 343ndash348 Hobart Australia November 1992

[20] S Roweis and Z Ghahramani ldquoA unifying review of lineargaussian modelsrdquo Neural Computation vol 11 no 2 pp 305ndash345 1999

[21] S Bai T Zhu and L Cheng ldquoBig-five personality pre-diction based on user behaviors at social network sitesrdquohttparxivorgpdf12044809v1pdf

[22] J Oberlander and S Nowson ldquoWhose thumb is it anywayclassifying author personality from weblog textrdquo in Proceedingsof the 44th Annual Meeting of the Association for ComputationalLinguistics pp 627ndash634 2006

[23] T Nguyen D Q Phung B Adams and S Venkatesh ldquoTowardsdiscovery of influence and personality traits through social linkpredictionrdquo in Proceedings of the International Conference onWeblogs and Social Media pp 566ndash569 Barcelona Spain July2011

[24] S T Bai B B Hao A Li S Yuan R Gao and T S ZhuldquoPredicting big five personality traits of microblog usersrdquo inProceedings of the 12th IEEEWICACM International JointConferences on Web Intelligence and Intelligent Agent Technolog(WI rsquo13) pp 501ndash508 November 2013

[25] R Sun and N Wilson ldquoA model of personality should be acognitive architecture itselfrdquo Cognitive Systems Research vol29-30 no 1 pp 1ndash30 2014

[26] A Ortigosa R M Carro and J I Quiroga ldquoPredicting userpersonality by mining social interactions in Facebookrdquo Journalof Computer and System Sciences vol 80 no 1 pp 57ndash71 2014

[27] O P John L P Naumann and C J Soto ldquoParadigm shift tothe integrative Big Five trait taxonomy history measurementand conceptualissuesrdquo in Handbook of Personality Theory andResearch O P John R W Robins and L A Pervin Eds pp114ndash158 Guilford Press New York NY USA 3rd edition 2008

[28] G Park H A Schwartz J C Eichstaedt et al ldquoAutomaticpersonality assessment through social media languagerdquo Journalof Personality and Social Psychology vol 108 no 6 pp 934ndash9522015

[29] C J Soto O P John S D Gosling and J Potter ldquoAge differencesin personality traits from 10 to 65 big five domains and facets ina large cross-sectional samplerdquo Journal of Personality and SocialPsychology vol 100 no 2 pp 330ndash348 2011

[30] C J Soto and O P John ldquoTen facet scales for the BigFive Inventory convergence with NEO PI-R facets self-peeragreement and discriminant validityrdquo Journal of Research inPersonality vol 43 no 1 pp 84ndash90 2009

[31] M V Kosti R Feldt and L Angelis ldquoPersonality emotionalintelligence and work preferences in software engineering anempirical studyrdquo Information and Software Technology vol 56no 8 pp 973ndash990 2014

[32] H A Schwartz J C Eichstaedt M L Kern et al ldquoPersonalitygender and age in the language of social media the open-vocabulary approachrdquo PLoS ONE vol 8 no 9 Article IDe73791 2013

[33] G Farnadi S ZoghbiM FMoens andMD Cock ldquoRecognis-ing personality traits using facebook status updatesrdquo in Proceed-ings of the Workshop on Computational Personality RecognitionJuly 2013 httpcliccimecunitnitfabiowcpr13farnadi wcpr-13pdf

[34] G Tsoumakas and I Katakis ldquoMulti-label classification anoverviewrdquo International Journal of Data Warehousing and Min-ing vol 3 no 3 pp 1ndash13 2007

[35] M Kosinski D Stillwell and T Graepel ldquoPrivate traits andattributes are predictable from digital records of human behav-iorrdquo Proceedings of the National Academy of Sciences of theUnited States of America vol 110 no 15 pp 5802ndash5805 2013

[36] M Belkin and P Niyogi ldquoLaplacian eigenmaps for dimension-ality reduction and data representationrdquo Neural Computationvol 15 no 6 pp 1373ndash1396 2003

[37] R Socher J Bauer C D Manning and Y N Andrew ldquoParsingwith compositional vector grammarsrdquo in Proceedings of the 51stAnnual Meeting of the Association for Computational Linguistics(ACL rsquo13) pp 455ndash465 Sofia Bulgaria August 2013

[38] K Kazama M Imada and K Kashiwagi ldquoCharacteristics ofinformation diffusion in blogs in relation to information sourcetyperdquo Neurocomputing vol 76 no 1 pp 84ndash92 2012

[39] M B David Y N Andrew and I J Michael ldquoLatent Dirichletallocationrdquo Journal ofMachine Learning Research vol 3 no 4-5pp 993ndash1022 2003

[40] P T Costa andR RMcCraeRevisedNEOPersonality Inventory(NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI) Profes-sional Manual Psychological Assessment Resources 1992

[41] R L Atkinson C A Richard E S Edward J B Daryland S Nolen-Hoeksema Hilgardrsquos Introduction to Psychology

14 Mathematical Problems in Engineering

Harcourt College Publishers Orlando Fla USA 13th edition2000

[42] B Verhoeven W Daelemans and T D Smedt ldquoEnsemblemethods for personality recognitionrdquo in Proceedings of theWorkshop on Computational Personality Recognition pp 35ndash38Boston Mass USA July 2013

[43] F Alam A Stepanov and G Riccardi ldquoPersonality traitsrecognition on social networkmdashfacebookrdquo in Proceedings of theWorkshop on Computational Personality Recognition pp 6ndash9Boston Mass USA July 2013

[44] M T Tomlinson D Hinote and D B Bracewell ldquoPredictingconscientiousness through semantic analysis of facebook postsrdquoin Proceedings of the Workshop on Computational PersonalityRecognition pp 31ndash34 July 2013

[45] D M Endres and J E Schindelin ldquoA newmetric for probabilitydistributionsrdquo IEEETransactions on InformationTheory vol 49no 7 pp 1858ndash1860 2003

[46] S Kullback and R A Leibler ldquoOn information and sufficiencyrdquoThe Annals of Mathematical Statistics vol 22 no 1 pp 79ndash861951

[47] A Genkin D D Lewis and D Madigan ldquoLarge-scale Bayesianlogistic regression for text categorizationrdquo Technometrics vol49 no 3 pp 291ndash304 2007

[48] A K McCallum and K Nigam ldquoA comparison of event modelsfor naive bayes text classificationrdquo in Proceedings of the AAAI-98 Workshop on Learning for Text Categorization pp 137ndash142Madison Wis USA July 1998

[49] I Ruczinski C Kooperberg and M LeBlanc ldquoLogic regres-sionrdquo Journal of Computational and Graphical Statistics vol 12no 3 pp 475ndash511 2003

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 3: Research Article A Novel Adaptive Conditional Probability ...downloads.hindawi.com/journals/mpe/2015/472917.pdf · Research Article A Novel Adaptive Conditional Probability-Based

Mathematical Problems in Engineering 3

traits through leveraging Native Bayes model on account ofdiffering sets of n-gram features They also demonstratedthat with respect to feature selection policies automaticselection generally outperformed ldquomanualrdquo selection On thebasis of automatically derived psycholinguistic and mood-based features of a userrsquos textual messages Nguyen et al[23] utilized SVM classifier for examining two two-classclassification problems influential versus noninfluential andextraversion versus introversion They experimented withthree subcorpora of 10000 users each and presented the mosteffective predictors for each category The best classificationresult at 80 was achieved using psycholinguistic featuresHowever they did not predict personality traits in finer grainBai et al [24] proposed a multitask regression algorithm andan incremental regression algorithm to predict usersrsquo BigFive personality traits from their usages of Sina microblogobjectively The results indicated that personality traits canbe predicted in a high accuracy through online microblogusages Besides the average mean absolute error of multitaskregression model was 1384 which got about 5 percentagepoints reduction compared to incremental regression Sunand Wilson [25] demonstrated that without any significantaddition or modification a cognitive architecture can serveas a generic model of personality traits Besides integratingpersonality modeling with generic computational cognitivemodeling was shown to be feasible and useful

However some drawbacks can be pointed out in pre-vious work on userrsquos personality traits prediction (1) someresearchers have made an assumption that there had beenlittle or no correlations between userrsquos personality traits[26] however the massive approaches have been exploredto examine personality psychology and have revealed thecorrelations between the Big Five dimensions instead of littleor no correlations [27ndash30] (2) Although different featuresplayed different roles in predicting userrsquos personality traits[18 31 32] only a few researchers considered the correlationsbetween features and personality traits [33] (3) In multilabellearning task such as PT5 method proposed by Tsoumakasand Katakis [34] thresholds were usually unified to the samevalue which was not appropriate In this light we propose anadaptive conditional probability-based model to improve theperformance of userrsquos personality traits prediction task

3 Adaptive Conditional Probability-BasedPredicting Model for Userrsquos PersonalityTraits

In this section we present predicting model adopted inour work Initially we make a definition of preliminary fea-tures (Section 31) followed by measurement of distributingweights to characteristics (Section 32) And then we outlineframework of adaptive conditional probability-based userrsquospersonality traits prediction model (Section 33) Finallyin Section 34 we depict our algorithm and analyze timecomplexity of our proposed model

31 Definition of Features As a personrsquos unique pattern oflong-term thoughts emotions and behaviors personality

traits are reflected in userrsquos attitudes towards things andactions taken by user Therefore aside from social networkfeatures which were provided in Facebook dataset [35]directly we introduced linguistic features emotional statis-tical features and topic features to predict userrsquos personalitytraits Linguistic features emotional statistical features andtopic features can bemeasured via analysis of userrsquos Facebookstatus updates

311 Social Network Features Facebook dataset includesseven social network features namely date of userrsquos registernetwork size ego betweenness centrality normalized egobetweenness centrality density brokerage normalized bro-kerage and transitivity which reflect userrsquos behavior patternsjust through userrsquos network structure Similar to the clusterassumption [36] in this work we took the above features intoconsideration to complete prediction task and assumed thatthe more similar were the network structures between twousers the more likely they shared the same label

312 Linguistic Features Since each user showed a particularmode of expression some researchers held the view thatcorrelations between personality traits and spoken or writtenlinguistic cues were significant [14 16] so language-basedassessments can constitute valid personality measures [28]Hence in this paper we regarded linguistic cues as factors soas to mine userrsquos personality traits through userrsquos means ofexpression

A natural language parser is used to work out gram-matical structure of sentences such as grouping wordstogether as ldquophrasesrdquo and obtaining subject or object of averb Probabilistic parsers try to produce the most likelyanalysis of new sentences via leveraging knowledge of lan-guage gained from hand-parsed sentences Stanford Parser(httpnlpstanfordedusoftwarelex-parsershtmlAbout) isa probabilistic natural language open source parser whichimplements a factored product model with separate PCFGphrase structure and lexical dependency experts whosepreferences are combined by efficient exact inference usingan Alowast algorithm Either of these yields a good performancestatistical parsing system [37] A GUI is provided for utilizingit simply as an accurate unlexicalized stochastic context-freegrammar parser and viewing the phrase structure tree outputof the parser Thus in order to learn traits of contents thatuser yields we got word frequency statistics of 35 kinds ofparts of speech with Stanford Parser that were conjunctionnumeral determiner existential there foreign word modalauxiliary singular or mass noun plural noun proper nounplural proper noun predeterminer genitive marker personalpronoun plural personal pronoun ordinal adverb compara-tive adverb superlative adverb particle symbol interjectionverb in base form verb in past tense gerund verb in pastparticiple verb in present tense (not 3rd person singular)verb in present tense (3rd person singular) subordinat-ing conjunction ordinal adjective comparative adjectivesuperlative adjective list itemmarkerWH-determinerWH-pronoun WH-plural pronoun and WH-adverb Besideswe defined another six linguistic features the total numberof words and frequency statistics of punctuation comma

4 Mathematical Problems in Engineering

period exclamation and question Nevertheless we filteredhyperlinks in usersrsquo contents as blog services tended to pro-duce many hyperlinks for navigation and advertisement [38]which had no relationship with personality traits prediction

313 Emotional Statistical Features Userrsquos attitudes towardsthings which show userrsquos different personality traits reflectuserrsquos unique pattern of long-term emotions For instancea neurotic person may have a tendency to experienceunpleasant emotions easily such as anger anxiety depressionand vulnerability Hence userrsquos statistics of emotion can becharacteristics in userrsquos personality traits predicting modelIn this paper userrsquos emotional statistical characteristicsincluded proportion of positive words and negative wordsused in userrsquos posts On the basis of adjectives and theirvariants obtained in Section 312 userrsquos emotional statisticalcharacteristics were calculated with the corpus of HowNetKnowledge (httpwwwkeenagecomdownloadsentimentrar) HowNet Knowledge which includes 8945 words andphrases consists of six files positive emotional words list filenegative emotional words list file positive review words listfile negative review words list file degree level words list fileand proposition words list file

As userrsquos emotional statistical features userrsquos positive andnegative emotional statistical characteristics are defined as

PT (119894) =119901119899119894

sum119894

NT (119894) =119899119899119894

sum119894

(1)

where PT(119894) andNT(119894) represent proportion of positivewordsand negative words used in user 119894rsquos posts respectively119901119899

119894and

119899119899119894represent the number of positive emotional words and

the number of negative emotional words user 119894 used whichare included in HowNet Knowledge and sum

119894represents the

total number of words in user 119894rsquos contents

314 Topic Features The things user focuses on may have animpact on actions that user has taken Take openness which isone of the Big Five personality traits as an example it reflectsdegree of intellectual curiosity creativity and a preference fornovelty and variety a person hasTherefore wemined a seriesof userrsquos concerned themes from userrsquos status via LDA (LatentDirichlet Allocation) [39] for predicting userrsquos personalitytraits

Since our purpose is to extract all concerned themes of auser rather than to extract specific themes of each post wemerged all microblogs of a user into one document and thenextracted userrsquos concerned themes namely each documentcorresponded to a user The results of LDA model are shownas follows

DT =[[[[[[

[

DT11 DT12 sdot sdot sdot DT1119899

DT21 DT21 sdot sdot sdot DT2119899

d

DT1198981 DT

1198982 sdot sdot sdot DT119898119899

]]]]]]

]

(2)

whereDT stands for an119898times119899matrix which ismainly used forstorage of distribution of themes and documents 119898 standsfor the number of documents 119899 stands for the number ofthemes and element DT

119894119895(119894 = 1 2 119898 and 119895 = 1 2 119899)

in matrix DT stands for probability that the 119894th documentbelongs to the 119895th theme that is degree of attention whichuser 119894 pays to the 119895th theme

32 Weight Distribution of Features Every feature has adifferent impact on userrsquos personality traits prediction It isof great importance to allocate featuresrsquo weights reasonablyso as to be able to perform good prediction based onlyon scant knowledge of personality traits Kendall test is anonparametric hypothesis test which calculates correlationcoefficient to test statistical dependence of two randomvariables Since values of features and scores of personalitytraits in the dataset we used were numeric therefore inorder to quantify importance of each feature we analyzedrelevance between userrsquos personality traits and characteristicsvia Kendall correlation coefficient in which values of featuresand scores of personality traits were treated as two randomvariables Kendall correlation coefficient is calculated as

120591 (119883 119884) =119862 minus 119863

radic(1198733 minus 1198731) times (1198733 minus 1198732) (3)

where 120591(119883 119884) isin [minus1 1] and if random variables 119883 and 119884have a positive correlation then 120591(119883 119884) = 1 else if randomvariables119883 and 119884 have a negative correlation then 120591(119883 119884) =minus1 else if random variables119883 and 119884 are independent of eachother then 120591(119883 119884) = 0 119862 represents the number of tupleswhose two random variables have consistency 119863 representsthe number of tupleswhose two randomvariables donot haveconsistency and 119873

1and 119873

2represent the total number of

repeated elements in random variables119883 and 119884 respectivelyThe calculation of1198731 is shown as

1198731 =119904

sum

119894=1

12119880119894(119880119894minus 1) (4)

where 119904 denotes the number of elements that have repeatedelements in random variable 119883 and 119880

119894denotes the number

of repeated elements of the 119894th element Similarly 1198732is

calculated as

1198732 =1199041015840

sum

119895=1

12119880119895(119880119895minus 1) (5)

where 1199041015840 denotes the number of elements that have repeatedelements in random variable 119884 and 119880

119895denotes the number

of repeated elements of the 119895th element1198733denotes the total

number of merged sequences which is calculated as

1198733 =12119873(119873minus 1) (6)

Mathematical Problems in Engineering 5

Facebook

Social networkfeatures set

Userrsquosstatus set

Analyze correlations

betweenfeatures

and personality

traits

Linguistic features set

Emotional statistical features set

Topic features set

Assign weights

tofeatures

neighbors set

Analyzecorrelations

betweenpersonality

traits

Initializethresholds of personality

traits

Userrsquos personality traits set

Updatethresholds of personality traits

Calculateerror rates of personality traits

Predict userrsquos personality traits set

Minimum error rates

No

Yes

Get k nearest

Figure 1 The architecture of adaptive conditional probability-based predicting model for userrsquos personality traits

where119873 represents dimensions of tuples Thus according toKendall correlation coefficient between features and person-ality traits importance of the 119894th feature 119891

119894is calculated as

119868 (119891119894) =

sum119901119895isin119875120591 (119891119894 119901119895)

5

(7)

where 120591(119891119894 119901119895) stands for Kendall correlation coefficient

between the 119894th feature 119891119894and the 119895th personality trait 119901

119895and

119875 stands for set of Big Five personality traits which will beintroduced in Section 34 After normalizing importance ofthe 119894th feature 119891

119894 weight of 119891

119894is calculated as follows

119882(119891119894) =

119868 (119891119894)

sum119891119895isin119865119895 =119894

119868 (119891119895)

(8)

where 119865 denotes set of personality traits predicting features

33 A Framework of Predicting Userrsquos Personality TraitsFigure 1 shows the architecture of adaptive conditionalprobability-based predicting model for userrsquos personalitytraits An analysis of correlations between userrsquos personalitytraits and features is conducted before assigning weights tolinguistic emotional statistical and topic features which are

extracted from contents that user produces as well as socialnetwork features Meanwhile correlation analysis of userrsquospersonality traits is carried on Then according to weightsof features 119896 nearest neighbors set is obtained Finallygiven 119896 nearest neighbors set dynamic updated thresholds ofpersonality traits and correlations between personality traitsuserrsquos personality traits set is predicted

34 Algorithm of Adaptive Conditional Probability-BasedUserrsquos Personality Traits Prediction Before we proposed ourpredicting algorithm we conducted experiments to analyzecorrelations between userrsquos personality traits which will beshown in detail later in Section 42 And the results ofcorrelation analysis revealed that there were interdependentrelationships between userrsquos personality traits Thus givendynamic updated thresholds of personality traits as wellas considering interdependencies between personality traitsa novel adaptive correlation-based predicting model wasproposed

In psychology the Big Five personality traits are fivebroad domains or dimensions of personality traits that areused to describe human personality The theory based onthe Big Five factors is called Five Factor Model (FFM) [40]The Big Five factors are extraversion (denoted by EXT)

6 Mathematical Problems in Engineering

neuroticism (denoted by NEU) agreeableness (denoted byAGR) conscientiousness (denoted by CON) and openness(denoted by OPN)

Our method aimed to predict user 119894rsquos Big Five personalitytraits set 119875119879

119894 As stated in [41] here each personality trait

was classified into two degrees namely extraversion wasmapped onto extravert and shy neuroticism was mappedonto neurotic and secure agreeableness was mapped ontofriendly and uncooperative conscientiousness was mappedonto precise and careless and openness was mapped ontoinsightful and unimaginative For convenience we denotedthe two degrees of each personality trait by positive leveland negative level in short respectively First we set initialthreshold of each personality trait as 05 and threshold ofthe 120579th personality trait 119901

120579is denoted by 119879(119901

120579) followed by

calculating distance between user 119894 and other users with (9)as follows

Dis (119894 119895) =119898

sum

ℎ=1119882(119891ℎ) times1003816100381610038161003816119865 (119894 119891ℎ) minus 119865 (119895 119891ℎ)

1003816100381610038161003816 (9)

where Dis(119894 119895) presents distance between user 119894 and user 119895119865(119894 119891

ℎ) and 119865(119895 119891

ℎ) present the ℎth feature 119891

ℎrsquos value of user

119894 and user 119895 and119898 presents the number of featuresSecondly we sorted user 119894rsquos distance set in ascending

order and selected top 119896 users as user 119894rsquos neighbors (denotedas119873(119894)) Then for each 119901

120579 we recorded the number of users

who had positive level of 119901120579in119873(119894) as 119899

119901120579

Thirdly we selected a 119901

120579from unvisited personality traits

set UP if user 119894rsquos personality traits set PT119894was empty and then

we calculated probability that user 119894 had positive level of 119901120579as

follows

119891 (119894 119901120579) =

119875 (119867119901120579

| 119862119899119901120579

119901120579

)

119875 (sim 119867119901120579

| 119862119899119901120579

119901120579

)

(10)

where119862119899119901120579

119901120579

presents event that there are exactly119862119899119901120579

119901120579

instanceshaving positive level of 119901

120579in119873(119894) whose 119896 nearest neighbors

also have 119899119901120579

users with positive level of 119901120579 119867119901120579

and sim 119867119901120579

present event that user 119894 has positive level of 119901120579and event that

user 119894 has negative level of 119901120579 respectively And if user 119894rsquos per-

sonality traits set PT119894was not empty considering correlations

between personality traits we calculated probability that user119894 had positive level of 119901

120579on the basis of prior knowledge

about correlations between119901120579and personality traits that have

already been visited in UP as follows

119891 (119894 119901120579) =

119875 (119867119901120579

| 1199011198971 1199011198972 ) 119875 (119862

119899119901120579

119901120579

| 119867119901120579

)

119875 (sim 119867119901120579

| 1199011198971 1199011198972 ) 119875 (119862

119899119901120579

119901120579

| sim 119867119901120579

)

(11)

where 1199011198971 1199011198972 isin PT

119894present personality traits which have

already been visited in UP If 119891(119894 119901120579) was greater than or

equal to 119879(119901120579) then positive level of 119901

120579was added to PT

119894

else negative level of 119901120579was added to PT

119894

And then we calculated error rate of each personalitytrait and error rate of 119901

120579is denoted by Err(119901

120579) which is

calculated as

Err (119901120579) = 1minus

119903120579

to (12)

where 119903120579denotes the number of instances which are classified

correctly to denote the total number of instancesFinally if unvisited personality traits set UP was empty

and error rate of each personality trait fell within a specifiedrange then algorithm was terminated else if unvisited per-sonality traits set UP was not empty then we continued topredict another personality trait in UP else if there was apersonality trait 119901

120579whose error rate did not reach defined

limits then we updated 119879(119901120579) with (12) and added it to

personality traits set UP to predict it again

119879 (119901120579) = 119879 (119901

120579) + 120576 timesErr (119901

120579) (13)

where 120576 denotes a monotonic decreasing learning rate Ouralgorithm can now be defined as Algorithm 1 Assumethat size of dataset is 119879119903 dimension of features is 119889 andsize of the nearest neighbors set is 119896 The complexity ofconditional probability-based predicting model for userrsquospersonality traits is analyzed as follows From step (10) tostep (12) computation time of Kendall correlation coefficientbetween features and personality traits is taking119874(1198791199032) timecalculating weights of features takes 119874(119889) time and thendistributing weights to features can be computed in 119874(119889119879119903)time the total time is119874(1198791199032) as dimension of features 119889 is farless than size of dataset119879119903 Calculating distance between user119894 and other users from step (13) to step (15) will take 119874(119889119879119903)time In step (16) it will take 119874(119879119903log2119879119903) time for sortingset of distances between user 119894 and other users Step (18) tostep (20) take 119874(119896) time to select 119896 nearest neighbors of user119894 If algorithm goes from step (23) to step (25) it will take119874(1198962

) time Else if algorithm goes from step (26) to step (28)it will take 119874(1198791199031198962) time It takes 119874(119879119903) time from step (37)to step (45) Assume that the number of iterations is 119905 Sincesteps (10) (11) and (12) can be done offline and do not needto be calculated repeatedly every time hence from step (13)to step (52) the overall online complexity of our proposedmethod is 119874(119879119903log

2119879119903) as 119896 and 119905 are far less than 119879119903 From

here we see that our proposed method is feasible in a big dataenvironment as in Facebook monitoring

4 Experimental Evaluation

In this section we first describe dataset used in our experi-ments And then we analyze the interdependent relationshipsbetween userrsquos personality traits Finally we conduct experi-ments on different kinds of features and make a comparisonwith other methods based on the same dataset

41 Dataset MyPersonality (httpmypersonalityorg) is apopular Facebook application which is a 336-question testmeasuring the 30 facets underlying the Big Five traits thatallows users to take real psychometric tests The respondentswho come from various age groups backgrounds and cul-tures are highly motivated to answer honestly and carefullyAdditionally usersrsquo psychological and Facebook profilesare ldquoverifiedrdquo by individualsrsquo social circles hence it is hardto lie to the people that know you best Also there is nopressure and one does not have to share hisher profileinformation thus myPersonality avoids deliberate faking

Mathematical Problems in Engineering 7

Input user 119894 users set 119880 size of the nearest neighbors set 119896 unvisited personality traits set UPOutput user 119894rsquos personality traits set PT

119894

(1) 119891 larr 119891119886119897119904119890

lowast 119891 stands for a flag if any of the thresholds of personality traits is updated then 119891 equals to trueotherwise 119891 equals to false lowast

(2) For each personality trait 119901120579isin UP do

(3) set temporary error rate Temp(119901120579) as 0

(4) End for(5) For each personality trait 119901

120579isin UP do

(6) set initial threshold 119879(119901120579) as 05

(7) End for(8) PT

119894larr 0

(9) UPlarr all of the Big Five personality traits(10) For each feature 119891

ℎisin user 119894rsquos feature set do

(11) calculate 119891ℎrsquos weight according to (8)

(12) End for(13) For user 119895 isin 119880 and 119895 = 119894 do(14) calculate Dis(119894 119895) according to (9)(15) End for(16) sort Dis(119894 1)Dis(119894 2) in ascending order(17) 119873(119894) larr select top 119896 users in Dis(119894 1)Dis(119894 2) (18) For each personality trait 119901

120579isin UP do

(19) 119899119901120579

larr the number of users who have positive level of personality trait 119901120579in119873(119894)

(20) End for(21) While UP = 0 do(22) 119901

120579larr UPpoll()

(23) If PT119894= 0 then

(24) calculate 119891(119894 119901120579) according to (10)

(25) End if(26) Else if PT

119894= 0 then

(27) calculate 119891(119894 119901120579) according to (11)

(28) End if(29) If 119891(119894 119901

120579) ge 119879(119901

120579) then

(30) 119875119879119894larr positive level of 119901

120579

(31) End if(32) Else if 119891(119894 119901

120579) lt 119879(119901

120579) then

(33) PT119894larrnegative level of 119901

120579

(34) End if(35) End while(36) 119891 larr 119891119886119897119904119890

(37) For each personality trait 119901120579isin PT119894do

(38) calculate 119901120579rsquos error rate Err(119901

120579) according to (12)

(39) If |Err(119901120579) minus Temp(119901

120579)| gt 120575 then 120575 denotes a constant which is small enough

(40) 119891 larr 119905119903119906119890

(41) Temp(119901120579) larr Err(119901

120579)

(42) update 119901120579rsquos threshold 119879(119901

120579) according to (13)

(43) UPlarr 119901120579

(44) End if(45) End for(46) If 119891 == 119905119903119906119890 then(47) Go to step (21)(48) End if(49) Else if 119891 == 119891119886119897119904119890 then(50) Go to step (52)(51) End if(52) Return PT

119894

Algorithm 1 Adaptive conditional probability-based predicting model for userrsquos personality traits

8 Mathematical Problems in Engineering

of data With consent usersrsquo psychological and Facebookprofiles are recorded for research purposes Currently thedatabase contains more than 6000000 test results togetherwith more than 4000000 individual Facebook profilesIn this paper we adopted the dataset that was provided inthe Workshop on Computational Personality Recognition(Shared Task) (httpmypersonalityorgwikilibexefetchphpmedia=wikimypersonality finalzip) [35] The datasetwas a subset of myPersonality database including 250users which had both information about personality traits(annotated with usersrsquo personality scores and gold standardpersonality labelswhichwere self-assessments obtained usinga 100-item long version of the IPIP personality questionnaire(httpipiporiorgnewFinding Labeling IPIP Scaleshtm))and social network structure (network size betweennesscentrality density brokerage and transitivity) along withtheir 9900 status updates in raw text With the aid of thisstandard dataset we can compare the performance of ourproposed method with othersrsquo personality recognitionsystems on a common benchmark

42 Analysis of Correlations between Personality Traits Pre-vious works were usually predicting userrsquos personality traitswithout considering interdependencies between them In thiscontext we investigatedwhether there had been relationshipsbetween userrsquos personality traits Since not all users had thesame level of interactions consequently we first groupedusers according to different degrees of userrsquos personalitytraits As an example if a user has positive level of agree-ableness another user has positive level of agreeableness aswell then they will be divided into a group else if a user haspositive level of agreeableness another user has negative levelof agreeableness and they will not be divided into a groupThen we explored cooccurrences between personality traitsin each group simultaneously which is shown in Figure 2

It can be observed intuitively that a significant proportionof users have negative level of extraversion positive levelof openness or positive level of agreeableness It is alsonoteworthy that positive level of extraversion has less overlapwith negative level of openness negative level of consci-entiousness and negative level of agreeableness Besidespositive level of neuroticism has less overlap with positivelevel of conscientiousness positive level of agreeableness andnegative level of openness Furthermore there are bits ofusers that have positive level of extraversion and positive levelof neuroticism simultaneously

However the above cooccurrences may be due to the apriori statistics of each trait For instance if there are moreusers with negative level of neuroticism positive level ofopenness and positive level of agreeableness than othersit is more likely to have more users with cooccurrencebetween negative level of neuroticism and positive level ofopenness negative level of neuroticism and positive level ofagreeableness and positive level of openness and positivelevel of agreeableness than others as it is shown in Figure 2Therefore since users were grouped according to differentdegrees of userrsquos personality traits in order to investigatewhether the above phenomenon meant that there werepositive or negative correlations between different degrees

of personality traits in each group we treated scores ofa certain personality trait according to which users weredivided into the group and scores of another personalitytrait with a certain degree as two random variables followedby employing Kendall correlation coefficient which wascalculated in (3) to analyze real correlations between themThe results are presented in Table 1

As it can be seen from Table 1 we can observe anegative correlation between negative level of extraversionand positive level of neuroticism in other words people thatscore high on extraversion have less possibility to score highon neuroticism as well In psychology a person with negativelevel of extraversion can be explained as solitary and a personwith negative level of conscientiousnessmeans being carelessConsequently negative level of extraversion and negativelevel of conscientiousness are not in line with positive level ofneuroticism which tends to experience unpleasant emotionseasily Another factor is social desirability Moreover agree-ableness extraversion conscientiousness and openness aremore or less ldquodesirablerdquo whereas neuroticism is quite clearlynegative Thus generally there are negative correlationsbetween neuroticism and other personality traits

What is more it is inconsistent with Figure 2 that thereis a positive correlation between positive level of neuroticismand positive level of extraversion It may be explained thata person with positive level of extraversion tends to beenthusiastic and talkative which is compatible with positivelevel of neuroticism In addition in line with Figure 2since negative level of openness shows cautiousness it hasa positive correlation with negative level of extraversionand positive level of agreeableness which is a tendency tobe compassionate However there is a positive correlationbetween negative level of openness and positive level ofneuroticism which is not in keeping with Figure 2

Since there are some contradictions between Figure 2and Table 1 we leveraged Jensen-Shannon divergence [45]to further analyze correlations between userrsquos personalitytraits Jensen-Shannon divergence between two probabilitydistributions is calculated as

119863JS (119875 119876) =12(119863KL (119875 119877) +119863KL (119876 119877)) (14)

where 119875 and 119876 denote two probability distributions 119877denotes an average distribution of 119875 and 119876 and 119863KL(119875 119877)denotes Kullback-Leibler divergence [46] between 119875 and 119877which is calculated with

119863KL (119875 119877) = sum119894

119875 (119894) log 119875 (119894)119877 (119894)

(15)

where 119875(119894) and 119877(119894) denote the 119894th value of 119875 and 119877 Theresults are presented in Table 2

In keeping with Figure 2 and Table 1 Jensen-Shannondivergences between negative level of neuroticism and posi-tive level of agreeableness positive level of conscientiousnessand positive level of openness are larger It is because ofthe fact that a person with negative level of neuroticismshows confidence on everything and positive level of open-ness reflects degree of intellectual curiosity creativity and

Mathematical Problems in Engineering 9

Table 1 Kendall correlation coefficients between personality traits 119901 and 119899 stand for positive level and negative level of a certain personalitytrait The bold number is the highest Kendall correlation coefficients and the italic number is the lowest Kendall correlation coefficients Thebigger Kendall correlation coefficient is the better the consistency is

EXT NEU AGR CON OPN119901 119899 119901 119899 119901 119899 119901 119899 119901 119899

EXT 119901 022 minus003 009 minus015 minus006 minus013 007 001119899 minus023 minus018 004 004 004 018 007 022

NEU 119901 minus009 minus017 010 minus021 minus003 027119899 minus014 minus013 002 minus001 minus006 minus010

AGR 119901 minus007 007 016 029119899 minus008 016 006 002

CON 119901 minus001 minus010119899 007 minus007

OPN 119901

119899

PEXT and PNEUPEXT and PAGR

PEXT and PCONPEXT and POPNPEXT and NNEUPEXT and NAGRPEXT and NCONPEXT and NOPNPNEU and PAGR

PNEU and PCONPNEU and POPNPNEU and NEXTPNEU and NAGRPNEU and NCONPNEU and NOPNPAGR and PCONPAGR and POPNPAGR and NEXTPAGR and NNEUPAGR and NCONPAGR and NOPNPCON and POPNPCON and NEXTPCON and NNEUPCON and NAGRPCON and NOPNPOPN and NEXTPOPN and NNEUPOPN and NAGRPOPN and NCONNEXT and NNEUNEXT and NAGRNEXT and NCONNEXT and NOPNNNEU and NAGRNNEU and NCONNNEU and NOPNNAGR and NCONNAGR and NOPNNCON and NOPN

Proportion ()

000 500 1000 1500 2000 2500 3000 3500 4000 4500 5000()

Figure 2The proportion of users who have a certain pair of personality traits simultaneously PEXT PNEU PAGR PCON and POPN standfor positive level of extraversion neuroticism agreeableness conscientiousness and openness NEXT NNEU NAGR NCON and NOPNstand for negative level of extraversion neuroticism agreeableness conscientiousness and openness

10 Mathematical Problems in Engineering

Table 2 Jensen-Shannon divergence between personality traits 119901and 119899 stand for positive level and negative level of a certain personal-ity traitThe bold number is the highest Jensen-Shannon divergenceand the italic number is the lowest Jensen-Shannon divergenceSmaller Jensen-Shannon divergence means better consistency

EXT NEU AGR CON OPN119901 119899 119901 119899 119901 119899 119901 119899 119901 119899

EXT 119901 058 1479 044 219 070 236 073 084119899 479 449 521 236 561 191 1043 211

NEU 119901 080 212 110 246 298 040119899 1978 294 1697 305 2473 361

AGR 119901 077 329 082 091119899 340 086 463 125

CON 119901 115 117119899 616 140

OPN 119901

119899

a preference for novelty and variety a person has alongwith positive level of agreeableness and positive level ofconscientiousness which are compatible Furthermore it isin conformity with Table 1 that positive level of extraversionhas smaller Jensen-Shannon divergences with positive levelof neuroticism and positive level of agreeableness besidespositive level of neuroticism has a smaller Jensen-Shannondivergence with negative level of openness

In addition Soto et al [29] demonstrated convergent anddiscriminate correlations for the Big Five Inventory Soto andJohn [30] also illustrated that there was strong convergencebetween each BFI facet scale and corresponding NEO PI-Rfacet Moreover Park et al [28] demonstrated convergencewith self-reports of personality at the domain- and facet-levelIn summary although utilizing different Big Five measuresit can be found that correlations in myPersonality datasetare also in keeping with the findings from other samplesHence correlations between personality traits are expectedand actually prove that myPersonality results are validSo leveraging prior knowledge about relationships betweenpersonality traits may contribute to a better prediction onpersonality traits The results provide a better theoreticalsupport for our proposed method

43 Evaluation Metric Since most researchers exploringon personality recognition leveraged different measures toevaluate their experiments on various datasets it was hard tocompletely appraise their performance and quality Howevera common dataset which was annotated with gold standardpersonality labels was available in the Workshop on Com-putational Personality Recognition (Shared Task) users werefree to split the training and test sets as they wish andprecision recall and 1198651-measure were suggested to be usedto evaluate results of predictions So in this paper for morereliable results the proposed method was also evaluated withstratified 5-fold cross-validation Since there was no trainingset in unsupervised learning the common dataset was splitinto folds and in order to avoid overfitting the same author

Table 3 Precision recall and 1198651-measure of adaptive conditionalprobability-based predicting model (in bold best performance)

Precision Recall 1198651-measureEXT 093 076 083NEU 086 067 075AGR 096 074 083CON 089 070 078OPN 091 079 084Mean 091 0732 0806

did not appear twofold at the same timeThemean precisionrecall and 1198651-measure are calculated as follows

Precision = tptp + fp

Recall = tptp + fn

1198651-measure = 2 times 119875 times 119877119875 + 119877

(16)

where tp stands for the number of positive samples thatare classified correctly fp stands for the number of positivesamples that are classified incorrectly and fn stands for thenumber of negative samples that are classified incorrectly

44 Analysis of Impacts of Different Factors First we carriedon experiments with our proposed method Scores of preci-sion recall and 1198651-measure are shown in Table 3 includingmean scores of all personality traits

441 Impacts of Different Categories of Feature Sets In thissection we conducted experiments on social network fea-tures linguistic features emotional statistical features topicfeatures and all features respectively Due to space restric-tions Figure 3 only illustrates 1198651-measure of all personalitytraits along with mean 1198651-measure of them

It can be observed that experiments which are conductedon linguistic features emotional statistical features and topicfeatures result in unsatisfactory performance with respectto social network features and social network features havethe best classification performance for extraversion which isconsistent with the conclusion in [33] Since social networkfeatures reflect a userrsquos pattern of social status and habitswhich are relatively important decisive factors in userrsquospersonality traits as a result they provide more informationthan other types of features Nevertheless whether more orless each kind of characteristic provides unique informationin userrsquos personality traits prediction Therefore carryingon experiment with combination of social network featureslinguistic features emotional statistical features and topicfeatures can achieve better performance

442 Impact of Weighted Features Additionally we con-ducted experiments on unweighted features Here we onlypresented the most successful results in Table 4 includingmean scores of all personality traits

Mathematical Problems in Engineering 11

1

featuresEssential

featuresLinguistic

featuresEmotional

featuresTopic

featuresAll

Categories of features

EXTNEUAGR

CONOPNMEAN

09

08

07

06

05

04

03

F-s

core

s

Figure 3 1198651-measures of five personality traits along with mean1198651-measures of them with different categories of feature sets

Table 4 Precision recall and 1198651-measure of predictingmodel withunweighted features (in bold best performance)

Precision Recall 1198651-measureEXT 085 076 082NEU 092 068 076AGR 094 064 076CON 091 057 068OPN 087 078 083Mean 090 0686 077

Table 5 Precision recall and 1198651-measure of predictingmodel withunified thresholds (in bold best performance)

Precision Recall 1198651-measureEXT 062 054 057NEU 080 016 025AGR 057 073 063CON 057 077 060OPN 070 100 082Mean 0652 064 0574

From Tables 3 and 4 we can observe that the best resultsfor myPersonality dataset are achieved after distributingweights to features It illustrates that considering correlationsbetween features and userrsquos personality traits can bring per-formance gain to the prediction task since there are limitedfeatures that may remain useful for userrsquos personality traitsprediction

443 Impact of Dynamic Updated Thresholds of PersonalityTraits In addition we predicted userrsquos personality traits withunified thresholds as well The results are summarized inTable 5

Multilabel learning task is to predict one or more cat-egories for each instance In the existing algorithms suchas PT5 method proposed by Tsoumakas and Katakis [34]

Table 6 Precision of different methods on Facebook dataset Thebest performance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 093 086 096 089 091 0910Verhoevenet al [42] SVM 079 071 067 072 087 0752

Farnadi etal [33]

SVM kNNNB 058 054 050 055 060 0554

Alam et al[43]

SVM BLRmNB 058 059 059 059 060 0590

Tomlinsonet al [44] LR NA NA NA NA NA NA

generally thresholdswere unified to the same valueHoweverall categories employing a unified minimum threshold arenot appropriate If threshold is set too high not all categoriestags will be predicted on the other hand if threshold is settoo low too many classes will be predicted in results Hencein this paper we updated thresholds of personality traitsdynamically according to error rates of them Comparedto Table 3 based on thresholds obtained through updatingdynamically our method can achieve better results

45 Performance EvaluationwithDifferentMethods Asmen-tioned in Section 41 in the Workshop on ComputationalPersonality Recognition (Shared Task) different systems forpersonality recognition from text and network features ona common benchmark have been compared Verhoeven etal [42] proposed a meta-learning approach which can beextended to certain component classifiers from other genreswith other class systems or even from other languagesFarnadi et al [33] leveraged SVM 119896 nearest neighbors(kNN) [17] and Naıve Bayes (NB) for automatic recognitionof personality traits from usersrsquo Facebook statues updatesrespectively even with a small set of training dataset it couldachieve better results than most baseline algorithms On thebasis of a set of features extracted from Facebook datasetAlam et al [43] explored suitability and performance ofseveral classification techniques which were SVM BayesianLogistic Regression (BLR) [47] andMultinomial Naıve Bayes(mNB) [48] Tomlinson et al [44] used ranking algorithmsfor feature selection and Logistic Regression (LR) [49] aslearning algorithms which achieved a high performance Inthis paper we tested our proposed method with the sameset than the participants in the Workshop on ComputationalPersonality Recognition (Shared Task) In terms of precisionrecall and 1198651-measure a comparison of the works describedabove as well as our proposed method (denoted by CP) issummarized in Tables 6 7 and 8

From Tables 6 7 and 8 the experimental results indicatethat openness is the easiest personality trait to be predictedby Facebook features with the above methods because thereare more users with positive level of openness than otherpersonality traits in Facebook dataset Since Verhoeven et al[42] trained an ensemble SVM model based on both Face-book and Essays personality traits dataset as a consequence

12 Mathematical Problems in Engineering

Table 7 Recall of different methods on Facebook dataset The bestperformance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 076 067 074 070 079 0732Verhoevenet al [42] SVM 079 072 068 072 087 0756

Farnadi etal [33]

SVMkNN NB 061 053 050 054 070 0576

Alam et al[43]

SVMBLR mNB 058 058 059 059 060 0588

Tomlinsonet al [44] LR NA NA NA NA NA NA

Table 8 1198651-measure of different methods on Facebook datasetThebest performance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 083 075 083 078 084 0806Verhoevenet al [42] SVM 079 070 067 072 086 0748

Farnadi etal [33]

SVMkNN NB 056 052 050 054 061 0546

Alam et al[43]

SVMBLR mNB 058 058 058 059 060 0586

Tomlinsonet al [44] LR NA NA NA NA NA 0630

when tested on the same Facebook dataset it can achievebetter results than our proposed method Furthermorewhen trained and tested on the same Facebook dataset ourmethod outperforms the methods in [33 43 44] which donot take weighted features dynamic updated thresholds ofpersonality traits and correlations between personality traitsinto consideration

5 Conclusion

In this paper we studied the problem of exploiting interde-pendencies between userrsquos personality traits for predictinguserrsquos personality traits First after analyzing importance offeatures we conducted experiments on Facebook datasetto demonstrate the existence of correlations between userrsquospersonality traits Bearing this in mind an unsupervisedframework adaptive conditional probability-based predict-ing model was then proposed to predict userrsquos Big Fivepersonality traits based on importance of features dynamicupdated thresholds of personality traits and prior knowledgeabout correlations between personality traits Furthermorewe compared our results with the ones achieved by othersin the Workshop on Computational Personality Recognition(Shared Task) on the same dataset In general the experimen-tal results demonstrated the effectiveness of our proposedframework

In future work we will speculate on what directions canbe undertaken to ameliorate its performance with respect

to time complexity so as to better apply it to a big dataenvironment as in Facebook monitoring Besides in orderto make our framework applicable to dynamic networksbetter we will explore combining time series analysis withpersonality traits predicting algorithm to capture dynamicevolution process of information and network structureFurthermore since social network features (described inSection 311) are specific for our dataset they may be difficult(or even impossible) to obtain in another dataset hencewe will conduct experiments on other datasets such as theTweeter corpus of PAN shared task (httppanwebisde)on author profiling where also personality is considered tofurther validate the effectiveness of the proposed frameworkbased on different features

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the National Natural ScienceFoundation of China under Grant no 61300148 the Scientificand Technological Break-Through Program of Jilin ProvinceunderGrant no 20130206051GX the Science andTechnologyDevelopment Program of Jilin Province under Grant no20130522112JH the Science Foundation for China Postdoc-tor under Grant no 2012M510879 and the Basic Scien-tific Research Foundation for the Interdisciplinary Researchand Innovation Project of Jilin University under Grant no201103129

References

[1] M Oussalah F Bhat K Challis and T Schnier ldquoA softwarearchitecture for Twitter collection search and geolocationservicesrdquo Knowledge-Based Systems vol 37 pp 105ndash120 2013

[2] R R McCrae and P T Costa ldquoValidation of the 5-factor modelof personality across instruments and observersrdquo Journal ofPersonality and Social Psychology vol 52 no 1 pp 81ndash90 1987

[3] G Saucier and L R Goldberg ldquoThe structure of personalityattributesrdquo in Personality and Work Reconsidering the Role ofPersonality in Organizations M R Barrick and A M RyanEds pp 1ndash29 Jossey-Bass 2003

[4] B Caci M Cardaci M E Tabacchi and F Scrima ldquoPersonalityvariables as predictors of facebook usagerdquoPsychological Reportsvol 114 no 2 pp 528ndash539 2014

[5] M Kosinski Y Bachrach P Kohli D Stillwell and T GraepelldquoManifestations of user personality in website choice andbehaviour on online social networksrdquoMachine Learning vol 95no 3 pp 357ndash380 2014

[6] M-G Cojocaru H Thille E Thommes D Nelson and SGreenhalgh ldquoSocial influence and dynamic demand for newproductsrdquoEnvironmentalModellingamp Software vol 50 pp 169ndash185 2013

[7] F Durupinar N Pelechano J Allbeck U Gudukbay andN Badler ldquoThe impact of the OCEAN personality modelon the perception of crowdsrdquo IEEE Computer Graphics andApplications vol 31 no 3 pp 22ndash31 2011

Mathematical Problems in Engineering 13

[8] X Feng ldquoStudy on customer personality characteristics andrelationship outcomes using SEM analysisrdquo in Proceedings ofthe International Conference on Mechatronics and InformationTechnology pp 841ndash844 2013

[9] J G Lee SMoon andK Salamatian ldquoModeling and predictingthe popularity of online contents with Cox proportional hazardregression modelrdquo Neurocomputing vol 76 no 1 pp 134ndash1452012

[10] J W Stoughton L F Thompson and A W Meade ldquoBigfive personality traits reflected in job applicantsrsquo social mediapostingsrdquo Cyberpsychology Behavior and Social Networkingvol 16 no 11 pp 800ndash805 2013

[11] D A Cobb-Clark and S Schurer ldquoThe stability of big-fivepersonality traitsrdquo Economics Letters vol 115 no 1 pp 11ndash152012

[12] W Mischel ldquoToward an integrative science of the personrdquoAnnual Review of Psychology vol 55 pp 1ndash22 2004

[13] S Argamon S Dhawle M Koppel and J Pennebaker ldquoLexicalpredictors 19 of personality typerdquo in Proceedings of the JointAnnual Meeting of the Interface and the Classification Society ofNorth America 2005

[14] F Mairesse M AWalker M R Mehl and R K Moore ldquoUsinglinguistic cues for the automatic recognition of personality inconversation and textrdquo Journal of Artificial Intelligence Researchvol 30 pp 457ndash500 2007

[15] K Moore and J C McElroy ldquoThe influence of personalityon Facebook usage wall postings and regretrdquo Computers inHuman Behavior vol 28 no 1 pp 267ndash274 2012

[16] J W Pennebaker and L A King ldquoLinguistic styles languageuse as an individual differencerdquo Journal of Personality and SocialPsychology vol 77 no 6 pp 1296ndash1312 1999

[17] J Han and M Kamber Data Mining Concepts and TechniquesThe Morgan Kaufmann Series in Data Management Systemsedited by J Gray Morgan Kaufmann Boston Mass USA 3rdedition 2011

[18] J Golbeck C Robles and K Turner ldquoPredicting personalitywith social mediardquo in Proceedings of the 29th Annual CHIConference on Human Factors in Computing Systems (CHI rsquo11)pp 253ndash262 ACM May 2011

[19] J RQuinlan ldquoLearningwith continuous classesrdquo inProceedingsof the 5thAustralian Joint Conference onArtificial Intelligence (Airsquo92) pp 343ndash348 Hobart Australia November 1992

[20] S Roweis and Z Ghahramani ldquoA unifying review of lineargaussian modelsrdquo Neural Computation vol 11 no 2 pp 305ndash345 1999

[21] S Bai T Zhu and L Cheng ldquoBig-five personality pre-diction based on user behaviors at social network sitesrdquohttparxivorgpdf12044809v1pdf

[22] J Oberlander and S Nowson ldquoWhose thumb is it anywayclassifying author personality from weblog textrdquo in Proceedingsof the 44th Annual Meeting of the Association for ComputationalLinguistics pp 627ndash634 2006

[23] T Nguyen D Q Phung B Adams and S Venkatesh ldquoTowardsdiscovery of influence and personality traits through social linkpredictionrdquo in Proceedings of the International Conference onWeblogs and Social Media pp 566ndash569 Barcelona Spain July2011

[24] S T Bai B B Hao A Li S Yuan R Gao and T S ZhuldquoPredicting big five personality traits of microblog usersrdquo inProceedings of the 12th IEEEWICACM International JointConferences on Web Intelligence and Intelligent Agent Technolog(WI rsquo13) pp 501ndash508 November 2013

[25] R Sun and N Wilson ldquoA model of personality should be acognitive architecture itselfrdquo Cognitive Systems Research vol29-30 no 1 pp 1ndash30 2014

[26] A Ortigosa R M Carro and J I Quiroga ldquoPredicting userpersonality by mining social interactions in Facebookrdquo Journalof Computer and System Sciences vol 80 no 1 pp 57ndash71 2014

[27] O P John L P Naumann and C J Soto ldquoParadigm shift tothe integrative Big Five trait taxonomy history measurementand conceptualissuesrdquo in Handbook of Personality Theory andResearch O P John R W Robins and L A Pervin Eds pp114ndash158 Guilford Press New York NY USA 3rd edition 2008

[28] G Park H A Schwartz J C Eichstaedt et al ldquoAutomaticpersonality assessment through social media languagerdquo Journalof Personality and Social Psychology vol 108 no 6 pp 934ndash9522015

[29] C J Soto O P John S D Gosling and J Potter ldquoAge differencesin personality traits from 10 to 65 big five domains and facets ina large cross-sectional samplerdquo Journal of Personality and SocialPsychology vol 100 no 2 pp 330ndash348 2011

[30] C J Soto and O P John ldquoTen facet scales for the BigFive Inventory convergence with NEO PI-R facets self-peeragreement and discriminant validityrdquo Journal of Research inPersonality vol 43 no 1 pp 84ndash90 2009

[31] M V Kosti R Feldt and L Angelis ldquoPersonality emotionalintelligence and work preferences in software engineering anempirical studyrdquo Information and Software Technology vol 56no 8 pp 973ndash990 2014

[32] H A Schwartz J C Eichstaedt M L Kern et al ldquoPersonalitygender and age in the language of social media the open-vocabulary approachrdquo PLoS ONE vol 8 no 9 Article IDe73791 2013

[33] G Farnadi S ZoghbiM FMoens andMD Cock ldquoRecognis-ing personality traits using facebook status updatesrdquo in Proceed-ings of the Workshop on Computational Personality RecognitionJuly 2013 httpcliccimecunitnitfabiowcpr13farnadi wcpr-13pdf

[34] G Tsoumakas and I Katakis ldquoMulti-label classification anoverviewrdquo International Journal of Data Warehousing and Min-ing vol 3 no 3 pp 1ndash13 2007

[35] M Kosinski D Stillwell and T Graepel ldquoPrivate traits andattributes are predictable from digital records of human behav-iorrdquo Proceedings of the National Academy of Sciences of theUnited States of America vol 110 no 15 pp 5802ndash5805 2013

[36] M Belkin and P Niyogi ldquoLaplacian eigenmaps for dimension-ality reduction and data representationrdquo Neural Computationvol 15 no 6 pp 1373ndash1396 2003

[37] R Socher J Bauer C D Manning and Y N Andrew ldquoParsingwith compositional vector grammarsrdquo in Proceedings of the 51stAnnual Meeting of the Association for Computational Linguistics(ACL rsquo13) pp 455ndash465 Sofia Bulgaria August 2013

[38] K Kazama M Imada and K Kashiwagi ldquoCharacteristics ofinformation diffusion in blogs in relation to information sourcetyperdquo Neurocomputing vol 76 no 1 pp 84ndash92 2012

[39] M B David Y N Andrew and I J Michael ldquoLatent Dirichletallocationrdquo Journal ofMachine Learning Research vol 3 no 4-5pp 993ndash1022 2003

[40] P T Costa andR RMcCraeRevisedNEOPersonality Inventory(NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI) Profes-sional Manual Psychological Assessment Resources 1992

[41] R L Atkinson C A Richard E S Edward J B Daryland S Nolen-Hoeksema Hilgardrsquos Introduction to Psychology

14 Mathematical Problems in Engineering

Harcourt College Publishers Orlando Fla USA 13th edition2000

[42] B Verhoeven W Daelemans and T D Smedt ldquoEnsemblemethods for personality recognitionrdquo in Proceedings of theWorkshop on Computational Personality Recognition pp 35ndash38Boston Mass USA July 2013

[43] F Alam A Stepanov and G Riccardi ldquoPersonality traitsrecognition on social networkmdashfacebookrdquo in Proceedings of theWorkshop on Computational Personality Recognition pp 6ndash9Boston Mass USA July 2013

[44] M T Tomlinson D Hinote and D B Bracewell ldquoPredictingconscientiousness through semantic analysis of facebook postsrdquoin Proceedings of the Workshop on Computational PersonalityRecognition pp 31ndash34 July 2013

[45] D M Endres and J E Schindelin ldquoA newmetric for probabilitydistributionsrdquo IEEETransactions on InformationTheory vol 49no 7 pp 1858ndash1860 2003

[46] S Kullback and R A Leibler ldquoOn information and sufficiencyrdquoThe Annals of Mathematical Statistics vol 22 no 1 pp 79ndash861951

[47] A Genkin D D Lewis and D Madigan ldquoLarge-scale Bayesianlogistic regression for text categorizationrdquo Technometrics vol49 no 3 pp 291ndash304 2007

[48] A K McCallum and K Nigam ldquoA comparison of event modelsfor naive bayes text classificationrdquo in Proceedings of the AAAI-98 Workshop on Learning for Text Categorization pp 137ndash142Madison Wis USA July 1998

[49] I Ruczinski C Kooperberg and M LeBlanc ldquoLogic regres-sionrdquo Journal of Computational and Graphical Statistics vol 12no 3 pp 475ndash511 2003

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 4: Research Article A Novel Adaptive Conditional Probability ...downloads.hindawi.com/journals/mpe/2015/472917.pdf · Research Article A Novel Adaptive Conditional Probability-Based

4 Mathematical Problems in Engineering

period exclamation and question Nevertheless we filteredhyperlinks in usersrsquo contents as blog services tended to pro-duce many hyperlinks for navigation and advertisement [38]which had no relationship with personality traits prediction

313 Emotional Statistical Features Userrsquos attitudes towardsthings which show userrsquos different personality traits reflectuserrsquos unique pattern of long-term emotions For instancea neurotic person may have a tendency to experienceunpleasant emotions easily such as anger anxiety depressionand vulnerability Hence userrsquos statistics of emotion can becharacteristics in userrsquos personality traits predicting modelIn this paper userrsquos emotional statistical characteristicsincluded proportion of positive words and negative wordsused in userrsquos posts On the basis of adjectives and theirvariants obtained in Section 312 userrsquos emotional statisticalcharacteristics were calculated with the corpus of HowNetKnowledge (httpwwwkeenagecomdownloadsentimentrar) HowNet Knowledge which includes 8945 words andphrases consists of six files positive emotional words list filenegative emotional words list file positive review words listfile negative review words list file degree level words list fileand proposition words list file

As userrsquos emotional statistical features userrsquos positive andnegative emotional statistical characteristics are defined as

PT (119894) =119901119899119894

sum119894

NT (119894) =119899119899119894

sum119894

(1)

where PT(119894) andNT(119894) represent proportion of positivewordsand negative words used in user 119894rsquos posts respectively119901119899

119894and

119899119899119894represent the number of positive emotional words and

the number of negative emotional words user 119894 used whichare included in HowNet Knowledge and sum

119894represents the

total number of words in user 119894rsquos contents

314 Topic Features The things user focuses on may have animpact on actions that user has taken Take openness which isone of the Big Five personality traits as an example it reflectsdegree of intellectual curiosity creativity and a preference fornovelty and variety a person hasTherefore wemined a seriesof userrsquos concerned themes from userrsquos status via LDA (LatentDirichlet Allocation) [39] for predicting userrsquos personalitytraits

Since our purpose is to extract all concerned themes of auser rather than to extract specific themes of each post wemerged all microblogs of a user into one document and thenextracted userrsquos concerned themes namely each documentcorresponded to a user The results of LDA model are shownas follows

DT =[[[[[[

[

DT11 DT12 sdot sdot sdot DT1119899

DT21 DT21 sdot sdot sdot DT2119899

d

DT1198981 DT

1198982 sdot sdot sdot DT119898119899

]]]]]]

]

(2)

whereDT stands for an119898times119899matrix which ismainly used forstorage of distribution of themes and documents 119898 standsfor the number of documents 119899 stands for the number ofthemes and element DT

119894119895(119894 = 1 2 119898 and 119895 = 1 2 119899)

in matrix DT stands for probability that the 119894th documentbelongs to the 119895th theme that is degree of attention whichuser 119894 pays to the 119895th theme

32 Weight Distribution of Features Every feature has adifferent impact on userrsquos personality traits prediction It isof great importance to allocate featuresrsquo weights reasonablyso as to be able to perform good prediction based onlyon scant knowledge of personality traits Kendall test is anonparametric hypothesis test which calculates correlationcoefficient to test statistical dependence of two randomvariables Since values of features and scores of personalitytraits in the dataset we used were numeric therefore inorder to quantify importance of each feature we analyzedrelevance between userrsquos personality traits and characteristicsvia Kendall correlation coefficient in which values of featuresand scores of personality traits were treated as two randomvariables Kendall correlation coefficient is calculated as

120591 (119883 119884) =119862 minus 119863

radic(1198733 minus 1198731) times (1198733 minus 1198732) (3)

where 120591(119883 119884) isin [minus1 1] and if random variables 119883 and 119884have a positive correlation then 120591(119883 119884) = 1 else if randomvariables119883 and 119884 have a negative correlation then 120591(119883 119884) =minus1 else if random variables119883 and 119884 are independent of eachother then 120591(119883 119884) = 0 119862 represents the number of tupleswhose two random variables have consistency 119863 representsthe number of tupleswhose two randomvariables donot haveconsistency and 119873

1and 119873

2represent the total number of

repeated elements in random variables119883 and 119884 respectivelyThe calculation of1198731 is shown as

1198731 =119904

sum

119894=1

12119880119894(119880119894minus 1) (4)

where 119904 denotes the number of elements that have repeatedelements in random variable 119883 and 119880

119894denotes the number

of repeated elements of the 119894th element Similarly 1198732is

calculated as

1198732 =1199041015840

sum

119895=1

12119880119895(119880119895minus 1) (5)

where 1199041015840 denotes the number of elements that have repeatedelements in random variable 119884 and 119880

119895denotes the number

of repeated elements of the 119895th element1198733denotes the total

number of merged sequences which is calculated as

1198733 =12119873(119873minus 1) (6)

Mathematical Problems in Engineering 5

Facebook

Social networkfeatures set

Userrsquosstatus set

Analyze correlations

betweenfeatures

and personality

traits

Linguistic features set

Emotional statistical features set

Topic features set

Assign weights

tofeatures

neighbors set

Analyzecorrelations

betweenpersonality

traits

Initializethresholds of personality

traits

Userrsquos personality traits set

Updatethresholds of personality traits

Calculateerror rates of personality traits

Predict userrsquos personality traits set

Minimum error rates

No

Yes

Get k nearest

Figure 1 The architecture of adaptive conditional probability-based predicting model for userrsquos personality traits

where119873 represents dimensions of tuples Thus according toKendall correlation coefficient between features and person-ality traits importance of the 119894th feature 119891

119894is calculated as

119868 (119891119894) =

sum119901119895isin119875120591 (119891119894 119901119895)

5

(7)

where 120591(119891119894 119901119895) stands for Kendall correlation coefficient

between the 119894th feature 119891119894and the 119895th personality trait 119901

119895and

119875 stands for set of Big Five personality traits which will beintroduced in Section 34 After normalizing importance ofthe 119894th feature 119891

119894 weight of 119891

119894is calculated as follows

119882(119891119894) =

119868 (119891119894)

sum119891119895isin119865119895 =119894

119868 (119891119895)

(8)

where 119865 denotes set of personality traits predicting features

33 A Framework of Predicting Userrsquos Personality TraitsFigure 1 shows the architecture of adaptive conditionalprobability-based predicting model for userrsquos personalitytraits An analysis of correlations between userrsquos personalitytraits and features is conducted before assigning weights tolinguistic emotional statistical and topic features which are

extracted from contents that user produces as well as socialnetwork features Meanwhile correlation analysis of userrsquospersonality traits is carried on Then according to weightsof features 119896 nearest neighbors set is obtained Finallygiven 119896 nearest neighbors set dynamic updated thresholds ofpersonality traits and correlations between personality traitsuserrsquos personality traits set is predicted

34 Algorithm of Adaptive Conditional Probability-BasedUserrsquos Personality Traits Prediction Before we proposed ourpredicting algorithm we conducted experiments to analyzecorrelations between userrsquos personality traits which will beshown in detail later in Section 42 And the results ofcorrelation analysis revealed that there were interdependentrelationships between userrsquos personality traits Thus givendynamic updated thresholds of personality traits as wellas considering interdependencies between personality traitsa novel adaptive correlation-based predicting model wasproposed

In psychology the Big Five personality traits are fivebroad domains or dimensions of personality traits that areused to describe human personality The theory based onthe Big Five factors is called Five Factor Model (FFM) [40]The Big Five factors are extraversion (denoted by EXT)

6 Mathematical Problems in Engineering

neuroticism (denoted by NEU) agreeableness (denoted byAGR) conscientiousness (denoted by CON) and openness(denoted by OPN)

Our method aimed to predict user 119894rsquos Big Five personalitytraits set 119875119879

119894 As stated in [41] here each personality trait

was classified into two degrees namely extraversion wasmapped onto extravert and shy neuroticism was mappedonto neurotic and secure agreeableness was mapped ontofriendly and uncooperative conscientiousness was mappedonto precise and careless and openness was mapped ontoinsightful and unimaginative For convenience we denotedthe two degrees of each personality trait by positive leveland negative level in short respectively First we set initialthreshold of each personality trait as 05 and threshold ofthe 120579th personality trait 119901

120579is denoted by 119879(119901

120579) followed by

calculating distance between user 119894 and other users with (9)as follows

Dis (119894 119895) =119898

sum

ℎ=1119882(119891ℎ) times1003816100381610038161003816119865 (119894 119891ℎ) minus 119865 (119895 119891ℎ)

1003816100381610038161003816 (9)

where Dis(119894 119895) presents distance between user 119894 and user 119895119865(119894 119891

ℎ) and 119865(119895 119891

ℎ) present the ℎth feature 119891

ℎrsquos value of user

119894 and user 119895 and119898 presents the number of featuresSecondly we sorted user 119894rsquos distance set in ascending

order and selected top 119896 users as user 119894rsquos neighbors (denotedas119873(119894)) Then for each 119901

120579 we recorded the number of users

who had positive level of 119901120579in119873(119894) as 119899

119901120579

Thirdly we selected a 119901

120579from unvisited personality traits

set UP if user 119894rsquos personality traits set PT119894was empty and then

we calculated probability that user 119894 had positive level of 119901120579as

follows

119891 (119894 119901120579) =

119875 (119867119901120579

| 119862119899119901120579

119901120579

)

119875 (sim 119867119901120579

| 119862119899119901120579

119901120579

)

(10)

where119862119899119901120579

119901120579

presents event that there are exactly119862119899119901120579

119901120579

instanceshaving positive level of 119901

120579in119873(119894) whose 119896 nearest neighbors

also have 119899119901120579

users with positive level of 119901120579 119867119901120579

and sim 119867119901120579

present event that user 119894 has positive level of 119901120579and event that

user 119894 has negative level of 119901120579 respectively And if user 119894rsquos per-

sonality traits set PT119894was not empty considering correlations

between personality traits we calculated probability that user119894 had positive level of 119901

120579on the basis of prior knowledge

about correlations between119901120579and personality traits that have

already been visited in UP as follows

119891 (119894 119901120579) =

119875 (119867119901120579

| 1199011198971 1199011198972 ) 119875 (119862

119899119901120579

119901120579

| 119867119901120579

)

119875 (sim 119867119901120579

| 1199011198971 1199011198972 ) 119875 (119862

119899119901120579

119901120579

| sim 119867119901120579

)

(11)

where 1199011198971 1199011198972 isin PT

119894present personality traits which have

already been visited in UP If 119891(119894 119901120579) was greater than or

equal to 119879(119901120579) then positive level of 119901

120579was added to PT

119894

else negative level of 119901120579was added to PT

119894

And then we calculated error rate of each personalitytrait and error rate of 119901

120579is denoted by Err(119901

120579) which is

calculated as

Err (119901120579) = 1minus

119903120579

to (12)

where 119903120579denotes the number of instances which are classified

correctly to denote the total number of instancesFinally if unvisited personality traits set UP was empty

and error rate of each personality trait fell within a specifiedrange then algorithm was terminated else if unvisited per-sonality traits set UP was not empty then we continued topredict another personality trait in UP else if there was apersonality trait 119901

120579whose error rate did not reach defined

limits then we updated 119879(119901120579) with (12) and added it to

personality traits set UP to predict it again

119879 (119901120579) = 119879 (119901

120579) + 120576 timesErr (119901

120579) (13)

where 120576 denotes a monotonic decreasing learning rate Ouralgorithm can now be defined as Algorithm 1 Assumethat size of dataset is 119879119903 dimension of features is 119889 andsize of the nearest neighbors set is 119896 The complexity ofconditional probability-based predicting model for userrsquospersonality traits is analyzed as follows From step (10) tostep (12) computation time of Kendall correlation coefficientbetween features and personality traits is taking119874(1198791199032) timecalculating weights of features takes 119874(119889) time and thendistributing weights to features can be computed in 119874(119889119879119903)time the total time is119874(1198791199032) as dimension of features 119889 is farless than size of dataset119879119903 Calculating distance between user119894 and other users from step (13) to step (15) will take 119874(119889119879119903)time In step (16) it will take 119874(119879119903log2119879119903) time for sortingset of distances between user 119894 and other users Step (18) tostep (20) take 119874(119896) time to select 119896 nearest neighbors of user119894 If algorithm goes from step (23) to step (25) it will take119874(1198962

) time Else if algorithm goes from step (26) to step (28)it will take 119874(1198791199031198962) time It takes 119874(119879119903) time from step (37)to step (45) Assume that the number of iterations is 119905 Sincesteps (10) (11) and (12) can be done offline and do not needto be calculated repeatedly every time hence from step (13)to step (52) the overall online complexity of our proposedmethod is 119874(119879119903log

2119879119903) as 119896 and 119905 are far less than 119879119903 From

here we see that our proposed method is feasible in a big dataenvironment as in Facebook monitoring

4 Experimental Evaluation

In this section we first describe dataset used in our experi-ments And then we analyze the interdependent relationshipsbetween userrsquos personality traits Finally we conduct experi-ments on different kinds of features and make a comparisonwith other methods based on the same dataset

41 Dataset MyPersonality (httpmypersonalityorg) is apopular Facebook application which is a 336-question testmeasuring the 30 facets underlying the Big Five traits thatallows users to take real psychometric tests The respondentswho come from various age groups backgrounds and cul-tures are highly motivated to answer honestly and carefullyAdditionally usersrsquo psychological and Facebook profilesare ldquoverifiedrdquo by individualsrsquo social circles hence it is hardto lie to the people that know you best Also there is nopressure and one does not have to share hisher profileinformation thus myPersonality avoids deliberate faking

Mathematical Problems in Engineering 7

Input user 119894 users set 119880 size of the nearest neighbors set 119896 unvisited personality traits set UPOutput user 119894rsquos personality traits set PT

119894

(1) 119891 larr 119891119886119897119904119890

lowast 119891 stands for a flag if any of the thresholds of personality traits is updated then 119891 equals to trueotherwise 119891 equals to false lowast

(2) For each personality trait 119901120579isin UP do

(3) set temporary error rate Temp(119901120579) as 0

(4) End for(5) For each personality trait 119901

120579isin UP do

(6) set initial threshold 119879(119901120579) as 05

(7) End for(8) PT

119894larr 0

(9) UPlarr all of the Big Five personality traits(10) For each feature 119891

ℎisin user 119894rsquos feature set do

(11) calculate 119891ℎrsquos weight according to (8)

(12) End for(13) For user 119895 isin 119880 and 119895 = 119894 do(14) calculate Dis(119894 119895) according to (9)(15) End for(16) sort Dis(119894 1)Dis(119894 2) in ascending order(17) 119873(119894) larr select top 119896 users in Dis(119894 1)Dis(119894 2) (18) For each personality trait 119901

120579isin UP do

(19) 119899119901120579

larr the number of users who have positive level of personality trait 119901120579in119873(119894)

(20) End for(21) While UP = 0 do(22) 119901

120579larr UPpoll()

(23) If PT119894= 0 then

(24) calculate 119891(119894 119901120579) according to (10)

(25) End if(26) Else if PT

119894= 0 then

(27) calculate 119891(119894 119901120579) according to (11)

(28) End if(29) If 119891(119894 119901

120579) ge 119879(119901

120579) then

(30) 119875119879119894larr positive level of 119901

120579

(31) End if(32) Else if 119891(119894 119901

120579) lt 119879(119901

120579) then

(33) PT119894larrnegative level of 119901

120579

(34) End if(35) End while(36) 119891 larr 119891119886119897119904119890

(37) For each personality trait 119901120579isin PT119894do

(38) calculate 119901120579rsquos error rate Err(119901

120579) according to (12)

(39) If |Err(119901120579) minus Temp(119901

120579)| gt 120575 then 120575 denotes a constant which is small enough

(40) 119891 larr 119905119903119906119890

(41) Temp(119901120579) larr Err(119901

120579)

(42) update 119901120579rsquos threshold 119879(119901

120579) according to (13)

(43) UPlarr 119901120579

(44) End if(45) End for(46) If 119891 == 119905119903119906119890 then(47) Go to step (21)(48) End if(49) Else if 119891 == 119891119886119897119904119890 then(50) Go to step (52)(51) End if(52) Return PT

119894

Algorithm 1 Adaptive conditional probability-based predicting model for userrsquos personality traits

8 Mathematical Problems in Engineering

of data With consent usersrsquo psychological and Facebookprofiles are recorded for research purposes Currently thedatabase contains more than 6000000 test results togetherwith more than 4000000 individual Facebook profilesIn this paper we adopted the dataset that was provided inthe Workshop on Computational Personality Recognition(Shared Task) (httpmypersonalityorgwikilibexefetchphpmedia=wikimypersonality finalzip) [35] The datasetwas a subset of myPersonality database including 250users which had both information about personality traits(annotated with usersrsquo personality scores and gold standardpersonality labelswhichwere self-assessments obtained usinga 100-item long version of the IPIP personality questionnaire(httpipiporiorgnewFinding Labeling IPIP Scaleshtm))and social network structure (network size betweennesscentrality density brokerage and transitivity) along withtheir 9900 status updates in raw text With the aid of thisstandard dataset we can compare the performance of ourproposed method with othersrsquo personality recognitionsystems on a common benchmark

42 Analysis of Correlations between Personality Traits Pre-vious works were usually predicting userrsquos personality traitswithout considering interdependencies between them In thiscontext we investigatedwhether there had been relationshipsbetween userrsquos personality traits Since not all users had thesame level of interactions consequently we first groupedusers according to different degrees of userrsquos personalitytraits As an example if a user has positive level of agree-ableness another user has positive level of agreeableness aswell then they will be divided into a group else if a user haspositive level of agreeableness another user has negative levelof agreeableness and they will not be divided into a groupThen we explored cooccurrences between personality traitsin each group simultaneously which is shown in Figure 2

It can be observed intuitively that a significant proportionof users have negative level of extraversion positive levelof openness or positive level of agreeableness It is alsonoteworthy that positive level of extraversion has less overlapwith negative level of openness negative level of consci-entiousness and negative level of agreeableness Besidespositive level of neuroticism has less overlap with positivelevel of conscientiousness positive level of agreeableness andnegative level of openness Furthermore there are bits ofusers that have positive level of extraversion and positive levelof neuroticism simultaneously

However the above cooccurrences may be due to the apriori statistics of each trait For instance if there are moreusers with negative level of neuroticism positive level ofopenness and positive level of agreeableness than othersit is more likely to have more users with cooccurrencebetween negative level of neuroticism and positive level ofopenness negative level of neuroticism and positive level ofagreeableness and positive level of openness and positivelevel of agreeableness than others as it is shown in Figure 2Therefore since users were grouped according to differentdegrees of userrsquos personality traits in order to investigatewhether the above phenomenon meant that there werepositive or negative correlations between different degrees

of personality traits in each group we treated scores ofa certain personality trait according to which users weredivided into the group and scores of another personalitytrait with a certain degree as two random variables followedby employing Kendall correlation coefficient which wascalculated in (3) to analyze real correlations between themThe results are presented in Table 1

As it can be seen from Table 1 we can observe anegative correlation between negative level of extraversionand positive level of neuroticism in other words people thatscore high on extraversion have less possibility to score highon neuroticism as well In psychology a person with negativelevel of extraversion can be explained as solitary and a personwith negative level of conscientiousnessmeans being carelessConsequently negative level of extraversion and negativelevel of conscientiousness are not in line with positive level ofneuroticism which tends to experience unpleasant emotionseasily Another factor is social desirability Moreover agree-ableness extraversion conscientiousness and openness aremore or less ldquodesirablerdquo whereas neuroticism is quite clearlynegative Thus generally there are negative correlationsbetween neuroticism and other personality traits

What is more it is inconsistent with Figure 2 that thereis a positive correlation between positive level of neuroticismand positive level of extraversion It may be explained thata person with positive level of extraversion tends to beenthusiastic and talkative which is compatible with positivelevel of neuroticism In addition in line with Figure 2since negative level of openness shows cautiousness it hasa positive correlation with negative level of extraversionand positive level of agreeableness which is a tendency tobe compassionate However there is a positive correlationbetween negative level of openness and positive level ofneuroticism which is not in keeping with Figure 2

Since there are some contradictions between Figure 2and Table 1 we leveraged Jensen-Shannon divergence [45]to further analyze correlations between userrsquos personalitytraits Jensen-Shannon divergence between two probabilitydistributions is calculated as

119863JS (119875 119876) =12(119863KL (119875 119877) +119863KL (119876 119877)) (14)

where 119875 and 119876 denote two probability distributions 119877denotes an average distribution of 119875 and 119876 and 119863KL(119875 119877)denotes Kullback-Leibler divergence [46] between 119875 and 119877which is calculated with

119863KL (119875 119877) = sum119894

119875 (119894) log 119875 (119894)119877 (119894)

(15)

where 119875(119894) and 119877(119894) denote the 119894th value of 119875 and 119877 Theresults are presented in Table 2

In keeping with Figure 2 and Table 1 Jensen-Shannondivergences between negative level of neuroticism and posi-tive level of agreeableness positive level of conscientiousnessand positive level of openness are larger It is because ofthe fact that a person with negative level of neuroticismshows confidence on everything and positive level of open-ness reflects degree of intellectual curiosity creativity and

Mathematical Problems in Engineering 9

Table 1 Kendall correlation coefficients between personality traits 119901 and 119899 stand for positive level and negative level of a certain personalitytrait The bold number is the highest Kendall correlation coefficients and the italic number is the lowest Kendall correlation coefficients Thebigger Kendall correlation coefficient is the better the consistency is

EXT NEU AGR CON OPN119901 119899 119901 119899 119901 119899 119901 119899 119901 119899

EXT 119901 022 minus003 009 minus015 minus006 minus013 007 001119899 minus023 minus018 004 004 004 018 007 022

NEU 119901 minus009 minus017 010 minus021 minus003 027119899 minus014 minus013 002 minus001 minus006 minus010

AGR 119901 minus007 007 016 029119899 minus008 016 006 002

CON 119901 minus001 minus010119899 007 minus007

OPN 119901

119899

PEXT and PNEUPEXT and PAGR

PEXT and PCONPEXT and POPNPEXT and NNEUPEXT and NAGRPEXT and NCONPEXT and NOPNPNEU and PAGR

PNEU and PCONPNEU and POPNPNEU and NEXTPNEU and NAGRPNEU and NCONPNEU and NOPNPAGR and PCONPAGR and POPNPAGR and NEXTPAGR and NNEUPAGR and NCONPAGR and NOPNPCON and POPNPCON and NEXTPCON and NNEUPCON and NAGRPCON and NOPNPOPN and NEXTPOPN and NNEUPOPN and NAGRPOPN and NCONNEXT and NNEUNEXT and NAGRNEXT and NCONNEXT and NOPNNNEU and NAGRNNEU and NCONNNEU and NOPNNAGR and NCONNAGR and NOPNNCON and NOPN

Proportion ()

000 500 1000 1500 2000 2500 3000 3500 4000 4500 5000()

Figure 2The proportion of users who have a certain pair of personality traits simultaneously PEXT PNEU PAGR PCON and POPN standfor positive level of extraversion neuroticism agreeableness conscientiousness and openness NEXT NNEU NAGR NCON and NOPNstand for negative level of extraversion neuroticism agreeableness conscientiousness and openness

10 Mathematical Problems in Engineering

Table 2 Jensen-Shannon divergence between personality traits 119901and 119899 stand for positive level and negative level of a certain personal-ity traitThe bold number is the highest Jensen-Shannon divergenceand the italic number is the lowest Jensen-Shannon divergenceSmaller Jensen-Shannon divergence means better consistency

EXT NEU AGR CON OPN119901 119899 119901 119899 119901 119899 119901 119899 119901 119899

EXT 119901 058 1479 044 219 070 236 073 084119899 479 449 521 236 561 191 1043 211

NEU 119901 080 212 110 246 298 040119899 1978 294 1697 305 2473 361

AGR 119901 077 329 082 091119899 340 086 463 125

CON 119901 115 117119899 616 140

OPN 119901

119899

a preference for novelty and variety a person has alongwith positive level of agreeableness and positive level ofconscientiousness which are compatible Furthermore it isin conformity with Table 1 that positive level of extraversionhas smaller Jensen-Shannon divergences with positive levelof neuroticism and positive level of agreeableness besidespositive level of neuroticism has a smaller Jensen-Shannondivergence with negative level of openness

In addition Soto et al [29] demonstrated convergent anddiscriminate correlations for the Big Five Inventory Soto andJohn [30] also illustrated that there was strong convergencebetween each BFI facet scale and corresponding NEO PI-Rfacet Moreover Park et al [28] demonstrated convergencewith self-reports of personality at the domain- and facet-levelIn summary although utilizing different Big Five measuresit can be found that correlations in myPersonality datasetare also in keeping with the findings from other samplesHence correlations between personality traits are expectedand actually prove that myPersonality results are validSo leveraging prior knowledge about relationships betweenpersonality traits may contribute to a better prediction onpersonality traits The results provide a better theoreticalsupport for our proposed method

43 Evaluation Metric Since most researchers exploringon personality recognition leveraged different measures toevaluate their experiments on various datasets it was hard tocompletely appraise their performance and quality Howevera common dataset which was annotated with gold standardpersonality labels was available in the Workshop on Com-putational Personality Recognition (Shared Task) users werefree to split the training and test sets as they wish andprecision recall and 1198651-measure were suggested to be usedto evaluate results of predictions So in this paper for morereliable results the proposed method was also evaluated withstratified 5-fold cross-validation Since there was no trainingset in unsupervised learning the common dataset was splitinto folds and in order to avoid overfitting the same author

Table 3 Precision recall and 1198651-measure of adaptive conditionalprobability-based predicting model (in bold best performance)

Precision Recall 1198651-measureEXT 093 076 083NEU 086 067 075AGR 096 074 083CON 089 070 078OPN 091 079 084Mean 091 0732 0806

did not appear twofold at the same timeThemean precisionrecall and 1198651-measure are calculated as follows

Precision = tptp + fp

Recall = tptp + fn

1198651-measure = 2 times 119875 times 119877119875 + 119877

(16)

where tp stands for the number of positive samples thatare classified correctly fp stands for the number of positivesamples that are classified incorrectly and fn stands for thenumber of negative samples that are classified incorrectly

44 Analysis of Impacts of Different Factors First we carriedon experiments with our proposed method Scores of preci-sion recall and 1198651-measure are shown in Table 3 includingmean scores of all personality traits

441 Impacts of Different Categories of Feature Sets In thissection we conducted experiments on social network fea-tures linguistic features emotional statistical features topicfeatures and all features respectively Due to space restric-tions Figure 3 only illustrates 1198651-measure of all personalitytraits along with mean 1198651-measure of them

It can be observed that experiments which are conductedon linguistic features emotional statistical features and topicfeatures result in unsatisfactory performance with respectto social network features and social network features havethe best classification performance for extraversion which isconsistent with the conclusion in [33] Since social networkfeatures reflect a userrsquos pattern of social status and habitswhich are relatively important decisive factors in userrsquospersonality traits as a result they provide more informationthan other types of features Nevertheless whether more orless each kind of characteristic provides unique informationin userrsquos personality traits prediction Therefore carryingon experiment with combination of social network featureslinguistic features emotional statistical features and topicfeatures can achieve better performance

442 Impact of Weighted Features Additionally we con-ducted experiments on unweighted features Here we onlypresented the most successful results in Table 4 includingmean scores of all personality traits

Mathematical Problems in Engineering 11

1

featuresEssential

featuresLinguistic

featuresEmotional

featuresTopic

featuresAll

Categories of features

EXTNEUAGR

CONOPNMEAN

09

08

07

06

05

04

03

F-s

core

s

Figure 3 1198651-measures of five personality traits along with mean1198651-measures of them with different categories of feature sets

Table 4 Precision recall and 1198651-measure of predictingmodel withunweighted features (in bold best performance)

Precision Recall 1198651-measureEXT 085 076 082NEU 092 068 076AGR 094 064 076CON 091 057 068OPN 087 078 083Mean 090 0686 077

Table 5 Precision recall and 1198651-measure of predictingmodel withunified thresholds (in bold best performance)

Precision Recall 1198651-measureEXT 062 054 057NEU 080 016 025AGR 057 073 063CON 057 077 060OPN 070 100 082Mean 0652 064 0574

From Tables 3 and 4 we can observe that the best resultsfor myPersonality dataset are achieved after distributingweights to features It illustrates that considering correlationsbetween features and userrsquos personality traits can bring per-formance gain to the prediction task since there are limitedfeatures that may remain useful for userrsquos personality traitsprediction

443 Impact of Dynamic Updated Thresholds of PersonalityTraits In addition we predicted userrsquos personality traits withunified thresholds as well The results are summarized inTable 5

Multilabel learning task is to predict one or more cat-egories for each instance In the existing algorithms suchas PT5 method proposed by Tsoumakas and Katakis [34]

Table 6 Precision of different methods on Facebook dataset Thebest performance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 093 086 096 089 091 0910Verhoevenet al [42] SVM 079 071 067 072 087 0752

Farnadi etal [33]

SVM kNNNB 058 054 050 055 060 0554

Alam et al[43]

SVM BLRmNB 058 059 059 059 060 0590

Tomlinsonet al [44] LR NA NA NA NA NA NA

generally thresholdswere unified to the same valueHoweverall categories employing a unified minimum threshold arenot appropriate If threshold is set too high not all categoriestags will be predicted on the other hand if threshold is settoo low too many classes will be predicted in results Hencein this paper we updated thresholds of personality traitsdynamically according to error rates of them Comparedto Table 3 based on thresholds obtained through updatingdynamically our method can achieve better results

45 Performance EvaluationwithDifferentMethods Asmen-tioned in Section 41 in the Workshop on ComputationalPersonality Recognition (Shared Task) different systems forpersonality recognition from text and network features ona common benchmark have been compared Verhoeven etal [42] proposed a meta-learning approach which can beextended to certain component classifiers from other genreswith other class systems or even from other languagesFarnadi et al [33] leveraged SVM 119896 nearest neighbors(kNN) [17] and Naıve Bayes (NB) for automatic recognitionof personality traits from usersrsquo Facebook statues updatesrespectively even with a small set of training dataset it couldachieve better results than most baseline algorithms On thebasis of a set of features extracted from Facebook datasetAlam et al [43] explored suitability and performance ofseveral classification techniques which were SVM BayesianLogistic Regression (BLR) [47] andMultinomial Naıve Bayes(mNB) [48] Tomlinson et al [44] used ranking algorithmsfor feature selection and Logistic Regression (LR) [49] aslearning algorithms which achieved a high performance Inthis paper we tested our proposed method with the sameset than the participants in the Workshop on ComputationalPersonality Recognition (Shared Task) In terms of precisionrecall and 1198651-measure a comparison of the works describedabove as well as our proposed method (denoted by CP) issummarized in Tables 6 7 and 8

From Tables 6 7 and 8 the experimental results indicatethat openness is the easiest personality trait to be predictedby Facebook features with the above methods because thereare more users with positive level of openness than otherpersonality traits in Facebook dataset Since Verhoeven et al[42] trained an ensemble SVM model based on both Face-book and Essays personality traits dataset as a consequence

12 Mathematical Problems in Engineering

Table 7 Recall of different methods on Facebook dataset The bestperformance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 076 067 074 070 079 0732Verhoevenet al [42] SVM 079 072 068 072 087 0756

Farnadi etal [33]

SVMkNN NB 061 053 050 054 070 0576

Alam et al[43]

SVMBLR mNB 058 058 059 059 060 0588

Tomlinsonet al [44] LR NA NA NA NA NA NA

Table 8 1198651-measure of different methods on Facebook datasetThebest performance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 083 075 083 078 084 0806Verhoevenet al [42] SVM 079 070 067 072 086 0748

Farnadi etal [33]

SVMkNN NB 056 052 050 054 061 0546

Alam et al[43]

SVMBLR mNB 058 058 058 059 060 0586

Tomlinsonet al [44] LR NA NA NA NA NA 0630

when tested on the same Facebook dataset it can achievebetter results than our proposed method Furthermorewhen trained and tested on the same Facebook dataset ourmethod outperforms the methods in [33 43 44] which donot take weighted features dynamic updated thresholds ofpersonality traits and correlations between personality traitsinto consideration

5 Conclusion

In this paper we studied the problem of exploiting interde-pendencies between userrsquos personality traits for predictinguserrsquos personality traits First after analyzing importance offeatures we conducted experiments on Facebook datasetto demonstrate the existence of correlations between userrsquospersonality traits Bearing this in mind an unsupervisedframework adaptive conditional probability-based predict-ing model was then proposed to predict userrsquos Big Fivepersonality traits based on importance of features dynamicupdated thresholds of personality traits and prior knowledgeabout correlations between personality traits Furthermorewe compared our results with the ones achieved by othersin the Workshop on Computational Personality Recognition(Shared Task) on the same dataset In general the experimen-tal results demonstrated the effectiveness of our proposedframework

In future work we will speculate on what directions canbe undertaken to ameliorate its performance with respect

to time complexity so as to better apply it to a big dataenvironment as in Facebook monitoring Besides in orderto make our framework applicable to dynamic networksbetter we will explore combining time series analysis withpersonality traits predicting algorithm to capture dynamicevolution process of information and network structureFurthermore since social network features (described inSection 311) are specific for our dataset they may be difficult(or even impossible) to obtain in another dataset hencewe will conduct experiments on other datasets such as theTweeter corpus of PAN shared task (httppanwebisde)on author profiling where also personality is considered tofurther validate the effectiveness of the proposed frameworkbased on different features

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the National Natural ScienceFoundation of China under Grant no 61300148 the Scientificand Technological Break-Through Program of Jilin ProvinceunderGrant no 20130206051GX the Science andTechnologyDevelopment Program of Jilin Province under Grant no20130522112JH the Science Foundation for China Postdoc-tor under Grant no 2012M510879 and the Basic Scien-tific Research Foundation for the Interdisciplinary Researchand Innovation Project of Jilin University under Grant no201103129

References

[1] M Oussalah F Bhat K Challis and T Schnier ldquoA softwarearchitecture for Twitter collection search and geolocationservicesrdquo Knowledge-Based Systems vol 37 pp 105ndash120 2013

[2] R R McCrae and P T Costa ldquoValidation of the 5-factor modelof personality across instruments and observersrdquo Journal ofPersonality and Social Psychology vol 52 no 1 pp 81ndash90 1987

[3] G Saucier and L R Goldberg ldquoThe structure of personalityattributesrdquo in Personality and Work Reconsidering the Role ofPersonality in Organizations M R Barrick and A M RyanEds pp 1ndash29 Jossey-Bass 2003

[4] B Caci M Cardaci M E Tabacchi and F Scrima ldquoPersonalityvariables as predictors of facebook usagerdquoPsychological Reportsvol 114 no 2 pp 528ndash539 2014

[5] M Kosinski Y Bachrach P Kohli D Stillwell and T GraepelldquoManifestations of user personality in website choice andbehaviour on online social networksrdquoMachine Learning vol 95no 3 pp 357ndash380 2014

[6] M-G Cojocaru H Thille E Thommes D Nelson and SGreenhalgh ldquoSocial influence and dynamic demand for newproductsrdquoEnvironmentalModellingamp Software vol 50 pp 169ndash185 2013

[7] F Durupinar N Pelechano J Allbeck U Gudukbay andN Badler ldquoThe impact of the OCEAN personality modelon the perception of crowdsrdquo IEEE Computer Graphics andApplications vol 31 no 3 pp 22ndash31 2011

Mathematical Problems in Engineering 13

[8] X Feng ldquoStudy on customer personality characteristics andrelationship outcomes using SEM analysisrdquo in Proceedings ofthe International Conference on Mechatronics and InformationTechnology pp 841ndash844 2013

[9] J G Lee SMoon andK Salamatian ldquoModeling and predictingthe popularity of online contents with Cox proportional hazardregression modelrdquo Neurocomputing vol 76 no 1 pp 134ndash1452012

[10] J W Stoughton L F Thompson and A W Meade ldquoBigfive personality traits reflected in job applicantsrsquo social mediapostingsrdquo Cyberpsychology Behavior and Social Networkingvol 16 no 11 pp 800ndash805 2013

[11] D A Cobb-Clark and S Schurer ldquoThe stability of big-fivepersonality traitsrdquo Economics Letters vol 115 no 1 pp 11ndash152012

[12] W Mischel ldquoToward an integrative science of the personrdquoAnnual Review of Psychology vol 55 pp 1ndash22 2004

[13] S Argamon S Dhawle M Koppel and J Pennebaker ldquoLexicalpredictors 19 of personality typerdquo in Proceedings of the JointAnnual Meeting of the Interface and the Classification Society ofNorth America 2005

[14] F Mairesse M AWalker M R Mehl and R K Moore ldquoUsinglinguistic cues for the automatic recognition of personality inconversation and textrdquo Journal of Artificial Intelligence Researchvol 30 pp 457ndash500 2007

[15] K Moore and J C McElroy ldquoThe influence of personalityon Facebook usage wall postings and regretrdquo Computers inHuman Behavior vol 28 no 1 pp 267ndash274 2012

[16] J W Pennebaker and L A King ldquoLinguistic styles languageuse as an individual differencerdquo Journal of Personality and SocialPsychology vol 77 no 6 pp 1296ndash1312 1999

[17] J Han and M Kamber Data Mining Concepts and TechniquesThe Morgan Kaufmann Series in Data Management Systemsedited by J Gray Morgan Kaufmann Boston Mass USA 3rdedition 2011

[18] J Golbeck C Robles and K Turner ldquoPredicting personalitywith social mediardquo in Proceedings of the 29th Annual CHIConference on Human Factors in Computing Systems (CHI rsquo11)pp 253ndash262 ACM May 2011

[19] J RQuinlan ldquoLearningwith continuous classesrdquo inProceedingsof the 5thAustralian Joint Conference onArtificial Intelligence (Airsquo92) pp 343ndash348 Hobart Australia November 1992

[20] S Roweis and Z Ghahramani ldquoA unifying review of lineargaussian modelsrdquo Neural Computation vol 11 no 2 pp 305ndash345 1999

[21] S Bai T Zhu and L Cheng ldquoBig-five personality pre-diction based on user behaviors at social network sitesrdquohttparxivorgpdf12044809v1pdf

[22] J Oberlander and S Nowson ldquoWhose thumb is it anywayclassifying author personality from weblog textrdquo in Proceedingsof the 44th Annual Meeting of the Association for ComputationalLinguistics pp 627ndash634 2006

[23] T Nguyen D Q Phung B Adams and S Venkatesh ldquoTowardsdiscovery of influence and personality traits through social linkpredictionrdquo in Proceedings of the International Conference onWeblogs and Social Media pp 566ndash569 Barcelona Spain July2011

[24] S T Bai B B Hao A Li S Yuan R Gao and T S ZhuldquoPredicting big five personality traits of microblog usersrdquo inProceedings of the 12th IEEEWICACM International JointConferences on Web Intelligence and Intelligent Agent Technolog(WI rsquo13) pp 501ndash508 November 2013

[25] R Sun and N Wilson ldquoA model of personality should be acognitive architecture itselfrdquo Cognitive Systems Research vol29-30 no 1 pp 1ndash30 2014

[26] A Ortigosa R M Carro and J I Quiroga ldquoPredicting userpersonality by mining social interactions in Facebookrdquo Journalof Computer and System Sciences vol 80 no 1 pp 57ndash71 2014

[27] O P John L P Naumann and C J Soto ldquoParadigm shift tothe integrative Big Five trait taxonomy history measurementand conceptualissuesrdquo in Handbook of Personality Theory andResearch O P John R W Robins and L A Pervin Eds pp114ndash158 Guilford Press New York NY USA 3rd edition 2008

[28] G Park H A Schwartz J C Eichstaedt et al ldquoAutomaticpersonality assessment through social media languagerdquo Journalof Personality and Social Psychology vol 108 no 6 pp 934ndash9522015

[29] C J Soto O P John S D Gosling and J Potter ldquoAge differencesin personality traits from 10 to 65 big five domains and facets ina large cross-sectional samplerdquo Journal of Personality and SocialPsychology vol 100 no 2 pp 330ndash348 2011

[30] C J Soto and O P John ldquoTen facet scales for the BigFive Inventory convergence with NEO PI-R facets self-peeragreement and discriminant validityrdquo Journal of Research inPersonality vol 43 no 1 pp 84ndash90 2009

[31] M V Kosti R Feldt and L Angelis ldquoPersonality emotionalintelligence and work preferences in software engineering anempirical studyrdquo Information and Software Technology vol 56no 8 pp 973ndash990 2014

[32] H A Schwartz J C Eichstaedt M L Kern et al ldquoPersonalitygender and age in the language of social media the open-vocabulary approachrdquo PLoS ONE vol 8 no 9 Article IDe73791 2013

[33] G Farnadi S ZoghbiM FMoens andMD Cock ldquoRecognis-ing personality traits using facebook status updatesrdquo in Proceed-ings of the Workshop on Computational Personality RecognitionJuly 2013 httpcliccimecunitnitfabiowcpr13farnadi wcpr-13pdf

[34] G Tsoumakas and I Katakis ldquoMulti-label classification anoverviewrdquo International Journal of Data Warehousing and Min-ing vol 3 no 3 pp 1ndash13 2007

[35] M Kosinski D Stillwell and T Graepel ldquoPrivate traits andattributes are predictable from digital records of human behav-iorrdquo Proceedings of the National Academy of Sciences of theUnited States of America vol 110 no 15 pp 5802ndash5805 2013

[36] M Belkin and P Niyogi ldquoLaplacian eigenmaps for dimension-ality reduction and data representationrdquo Neural Computationvol 15 no 6 pp 1373ndash1396 2003

[37] R Socher J Bauer C D Manning and Y N Andrew ldquoParsingwith compositional vector grammarsrdquo in Proceedings of the 51stAnnual Meeting of the Association for Computational Linguistics(ACL rsquo13) pp 455ndash465 Sofia Bulgaria August 2013

[38] K Kazama M Imada and K Kashiwagi ldquoCharacteristics ofinformation diffusion in blogs in relation to information sourcetyperdquo Neurocomputing vol 76 no 1 pp 84ndash92 2012

[39] M B David Y N Andrew and I J Michael ldquoLatent Dirichletallocationrdquo Journal ofMachine Learning Research vol 3 no 4-5pp 993ndash1022 2003

[40] P T Costa andR RMcCraeRevisedNEOPersonality Inventory(NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI) Profes-sional Manual Psychological Assessment Resources 1992

[41] R L Atkinson C A Richard E S Edward J B Daryland S Nolen-Hoeksema Hilgardrsquos Introduction to Psychology

14 Mathematical Problems in Engineering

Harcourt College Publishers Orlando Fla USA 13th edition2000

[42] B Verhoeven W Daelemans and T D Smedt ldquoEnsemblemethods for personality recognitionrdquo in Proceedings of theWorkshop on Computational Personality Recognition pp 35ndash38Boston Mass USA July 2013

[43] F Alam A Stepanov and G Riccardi ldquoPersonality traitsrecognition on social networkmdashfacebookrdquo in Proceedings of theWorkshop on Computational Personality Recognition pp 6ndash9Boston Mass USA July 2013

[44] M T Tomlinson D Hinote and D B Bracewell ldquoPredictingconscientiousness through semantic analysis of facebook postsrdquoin Proceedings of the Workshop on Computational PersonalityRecognition pp 31ndash34 July 2013

[45] D M Endres and J E Schindelin ldquoA newmetric for probabilitydistributionsrdquo IEEETransactions on InformationTheory vol 49no 7 pp 1858ndash1860 2003

[46] S Kullback and R A Leibler ldquoOn information and sufficiencyrdquoThe Annals of Mathematical Statistics vol 22 no 1 pp 79ndash861951

[47] A Genkin D D Lewis and D Madigan ldquoLarge-scale Bayesianlogistic regression for text categorizationrdquo Technometrics vol49 no 3 pp 291ndash304 2007

[48] A K McCallum and K Nigam ldquoA comparison of event modelsfor naive bayes text classificationrdquo in Proceedings of the AAAI-98 Workshop on Learning for Text Categorization pp 137ndash142Madison Wis USA July 1998

[49] I Ruczinski C Kooperberg and M LeBlanc ldquoLogic regres-sionrdquo Journal of Computational and Graphical Statistics vol 12no 3 pp 475ndash511 2003

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 5: Research Article A Novel Adaptive Conditional Probability ...downloads.hindawi.com/journals/mpe/2015/472917.pdf · Research Article A Novel Adaptive Conditional Probability-Based

Mathematical Problems in Engineering 5

Facebook

Social networkfeatures set

Userrsquosstatus set

Analyze correlations

betweenfeatures

and personality

traits

Linguistic features set

Emotional statistical features set

Topic features set

Assign weights

tofeatures

neighbors set

Analyzecorrelations

betweenpersonality

traits

Initializethresholds of personality

traits

Userrsquos personality traits set

Updatethresholds of personality traits

Calculateerror rates of personality traits

Predict userrsquos personality traits set

Minimum error rates

No

Yes

Get k nearest

Figure 1 The architecture of adaptive conditional probability-based predicting model for userrsquos personality traits

where119873 represents dimensions of tuples Thus according toKendall correlation coefficient between features and person-ality traits importance of the 119894th feature 119891

119894is calculated as

119868 (119891119894) =

sum119901119895isin119875120591 (119891119894 119901119895)

5

(7)

where 120591(119891119894 119901119895) stands for Kendall correlation coefficient

between the 119894th feature 119891119894and the 119895th personality trait 119901

119895and

119875 stands for set of Big Five personality traits which will beintroduced in Section 34 After normalizing importance ofthe 119894th feature 119891

119894 weight of 119891

119894is calculated as follows

119882(119891119894) =

119868 (119891119894)

sum119891119895isin119865119895 =119894

119868 (119891119895)

(8)

where 119865 denotes set of personality traits predicting features

33 A Framework of Predicting Userrsquos Personality TraitsFigure 1 shows the architecture of adaptive conditionalprobability-based predicting model for userrsquos personalitytraits An analysis of correlations between userrsquos personalitytraits and features is conducted before assigning weights tolinguistic emotional statistical and topic features which are

extracted from contents that user produces as well as socialnetwork features Meanwhile correlation analysis of userrsquospersonality traits is carried on Then according to weightsof features 119896 nearest neighbors set is obtained Finallygiven 119896 nearest neighbors set dynamic updated thresholds ofpersonality traits and correlations between personality traitsuserrsquos personality traits set is predicted

34 Algorithm of Adaptive Conditional Probability-BasedUserrsquos Personality Traits Prediction Before we proposed ourpredicting algorithm we conducted experiments to analyzecorrelations between userrsquos personality traits which will beshown in detail later in Section 42 And the results ofcorrelation analysis revealed that there were interdependentrelationships between userrsquos personality traits Thus givendynamic updated thresholds of personality traits as wellas considering interdependencies between personality traitsa novel adaptive correlation-based predicting model wasproposed

In psychology the Big Five personality traits are fivebroad domains or dimensions of personality traits that areused to describe human personality The theory based onthe Big Five factors is called Five Factor Model (FFM) [40]The Big Five factors are extraversion (denoted by EXT)

6 Mathematical Problems in Engineering

neuroticism (denoted by NEU) agreeableness (denoted byAGR) conscientiousness (denoted by CON) and openness(denoted by OPN)

Our method aimed to predict user 119894rsquos Big Five personalitytraits set 119875119879

119894 As stated in [41] here each personality trait

was classified into two degrees namely extraversion wasmapped onto extravert and shy neuroticism was mappedonto neurotic and secure agreeableness was mapped ontofriendly and uncooperative conscientiousness was mappedonto precise and careless and openness was mapped ontoinsightful and unimaginative For convenience we denotedthe two degrees of each personality trait by positive leveland negative level in short respectively First we set initialthreshold of each personality trait as 05 and threshold ofthe 120579th personality trait 119901

120579is denoted by 119879(119901

120579) followed by

calculating distance between user 119894 and other users with (9)as follows

Dis (119894 119895) =119898

sum

ℎ=1119882(119891ℎ) times1003816100381610038161003816119865 (119894 119891ℎ) minus 119865 (119895 119891ℎ)

1003816100381610038161003816 (9)

where Dis(119894 119895) presents distance between user 119894 and user 119895119865(119894 119891

ℎ) and 119865(119895 119891

ℎ) present the ℎth feature 119891

ℎrsquos value of user

119894 and user 119895 and119898 presents the number of featuresSecondly we sorted user 119894rsquos distance set in ascending

order and selected top 119896 users as user 119894rsquos neighbors (denotedas119873(119894)) Then for each 119901

120579 we recorded the number of users

who had positive level of 119901120579in119873(119894) as 119899

119901120579

Thirdly we selected a 119901

120579from unvisited personality traits

set UP if user 119894rsquos personality traits set PT119894was empty and then

we calculated probability that user 119894 had positive level of 119901120579as

follows

119891 (119894 119901120579) =

119875 (119867119901120579

| 119862119899119901120579

119901120579

)

119875 (sim 119867119901120579

| 119862119899119901120579

119901120579

)

(10)

where119862119899119901120579

119901120579

presents event that there are exactly119862119899119901120579

119901120579

instanceshaving positive level of 119901

120579in119873(119894) whose 119896 nearest neighbors

also have 119899119901120579

users with positive level of 119901120579 119867119901120579

and sim 119867119901120579

present event that user 119894 has positive level of 119901120579and event that

user 119894 has negative level of 119901120579 respectively And if user 119894rsquos per-

sonality traits set PT119894was not empty considering correlations

between personality traits we calculated probability that user119894 had positive level of 119901

120579on the basis of prior knowledge

about correlations between119901120579and personality traits that have

already been visited in UP as follows

119891 (119894 119901120579) =

119875 (119867119901120579

| 1199011198971 1199011198972 ) 119875 (119862

119899119901120579

119901120579

| 119867119901120579

)

119875 (sim 119867119901120579

| 1199011198971 1199011198972 ) 119875 (119862

119899119901120579

119901120579

| sim 119867119901120579

)

(11)

where 1199011198971 1199011198972 isin PT

119894present personality traits which have

already been visited in UP If 119891(119894 119901120579) was greater than or

equal to 119879(119901120579) then positive level of 119901

120579was added to PT

119894

else negative level of 119901120579was added to PT

119894

And then we calculated error rate of each personalitytrait and error rate of 119901

120579is denoted by Err(119901

120579) which is

calculated as

Err (119901120579) = 1minus

119903120579

to (12)

where 119903120579denotes the number of instances which are classified

correctly to denote the total number of instancesFinally if unvisited personality traits set UP was empty

and error rate of each personality trait fell within a specifiedrange then algorithm was terminated else if unvisited per-sonality traits set UP was not empty then we continued topredict another personality trait in UP else if there was apersonality trait 119901

120579whose error rate did not reach defined

limits then we updated 119879(119901120579) with (12) and added it to

personality traits set UP to predict it again

119879 (119901120579) = 119879 (119901

120579) + 120576 timesErr (119901

120579) (13)

where 120576 denotes a monotonic decreasing learning rate Ouralgorithm can now be defined as Algorithm 1 Assumethat size of dataset is 119879119903 dimension of features is 119889 andsize of the nearest neighbors set is 119896 The complexity ofconditional probability-based predicting model for userrsquospersonality traits is analyzed as follows From step (10) tostep (12) computation time of Kendall correlation coefficientbetween features and personality traits is taking119874(1198791199032) timecalculating weights of features takes 119874(119889) time and thendistributing weights to features can be computed in 119874(119889119879119903)time the total time is119874(1198791199032) as dimension of features 119889 is farless than size of dataset119879119903 Calculating distance between user119894 and other users from step (13) to step (15) will take 119874(119889119879119903)time In step (16) it will take 119874(119879119903log2119879119903) time for sortingset of distances between user 119894 and other users Step (18) tostep (20) take 119874(119896) time to select 119896 nearest neighbors of user119894 If algorithm goes from step (23) to step (25) it will take119874(1198962

) time Else if algorithm goes from step (26) to step (28)it will take 119874(1198791199031198962) time It takes 119874(119879119903) time from step (37)to step (45) Assume that the number of iterations is 119905 Sincesteps (10) (11) and (12) can be done offline and do not needto be calculated repeatedly every time hence from step (13)to step (52) the overall online complexity of our proposedmethod is 119874(119879119903log

2119879119903) as 119896 and 119905 are far less than 119879119903 From

here we see that our proposed method is feasible in a big dataenvironment as in Facebook monitoring

4 Experimental Evaluation

In this section we first describe dataset used in our experi-ments And then we analyze the interdependent relationshipsbetween userrsquos personality traits Finally we conduct experi-ments on different kinds of features and make a comparisonwith other methods based on the same dataset

41 Dataset MyPersonality (httpmypersonalityorg) is apopular Facebook application which is a 336-question testmeasuring the 30 facets underlying the Big Five traits thatallows users to take real psychometric tests The respondentswho come from various age groups backgrounds and cul-tures are highly motivated to answer honestly and carefullyAdditionally usersrsquo psychological and Facebook profilesare ldquoverifiedrdquo by individualsrsquo social circles hence it is hardto lie to the people that know you best Also there is nopressure and one does not have to share hisher profileinformation thus myPersonality avoids deliberate faking

Mathematical Problems in Engineering 7

Input user 119894 users set 119880 size of the nearest neighbors set 119896 unvisited personality traits set UPOutput user 119894rsquos personality traits set PT

119894

(1) 119891 larr 119891119886119897119904119890

lowast 119891 stands for a flag if any of the thresholds of personality traits is updated then 119891 equals to trueotherwise 119891 equals to false lowast

(2) For each personality trait 119901120579isin UP do

(3) set temporary error rate Temp(119901120579) as 0

(4) End for(5) For each personality trait 119901

120579isin UP do

(6) set initial threshold 119879(119901120579) as 05

(7) End for(8) PT

119894larr 0

(9) UPlarr all of the Big Five personality traits(10) For each feature 119891

ℎisin user 119894rsquos feature set do

(11) calculate 119891ℎrsquos weight according to (8)

(12) End for(13) For user 119895 isin 119880 and 119895 = 119894 do(14) calculate Dis(119894 119895) according to (9)(15) End for(16) sort Dis(119894 1)Dis(119894 2) in ascending order(17) 119873(119894) larr select top 119896 users in Dis(119894 1)Dis(119894 2) (18) For each personality trait 119901

120579isin UP do

(19) 119899119901120579

larr the number of users who have positive level of personality trait 119901120579in119873(119894)

(20) End for(21) While UP = 0 do(22) 119901

120579larr UPpoll()

(23) If PT119894= 0 then

(24) calculate 119891(119894 119901120579) according to (10)

(25) End if(26) Else if PT

119894= 0 then

(27) calculate 119891(119894 119901120579) according to (11)

(28) End if(29) If 119891(119894 119901

120579) ge 119879(119901

120579) then

(30) 119875119879119894larr positive level of 119901

120579

(31) End if(32) Else if 119891(119894 119901

120579) lt 119879(119901

120579) then

(33) PT119894larrnegative level of 119901

120579

(34) End if(35) End while(36) 119891 larr 119891119886119897119904119890

(37) For each personality trait 119901120579isin PT119894do

(38) calculate 119901120579rsquos error rate Err(119901

120579) according to (12)

(39) If |Err(119901120579) minus Temp(119901

120579)| gt 120575 then 120575 denotes a constant which is small enough

(40) 119891 larr 119905119903119906119890

(41) Temp(119901120579) larr Err(119901

120579)

(42) update 119901120579rsquos threshold 119879(119901

120579) according to (13)

(43) UPlarr 119901120579

(44) End if(45) End for(46) If 119891 == 119905119903119906119890 then(47) Go to step (21)(48) End if(49) Else if 119891 == 119891119886119897119904119890 then(50) Go to step (52)(51) End if(52) Return PT

119894

Algorithm 1 Adaptive conditional probability-based predicting model for userrsquos personality traits

8 Mathematical Problems in Engineering

of data With consent usersrsquo psychological and Facebookprofiles are recorded for research purposes Currently thedatabase contains more than 6000000 test results togetherwith more than 4000000 individual Facebook profilesIn this paper we adopted the dataset that was provided inthe Workshop on Computational Personality Recognition(Shared Task) (httpmypersonalityorgwikilibexefetchphpmedia=wikimypersonality finalzip) [35] The datasetwas a subset of myPersonality database including 250users which had both information about personality traits(annotated with usersrsquo personality scores and gold standardpersonality labelswhichwere self-assessments obtained usinga 100-item long version of the IPIP personality questionnaire(httpipiporiorgnewFinding Labeling IPIP Scaleshtm))and social network structure (network size betweennesscentrality density brokerage and transitivity) along withtheir 9900 status updates in raw text With the aid of thisstandard dataset we can compare the performance of ourproposed method with othersrsquo personality recognitionsystems on a common benchmark

42 Analysis of Correlations between Personality Traits Pre-vious works were usually predicting userrsquos personality traitswithout considering interdependencies between them In thiscontext we investigatedwhether there had been relationshipsbetween userrsquos personality traits Since not all users had thesame level of interactions consequently we first groupedusers according to different degrees of userrsquos personalitytraits As an example if a user has positive level of agree-ableness another user has positive level of agreeableness aswell then they will be divided into a group else if a user haspositive level of agreeableness another user has negative levelof agreeableness and they will not be divided into a groupThen we explored cooccurrences between personality traitsin each group simultaneously which is shown in Figure 2

It can be observed intuitively that a significant proportionof users have negative level of extraversion positive levelof openness or positive level of agreeableness It is alsonoteworthy that positive level of extraversion has less overlapwith negative level of openness negative level of consci-entiousness and negative level of agreeableness Besidespositive level of neuroticism has less overlap with positivelevel of conscientiousness positive level of agreeableness andnegative level of openness Furthermore there are bits ofusers that have positive level of extraversion and positive levelof neuroticism simultaneously

However the above cooccurrences may be due to the apriori statistics of each trait For instance if there are moreusers with negative level of neuroticism positive level ofopenness and positive level of agreeableness than othersit is more likely to have more users with cooccurrencebetween negative level of neuroticism and positive level ofopenness negative level of neuroticism and positive level ofagreeableness and positive level of openness and positivelevel of agreeableness than others as it is shown in Figure 2Therefore since users were grouped according to differentdegrees of userrsquos personality traits in order to investigatewhether the above phenomenon meant that there werepositive or negative correlations between different degrees

of personality traits in each group we treated scores ofa certain personality trait according to which users weredivided into the group and scores of another personalitytrait with a certain degree as two random variables followedby employing Kendall correlation coefficient which wascalculated in (3) to analyze real correlations between themThe results are presented in Table 1

As it can be seen from Table 1 we can observe anegative correlation between negative level of extraversionand positive level of neuroticism in other words people thatscore high on extraversion have less possibility to score highon neuroticism as well In psychology a person with negativelevel of extraversion can be explained as solitary and a personwith negative level of conscientiousnessmeans being carelessConsequently negative level of extraversion and negativelevel of conscientiousness are not in line with positive level ofneuroticism which tends to experience unpleasant emotionseasily Another factor is social desirability Moreover agree-ableness extraversion conscientiousness and openness aremore or less ldquodesirablerdquo whereas neuroticism is quite clearlynegative Thus generally there are negative correlationsbetween neuroticism and other personality traits

What is more it is inconsistent with Figure 2 that thereis a positive correlation between positive level of neuroticismand positive level of extraversion It may be explained thata person with positive level of extraversion tends to beenthusiastic and talkative which is compatible with positivelevel of neuroticism In addition in line with Figure 2since negative level of openness shows cautiousness it hasa positive correlation with negative level of extraversionand positive level of agreeableness which is a tendency tobe compassionate However there is a positive correlationbetween negative level of openness and positive level ofneuroticism which is not in keeping with Figure 2

Since there are some contradictions between Figure 2and Table 1 we leveraged Jensen-Shannon divergence [45]to further analyze correlations between userrsquos personalitytraits Jensen-Shannon divergence between two probabilitydistributions is calculated as

119863JS (119875 119876) =12(119863KL (119875 119877) +119863KL (119876 119877)) (14)

where 119875 and 119876 denote two probability distributions 119877denotes an average distribution of 119875 and 119876 and 119863KL(119875 119877)denotes Kullback-Leibler divergence [46] between 119875 and 119877which is calculated with

119863KL (119875 119877) = sum119894

119875 (119894) log 119875 (119894)119877 (119894)

(15)

where 119875(119894) and 119877(119894) denote the 119894th value of 119875 and 119877 Theresults are presented in Table 2

In keeping with Figure 2 and Table 1 Jensen-Shannondivergences between negative level of neuroticism and posi-tive level of agreeableness positive level of conscientiousnessand positive level of openness are larger It is because ofthe fact that a person with negative level of neuroticismshows confidence on everything and positive level of open-ness reflects degree of intellectual curiosity creativity and

Mathematical Problems in Engineering 9

Table 1 Kendall correlation coefficients between personality traits 119901 and 119899 stand for positive level and negative level of a certain personalitytrait The bold number is the highest Kendall correlation coefficients and the italic number is the lowest Kendall correlation coefficients Thebigger Kendall correlation coefficient is the better the consistency is

EXT NEU AGR CON OPN119901 119899 119901 119899 119901 119899 119901 119899 119901 119899

EXT 119901 022 minus003 009 minus015 minus006 minus013 007 001119899 minus023 minus018 004 004 004 018 007 022

NEU 119901 minus009 minus017 010 minus021 minus003 027119899 minus014 minus013 002 minus001 minus006 minus010

AGR 119901 minus007 007 016 029119899 minus008 016 006 002

CON 119901 minus001 minus010119899 007 minus007

OPN 119901

119899

PEXT and PNEUPEXT and PAGR

PEXT and PCONPEXT and POPNPEXT and NNEUPEXT and NAGRPEXT and NCONPEXT and NOPNPNEU and PAGR

PNEU and PCONPNEU and POPNPNEU and NEXTPNEU and NAGRPNEU and NCONPNEU and NOPNPAGR and PCONPAGR and POPNPAGR and NEXTPAGR and NNEUPAGR and NCONPAGR and NOPNPCON and POPNPCON and NEXTPCON and NNEUPCON and NAGRPCON and NOPNPOPN and NEXTPOPN and NNEUPOPN and NAGRPOPN and NCONNEXT and NNEUNEXT and NAGRNEXT and NCONNEXT and NOPNNNEU and NAGRNNEU and NCONNNEU and NOPNNAGR and NCONNAGR and NOPNNCON and NOPN

Proportion ()

000 500 1000 1500 2000 2500 3000 3500 4000 4500 5000()

Figure 2The proportion of users who have a certain pair of personality traits simultaneously PEXT PNEU PAGR PCON and POPN standfor positive level of extraversion neuroticism agreeableness conscientiousness and openness NEXT NNEU NAGR NCON and NOPNstand for negative level of extraversion neuroticism agreeableness conscientiousness and openness

10 Mathematical Problems in Engineering

Table 2 Jensen-Shannon divergence between personality traits 119901and 119899 stand for positive level and negative level of a certain personal-ity traitThe bold number is the highest Jensen-Shannon divergenceand the italic number is the lowest Jensen-Shannon divergenceSmaller Jensen-Shannon divergence means better consistency

EXT NEU AGR CON OPN119901 119899 119901 119899 119901 119899 119901 119899 119901 119899

EXT 119901 058 1479 044 219 070 236 073 084119899 479 449 521 236 561 191 1043 211

NEU 119901 080 212 110 246 298 040119899 1978 294 1697 305 2473 361

AGR 119901 077 329 082 091119899 340 086 463 125

CON 119901 115 117119899 616 140

OPN 119901

119899

a preference for novelty and variety a person has alongwith positive level of agreeableness and positive level ofconscientiousness which are compatible Furthermore it isin conformity with Table 1 that positive level of extraversionhas smaller Jensen-Shannon divergences with positive levelof neuroticism and positive level of agreeableness besidespositive level of neuroticism has a smaller Jensen-Shannondivergence with negative level of openness

In addition Soto et al [29] demonstrated convergent anddiscriminate correlations for the Big Five Inventory Soto andJohn [30] also illustrated that there was strong convergencebetween each BFI facet scale and corresponding NEO PI-Rfacet Moreover Park et al [28] demonstrated convergencewith self-reports of personality at the domain- and facet-levelIn summary although utilizing different Big Five measuresit can be found that correlations in myPersonality datasetare also in keeping with the findings from other samplesHence correlations between personality traits are expectedand actually prove that myPersonality results are validSo leveraging prior knowledge about relationships betweenpersonality traits may contribute to a better prediction onpersonality traits The results provide a better theoreticalsupport for our proposed method

43 Evaluation Metric Since most researchers exploringon personality recognition leveraged different measures toevaluate their experiments on various datasets it was hard tocompletely appraise their performance and quality Howevera common dataset which was annotated with gold standardpersonality labels was available in the Workshop on Com-putational Personality Recognition (Shared Task) users werefree to split the training and test sets as they wish andprecision recall and 1198651-measure were suggested to be usedto evaluate results of predictions So in this paper for morereliable results the proposed method was also evaluated withstratified 5-fold cross-validation Since there was no trainingset in unsupervised learning the common dataset was splitinto folds and in order to avoid overfitting the same author

Table 3 Precision recall and 1198651-measure of adaptive conditionalprobability-based predicting model (in bold best performance)

Precision Recall 1198651-measureEXT 093 076 083NEU 086 067 075AGR 096 074 083CON 089 070 078OPN 091 079 084Mean 091 0732 0806

did not appear twofold at the same timeThemean precisionrecall and 1198651-measure are calculated as follows

Precision = tptp + fp

Recall = tptp + fn

1198651-measure = 2 times 119875 times 119877119875 + 119877

(16)

where tp stands for the number of positive samples thatare classified correctly fp stands for the number of positivesamples that are classified incorrectly and fn stands for thenumber of negative samples that are classified incorrectly

44 Analysis of Impacts of Different Factors First we carriedon experiments with our proposed method Scores of preci-sion recall and 1198651-measure are shown in Table 3 includingmean scores of all personality traits

441 Impacts of Different Categories of Feature Sets In thissection we conducted experiments on social network fea-tures linguistic features emotional statistical features topicfeatures and all features respectively Due to space restric-tions Figure 3 only illustrates 1198651-measure of all personalitytraits along with mean 1198651-measure of them

It can be observed that experiments which are conductedon linguistic features emotional statistical features and topicfeatures result in unsatisfactory performance with respectto social network features and social network features havethe best classification performance for extraversion which isconsistent with the conclusion in [33] Since social networkfeatures reflect a userrsquos pattern of social status and habitswhich are relatively important decisive factors in userrsquospersonality traits as a result they provide more informationthan other types of features Nevertheless whether more orless each kind of characteristic provides unique informationin userrsquos personality traits prediction Therefore carryingon experiment with combination of social network featureslinguistic features emotional statistical features and topicfeatures can achieve better performance

442 Impact of Weighted Features Additionally we con-ducted experiments on unweighted features Here we onlypresented the most successful results in Table 4 includingmean scores of all personality traits

Mathematical Problems in Engineering 11

1

featuresEssential

featuresLinguistic

featuresEmotional

featuresTopic

featuresAll

Categories of features

EXTNEUAGR

CONOPNMEAN

09

08

07

06

05

04

03

F-s

core

s

Figure 3 1198651-measures of five personality traits along with mean1198651-measures of them with different categories of feature sets

Table 4 Precision recall and 1198651-measure of predictingmodel withunweighted features (in bold best performance)

Precision Recall 1198651-measureEXT 085 076 082NEU 092 068 076AGR 094 064 076CON 091 057 068OPN 087 078 083Mean 090 0686 077

Table 5 Precision recall and 1198651-measure of predictingmodel withunified thresholds (in bold best performance)

Precision Recall 1198651-measureEXT 062 054 057NEU 080 016 025AGR 057 073 063CON 057 077 060OPN 070 100 082Mean 0652 064 0574

From Tables 3 and 4 we can observe that the best resultsfor myPersonality dataset are achieved after distributingweights to features It illustrates that considering correlationsbetween features and userrsquos personality traits can bring per-formance gain to the prediction task since there are limitedfeatures that may remain useful for userrsquos personality traitsprediction

443 Impact of Dynamic Updated Thresholds of PersonalityTraits In addition we predicted userrsquos personality traits withunified thresholds as well The results are summarized inTable 5

Multilabel learning task is to predict one or more cat-egories for each instance In the existing algorithms suchas PT5 method proposed by Tsoumakas and Katakis [34]

Table 6 Precision of different methods on Facebook dataset Thebest performance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 093 086 096 089 091 0910Verhoevenet al [42] SVM 079 071 067 072 087 0752

Farnadi etal [33]

SVM kNNNB 058 054 050 055 060 0554

Alam et al[43]

SVM BLRmNB 058 059 059 059 060 0590

Tomlinsonet al [44] LR NA NA NA NA NA NA

generally thresholdswere unified to the same valueHoweverall categories employing a unified minimum threshold arenot appropriate If threshold is set too high not all categoriestags will be predicted on the other hand if threshold is settoo low too many classes will be predicted in results Hencein this paper we updated thresholds of personality traitsdynamically according to error rates of them Comparedto Table 3 based on thresholds obtained through updatingdynamically our method can achieve better results

45 Performance EvaluationwithDifferentMethods Asmen-tioned in Section 41 in the Workshop on ComputationalPersonality Recognition (Shared Task) different systems forpersonality recognition from text and network features ona common benchmark have been compared Verhoeven etal [42] proposed a meta-learning approach which can beextended to certain component classifiers from other genreswith other class systems or even from other languagesFarnadi et al [33] leveraged SVM 119896 nearest neighbors(kNN) [17] and Naıve Bayes (NB) for automatic recognitionof personality traits from usersrsquo Facebook statues updatesrespectively even with a small set of training dataset it couldachieve better results than most baseline algorithms On thebasis of a set of features extracted from Facebook datasetAlam et al [43] explored suitability and performance ofseveral classification techniques which were SVM BayesianLogistic Regression (BLR) [47] andMultinomial Naıve Bayes(mNB) [48] Tomlinson et al [44] used ranking algorithmsfor feature selection and Logistic Regression (LR) [49] aslearning algorithms which achieved a high performance Inthis paper we tested our proposed method with the sameset than the participants in the Workshop on ComputationalPersonality Recognition (Shared Task) In terms of precisionrecall and 1198651-measure a comparison of the works describedabove as well as our proposed method (denoted by CP) issummarized in Tables 6 7 and 8

From Tables 6 7 and 8 the experimental results indicatethat openness is the easiest personality trait to be predictedby Facebook features with the above methods because thereare more users with positive level of openness than otherpersonality traits in Facebook dataset Since Verhoeven et al[42] trained an ensemble SVM model based on both Face-book and Essays personality traits dataset as a consequence

12 Mathematical Problems in Engineering

Table 7 Recall of different methods on Facebook dataset The bestperformance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 076 067 074 070 079 0732Verhoevenet al [42] SVM 079 072 068 072 087 0756

Farnadi etal [33]

SVMkNN NB 061 053 050 054 070 0576

Alam et al[43]

SVMBLR mNB 058 058 059 059 060 0588

Tomlinsonet al [44] LR NA NA NA NA NA NA

Table 8 1198651-measure of different methods on Facebook datasetThebest performance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 083 075 083 078 084 0806Verhoevenet al [42] SVM 079 070 067 072 086 0748

Farnadi etal [33]

SVMkNN NB 056 052 050 054 061 0546

Alam et al[43]

SVMBLR mNB 058 058 058 059 060 0586

Tomlinsonet al [44] LR NA NA NA NA NA 0630

when tested on the same Facebook dataset it can achievebetter results than our proposed method Furthermorewhen trained and tested on the same Facebook dataset ourmethod outperforms the methods in [33 43 44] which donot take weighted features dynamic updated thresholds ofpersonality traits and correlations between personality traitsinto consideration

5 Conclusion

In this paper we studied the problem of exploiting interde-pendencies between userrsquos personality traits for predictinguserrsquos personality traits First after analyzing importance offeatures we conducted experiments on Facebook datasetto demonstrate the existence of correlations between userrsquospersonality traits Bearing this in mind an unsupervisedframework adaptive conditional probability-based predict-ing model was then proposed to predict userrsquos Big Fivepersonality traits based on importance of features dynamicupdated thresholds of personality traits and prior knowledgeabout correlations between personality traits Furthermorewe compared our results with the ones achieved by othersin the Workshop on Computational Personality Recognition(Shared Task) on the same dataset In general the experimen-tal results demonstrated the effectiveness of our proposedframework

In future work we will speculate on what directions canbe undertaken to ameliorate its performance with respect

to time complexity so as to better apply it to a big dataenvironment as in Facebook monitoring Besides in orderto make our framework applicable to dynamic networksbetter we will explore combining time series analysis withpersonality traits predicting algorithm to capture dynamicevolution process of information and network structureFurthermore since social network features (described inSection 311) are specific for our dataset they may be difficult(or even impossible) to obtain in another dataset hencewe will conduct experiments on other datasets such as theTweeter corpus of PAN shared task (httppanwebisde)on author profiling where also personality is considered tofurther validate the effectiveness of the proposed frameworkbased on different features

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the National Natural ScienceFoundation of China under Grant no 61300148 the Scientificand Technological Break-Through Program of Jilin ProvinceunderGrant no 20130206051GX the Science andTechnologyDevelopment Program of Jilin Province under Grant no20130522112JH the Science Foundation for China Postdoc-tor under Grant no 2012M510879 and the Basic Scien-tific Research Foundation for the Interdisciplinary Researchand Innovation Project of Jilin University under Grant no201103129

References

[1] M Oussalah F Bhat K Challis and T Schnier ldquoA softwarearchitecture for Twitter collection search and geolocationservicesrdquo Knowledge-Based Systems vol 37 pp 105ndash120 2013

[2] R R McCrae and P T Costa ldquoValidation of the 5-factor modelof personality across instruments and observersrdquo Journal ofPersonality and Social Psychology vol 52 no 1 pp 81ndash90 1987

[3] G Saucier and L R Goldberg ldquoThe structure of personalityattributesrdquo in Personality and Work Reconsidering the Role ofPersonality in Organizations M R Barrick and A M RyanEds pp 1ndash29 Jossey-Bass 2003

[4] B Caci M Cardaci M E Tabacchi and F Scrima ldquoPersonalityvariables as predictors of facebook usagerdquoPsychological Reportsvol 114 no 2 pp 528ndash539 2014

[5] M Kosinski Y Bachrach P Kohli D Stillwell and T GraepelldquoManifestations of user personality in website choice andbehaviour on online social networksrdquoMachine Learning vol 95no 3 pp 357ndash380 2014

[6] M-G Cojocaru H Thille E Thommes D Nelson and SGreenhalgh ldquoSocial influence and dynamic demand for newproductsrdquoEnvironmentalModellingamp Software vol 50 pp 169ndash185 2013

[7] F Durupinar N Pelechano J Allbeck U Gudukbay andN Badler ldquoThe impact of the OCEAN personality modelon the perception of crowdsrdquo IEEE Computer Graphics andApplications vol 31 no 3 pp 22ndash31 2011

Mathematical Problems in Engineering 13

[8] X Feng ldquoStudy on customer personality characteristics andrelationship outcomes using SEM analysisrdquo in Proceedings ofthe International Conference on Mechatronics and InformationTechnology pp 841ndash844 2013

[9] J G Lee SMoon andK Salamatian ldquoModeling and predictingthe popularity of online contents with Cox proportional hazardregression modelrdquo Neurocomputing vol 76 no 1 pp 134ndash1452012

[10] J W Stoughton L F Thompson and A W Meade ldquoBigfive personality traits reflected in job applicantsrsquo social mediapostingsrdquo Cyberpsychology Behavior and Social Networkingvol 16 no 11 pp 800ndash805 2013

[11] D A Cobb-Clark and S Schurer ldquoThe stability of big-fivepersonality traitsrdquo Economics Letters vol 115 no 1 pp 11ndash152012

[12] W Mischel ldquoToward an integrative science of the personrdquoAnnual Review of Psychology vol 55 pp 1ndash22 2004

[13] S Argamon S Dhawle M Koppel and J Pennebaker ldquoLexicalpredictors 19 of personality typerdquo in Proceedings of the JointAnnual Meeting of the Interface and the Classification Society ofNorth America 2005

[14] F Mairesse M AWalker M R Mehl and R K Moore ldquoUsinglinguistic cues for the automatic recognition of personality inconversation and textrdquo Journal of Artificial Intelligence Researchvol 30 pp 457ndash500 2007

[15] K Moore and J C McElroy ldquoThe influence of personalityon Facebook usage wall postings and regretrdquo Computers inHuman Behavior vol 28 no 1 pp 267ndash274 2012

[16] J W Pennebaker and L A King ldquoLinguistic styles languageuse as an individual differencerdquo Journal of Personality and SocialPsychology vol 77 no 6 pp 1296ndash1312 1999

[17] J Han and M Kamber Data Mining Concepts and TechniquesThe Morgan Kaufmann Series in Data Management Systemsedited by J Gray Morgan Kaufmann Boston Mass USA 3rdedition 2011

[18] J Golbeck C Robles and K Turner ldquoPredicting personalitywith social mediardquo in Proceedings of the 29th Annual CHIConference on Human Factors in Computing Systems (CHI rsquo11)pp 253ndash262 ACM May 2011

[19] J RQuinlan ldquoLearningwith continuous classesrdquo inProceedingsof the 5thAustralian Joint Conference onArtificial Intelligence (Airsquo92) pp 343ndash348 Hobart Australia November 1992

[20] S Roweis and Z Ghahramani ldquoA unifying review of lineargaussian modelsrdquo Neural Computation vol 11 no 2 pp 305ndash345 1999

[21] S Bai T Zhu and L Cheng ldquoBig-five personality pre-diction based on user behaviors at social network sitesrdquohttparxivorgpdf12044809v1pdf

[22] J Oberlander and S Nowson ldquoWhose thumb is it anywayclassifying author personality from weblog textrdquo in Proceedingsof the 44th Annual Meeting of the Association for ComputationalLinguistics pp 627ndash634 2006

[23] T Nguyen D Q Phung B Adams and S Venkatesh ldquoTowardsdiscovery of influence and personality traits through social linkpredictionrdquo in Proceedings of the International Conference onWeblogs and Social Media pp 566ndash569 Barcelona Spain July2011

[24] S T Bai B B Hao A Li S Yuan R Gao and T S ZhuldquoPredicting big five personality traits of microblog usersrdquo inProceedings of the 12th IEEEWICACM International JointConferences on Web Intelligence and Intelligent Agent Technolog(WI rsquo13) pp 501ndash508 November 2013

[25] R Sun and N Wilson ldquoA model of personality should be acognitive architecture itselfrdquo Cognitive Systems Research vol29-30 no 1 pp 1ndash30 2014

[26] A Ortigosa R M Carro and J I Quiroga ldquoPredicting userpersonality by mining social interactions in Facebookrdquo Journalof Computer and System Sciences vol 80 no 1 pp 57ndash71 2014

[27] O P John L P Naumann and C J Soto ldquoParadigm shift tothe integrative Big Five trait taxonomy history measurementand conceptualissuesrdquo in Handbook of Personality Theory andResearch O P John R W Robins and L A Pervin Eds pp114ndash158 Guilford Press New York NY USA 3rd edition 2008

[28] G Park H A Schwartz J C Eichstaedt et al ldquoAutomaticpersonality assessment through social media languagerdquo Journalof Personality and Social Psychology vol 108 no 6 pp 934ndash9522015

[29] C J Soto O P John S D Gosling and J Potter ldquoAge differencesin personality traits from 10 to 65 big five domains and facets ina large cross-sectional samplerdquo Journal of Personality and SocialPsychology vol 100 no 2 pp 330ndash348 2011

[30] C J Soto and O P John ldquoTen facet scales for the BigFive Inventory convergence with NEO PI-R facets self-peeragreement and discriminant validityrdquo Journal of Research inPersonality vol 43 no 1 pp 84ndash90 2009

[31] M V Kosti R Feldt and L Angelis ldquoPersonality emotionalintelligence and work preferences in software engineering anempirical studyrdquo Information and Software Technology vol 56no 8 pp 973ndash990 2014

[32] H A Schwartz J C Eichstaedt M L Kern et al ldquoPersonalitygender and age in the language of social media the open-vocabulary approachrdquo PLoS ONE vol 8 no 9 Article IDe73791 2013

[33] G Farnadi S ZoghbiM FMoens andMD Cock ldquoRecognis-ing personality traits using facebook status updatesrdquo in Proceed-ings of the Workshop on Computational Personality RecognitionJuly 2013 httpcliccimecunitnitfabiowcpr13farnadi wcpr-13pdf

[34] G Tsoumakas and I Katakis ldquoMulti-label classification anoverviewrdquo International Journal of Data Warehousing and Min-ing vol 3 no 3 pp 1ndash13 2007

[35] M Kosinski D Stillwell and T Graepel ldquoPrivate traits andattributes are predictable from digital records of human behav-iorrdquo Proceedings of the National Academy of Sciences of theUnited States of America vol 110 no 15 pp 5802ndash5805 2013

[36] M Belkin and P Niyogi ldquoLaplacian eigenmaps for dimension-ality reduction and data representationrdquo Neural Computationvol 15 no 6 pp 1373ndash1396 2003

[37] R Socher J Bauer C D Manning and Y N Andrew ldquoParsingwith compositional vector grammarsrdquo in Proceedings of the 51stAnnual Meeting of the Association for Computational Linguistics(ACL rsquo13) pp 455ndash465 Sofia Bulgaria August 2013

[38] K Kazama M Imada and K Kashiwagi ldquoCharacteristics ofinformation diffusion in blogs in relation to information sourcetyperdquo Neurocomputing vol 76 no 1 pp 84ndash92 2012

[39] M B David Y N Andrew and I J Michael ldquoLatent Dirichletallocationrdquo Journal ofMachine Learning Research vol 3 no 4-5pp 993ndash1022 2003

[40] P T Costa andR RMcCraeRevisedNEOPersonality Inventory(NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI) Profes-sional Manual Psychological Assessment Resources 1992

[41] R L Atkinson C A Richard E S Edward J B Daryland S Nolen-Hoeksema Hilgardrsquos Introduction to Psychology

14 Mathematical Problems in Engineering

Harcourt College Publishers Orlando Fla USA 13th edition2000

[42] B Verhoeven W Daelemans and T D Smedt ldquoEnsemblemethods for personality recognitionrdquo in Proceedings of theWorkshop on Computational Personality Recognition pp 35ndash38Boston Mass USA July 2013

[43] F Alam A Stepanov and G Riccardi ldquoPersonality traitsrecognition on social networkmdashfacebookrdquo in Proceedings of theWorkshop on Computational Personality Recognition pp 6ndash9Boston Mass USA July 2013

[44] M T Tomlinson D Hinote and D B Bracewell ldquoPredictingconscientiousness through semantic analysis of facebook postsrdquoin Proceedings of the Workshop on Computational PersonalityRecognition pp 31ndash34 July 2013

[45] D M Endres and J E Schindelin ldquoA newmetric for probabilitydistributionsrdquo IEEETransactions on InformationTheory vol 49no 7 pp 1858ndash1860 2003

[46] S Kullback and R A Leibler ldquoOn information and sufficiencyrdquoThe Annals of Mathematical Statistics vol 22 no 1 pp 79ndash861951

[47] A Genkin D D Lewis and D Madigan ldquoLarge-scale Bayesianlogistic regression for text categorizationrdquo Technometrics vol49 no 3 pp 291ndash304 2007

[48] A K McCallum and K Nigam ldquoA comparison of event modelsfor naive bayes text classificationrdquo in Proceedings of the AAAI-98 Workshop on Learning for Text Categorization pp 137ndash142Madison Wis USA July 1998

[49] I Ruczinski C Kooperberg and M LeBlanc ldquoLogic regres-sionrdquo Journal of Computational and Graphical Statistics vol 12no 3 pp 475ndash511 2003

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 6: Research Article A Novel Adaptive Conditional Probability ...downloads.hindawi.com/journals/mpe/2015/472917.pdf · Research Article A Novel Adaptive Conditional Probability-Based

6 Mathematical Problems in Engineering

neuroticism (denoted by NEU) agreeableness (denoted byAGR) conscientiousness (denoted by CON) and openness(denoted by OPN)

Our method aimed to predict user 119894rsquos Big Five personalitytraits set 119875119879

119894 As stated in [41] here each personality trait

was classified into two degrees namely extraversion wasmapped onto extravert and shy neuroticism was mappedonto neurotic and secure agreeableness was mapped ontofriendly and uncooperative conscientiousness was mappedonto precise and careless and openness was mapped ontoinsightful and unimaginative For convenience we denotedthe two degrees of each personality trait by positive leveland negative level in short respectively First we set initialthreshold of each personality trait as 05 and threshold ofthe 120579th personality trait 119901

120579is denoted by 119879(119901

120579) followed by

calculating distance between user 119894 and other users with (9)as follows

Dis (119894 119895) =119898

sum

ℎ=1119882(119891ℎ) times1003816100381610038161003816119865 (119894 119891ℎ) minus 119865 (119895 119891ℎ)

1003816100381610038161003816 (9)

where Dis(119894 119895) presents distance between user 119894 and user 119895119865(119894 119891

ℎ) and 119865(119895 119891

ℎ) present the ℎth feature 119891

ℎrsquos value of user

119894 and user 119895 and119898 presents the number of featuresSecondly we sorted user 119894rsquos distance set in ascending

order and selected top 119896 users as user 119894rsquos neighbors (denotedas119873(119894)) Then for each 119901

120579 we recorded the number of users

who had positive level of 119901120579in119873(119894) as 119899

119901120579

Thirdly we selected a 119901

120579from unvisited personality traits

set UP if user 119894rsquos personality traits set PT119894was empty and then

we calculated probability that user 119894 had positive level of 119901120579as

follows

119891 (119894 119901120579) =

119875 (119867119901120579

| 119862119899119901120579

119901120579

)

119875 (sim 119867119901120579

| 119862119899119901120579

119901120579

)

(10)

where119862119899119901120579

119901120579

presents event that there are exactly119862119899119901120579

119901120579

instanceshaving positive level of 119901

120579in119873(119894) whose 119896 nearest neighbors

also have 119899119901120579

users with positive level of 119901120579 119867119901120579

and sim 119867119901120579

present event that user 119894 has positive level of 119901120579and event that

user 119894 has negative level of 119901120579 respectively And if user 119894rsquos per-

sonality traits set PT119894was not empty considering correlations

between personality traits we calculated probability that user119894 had positive level of 119901

120579on the basis of prior knowledge

about correlations between119901120579and personality traits that have

already been visited in UP as follows

119891 (119894 119901120579) =

119875 (119867119901120579

| 1199011198971 1199011198972 ) 119875 (119862

119899119901120579

119901120579

| 119867119901120579

)

119875 (sim 119867119901120579

| 1199011198971 1199011198972 ) 119875 (119862

119899119901120579

119901120579

| sim 119867119901120579

)

(11)

where 1199011198971 1199011198972 isin PT

119894present personality traits which have

already been visited in UP If 119891(119894 119901120579) was greater than or

equal to 119879(119901120579) then positive level of 119901

120579was added to PT

119894

else negative level of 119901120579was added to PT

119894

And then we calculated error rate of each personalitytrait and error rate of 119901

120579is denoted by Err(119901

120579) which is

calculated as

Err (119901120579) = 1minus

119903120579

to (12)

where 119903120579denotes the number of instances which are classified

correctly to denote the total number of instancesFinally if unvisited personality traits set UP was empty

and error rate of each personality trait fell within a specifiedrange then algorithm was terminated else if unvisited per-sonality traits set UP was not empty then we continued topredict another personality trait in UP else if there was apersonality trait 119901

120579whose error rate did not reach defined

limits then we updated 119879(119901120579) with (12) and added it to

personality traits set UP to predict it again

119879 (119901120579) = 119879 (119901

120579) + 120576 timesErr (119901

120579) (13)

where 120576 denotes a monotonic decreasing learning rate Ouralgorithm can now be defined as Algorithm 1 Assumethat size of dataset is 119879119903 dimension of features is 119889 andsize of the nearest neighbors set is 119896 The complexity ofconditional probability-based predicting model for userrsquospersonality traits is analyzed as follows From step (10) tostep (12) computation time of Kendall correlation coefficientbetween features and personality traits is taking119874(1198791199032) timecalculating weights of features takes 119874(119889) time and thendistributing weights to features can be computed in 119874(119889119879119903)time the total time is119874(1198791199032) as dimension of features 119889 is farless than size of dataset119879119903 Calculating distance between user119894 and other users from step (13) to step (15) will take 119874(119889119879119903)time In step (16) it will take 119874(119879119903log2119879119903) time for sortingset of distances between user 119894 and other users Step (18) tostep (20) take 119874(119896) time to select 119896 nearest neighbors of user119894 If algorithm goes from step (23) to step (25) it will take119874(1198962

) time Else if algorithm goes from step (26) to step (28)it will take 119874(1198791199031198962) time It takes 119874(119879119903) time from step (37)to step (45) Assume that the number of iterations is 119905 Sincesteps (10) (11) and (12) can be done offline and do not needto be calculated repeatedly every time hence from step (13)to step (52) the overall online complexity of our proposedmethod is 119874(119879119903log

2119879119903) as 119896 and 119905 are far less than 119879119903 From

here we see that our proposed method is feasible in a big dataenvironment as in Facebook monitoring

4 Experimental Evaluation

In this section we first describe dataset used in our experi-ments And then we analyze the interdependent relationshipsbetween userrsquos personality traits Finally we conduct experi-ments on different kinds of features and make a comparisonwith other methods based on the same dataset

41 Dataset MyPersonality (httpmypersonalityorg) is apopular Facebook application which is a 336-question testmeasuring the 30 facets underlying the Big Five traits thatallows users to take real psychometric tests The respondentswho come from various age groups backgrounds and cul-tures are highly motivated to answer honestly and carefullyAdditionally usersrsquo psychological and Facebook profilesare ldquoverifiedrdquo by individualsrsquo social circles hence it is hardto lie to the people that know you best Also there is nopressure and one does not have to share hisher profileinformation thus myPersonality avoids deliberate faking

Mathematical Problems in Engineering 7

Input user 119894 users set 119880 size of the nearest neighbors set 119896 unvisited personality traits set UPOutput user 119894rsquos personality traits set PT

119894

(1) 119891 larr 119891119886119897119904119890

lowast 119891 stands for a flag if any of the thresholds of personality traits is updated then 119891 equals to trueotherwise 119891 equals to false lowast

(2) For each personality trait 119901120579isin UP do

(3) set temporary error rate Temp(119901120579) as 0

(4) End for(5) For each personality trait 119901

120579isin UP do

(6) set initial threshold 119879(119901120579) as 05

(7) End for(8) PT

119894larr 0

(9) UPlarr all of the Big Five personality traits(10) For each feature 119891

ℎisin user 119894rsquos feature set do

(11) calculate 119891ℎrsquos weight according to (8)

(12) End for(13) For user 119895 isin 119880 and 119895 = 119894 do(14) calculate Dis(119894 119895) according to (9)(15) End for(16) sort Dis(119894 1)Dis(119894 2) in ascending order(17) 119873(119894) larr select top 119896 users in Dis(119894 1)Dis(119894 2) (18) For each personality trait 119901

120579isin UP do

(19) 119899119901120579

larr the number of users who have positive level of personality trait 119901120579in119873(119894)

(20) End for(21) While UP = 0 do(22) 119901

120579larr UPpoll()

(23) If PT119894= 0 then

(24) calculate 119891(119894 119901120579) according to (10)

(25) End if(26) Else if PT

119894= 0 then

(27) calculate 119891(119894 119901120579) according to (11)

(28) End if(29) If 119891(119894 119901

120579) ge 119879(119901

120579) then

(30) 119875119879119894larr positive level of 119901

120579

(31) End if(32) Else if 119891(119894 119901

120579) lt 119879(119901

120579) then

(33) PT119894larrnegative level of 119901

120579

(34) End if(35) End while(36) 119891 larr 119891119886119897119904119890

(37) For each personality trait 119901120579isin PT119894do

(38) calculate 119901120579rsquos error rate Err(119901

120579) according to (12)

(39) If |Err(119901120579) minus Temp(119901

120579)| gt 120575 then 120575 denotes a constant which is small enough

(40) 119891 larr 119905119903119906119890

(41) Temp(119901120579) larr Err(119901

120579)

(42) update 119901120579rsquos threshold 119879(119901

120579) according to (13)

(43) UPlarr 119901120579

(44) End if(45) End for(46) If 119891 == 119905119903119906119890 then(47) Go to step (21)(48) End if(49) Else if 119891 == 119891119886119897119904119890 then(50) Go to step (52)(51) End if(52) Return PT

119894

Algorithm 1 Adaptive conditional probability-based predicting model for userrsquos personality traits

8 Mathematical Problems in Engineering

of data With consent usersrsquo psychological and Facebookprofiles are recorded for research purposes Currently thedatabase contains more than 6000000 test results togetherwith more than 4000000 individual Facebook profilesIn this paper we adopted the dataset that was provided inthe Workshop on Computational Personality Recognition(Shared Task) (httpmypersonalityorgwikilibexefetchphpmedia=wikimypersonality finalzip) [35] The datasetwas a subset of myPersonality database including 250users which had both information about personality traits(annotated with usersrsquo personality scores and gold standardpersonality labelswhichwere self-assessments obtained usinga 100-item long version of the IPIP personality questionnaire(httpipiporiorgnewFinding Labeling IPIP Scaleshtm))and social network structure (network size betweennesscentrality density brokerage and transitivity) along withtheir 9900 status updates in raw text With the aid of thisstandard dataset we can compare the performance of ourproposed method with othersrsquo personality recognitionsystems on a common benchmark

42 Analysis of Correlations between Personality Traits Pre-vious works were usually predicting userrsquos personality traitswithout considering interdependencies between them In thiscontext we investigatedwhether there had been relationshipsbetween userrsquos personality traits Since not all users had thesame level of interactions consequently we first groupedusers according to different degrees of userrsquos personalitytraits As an example if a user has positive level of agree-ableness another user has positive level of agreeableness aswell then they will be divided into a group else if a user haspositive level of agreeableness another user has negative levelof agreeableness and they will not be divided into a groupThen we explored cooccurrences between personality traitsin each group simultaneously which is shown in Figure 2

It can be observed intuitively that a significant proportionof users have negative level of extraversion positive levelof openness or positive level of agreeableness It is alsonoteworthy that positive level of extraversion has less overlapwith negative level of openness negative level of consci-entiousness and negative level of agreeableness Besidespositive level of neuroticism has less overlap with positivelevel of conscientiousness positive level of agreeableness andnegative level of openness Furthermore there are bits ofusers that have positive level of extraversion and positive levelof neuroticism simultaneously

However the above cooccurrences may be due to the apriori statistics of each trait For instance if there are moreusers with negative level of neuroticism positive level ofopenness and positive level of agreeableness than othersit is more likely to have more users with cooccurrencebetween negative level of neuroticism and positive level ofopenness negative level of neuroticism and positive level ofagreeableness and positive level of openness and positivelevel of agreeableness than others as it is shown in Figure 2Therefore since users were grouped according to differentdegrees of userrsquos personality traits in order to investigatewhether the above phenomenon meant that there werepositive or negative correlations between different degrees

of personality traits in each group we treated scores ofa certain personality trait according to which users weredivided into the group and scores of another personalitytrait with a certain degree as two random variables followedby employing Kendall correlation coefficient which wascalculated in (3) to analyze real correlations between themThe results are presented in Table 1

As it can be seen from Table 1 we can observe anegative correlation between negative level of extraversionand positive level of neuroticism in other words people thatscore high on extraversion have less possibility to score highon neuroticism as well In psychology a person with negativelevel of extraversion can be explained as solitary and a personwith negative level of conscientiousnessmeans being carelessConsequently negative level of extraversion and negativelevel of conscientiousness are not in line with positive level ofneuroticism which tends to experience unpleasant emotionseasily Another factor is social desirability Moreover agree-ableness extraversion conscientiousness and openness aremore or less ldquodesirablerdquo whereas neuroticism is quite clearlynegative Thus generally there are negative correlationsbetween neuroticism and other personality traits

What is more it is inconsistent with Figure 2 that thereis a positive correlation between positive level of neuroticismand positive level of extraversion It may be explained thata person with positive level of extraversion tends to beenthusiastic and talkative which is compatible with positivelevel of neuroticism In addition in line with Figure 2since negative level of openness shows cautiousness it hasa positive correlation with negative level of extraversionand positive level of agreeableness which is a tendency tobe compassionate However there is a positive correlationbetween negative level of openness and positive level ofneuroticism which is not in keeping with Figure 2

Since there are some contradictions between Figure 2and Table 1 we leveraged Jensen-Shannon divergence [45]to further analyze correlations between userrsquos personalitytraits Jensen-Shannon divergence between two probabilitydistributions is calculated as

119863JS (119875 119876) =12(119863KL (119875 119877) +119863KL (119876 119877)) (14)

where 119875 and 119876 denote two probability distributions 119877denotes an average distribution of 119875 and 119876 and 119863KL(119875 119877)denotes Kullback-Leibler divergence [46] between 119875 and 119877which is calculated with

119863KL (119875 119877) = sum119894

119875 (119894) log 119875 (119894)119877 (119894)

(15)

where 119875(119894) and 119877(119894) denote the 119894th value of 119875 and 119877 Theresults are presented in Table 2

In keeping with Figure 2 and Table 1 Jensen-Shannondivergences between negative level of neuroticism and posi-tive level of agreeableness positive level of conscientiousnessand positive level of openness are larger It is because ofthe fact that a person with negative level of neuroticismshows confidence on everything and positive level of open-ness reflects degree of intellectual curiosity creativity and

Mathematical Problems in Engineering 9

Table 1 Kendall correlation coefficients between personality traits 119901 and 119899 stand for positive level and negative level of a certain personalitytrait The bold number is the highest Kendall correlation coefficients and the italic number is the lowest Kendall correlation coefficients Thebigger Kendall correlation coefficient is the better the consistency is

EXT NEU AGR CON OPN119901 119899 119901 119899 119901 119899 119901 119899 119901 119899

EXT 119901 022 minus003 009 minus015 minus006 minus013 007 001119899 minus023 minus018 004 004 004 018 007 022

NEU 119901 minus009 minus017 010 minus021 minus003 027119899 minus014 minus013 002 minus001 minus006 minus010

AGR 119901 minus007 007 016 029119899 minus008 016 006 002

CON 119901 minus001 minus010119899 007 minus007

OPN 119901

119899

PEXT and PNEUPEXT and PAGR

PEXT and PCONPEXT and POPNPEXT and NNEUPEXT and NAGRPEXT and NCONPEXT and NOPNPNEU and PAGR

PNEU and PCONPNEU and POPNPNEU and NEXTPNEU and NAGRPNEU and NCONPNEU and NOPNPAGR and PCONPAGR and POPNPAGR and NEXTPAGR and NNEUPAGR and NCONPAGR and NOPNPCON and POPNPCON and NEXTPCON and NNEUPCON and NAGRPCON and NOPNPOPN and NEXTPOPN and NNEUPOPN and NAGRPOPN and NCONNEXT and NNEUNEXT and NAGRNEXT and NCONNEXT and NOPNNNEU and NAGRNNEU and NCONNNEU and NOPNNAGR and NCONNAGR and NOPNNCON and NOPN

Proportion ()

000 500 1000 1500 2000 2500 3000 3500 4000 4500 5000()

Figure 2The proportion of users who have a certain pair of personality traits simultaneously PEXT PNEU PAGR PCON and POPN standfor positive level of extraversion neuroticism agreeableness conscientiousness and openness NEXT NNEU NAGR NCON and NOPNstand for negative level of extraversion neuroticism agreeableness conscientiousness and openness

10 Mathematical Problems in Engineering

Table 2 Jensen-Shannon divergence between personality traits 119901and 119899 stand for positive level and negative level of a certain personal-ity traitThe bold number is the highest Jensen-Shannon divergenceand the italic number is the lowest Jensen-Shannon divergenceSmaller Jensen-Shannon divergence means better consistency

EXT NEU AGR CON OPN119901 119899 119901 119899 119901 119899 119901 119899 119901 119899

EXT 119901 058 1479 044 219 070 236 073 084119899 479 449 521 236 561 191 1043 211

NEU 119901 080 212 110 246 298 040119899 1978 294 1697 305 2473 361

AGR 119901 077 329 082 091119899 340 086 463 125

CON 119901 115 117119899 616 140

OPN 119901

119899

a preference for novelty and variety a person has alongwith positive level of agreeableness and positive level ofconscientiousness which are compatible Furthermore it isin conformity with Table 1 that positive level of extraversionhas smaller Jensen-Shannon divergences with positive levelof neuroticism and positive level of agreeableness besidespositive level of neuroticism has a smaller Jensen-Shannondivergence with negative level of openness

In addition Soto et al [29] demonstrated convergent anddiscriminate correlations for the Big Five Inventory Soto andJohn [30] also illustrated that there was strong convergencebetween each BFI facet scale and corresponding NEO PI-Rfacet Moreover Park et al [28] demonstrated convergencewith self-reports of personality at the domain- and facet-levelIn summary although utilizing different Big Five measuresit can be found that correlations in myPersonality datasetare also in keeping with the findings from other samplesHence correlations between personality traits are expectedand actually prove that myPersonality results are validSo leveraging prior knowledge about relationships betweenpersonality traits may contribute to a better prediction onpersonality traits The results provide a better theoreticalsupport for our proposed method

43 Evaluation Metric Since most researchers exploringon personality recognition leveraged different measures toevaluate their experiments on various datasets it was hard tocompletely appraise their performance and quality Howevera common dataset which was annotated with gold standardpersonality labels was available in the Workshop on Com-putational Personality Recognition (Shared Task) users werefree to split the training and test sets as they wish andprecision recall and 1198651-measure were suggested to be usedto evaluate results of predictions So in this paper for morereliable results the proposed method was also evaluated withstratified 5-fold cross-validation Since there was no trainingset in unsupervised learning the common dataset was splitinto folds and in order to avoid overfitting the same author

Table 3 Precision recall and 1198651-measure of adaptive conditionalprobability-based predicting model (in bold best performance)

Precision Recall 1198651-measureEXT 093 076 083NEU 086 067 075AGR 096 074 083CON 089 070 078OPN 091 079 084Mean 091 0732 0806

did not appear twofold at the same timeThemean precisionrecall and 1198651-measure are calculated as follows

Precision = tptp + fp

Recall = tptp + fn

1198651-measure = 2 times 119875 times 119877119875 + 119877

(16)

where tp stands for the number of positive samples thatare classified correctly fp stands for the number of positivesamples that are classified incorrectly and fn stands for thenumber of negative samples that are classified incorrectly

44 Analysis of Impacts of Different Factors First we carriedon experiments with our proposed method Scores of preci-sion recall and 1198651-measure are shown in Table 3 includingmean scores of all personality traits

441 Impacts of Different Categories of Feature Sets In thissection we conducted experiments on social network fea-tures linguistic features emotional statistical features topicfeatures and all features respectively Due to space restric-tions Figure 3 only illustrates 1198651-measure of all personalitytraits along with mean 1198651-measure of them

It can be observed that experiments which are conductedon linguistic features emotional statistical features and topicfeatures result in unsatisfactory performance with respectto social network features and social network features havethe best classification performance for extraversion which isconsistent with the conclusion in [33] Since social networkfeatures reflect a userrsquos pattern of social status and habitswhich are relatively important decisive factors in userrsquospersonality traits as a result they provide more informationthan other types of features Nevertheless whether more orless each kind of characteristic provides unique informationin userrsquos personality traits prediction Therefore carryingon experiment with combination of social network featureslinguistic features emotional statistical features and topicfeatures can achieve better performance

442 Impact of Weighted Features Additionally we con-ducted experiments on unweighted features Here we onlypresented the most successful results in Table 4 includingmean scores of all personality traits

Mathematical Problems in Engineering 11

1

featuresEssential

featuresLinguistic

featuresEmotional

featuresTopic

featuresAll

Categories of features

EXTNEUAGR

CONOPNMEAN

09

08

07

06

05

04

03

F-s

core

s

Figure 3 1198651-measures of five personality traits along with mean1198651-measures of them with different categories of feature sets

Table 4 Precision recall and 1198651-measure of predictingmodel withunweighted features (in bold best performance)

Precision Recall 1198651-measureEXT 085 076 082NEU 092 068 076AGR 094 064 076CON 091 057 068OPN 087 078 083Mean 090 0686 077

Table 5 Precision recall and 1198651-measure of predictingmodel withunified thresholds (in bold best performance)

Precision Recall 1198651-measureEXT 062 054 057NEU 080 016 025AGR 057 073 063CON 057 077 060OPN 070 100 082Mean 0652 064 0574

From Tables 3 and 4 we can observe that the best resultsfor myPersonality dataset are achieved after distributingweights to features It illustrates that considering correlationsbetween features and userrsquos personality traits can bring per-formance gain to the prediction task since there are limitedfeatures that may remain useful for userrsquos personality traitsprediction

443 Impact of Dynamic Updated Thresholds of PersonalityTraits In addition we predicted userrsquos personality traits withunified thresholds as well The results are summarized inTable 5

Multilabel learning task is to predict one or more cat-egories for each instance In the existing algorithms suchas PT5 method proposed by Tsoumakas and Katakis [34]

Table 6 Precision of different methods on Facebook dataset Thebest performance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 093 086 096 089 091 0910Verhoevenet al [42] SVM 079 071 067 072 087 0752

Farnadi etal [33]

SVM kNNNB 058 054 050 055 060 0554

Alam et al[43]

SVM BLRmNB 058 059 059 059 060 0590

Tomlinsonet al [44] LR NA NA NA NA NA NA

generally thresholdswere unified to the same valueHoweverall categories employing a unified minimum threshold arenot appropriate If threshold is set too high not all categoriestags will be predicted on the other hand if threshold is settoo low too many classes will be predicted in results Hencein this paper we updated thresholds of personality traitsdynamically according to error rates of them Comparedto Table 3 based on thresholds obtained through updatingdynamically our method can achieve better results

45 Performance EvaluationwithDifferentMethods Asmen-tioned in Section 41 in the Workshop on ComputationalPersonality Recognition (Shared Task) different systems forpersonality recognition from text and network features ona common benchmark have been compared Verhoeven etal [42] proposed a meta-learning approach which can beextended to certain component classifiers from other genreswith other class systems or even from other languagesFarnadi et al [33] leveraged SVM 119896 nearest neighbors(kNN) [17] and Naıve Bayes (NB) for automatic recognitionof personality traits from usersrsquo Facebook statues updatesrespectively even with a small set of training dataset it couldachieve better results than most baseline algorithms On thebasis of a set of features extracted from Facebook datasetAlam et al [43] explored suitability and performance ofseveral classification techniques which were SVM BayesianLogistic Regression (BLR) [47] andMultinomial Naıve Bayes(mNB) [48] Tomlinson et al [44] used ranking algorithmsfor feature selection and Logistic Regression (LR) [49] aslearning algorithms which achieved a high performance Inthis paper we tested our proposed method with the sameset than the participants in the Workshop on ComputationalPersonality Recognition (Shared Task) In terms of precisionrecall and 1198651-measure a comparison of the works describedabove as well as our proposed method (denoted by CP) issummarized in Tables 6 7 and 8

From Tables 6 7 and 8 the experimental results indicatethat openness is the easiest personality trait to be predictedby Facebook features with the above methods because thereare more users with positive level of openness than otherpersonality traits in Facebook dataset Since Verhoeven et al[42] trained an ensemble SVM model based on both Face-book and Essays personality traits dataset as a consequence

12 Mathematical Problems in Engineering

Table 7 Recall of different methods on Facebook dataset The bestperformance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 076 067 074 070 079 0732Verhoevenet al [42] SVM 079 072 068 072 087 0756

Farnadi etal [33]

SVMkNN NB 061 053 050 054 070 0576

Alam et al[43]

SVMBLR mNB 058 058 059 059 060 0588

Tomlinsonet al [44] LR NA NA NA NA NA NA

Table 8 1198651-measure of different methods on Facebook datasetThebest performance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 083 075 083 078 084 0806Verhoevenet al [42] SVM 079 070 067 072 086 0748

Farnadi etal [33]

SVMkNN NB 056 052 050 054 061 0546

Alam et al[43]

SVMBLR mNB 058 058 058 059 060 0586

Tomlinsonet al [44] LR NA NA NA NA NA 0630

when tested on the same Facebook dataset it can achievebetter results than our proposed method Furthermorewhen trained and tested on the same Facebook dataset ourmethod outperforms the methods in [33 43 44] which donot take weighted features dynamic updated thresholds ofpersonality traits and correlations between personality traitsinto consideration

5 Conclusion

In this paper we studied the problem of exploiting interde-pendencies between userrsquos personality traits for predictinguserrsquos personality traits First after analyzing importance offeatures we conducted experiments on Facebook datasetto demonstrate the existence of correlations between userrsquospersonality traits Bearing this in mind an unsupervisedframework adaptive conditional probability-based predict-ing model was then proposed to predict userrsquos Big Fivepersonality traits based on importance of features dynamicupdated thresholds of personality traits and prior knowledgeabout correlations between personality traits Furthermorewe compared our results with the ones achieved by othersin the Workshop on Computational Personality Recognition(Shared Task) on the same dataset In general the experimen-tal results demonstrated the effectiveness of our proposedframework

In future work we will speculate on what directions canbe undertaken to ameliorate its performance with respect

to time complexity so as to better apply it to a big dataenvironment as in Facebook monitoring Besides in orderto make our framework applicable to dynamic networksbetter we will explore combining time series analysis withpersonality traits predicting algorithm to capture dynamicevolution process of information and network structureFurthermore since social network features (described inSection 311) are specific for our dataset they may be difficult(or even impossible) to obtain in another dataset hencewe will conduct experiments on other datasets such as theTweeter corpus of PAN shared task (httppanwebisde)on author profiling where also personality is considered tofurther validate the effectiveness of the proposed frameworkbased on different features

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the National Natural ScienceFoundation of China under Grant no 61300148 the Scientificand Technological Break-Through Program of Jilin ProvinceunderGrant no 20130206051GX the Science andTechnologyDevelopment Program of Jilin Province under Grant no20130522112JH the Science Foundation for China Postdoc-tor under Grant no 2012M510879 and the Basic Scien-tific Research Foundation for the Interdisciplinary Researchand Innovation Project of Jilin University under Grant no201103129

References

[1] M Oussalah F Bhat K Challis and T Schnier ldquoA softwarearchitecture for Twitter collection search and geolocationservicesrdquo Knowledge-Based Systems vol 37 pp 105ndash120 2013

[2] R R McCrae and P T Costa ldquoValidation of the 5-factor modelof personality across instruments and observersrdquo Journal ofPersonality and Social Psychology vol 52 no 1 pp 81ndash90 1987

[3] G Saucier and L R Goldberg ldquoThe structure of personalityattributesrdquo in Personality and Work Reconsidering the Role ofPersonality in Organizations M R Barrick and A M RyanEds pp 1ndash29 Jossey-Bass 2003

[4] B Caci M Cardaci M E Tabacchi and F Scrima ldquoPersonalityvariables as predictors of facebook usagerdquoPsychological Reportsvol 114 no 2 pp 528ndash539 2014

[5] M Kosinski Y Bachrach P Kohli D Stillwell and T GraepelldquoManifestations of user personality in website choice andbehaviour on online social networksrdquoMachine Learning vol 95no 3 pp 357ndash380 2014

[6] M-G Cojocaru H Thille E Thommes D Nelson and SGreenhalgh ldquoSocial influence and dynamic demand for newproductsrdquoEnvironmentalModellingamp Software vol 50 pp 169ndash185 2013

[7] F Durupinar N Pelechano J Allbeck U Gudukbay andN Badler ldquoThe impact of the OCEAN personality modelon the perception of crowdsrdquo IEEE Computer Graphics andApplications vol 31 no 3 pp 22ndash31 2011

Mathematical Problems in Engineering 13

[8] X Feng ldquoStudy on customer personality characteristics andrelationship outcomes using SEM analysisrdquo in Proceedings ofthe International Conference on Mechatronics and InformationTechnology pp 841ndash844 2013

[9] J G Lee SMoon andK Salamatian ldquoModeling and predictingthe popularity of online contents with Cox proportional hazardregression modelrdquo Neurocomputing vol 76 no 1 pp 134ndash1452012

[10] J W Stoughton L F Thompson and A W Meade ldquoBigfive personality traits reflected in job applicantsrsquo social mediapostingsrdquo Cyberpsychology Behavior and Social Networkingvol 16 no 11 pp 800ndash805 2013

[11] D A Cobb-Clark and S Schurer ldquoThe stability of big-fivepersonality traitsrdquo Economics Letters vol 115 no 1 pp 11ndash152012

[12] W Mischel ldquoToward an integrative science of the personrdquoAnnual Review of Psychology vol 55 pp 1ndash22 2004

[13] S Argamon S Dhawle M Koppel and J Pennebaker ldquoLexicalpredictors 19 of personality typerdquo in Proceedings of the JointAnnual Meeting of the Interface and the Classification Society ofNorth America 2005

[14] F Mairesse M AWalker M R Mehl and R K Moore ldquoUsinglinguistic cues for the automatic recognition of personality inconversation and textrdquo Journal of Artificial Intelligence Researchvol 30 pp 457ndash500 2007

[15] K Moore and J C McElroy ldquoThe influence of personalityon Facebook usage wall postings and regretrdquo Computers inHuman Behavior vol 28 no 1 pp 267ndash274 2012

[16] J W Pennebaker and L A King ldquoLinguistic styles languageuse as an individual differencerdquo Journal of Personality and SocialPsychology vol 77 no 6 pp 1296ndash1312 1999

[17] J Han and M Kamber Data Mining Concepts and TechniquesThe Morgan Kaufmann Series in Data Management Systemsedited by J Gray Morgan Kaufmann Boston Mass USA 3rdedition 2011

[18] J Golbeck C Robles and K Turner ldquoPredicting personalitywith social mediardquo in Proceedings of the 29th Annual CHIConference on Human Factors in Computing Systems (CHI rsquo11)pp 253ndash262 ACM May 2011

[19] J RQuinlan ldquoLearningwith continuous classesrdquo inProceedingsof the 5thAustralian Joint Conference onArtificial Intelligence (Airsquo92) pp 343ndash348 Hobart Australia November 1992

[20] S Roweis and Z Ghahramani ldquoA unifying review of lineargaussian modelsrdquo Neural Computation vol 11 no 2 pp 305ndash345 1999

[21] S Bai T Zhu and L Cheng ldquoBig-five personality pre-diction based on user behaviors at social network sitesrdquohttparxivorgpdf12044809v1pdf

[22] J Oberlander and S Nowson ldquoWhose thumb is it anywayclassifying author personality from weblog textrdquo in Proceedingsof the 44th Annual Meeting of the Association for ComputationalLinguistics pp 627ndash634 2006

[23] T Nguyen D Q Phung B Adams and S Venkatesh ldquoTowardsdiscovery of influence and personality traits through social linkpredictionrdquo in Proceedings of the International Conference onWeblogs and Social Media pp 566ndash569 Barcelona Spain July2011

[24] S T Bai B B Hao A Li S Yuan R Gao and T S ZhuldquoPredicting big five personality traits of microblog usersrdquo inProceedings of the 12th IEEEWICACM International JointConferences on Web Intelligence and Intelligent Agent Technolog(WI rsquo13) pp 501ndash508 November 2013

[25] R Sun and N Wilson ldquoA model of personality should be acognitive architecture itselfrdquo Cognitive Systems Research vol29-30 no 1 pp 1ndash30 2014

[26] A Ortigosa R M Carro and J I Quiroga ldquoPredicting userpersonality by mining social interactions in Facebookrdquo Journalof Computer and System Sciences vol 80 no 1 pp 57ndash71 2014

[27] O P John L P Naumann and C J Soto ldquoParadigm shift tothe integrative Big Five trait taxonomy history measurementand conceptualissuesrdquo in Handbook of Personality Theory andResearch O P John R W Robins and L A Pervin Eds pp114ndash158 Guilford Press New York NY USA 3rd edition 2008

[28] G Park H A Schwartz J C Eichstaedt et al ldquoAutomaticpersonality assessment through social media languagerdquo Journalof Personality and Social Psychology vol 108 no 6 pp 934ndash9522015

[29] C J Soto O P John S D Gosling and J Potter ldquoAge differencesin personality traits from 10 to 65 big five domains and facets ina large cross-sectional samplerdquo Journal of Personality and SocialPsychology vol 100 no 2 pp 330ndash348 2011

[30] C J Soto and O P John ldquoTen facet scales for the BigFive Inventory convergence with NEO PI-R facets self-peeragreement and discriminant validityrdquo Journal of Research inPersonality vol 43 no 1 pp 84ndash90 2009

[31] M V Kosti R Feldt and L Angelis ldquoPersonality emotionalintelligence and work preferences in software engineering anempirical studyrdquo Information and Software Technology vol 56no 8 pp 973ndash990 2014

[32] H A Schwartz J C Eichstaedt M L Kern et al ldquoPersonalitygender and age in the language of social media the open-vocabulary approachrdquo PLoS ONE vol 8 no 9 Article IDe73791 2013

[33] G Farnadi S ZoghbiM FMoens andMD Cock ldquoRecognis-ing personality traits using facebook status updatesrdquo in Proceed-ings of the Workshop on Computational Personality RecognitionJuly 2013 httpcliccimecunitnitfabiowcpr13farnadi wcpr-13pdf

[34] G Tsoumakas and I Katakis ldquoMulti-label classification anoverviewrdquo International Journal of Data Warehousing and Min-ing vol 3 no 3 pp 1ndash13 2007

[35] M Kosinski D Stillwell and T Graepel ldquoPrivate traits andattributes are predictable from digital records of human behav-iorrdquo Proceedings of the National Academy of Sciences of theUnited States of America vol 110 no 15 pp 5802ndash5805 2013

[36] M Belkin and P Niyogi ldquoLaplacian eigenmaps for dimension-ality reduction and data representationrdquo Neural Computationvol 15 no 6 pp 1373ndash1396 2003

[37] R Socher J Bauer C D Manning and Y N Andrew ldquoParsingwith compositional vector grammarsrdquo in Proceedings of the 51stAnnual Meeting of the Association for Computational Linguistics(ACL rsquo13) pp 455ndash465 Sofia Bulgaria August 2013

[38] K Kazama M Imada and K Kashiwagi ldquoCharacteristics ofinformation diffusion in blogs in relation to information sourcetyperdquo Neurocomputing vol 76 no 1 pp 84ndash92 2012

[39] M B David Y N Andrew and I J Michael ldquoLatent Dirichletallocationrdquo Journal ofMachine Learning Research vol 3 no 4-5pp 993ndash1022 2003

[40] P T Costa andR RMcCraeRevisedNEOPersonality Inventory(NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI) Profes-sional Manual Psychological Assessment Resources 1992

[41] R L Atkinson C A Richard E S Edward J B Daryland S Nolen-Hoeksema Hilgardrsquos Introduction to Psychology

14 Mathematical Problems in Engineering

Harcourt College Publishers Orlando Fla USA 13th edition2000

[42] B Verhoeven W Daelemans and T D Smedt ldquoEnsemblemethods for personality recognitionrdquo in Proceedings of theWorkshop on Computational Personality Recognition pp 35ndash38Boston Mass USA July 2013

[43] F Alam A Stepanov and G Riccardi ldquoPersonality traitsrecognition on social networkmdashfacebookrdquo in Proceedings of theWorkshop on Computational Personality Recognition pp 6ndash9Boston Mass USA July 2013

[44] M T Tomlinson D Hinote and D B Bracewell ldquoPredictingconscientiousness through semantic analysis of facebook postsrdquoin Proceedings of the Workshop on Computational PersonalityRecognition pp 31ndash34 July 2013

[45] D M Endres and J E Schindelin ldquoA newmetric for probabilitydistributionsrdquo IEEETransactions on InformationTheory vol 49no 7 pp 1858ndash1860 2003

[46] S Kullback and R A Leibler ldquoOn information and sufficiencyrdquoThe Annals of Mathematical Statistics vol 22 no 1 pp 79ndash861951

[47] A Genkin D D Lewis and D Madigan ldquoLarge-scale Bayesianlogistic regression for text categorizationrdquo Technometrics vol49 no 3 pp 291ndash304 2007

[48] A K McCallum and K Nigam ldquoA comparison of event modelsfor naive bayes text classificationrdquo in Proceedings of the AAAI-98 Workshop on Learning for Text Categorization pp 137ndash142Madison Wis USA July 1998

[49] I Ruczinski C Kooperberg and M LeBlanc ldquoLogic regres-sionrdquo Journal of Computational and Graphical Statistics vol 12no 3 pp 475ndash511 2003

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 7: Research Article A Novel Adaptive Conditional Probability ...downloads.hindawi.com/journals/mpe/2015/472917.pdf · Research Article A Novel Adaptive Conditional Probability-Based

Mathematical Problems in Engineering 7

Input user 119894 users set 119880 size of the nearest neighbors set 119896 unvisited personality traits set UPOutput user 119894rsquos personality traits set PT

119894

(1) 119891 larr 119891119886119897119904119890

lowast 119891 stands for a flag if any of the thresholds of personality traits is updated then 119891 equals to trueotherwise 119891 equals to false lowast

(2) For each personality trait 119901120579isin UP do

(3) set temporary error rate Temp(119901120579) as 0

(4) End for(5) For each personality trait 119901

120579isin UP do

(6) set initial threshold 119879(119901120579) as 05

(7) End for(8) PT

119894larr 0

(9) UPlarr all of the Big Five personality traits(10) For each feature 119891

ℎisin user 119894rsquos feature set do

(11) calculate 119891ℎrsquos weight according to (8)

(12) End for(13) For user 119895 isin 119880 and 119895 = 119894 do(14) calculate Dis(119894 119895) according to (9)(15) End for(16) sort Dis(119894 1)Dis(119894 2) in ascending order(17) 119873(119894) larr select top 119896 users in Dis(119894 1)Dis(119894 2) (18) For each personality trait 119901

120579isin UP do

(19) 119899119901120579

larr the number of users who have positive level of personality trait 119901120579in119873(119894)

(20) End for(21) While UP = 0 do(22) 119901

120579larr UPpoll()

(23) If PT119894= 0 then

(24) calculate 119891(119894 119901120579) according to (10)

(25) End if(26) Else if PT

119894= 0 then

(27) calculate 119891(119894 119901120579) according to (11)

(28) End if(29) If 119891(119894 119901

120579) ge 119879(119901

120579) then

(30) 119875119879119894larr positive level of 119901

120579

(31) End if(32) Else if 119891(119894 119901

120579) lt 119879(119901

120579) then

(33) PT119894larrnegative level of 119901

120579

(34) End if(35) End while(36) 119891 larr 119891119886119897119904119890

(37) For each personality trait 119901120579isin PT119894do

(38) calculate 119901120579rsquos error rate Err(119901

120579) according to (12)

(39) If |Err(119901120579) minus Temp(119901

120579)| gt 120575 then 120575 denotes a constant which is small enough

(40) 119891 larr 119905119903119906119890

(41) Temp(119901120579) larr Err(119901

120579)

(42) update 119901120579rsquos threshold 119879(119901

120579) according to (13)

(43) UPlarr 119901120579

(44) End if(45) End for(46) If 119891 == 119905119903119906119890 then(47) Go to step (21)(48) End if(49) Else if 119891 == 119891119886119897119904119890 then(50) Go to step (52)(51) End if(52) Return PT

119894

Algorithm 1 Adaptive conditional probability-based predicting model for userrsquos personality traits

8 Mathematical Problems in Engineering

of data With consent usersrsquo psychological and Facebookprofiles are recorded for research purposes Currently thedatabase contains more than 6000000 test results togetherwith more than 4000000 individual Facebook profilesIn this paper we adopted the dataset that was provided inthe Workshop on Computational Personality Recognition(Shared Task) (httpmypersonalityorgwikilibexefetchphpmedia=wikimypersonality finalzip) [35] The datasetwas a subset of myPersonality database including 250users which had both information about personality traits(annotated with usersrsquo personality scores and gold standardpersonality labelswhichwere self-assessments obtained usinga 100-item long version of the IPIP personality questionnaire(httpipiporiorgnewFinding Labeling IPIP Scaleshtm))and social network structure (network size betweennesscentrality density brokerage and transitivity) along withtheir 9900 status updates in raw text With the aid of thisstandard dataset we can compare the performance of ourproposed method with othersrsquo personality recognitionsystems on a common benchmark

42 Analysis of Correlations between Personality Traits Pre-vious works were usually predicting userrsquos personality traitswithout considering interdependencies between them In thiscontext we investigatedwhether there had been relationshipsbetween userrsquos personality traits Since not all users had thesame level of interactions consequently we first groupedusers according to different degrees of userrsquos personalitytraits As an example if a user has positive level of agree-ableness another user has positive level of agreeableness aswell then they will be divided into a group else if a user haspositive level of agreeableness another user has negative levelof agreeableness and they will not be divided into a groupThen we explored cooccurrences between personality traitsin each group simultaneously which is shown in Figure 2

It can be observed intuitively that a significant proportionof users have negative level of extraversion positive levelof openness or positive level of agreeableness It is alsonoteworthy that positive level of extraversion has less overlapwith negative level of openness negative level of consci-entiousness and negative level of agreeableness Besidespositive level of neuroticism has less overlap with positivelevel of conscientiousness positive level of agreeableness andnegative level of openness Furthermore there are bits ofusers that have positive level of extraversion and positive levelof neuroticism simultaneously

However the above cooccurrences may be due to the apriori statistics of each trait For instance if there are moreusers with negative level of neuroticism positive level ofopenness and positive level of agreeableness than othersit is more likely to have more users with cooccurrencebetween negative level of neuroticism and positive level ofopenness negative level of neuroticism and positive level ofagreeableness and positive level of openness and positivelevel of agreeableness than others as it is shown in Figure 2Therefore since users were grouped according to differentdegrees of userrsquos personality traits in order to investigatewhether the above phenomenon meant that there werepositive or negative correlations between different degrees

of personality traits in each group we treated scores ofa certain personality trait according to which users weredivided into the group and scores of another personalitytrait with a certain degree as two random variables followedby employing Kendall correlation coefficient which wascalculated in (3) to analyze real correlations between themThe results are presented in Table 1

As it can be seen from Table 1 we can observe anegative correlation between negative level of extraversionand positive level of neuroticism in other words people thatscore high on extraversion have less possibility to score highon neuroticism as well In psychology a person with negativelevel of extraversion can be explained as solitary and a personwith negative level of conscientiousnessmeans being carelessConsequently negative level of extraversion and negativelevel of conscientiousness are not in line with positive level ofneuroticism which tends to experience unpleasant emotionseasily Another factor is social desirability Moreover agree-ableness extraversion conscientiousness and openness aremore or less ldquodesirablerdquo whereas neuroticism is quite clearlynegative Thus generally there are negative correlationsbetween neuroticism and other personality traits

What is more it is inconsistent with Figure 2 that thereis a positive correlation between positive level of neuroticismand positive level of extraversion It may be explained thata person with positive level of extraversion tends to beenthusiastic and talkative which is compatible with positivelevel of neuroticism In addition in line with Figure 2since negative level of openness shows cautiousness it hasa positive correlation with negative level of extraversionand positive level of agreeableness which is a tendency tobe compassionate However there is a positive correlationbetween negative level of openness and positive level ofneuroticism which is not in keeping with Figure 2

Since there are some contradictions between Figure 2and Table 1 we leveraged Jensen-Shannon divergence [45]to further analyze correlations between userrsquos personalitytraits Jensen-Shannon divergence between two probabilitydistributions is calculated as

119863JS (119875 119876) =12(119863KL (119875 119877) +119863KL (119876 119877)) (14)

where 119875 and 119876 denote two probability distributions 119877denotes an average distribution of 119875 and 119876 and 119863KL(119875 119877)denotes Kullback-Leibler divergence [46] between 119875 and 119877which is calculated with

119863KL (119875 119877) = sum119894

119875 (119894) log 119875 (119894)119877 (119894)

(15)

where 119875(119894) and 119877(119894) denote the 119894th value of 119875 and 119877 Theresults are presented in Table 2

In keeping with Figure 2 and Table 1 Jensen-Shannondivergences between negative level of neuroticism and posi-tive level of agreeableness positive level of conscientiousnessand positive level of openness are larger It is because ofthe fact that a person with negative level of neuroticismshows confidence on everything and positive level of open-ness reflects degree of intellectual curiosity creativity and

Mathematical Problems in Engineering 9

Table 1 Kendall correlation coefficients between personality traits 119901 and 119899 stand for positive level and negative level of a certain personalitytrait The bold number is the highest Kendall correlation coefficients and the italic number is the lowest Kendall correlation coefficients Thebigger Kendall correlation coefficient is the better the consistency is

EXT NEU AGR CON OPN119901 119899 119901 119899 119901 119899 119901 119899 119901 119899

EXT 119901 022 minus003 009 minus015 minus006 minus013 007 001119899 minus023 minus018 004 004 004 018 007 022

NEU 119901 minus009 minus017 010 minus021 minus003 027119899 minus014 minus013 002 minus001 minus006 minus010

AGR 119901 minus007 007 016 029119899 minus008 016 006 002

CON 119901 minus001 minus010119899 007 minus007

OPN 119901

119899

PEXT and PNEUPEXT and PAGR

PEXT and PCONPEXT and POPNPEXT and NNEUPEXT and NAGRPEXT and NCONPEXT and NOPNPNEU and PAGR

PNEU and PCONPNEU and POPNPNEU and NEXTPNEU and NAGRPNEU and NCONPNEU and NOPNPAGR and PCONPAGR and POPNPAGR and NEXTPAGR and NNEUPAGR and NCONPAGR and NOPNPCON and POPNPCON and NEXTPCON and NNEUPCON and NAGRPCON and NOPNPOPN and NEXTPOPN and NNEUPOPN and NAGRPOPN and NCONNEXT and NNEUNEXT and NAGRNEXT and NCONNEXT and NOPNNNEU and NAGRNNEU and NCONNNEU and NOPNNAGR and NCONNAGR and NOPNNCON and NOPN

Proportion ()

000 500 1000 1500 2000 2500 3000 3500 4000 4500 5000()

Figure 2The proportion of users who have a certain pair of personality traits simultaneously PEXT PNEU PAGR PCON and POPN standfor positive level of extraversion neuroticism agreeableness conscientiousness and openness NEXT NNEU NAGR NCON and NOPNstand for negative level of extraversion neuroticism agreeableness conscientiousness and openness

10 Mathematical Problems in Engineering

Table 2 Jensen-Shannon divergence between personality traits 119901and 119899 stand for positive level and negative level of a certain personal-ity traitThe bold number is the highest Jensen-Shannon divergenceand the italic number is the lowest Jensen-Shannon divergenceSmaller Jensen-Shannon divergence means better consistency

EXT NEU AGR CON OPN119901 119899 119901 119899 119901 119899 119901 119899 119901 119899

EXT 119901 058 1479 044 219 070 236 073 084119899 479 449 521 236 561 191 1043 211

NEU 119901 080 212 110 246 298 040119899 1978 294 1697 305 2473 361

AGR 119901 077 329 082 091119899 340 086 463 125

CON 119901 115 117119899 616 140

OPN 119901

119899

a preference for novelty and variety a person has alongwith positive level of agreeableness and positive level ofconscientiousness which are compatible Furthermore it isin conformity with Table 1 that positive level of extraversionhas smaller Jensen-Shannon divergences with positive levelof neuroticism and positive level of agreeableness besidespositive level of neuroticism has a smaller Jensen-Shannondivergence with negative level of openness

In addition Soto et al [29] demonstrated convergent anddiscriminate correlations for the Big Five Inventory Soto andJohn [30] also illustrated that there was strong convergencebetween each BFI facet scale and corresponding NEO PI-Rfacet Moreover Park et al [28] demonstrated convergencewith self-reports of personality at the domain- and facet-levelIn summary although utilizing different Big Five measuresit can be found that correlations in myPersonality datasetare also in keeping with the findings from other samplesHence correlations between personality traits are expectedand actually prove that myPersonality results are validSo leveraging prior knowledge about relationships betweenpersonality traits may contribute to a better prediction onpersonality traits The results provide a better theoreticalsupport for our proposed method

43 Evaluation Metric Since most researchers exploringon personality recognition leveraged different measures toevaluate their experiments on various datasets it was hard tocompletely appraise their performance and quality Howevera common dataset which was annotated with gold standardpersonality labels was available in the Workshop on Com-putational Personality Recognition (Shared Task) users werefree to split the training and test sets as they wish andprecision recall and 1198651-measure were suggested to be usedto evaluate results of predictions So in this paper for morereliable results the proposed method was also evaluated withstratified 5-fold cross-validation Since there was no trainingset in unsupervised learning the common dataset was splitinto folds and in order to avoid overfitting the same author

Table 3 Precision recall and 1198651-measure of adaptive conditionalprobability-based predicting model (in bold best performance)

Precision Recall 1198651-measureEXT 093 076 083NEU 086 067 075AGR 096 074 083CON 089 070 078OPN 091 079 084Mean 091 0732 0806

did not appear twofold at the same timeThemean precisionrecall and 1198651-measure are calculated as follows

Precision = tptp + fp

Recall = tptp + fn

1198651-measure = 2 times 119875 times 119877119875 + 119877

(16)

where tp stands for the number of positive samples thatare classified correctly fp stands for the number of positivesamples that are classified incorrectly and fn stands for thenumber of negative samples that are classified incorrectly

44 Analysis of Impacts of Different Factors First we carriedon experiments with our proposed method Scores of preci-sion recall and 1198651-measure are shown in Table 3 includingmean scores of all personality traits

441 Impacts of Different Categories of Feature Sets In thissection we conducted experiments on social network fea-tures linguistic features emotional statistical features topicfeatures and all features respectively Due to space restric-tions Figure 3 only illustrates 1198651-measure of all personalitytraits along with mean 1198651-measure of them

It can be observed that experiments which are conductedon linguistic features emotional statistical features and topicfeatures result in unsatisfactory performance with respectto social network features and social network features havethe best classification performance for extraversion which isconsistent with the conclusion in [33] Since social networkfeatures reflect a userrsquos pattern of social status and habitswhich are relatively important decisive factors in userrsquospersonality traits as a result they provide more informationthan other types of features Nevertheless whether more orless each kind of characteristic provides unique informationin userrsquos personality traits prediction Therefore carryingon experiment with combination of social network featureslinguistic features emotional statistical features and topicfeatures can achieve better performance

442 Impact of Weighted Features Additionally we con-ducted experiments on unweighted features Here we onlypresented the most successful results in Table 4 includingmean scores of all personality traits

Mathematical Problems in Engineering 11

1

featuresEssential

featuresLinguistic

featuresEmotional

featuresTopic

featuresAll

Categories of features

EXTNEUAGR

CONOPNMEAN

09

08

07

06

05

04

03

F-s

core

s

Figure 3 1198651-measures of five personality traits along with mean1198651-measures of them with different categories of feature sets

Table 4 Precision recall and 1198651-measure of predictingmodel withunweighted features (in bold best performance)

Precision Recall 1198651-measureEXT 085 076 082NEU 092 068 076AGR 094 064 076CON 091 057 068OPN 087 078 083Mean 090 0686 077

Table 5 Precision recall and 1198651-measure of predictingmodel withunified thresholds (in bold best performance)

Precision Recall 1198651-measureEXT 062 054 057NEU 080 016 025AGR 057 073 063CON 057 077 060OPN 070 100 082Mean 0652 064 0574

From Tables 3 and 4 we can observe that the best resultsfor myPersonality dataset are achieved after distributingweights to features It illustrates that considering correlationsbetween features and userrsquos personality traits can bring per-formance gain to the prediction task since there are limitedfeatures that may remain useful for userrsquos personality traitsprediction

443 Impact of Dynamic Updated Thresholds of PersonalityTraits In addition we predicted userrsquos personality traits withunified thresholds as well The results are summarized inTable 5

Multilabel learning task is to predict one or more cat-egories for each instance In the existing algorithms suchas PT5 method proposed by Tsoumakas and Katakis [34]

Table 6 Precision of different methods on Facebook dataset Thebest performance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 093 086 096 089 091 0910Verhoevenet al [42] SVM 079 071 067 072 087 0752

Farnadi etal [33]

SVM kNNNB 058 054 050 055 060 0554

Alam et al[43]

SVM BLRmNB 058 059 059 059 060 0590

Tomlinsonet al [44] LR NA NA NA NA NA NA

generally thresholdswere unified to the same valueHoweverall categories employing a unified minimum threshold arenot appropriate If threshold is set too high not all categoriestags will be predicted on the other hand if threshold is settoo low too many classes will be predicted in results Hencein this paper we updated thresholds of personality traitsdynamically according to error rates of them Comparedto Table 3 based on thresholds obtained through updatingdynamically our method can achieve better results

45 Performance EvaluationwithDifferentMethods Asmen-tioned in Section 41 in the Workshop on ComputationalPersonality Recognition (Shared Task) different systems forpersonality recognition from text and network features ona common benchmark have been compared Verhoeven etal [42] proposed a meta-learning approach which can beextended to certain component classifiers from other genreswith other class systems or even from other languagesFarnadi et al [33] leveraged SVM 119896 nearest neighbors(kNN) [17] and Naıve Bayes (NB) for automatic recognitionof personality traits from usersrsquo Facebook statues updatesrespectively even with a small set of training dataset it couldachieve better results than most baseline algorithms On thebasis of a set of features extracted from Facebook datasetAlam et al [43] explored suitability and performance ofseveral classification techniques which were SVM BayesianLogistic Regression (BLR) [47] andMultinomial Naıve Bayes(mNB) [48] Tomlinson et al [44] used ranking algorithmsfor feature selection and Logistic Regression (LR) [49] aslearning algorithms which achieved a high performance Inthis paper we tested our proposed method with the sameset than the participants in the Workshop on ComputationalPersonality Recognition (Shared Task) In terms of precisionrecall and 1198651-measure a comparison of the works describedabove as well as our proposed method (denoted by CP) issummarized in Tables 6 7 and 8

From Tables 6 7 and 8 the experimental results indicatethat openness is the easiest personality trait to be predictedby Facebook features with the above methods because thereare more users with positive level of openness than otherpersonality traits in Facebook dataset Since Verhoeven et al[42] trained an ensemble SVM model based on both Face-book and Essays personality traits dataset as a consequence

12 Mathematical Problems in Engineering

Table 7 Recall of different methods on Facebook dataset The bestperformance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 076 067 074 070 079 0732Verhoevenet al [42] SVM 079 072 068 072 087 0756

Farnadi etal [33]

SVMkNN NB 061 053 050 054 070 0576

Alam et al[43]

SVMBLR mNB 058 058 059 059 060 0588

Tomlinsonet al [44] LR NA NA NA NA NA NA

Table 8 1198651-measure of different methods on Facebook datasetThebest performance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 083 075 083 078 084 0806Verhoevenet al [42] SVM 079 070 067 072 086 0748

Farnadi etal [33]

SVMkNN NB 056 052 050 054 061 0546

Alam et al[43]

SVMBLR mNB 058 058 058 059 060 0586

Tomlinsonet al [44] LR NA NA NA NA NA 0630

when tested on the same Facebook dataset it can achievebetter results than our proposed method Furthermorewhen trained and tested on the same Facebook dataset ourmethod outperforms the methods in [33 43 44] which donot take weighted features dynamic updated thresholds ofpersonality traits and correlations between personality traitsinto consideration

5 Conclusion

In this paper we studied the problem of exploiting interde-pendencies between userrsquos personality traits for predictinguserrsquos personality traits First after analyzing importance offeatures we conducted experiments on Facebook datasetto demonstrate the existence of correlations between userrsquospersonality traits Bearing this in mind an unsupervisedframework adaptive conditional probability-based predict-ing model was then proposed to predict userrsquos Big Fivepersonality traits based on importance of features dynamicupdated thresholds of personality traits and prior knowledgeabout correlations between personality traits Furthermorewe compared our results with the ones achieved by othersin the Workshop on Computational Personality Recognition(Shared Task) on the same dataset In general the experimen-tal results demonstrated the effectiveness of our proposedframework

In future work we will speculate on what directions canbe undertaken to ameliorate its performance with respect

to time complexity so as to better apply it to a big dataenvironment as in Facebook monitoring Besides in orderto make our framework applicable to dynamic networksbetter we will explore combining time series analysis withpersonality traits predicting algorithm to capture dynamicevolution process of information and network structureFurthermore since social network features (described inSection 311) are specific for our dataset they may be difficult(or even impossible) to obtain in another dataset hencewe will conduct experiments on other datasets such as theTweeter corpus of PAN shared task (httppanwebisde)on author profiling where also personality is considered tofurther validate the effectiveness of the proposed frameworkbased on different features

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the National Natural ScienceFoundation of China under Grant no 61300148 the Scientificand Technological Break-Through Program of Jilin ProvinceunderGrant no 20130206051GX the Science andTechnologyDevelopment Program of Jilin Province under Grant no20130522112JH the Science Foundation for China Postdoc-tor under Grant no 2012M510879 and the Basic Scien-tific Research Foundation for the Interdisciplinary Researchand Innovation Project of Jilin University under Grant no201103129

References

[1] M Oussalah F Bhat K Challis and T Schnier ldquoA softwarearchitecture for Twitter collection search and geolocationservicesrdquo Knowledge-Based Systems vol 37 pp 105ndash120 2013

[2] R R McCrae and P T Costa ldquoValidation of the 5-factor modelof personality across instruments and observersrdquo Journal ofPersonality and Social Psychology vol 52 no 1 pp 81ndash90 1987

[3] G Saucier and L R Goldberg ldquoThe structure of personalityattributesrdquo in Personality and Work Reconsidering the Role ofPersonality in Organizations M R Barrick and A M RyanEds pp 1ndash29 Jossey-Bass 2003

[4] B Caci M Cardaci M E Tabacchi and F Scrima ldquoPersonalityvariables as predictors of facebook usagerdquoPsychological Reportsvol 114 no 2 pp 528ndash539 2014

[5] M Kosinski Y Bachrach P Kohli D Stillwell and T GraepelldquoManifestations of user personality in website choice andbehaviour on online social networksrdquoMachine Learning vol 95no 3 pp 357ndash380 2014

[6] M-G Cojocaru H Thille E Thommes D Nelson and SGreenhalgh ldquoSocial influence and dynamic demand for newproductsrdquoEnvironmentalModellingamp Software vol 50 pp 169ndash185 2013

[7] F Durupinar N Pelechano J Allbeck U Gudukbay andN Badler ldquoThe impact of the OCEAN personality modelon the perception of crowdsrdquo IEEE Computer Graphics andApplications vol 31 no 3 pp 22ndash31 2011

Mathematical Problems in Engineering 13

[8] X Feng ldquoStudy on customer personality characteristics andrelationship outcomes using SEM analysisrdquo in Proceedings ofthe International Conference on Mechatronics and InformationTechnology pp 841ndash844 2013

[9] J G Lee SMoon andK Salamatian ldquoModeling and predictingthe popularity of online contents with Cox proportional hazardregression modelrdquo Neurocomputing vol 76 no 1 pp 134ndash1452012

[10] J W Stoughton L F Thompson and A W Meade ldquoBigfive personality traits reflected in job applicantsrsquo social mediapostingsrdquo Cyberpsychology Behavior and Social Networkingvol 16 no 11 pp 800ndash805 2013

[11] D A Cobb-Clark and S Schurer ldquoThe stability of big-fivepersonality traitsrdquo Economics Letters vol 115 no 1 pp 11ndash152012

[12] W Mischel ldquoToward an integrative science of the personrdquoAnnual Review of Psychology vol 55 pp 1ndash22 2004

[13] S Argamon S Dhawle M Koppel and J Pennebaker ldquoLexicalpredictors 19 of personality typerdquo in Proceedings of the JointAnnual Meeting of the Interface and the Classification Society ofNorth America 2005

[14] F Mairesse M AWalker M R Mehl and R K Moore ldquoUsinglinguistic cues for the automatic recognition of personality inconversation and textrdquo Journal of Artificial Intelligence Researchvol 30 pp 457ndash500 2007

[15] K Moore and J C McElroy ldquoThe influence of personalityon Facebook usage wall postings and regretrdquo Computers inHuman Behavior vol 28 no 1 pp 267ndash274 2012

[16] J W Pennebaker and L A King ldquoLinguistic styles languageuse as an individual differencerdquo Journal of Personality and SocialPsychology vol 77 no 6 pp 1296ndash1312 1999

[17] J Han and M Kamber Data Mining Concepts and TechniquesThe Morgan Kaufmann Series in Data Management Systemsedited by J Gray Morgan Kaufmann Boston Mass USA 3rdedition 2011

[18] J Golbeck C Robles and K Turner ldquoPredicting personalitywith social mediardquo in Proceedings of the 29th Annual CHIConference on Human Factors in Computing Systems (CHI rsquo11)pp 253ndash262 ACM May 2011

[19] J RQuinlan ldquoLearningwith continuous classesrdquo inProceedingsof the 5thAustralian Joint Conference onArtificial Intelligence (Airsquo92) pp 343ndash348 Hobart Australia November 1992

[20] S Roweis and Z Ghahramani ldquoA unifying review of lineargaussian modelsrdquo Neural Computation vol 11 no 2 pp 305ndash345 1999

[21] S Bai T Zhu and L Cheng ldquoBig-five personality pre-diction based on user behaviors at social network sitesrdquohttparxivorgpdf12044809v1pdf

[22] J Oberlander and S Nowson ldquoWhose thumb is it anywayclassifying author personality from weblog textrdquo in Proceedingsof the 44th Annual Meeting of the Association for ComputationalLinguistics pp 627ndash634 2006

[23] T Nguyen D Q Phung B Adams and S Venkatesh ldquoTowardsdiscovery of influence and personality traits through social linkpredictionrdquo in Proceedings of the International Conference onWeblogs and Social Media pp 566ndash569 Barcelona Spain July2011

[24] S T Bai B B Hao A Li S Yuan R Gao and T S ZhuldquoPredicting big five personality traits of microblog usersrdquo inProceedings of the 12th IEEEWICACM International JointConferences on Web Intelligence and Intelligent Agent Technolog(WI rsquo13) pp 501ndash508 November 2013

[25] R Sun and N Wilson ldquoA model of personality should be acognitive architecture itselfrdquo Cognitive Systems Research vol29-30 no 1 pp 1ndash30 2014

[26] A Ortigosa R M Carro and J I Quiroga ldquoPredicting userpersonality by mining social interactions in Facebookrdquo Journalof Computer and System Sciences vol 80 no 1 pp 57ndash71 2014

[27] O P John L P Naumann and C J Soto ldquoParadigm shift tothe integrative Big Five trait taxonomy history measurementand conceptualissuesrdquo in Handbook of Personality Theory andResearch O P John R W Robins and L A Pervin Eds pp114ndash158 Guilford Press New York NY USA 3rd edition 2008

[28] G Park H A Schwartz J C Eichstaedt et al ldquoAutomaticpersonality assessment through social media languagerdquo Journalof Personality and Social Psychology vol 108 no 6 pp 934ndash9522015

[29] C J Soto O P John S D Gosling and J Potter ldquoAge differencesin personality traits from 10 to 65 big five domains and facets ina large cross-sectional samplerdquo Journal of Personality and SocialPsychology vol 100 no 2 pp 330ndash348 2011

[30] C J Soto and O P John ldquoTen facet scales for the BigFive Inventory convergence with NEO PI-R facets self-peeragreement and discriminant validityrdquo Journal of Research inPersonality vol 43 no 1 pp 84ndash90 2009

[31] M V Kosti R Feldt and L Angelis ldquoPersonality emotionalintelligence and work preferences in software engineering anempirical studyrdquo Information and Software Technology vol 56no 8 pp 973ndash990 2014

[32] H A Schwartz J C Eichstaedt M L Kern et al ldquoPersonalitygender and age in the language of social media the open-vocabulary approachrdquo PLoS ONE vol 8 no 9 Article IDe73791 2013

[33] G Farnadi S ZoghbiM FMoens andMD Cock ldquoRecognis-ing personality traits using facebook status updatesrdquo in Proceed-ings of the Workshop on Computational Personality RecognitionJuly 2013 httpcliccimecunitnitfabiowcpr13farnadi wcpr-13pdf

[34] G Tsoumakas and I Katakis ldquoMulti-label classification anoverviewrdquo International Journal of Data Warehousing and Min-ing vol 3 no 3 pp 1ndash13 2007

[35] M Kosinski D Stillwell and T Graepel ldquoPrivate traits andattributes are predictable from digital records of human behav-iorrdquo Proceedings of the National Academy of Sciences of theUnited States of America vol 110 no 15 pp 5802ndash5805 2013

[36] M Belkin and P Niyogi ldquoLaplacian eigenmaps for dimension-ality reduction and data representationrdquo Neural Computationvol 15 no 6 pp 1373ndash1396 2003

[37] R Socher J Bauer C D Manning and Y N Andrew ldquoParsingwith compositional vector grammarsrdquo in Proceedings of the 51stAnnual Meeting of the Association for Computational Linguistics(ACL rsquo13) pp 455ndash465 Sofia Bulgaria August 2013

[38] K Kazama M Imada and K Kashiwagi ldquoCharacteristics ofinformation diffusion in blogs in relation to information sourcetyperdquo Neurocomputing vol 76 no 1 pp 84ndash92 2012

[39] M B David Y N Andrew and I J Michael ldquoLatent Dirichletallocationrdquo Journal ofMachine Learning Research vol 3 no 4-5pp 993ndash1022 2003

[40] P T Costa andR RMcCraeRevisedNEOPersonality Inventory(NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI) Profes-sional Manual Psychological Assessment Resources 1992

[41] R L Atkinson C A Richard E S Edward J B Daryland S Nolen-Hoeksema Hilgardrsquos Introduction to Psychology

14 Mathematical Problems in Engineering

Harcourt College Publishers Orlando Fla USA 13th edition2000

[42] B Verhoeven W Daelemans and T D Smedt ldquoEnsemblemethods for personality recognitionrdquo in Proceedings of theWorkshop on Computational Personality Recognition pp 35ndash38Boston Mass USA July 2013

[43] F Alam A Stepanov and G Riccardi ldquoPersonality traitsrecognition on social networkmdashfacebookrdquo in Proceedings of theWorkshop on Computational Personality Recognition pp 6ndash9Boston Mass USA July 2013

[44] M T Tomlinson D Hinote and D B Bracewell ldquoPredictingconscientiousness through semantic analysis of facebook postsrdquoin Proceedings of the Workshop on Computational PersonalityRecognition pp 31ndash34 July 2013

[45] D M Endres and J E Schindelin ldquoA newmetric for probabilitydistributionsrdquo IEEETransactions on InformationTheory vol 49no 7 pp 1858ndash1860 2003

[46] S Kullback and R A Leibler ldquoOn information and sufficiencyrdquoThe Annals of Mathematical Statistics vol 22 no 1 pp 79ndash861951

[47] A Genkin D D Lewis and D Madigan ldquoLarge-scale Bayesianlogistic regression for text categorizationrdquo Technometrics vol49 no 3 pp 291ndash304 2007

[48] A K McCallum and K Nigam ldquoA comparison of event modelsfor naive bayes text classificationrdquo in Proceedings of the AAAI-98 Workshop on Learning for Text Categorization pp 137ndash142Madison Wis USA July 1998

[49] I Ruczinski C Kooperberg and M LeBlanc ldquoLogic regres-sionrdquo Journal of Computational and Graphical Statistics vol 12no 3 pp 475ndash511 2003

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 8: Research Article A Novel Adaptive Conditional Probability ...downloads.hindawi.com/journals/mpe/2015/472917.pdf · Research Article A Novel Adaptive Conditional Probability-Based

8 Mathematical Problems in Engineering

of data With consent usersrsquo psychological and Facebookprofiles are recorded for research purposes Currently thedatabase contains more than 6000000 test results togetherwith more than 4000000 individual Facebook profilesIn this paper we adopted the dataset that was provided inthe Workshop on Computational Personality Recognition(Shared Task) (httpmypersonalityorgwikilibexefetchphpmedia=wikimypersonality finalzip) [35] The datasetwas a subset of myPersonality database including 250users which had both information about personality traits(annotated with usersrsquo personality scores and gold standardpersonality labelswhichwere self-assessments obtained usinga 100-item long version of the IPIP personality questionnaire(httpipiporiorgnewFinding Labeling IPIP Scaleshtm))and social network structure (network size betweennesscentrality density brokerage and transitivity) along withtheir 9900 status updates in raw text With the aid of thisstandard dataset we can compare the performance of ourproposed method with othersrsquo personality recognitionsystems on a common benchmark

42 Analysis of Correlations between Personality Traits Pre-vious works were usually predicting userrsquos personality traitswithout considering interdependencies between them In thiscontext we investigatedwhether there had been relationshipsbetween userrsquos personality traits Since not all users had thesame level of interactions consequently we first groupedusers according to different degrees of userrsquos personalitytraits As an example if a user has positive level of agree-ableness another user has positive level of agreeableness aswell then they will be divided into a group else if a user haspositive level of agreeableness another user has negative levelof agreeableness and they will not be divided into a groupThen we explored cooccurrences between personality traitsin each group simultaneously which is shown in Figure 2

It can be observed intuitively that a significant proportionof users have negative level of extraversion positive levelof openness or positive level of agreeableness It is alsonoteworthy that positive level of extraversion has less overlapwith negative level of openness negative level of consci-entiousness and negative level of agreeableness Besidespositive level of neuroticism has less overlap with positivelevel of conscientiousness positive level of agreeableness andnegative level of openness Furthermore there are bits ofusers that have positive level of extraversion and positive levelof neuroticism simultaneously

However the above cooccurrences may be due to the apriori statistics of each trait For instance if there are moreusers with negative level of neuroticism positive level ofopenness and positive level of agreeableness than othersit is more likely to have more users with cooccurrencebetween negative level of neuroticism and positive level ofopenness negative level of neuroticism and positive level ofagreeableness and positive level of openness and positivelevel of agreeableness than others as it is shown in Figure 2Therefore since users were grouped according to differentdegrees of userrsquos personality traits in order to investigatewhether the above phenomenon meant that there werepositive or negative correlations between different degrees

of personality traits in each group we treated scores ofa certain personality trait according to which users weredivided into the group and scores of another personalitytrait with a certain degree as two random variables followedby employing Kendall correlation coefficient which wascalculated in (3) to analyze real correlations between themThe results are presented in Table 1

As it can be seen from Table 1 we can observe anegative correlation between negative level of extraversionand positive level of neuroticism in other words people thatscore high on extraversion have less possibility to score highon neuroticism as well In psychology a person with negativelevel of extraversion can be explained as solitary and a personwith negative level of conscientiousnessmeans being carelessConsequently negative level of extraversion and negativelevel of conscientiousness are not in line with positive level ofneuroticism which tends to experience unpleasant emotionseasily Another factor is social desirability Moreover agree-ableness extraversion conscientiousness and openness aremore or less ldquodesirablerdquo whereas neuroticism is quite clearlynegative Thus generally there are negative correlationsbetween neuroticism and other personality traits

What is more it is inconsistent with Figure 2 that thereis a positive correlation between positive level of neuroticismand positive level of extraversion It may be explained thata person with positive level of extraversion tends to beenthusiastic and talkative which is compatible with positivelevel of neuroticism In addition in line with Figure 2since negative level of openness shows cautiousness it hasa positive correlation with negative level of extraversionand positive level of agreeableness which is a tendency tobe compassionate However there is a positive correlationbetween negative level of openness and positive level ofneuroticism which is not in keeping with Figure 2

Since there are some contradictions between Figure 2and Table 1 we leveraged Jensen-Shannon divergence [45]to further analyze correlations between userrsquos personalitytraits Jensen-Shannon divergence between two probabilitydistributions is calculated as

119863JS (119875 119876) =12(119863KL (119875 119877) +119863KL (119876 119877)) (14)

where 119875 and 119876 denote two probability distributions 119877denotes an average distribution of 119875 and 119876 and 119863KL(119875 119877)denotes Kullback-Leibler divergence [46] between 119875 and 119877which is calculated with

119863KL (119875 119877) = sum119894

119875 (119894) log 119875 (119894)119877 (119894)

(15)

where 119875(119894) and 119877(119894) denote the 119894th value of 119875 and 119877 Theresults are presented in Table 2

In keeping with Figure 2 and Table 1 Jensen-Shannondivergences between negative level of neuroticism and posi-tive level of agreeableness positive level of conscientiousnessand positive level of openness are larger It is because ofthe fact that a person with negative level of neuroticismshows confidence on everything and positive level of open-ness reflects degree of intellectual curiosity creativity and

Mathematical Problems in Engineering 9

Table 1 Kendall correlation coefficients between personality traits 119901 and 119899 stand for positive level and negative level of a certain personalitytrait The bold number is the highest Kendall correlation coefficients and the italic number is the lowest Kendall correlation coefficients Thebigger Kendall correlation coefficient is the better the consistency is

EXT NEU AGR CON OPN119901 119899 119901 119899 119901 119899 119901 119899 119901 119899

EXT 119901 022 minus003 009 minus015 minus006 minus013 007 001119899 minus023 minus018 004 004 004 018 007 022

NEU 119901 minus009 minus017 010 minus021 minus003 027119899 minus014 minus013 002 minus001 minus006 minus010

AGR 119901 minus007 007 016 029119899 minus008 016 006 002

CON 119901 minus001 minus010119899 007 minus007

OPN 119901

119899

PEXT and PNEUPEXT and PAGR

PEXT and PCONPEXT and POPNPEXT and NNEUPEXT and NAGRPEXT and NCONPEXT and NOPNPNEU and PAGR

PNEU and PCONPNEU and POPNPNEU and NEXTPNEU and NAGRPNEU and NCONPNEU and NOPNPAGR and PCONPAGR and POPNPAGR and NEXTPAGR and NNEUPAGR and NCONPAGR and NOPNPCON and POPNPCON and NEXTPCON and NNEUPCON and NAGRPCON and NOPNPOPN and NEXTPOPN and NNEUPOPN and NAGRPOPN and NCONNEXT and NNEUNEXT and NAGRNEXT and NCONNEXT and NOPNNNEU and NAGRNNEU and NCONNNEU and NOPNNAGR and NCONNAGR and NOPNNCON and NOPN

Proportion ()

000 500 1000 1500 2000 2500 3000 3500 4000 4500 5000()

Figure 2The proportion of users who have a certain pair of personality traits simultaneously PEXT PNEU PAGR PCON and POPN standfor positive level of extraversion neuroticism agreeableness conscientiousness and openness NEXT NNEU NAGR NCON and NOPNstand for negative level of extraversion neuroticism agreeableness conscientiousness and openness

10 Mathematical Problems in Engineering

Table 2 Jensen-Shannon divergence between personality traits 119901and 119899 stand for positive level and negative level of a certain personal-ity traitThe bold number is the highest Jensen-Shannon divergenceand the italic number is the lowest Jensen-Shannon divergenceSmaller Jensen-Shannon divergence means better consistency

EXT NEU AGR CON OPN119901 119899 119901 119899 119901 119899 119901 119899 119901 119899

EXT 119901 058 1479 044 219 070 236 073 084119899 479 449 521 236 561 191 1043 211

NEU 119901 080 212 110 246 298 040119899 1978 294 1697 305 2473 361

AGR 119901 077 329 082 091119899 340 086 463 125

CON 119901 115 117119899 616 140

OPN 119901

119899

a preference for novelty and variety a person has alongwith positive level of agreeableness and positive level ofconscientiousness which are compatible Furthermore it isin conformity with Table 1 that positive level of extraversionhas smaller Jensen-Shannon divergences with positive levelof neuroticism and positive level of agreeableness besidespositive level of neuroticism has a smaller Jensen-Shannondivergence with negative level of openness

In addition Soto et al [29] demonstrated convergent anddiscriminate correlations for the Big Five Inventory Soto andJohn [30] also illustrated that there was strong convergencebetween each BFI facet scale and corresponding NEO PI-Rfacet Moreover Park et al [28] demonstrated convergencewith self-reports of personality at the domain- and facet-levelIn summary although utilizing different Big Five measuresit can be found that correlations in myPersonality datasetare also in keeping with the findings from other samplesHence correlations between personality traits are expectedand actually prove that myPersonality results are validSo leveraging prior knowledge about relationships betweenpersonality traits may contribute to a better prediction onpersonality traits The results provide a better theoreticalsupport for our proposed method

43 Evaluation Metric Since most researchers exploringon personality recognition leveraged different measures toevaluate their experiments on various datasets it was hard tocompletely appraise their performance and quality Howevera common dataset which was annotated with gold standardpersonality labels was available in the Workshop on Com-putational Personality Recognition (Shared Task) users werefree to split the training and test sets as they wish andprecision recall and 1198651-measure were suggested to be usedto evaluate results of predictions So in this paper for morereliable results the proposed method was also evaluated withstratified 5-fold cross-validation Since there was no trainingset in unsupervised learning the common dataset was splitinto folds and in order to avoid overfitting the same author

Table 3 Precision recall and 1198651-measure of adaptive conditionalprobability-based predicting model (in bold best performance)

Precision Recall 1198651-measureEXT 093 076 083NEU 086 067 075AGR 096 074 083CON 089 070 078OPN 091 079 084Mean 091 0732 0806

did not appear twofold at the same timeThemean precisionrecall and 1198651-measure are calculated as follows

Precision = tptp + fp

Recall = tptp + fn

1198651-measure = 2 times 119875 times 119877119875 + 119877

(16)

where tp stands for the number of positive samples thatare classified correctly fp stands for the number of positivesamples that are classified incorrectly and fn stands for thenumber of negative samples that are classified incorrectly

44 Analysis of Impacts of Different Factors First we carriedon experiments with our proposed method Scores of preci-sion recall and 1198651-measure are shown in Table 3 includingmean scores of all personality traits

441 Impacts of Different Categories of Feature Sets In thissection we conducted experiments on social network fea-tures linguistic features emotional statistical features topicfeatures and all features respectively Due to space restric-tions Figure 3 only illustrates 1198651-measure of all personalitytraits along with mean 1198651-measure of them

It can be observed that experiments which are conductedon linguistic features emotional statistical features and topicfeatures result in unsatisfactory performance with respectto social network features and social network features havethe best classification performance for extraversion which isconsistent with the conclusion in [33] Since social networkfeatures reflect a userrsquos pattern of social status and habitswhich are relatively important decisive factors in userrsquospersonality traits as a result they provide more informationthan other types of features Nevertheless whether more orless each kind of characteristic provides unique informationin userrsquos personality traits prediction Therefore carryingon experiment with combination of social network featureslinguistic features emotional statistical features and topicfeatures can achieve better performance

442 Impact of Weighted Features Additionally we con-ducted experiments on unweighted features Here we onlypresented the most successful results in Table 4 includingmean scores of all personality traits

Mathematical Problems in Engineering 11

1

featuresEssential

featuresLinguistic

featuresEmotional

featuresTopic

featuresAll

Categories of features

EXTNEUAGR

CONOPNMEAN

09

08

07

06

05

04

03

F-s

core

s

Figure 3 1198651-measures of five personality traits along with mean1198651-measures of them with different categories of feature sets

Table 4 Precision recall and 1198651-measure of predictingmodel withunweighted features (in bold best performance)

Precision Recall 1198651-measureEXT 085 076 082NEU 092 068 076AGR 094 064 076CON 091 057 068OPN 087 078 083Mean 090 0686 077

Table 5 Precision recall and 1198651-measure of predictingmodel withunified thresholds (in bold best performance)

Precision Recall 1198651-measureEXT 062 054 057NEU 080 016 025AGR 057 073 063CON 057 077 060OPN 070 100 082Mean 0652 064 0574

From Tables 3 and 4 we can observe that the best resultsfor myPersonality dataset are achieved after distributingweights to features It illustrates that considering correlationsbetween features and userrsquos personality traits can bring per-formance gain to the prediction task since there are limitedfeatures that may remain useful for userrsquos personality traitsprediction

443 Impact of Dynamic Updated Thresholds of PersonalityTraits In addition we predicted userrsquos personality traits withunified thresholds as well The results are summarized inTable 5

Multilabel learning task is to predict one or more cat-egories for each instance In the existing algorithms suchas PT5 method proposed by Tsoumakas and Katakis [34]

Table 6 Precision of different methods on Facebook dataset Thebest performance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 093 086 096 089 091 0910Verhoevenet al [42] SVM 079 071 067 072 087 0752

Farnadi etal [33]

SVM kNNNB 058 054 050 055 060 0554

Alam et al[43]

SVM BLRmNB 058 059 059 059 060 0590

Tomlinsonet al [44] LR NA NA NA NA NA NA

generally thresholdswere unified to the same valueHoweverall categories employing a unified minimum threshold arenot appropriate If threshold is set too high not all categoriestags will be predicted on the other hand if threshold is settoo low too many classes will be predicted in results Hencein this paper we updated thresholds of personality traitsdynamically according to error rates of them Comparedto Table 3 based on thresholds obtained through updatingdynamically our method can achieve better results

45 Performance EvaluationwithDifferentMethods Asmen-tioned in Section 41 in the Workshop on ComputationalPersonality Recognition (Shared Task) different systems forpersonality recognition from text and network features ona common benchmark have been compared Verhoeven etal [42] proposed a meta-learning approach which can beextended to certain component classifiers from other genreswith other class systems or even from other languagesFarnadi et al [33] leveraged SVM 119896 nearest neighbors(kNN) [17] and Naıve Bayes (NB) for automatic recognitionof personality traits from usersrsquo Facebook statues updatesrespectively even with a small set of training dataset it couldachieve better results than most baseline algorithms On thebasis of a set of features extracted from Facebook datasetAlam et al [43] explored suitability and performance ofseveral classification techniques which were SVM BayesianLogistic Regression (BLR) [47] andMultinomial Naıve Bayes(mNB) [48] Tomlinson et al [44] used ranking algorithmsfor feature selection and Logistic Regression (LR) [49] aslearning algorithms which achieved a high performance Inthis paper we tested our proposed method with the sameset than the participants in the Workshop on ComputationalPersonality Recognition (Shared Task) In terms of precisionrecall and 1198651-measure a comparison of the works describedabove as well as our proposed method (denoted by CP) issummarized in Tables 6 7 and 8

From Tables 6 7 and 8 the experimental results indicatethat openness is the easiest personality trait to be predictedby Facebook features with the above methods because thereare more users with positive level of openness than otherpersonality traits in Facebook dataset Since Verhoeven et al[42] trained an ensemble SVM model based on both Face-book and Essays personality traits dataset as a consequence

12 Mathematical Problems in Engineering

Table 7 Recall of different methods on Facebook dataset The bestperformance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 076 067 074 070 079 0732Verhoevenet al [42] SVM 079 072 068 072 087 0756

Farnadi etal [33]

SVMkNN NB 061 053 050 054 070 0576

Alam et al[43]

SVMBLR mNB 058 058 059 059 060 0588

Tomlinsonet al [44] LR NA NA NA NA NA NA

Table 8 1198651-measure of different methods on Facebook datasetThebest performance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 083 075 083 078 084 0806Verhoevenet al [42] SVM 079 070 067 072 086 0748

Farnadi etal [33]

SVMkNN NB 056 052 050 054 061 0546

Alam et al[43]

SVMBLR mNB 058 058 058 059 060 0586

Tomlinsonet al [44] LR NA NA NA NA NA 0630

when tested on the same Facebook dataset it can achievebetter results than our proposed method Furthermorewhen trained and tested on the same Facebook dataset ourmethod outperforms the methods in [33 43 44] which donot take weighted features dynamic updated thresholds ofpersonality traits and correlations between personality traitsinto consideration

5 Conclusion

In this paper we studied the problem of exploiting interde-pendencies between userrsquos personality traits for predictinguserrsquos personality traits First after analyzing importance offeatures we conducted experiments on Facebook datasetto demonstrate the existence of correlations between userrsquospersonality traits Bearing this in mind an unsupervisedframework adaptive conditional probability-based predict-ing model was then proposed to predict userrsquos Big Fivepersonality traits based on importance of features dynamicupdated thresholds of personality traits and prior knowledgeabout correlations between personality traits Furthermorewe compared our results with the ones achieved by othersin the Workshop on Computational Personality Recognition(Shared Task) on the same dataset In general the experimen-tal results demonstrated the effectiveness of our proposedframework

In future work we will speculate on what directions canbe undertaken to ameliorate its performance with respect

to time complexity so as to better apply it to a big dataenvironment as in Facebook monitoring Besides in orderto make our framework applicable to dynamic networksbetter we will explore combining time series analysis withpersonality traits predicting algorithm to capture dynamicevolution process of information and network structureFurthermore since social network features (described inSection 311) are specific for our dataset they may be difficult(or even impossible) to obtain in another dataset hencewe will conduct experiments on other datasets such as theTweeter corpus of PAN shared task (httppanwebisde)on author profiling where also personality is considered tofurther validate the effectiveness of the proposed frameworkbased on different features

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the National Natural ScienceFoundation of China under Grant no 61300148 the Scientificand Technological Break-Through Program of Jilin ProvinceunderGrant no 20130206051GX the Science andTechnologyDevelopment Program of Jilin Province under Grant no20130522112JH the Science Foundation for China Postdoc-tor under Grant no 2012M510879 and the Basic Scien-tific Research Foundation for the Interdisciplinary Researchand Innovation Project of Jilin University under Grant no201103129

References

[1] M Oussalah F Bhat K Challis and T Schnier ldquoA softwarearchitecture for Twitter collection search and geolocationservicesrdquo Knowledge-Based Systems vol 37 pp 105ndash120 2013

[2] R R McCrae and P T Costa ldquoValidation of the 5-factor modelof personality across instruments and observersrdquo Journal ofPersonality and Social Psychology vol 52 no 1 pp 81ndash90 1987

[3] G Saucier and L R Goldberg ldquoThe structure of personalityattributesrdquo in Personality and Work Reconsidering the Role ofPersonality in Organizations M R Barrick and A M RyanEds pp 1ndash29 Jossey-Bass 2003

[4] B Caci M Cardaci M E Tabacchi and F Scrima ldquoPersonalityvariables as predictors of facebook usagerdquoPsychological Reportsvol 114 no 2 pp 528ndash539 2014

[5] M Kosinski Y Bachrach P Kohli D Stillwell and T GraepelldquoManifestations of user personality in website choice andbehaviour on online social networksrdquoMachine Learning vol 95no 3 pp 357ndash380 2014

[6] M-G Cojocaru H Thille E Thommes D Nelson and SGreenhalgh ldquoSocial influence and dynamic demand for newproductsrdquoEnvironmentalModellingamp Software vol 50 pp 169ndash185 2013

[7] F Durupinar N Pelechano J Allbeck U Gudukbay andN Badler ldquoThe impact of the OCEAN personality modelon the perception of crowdsrdquo IEEE Computer Graphics andApplications vol 31 no 3 pp 22ndash31 2011

Mathematical Problems in Engineering 13

[8] X Feng ldquoStudy on customer personality characteristics andrelationship outcomes using SEM analysisrdquo in Proceedings ofthe International Conference on Mechatronics and InformationTechnology pp 841ndash844 2013

[9] J G Lee SMoon andK Salamatian ldquoModeling and predictingthe popularity of online contents with Cox proportional hazardregression modelrdquo Neurocomputing vol 76 no 1 pp 134ndash1452012

[10] J W Stoughton L F Thompson and A W Meade ldquoBigfive personality traits reflected in job applicantsrsquo social mediapostingsrdquo Cyberpsychology Behavior and Social Networkingvol 16 no 11 pp 800ndash805 2013

[11] D A Cobb-Clark and S Schurer ldquoThe stability of big-fivepersonality traitsrdquo Economics Letters vol 115 no 1 pp 11ndash152012

[12] W Mischel ldquoToward an integrative science of the personrdquoAnnual Review of Psychology vol 55 pp 1ndash22 2004

[13] S Argamon S Dhawle M Koppel and J Pennebaker ldquoLexicalpredictors 19 of personality typerdquo in Proceedings of the JointAnnual Meeting of the Interface and the Classification Society ofNorth America 2005

[14] F Mairesse M AWalker M R Mehl and R K Moore ldquoUsinglinguistic cues for the automatic recognition of personality inconversation and textrdquo Journal of Artificial Intelligence Researchvol 30 pp 457ndash500 2007

[15] K Moore and J C McElroy ldquoThe influence of personalityon Facebook usage wall postings and regretrdquo Computers inHuman Behavior vol 28 no 1 pp 267ndash274 2012

[16] J W Pennebaker and L A King ldquoLinguistic styles languageuse as an individual differencerdquo Journal of Personality and SocialPsychology vol 77 no 6 pp 1296ndash1312 1999

[17] J Han and M Kamber Data Mining Concepts and TechniquesThe Morgan Kaufmann Series in Data Management Systemsedited by J Gray Morgan Kaufmann Boston Mass USA 3rdedition 2011

[18] J Golbeck C Robles and K Turner ldquoPredicting personalitywith social mediardquo in Proceedings of the 29th Annual CHIConference on Human Factors in Computing Systems (CHI rsquo11)pp 253ndash262 ACM May 2011

[19] J RQuinlan ldquoLearningwith continuous classesrdquo inProceedingsof the 5thAustralian Joint Conference onArtificial Intelligence (Airsquo92) pp 343ndash348 Hobart Australia November 1992

[20] S Roweis and Z Ghahramani ldquoA unifying review of lineargaussian modelsrdquo Neural Computation vol 11 no 2 pp 305ndash345 1999

[21] S Bai T Zhu and L Cheng ldquoBig-five personality pre-diction based on user behaviors at social network sitesrdquohttparxivorgpdf12044809v1pdf

[22] J Oberlander and S Nowson ldquoWhose thumb is it anywayclassifying author personality from weblog textrdquo in Proceedingsof the 44th Annual Meeting of the Association for ComputationalLinguistics pp 627ndash634 2006

[23] T Nguyen D Q Phung B Adams and S Venkatesh ldquoTowardsdiscovery of influence and personality traits through social linkpredictionrdquo in Proceedings of the International Conference onWeblogs and Social Media pp 566ndash569 Barcelona Spain July2011

[24] S T Bai B B Hao A Li S Yuan R Gao and T S ZhuldquoPredicting big five personality traits of microblog usersrdquo inProceedings of the 12th IEEEWICACM International JointConferences on Web Intelligence and Intelligent Agent Technolog(WI rsquo13) pp 501ndash508 November 2013

[25] R Sun and N Wilson ldquoA model of personality should be acognitive architecture itselfrdquo Cognitive Systems Research vol29-30 no 1 pp 1ndash30 2014

[26] A Ortigosa R M Carro and J I Quiroga ldquoPredicting userpersonality by mining social interactions in Facebookrdquo Journalof Computer and System Sciences vol 80 no 1 pp 57ndash71 2014

[27] O P John L P Naumann and C J Soto ldquoParadigm shift tothe integrative Big Five trait taxonomy history measurementand conceptualissuesrdquo in Handbook of Personality Theory andResearch O P John R W Robins and L A Pervin Eds pp114ndash158 Guilford Press New York NY USA 3rd edition 2008

[28] G Park H A Schwartz J C Eichstaedt et al ldquoAutomaticpersonality assessment through social media languagerdquo Journalof Personality and Social Psychology vol 108 no 6 pp 934ndash9522015

[29] C J Soto O P John S D Gosling and J Potter ldquoAge differencesin personality traits from 10 to 65 big five domains and facets ina large cross-sectional samplerdquo Journal of Personality and SocialPsychology vol 100 no 2 pp 330ndash348 2011

[30] C J Soto and O P John ldquoTen facet scales for the BigFive Inventory convergence with NEO PI-R facets self-peeragreement and discriminant validityrdquo Journal of Research inPersonality vol 43 no 1 pp 84ndash90 2009

[31] M V Kosti R Feldt and L Angelis ldquoPersonality emotionalintelligence and work preferences in software engineering anempirical studyrdquo Information and Software Technology vol 56no 8 pp 973ndash990 2014

[32] H A Schwartz J C Eichstaedt M L Kern et al ldquoPersonalitygender and age in the language of social media the open-vocabulary approachrdquo PLoS ONE vol 8 no 9 Article IDe73791 2013

[33] G Farnadi S ZoghbiM FMoens andMD Cock ldquoRecognis-ing personality traits using facebook status updatesrdquo in Proceed-ings of the Workshop on Computational Personality RecognitionJuly 2013 httpcliccimecunitnitfabiowcpr13farnadi wcpr-13pdf

[34] G Tsoumakas and I Katakis ldquoMulti-label classification anoverviewrdquo International Journal of Data Warehousing and Min-ing vol 3 no 3 pp 1ndash13 2007

[35] M Kosinski D Stillwell and T Graepel ldquoPrivate traits andattributes are predictable from digital records of human behav-iorrdquo Proceedings of the National Academy of Sciences of theUnited States of America vol 110 no 15 pp 5802ndash5805 2013

[36] M Belkin and P Niyogi ldquoLaplacian eigenmaps for dimension-ality reduction and data representationrdquo Neural Computationvol 15 no 6 pp 1373ndash1396 2003

[37] R Socher J Bauer C D Manning and Y N Andrew ldquoParsingwith compositional vector grammarsrdquo in Proceedings of the 51stAnnual Meeting of the Association for Computational Linguistics(ACL rsquo13) pp 455ndash465 Sofia Bulgaria August 2013

[38] K Kazama M Imada and K Kashiwagi ldquoCharacteristics ofinformation diffusion in blogs in relation to information sourcetyperdquo Neurocomputing vol 76 no 1 pp 84ndash92 2012

[39] M B David Y N Andrew and I J Michael ldquoLatent Dirichletallocationrdquo Journal ofMachine Learning Research vol 3 no 4-5pp 993ndash1022 2003

[40] P T Costa andR RMcCraeRevisedNEOPersonality Inventory(NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI) Profes-sional Manual Psychological Assessment Resources 1992

[41] R L Atkinson C A Richard E S Edward J B Daryland S Nolen-Hoeksema Hilgardrsquos Introduction to Psychology

14 Mathematical Problems in Engineering

Harcourt College Publishers Orlando Fla USA 13th edition2000

[42] B Verhoeven W Daelemans and T D Smedt ldquoEnsemblemethods for personality recognitionrdquo in Proceedings of theWorkshop on Computational Personality Recognition pp 35ndash38Boston Mass USA July 2013

[43] F Alam A Stepanov and G Riccardi ldquoPersonality traitsrecognition on social networkmdashfacebookrdquo in Proceedings of theWorkshop on Computational Personality Recognition pp 6ndash9Boston Mass USA July 2013

[44] M T Tomlinson D Hinote and D B Bracewell ldquoPredictingconscientiousness through semantic analysis of facebook postsrdquoin Proceedings of the Workshop on Computational PersonalityRecognition pp 31ndash34 July 2013

[45] D M Endres and J E Schindelin ldquoA newmetric for probabilitydistributionsrdquo IEEETransactions on InformationTheory vol 49no 7 pp 1858ndash1860 2003

[46] S Kullback and R A Leibler ldquoOn information and sufficiencyrdquoThe Annals of Mathematical Statistics vol 22 no 1 pp 79ndash861951

[47] A Genkin D D Lewis and D Madigan ldquoLarge-scale Bayesianlogistic regression for text categorizationrdquo Technometrics vol49 no 3 pp 291ndash304 2007

[48] A K McCallum and K Nigam ldquoA comparison of event modelsfor naive bayes text classificationrdquo in Proceedings of the AAAI-98 Workshop on Learning for Text Categorization pp 137ndash142Madison Wis USA July 1998

[49] I Ruczinski C Kooperberg and M LeBlanc ldquoLogic regres-sionrdquo Journal of Computational and Graphical Statistics vol 12no 3 pp 475ndash511 2003

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 9: Research Article A Novel Adaptive Conditional Probability ...downloads.hindawi.com/journals/mpe/2015/472917.pdf · Research Article A Novel Adaptive Conditional Probability-Based

Mathematical Problems in Engineering 9

Table 1 Kendall correlation coefficients between personality traits 119901 and 119899 stand for positive level and negative level of a certain personalitytrait The bold number is the highest Kendall correlation coefficients and the italic number is the lowest Kendall correlation coefficients Thebigger Kendall correlation coefficient is the better the consistency is

EXT NEU AGR CON OPN119901 119899 119901 119899 119901 119899 119901 119899 119901 119899

EXT 119901 022 minus003 009 minus015 minus006 minus013 007 001119899 minus023 minus018 004 004 004 018 007 022

NEU 119901 minus009 minus017 010 minus021 minus003 027119899 minus014 minus013 002 minus001 minus006 minus010

AGR 119901 minus007 007 016 029119899 minus008 016 006 002

CON 119901 minus001 minus010119899 007 minus007

OPN 119901

119899

PEXT and PNEUPEXT and PAGR

PEXT and PCONPEXT and POPNPEXT and NNEUPEXT and NAGRPEXT and NCONPEXT and NOPNPNEU and PAGR

PNEU and PCONPNEU and POPNPNEU and NEXTPNEU and NAGRPNEU and NCONPNEU and NOPNPAGR and PCONPAGR and POPNPAGR and NEXTPAGR and NNEUPAGR and NCONPAGR and NOPNPCON and POPNPCON and NEXTPCON and NNEUPCON and NAGRPCON and NOPNPOPN and NEXTPOPN and NNEUPOPN and NAGRPOPN and NCONNEXT and NNEUNEXT and NAGRNEXT and NCONNEXT and NOPNNNEU and NAGRNNEU and NCONNNEU and NOPNNAGR and NCONNAGR and NOPNNCON and NOPN

Proportion ()

000 500 1000 1500 2000 2500 3000 3500 4000 4500 5000()

Figure 2The proportion of users who have a certain pair of personality traits simultaneously PEXT PNEU PAGR PCON and POPN standfor positive level of extraversion neuroticism agreeableness conscientiousness and openness NEXT NNEU NAGR NCON and NOPNstand for negative level of extraversion neuroticism agreeableness conscientiousness and openness

10 Mathematical Problems in Engineering

Table 2 Jensen-Shannon divergence between personality traits 119901and 119899 stand for positive level and negative level of a certain personal-ity traitThe bold number is the highest Jensen-Shannon divergenceand the italic number is the lowest Jensen-Shannon divergenceSmaller Jensen-Shannon divergence means better consistency

EXT NEU AGR CON OPN119901 119899 119901 119899 119901 119899 119901 119899 119901 119899

EXT 119901 058 1479 044 219 070 236 073 084119899 479 449 521 236 561 191 1043 211

NEU 119901 080 212 110 246 298 040119899 1978 294 1697 305 2473 361

AGR 119901 077 329 082 091119899 340 086 463 125

CON 119901 115 117119899 616 140

OPN 119901

119899

a preference for novelty and variety a person has alongwith positive level of agreeableness and positive level ofconscientiousness which are compatible Furthermore it isin conformity with Table 1 that positive level of extraversionhas smaller Jensen-Shannon divergences with positive levelof neuroticism and positive level of agreeableness besidespositive level of neuroticism has a smaller Jensen-Shannondivergence with negative level of openness

In addition Soto et al [29] demonstrated convergent anddiscriminate correlations for the Big Five Inventory Soto andJohn [30] also illustrated that there was strong convergencebetween each BFI facet scale and corresponding NEO PI-Rfacet Moreover Park et al [28] demonstrated convergencewith self-reports of personality at the domain- and facet-levelIn summary although utilizing different Big Five measuresit can be found that correlations in myPersonality datasetare also in keeping with the findings from other samplesHence correlations between personality traits are expectedand actually prove that myPersonality results are validSo leveraging prior knowledge about relationships betweenpersonality traits may contribute to a better prediction onpersonality traits The results provide a better theoreticalsupport for our proposed method

43 Evaluation Metric Since most researchers exploringon personality recognition leveraged different measures toevaluate their experiments on various datasets it was hard tocompletely appraise their performance and quality Howevera common dataset which was annotated with gold standardpersonality labels was available in the Workshop on Com-putational Personality Recognition (Shared Task) users werefree to split the training and test sets as they wish andprecision recall and 1198651-measure were suggested to be usedto evaluate results of predictions So in this paper for morereliable results the proposed method was also evaluated withstratified 5-fold cross-validation Since there was no trainingset in unsupervised learning the common dataset was splitinto folds and in order to avoid overfitting the same author

Table 3 Precision recall and 1198651-measure of adaptive conditionalprobability-based predicting model (in bold best performance)

Precision Recall 1198651-measureEXT 093 076 083NEU 086 067 075AGR 096 074 083CON 089 070 078OPN 091 079 084Mean 091 0732 0806

did not appear twofold at the same timeThemean precisionrecall and 1198651-measure are calculated as follows

Precision = tptp + fp

Recall = tptp + fn

1198651-measure = 2 times 119875 times 119877119875 + 119877

(16)

where tp stands for the number of positive samples thatare classified correctly fp stands for the number of positivesamples that are classified incorrectly and fn stands for thenumber of negative samples that are classified incorrectly

44 Analysis of Impacts of Different Factors First we carriedon experiments with our proposed method Scores of preci-sion recall and 1198651-measure are shown in Table 3 includingmean scores of all personality traits

441 Impacts of Different Categories of Feature Sets In thissection we conducted experiments on social network fea-tures linguistic features emotional statistical features topicfeatures and all features respectively Due to space restric-tions Figure 3 only illustrates 1198651-measure of all personalitytraits along with mean 1198651-measure of them

It can be observed that experiments which are conductedon linguistic features emotional statistical features and topicfeatures result in unsatisfactory performance with respectto social network features and social network features havethe best classification performance for extraversion which isconsistent with the conclusion in [33] Since social networkfeatures reflect a userrsquos pattern of social status and habitswhich are relatively important decisive factors in userrsquospersonality traits as a result they provide more informationthan other types of features Nevertheless whether more orless each kind of characteristic provides unique informationin userrsquos personality traits prediction Therefore carryingon experiment with combination of social network featureslinguistic features emotional statistical features and topicfeatures can achieve better performance

442 Impact of Weighted Features Additionally we con-ducted experiments on unweighted features Here we onlypresented the most successful results in Table 4 includingmean scores of all personality traits

Mathematical Problems in Engineering 11

1

featuresEssential

featuresLinguistic

featuresEmotional

featuresTopic

featuresAll

Categories of features

EXTNEUAGR

CONOPNMEAN

09

08

07

06

05

04

03

F-s

core

s

Figure 3 1198651-measures of five personality traits along with mean1198651-measures of them with different categories of feature sets

Table 4 Precision recall and 1198651-measure of predictingmodel withunweighted features (in bold best performance)

Precision Recall 1198651-measureEXT 085 076 082NEU 092 068 076AGR 094 064 076CON 091 057 068OPN 087 078 083Mean 090 0686 077

Table 5 Precision recall and 1198651-measure of predictingmodel withunified thresholds (in bold best performance)

Precision Recall 1198651-measureEXT 062 054 057NEU 080 016 025AGR 057 073 063CON 057 077 060OPN 070 100 082Mean 0652 064 0574

From Tables 3 and 4 we can observe that the best resultsfor myPersonality dataset are achieved after distributingweights to features It illustrates that considering correlationsbetween features and userrsquos personality traits can bring per-formance gain to the prediction task since there are limitedfeatures that may remain useful for userrsquos personality traitsprediction

443 Impact of Dynamic Updated Thresholds of PersonalityTraits In addition we predicted userrsquos personality traits withunified thresholds as well The results are summarized inTable 5

Multilabel learning task is to predict one or more cat-egories for each instance In the existing algorithms suchas PT5 method proposed by Tsoumakas and Katakis [34]

Table 6 Precision of different methods on Facebook dataset Thebest performance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 093 086 096 089 091 0910Verhoevenet al [42] SVM 079 071 067 072 087 0752

Farnadi etal [33]

SVM kNNNB 058 054 050 055 060 0554

Alam et al[43]

SVM BLRmNB 058 059 059 059 060 0590

Tomlinsonet al [44] LR NA NA NA NA NA NA

generally thresholdswere unified to the same valueHoweverall categories employing a unified minimum threshold arenot appropriate If threshold is set too high not all categoriestags will be predicted on the other hand if threshold is settoo low too many classes will be predicted in results Hencein this paper we updated thresholds of personality traitsdynamically according to error rates of them Comparedto Table 3 based on thresholds obtained through updatingdynamically our method can achieve better results

45 Performance EvaluationwithDifferentMethods Asmen-tioned in Section 41 in the Workshop on ComputationalPersonality Recognition (Shared Task) different systems forpersonality recognition from text and network features ona common benchmark have been compared Verhoeven etal [42] proposed a meta-learning approach which can beextended to certain component classifiers from other genreswith other class systems or even from other languagesFarnadi et al [33] leveraged SVM 119896 nearest neighbors(kNN) [17] and Naıve Bayes (NB) for automatic recognitionof personality traits from usersrsquo Facebook statues updatesrespectively even with a small set of training dataset it couldachieve better results than most baseline algorithms On thebasis of a set of features extracted from Facebook datasetAlam et al [43] explored suitability and performance ofseveral classification techniques which were SVM BayesianLogistic Regression (BLR) [47] andMultinomial Naıve Bayes(mNB) [48] Tomlinson et al [44] used ranking algorithmsfor feature selection and Logistic Regression (LR) [49] aslearning algorithms which achieved a high performance Inthis paper we tested our proposed method with the sameset than the participants in the Workshop on ComputationalPersonality Recognition (Shared Task) In terms of precisionrecall and 1198651-measure a comparison of the works describedabove as well as our proposed method (denoted by CP) issummarized in Tables 6 7 and 8

From Tables 6 7 and 8 the experimental results indicatethat openness is the easiest personality trait to be predictedby Facebook features with the above methods because thereare more users with positive level of openness than otherpersonality traits in Facebook dataset Since Verhoeven et al[42] trained an ensemble SVM model based on both Face-book and Essays personality traits dataset as a consequence

12 Mathematical Problems in Engineering

Table 7 Recall of different methods on Facebook dataset The bestperformance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 076 067 074 070 079 0732Verhoevenet al [42] SVM 079 072 068 072 087 0756

Farnadi etal [33]

SVMkNN NB 061 053 050 054 070 0576

Alam et al[43]

SVMBLR mNB 058 058 059 059 060 0588

Tomlinsonet al [44] LR NA NA NA NA NA NA

Table 8 1198651-measure of different methods on Facebook datasetThebest performance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 083 075 083 078 084 0806Verhoevenet al [42] SVM 079 070 067 072 086 0748

Farnadi etal [33]

SVMkNN NB 056 052 050 054 061 0546

Alam et al[43]

SVMBLR mNB 058 058 058 059 060 0586

Tomlinsonet al [44] LR NA NA NA NA NA 0630

when tested on the same Facebook dataset it can achievebetter results than our proposed method Furthermorewhen trained and tested on the same Facebook dataset ourmethod outperforms the methods in [33 43 44] which donot take weighted features dynamic updated thresholds ofpersonality traits and correlations between personality traitsinto consideration

5 Conclusion

In this paper we studied the problem of exploiting interde-pendencies between userrsquos personality traits for predictinguserrsquos personality traits First after analyzing importance offeatures we conducted experiments on Facebook datasetto demonstrate the existence of correlations between userrsquospersonality traits Bearing this in mind an unsupervisedframework adaptive conditional probability-based predict-ing model was then proposed to predict userrsquos Big Fivepersonality traits based on importance of features dynamicupdated thresholds of personality traits and prior knowledgeabout correlations between personality traits Furthermorewe compared our results with the ones achieved by othersin the Workshop on Computational Personality Recognition(Shared Task) on the same dataset In general the experimen-tal results demonstrated the effectiveness of our proposedframework

In future work we will speculate on what directions canbe undertaken to ameliorate its performance with respect

to time complexity so as to better apply it to a big dataenvironment as in Facebook monitoring Besides in orderto make our framework applicable to dynamic networksbetter we will explore combining time series analysis withpersonality traits predicting algorithm to capture dynamicevolution process of information and network structureFurthermore since social network features (described inSection 311) are specific for our dataset they may be difficult(or even impossible) to obtain in another dataset hencewe will conduct experiments on other datasets such as theTweeter corpus of PAN shared task (httppanwebisde)on author profiling where also personality is considered tofurther validate the effectiveness of the proposed frameworkbased on different features

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the National Natural ScienceFoundation of China under Grant no 61300148 the Scientificand Technological Break-Through Program of Jilin ProvinceunderGrant no 20130206051GX the Science andTechnologyDevelopment Program of Jilin Province under Grant no20130522112JH the Science Foundation for China Postdoc-tor under Grant no 2012M510879 and the Basic Scien-tific Research Foundation for the Interdisciplinary Researchand Innovation Project of Jilin University under Grant no201103129

References

[1] M Oussalah F Bhat K Challis and T Schnier ldquoA softwarearchitecture for Twitter collection search and geolocationservicesrdquo Knowledge-Based Systems vol 37 pp 105ndash120 2013

[2] R R McCrae and P T Costa ldquoValidation of the 5-factor modelof personality across instruments and observersrdquo Journal ofPersonality and Social Psychology vol 52 no 1 pp 81ndash90 1987

[3] G Saucier and L R Goldberg ldquoThe structure of personalityattributesrdquo in Personality and Work Reconsidering the Role ofPersonality in Organizations M R Barrick and A M RyanEds pp 1ndash29 Jossey-Bass 2003

[4] B Caci M Cardaci M E Tabacchi and F Scrima ldquoPersonalityvariables as predictors of facebook usagerdquoPsychological Reportsvol 114 no 2 pp 528ndash539 2014

[5] M Kosinski Y Bachrach P Kohli D Stillwell and T GraepelldquoManifestations of user personality in website choice andbehaviour on online social networksrdquoMachine Learning vol 95no 3 pp 357ndash380 2014

[6] M-G Cojocaru H Thille E Thommes D Nelson and SGreenhalgh ldquoSocial influence and dynamic demand for newproductsrdquoEnvironmentalModellingamp Software vol 50 pp 169ndash185 2013

[7] F Durupinar N Pelechano J Allbeck U Gudukbay andN Badler ldquoThe impact of the OCEAN personality modelon the perception of crowdsrdquo IEEE Computer Graphics andApplications vol 31 no 3 pp 22ndash31 2011

Mathematical Problems in Engineering 13

[8] X Feng ldquoStudy on customer personality characteristics andrelationship outcomes using SEM analysisrdquo in Proceedings ofthe International Conference on Mechatronics and InformationTechnology pp 841ndash844 2013

[9] J G Lee SMoon andK Salamatian ldquoModeling and predictingthe popularity of online contents with Cox proportional hazardregression modelrdquo Neurocomputing vol 76 no 1 pp 134ndash1452012

[10] J W Stoughton L F Thompson and A W Meade ldquoBigfive personality traits reflected in job applicantsrsquo social mediapostingsrdquo Cyberpsychology Behavior and Social Networkingvol 16 no 11 pp 800ndash805 2013

[11] D A Cobb-Clark and S Schurer ldquoThe stability of big-fivepersonality traitsrdquo Economics Letters vol 115 no 1 pp 11ndash152012

[12] W Mischel ldquoToward an integrative science of the personrdquoAnnual Review of Psychology vol 55 pp 1ndash22 2004

[13] S Argamon S Dhawle M Koppel and J Pennebaker ldquoLexicalpredictors 19 of personality typerdquo in Proceedings of the JointAnnual Meeting of the Interface and the Classification Society ofNorth America 2005

[14] F Mairesse M AWalker M R Mehl and R K Moore ldquoUsinglinguistic cues for the automatic recognition of personality inconversation and textrdquo Journal of Artificial Intelligence Researchvol 30 pp 457ndash500 2007

[15] K Moore and J C McElroy ldquoThe influence of personalityon Facebook usage wall postings and regretrdquo Computers inHuman Behavior vol 28 no 1 pp 267ndash274 2012

[16] J W Pennebaker and L A King ldquoLinguistic styles languageuse as an individual differencerdquo Journal of Personality and SocialPsychology vol 77 no 6 pp 1296ndash1312 1999

[17] J Han and M Kamber Data Mining Concepts and TechniquesThe Morgan Kaufmann Series in Data Management Systemsedited by J Gray Morgan Kaufmann Boston Mass USA 3rdedition 2011

[18] J Golbeck C Robles and K Turner ldquoPredicting personalitywith social mediardquo in Proceedings of the 29th Annual CHIConference on Human Factors in Computing Systems (CHI rsquo11)pp 253ndash262 ACM May 2011

[19] J RQuinlan ldquoLearningwith continuous classesrdquo inProceedingsof the 5thAustralian Joint Conference onArtificial Intelligence (Airsquo92) pp 343ndash348 Hobart Australia November 1992

[20] S Roweis and Z Ghahramani ldquoA unifying review of lineargaussian modelsrdquo Neural Computation vol 11 no 2 pp 305ndash345 1999

[21] S Bai T Zhu and L Cheng ldquoBig-five personality pre-diction based on user behaviors at social network sitesrdquohttparxivorgpdf12044809v1pdf

[22] J Oberlander and S Nowson ldquoWhose thumb is it anywayclassifying author personality from weblog textrdquo in Proceedingsof the 44th Annual Meeting of the Association for ComputationalLinguistics pp 627ndash634 2006

[23] T Nguyen D Q Phung B Adams and S Venkatesh ldquoTowardsdiscovery of influence and personality traits through social linkpredictionrdquo in Proceedings of the International Conference onWeblogs and Social Media pp 566ndash569 Barcelona Spain July2011

[24] S T Bai B B Hao A Li S Yuan R Gao and T S ZhuldquoPredicting big five personality traits of microblog usersrdquo inProceedings of the 12th IEEEWICACM International JointConferences on Web Intelligence and Intelligent Agent Technolog(WI rsquo13) pp 501ndash508 November 2013

[25] R Sun and N Wilson ldquoA model of personality should be acognitive architecture itselfrdquo Cognitive Systems Research vol29-30 no 1 pp 1ndash30 2014

[26] A Ortigosa R M Carro and J I Quiroga ldquoPredicting userpersonality by mining social interactions in Facebookrdquo Journalof Computer and System Sciences vol 80 no 1 pp 57ndash71 2014

[27] O P John L P Naumann and C J Soto ldquoParadigm shift tothe integrative Big Five trait taxonomy history measurementand conceptualissuesrdquo in Handbook of Personality Theory andResearch O P John R W Robins and L A Pervin Eds pp114ndash158 Guilford Press New York NY USA 3rd edition 2008

[28] G Park H A Schwartz J C Eichstaedt et al ldquoAutomaticpersonality assessment through social media languagerdquo Journalof Personality and Social Psychology vol 108 no 6 pp 934ndash9522015

[29] C J Soto O P John S D Gosling and J Potter ldquoAge differencesin personality traits from 10 to 65 big five domains and facets ina large cross-sectional samplerdquo Journal of Personality and SocialPsychology vol 100 no 2 pp 330ndash348 2011

[30] C J Soto and O P John ldquoTen facet scales for the BigFive Inventory convergence with NEO PI-R facets self-peeragreement and discriminant validityrdquo Journal of Research inPersonality vol 43 no 1 pp 84ndash90 2009

[31] M V Kosti R Feldt and L Angelis ldquoPersonality emotionalintelligence and work preferences in software engineering anempirical studyrdquo Information and Software Technology vol 56no 8 pp 973ndash990 2014

[32] H A Schwartz J C Eichstaedt M L Kern et al ldquoPersonalitygender and age in the language of social media the open-vocabulary approachrdquo PLoS ONE vol 8 no 9 Article IDe73791 2013

[33] G Farnadi S ZoghbiM FMoens andMD Cock ldquoRecognis-ing personality traits using facebook status updatesrdquo in Proceed-ings of the Workshop on Computational Personality RecognitionJuly 2013 httpcliccimecunitnitfabiowcpr13farnadi wcpr-13pdf

[34] G Tsoumakas and I Katakis ldquoMulti-label classification anoverviewrdquo International Journal of Data Warehousing and Min-ing vol 3 no 3 pp 1ndash13 2007

[35] M Kosinski D Stillwell and T Graepel ldquoPrivate traits andattributes are predictable from digital records of human behav-iorrdquo Proceedings of the National Academy of Sciences of theUnited States of America vol 110 no 15 pp 5802ndash5805 2013

[36] M Belkin and P Niyogi ldquoLaplacian eigenmaps for dimension-ality reduction and data representationrdquo Neural Computationvol 15 no 6 pp 1373ndash1396 2003

[37] R Socher J Bauer C D Manning and Y N Andrew ldquoParsingwith compositional vector grammarsrdquo in Proceedings of the 51stAnnual Meeting of the Association for Computational Linguistics(ACL rsquo13) pp 455ndash465 Sofia Bulgaria August 2013

[38] K Kazama M Imada and K Kashiwagi ldquoCharacteristics ofinformation diffusion in blogs in relation to information sourcetyperdquo Neurocomputing vol 76 no 1 pp 84ndash92 2012

[39] M B David Y N Andrew and I J Michael ldquoLatent Dirichletallocationrdquo Journal ofMachine Learning Research vol 3 no 4-5pp 993ndash1022 2003

[40] P T Costa andR RMcCraeRevisedNEOPersonality Inventory(NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI) Profes-sional Manual Psychological Assessment Resources 1992

[41] R L Atkinson C A Richard E S Edward J B Daryland S Nolen-Hoeksema Hilgardrsquos Introduction to Psychology

14 Mathematical Problems in Engineering

Harcourt College Publishers Orlando Fla USA 13th edition2000

[42] B Verhoeven W Daelemans and T D Smedt ldquoEnsemblemethods for personality recognitionrdquo in Proceedings of theWorkshop on Computational Personality Recognition pp 35ndash38Boston Mass USA July 2013

[43] F Alam A Stepanov and G Riccardi ldquoPersonality traitsrecognition on social networkmdashfacebookrdquo in Proceedings of theWorkshop on Computational Personality Recognition pp 6ndash9Boston Mass USA July 2013

[44] M T Tomlinson D Hinote and D B Bracewell ldquoPredictingconscientiousness through semantic analysis of facebook postsrdquoin Proceedings of the Workshop on Computational PersonalityRecognition pp 31ndash34 July 2013

[45] D M Endres and J E Schindelin ldquoA newmetric for probabilitydistributionsrdquo IEEETransactions on InformationTheory vol 49no 7 pp 1858ndash1860 2003

[46] S Kullback and R A Leibler ldquoOn information and sufficiencyrdquoThe Annals of Mathematical Statistics vol 22 no 1 pp 79ndash861951

[47] A Genkin D D Lewis and D Madigan ldquoLarge-scale Bayesianlogistic regression for text categorizationrdquo Technometrics vol49 no 3 pp 291ndash304 2007

[48] A K McCallum and K Nigam ldquoA comparison of event modelsfor naive bayes text classificationrdquo in Proceedings of the AAAI-98 Workshop on Learning for Text Categorization pp 137ndash142Madison Wis USA July 1998

[49] I Ruczinski C Kooperberg and M LeBlanc ldquoLogic regres-sionrdquo Journal of Computational and Graphical Statistics vol 12no 3 pp 475ndash511 2003

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 10: Research Article A Novel Adaptive Conditional Probability ...downloads.hindawi.com/journals/mpe/2015/472917.pdf · Research Article A Novel Adaptive Conditional Probability-Based

10 Mathematical Problems in Engineering

Table 2 Jensen-Shannon divergence between personality traits 119901and 119899 stand for positive level and negative level of a certain personal-ity traitThe bold number is the highest Jensen-Shannon divergenceand the italic number is the lowest Jensen-Shannon divergenceSmaller Jensen-Shannon divergence means better consistency

EXT NEU AGR CON OPN119901 119899 119901 119899 119901 119899 119901 119899 119901 119899

EXT 119901 058 1479 044 219 070 236 073 084119899 479 449 521 236 561 191 1043 211

NEU 119901 080 212 110 246 298 040119899 1978 294 1697 305 2473 361

AGR 119901 077 329 082 091119899 340 086 463 125

CON 119901 115 117119899 616 140

OPN 119901

119899

a preference for novelty and variety a person has alongwith positive level of agreeableness and positive level ofconscientiousness which are compatible Furthermore it isin conformity with Table 1 that positive level of extraversionhas smaller Jensen-Shannon divergences with positive levelof neuroticism and positive level of agreeableness besidespositive level of neuroticism has a smaller Jensen-Shannondivergence with negative level of openness

In addition Soto et al [29] demonstrated convergent anddiscriminate correlations for the Big Five Inventory Soto andJohn [30] also illustrated that there was strong convergencebetween each BFI facet scale and corresponding NEO PI-Rfacet Moreover Park et al [28] demonstrated convergencewith self-reports of personality at the domain- and facet-levelIn summary although utilizing different Big Five measuresit can be found that correlations in myPersonality datasetare also in keeping with the findings from other samplesHence correlations between personality traits are expectedand actually prove that myPersonality results are validSo leveraging prior knowledge about relationships betweenpersonality traits may contribute to a better prediction onpersonality traits The results provide a better theoreticalsupport for our proposed method

43 Evaluation Metric Since most researchers exploringon personality recognition leveraged different measures toevaluate their experiments on various datasets it was hard tocompletely appraise their performance and quality Howevera common dataset which was annotated with gold standardpersonality labels was available in the Workshop on Com-putational Personality Recognition (Shared Task) users werefree to split the training and test sets as they wish andprecision recall and 1198651-measure were suggested to be usedto evaluate results of predictions So in this paper for morereliable results the proposed method was also evaluated withstratified 5-fold cross-validation Since there was no trainingset in unsupervised learning the common dataset was splitinto folds and in order to avoid overfitting the same author

Table 3 Precision recall and 1198651-measure of adaptive conditionalprobability-based predicting model (in bold best performance)

Precision Recall 1198651-measureEXT 093 076 083NEU 086 067 075AGR 096 074 083CON 089 070 078OPN 091 079 084Mean 091 0732 0806

did not appear twofold at the same timeThemean precisionrecall and 1198651-measure are calculated as follows

Precision = tptp + fp

Recall = tptp + fn

1198651-measure = 2 times 119875 times 119877119875 + 119877

(16)

where tp stands for the number of positive samples thatare classified correctly fp stands for the number of positivesamples that are classified incorrectly and fn stands for thenumber of negative samples that are classified incorrectly

44 Analysis of Impacts of Different Factors First we carriedon experiments with our proposed method Scores of preci-sion recall and 1198651-measure are shown in Table 3 includingmean scores of all personality traits

441 Impacts of Different Categories of Feature Sets In thissection we conducted experiments on social network fea-tures linguistic features emotional statistical features topicfeatures and all features respectively Due to space restric-tions Figure 3 only illustrates 1198651-measure of all personalitytraits along with mean 1198651-measure of them

It can be observed that experiments which are conductedon linguistic features emotional statistical features and topicfeatures result in unsatisfactory performance with respectto social network features and social network features havethe best classification performance for extraversion which isconsistent with the conclusion in [33] Since social networkfeatures reflect a userrsquos pattern of social status and habitswhich are relatively important decisive factors in userrsquospersonality traits as a result they provide more informationthan other types of features Nevertheless whether more orless each kind of characteristic provides unique informationin userrsquos personality traits prediction Therefore carryingon experiment with combination of social network featureslinguistic features emotional statistical features and topicfeatures can achieve better performance

442 Impact of Weighted Features Additionally we con-ducted experiments on unweighted features Here we onlypresented the most successful results in Table 4 includingmean scores of all personality traits

Mathematical Problems in Engineering 11

1

featuresEssential

featuresLinguistic

featuresEmotional

featuresTopic

featuresAll

Categories of features

EXTNEUAGR

CONOPNMEAN

09

08

07

06

05

04

03

F-s

core

s

Figure 3 1198651-measures of five personality traits along with mean1198651-measures of them with different categories of feature sets

Table 4 Precision recall and 1198651-measure of predictingmodel withunweighted features (in bold best performance)

Precision Recall 1198651-measureEXT 085 076 082NEU 092 068 076AGR 094 064 076CON 091 057 068OPN 087 078 083Mean 090 0686 077

Table 5 Precision recall and 1198651-measure of predictingmodel withunified thresholds (in bold best performance)

Precision Recall 1198651-measureEXT 062 054 057NEU 080 016 025AGR 057 073 063CON 057 077 060OPN 070 100 082Mean 0652 064 0574

From Tables 3 and 4 we can observe that the best resultsfor myPersonality dataset are achieved after distributingweights to features It illustrates that considering correlationsbetween features and userrsquos personality traits can bring per-formance gain to the prediction task since there are limitedfeatures that may remain useful for userrsquos personality traitsprediction

443 Impact of Dynamic Updated Thresholds of PersonalityTraits In addition we predicted userrsquos personality traits withunified thresholds as well The results are summarized inTable 5

Multilabel learning task is to predict one or more cat-egories for each instance In the existing algorithms suchas PT5 method proposed by Tsoumakas and Katakis [34]

Table 6 Precision of different methods on Facebook dataset Thebest performance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 093 086 096 089 091 0910Verhoevenet al [42] SVM 079 071 067 072 087 0752

Farnadi etal [33]

SVM kNNNB 058 054 050 055 060 0554

Alam et al[43]

SVM BLRmNB 058 059 059 059 060 0590

Tomlinsonet al [44] LR NA NA NA NA NA NA

generally thresholdswere unified to the same valueHoweverall categories employing a unified minimum threshold arenot appropriate If threshold is set too high not all categoriestags will be predicted on the other hand if threshold is settoo low too many classes will be predicted in results Hencein this paper we updated thresholds of personality traitsdynamically according to error rates of them Comparedto Table 3 based on thresholds obtained through updatingdynamically our method can achieve better results

45 Performance EvaluationwithDifferentMethods Asmen-tioned in Section 41 in the Workshop on ComputationalPersonality Recognition (Shared Task) different systems forpersonality recognition from text and network features ona common benchmark have been compared Verhoeven etal [42] proposed a meta-learning approach which can beextended to certain component classifiers from other genreswith other class systems or even from other languagesFarnadi et al [33] leveraged SVM 119896 nearest neighbors(kNN) [17] and Naıve Bayes (NB) for automatic recognitionof personality traits from usersrsquo Facebook statues updatesrespectively even with a small set of training dataset it couldachieve better results than most baseline algorithms On thebasis of a set of features extracted from Facebook datasetAlam et al [43] explored suitability and performance ofseveral classification techniques which were SVM BayesianLogistic Regression (BLR) [47] andMultinomial Naıve Bayes(mNB) [48] Tomlinson et al [44] used ranking algorithmsfor feature selection and Logistic Regression (LR) [49] aslearning algorithms which achieved a high performance Inthis paper we tested our proposed method with the sameset than the participants in the Workshop on ComputationalPersonality Recognition (Shared Task) In terms of precisionrecall and 1198651-measure a comparison of the works describedabove as well as our proposed method (denoted by CP) issummarized in Tables 6 7 and 8

From Tables 6 7 and 8 the experimental results indicatethat openness is the easiest personality trait to be predictedby Facebook features with the above methods because thereare more users with positive level of openness than otherpersonality traits in Facebook dataset Since Verhoeven et al[42] trained an ensemble SVM model based on both Face-book and Essays personality traits dataset as a consequence

12 Mathematical Problems in Engineering

Table 7 Recall of different methods on Facebook dataset The bestperformance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 076 067 074 070 079 0732Verhoevenet al [42] SVM 079 072 068 072 087 0756

Farnadi etal [33]

SVMkNN NB 061 053 050 054 070 0576

Alam et al[43]

SVMBLR mNB 058 058 059 059 060 0588

Tomlinsonet al [44] LR NA NA NA NA NA NA

Table 8 1198651-measure of different methods on Facebook datasetThebest performance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 083 075 083 078 084 0806Verhoevenet al [42] SVM 079 070 067 072 086 0748

Farnadi etal [33]

SVMkNN NB 056 052 050 054 061 0546

Alam et al[43]

SVMBLR mNB 058 058 058 059 060 0586

Tomlinsonet al [44] LR NA NA NA NA NA 0630

when tested on the same Facebook dataset it can achievebetter results than our proposed method Furthermorewhen trained and tested on the same Facebook dataset ourmethod outperforms the methods in [33 43 44] which donot take weighted features dynamic updated thresholds ofpersonality traits and correlations between personality traitsinto consideration

5 Conclusion

In this paper we studied the problem of exploiting interde-pendencies between userrsquos personality traits for predictinguserrsquos personality traits First after analyzing importance offeatures we conducted experiments on Facebook datasetto demonstrate the existence of correlations between userrsquospersonality traits Bearing this in mind an unsupervisedframework adaptive conditional probability-based predict-ing model was then proposed to predict userrsquos Big Fivepersonality traits based on importance of features dynamicupdated thresholds of personality traits and prior knowledgeabout correlations between personality traits Furthermorewe compared our results with the ones achieved by othersin the Workshop on Computational Personality Recognition(Shared Task) on the same dataset In general the experimen-tal results demonstrated the effectiveness of our proposedframework

In future work we will speculate on what directions canbe undertaken to ameliorate its performance with respect

to time complexity so as to better apply it to a big dataenvironment as in Facebook monitoring Besides in orderto make our framework applicable to dynamic networksbetter we will explore combining time series analysis withpersonality traits predicting algorithm to capture dynamicevolution process of information and network structureFurthermore since social network features (described inSection 311) are specific for our dataset they may be difficult(or even impossible) to obtain in another dataset hencewe will conduct experiments on other datasets such as theTweeter corpus of PAN shared task (httppanwebisde)on author profiling where also personality is considered tofurther validate the effectiveness of the proposed frameworkbased on different features

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the National Natural ScienceFoundation of China under Grant no 61300148 the Scientificand Technological Break-Through Program of Jilin ProvinceunderGrant no 20130206051GX the Science andTechnologyDevelopment Program of Jilin Province under Grant no20130522112JH the Science Foundation for China Postdoc-tor under Grant no 2012M510879 and the Basic Scien-tific Research Foundation for the Interdisciplinary Researchand Innovation Project of Jilin University under Grant no201103129

References

[1] M Oussalah F Bhat K Challis and T Schnier ldquoA softwarearchitecture for Twitter collection search and geolocationservicesrdquo Knowledge-Based Systems vol 37 pp 105ndash120 2013

[2] R R McCrae and P T Costa ldquoValidation of the 5-factor modelof personality across instruments and observersrdquo Journal ofPersonality and Social Psychology vol 52 no 1 pp 81ndash90 1987

[3] G Saucier and L R Goldberg ldquoThe structure of personalityattributesrdquo in Personality and Work Reconsidering the Role ofPersonality in Organizations M R Barrick and A M RyanEds pp 1ndash29 Jossey-Bass 2003

[4] B Caci M Cardaci M E Tabacchi and F Scrima ldquoPersonalityvariables as predictors of facebook usagerdquoPsychological Reportsvol 114 no 2 pp 528ndash539 2014

[5] M Kosinski Y Bachrach P Kohli D Stillwell and T GraepelldquoManifestations of user personality in website choice andbehaviour on online social networksrdquoMachine Learning vol 95no 3 pp 357ndash380 2014

[6] M-G Cojocaru H Thille E Thommes D Nelson and SGreenhalgh ldquoSocial influence and dynamic demand for newproductsrdquoEnvironmentalModellingamp Software vol 50 pp 169ndash185 2013

[7] F Durupinar N Pelechano J Allbeck U Gudukbay andN Badler ldquoThe impact of the OCEAN personality modelon the perception of crowdsrdquo IEEE Computer Graphics andApplications vol 31 no 3 pp 22ndash31 2011

Mathematical Problems in Engineering 13

[8] X Feng ldquoStudy on customer personality characteristics andrelationship outcomes using SEM analysisrdquo in Proceedings ofthe International Conference on Mechatronics and InformationTechnology pp 841ndash844 2013

[9] J G Lee SMoon andK Salamatian ldquoModeling and predictingthe popularity of online contents with Cox proportional hazardregression modelrdquo Neurocomputing vol 76 no 1 pp 134ndash1452012

[10] J W Stoughton L F Thompson and A W Meade ldquoBigfive personality traits reflected in job applicantsrsquo social mediapostingsrdquo Cyberpsychology Behavior and Social Networkingvol 16 no 11 pp 800ndash805 2013

[11] D A Cobb-Clark and S Schurer ldquoThe stability of big-fivepersonality traitsrdquo Economics Letters vol 115 no 1 pp 11ndash152012

[12] W Mischel ldquoToward an integrative science of the personrdquoAnnual Review of Psychology vol 55 pp 1ndash22 2004

[13] S Argamon S Dhawle M Koppel and J Pennebaker ldquoLexicalpredictors 19 of personality typerdquo in Proceedings of the JointAnnual Meeting of the Interface and the Classification Society ofNorth America 2005

[14] F Mairesse M AWalker M R Mehl and R K Moore ldquoUsinglinguistic cues for the automatic recognition of personality inconversation and textrdquo Journal of Artificial Intelligence Researchvol 30 pp 457ndash500 2007

[15] K Moore and J C McElroy ldquoThe influence of personalityon Facebook usage wall postings and regretrdquo Computers inHuman Behavior vol 28 no 1 pp 267ndash274 2012

[16] J W Pennebaker and L A King ldquoLinguistic styles languageuse as an individual differencerdquo Journal of Personality and SocialPsychology vol 77 no 6 pp 1296ndash1312 1999

[17] J Han and M Kamber Data Mining Concepts and TechniquesThe Morgan Kaufmann Series in Data Management Systemsedited by J Gray Morgan Kaufmann Boston Mass USA 3rdedition 2011

[18] J Golbeck C Robles and K Turner ldquoPredicting personalitywith social mediardquo in Proceedings of the 29th Annual CHIConference on Human Factors in Computing Systems (CHI rsquo11)pp 253ndash262 ACM May 2011

[19] J RQuinlan ldquoLearningwith continuous classesrdquo inProceedingsof the 5thAustralian Joint Conference onArtificial Intelligence (Airsquo92) pp 343ndash348 Hobart Australia November 1992

[20] S Roweis and Z Ghahramani ldquoA unifying review of lineargaussian modelsrdquo Neural Computation vol 11 no 2 pp 305ndash345 1999

[21] S Bai T Zhu and L Cheng ldquoBig-five personality pre-diction based on user behaviors at social network sitesrdquohttparxivorgpdf12044809v1pdf

[22] J Oberlander and S Nowson ldquoWhose thumb is it anywayclassifying author personality from weblog textrdquo in Proceedingsof the 44th Annual Meeting of the Association for ComputationalLinguistics pp 627ndash634 2006

[23] T Nguyen D Q Phung B Adams and S Venkatesh ldquoTowardsdiscovery of influence and personality traits through social linkpredictionrdquo in Proceedings of the International Conference onWeblogs and Social Media pp 566ndash569 Barcelona Spain July2011

[24] S T Bai B B Hao A Li S Yuan R Gao and T S ZhuldquoPredicting big five personality traits of microblog usersrdquo inProceedings of the 12th IEEEWICACM International JointConferences on Web Intelligence and Intelligent Agent Technolog(WI rsquo13) pp 501ndash508 November 2013

[25] R Sun and N Wilson ldquoA model of personality should be acognitive architecture itselfrdquo Cognitive Systems Research vol29-30 no 1 pp 1ndash30 2014

[26] A Ortigosa R M Carro and J I Quiroga ldquoPredicting userpersonality by mining social interactions in Facebookrdquo Journalof Computer and System Sciences vol 80 no 1 pp 57ndash71 2014

[27] O P John L P Naumann and C J Soto ldquoParadigm shift tothe integrative Big Five trait taxonomy history measurementand conceptualissuesrdquo in Handbook of Personality Theory andResearch O P John R W Robins and L A Pervin Eds pp114ndash158 Guilford Press New York NY USA 3rd edition 2008

[28] G Park H A Schwartz J C Eichstaedt et al ldquoAutomaticpersonality assessment through social media languagerdquo Journalof Personality and Social Psychology vol 108 no 6 pp 934ndash9522015

[29] C J Soto O P John S D Gosling and J Potter ldquoAge differencesin personality traits from 10 to 65 big five domains and facets ina large cross-sectional samplerdquo Journal of Personality and SocialPsychology vol 100 no 2 pp 330ndash348 2011

[30] C J Soto and O P John ldquoTen facet scales for the BigFive Inventory convergence with NEO PI-R facets self-peeragreement and discriminant validityrdquo Journal of Research inPersonality vol 43 no 1 pp 84ndash90 2009

[31] M V Kosti R Feldt and L Angelis ldquoPersonality emotionalintelligence and work preferences in software engineering anempirical studyrdquo Information and Software Technology vol 56no 8 pp 973ndash990 2014

[32] H A Schwartz J C Eichstaedt M L Kern et al ldquoPersonalitygender and age in the language of social media the open-vocabulary approachrdquo PLoS ONE vol 8 no 9 Article IDe73791 2013

[33] G Farnadi S ZoghbiM FMoens andMD Cock ldquoRecognis-ing personality traits using facebook status updatesrdquo in Proceed-ings of the Workshop on Computational Personality RecognitionJuly 2013 httpcliccimecunitnitfabiowcpr13farnadi wcpr-13pdf

[34] G Tsoumakas and I Katakis ldquoMulti-label classification anoverviewrdquo International Journal of Data Warehousing and Min-ing vol 3 no 3 pp 1ndash13 2007

[35] M Kosinski D Stillwell and T Graepel ldquoPrivate traits andattributes are predictable from digital records of human behav-iorrdquo Proceedings of the National Academy of Sciences of theUnited States of America vol 110 no 15 pp 5802ndash5805 2013

[36] M Belkin and P Niyogi ldquoLaplacian eigenmaps for dimension-ality reduction and data representationrdquo Neural Computationvol 15 no 6 pp 1373ndash1396 2003

[37] R Socher J Bauer C D Manning and Y N Andrew ldquoParsingwith compositional vector grammarsrdquo in Proceedings of the 51stAnnual Meeting of the Association for Computational Linguistics(ACL rsquo13) pp 455ndash465 Sofia Bulgaria August 2013

[38] K Kazama M Imada and K Kashiwagi ldquoCharacteristics ofinformation diffusion in blogs in relation to information sourcetyperdquo Neurocomputing vol 76 no 1 pp 84ndash92 2012

[39] M B David Y N Andrew and I J Michael ldquoLatent Dirichletallocationrdquo Journal ofMachine Learning Research vol 3 no 4-5pp 993ndash1022 2003

[40] P T Costa andR RMcCraeRevisedNEOPersonality Inventory(NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI) Profes-sional Manual Psychological Assessment Resources 1992

[41] R L Atkinson C A Richard E S Edward J B Daryland S Nolen-Hoeksema Hilgardrsquos Introduction to Psychology

14 Mathematical Problems in Engineering

Harcourt College Publishers Orlando Fla USA 13th edition2000

[42] B Verhoeven W Daelemans and T D Smedt ldquoEnsemblemethods for personality recognitionrdquo in Proceedings of theWorkshop on Computational Personality Recognition pp 35ndash38Boston Mass USA July 2013

[43] F Alam A Stepanov and G Riccardi ldquoPersonality traitsrecognition on social networkmdashfacebookrdquo in Proceedings of theWorkshop on Computational Personality Recognition pp 6ndash9Boston Mass USA July 2013

[44] M T Tomlinson D Hinote and D B Bracewell ldquoPredictingconscientiousness through semantic analysis of facebook postsrdquoin Proceedings of the Workshop on Computational PersonalityRecognition pp 31ndash34 July 2013

[45] D M Endres and J E Schindelin ldquoA newmetric for probabilitydistributionsrdquo IEEETransactions on InformationTheory vol 49no 7 pp 1858ndash1860 2003

[46] S Kullback and R A Leibler ldquoOn information and sufficiencyrdquoThe Annals of Mathematical Statistics vol 22 no 1 pp 79ndash861951

[47] A Genkin D D Lewis and D Madigan ldquoLarge-scale Bayesianlogistic regression for text categorizationrdquo Technometrics vol49 no 3 pp 291ndash304 2007

[48] A K McCallum and K Nigam ldquoA comparison of event modelsfor naive bayes text classificationrdquo in Proceedings of the AAAI-98 Workshop on Learning for Text Categorization pp 137ndash142Madison Wis USA July 1998

[49] I Ruczinski C Kooperberg and M LeBlanc ldquoLogic regres-sionrdquo Journal of Computational and Graphical Statistics vol 12no 3 pp 475ndash511 2003

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 11: Research Article A Novel Adaptive Conditional Probability ...downloads.hindawi.com/journals/mpe/2015/472917.pdf · Research Article A Novel Adaptive Conditional Probability-Based

Mathematical Problems in Engineering 11

1

featuresEssential

featuresLinguistic

featuresEmotional

featuresTopic

featuresAll

Categories of features

EXTNEUAGR

CONOPNMEAN

09

08

07

06

05

04

03

F-s

core

s

Figure 3 1198651-measures of five personality traits along with mean1198651-measures of them with different categories of feature sets

Table 4 Precision recall and 1198651-measure of predictingmodel withunweighted features (in bold best performance)

Precision Recall 1198651-measureEXT 085 076 082NEU 092 068 076AGR 094 064 076CON 091 057 068OPN 087 078 083Mean 090 0686 077

Table 5 Precision recall and 1198651-measure of predictingmodel withunified thresholds (in bold best performance)

Precision Recall 1198651-measureEXT 062 054 057NEU 080 016 025AGR 057 073 063CON 057 077 060OPN 070 100 082Mean 0652 064 0574

From Tables 3 and 4 we can observe that the best resultsfor myPersonality dataset are achieved after distributingweights to features It illustrates that considering correlationsbetween features and userrsquos personality traits can bring per-formance gain to the prediction task since there are limitedfeatures that may remain useful for userrsquos personality traitsprediction

443 Impact of Dynamic Updated Thresholds of PersonalityTraits In addition we predicted userrsquos personality traits withunified thresholds as well The results are summarized inTable 5

Multilabel learning task is to predict one or more cat-egories for each instance In the existing algorithms suchas PT5 method proposed by Tsoumakas and Katakis [34]

Table 6 Precision of different methods on Facebook dataset Thebest performance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 093 086 096 089 091 0910Verhoevenet al [42] SVM 079 071 067 072 087 0752

Farnadi etal [33]

SVM kNNNB 058 054 050 055 060 0554

Alam et al[43]

SVM BLRmNB 058 059 059 059 060 0590

Tomlinsonet al [44] LR NA NA NA NA NA NA

generally thresholdswere unified to the same valueHoweverall categories employing a unified minimum threshold arenot appropriate If threshold is set too high not all categoriestags will be predicted on the other hand if threshold is settoo low too many classes will be predicted in results Hencein this paper we updated thresholds of personality traitsdynamically according to error rates of them Comparedto Table 3 based on thresholds obtained through updatingdynamically our method can achieve better results

45 Performance EvaluationwithDifferentMethods Asmen-tioned in Section 41 in the Workshop on ComputationalPersonality Recognition (Shared Task) different systems forpersonality recognition from text and network features ona common benchmark have been compared Verhoeven etal [42] proposed a meta-learning approach which can beextended to certain component classifiers from other genreswith other class systems or even from other languagesFarnadi et al [33] leveraged SVM 119896 nearest neighbors(kNN) [17] and Naıve Bayes (NB) for automatic recognitionof personality traits from usersrsquo Facebook statues updatesrespectively even with a small set of training dataset it couldachieve better results than most baseline algorithms On thebasis of a set of features extracted from Facebook datasetAlam et al [43] explored suitability and performance ofseveral classification techniques which were SVM BayesianLogistic Regression (BLR) [47] andMultinomial Naıve Bayes(mNB) [48] Tomlinson et al [44] used ranking algorithmsfor feature selection and Logistic Regression (LR) [49] aslearning algorithms which achieved a high performance Inthis paper we tested our proposed method with the sameset than the participants in the Workshop on ComputationalPersonality Recognition (Shared Task) In terms of precisionrecall and 1198651-measure a comparison of the works describedabove as well as our proposed method (denoted by CP) issummarized in Tables 6 7 and 8

From Tables 6 7 and 8 the experimental results indicatethat openness is the easiest personality trait to be predictedby Facebook features with the above methods because thereare more users with positive level of openness than otherpersonality traits in Facebook dataset Since Verhoeven et al[42] trained an ensemble SVM model based on both Face-book and Essays personality traits dataset as a consequence

12 Mathematical Problems in Engineering

Table 7 Recall of different methods on Facebook dataset The bestperformance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 076 067 074 070 079 0732Verhoevenet al [42] SVM 079 072 068 072 087 0756

Farnadi etal [33]

SVMkNN NB 061 053 050 054 070 0576

Alam et al[43]

SVMBLR mNB 058 058 059 059 060 0588

Tomlinsonet al [44] LR NA NA NA NA NA NA

Table 8 1198651-measure of different methods on Facebook datasetThebest performance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 083 075 083 078 084 0806Verhoevenet al [42] SVM 079 070 067 072 086 0748

Farnadi etal [33]

SVMkNN NB 056 052 050 054 061 0546

Alam et al[43]

SVMBLR mNB 058 058 058 059 060 0586

Tomlinsonet al [44] LR NA NA NA NA NA 0630

when tested on the same Facebook dataset it can achievebetter results than our proposed method Furthermorewhen trained and tested on the same Facebook dataset ourmethod outperforms the methods in [33 43 44] which donot take weighted features dynamic updated thresholds ofpersonality traits and correlations between personality traitsinto consideration

5 Conclusion

In this paper we studied the problem of exploiting interde-pendencies between userrsquos personality traits for predictinguserrsquos personality traits First after analyzing importance offeatures we conducted experiments on Facebook datasetto demonstrate the existence of correlations between userrsquospersonality traits Bearing this in mind an unsupervisedframework adaptive conditional probability-based predict-ing model was then proposed to predict userrsquos Big Fivepersonality traits based on importance of features dynamicupdated thresholds of personality traits and prior knowledgeabout correlations between personality traits Furthermorewe compared our results with the ones achieved by othersin the Workshop on Computational Personality Recognition(Shared Task) on the same dataset In general the experimen-tal results demonstrated the effectiveness of our proposedframework

In future work we will speculate on what directions canbe undertaken to ameliorate its performance with respect

to time complexity so as to better apply it to a big dataenvironment as in Facebook monitoring Besides in orderto make our framework applicable to dynamic networksbetter we will explore combining time series analysis withpersonality traits predicting algorithm to capture dynamicevolution process of information and network structureFurthermore since social network features (described inSection 311) are specific for our dataset they may be difficult(or even impossible) to obtain in another dataset hencewe will conduct experiments on other datasets such as theTweeter corpus of PAN shared task (httppanwebisde)on author profiling where also personality is considered tofurther validate the effectiveness of the proposed frameworkbased on different features

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the National Natural ScienceFoundation of China under Grant no 61300148 the Scientificand Technological Break-Through Program of Jilin ProvinceunderGrant no 20130206051GX the Science andTechnologyDevelopment Program of Jilin Province under Grant no20130522112JH the Science Foundation for China Postdoc-tor under Grant no 2012M510879 and the Basic Scien-tific Research Foundation for the Interdisciplinary Researchand Innovation Project of Jilin University under Grant no201103129

References

[1] M Oussalah F Bhat K Challis and T Schnier ldquoA softwarearchitecture for Twitter collection search and geolocationservicesrdquo Knowledge-Based Systems vol 37 pp 105ndash120 2013

[2] R R McCrae and P T Costa ldquoValidation of the 5-factor modelof personality across instruments and observersrdquo Journal ofPersonality and Social Psychology vol 52 no 1 pp 81ndash90 1987

[3] G Saucier and L R Goldberg ldquoThe structure of personalityattributesrdquo in Personality and Work Reconsidering the Role ofPersonality in Organizations M R Barrick and A M RyanEds pp 1ndash29 Jossey-Bass 2003

[4] B Caci M Cardaci M E Tabacchi and F Scrima ldquoPersonalityvariables as predictors of facebook usagerdquoPsychological Reportsvol 114 no 2 pp 528ndash539 2014

[5] M Kosinski Y Bachrach P Kohli D Stillwell and T GraepelldquoManifestations of user personality in website choice andbehaviour on online social networksrdquoMachine Learning vol 95no 3 pp 357ndash380 2014

[6] M-G Cojocaru H Thille E Thommes D Nelson and SGreenhalgh ldquoSocial influence and dynamic demand for newproductsrdquoEnvironmentalModellingamp Software vol 50 pp 169ndash185 2013

[7] F Durupinar N Pelechano J Allbeck U Gudukbay andN Badler ldquoThe impact of the OCEAN personality modelon the perception of crowdsrdquo IEEE Computer Graphics andApplications vol 31 no 3 pp 22ndash31 2011

Mathematical Problems in Engineering 13

[8] X Feng ldquoStudy on customer personality characteristics andrelationship outcomes using SEM analysisrdquo in Proceedings ofthe International Conference on Mechatronics and InformationTechnology pp 841ndash844 2013

[9] J G Lee SMoon andK Salamatian ldquoModeling and predictingthe popularity of online contents with Cox proportional hazardregression modelrdquo Neurocomputing vol 76 no 1 pp 134ndash1452012

[10] J W Stoughton L F Thompson and A W Meade ldquoBigfive personality traits reflected in job applicantsrsquo social mediapostingsrdquo Cyberpsychology Behavior and Social Networkingvol 16 no 11 pp 800ndash805 2013

[11] D A Cobb-Clark and S Schurer ldquoThe stability of big-fivepersonality traitsrdquo Economics Letters vol 115 no 1 pp 11ndash152012

[12] W Mischel ldquoToward an integrative science of the personrdquoAnnual Review of Psychology vol 55 pp 1ndash22 2004

[13] S Argamon S Dhawle M Koppel and J Pennebaker ldquoLexicalpredictors 19 of personality typerdquo in Proceedings of the JointAnnual Meeting of the Interface and the Classification Society ofNorth America 2005

[14] F Mairesse M AWalker M R Mehl and R K Moore ldquoUsinglinguistic cues for the automatic recognition of personality inconversation and textrdquo Journal of Artificial Intelligence Researchvol 30 pp 457ndash500 2007

[15] K Moore and J C McElroy ldquoThe influence of personalityon Facebook usage wall postings and regretrdquo Computers inHuman Behavior vol 28 no 1 pp 267ndash274 2012

[16] J W Pennebaker and L A King ldquoLinguistic styles languageuse as an individual differencerdquo Journal of Personality and SocialPsychology vol 77 no 6 pp 1296ndash1312 1999

[17] J Han and M Kamber Data Mining Concepts and TechniquesThe Morgan Kaufmann Series in Data Management Systemsedited by J Gray Morgan Kaufmann Boston Mass USA 3rdedition 2011

[18] J Golbeck C Robles and K Turner ldquoPredicting personalitywith social mediardquo in Proceedings of the 29th Annual CHIConference on Human Factors in Computing Systems (CHI rsquo11)pp 253ndash262 ACM May 2011

[19] J RQuinlan ldquoLearningwith continuous classesrdquo inProceedingsof the 5thAustralian Joint Conference onArtificial Intelligence (Airsquo92) pp 343ndash348 Hobart Australia November 1992

[20] S Roweis and Z Ghahramani ldquoA unifying review of lineargaussian modelsrdquo Neural Computation vol 11 no 2 pp 305ndash345 1999

[21] S Bai T Zhu and L Cheng ldquoBig-five personality pre-diction based on user behaviors at social network sitesrdquohttparxivorgpdf12044809v1pdf

[22] J Oberlander and S Nowson ldquoWhose thumb is it anywayclassifying author personality from weblog textrdquo in Proceedingsof the 44th Annual Meeting of the Association for ComputationalLinguistics pp 627ndash634 2006

[23] T Nguyen D Q Phung B Adams and S Venkatesh ldquoTowardsdiscovery of influence and personality traits through social linkpredictionrdquo in Proceedings of the International Conference onWeblogs and Social Media pp 566ndash569 Barcelona Spain July2011

[24] S T Bai B B Hao A Li S Yuan R Gao and T S ZhuldquoPredicting big five personality traits of microblog usersrdquo inProceedings of the 12th IEEEWICACM International JointConferences on Web Intelligence and Intelligent Agent Technolog(WI rsquo13) pp 501ndash508 November 2013

[25] R Sun and N Wilson ldquoA model of personality should be acognitive architecture itselfrdquo Cognitive Systems Research vol29-30 no 1 pp 1ndash30 2014

[26] A Ortigosa R M Carro and J I Quiroga ldquoPredicting userpersonality by mining social interactions in Facebookrdquo Journalof Computer and System Sciences vol 80 no 1 pp 57ndash71 2014

[27] O P John L P Naumann and C J Soto ldquoParadigm shift tothe integrative Big Five trait taxonomy history measurementand conceptualissuesrdquo in Handbook of Personality Theory andResearch O P John R W Robins and L A Pervin Eds pp114ndash158 Guilford Press New York NY USA 3rd edition 2008

[28] G Park H A Schwartz J C Eichstaedt et al ldquoAutomaticpersonality assessment through social media languagerdquo Journalof Personality and Social Psychology vol 108 no 6 pp 934ndash9522015

[29] C J Soto O P John S D Gosling and J Potter ldquoAge differencesin personality traits from 10 to 65 big five domains and facets ina large cross-sectional samplerdquo Journal of Personality and SocialPsychology vol 100 no 2 pp 330ndash348 2011

[30] C J Soto and O P John ldquoTen facet scales for the BigFive Inventory convergence with NEO PI-R facets self-peeragreement and discriminant validityrdquo Journal of Research inPersonality vol 43 no 1 pp 84ndash90 2009

[31] M V Kosti R Feldt and L Angelis ldquoPersonality emotionalintelligence and work preferences in software engineering anempirical studyrdquo Information and Software Technology vol 56no 8 pp 973ndash990 2014

[32] H A Schwartz J C Eichstaedt M L Kern et al ldquoPersonalitygender and age in the language of social media the open-vocabulary approachrdquo PLoS ONE vol 8 no 9 Article IDe73791 2013

[33] G Farnadi S ZoghbiM FMoens andMD Cock ldquoRecognis-ing personality traits using facebook status updatesrdquo in Proceed-ings of the Workshop on Computational Personality RecognitionJuly 2013 httpcliccimecunitnitfabiowcpr13farnadi wcpr-13pdf

[34] G Tsoumakas and I Katakis ldquoMulti-label classification anoverviewrdquo International Journal of Data Warehousing and Min-ing vol 3 no 3 pp 1ndash13 2007

[35] M Kosinski D Stillwell and T Graepel ldquoPrivate traits andattributes are predictable from digital records of human behav-iorrdquo Proceedings of the National Academy of Sciences of theUnited States of America vol 110 no 15 pp 5802ndash5805 2013

[36] M Belkin and P Niyogi ldquoLaplacian eigenmaps for dimension-ality reduction and data representationrdquo Neural Computationvol 15 no 6 pp 1373ndash1396 2003

[37] R Socher J Bauer C D Manning and Y N Andrew ldquoParsingwith compositional vector grammarsrdquo in Proceedings of the 51stAnnual Meeting of the Association for Computational Linguistics(ACL rsquo13) pp 455ndash465 Sofia Bulgaria August 2013

[38] K Kazama M Imada and K Kashiwagi ldquoCharacteristics ofinformation diffusion in blogs in relation to information sourcetyperdquo Neurocomputing vol 76 no 1 pp 84ndash92 2012

[39] M B David Y N Andrew and I J Michael ldquoLatent Dirichletallocationrdquo Journal ofMachine Learning Research vol 3 no 4-5pp 993ndash1022 2003

[40] P T Costa andR RMcCraeRevisedNEOPersonality Inventory(NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI) Profes-sional Manual Psychological Assessment Resources 1992

[41] R L Atkinson C A Richard E S Edward J B Daryland S Nolen-Hoeksema Hilgardrsquos Introduction to Psychology

14 Mathematical Problems in Engineering

Harcourt College Publishers Orlando Fla USA 13th edition2000

[42] B Verhoeven W Daelemans and T D Smedt ldquoEnsemblemethods for personality recognitionrdquo in Proceedings of theWorkshop on Computational Personality Recognition pp 35ndash38Boston Mass USA July 2013

[43] F Alam A Stepanov and G Riccardi ldquoPersonality traitsrecognition on social networkmdashfacebookrdquo in Proceedings of theWorkshop on Computational Personality Recognition pp 6ndash9Boston Mass USA July 2013

[44] M T Tomlinson D Hinote and D B Bracewell ldquoPredictingconscientiousness through semantic analysis of facebook postsrdquoin Proceedings of the Workshop on Computational PersonalityRecognition pp 31ndash34 July 2013

[45] D M Endres and J E Schindelin ldquoA newmetric for probabilitydistributionsrdquo IEEETransactions on InformationTheory vol 49no 7 pp 1858ndash1860 2003

[46] S Kullback and R A Leibler ldquoOn information and sufficiencyrdquoThe Annals of Mathematical Statistics vol 22 no 1 pp 79ndash861951

[47] A Genkin D D Lewis and D Madigan ldquoLarge-scale Bayesianlogistic regression for text categorizationrdquo Technometrics vol49 no 3 pp 291ndash304 2007

[48] A K McCallum and K Nigam ldquoA comparison of event modelsfor naive bayes text classificationrdquo in Proceedings of the AAAI-98 Workshop on Learning for Text Categorization pp 137ndash142Madison Wis USA July 1998

[49] I Ruczinski C Kooperberg and M LeBlanc ldquoLogic regres-sionrdquo Journal of Computational and Graphical Statistics vol 12no 3 pp 475ndash511 2003

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 12: Research Article A Novel Adaptive Conditional Probability ...downloads.hindawi.com/journals/mpe/2015/472917.pdf · Research Article A Novel Adaptive Conditional Probability-Based

12 Mathematical Problems in Engineering

Table 7 Recall of different methods on Facebook dataset The bestperformance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 076 067 074 070 079 0732Verhoevenet al [42] SVM 079 072 068 072 087 0756

Farnadi etal [33]

SVMkNN NB 061 053 050 054 070 0576

Alam et al[43]

SVMBLR mNB 058 058 059 059 060 0588

Tomlinsonet al [44] LR NA NA NA NA NA NA

Table 8 1198651-measure of different methods on Facebook datasetThebest performance per personality trait appears boldfaced

Relatedworks Methods EXT NEU AGR CON OPN Mean

Our work CP 083 075 083 078 084 0806Verhoevenet al [42] SVM 079 070 067 072 086 0748

Farnadi etal [33]

SVMkNN NB 056 052 050 054 061 0546

Alam et al[43]

SVMBLR mNB 058 058 058 059 060 0586

Tomlinsonet al [44] LR NA NA NA NA NA 0630

when tested on the same Facebook dataset it can achievebetter results than our proposed method Furthermorewhen trained and tested on the same Facebook dataset ourmethod outperforms the methods in [33 43 44] which donot take weighted features dynamic updated thresholds ofpersonality traits and correlations between personality traitsinto consideration

5 Conclusion

In this paper we studied the problem of exploiting interde-pendencies between userrsquos personality traits for predictinguserrsquos personality traits First after analyzing importance offeatures we conducted experiments on Facebook datasetto demonstrate the existence of correlations between userrsquospersonality traits Bearing this in mind an unsupervisedframework adaptive conditional probability-based predict-ing model was then proposed to predict userrsquos Big Fivepersonality traits based on importance of features dynamicupdated thresholds of personality traits and prior knowledgeabout correlations between personality traits Furthermorewe compared our results with the ones achieved by othersin the Workshop on Computational Personality Recognition(Shared Task) on the same dataset In general the experimen-tal results demonstrated the effectiveness of our proposedframework

In future work we will speculate on what directions canbe undertaken to ameliorate its performance with respect

to time complexity so as to better apply it to a big dataenvironment as in Facebook monitoring Besides in orderto make our framework applicable to dynamic networksbetter we will explore combining time series analysis withpersonality traits predicting algorithm to capture dynamicevolution process of information and network structureFurthermore since social network features (described inSection 311) are specific for our dataset they may be difficult(or even impossible) to obtain in another dataset hencewe will conduct experiments on other datasets such as theTweeter corpus of PAN shared task (httppanwebisde)on author profiling where also personality is considered tofurther validate the effectiveness of the proposed frameworkbased on different features

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the National Natural ScienceFoundation of China under Grant no 61300148 the Scientificand Technological Break-Through Program of Jilin ProvinceunderGrant no 20130206051GX the Science andTechnologyDevelopment Program of Jilin Province under Grant no20130522112JH the Science Foundation for China Postdoc-tor under Grant no 2012M510879 and the Basic Scien-tific Research Foundation for the Interdisciplinary Researchand Innovation Project of Jilin University under Grant no201103129

References

[1] M Oussalah F Bhat K Challis and T Schnier ldquoA softwarearchitecture for Twitter collection search and geolocationservicesrdquo Knowledge-Based Systems vol 37 pp 105ndash120 2013

[2] R R McCrae and P T Costa ldquoValidation of the 5-factor modelof personality across instruments and observersrdquo Journal ofPersonality and Social Psychology vol 52 no 1 pp 81ndash90 1987

[3] G Saucier and L R Goldberg ldquoThe structure of personalityattributesrdquo in Personality and Work Reconsidering the Role ofPersonality in Organizations M R Barrick and A M RyanEds pp 1ndash29 Jossey-Bass 2003

[4] B Caci M Cardaci M E Tabacchi and F Scrima ldquoPersonalityvariables as predictors of facebook usagerdquoPsychological Reportsvol 114 no 2 pp 528ndash539 2014

[5] M Kosinski Y Bachrach P Kohli D Stillwell and T GraepelldquoManifestations of user personality in website choice andbehaviour on online social networksrdquoMachine Learning vol 95no 3 pp 357ndash380 2014

[6] M-G Cojocaru H Thille E Thommes D Nelson and SGreenhalgh ldquoSocial influence and dynamic demand for newproductsrdquoEnvironmentalModellingamp Software vol 50 pp 169ndash185 2013

[7] F Durupinar N Pelechano J Allbeck U Gudukbay andN Badler ldquoThe impact of the OCEAN personality modelon the perception of crowdsrdquo IEEE Computer Graphics andApplications vol 31 no 3 pp 22ndash31 2011

Mathematical Problems in Engineering 13

[8] X Feng ldquoStudy on customer personality characteristics andrelationship outcomes using SEM analysisrdquo in Proceedings ofthe International Conference on Mechatronics and InformationTechnology pp 841ndash844 2013

[9] J G Lee SMoon andK Salamatian ldquoModeling and predictingthe popularity of online contents with Cox proportional hazardregression modelrdquo Neurocomputing vol 76 no 1 pp 134ndash1452012

[10] J W Stoughton L F Thompson and A W Meade ldquoBigfive personality traits reflected in job applicantsrsquo social mediapostingsrdquo Cyberpsychology Behavior and Social Networkingvol 16 no 11 pp 800ndash805 2013

[11] D A Cobb-Clark and S Schurer ldquoThe stability of big-fivepersonality traitsrdquo Economics Letters vol 115 no 1 pp 11ndash152012

[12] W Mischel ldquoToward an integrative science of the personrdquoAnnual Review of Psychology vol 55 pp 1ndash22 2004

[13] S Argamon S Dhawle M Koppel and J Pennebaker ldquoLexicalpredictors 19 of personality typerdquo in Proceedings of the JointAnnual Meeting of the Interface and the Classification Society ofNorth America 2005

[14] F Mairesse M AWalker M R Mehl and R K Moore ldquoUsinglinguistic cues for the automatic recognition of personality inconversation and textrdquo Journal of Artificial Intelligence Researchvol 30 pp 457ndash500 2007

[15] K Moore and J C McElroy ldquoThe influence of personalityon Facebook usage wall postings and regretrdquo Computers inHuman Behavior vol 28 no 1 pp 267ndash274 2012

[16] J W Pennebaker and L A King ldquoLinguistic styles languageuse as an individual differencerdquo Journal of Personality and SocialPsychology vol 77 no 6 pp 1296ndash1312 1999

[17] J Han and M Kamber Data Mining Concepts and TechniquesThe Morgan Kaufmann Series in Data Management Systemsedited by J Gray Morgan Kaufmann Boston Mass USA 3rdedition 2011

[18] J Golbeck C Robles and K Turner ldquoPredicting personalitywith social mediardquo in Proceedings of the 29th Annual CHIConference on Human Factors in Computing Systems (CHI rsquo11)pp 253ndash262 ACM May 2011

[19] J RQuinlan ldquoLearningwith continuous classesrdquo inProceedingsof the 5thAustralian Joint Conference onArtificial Intelligence (Airsquo92) pp 343ndash348 Hobart Australia November 1992

[20] S Roweis and Z Ghahramani ldquoA unifying review of lineargaussian modelsrdquo Neural Computation vol 11 no 2 pp 305ndash345 1999

[21] S Bai T Zhu and L Cheng ldquoBig-five personality pre-diction based on user behaviors at social network sitesrdquohttparxivorgpdf12044809v1pdf

[22] J Oberlander and S Nowson ldquoWhose thumb is it anywayclassifying author personality from weblog textrdquo in Proceedingsof the 44th Annual Meeting of the Association for ComputationalLinguistics pp 627ndash634 2006

[23] T Nguyen D Q Phung B Adams and S Venkatesh ldquoTowardsdiscovery of influence and personality traits through social linkpredictionrdquo in Proceedings of the International Conference onWeblogs and Social Media pp 566ndash569 Barcelona Spain July2011

[24] S T Bai B B Hao A Li S Yuan R Gao and T S ZhuldquoPredicting big five personality traits of microblog usersrdquo inProceedings of the 12th IEEEWICACM International JointConferences on Web Intelligence and Intelligent Agent Technolog(WI rsquo13) pp 501ndash508 November 2013

[25] R Sun and N Wilson ldquoA model of personality should be acognitive architecture itselfrdquo Cognitive Systems Research vol29-30 no 1 pp 1ndash30 2014

[26] A Ortigosa R M Carro and J I Quiroga ldquoPredicting userpersonality by mining social interactions in Facebookrdquo Journalof Computer and System Sciences vol 80 no 1 pp 57ndash71 2014

[27] O P John L P Naumann and C J Soto ldquoParadigm shift tothe integrative Big Five trait taxonomy history measurementand conceptualissuesrdquo in Handbook of Personality Theory andResearch O P John R W Robins and L A Pervin Eds pp114ndash158 Guilford Press New York NY USA 3rd edition 2008

[28] G Park H A Schwartz J C Eichstaedt et al ldquoAutomaticpersonality assessment through social media languagerdquo Journalof Personality and Social Psychology vol 108 no 6 pp 934ndash9522015

[29] C J Soto O P John S D Gosling and J Potter ldquoAge differencesin personality traits from 10 to 65 big five domains and facets ina large cross-sectional samplerdquo Journal of Personality and SocialPsychology vol 100 no 2 pp 330ndash348 2011

[30] C J Soto and O P John ldquoTen facet scales for the BigFive Inventory convergence with NEO PI-R facets self-peeragreement and discriminant validityrdquo Journal of Research inPersonality vol 43 no 1 pp 84ndash90 2009

[31] M V Kosti R Feldt and L Angelis ldquoPersonality emotionalintelligence and work preferences in software engineering anempirical studyrdquo Information and Software Technology vol 56no 8 pp 973ndash990 2014

[32] H A Schwartz J C Eichstaedt M L Kern et al ldquoPersonalitygender and age in the language of social media the open-vocabulary approachrdquo PLoS ONE vol 8 no 9 Article IDe73791 2013

[33] G Farnadi S ZoghbiM FMoens andMD Cock ldquoRecognis-ing personality traits using facebook status updatesrdquo in Proceed-ings of the Workshop on Computational Personality RecognitionJuly 2013 httpcliccimecunitnitfabiowcpr13farnadi wcpr-13pdf

[34] G Tsoumakas and I Katakis ldquoMulti-label classification anoverviewrdquo International Journal of Data Warehousing and Min-ing vol 3 no 3 pp 1ndash13 2007

[35] M Kosinski D Stillwell and T Graepel ldquoPrivate traits andattributes are predictable from digital records of human behav-iorrdquo Proceedings of the National Academy of Sciences of theUnited States of America vol 110 no 15 pp 5802ndash5805 2013

[36] M Belkin and P Niyogi ldquoLaplacian eigenmaps for dimension-ality reduction and data representationrdquo Neural Computationvol 15 no 6 pp 1373ndash1396 2003

[37] R Socher J Bauer C D Manning and Y N Andrew ldquoParsingwith compositional vector grammarsrdquo in Proceedings of the 51stAnnual Meeting of the Association for Computational Linguistics(ACL rsquo13) pp 455ndash465 Sofia Bulgaria August 2013

[38] K Kazama M Imada and K Kashiwagi ldquoCharacteristics ofinformation diffusion in blogs in relation to information sourcetyperdquo Neurocomputing vol 76 no 1 pp 84ndash92 2012

[39] M B David Y N Andrew and I J Michael ldquoLatent Dirichletallocationrdquo Journal ofMachine Learning Research vol 3 no 4-5pp 993ndash1022 2003

[40] P T Costa andR RMcCraeRevisedNEOPersonality Inventory(NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI) Profes-sional Manual Psychological Assessment Resources 1992

[41] R L Atkinson C A Richard E S Edward J B Daryland S Nolen-Hoeksema Hilgardrsquos Introduction to Psychology

14 Mathematical Problems in Engineering

Harcourt College Publishers Orlando Fla USA 13th edition2000

[42] B Verhoeven W Daelemans and T D Smedt ldquoEnsemblemethods for personality recognitionrdquo in Proceedings of theWorkshop on Computational Personality Recognition pp 35ndash38Boston Mass USA July 2013

[43] F Alam A Stepanov and G Riccardi ldquoPersonality traitsrecognition on social networkmdashfacebookrdquo in Proceedings of theWorkshop on Computational Personality Recognition pp 6ndash9Boston Mass USA July 2013

[44] M T Tomlinson D Hinote and D B Bracewell ldquoPredictingconscientiousness through semantic analysis of facebook postsrdquoin Proceedings of the Workshop on Computational PersonalityRecognition pp 31ndash34 July 2013

[45] D M Endres and J E Schindelin ldquoA newmetric for probabilitydistributionsrdquo IEEETransactions on InformationTheory vol 49no 7 pp 1858ndash1860 2003

[46] S Kullback and R A Leibler ldquoOn information and sufficiencyrdquoThe Annals of Mathematical Statistics vol 22 no 1 pp 79ndash861951

[47] A Genkin D D Lewis and D Madigan ldquoLarge-scale Bayesianlogistic regression for text categorizationrdquo Technometrics vol49 no 3 pp 291ndash304 2007

[48] A K McCallum and K Nigam ldquoA comparison of event modelsfor naive bayes text classificationrdquo in Proceedings of the AAAI-98 Workshop on Learning for Text Categorization pp 137ndash142Madison Wis USA July 1998

[49] I Ruczinski C Kooperberg and M LeBlanc ldquoLogic regres-sionrdquo Journal of Computational and Graphical Statistics vol 12no 3 pp 475ndash511 2003

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 13: Research Article A Novel Adaptive Conditional Probability ...downloads.hindawi.com/journals/mpe/2015/472917.pdf · Research Article A Novel Adaptive Conditional Probability-Based

Mathematical Problems in Engineering 13

[8] X Feng ldquoStudy on customer personality characteristics andrelationship outcomes using SEM analysisrdquo in Proceedings ofthe International Conference on Mechatronics and InformationTechnology pp 841ndash844 2013

[9] J G Lee SMoon andK Salamatian ldquoModeling and predictingthe popularity of online contents with Cox proportional hazardregression modelrdquo Neurocomputing vol 76 no 1 pp 134ndash1452012

[10] J W Stoughton L F Thompson and A W Meade ldquoBigfive personality traits reflected in job applicantsrsquo social mediapostingsrdquo Cyberpsychology Behavior and Social Networkingvol 16 no 11 pp 800ndash805 2013

[11] D A Cobb-Clark and S Schurer ldquoThe stability of big-fivepersonality traitsrdquo Economics Letters vol 115 no 1 pp 11ndash152012

[12] W Mischel ldquoToward an integrative science of the personrdquoAnnual Review of Psychology vol 55 pp 1ndash22 2004

[13] S Argamon S Dhawle M Koppel and J Pennebaker ldquoLexicalpredictors 19 of personality typerdquo in Proceedings of the JointAnnual Meeting of the Interface and the Classification Society ofNorth America 2005

[14] F Mairesse M AWalker M R Mehl and R K Moore ldquoUsinglinguistic cues for the automatic recognition of personality inconversation and textrdquo Journal of Artificial Intelligence Researchvol 30 pp 457ndash500 2007

[15] K Moore and J C McElroy ldquoThe influence of personalityon Facebook usage wall postings and regretrdquo Computers inHuman Behavior vol 28 no 1 pp 267ndash274 2012

[16] J W Pennebaker and L A King ldquoLinguistic styles languageuse as an individual differencerdquo Journal of Personality and SocialPsychology vol 77 no 6 pp 1296ndash1312 1999

[17] J Han and M Kamber Data Mining Concepts and TechniquesThe Morgan Kaufmann Series in Data Management Systemsedited by J Gray Morgan Kaufmann Boston Mass USA 3rdedition 2011

[18] J Golbeck C Robles and K Turner ldquoPredicting personalitywith social mediardquo in Proceedings of the 29th Annual CHIConference on Human Factors in Computing Systems (CHI rsquo11)pp 253ndash262 ACM May 2011

[19] J RQuinlan ldquoLearningwith continuous classesrdquo inProceedingsof the 5thAustralian Joint Conference onArtificial Intelligence (Airsquo92) pp 343ndash348 Hobart Australia November 1992

[20] S Roweis and Z Ghahramani ldquoA unifying review of lineargaussian modelsrdquo Neural Computation vol 11 no 2 pp 305ndash345 1999

[21] S Bai T Zhu and L Cheng ldquoBig-five personality pre-diction based on user behaviors at social network sitesrdquohttparxivorgpdf12044809v1pdf

[22] J Oberlander and S Nowson ldquoWhose thumb is it anywayclassifying author personality from weblog textrdquo in Proceedingsof the 44th Annual Meeting of the Association for ComputationalLinguistics pp 627ndash634 2006

[23] T Nguyen D Q Phung B Adams and S Venkatesh ldquoTowardsdiscovery of influence and personality traits through social linkpredictionrdquo in Proceedings of the International Conference onWeblogs and Social Media pp 566ndash569 Barcelona Spain July2011

[24] S T Bai B B Hao A Li S Yuan R Gao and T S ZhuldquoPredicting big five personality traits of microblog usersrdquo inProceedings of the 12th IEEEWICACM International JointConferences on Web Intelligence and Intelligent Agent Technolog(WI rsquo13) pp 501ndash508 November 2013

[25] R Sun and N Wilson ldquoA model of personality should be acognitive architecture itselfrdquo Cognitive Systems Research vol29-30 no 1 pp 1ndash30 2014

[26] A Ortigosa R M Carro and J I Quiroga ldquoPredicting userpersonality by mining social interactions in Facebookrdquo Journalof Computer and System Sciences vol 80 no 1 pp 57ndash71 2014

[27] O P John L P Naumann and C J Soto ldquoParadigm shift tothe integrative Big Five trait taxonomy history measurementand conceptualissuesrdquo in Handbook of Personality Theory andResearch O P John R W Robins and L A Pervin Eds pp114ndash158 Guilford Press New York NY USA 3rd edition 2008

[28] G Park H A Schwartz J C Eichstaedt et al ldquoAutomaticpersonality assessment through social media languagerdquo Journalof Personality and Social Psychology vol 108 no 6 pp 934ndash9522015

[29] C J Soto O P John S D Gosling and J Potter ldquoAge differencesin personality traits from 10 to 65 big five domains and facets ina large cross-sectional samplerdquo Journal of Personality and SocialPsychology vol 100 no 2 pp 330ndash348 2011

[30] C J Soto and O P John ldquoTen facet scales for the BigFive Inventory convergence with NEO PI-R facets self-peeragreement and discriminant validityrdquo Journal of Research inPersonality vol 43 no 1 pp 84ndash90 2009

[31] M V Kosti R Feldt and L Angelis ldquoPersonality emotionalintelligence and work preferences in software engineering anempirical studyrdquo Information and Software Technology vol 56no 8 pp 973ndash990 2014

[32] H A Schwartz J C Eichstaedt M L Kern et al ldquoPersonalitygender and age in the language of social media the open-vocabulary approachrdquo PLoS ONE vol 8 no 9 Article IDe73791 2013

[33] G Farnadi S ZoghbiM FMoens andMD Cock ldquoRecognis-ing personality traits using facebook status updatesrdquo in Proceed-ings of the Workshop on Computational Personality RecognitionJuly 2013 httpcliccimecunitnitfabiowcpr13farnadi wcpr-13pdf

[34] G Tsoumakas and I Katakis ldquoMulti-label classification anoverviewrdquo International Journal of Data Warehousing and Min-ing vol 3 no 3 pp 1ndash13 2007

[35] M Kosinski D Stillwell and T Graepel ldquoPrivate traits andattributes are predictable from digital records of human behav-iorrdquo Proceedings of the National Academy of Sciences of theUnited States of America vol 110 no 15 pp 5802ndash5805 2013

[36] M Belkin and P Niyogi ldquoLaplacian eigenmaps for dimension-ality reduction and data representationrdquo Neural Computationvol 15 no 6 pp 1373ndash1396 2003

[37] R Socher J Bauer C D Manning and Y N Andrew ldquoParsingwith compositional vector grammarsrdquo in Proceedings of the 51stAnnual Meeting of the Association for Computational Linguistics(ACL rsquo13) pp 455ndash465 Sofia Bulgaria August 2013

[38] K Kazama M Imada and K Kashiwagi ldquoCharacteristics ofinformation diffusion in blogs in relation to information sourcetyperdquo Neurocomputing vol 76 no 1 pp 84ndash92 2012

[39] M B David Y N Andrew and I J Michael ldquoLatent Dirichletallocationrdquo Journal ofMachine Learning Research vol 3 no 4-5pp 993ndash1022 2003

[40] P T Costa andR RMcCraeRevisedNEOPersonality Inventory(NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI) Profes-sional Manual Psychological Assessment Resources 1992

[41] R L Atkinson C A Richard E S Edward J B Daryland S Nolen-Hoeksema Hilgardrsquos Introduction to Psychology

14 Mathematical Problems in Engineering

Harcourt College Publishers Orlando Fla USA 13th edition2000

[42] B Verhoeven W Daelemans and T D Smedt ldquoEnsemblemethods for personality recognitionrdquo in Proceedings of theWorkshop on Computational Personality Recognition pp 35ndash38Boston Mass USA July 2013

[43] F Alam A Stepanov and G Riccardi ldquoPersonality traitsrecognition on social networkmdashfacebookrdquo in Proceedings of theWorkshop on Computational Personality Recognition pp 6ndash9Boston Mass USA July 2013

[44] M T Tomlinson D Hinote and D B Bracewell ldquoPredictingconscientiousness through semantic analysis of facebook postsrdquoin Proceedings of the Workshop on Computational PersonalityRecognition pp 31ndash34 July 2013

[45] D M Endres and J E Schindelin ldquoA newmetric for probabilitydistributionsrdquo IEEETransactions on InformationTheory vol 49no 7 pp 1858ndash1860 2003

[46] S Kullback and R A Leibler ldquoOn information and sufficiencyrdquoThe Annals of Mathematical Statistics vol 22 no 1 pp 79ndash861951

[47] A Genkin D D Lewis and D Madigan ldquoLarge-scale Bayesianlogistic regression for text categorizationrdquo Technometrics vol49 no 3 pp 291ndash304 2007

[48] A K McCallum and K Nigam ldquoA comparison of event modelsfor naive bayes text classificationrdquo in Proceedings of the AAAI-98 Workshop on Learning for Text Categorization pp 137ndash142Madison Wis USA July 1998

[49] I Ruczinski C Kooperberg and M LeBlanc ldquoLogic regres-sionrdquo Journal of Computational and Graphical Statistics vol 12no 3 pp 475ndash511 2003

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 14: Research Article A Novel Adaptive Conditional Probability ...downloads.hindawi.com/journals/mpe/2015/472917.pdf · Research Article A Novel Adaptive Conditional Probability-Based

14 Mathematical Problems in Engineering

Harcourt College Publishers Orlando Fla USA 13th edition2000

[42] B Verhoeven W Daelemans and T D Smedt ldquoEnsemblemethods for personality recognitionrdquo in Proceedings of theWorkshop on Computational Personality Recognition pp 35ndash38Boston Mass USA July 2013

[43] F Alam A Stepanov and G Riccardi ldquoPersonality traitsrecognition on social networkmdashfacebookrdquo in Proceedings of theWorkshop on Computational Personality Recognition pp 6ndash9Boston Mass USA July 2013

[44] M T Tomlinson D Hinote and D B Bracewell ldquoPredictingconscientiousness through semantic analysis of facebook postsrdquoin Proceedings of the Workshop on Computational PersonalityRecognition pp 31ndash34 July 2013

[45] D M Endres and J E Schindelin ldquoA newmetric for probabilitydistributionsrdquo IEEETransactions on InformationTheory vol 49no 7 pp 1858ndash1860 2003

[46] S Kullback and R A Leibler ldquoOn information and sufficiencyrdquoThe Annals of Mathematical Statistics vol 22 no 1 pp 79ndash861951

[47] A Genkin D D Lewis and D Madigan ldquoLarge-scale Bayesianlogistic regression for text categorizationrdquo Technometrics vol49 no 3 pp 291ndash304 2007

[48] A K McCallum and K Nigam ldquoA comparison of event modelsfor naive bayes text classificationrdquo in Proceedings of the AAAI-98 Workshop on Learning for Text Categorization pp 137ndash142Madison Wis USA July 1998

[49] I Ruczinski C Kooperberg and M LeBlanc ldquoLogic regres-sionrdquo Journal of Computational and Graphical Statistics vol 12no 3 pp 475ndash511 2003

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 15: Research Article A Novel Adaptive Conditional Probability ...downloads.hindawi.com/journals/mpe/2015/472917.pdf · Research Article A Novel Adaptive Conditional Probability-Based

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of