data mining (7)

Upload: jagadeshwar-reddy

Post on 09-Apr-2018

  • 8/8/2019 Data Mining (7)

    Data Mining

    Chris Nelson

    CS 157 A

    Fall 2007

  • Data Mining

    New buzzword, old idea.

    Inferring new information from already collected data.

    Traditionally the job of data analysts.

    Computers have changed this.

    It is far more efficient to comb through data using a machine than to eyeball statistical data.

  • Data Mining vs. Data Analysis

    In terms of software, and the marketing thereof:

    Data Mining != Data Analysis

    Data mining implies that the software applies some intelligence beyond simple grouping and partitioning of data in order to infer new information.

    Data analysis is more in line with standard statistical software (e.g., web stats). Such software usually presents information about subsets and relations within the recorded data set (e.g., browser/search engine usage, average visit time).

  • Data Mining Subtypes

    Data Dredging

    The process of scanning a data set for relations and then coming up with a hypothesis for the existence of those relations.

    Metadata

    Data that describes other data. It can describe an individual element or a collection of elements. Wikipedia example: In a library, where the data is the content of the titles stocked, metadata about a title would typically include a description of the content, the author, the publication date and the physical location.

    Applications for data dredging in business include market and risk analysis, as well as trading strategies.

    Applications in science include disaster prediction.

  • Propositional vs. Relational Data

    Older data mining methods relied on propositional data: data related to a single, central element that could be represented in vector format (e.g., the purchasing history of a single user; Amazon uses such vectors in its related-item suggestions [a multidimensional dot product]).

    Current, more advanced data mining methods rely on relational data: data that can easily be stored and modeled using relational databases. An example is data used to represent interpersonal relations.

    Relational data is more interesting than propositional data to miners because an entity, together with all the entities to which it is related, factors into the inference process.
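A minimal sketch of the propositional idea, assuming each user's purchase history is a 0/1 vector over a fixed item list: the other user whose vector has the highest dot product with ours supplies the suggestions. The item list, the users, and the `suggest` helper are hypothetical, and this is not Amazon's actual recommendation algorithm.

```python
# Hypothetical catalog; position i in every vector refers to ITEMS[i].
ITEMS = ["book", "lamp", "pen", "tent", "stove"]

# Hypothetical purchase histories: one propositional vector per user (1 = bought).
HISTORY = {
    "alice": [1, 0, 1, 1, 0],
    "bob": [1, 0, 1, 1, 1],
    "carol": [0, 1, 0, 0, 0],
}


def dot(u, v):
    """Multidimensional dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def suggest(user, history=HISTORY):
    """Suggest items that the most similar other user bought and `user` has not."""
    mine = history[user]
    peer = max((u for u in history if u != user), key=lambda u: dot(mine, history[u]))
    return [item for item, have, theirs in zip(ITEMS, mine, history[peer])
            if theirs and not have]


print(suggest("alice"))  # ['stove']
```

A raw dot product favors users who simply buy a lot; dividing by the vectors' lengths (cosine similarity) is a common refinement.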

  • Key Component of Data Mining

    Whether for knowledge discovery or knowledge prediction, data mining takes information that was once quite difficult to detect and presents it in an easily understandable format (e.g., graphical or statistical).

    Data mining techniques involve sophisticated algorithms, including decision tree classification, association detection, and clustering.

    Since data mining is not on the test, I will keep things superficial.
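Of the techniques listed, clustering is the easiest to show in a few lines. The sketch below is plain k-means (Lloyd's algorithm) on one-dimensional data, assuming k = 2 and hand-picked starting centers; the visit-time numbers and the `kmeans_1d` name are hypothetical.

```python
def kmeans_1d(points, centers, iterations=10):
    """Plain k-means (Lloyd's algorithm) on 1-D data, starting from the given centers."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to its cluster's mean (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical web-stats data: minutes spent per site visit.
visit_minutes = [1, 2, 2, 3, 30, 32, 35]
centers, clusters = kmeans_1d(visit_minutes, centers=[0.0, 10.0])
print(clusters)  # [[1, 2, 2, 3], [30, 32, 35]]
```

The two clusters separate short visits from long ones, with centers at 2.0 and about 32.3 minutes. The result depends on the starting centers, which is why k-means is usually run from several initializations.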

  • Uses of Data Mining

    AI/Machine Learning

    Combinatorial/Game Data Mining: Good for analyzing winning strategies for games, and thus for developing intelligent AI opponents (e.g., chess).

    Business Strategies

    Market Basket Analysis: Identify customer demographics, preferences, and purchasing patterns.

    Risk Analysis

    Product Defect Analysis: Analyze product defect rates for given plants and predict possible complications (read: lawsuits) down the line.
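Market basket analysis is often expressed as association rules such as "customers who buy diapers also buy beer." A minimal sketch of the two standard rule measures, support and confidence, over hypothetical baskets (the items and helper names are illustrative only):

```python
# Hypothetical checkout baskets, one set of items per transaction.
BASKETS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]


def count(itemset, baskets=BASKETS):
    """Number of baskets that contain every item in itemset (subset test)."""
    return sum(itemset <= basket for basket in baskets)


def support(itemset, baskets=BASKETS):
    """Fraction of all baskets that contain the itemset."""
    return count(itemset, baskets) / len(baskets)


def confidence(antecedent, consequent, baskets=BASKETS):
    """Fraction of baskets containing `antecedent` that also contain `consequent`
    (raises ZeroDivisionError if `antecedent` never occurs)."""
    return count(antecedent | consequent, baskets) / count(antecedent, baskets)


print(support({"diapers", "beer"}))       # 0.6
print(confidence({"diapers"}, {"beer"}))  # 0.75
```

Here 3 of the 5 baskets contain both diapers and beer, and 3 of the 4 baskets with diapers also contain beer, so the rule diapers → beer has support 0.6 and confidence 0.75.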

  • Uses of Data Mining (Continued)

    User Behavior Validation

    Fraud Detection

    In the realm of cell phones, comparing phone activity to calling records can help detect calls made on cloned phones.

    Similarly, with credit cards, comparing purchases with historical purchases can detect activity on stolen cards.
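The credit card comparison can be sketched as a simple outlier test: flag a purchase whose amount is far from the card's historical mean, measured in standard deviations. The 3-standard-deviation cutoff, the sample amounts, and `is_suspicious` are assumptions for illustration; real fraud systems weigh many more signals (location, merchant, timing).

```python
from statistics import mean, stdev


def is_suspicious(history, amount, z_limit=3.0):
    """Flag `amount` if it lies more than z_limit sample standard deviations
    from the mean of past purchases (history needs at least two amounts)."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        # Every past purchase was identical: anything different stands out.
        return amount != mu
    return abs(amount - mu) / sigma > z_limit


# Hypothetical past purchase amounts on one card.
past = [12.50, 40.00, 23.75, 18.20, 35.10, 27.90]
print(is_suspicious(past, 31.00))   # False
print(is_suspicious(past, 950.00))  # True
```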

  • Uses of Data Mining (Continued)

    Health and Science

    Protein Folding: Predicting protein interactions and functionality within biological cells. Applications of this research include determining causes and possible cures for Alzheimer's, Parkinson's, and some cancers (caused by protein "misfolds").

    Extra-Terrestrial Intelligence: Scanning radio telescope data for possible transmissions from other planets.

    For more information, see Stanford's Folding@home and Berkeley's SETI@home projects. Both involve participation in a widely distributed computing application.

  • Sources of Data for Mining

    Databases (most obvious)

    Text Documents

    Computer Simulations

    Social Networks

  • Privacy Concerns

    Mining of public and government databases is done, though people have raised, and continue to raise, concerns.

    Wikipedia quote: "data mining gives information that would not be available otherwise. It must be properly interpreted to be useful. When the data collected involves individual people, there are many questions concerning privacy, legality, and ethics."

  • Prevalence of Data Mining

    Your data is already being mined, whether you like it or not.

    Many web services require that you allow access to your information [for data mining] in order to use the service.

    Google mines email data in Gmail accounts to present account owners with ads.

    Facebook requires users to allow access to info from non-Facebook pages. Facebook privacy policy: "We may use information about you that we collect from other sources, including but not limited to newspapers and Internet sources such as blogs, instant messaging services and other users of Facebook, to supplement your profile."

    This allows access to your blog RSS feed (rather innocuous), as well as information obtained through partner sites (worthy of concern).

  • Data Mining Controversies

    Latest one: Facebook's Beacon advertising program (just popped up on Slashdot within the last week).

    What Beacon does:

    "when you engage in consumer activity at a [Facebook] partner website, such as Amazon, eBay, or the New York Times, not only will Facebook record that activity, but your Facebook connections will also be informed of your purchases or actions."

    [taken from http://trickytrickywhiteboy.blogspot.com/2007/11/beware-of-facebooks-beacon.html]

  • Controversies Continued

    Implications: "Thus where Facebook used to be collecting data only within the confines of its own website, it will now extend that ability to harvest data across other websites that it partners with. Some of the companies that have signed on to participate on the advertising side include Coca-Cola, Sony, Verizon, Comcast, Ebay and the CBC. The initial list of 44 partner websites participating on the data collection side include the New York Times, Blockbuster, Amazon, eBay, LiveJournal, and Epicurious." [Remember the Facebook privacy policy quoted earlier.]

    The verdict is still out. This may violate an old (100+ years) New York law prohibiting advertising that uses endorsements without the endorser's consent.

    Facebook currently offers users no way to opt out of Beacon (once it has been activated?). Users can close their accounts, but account data is never deleted.

  • Bottom Line

    Data obtained through data mining is incredibly valuable.

    Companies are understandably reluctant to give up data they have obtained.

    Expect the prevalence of data mining, and of (possibly subversive) mining methods, to increase in years to come.

  • Recommended Resources and Works Consulted

    Wikipedia Data Mining entry
    http://en.wikipedia.org/wiki/Data_mining

    "Privacy is Dead - Get Over It: Revisited," Steve Rambam's Hope Number Six lecture
    http://www.hopenumbersix.net/speakers.html#pid2

    Facebook's Faux Pas
    http://www.newsweek.com/id/69275

    Beware of Facebook's Beacon
    http://trickytrickywhiteboy.blogspot.com/2007/11/beware-of-facebooks-beacon.html

    Facebook Data Mining guide
    http://saunderslog.com/2007/11/25/facebook-market-research-secrets/

    Data Mining in Social Networks
    http://kdl.cs.umass.edu/papers/jensen-neville-nas2002.pdf