![Page 1: Ubiquitous Data Mining Dr. Susanna Pirttikangas Intelligent Systems Group (ISG) Dept. Electrical and Information Engineering University of Oulu Finland](https://reader036.vdocument.in/reader036/viewer/2022062423/5697bff81a28abf838cbf561/html5/thumbnails/1.jpg)
Ubiquitous Data Mining
Dr. Susanna Pirttikangas
Intelligent Systems Group (ISG)
Dept. Electrical and Information Engineering
University of Oulu
Finland
![Page 2: Ubiquitous Data Mining Dr. Susanna Pirttikangas Intelligent Systems Group (ISG) Dept. Electrical and Information Engineering University of Oulu Finland](https://reader036.vdocument.in/reader036/viewer/2022062423/5697bff81a28abf838cbf561/html5/thumbnails/2.jpg)
Outline
• Data mining, Ubiquitous computing – Ubiquitous Data Mining
• Test Planning in UDM
• Online Data Streams
• Pattern Recognition
• Visualization
• Tools
• Conclusions and Future directions
![Page 3: Ubiquitous Data Mining Dr. Susanna Pirttikangas Intelligent Systems Group (ISG) Dept. Electrical and Information Engineering University of Oulu Finland](https://reader036.vdocument.in/reader036/viewer/2022062423/5697bff81a28abf838cbf561/html5/thumbnails/3.jpg)
Data Mining
Scianta Intelligence: “Data Mining, also called Knowledge Discovery, is a general term for a variety of interlocking technologies that, used together, find, isolate, and quantify patterns hidden in large and often disparate collections of data. As a general knowledge extraction process, its primary goal is the discovery of nontrivial and potentially valuable hidden in local files, databases, and in repositories scattered across distributed networks.”
![Page 4: Ubiquitous Data Mining Dr. Susanna Pirttikangas Intelligent Systems Group (ISG) Dept. Electrical and Information Engineering University of Oulu Finland](https://reader036.vdocument.in/reader036/viewer/2022062423/5697bff81a28abf838cbf561/html5/thumbnails/4.jpg)
Ubiquitous Computing
![Page 5: Ubiquitous Data Mining Dr. Susanna Pirttikangas Intelligent Systems Group (ISG) Dept. Electrical and Information Engineering University of Oulu Finland](https://reader036.vdocument.in/reader036/viewer/2022062423/5697bff81a28abf838cbf561/html5/thumbnails/5.jpg)
Ubiquitous Computing
People, Places, Networks, Services, Other machines, etc.
Improving human-machine interaction, providing the right information in the right situation
![Page 6: Ubiquitous Data Mining Dr. Susanna Pirttikangas Intelligent Systems Group (ISG) Dept. Electrical and Information Engineering University of Oulu Finland](https://reader036.vdocument.in/reader036/viewer/2022062423/5697bff81a28abf838cbf561/html5/thumbnails/6.jpg)
From Henry Tirri’s Presentation at PerCom 2007
• What sort of raw (context) data management problem are we facing at Nokia?
  – A multidimensional (2-30) vector of real values
    • Frequency: from 0.5 s to 1 day
    • Typically “always-on”
  – A 1-4M pixel image
    • Frequency: from 10 min to week(s)
    • Very irregular, high intensity bursts (many images within minutes)
  – A 100K-1M sound file
    • Frequency: from 1 min to days
    • Irregular; streaming
  – Naturally many application domains require a mixture of these
10K phones – vector every 2 min results in 2.7 billion vectors/year
200M phones – vector every 60 min results in about 10^12 vectors/year
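The two volume estimates can be checked with a back-of-the-envelope calculation. The sketch below assumes a 365-day year and exactly one vector per phone per interval; the function name is hypothetical. Under those assumptions it gives roughly 2.6 billion and 1.8 × 10^12 vectors per year, the same order of magnitude as the figures quoted on the slide.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap year


def vectors_per_year(phones: int, interval_seconds: int) -> int:
    """Context vectors produced per year by a fleet of phones,
    assuming one vector per phone per interval."""
    return phones * (SECONDS_PER_YEAR // interval_seconds)


small_fleet = vectors_per_year(10_000, 2 * 60)         # vector every 2 min
large_fleet = vectors_per_year(200_000_000, 60 * 60)   # vector every 60 min

print(f"10K phones:  {small_fleet:.2e} vectors/year")
print(f"200M phones: {large_fleet:.2e} vectors/year")
```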
![Page 7: Ubiquitous Data Mining Dr. Susanna Pirttikangas Intelligent Systems Group (ISG) Dept. Electrical and Information Engineering University of Oulu Finland](https://reader036.vdocument.in/reader036/viewer/2022062423/5697bff81a28abf838cbf561/html5/thumbnails/7.jpg)
Association Rule Algorithm: Apriori
H. Mannila et al.
     Cigarettes  Diapers  Beer  Noodles  Juice
T1   1           0        1     1        0
T2   0           1        1     0        1
T3   1           1        1     0        1
T4   0           0        1     1        0
T5   1           0        0     0        0
T6   1           0        0     0        1
“A customer who buys beer and sausages will also buy diapers with a probability of 0.85.”
Whenever a transaction T contains X, then T probably also contains Y.
$$S(X) = \frac{|\{T \in D \mid X \subseteq T\}|}{|D|}$$
$$C(X \Rightarrow Y) = \frac{S(X \cup Y)}{S(X)}$$
$$S(X \Rightarrow Y) = S(X \cup Y)$$
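As an illustration, support (the fraction of transactions in D that contain an itemset) and confidence (the support of X and Y together divided by the support of X) can be computed directly from the six example transactions. This is a minimal sketch of the two measures only, not of the Apriori candidate-generation procedure. The assignment of items to table columns is an assumption, and the function names are hypothetical.

```python
from fractions import Fraction

# Example transactions T1..T6; the item-to-column mapping is an assumption.
D = {
    "T1": {"Cigarettes", "Beer", "Noodles"},
    "T2": {"Diapers", "Beer", "Juice"},
    "T3": {"Cigarettes", "Diapers", "Beer", "Juice"},
    "T4": {"Beer", "Noodles"},
    "T5": {"Cigarettes"},
    "T6": {"Cigarettes", "Juice"},
}


def support(itemset, transactions=D):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return Fraction(hits, len(transactions))


def confidence(x, y, transactions=D):
    """Support of X and Y together, divided by the support of X."""
    return support(x | y, transactions) / support(x, transactions)


print(support({"Beer"}))                  # support of {Beer}
print(support({"Diapers", "Beer"}))       # support of the rule Diapers => Beer
print(confidence({"Diapers"}, {"Beer"}))  # confidence of Diapers => Beer
print(confidence({"Beer"}, {"Diapers"}))  # confidence of Beer => Diapers
```

Using `Fraction` keeps the ratios exact, which makes it easy to compare rules without floating-point noise.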
![Page 8: Ubiquitous Data Mining Dr. Susanna Pirttikangas Intelligent Systems Group (ISG) Dept. Electrical and Information Engineering University of Oulu Finland](https://reader036.vdocument.in/reader036/viewer/2022062423/5697bff81a28abf838cbf561/html5/thumbnails/8.jpg)
From transactions to continuous flow of data

Time  Location1LivingRoom  Location1Office
1     1                    0
2     1                    0
3     1                    0
4     1                    0

[Table continues with further columns: Walking (Activity Rec); TV on, RemoteUsed (Artefact Usage). Source of the location columns: Locationing system?]
![Page 9: Ubiquitous Data Mining Dr. Susanna Pirttikangas Intelligent Systems Group (ISG) Dept. Electrical and Information Engineering University of Oulu Finland](https://reader036.vdocument.in/reader036/viewer/2022062423/5697bff81a28abf838cbf561/html5/thumbnails/9.jpg)
Nakamura et al: Mana 2007 (SWDMNSS)
Real World Oriented Application
Query Processing Database
Recognition
Synchronous Control
Sensor Cycle Set
Sensors
Mana
![Page 10: Ubiquitous Data Mining Dr. Susanna Pirttikangas Intelligent Systems Group (ISG) Dept. Electrical and Information Engineering University of Oulu Finland](https://reader036.vdocument.in/reader036/viewer/2022062423/5697bff81a28abf838cbf561/html5/thumbnails/10.jpg)
Ubiquitous Data Mining (1/2)
• Performing analysis of data in mobile, embedded and ubiquitous devices.
  – Communication; network characteristics
  – Computation; intensive
  – Changes over time
  – Archiving
  – Energy consumption of mobile devices or sensors
  – Memory requirements
  – Result accuracy, data loss
  – Transferring and presenting results for the user
  – Security; sharing, privacy
![Page 11: Ubiquitous Data Mining Dr. Susanna Pirttikangas Intelligent Systems Group (ISG) Dept. Electrical and Information Engineering University of Oulu Finland](https://reader036.vdocument.in/reader036/viewer/2022062423/5697bff81a28abf838cbf561/html5/thumbnails/11.jpg)
Test Planning in UDM
• User scenarios
  – What do we do with all the devices?
  – What devices do we utilize?
• Sampling frequency
  – The equipment sets restrictions
• What to collect?
• How much to collect?
  – Pattern recognition
![Page 12: Ubiquitous Data Mining Dr. Susanna Pirttikangas Intelligent Systems Group (ISG) Dept. Electrical and Information Engineering University of Oulu Finland](https://reader036.vdocument.in/reader036/viewer/2022062423/5697bff81a28abf838cbf561/html5/thumbnails/12.jpg)
Online Data Streams: Segmentation Problem
Clear starting and ending point for an event
![Page 13: Ubiquitous Data Mining Dr. Susanna Pirttikangas Intelligent Systems Group (ISG) Dept. Electrical and Information Engineering University of Oulu Finland](https://reader036.vdocument.in/reader036/viewer/2022062423/5697bff81a28abf838cbf561/html5/thumbnails/13.jpg)
Thresholding + SSMM
• OFFLINE: First, a piecewise linear approximation of an example footstep pattern is constructed
• ONLINE: When a sudden increase in the energy of the EMFi signal is detected, the pattern matching begins
• A Viterbi-like algorithm is used to detect the occurrences of patterns similar (or similar enough) to the created footstep model
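The online trigger described above can be sketched as a comparison of short-term signal energy between two adjacent windows. This covers only the thresholding step that decides when pattern matching should start; the SSMM matching and the Viterbi-like search are not shown. The function name, the window length and the energy ratio are hypothetical choices, not values from the slides.

```python
def detect_onsets(signal, window=4, ratio=4.0):
    """Return indices of samples at which the energy of the most recent
    `window` samples first exceeds `ratio` times the energy of the
    `window` samples before them."""
    onsets = []
    armed = True  # avoid reporting the same burst repeatedly
    for end in range(2 * window, len(signal) + 1):
        previous = sum(x * x for x in signal[end - 2 * window:end - window])
        current = sum(x * x for x in signal[end - window:end])
        jump = current > ratio * max(previous, 1e-9)
        if jump and armed:
            onsets.append(end - 1)  # newest sample when the jump is first seen
            armed = False
        elif not jump:
            armed = True
    return onsets


# Quiet floor, a burst of activity starting at sample 20, quiet again.
example = [0.1] * 20 + [1.0] * 10 + [0.1] * 20
print(detect_onsets(example))
```

A ratio test was chosen here instead of a fixed absolute threshold so the trigger adapts to the background level of the sensor; either variant fits the description on the slide.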
![Page 14: Ubiquitous Data Mining Dr. Susanna Pirttikangas Intelligent Systems Group (ISG) Dept. Electrical and Information Engineering University of Oulu Finland](https://reader036.vdocument.in/reader036/viewer/2022062423/5697bff81a28abf838cbf561/html5/thumbnails/14.jpg)
(Body) Sensor Network, Activity Recognition and Artefact User Identification
![Page 15: Ubiquitous Data Mining Dr. Susanna Pirttikangas Intelligent Systems Group (ISG) Dept. Electrical and Information Engineering University of Oulu Finland](https://reader036.vdocument.in/reader036/viewer/2022062423/5697bff81a28abf838cbf561/html5/thumbnails/15.jpg)
Pattern recognition
Theodoridis and Koutroumbas (1999):”Pattern recognition is the scientific discipline whose goal is the classification of objects into a number of categories or classes. Depending on the application, these objects can be images or signal waveforms or any type of measurements that need to be classified.”
input → sensing → segmentation → feature extraction → classification → post-processing → decision
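The stages of the recognition chain can be made concrete with a toy pipeline. The sketch below is only an illustration under stated assumptions: fixed-length windows for segmentation, mean and variance as features, a nearest-centroid classifier, and majority voting as post-processing. None of these specific choices comes from the slides, and all function names are hypothetical.

```python
from collections import Counter
from statistics import mean, pvariance


def segment(samples, size):
    """Segmentation: cut the sensed stream into non-overlapping windows."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def extract_features(window):
    """Feature extraction: describe a window by its mean and variance."""
    return (mean(window), pvariance(window))


def classify(features, centroids):
    """Classification: label of the nearest class centroid."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist2(features, centroids[label]))


def smooth(labels, k=3):
    """Post-processing: majority vote over a sliding window of k labels."""
    half = k // 2
    out = []
    for i in range(len(labels)):
        neighbourhood = labels[max(0, i - half):i + half + 1]
        out.append(Counter(neighbourhood).most_common(1)[0][0])
    return out


def recognise(samples, centroids, size=4):
    """Run the whole chain and return one decision per window."""
    raw = [classify(extract_features(w), centroids) for w in segment(samples, size)]
    return smooth(raw)


centroids = {"still": (0.0, 0.0), "moving": (0.0, 1.0)}  # hypothetical classes
stream = [0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 1, -1, 1, -1, 1, -1]
print(recognise(stream, centroids))
```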
![Page 16: Ubiquitous Data Mining Dr. Susanna Pirttikangas Intelligent Systems Group (ISG) Dept. Electrical and Information Engineering University of Oulu Finland](https://reader036.vdocument.in/reader036/viewer/2022062423/5697bff81a28abf838cbf561/html5/thumbnails/16.jpg)
Data collection
• Collect in a natural environment?
  – Requires direct observation by the researchers
  – Is expensive and impossible for larger populations
  – The diaries will include errors
  – The testees need to report their activities
  – The testees will forget to write activities down
• MIT experience sampling method: requires interruptions
  – Some activities do not occur on a daily basis
• Ask the testees to do the activities
• Semi-naturalistic data collection, Intille et al., MIT (2004)
  – The activities are disguised as goals in an obstacle course to minimize the testees’ awareness of data collection
![Page 17: Ubiquitous Data Mining Dr. Susanna Pirttikangas Intelligent Systems Group (ISG) Dept. Electrical and Information Engineering University of Oulu Finland](https://reader036.vdocument.in/reader036/viewer/2022062423/5697bff81a28abf838cbf561/html5/thumbnails/17.jpg)
Data Collection Tools
• The testee can determine
  – when to collect and
  – where to collect
• The testee can detect if something went wrong (connection lost)
• No need to carry a mobile device in the hand
• Sound alerts for failure
![Page 18: Ubiquitous Data Mining Dr. Susanna Pirttikangas Intelligent Systems Group (ISG) Dept. Electrical and Information Engineering University of Oulu Finland](https://reader036.vdocument.in/reader036/viewer/2022062423/5697bff81a28abf838cbf561/html5/thumbnails/18.jpg)
Activity recognition
• clean whiteboard
• read a newspaper
• stand still
• sit and relax
• sit and watch TV
• drink
• brush teeth
• lie down
• vacuum clean
• type
• walk
• climb stairs
• descend stairs
• elevator up
• elevator down
• run
• cycle
![Page 19: Ubiquitous Data Mining Dr. Susanna Pirttikangas Intelligent Systems Group (ISG) Dept. Electrical and Information Engineering University of Oulu Finland](https://reader036.vdocument.in/reader036/viewer/2022062423/5697bff81a28abf838cbf561/html5/thumbnails/19.jpg)
Feature Extraction and Selection
• Know what you are dealing with
• Between classes
  – What are the discriminative attributes for different classes
  – What are the common attributes for the same class
• With many features: “curse of dimensionality”
• If too few features -> not enough information to describe the phenomena
• In a very complex situation, calculate many features
  – Feature selection
• Subset selection
  – branch-and-bound
  – forward search
  – backward search
• Feasible to utilize a simple and light algorithm (kNN)
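Forward search combined with a light classifier can be sketched as follows. The example greedily adds the feature that most improves the leave-one-out accuracy of a 1-nearest-neighbour classifier and stops when no feature helps. The scoring criterion, the stopping rule, the function names and the toy data are all assumptions made for illustration.

```python
def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier that
    only looks at the feature indices in `features`."""
    correct = 0
    for i, xi in enumerate(X):
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((xi[f] - X[j][f]) ** 2 for f in features),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)


def forward_search(X, y, max_features):
    """Greedy forward selection: add one feature at a time while the
    leave-one-out accuracy keeps improving."""
    selected, best = [], 0.0
    remaining = set(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scored = [(loo_accuracy(X, y, selected + [f]), f) for f in sorted(remaining)]
        score, feature = max(scored, key=lambda pair: pair[0])
        if score <= best:
            break  # no remaining feature improves the result
        selected.append(feature)
        remaining.remove(feature)
        best = score
    return selected, best


# Toy data: feature 0 separates the classes, feature 1 is misleading.
X = [(0.0, 5.0), (0.1, 1.0), (0.2, 9.0), (1.0, 5.1), (1.1, 1.1), (1.2, 9.1)]
y = [0, 0, 0, 1, 1, 1]
print(forward_search(X, y, max_features=2))
```

Backward search follows the same pattern but starts from the full feature set and removes one feature per round.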
![Page 20: Ubiquitous Data Mining Dr. Susanna Pirttikangas Intelligent Systems Group (ISG) Dept. Electrical and Information Engineering University of Oulu Finland](https://reader036.vdocument.in/reader036/viewer/2022062423/5697bff81a28abf838cbf561/html5/thumbnails/20.jpg)
Location Data, Visualization
<logentry>
  <header>
    <date>30-09-2003T14:29:44</date>
    <module>
      <name></name>
      <version></version>
    </module>
    <session>
      <id>216</id>
      <username>seppo</username>
    </session>
  </header>
  <body>
    <userAttributeChangeEvent>
      <location>
        <longitude>25.468917078116988</longitude>
        <latitude>65.0110523987453</latitude>
        <altitude>0.0</altitude>
        <floor>0</floor>
      </location>
    </userAttributeChangeEvent>
  </body>
</logentry>
![Page 21: Ubiquitous Data Mining Dr. Susanna Pirttikangas Intelligent Systems Group (ISG) Dept. Electrical and Information Engineering University of Oulu Finland](https://reader036.vdocument.in/reader036/viewer/2022062423/5697bff81a28abf838cbf561/html5/thumbnails/21.jpg)
Rotuaari: Location data
• The following data was collected from the 1st field test
  – 28.8-30.9.2003, ~200 users, log file size 14.7 MB (763367 lines)
  – 18 shops created mobile ads
<logentry>
  <header>
    <date>30-09-2003T14:29:44</date>
    <module>
      <name></name>
      <version></version>
    </module>
    <session>
      <id>216</id>
      <username>seppo</username>
    </session>
  </header>
  <body>
    <userAttributeChangeEvent>
      <location>
        <longitude>25.468917078116988</longitude>
        <latitude>65.0110523987453</latitude>
        <altitude>0.0</altitude>
        <floor>0</floor>
      </location>
    </userAttributeChangeEvent>
  </body>
</logentry>
…
<logentry>
  <header>
    <date>30-09-2003T14:22:32</date>
    <module>
      <name></name>
      <version></version>
    </module>
    <session>
      <id>216</id>
      <username>seppo</username>
    </session>
  </header>
  <body>
    <userAttributeChangeEvent>
      <flyer_received> 1061904953746 </flyer_received>
    </userAttributeChangeEvent>
  </body>
</logentry>
…
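Log entries of this shape can be turned into location records with the Python standard library. The sketch below assumes that each entry is available as a separate XML string and that the date is written as day-month-year followed by `T` and the time, as in the sample; the function name and the output field names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# One location entry, copied from the sample log.
SAMPLE = (
    "<logentry><header><date>30-09-2003T14:29:44</date>"
    "<module><name></name><version></version></module>"
    "<session><id>216</id><username>seppo</username></session></header>"
    "<body><userAttributeChangeEvent><location>"
    "<longitude>25.468917078116988</longitude>"
    "<latitude>65.0110523987453</latitude>"
    "<altitude>0.0</altitude><floor>0</floor>"
    "</location></userAttributeChangeEvent></body></logentry>"
)


def parse_location(entry_xml):
    """Return a dict for a location entry, or None for other event types
    (for example flyer_received)."""
    root = ET.fromstring(entry_xml)
    location = root.find("./body/userAttributeChangeEvent/location")
    if location is None:
        return None
    return {
        # Assumption: dates are day-month-year, as the sample suggests.
        "time": datetime.strptime(root.findtext("./header/date"), "%d-%m-%YT%H:%M:%S"),
        "user": root.findtext("./header/session/username"),
        "longitude": float(location.findtext("longitude")),
        "latitude": float(location.findtext("latitude")),
        "floor": int(location.findtext("floor")),
    }


record = parse_location(SAMPLE)
print(record["user"], record["time"], record["latitude"], record["longitude"])
```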
![Page 22: Ubiquitous Data Mining Dr. Susanna Pirttikangas Intelligent Systems Group (ISG) Dept. Electrical and Information Engineering University of Oulu Finland](https://reader036.vdocument.in/reader036/viewer/2022062423/5697bff81a28abf838cbf561/html5/thumbnails/22.jpg)
Phases of Data Visualization
[Figure: diagram with the labels Raw Data, Load Subset, Loaded Data, Active Operation, Active Data, Execute, Show in UI, Bound]
![Page 23: Ubiquitous Data Mining Dr. Susanna Pirttikangas Intelligent Systems Group (ISG) Dept. Electrical and Information Engineering University of Oulu Finland](https://reader036.vdocument.in/reader036/viewer/2022062423/5697bff81a28abf838cbf561/html5/thumbnails/23.jpg)
The number of location measurements inside a cell is represented by a color
• 3077 measurements were made inside the most crowded cell
• The user studies the range [1, 100]: 100 measurements give the maximum color (red)
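The cell coloring can be sketched in two steps: count the measurements that fall into each grid cell, then map each count onto the range the user is studying, saturating at the upper bound. The square grid, the linear mapping and the function names below are assumptions; the slides do not state how the color scale is computed.

```python
from collections import Counter


def cell_counts(points, cell_size):
    """Count measurements per square grid cell; points are (x, y) pairs."""
    return Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)


def intensity(count, low=1, high=100):
    """Map a count to [0.0, 1.0]. Counts at or above `high` saturate at
    1.0 (the maximum color); counts below `low` are not drawn."""
    if count < low:
        return None
    return (min(count, high) - low) / (high - low)


counts = cell_counts([(0.5, 0.5), (0.9, 0.1), (1.5, 0.2)], cell_size=1.0)
print(counts)
print(intensity(1), intensity(100), intensity(3077))
```

With the range [1, 100], the most crowded cell (3077 measurements) gets the same color as any cell with 100 measurements, which is why the choice of range matters for what the user can see.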
![Page 24: Ubiquitous Data Mining Dr. Susanna Pirttikangas Intelligent Systems Group (ISG) Dept. Electrical and Information Engineering University of Oulu Finland](https://reader036.vdocument.in/reader036/viewer/2022062423/5697bff81a28abf838cbf561/html5/thumbnails/24.jpg)
Examples for processing 3D-acceleration
![Page 25: Ubiquitous Data Mining Dr. Susanna Pirttikangas Intelligent Systems Group (ISG) Dept. Electrical and Information Engineering University of Oulu Finland](https://reader036.vdocument.in/reader036/viewer/2022062423/5697bff81a28abf838cbf561/html5/thumbnails/25.jpg)
Distinguishing a Robot from a Human, User Identification (1/4)
![Page 26: Ubiquitous Data Mining Dr. Susanna Pirttikangas Intelligent Systems Group (ISG) Dept. Electrical and Information Engineering University of Oulu Finland](https://reader036.vdocument.in/reader036/viewer/2022062423/5697bff81a28abf838cbf561/html5/thumbnails/26.jpg)
Distinguishing a Robot from a Human, User Identification (2/4)
• Construct templates for different actors in the environment
Human Robot
s1 s2 s3 s4 s5
• Pattern matching (segmentation) using piecewise linear model and SSMM method
![Page 27: Ubiquitous Data Mining Dr. Susanna Pirttikangas Intelligent Systems Group (ISG) Dept. Electrical and Information Engineering University of Oulu Finland](https://reader036.vdocument.in/reader036/viewer/2022062423/5697bff81a28abf838cbf561/html5/thumbnails/27.jpg)
Distinguishing a Robot from a Human, User Identification (3/4)
• Decide which actor is moving in the environment
Trained Classifier
Robot
Human?
• If human, perform user identification
![Page 28: Ubiquitous Data Mining Dr. Susanna Pirttikangas Intelligent Systems Group (ISG) Dept. Electrical and Information Engineering University of Oulu Finland](https://reader036.vdocument.in/reader036/viewer/2022062423/5697bff81a28abf838cbf561/html5/thumbnails/28.jpg)
User Identification (4/4)
• Calculate the distinguishing features
• Identify
![Page 29: Ubiquitous Data Mining Dr. Susanna Pirttikangas Intelligent Systems Group (ISG) Dept. Electrical and Information Engineering University of Oulu Finland](https://reader036.vdocument.in/reader036/viewer/2022062423/5697bff81a28abf838cbf561/html5/thumbnails/29.jpg)
After Finding the Interesting Information
• Choose the best model
  – evaluation, train and test
• Representation of Information?
• Personalize
  – the user sets all the preferences, the user is shown the updated context and is allowed to choose the actions, or the application actively changes its functionality based on context
  – Predict
• Implement
• Issues
  – Confidence of the recognition
  – Visualization of the situation
  – Let the user teach the device or the environment
![Page 30: Ubiquitous Data Mining Dr. Susanna Pirttikangas Intelligent Systems Group (ISG) Dept. Electrical and Information Engineering University of Oulu Finland](https://reader036.vdocument.in/reader036/viewer/2022062423/5697bff81a28abf838cbf561/html5/thumbnails/30.jpg)
Data Refinement for Data Reserves
• Novel methodology for signal synchronization, fusion, feature selection/dimensionality reduction, and preprocessing of online data streams (available data defined in the introduction).
• Common denominators for different situations in the data preprocessing pipes, to enable the reuse of software and algorithms.
• Error models for sensory equipment to enable quick feedback for/from the data producers or device manufacturers.
• Refined data for the data reserves.
![Page 31: Ubiquitous Data Mining Dr. Susanna Pirttikangas Intelligent Systems Group (ISG) Dept. Electrical and Information Engineering University of Oulu Finland](https://reader036.vdocument.in/reader036/viewer/2022062423/5697bff81a28abf838cbf561/html5/thumbnails/31.jpg)
Future Directions
• Data streams
  – Smart archiving, compressive sensing
  – Online segmentation
  – Online algorithms
  – Adaptive models
• Reliability
  – Plan carefully (placement of sensors, sampling frequency and resolution, calibration, method selection)
  – Introduce the error
• At system level
  – Fast prototyping (Davies, Pervasive Computing)
  – Develop for critical situations (war zones, refugee camps), utilize expert knowledge
  – Share the code
• Interdisciplinary research
  – linguistics, sociology, arts, etc.
![Page 32: Ubiquitous Data Mining Dr. Susanna Pirttikangas Intelligent Systems Group (ISG) Dept. Electrical and Information Engineering University of Oulu Finland](https://reader036.vdocument.in/reader036/viewer/2022062423/5697bff81a28abf838cbf561/html5/thumbnails/32.jpg)
Tools
• Statistical Data Mining Tutorials
  – Andrew Moore, Carnegie Mellon, http://www.cs.cmu.edu/~awm/
• Matlab
  – Filtering, data preprocessing
  – Neural Network Toolbox
  – Bayes Net Toolbox
  – Hidden Markov Model Toolbox
• WEKA
• MIT’s LNKnet
  – neural network, statistical, and machine learning classification, clustering, and feature selection algorithms
• The Hidden Markov Model Toolkit (HTK), Cambridge University
• B-Course, HIIT, Helsinki, http://b-course.cs.helsinki.fi/
• SPSS, SAS
  – statistical analysis
  – classification trees
• Clementine
• CommonGIS