Download - Lecture 16 - Cornell University
![Page 1: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/1.jpg)
gamedesigninitiativeat cornell university
the
Data Collection
Lecture 16
![Page 2: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/2.jpg)
gamedesigninitiativeat cornell university
the
The Rise of Big Data
� Big data is changing game design � Can gather data form a huge number of players � Can use that data to inform future content
� What can we do with all that data? � What types of questions can we answer? � How does it affect our business model?
� How do we collect all of this data? � What are the technical challenges? � What are the legal/ethical challenges?
Data Collection 2
![Page 3: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/3.jpg)
gamedesigninitiativeat cornell university
the
The Rise of Big Data
� Big data is changing game design � Can gather data form a huge number of players � Can use that data to inform future content
� What can we do with all that data? � What types of questions can we answer? � How does it affect our business model?
� How do we collect all of this data? � What are the technical challenges? � What are the legal/ethical challenges?
Data Collection 3
![Page 4: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/4.jpg)
gamedesigninitiativeat cornell university
the
Data Collection 4
Logging Game Data
Query 2 Log
Data Store
![Page 5: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/5.jpg)
gamedesigninitiativeat cornell university
the
Issues with Data Collection
� You have to record the data � Capture data from the game state � Do it so it does not hurt performance
� You have to store the data somewhere � Lots of players means a lot of data � Use database or Google-style storage solution
� You have to move the data to the datastore � Requires an internet connection (while playing?)
Data Collection 5
![Page 6: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/6.jpg)
gamedesigninitiativeat cornell university
the
Single Player Game
Data Collection 6
Instrumenting Activity
Online Game
Server
Client
LOG? LOG?
![Page 7: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/7.jpg)
gamedesigninitiativeat cornell university
the
Single Player Game
Data Collection 7
Instrumenting Activity
Online Game
Server
Client
LOG? LOG?
You Need to Access the Data Log
![Page 8: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/8.jpg)
gamedesigninitiativeat cornell university
the
Single Player Game
Data Collection 8
Instrumenting Activity
Online Game
Server
Client
LOG? LOG?
![Page 9: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/9.jpg)
gamedesigninitiativeat cornell university
the
Single Player Game
Data Collection 9
Instrumenting Activity
Online Game
Server
Client
LOG? LOG?
Battery &
Connectivity Server Load
![Page 10: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/10.jpg)
gamedesigninitiativeat cornell university
the
Single Player Game
Data Collection 10
Instrumenting Activity
Online Game
Server
Client
LOG LOG
![Page 11: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/11.jpg)
gamedesigninitiativeat cornell university
the
� Disks (for logs) are slow � SDD: 500 MB/s
� Standard Disk: 200 MB/s
� Or about 8 MB/frame
� Memory (for state) is fast � RAM: ~15000 MB/s
� Or about 250 MB/frame
� Cannot write fast enough! � Asynchronous logging?
Data Collection 11
Even This is Not Enough
Update
Draw
State
Log
![Page 12: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/12.jpg)
gamedesigninitiativeat cornell university
the
Solution: Limit What you Collect
� Way too much data to collect everything � Example: 500 objects with 10 integer size fields
� 20 KB/frame or 1.2 MB/s or 72 MB/minute
� Console games have more objects, played longer
� What would you do if even if you could? � How do you search through all this data?
� Just saying “use a database” is not enough
Data Collection 12
![Page 13: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/13.jpg)
gamedesigninitiativeat cornell university
the
� I/O has problems with jitter � Throughput is not constant � Might be in use elsewhere
� Update loop also has jitter � May finish before 16ms � “Sleeps” until next frame
� Remove jitter with a budget � Use time at end of update � Write as much as possible � Will keep up in the long run
Data Collection 13
Aside: Asynchronous Still Desirable
Update
Draw
Game State
Every Frame
Log State
Periodically (With Budget)
![Page 14: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/14.jpg)
gamedesigninitiativeat cornell university
the
Performance
� Only collect what you need � Overcoming challenges
� Significant game events
� Unusual actions/interactions
� Instrumentation is complex � Identify the source of event
� Find this location in code
� Add log statement there
Data Collection 14
The Classic Tradeoff
Extensibility
� Who knows what I need? � Resource data can get large
� Player position might matter
� Eventually have full state
� How do I get more data? � Always log it all (no)
� Ask programmer to add more instrumentation(?)
![Page 15: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/15.jpg)
gamedesigninitiativeat cornell university
the
How Do We Solve this Problem?
Data Collection 15
![Page 16: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/16.jpg)
gamedesigninitiativeat cornell university
the
How Do We Process All this Data?
� How you query the data is more important � Example: SQL, Map Reduce, custom set-up
� This often determines data format for you.
� In general, use a uniform schema for the data � All data should have the same attributes
� Avoid key-value blobs; make attributes structured
� This means you could use SQL (even if you do not)
Data Collection 16
![Page 17: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/17.jpg)
gamedesigninitiativeat cornell university
the
Is SQL Suitable for Analytics?
Data Collection 17
� Game state is (often) naturally relational � Objects are a long list of attributes � Store each game object as a row
� Lots of libraries can convert objects to SQL
key player x y health damage range flying
1 1 12 342 100 50 20 false
2 1 43 12 100 50 20 false
3 2 123 90 50 30 50 true
![Page 18: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/18.jpg)
gamedesigninitiativeat cornell university
the
Or Maybe Not
Data Collection 18
![Page 19: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/19.jpg)
gamedesigninitiativeat cornell university
the
Data Collection 19
Can Still Do This Relationally
But how easy is this to process?
player … inventory
1 … 10
2 … 11
3 … 12
Player Table item … value
20 … 250
21 … 1000
22 … 50
23 … 100
24 … 5000
Item Table player item qty
1 21 1
1 20 3
1 22 10
2 21 1
2 24 1
Join Table
![Page 20: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/20.jpg)
gamedesigninitiativeat cornell university
the
Analytics Involve Time-Sensitive Data
Data Collection 20
� Each event/data needs to be time-stamped � Time it happened in seconds/milliseconds/etc � Typically use an int, as it makes grouping easier
� Time attribute is an integral part of queries
key player x y health damage range flying time
1 1 12 342 100 50 20 false 1209823
2 1 43 12 100 50 20 false 1309875
3 2 123 90 50 30 50 true 1724304
![Page 21: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/21.jpg)
gamedesigninitiativeat cornell university
the
Example: Aggregating Daily Prices
� Average price of each item, each day � Want to group together item transactions per day
� Use SEC_PER_DAY to convert time stamp to days
� Relatively simple aggregate query
Data Collection 21
SELECT item, avg(price), time/SEC_PER_DAY FROM gamedata GROUPBY time/SEC_PER_DAY
![Page 22: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/22.jpg)
gamedesigninitiativeat cornell university
the
Hard Example: Quest Order
� Return first quest completed after tutorial level � Want a quest that is completed after tutorial
� Want no other quest completed in between
� This is known as a time-series query
Data Collection 22
Tutorial Quest 1
Quest 2
![Page 23: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/23.jpg)
gamedesigninitiativeat cornell university
the
Time Series in SQL
� Return all pairs of consecutive time events
Data Collection 23
SELECT D1.name, D2.name FROM Data as D1, Data as D2 WHERE D2.time > D1.time
EXCEPT
SELECT D1.name, D2.name FROM Data as D1, Data as D2, Data as S WHERE S.time > D1.time and D2.time > S.time
![Page 24: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/24.jpg)
gamedesigninitiativeat cornell university
the
NoSQL is not Much Help
� Time series is a correlation of actions � Events X that happen after Y � Correlation requires a DB join
� NoSQL means Map Reduce � Map: Sort data into key, values � Reduce: Aggregate values of same keys � Example: Average price of an item
� Joins are hard in Map Reduce � Google gets scalability by “cheating”
Data Collection 24
![Page 25: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/25.jpg)
gamedesigninitiativeat cornell university
the
� Given set of event streams � Database with time stamps � Can only read data in order � Can only read data once
� Idea: Get events as they occur � But this is a little unrealistic � Delay to create the event
� Also have a pattern query � Matches event/set of events � Outputs match as new event � Called a composite event
Data Collection 25
Alternative: Data Stream Systems
Event Stream
Event Stream
Event Stream
Event Pattern Query
Output also an Event Stream
![Page 26: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/26.jpg)
gamedesigninitiativeat cornell university
the
External Stream System
Data Collection 26
Using Data Stream Systems
Internal Stream System
Event Stream
Pattern
Event Stream
Game State
Data Stream System
![Page 27: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/27.jpg)
gamedesigninitiativeat cornell university
the
External Event System
� Game acts as base stream � Matching done outside � Simple but needs bandwidth
� Several commercial options � Streambase � Microsoft StreamInsight � Oracle CEP
� Less open source options � Lot of academic prototypes
Data Collection 27
Using Data Stream Systems
Internal Event System
� Game handles all matching � Event streams in game state � Bandwidth only for output
� You must implement system � Maybe use open source � But always read licenses!
� Why do this? Complex AI � Just like rule matching! � Bioshock Infinite does this
![Page 28: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/28.jpg)
gamedesigninitiativeat cornell university
the
Did I Mention A LOT of Data?
� Amount of data might be too much for one log
� Just one player can create a significant amount
� Data probably in different categories
� Examples: Items sold, quests finished
� Put the data in appropriate databases
� Each might be on a different server
� Data streams can send data to the right place
Data Collection 28
![Page 29: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/29.jpg)
gamedesigninitiativeat cornell university
the
Data Collection 29
Data Streams and Categorization
Log Data Stream System
Query 2
![Page 30: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/30.jpg)
gamedesigninitiativeat cornell university
the
Classic Event Patterns
� Selection: P(e) (event e has property P) � Can be applied to base or composite events � Common property for composite events: duration
� Ordering: E1:E2 (E2 happened after E1)
� Sequences: E1;E2 (E2 is the first event after E1) � Ranges over all events fed into the pattern query � Sometimes want to exclude certain matches to E2 � E1 ;P E2 (E2 is first event after E1satisfying P)
Data Collection 30
![Page 31: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/31.jpg)
gamedesigninitiativeat cornell university
the
Patterns and Base Streams
Data Collection 31
(a1,b1,c1,…,t1)
(a2,b2,c2,…,t2)
(a3,b3,c3,…,t3)
(a4,b4,c4,…,t4)
(a5,b5,c5,…,t5)
(a6,b6,c6,…,t6)
(a7,b7,c7,…,t7) …
New
er E
vent
s
Base Stream from Application
(a1,b1,c1,…,t1)
(a3,b3,c3,…,t3)
(a6,b6,c6,…,t6)
(a7,b7,c7,…,t7) …
New
er E
vent
s
Derived Stream “P”
P(x)
Filter Predicate
Boolean expression on the event tuple (easy to compute)
One for each predicate
![Page 32: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/32.jpg)
gamedesigninitiativeat cornell university
the
Sequencing Streams
Data Collection 32
(a2,b2,…,t2)
(a4,b4,…,t4)
(a7,b7,…,t7) …
Input Stream E1
(a4,…,x4,…,t2,t5)
…
Derived Stream E1;E2
E1;E2
Event Pattern (x1,y1,…,t1)
(x5,y5,…,t5)
(x6,y6,…,t6)
(x7,y7,…,t7) …
Input Stream E2
(a2,...,x5,…,t2,t5)
![Page 33: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/33.jpg)
gamedesigninitiativeat cornell university
the
Sequencing Streams
Data Collection 33
(a2,b2,…,t2)
(a4,b4,…,t4)
(a7,b7,…,t7) …
Input Stream E1
(a4,…,x4,…,t2,t5)
…
Derived Stream E1;E2
E1;E2
Event Pattern (x1,y1,…,t1)
(x5,y5,…,t5)
(x6,y6,…,t6)
(x7,y7,…,t7) …
Input Stream E2
(a2,...,x5,…,t2,t5)
Have to store until a match
is found
![Page 34: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/34.jpg)
gamedesigninitiativeat cornell university
the
� All events must happen � Specific time it is generated � Crucial for time series
� Pattern output is an event! � New stream to be reused � Often pick latest time stamp
� But duration is important � Good candidate for P(x) � Allows us to timeout
� Solution: intervals
Data Collection 34
Some Words on Time Stamps
Base Events (B)
Composite Events (B;B)
e1 e2 e2 e4
e1; e2
e2; e3
e3; e4 Still Lossy. Why?
![Page 35: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/35.jpg)
gamedesigninitiativeat cornell university
the
Ordering Streams
Data Collection 35
(a2,b2,…,t2)
(a4,b4,…,t4)
(a7,b7,…,t7) …
Input Stream E1
…
Derived Stream E1:E2
E1:E2
Event Pattern (x1,y1,…,t1)
(x5,y5,…,t5)
(x6,y6,…,t6)
(x7,y7,…,t7) …
Input Stream E2
(a2,…,t2,t5) (a4,…,t2,t5)
(a2,…,t2,t6) (a4,…,t2,t6)
(a2,…,t2,t7) (a4,…,t2,t7)
O(n2) blow-up.
Need timeouts!
![Page 36: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/36.jpg)
gamedesigninitiativeat cornell university
the
� Analysis => lots of patterns � Each play action has pattern
� Was purpose of the system
� Time Series => data storage � Have to store the first part
� Drop at match (or timeout)
� Idea: Each pattern has a list � Stores uncompleted part
� Complex time series?
� Pattern 1: P
� Pattern 2: P;Q
� Pattern 3: R;QS
� Pattern 4: P;Q;R
Data Collection 36
Managing Patterns
Nothing to store
Have to store
Have to store
Must store Must store
![Page 37: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/37.jpg)
gamedesigninitiativeat cornell university
the
Data Collection 37
Solution: Automata
Pattern: T<30s((P;QR):S)
P R
S
(a3,…,t3)
(a4,…,t4)
(a7,…,t7)
(a1,b2,…,t1,t2)
S True Q < 30s
< 30s
Each edge has: � Input event stream
� Test predicate
� (Concatenation op)
For each edge: 1. Take each tuple from out-state 2. Concatenate with stream tuple 3. Test if predicate is true 4. If so, move tuple to in-state
Nondeterminstic
![Page 38: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/38.jpg)
gamedesigninitiativeat cornell university
the
Events and Data Driven-Development
� Designers should not program these automata � Essentially want a nice query language � There are some simple ones to implement
� May already have automata framework for AI � Allow designers to specify AI behavior in script file � Compile into automaton when necessary
� Your event system can piggy back on this � Similar system (except for the tuple storage) � But need a different (custom) compiler
Data Collection 38
![Page 39: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/39.jpg)
gamedesigninitiativeat cornell university
the
The Main Challenge: Negation
� Common pattern: � Find a sequence A;B with no C in between � Example: Quest started/done with none in between
� Classic Idea: Regular expressions � A(; ¬C)*;B (Keep checking that not C until find B) � This is not quite right, however. Why? � Bigger issue: should not store “not” events
� Messy technical challenge; no clean solutions � Purpose of contexts in Irrational’s technical talk
Data Collection 39
![Page 40: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/40.jpg)
gamedesigninitiativeat cornell university
the
Putting this All Together
� We need to record the data fast � Requires a custom logging system in game � Have to be choosy about what data is recorded � Need to be flexible to respond to designers
� We need to process the data fast � Depends on the choice of query languages � Need support for aggregation and time-series � Data stream systems are ideal for this
Data Collection 40
![Page 41: Lecture 16 - Cornell University](https://reader030.vdocument.in/reader030/viewer/2022020706/61fc8cef8d33c02b785e7bb1/html5/thumbnails/41.jpg)
gamedesigninitiativeat cornell university
the
Brainstorm: Mobile Data Collection
Data Collection 41