Which DBMS and Why?
Use cases for different DBMSs
![Page 1: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/1.jpg)
Which DBMS And Why?
Seyed Majid Azimi @majidazimi
Fifth National Conference on Free / Open Source Software
Zanjan, Shahrivar 1393 (August/September 2014)
![Page 2: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/2.jpg)
Theory: ACID
● Atomicity
● Consistency
● Isolation
● Durability
![Page 3: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/3.jpg)
Theory: CAP
![Page 4: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/4.jpg)
Theory: BASE
● Basically Available
● Soft State
● Eventual Consistency
– Hopeful Consistency?
![Page 5: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/5.jpg)
Read Only Data
![Page 6: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/6.jpg)
Professor!!!!
I have 10 million phone numbers that must not receive SMS messages.
The web service has to tell the customer, in real time, whether a submitted number is on the blacklist.
![Page 7: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/7.jpg)
Read Only Data
● Change rate is low over time
● Mainly real time
● Size matters a lot
● Can we use estimation strategies?
![Page 8: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/8.jpg)
Redis
● Key/Value
● In-memory
– Persistence via snapshots
● Lots of data structures
– Set, Sorted Set, List, Hash
● Sharding and replication are simple
![Page 9: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/9.jpg)
Redis: Set
/* add elements */
SADD blacklist 9352579085
SADD blacklist 9143028953
/* search */
SISMEMBER blacklist 9127984909
10 million numbers × 8 bytes = 80 MB of raw data (Redis adds per-element overhead on top of that)
![Page 10: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/10.jpg)
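Redis: BitSet
/* set bit */
SETBIT blacklist 9127984909 1
SETBIT blacklist 9143028953 1
/* is 9123024909 blacklisted? */
GETBIT blacklist 9123024909
300 million numbers × 1 bit = 37.5 MB
Note: Redis bit offsets must be smaller than 2^32, so 10-digit numbers need to be mapped to a smaller offset first (for example, by subtracting the lowest number in the range).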
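The bitmap idea can be sketched outside Redis as follows. This is a minimal illustration, not the talk's implementation: the class name `PhoneBitmap`, the base number 9,100,000,000 and the 300-million range are assumptions chosen so that the arithmetic on the slide works out.

```python
class PhoneBitmap:
    """One bit per possible phone number, stored as an offset from a base number."""

    def __init__(self, base, capacity):
        self.base = base            # lowest number in the covered range (assumed)
        self.capacity = capacity    # how many consecutive numbers are covered
        self.bits = bytearray((capacity + 7) // 8)

    def _locate(self, number):
        offset = number - self.base
        if not 0 <= offset < self.capacity:
            raise ValueError("number outside the covered range")
        # byte index and bit mask inside that byte
        return offset >> 3, 1 << (offset & 7)

    def add(self, number):
        byte, mask = self._locate(number)
        self.bits[byte] |= mask

    def __contains__(self, number):
        byte, mask = self._locate(number)
        return bool(self.bits[byte] & mask)


# Hypothetical range: 300 million numbers starting at 9,100,000,000.
blacklist = PhoneBitmap(base=9_100_000_000, capacity=300_000_000)
blacklist.add(9_127_984_909)
blacklist.add(9_143_028_953)

# 300,000,000 bits / 8 = 37,500,000 bytes
size_mb = len(blacklist.bits) / 1_000_000
```

Membership is then a single bit test, for example `9_123_024_909 in blacklist`. The memory cost depends only on the size of the number range, not on how many numbers are blacklisted.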
![Page 11: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/11.jpg)
MySQL In-Memory Engine
CREATE TABLE `blacklist` (
  `phone_number` BIGINT NOT NULL
) ENGINE = MEMORY;
SELECT * FROM `blacklist` WHERE `phone_number` = ?;
Use old-school SQL
![Page 12: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/12.jpg)
RAMDISK: Make any DBMS In-Memory
● Create a partition in RAM
● Mount the partition
● Tell the DBMS to store its data there
mkdir -p /data
mount -t tmpfs -o size=800M tmpfs /data
![Page 13: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/13.jpg)
Estimation: Bloom Filter
A space-efficient probabilistic data structure used to test whether an element is a member of a set.
A query returns either:
– possibly in set
– definitely not in set
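A small sketch may make the two possible answers concrete. It assumes the standard sizing formulas for a Bloom filter and derives its hash positions from SHA-256 by double hashing; the class name, the parameters and the sample number range are illustrative, not taken from the talk.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, expected_items, false_positive_rate):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes
        self.size = max(1, math.ceil(
            -expected_items * math.log(false_positive_rate) / math.log(2) ** 2))
        self.hashes = max(1, round(self.size / expected_items * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest
        digest = hashlib.sha256(str(item).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.size for i in range(self.hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item):
        # True  -> possibly in set (false positives can happen)
        # False -> definitely not in set
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(item))


# Hypothetical blacklist of 10,000 numbers with a 1% target false-positive rate.
blacklist_filter = BloomFilter(expected_items=10_000, false_positive_rate=0.01)
for number in range(9_100_000_000, 9_100_010_000):
    blacklist_filter.add(number)
```

A negative answer can be trusted outright. A positive answer would normally be confirmed against the authoritative store before a message is rejected.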
![Page 14: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/14.jpg)
Offline Reporting
![Page 15: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/15.jpg)
Professor!!!!
We have 1,000 traffic counters across the country that send vehicle passages to a central database.
That comes to roughly 5 million records a day.
How am I supposed to run reports on that? It's too much...
![Page 16: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/16.jpg)
Schema
id BIGSERIAL PRIMARY KEY,
station_id INT NOT NULL,
passed_time BIGINT NOT NULL,
lane INT NOT NULL,
speed INT NOT NULL,
headway FLOAT NOT NULL,
direction INT NOT NULL,
is_speed_offender INT NOT NULL,
is_headway_offender INT NOT NULL,
is_direction_offender INT NOT NULL
![Page 17: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/17.jpg)
SELECT station_id, lane,
       DATE_TRUNC('day', TO_TIMESTAMP(passed_time)) AS day, -- passed_time is epoch seconds
       COUNT(*), AVG(speed), AVG(headway),
       SUM(is_speed_offender), SUM(is_headway_offender), SUM(is_direction_offender)
FROM vbv
WHERE passed_time >= 1407615200 AND passed_time <= 1408652000
GROUP BY station_id, lane, DATE_TRUNC('day', TO_TIMESTAMP(passed_time));
![Page 18: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/18.jpg)
Problems
● Record Oriented Storage Engine
● Parallel Query Processor
![Page 19: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/19.jpg)
Record Oriented Engine
id station_id time speed headway direction
1 457812 1407115200 75 2.3 1
2 368525 1402215200 45 1.5 -1
3 458512 1407634200 112 4.7 1
4 369585 1407664200 96 0.9 1
![Page 20: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/20.jpg)
Column Oriented Engine
id 1 2 3 4
station_id 457812 368525 458512 369585
time 1407115200 1402215200 1407634200 1407664200
speed 75 45 112 96
headway 2.3 1.5 4.7 0.9
direction 1 -1 1 1
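The difference between the two layouts can be sketched with plain Python containers, using the four sample records from the slides. The variable names are illustrative; real engines add on-disk formats and compression that are not modelled here.

```python
# Record-oriented layout: one dict per row, all fields stored together.
rows = [
    {"id": 1, "station_id": 457812, "time": 1407115200, "speed": 75, "headway": 2.3, "direction": 1},
    {"id": 2, "station_id": 368525, "time": 1402215200, "speed": 45, "headway": 1.5, "direction": -1},
    {"id": 3, "station_id": 458512, "time": 1407634200, "speed": 112, "headway": 4.7, "direction": 1},
    {"id": 4, "station_id": 369585, "time": 1407664200, "speed": 96, "headway": 0.9, "direction": 1},
]

# Column-oriented layout: one list per column, values of a field stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# AVG(speed) over rows has to walk every full record...
avg_speed_rows = sum(row["speed"] for row in rows) / len(rows)

# ...while the column layout reads only the "speed" column.
avg_speed_cols = sum(columns["speed"]) / len(columns["speed"])
```

Both computations give the same answer. The column layout simply reads far less data when a report aggregates a few fields over many records.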
![Page 21: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/21.jpg)
MonetDB
● Column-oriented DBMS
● Parallel query processor
● Just-in-time hash index
Random CRUD? Fail...
![Page 22: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/22.jpg)
Counting
![Page 23: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/23.jpg)
Professor!!!!
We have an ad server and want to count how many times users click on the various links.
Then we bill the customer on that basis...
![Page 24: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/24.jpg)
Options
● Good old SQL
● Cubrid
● MongoDB
● Cassandra / HBase / Riak
● Estimate Counting
![Page 25: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/25.jpg)
Good Old SQL
UPDATE `tbl_ads` SET `counter` = `counter` + 1 WHERE `link` = ?
![Page 26: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/26.jpg)
Cubrid
● Standard ACID RDBMS
● Built-in counter
SELECT INCR(counter) FROM tbl_ads WHERE link = ?
![Page 27: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/27.jpg)
MongoDB
● NoSQL document-oriented DB
● JSON-like query language
● No ACID (no multi-document transactions at the time of this talk)
● Built-in counter
db.ads.update({link: ?}, {$inc: {counter: 1}})
![Page 28: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/28.jpg)
Cassandra / HBase / Riak
● Maximum Performance
● Distributed Counter
– Each node counts independently
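The idea of nodes counting independently can be sketched as a sharded counter: each node keeps its own tally and a read sums the shards. This is a simplified illustration with hypothetical names; it does not show how Cassandra, HBase or Riak actually replicate or merge counters.

```python
class DistributedCounter:
    """Each node increments only its own shard; a read sums all shards."""

    def __init__(self, node_ids):
        self.shards = {node: {} for node in node_ids}

    def increment(self, node, link, amount=1):
        shard = self.shards[node]
        shard[link] = shard.get(link, 0) + amount

    def value(self, link):
        return sum(shard.get(link, 0) for shard in self.shards.values())


# Three hypothetical nodes receive clicks for the same ad link.
clicks = DistributedCounter(["node-1", "node-2", "node-3"])
clicks.increment("node-1", "/ads/42")
clicks.increment("node-2", "/ads/42")
clicks.increment("node-2", "/ads/42")
clicks.increment("node-3", "/ads/7")
total_42 = clicks.value("/ads/42")
```

Because no node ever writes to another node's shard, increments need no coordination. The price is that a read has to gather every shard.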
![Page 29: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/29.jpg)
Cassandra / HBase / Riak
● Cassandra
– Key/Value
– Column Oriented
● HBase
– Key/Value
– Column Oriented
● Riak
– Pure Key/Value
![Page 30: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/30.jpg)
Cassandra
UPDATE tbl_ads SET counter = counter + 1 WHERE link = ?
![Page 31: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/31.jpg)
Estimate Counting
HyperLogLog is an approximate technique for computing the number of distinct entries in a set (cardinality). It does this while using a small amount of memory. For instance, to achieve 99% accuracy, it needs only 16 KB. (Wikipedia)
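A compact sketch of the technique follows. It assumes 2^14 one-byte registers (16 KiB in this representation) and a 64-bit hash taken from SHA-1; the class is illustrative and omits the bias corrections that production implementations add. With m registers the usual standard-error estimate is about 1.04/sqrt(m), which is roughly 0.8% for m = 16384.

```python
import hashlib
import math


class HyperLogLog:
    def __init__(self, p=14):
        self.p = p
        self.m = 1 << p                       # number of registers
        self.registers = bytearray(self.m)    # one byte per register
        self.alpha = 0.7213 / (1 + 1.079 / self.m)   # constant valid for m >= 128

    def add(self, item):
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        index = h >> (64 - self.p)                     # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # position of the leftmost 1-bit
        if rank > self.registers[index]:
            self.registers[index] = rank

    def count(self):
        harmonic = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / harmonic
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting
            estimate = self.m * math.log(self.m / zeros)
        return estimate


hll = HyperLogLog()
for i in range(100_000):
    hll.add(f"visitor-{i}")
    hll.add(f"visitor-{i}")   # duplicates do not change the registers
estimate = hll.count()
```

The structure never stores the items themselves, only the largest rank seen per register, which is why the memory footprint stays fixed regardless of how many distinct visitors are counted.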
![Page 32: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/32.jpg)
Comparison
DBMS CAP BASE
Cassandra AP *
MongoDB CP -
HBase CP -
Riak AP *
![Page 33: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/33.jpg)
Hierarchical Structure
![Page 34: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/34.jpg)
Professor!!!!
I want to build something like Dropbox. How should I store the metadata for files and folders?
A file and folder sharing system is really complicated...
![Page 35: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/35.jpg)
Join Hell
Storing parent-child relationships of unlimited depth is the main problem with relational and document-oriented DBs. Doing analytics is even harder.
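The cost of unlimited depth can be sketched with an adjacency list, the usual way of storing a parent pointer per node. The folder names and the `path_of` helper are hypothetical; each loop iteration stands in for one self-join or one extra query in a relational or document store.

```python
# Adjacency list: every node stores only a pointer to its parent.
nodes = {
    1: {"name": "root", "parent": None},
    2: {"name": "projects", "parent": 1},
    3: {"name": "2014", "parent": 2},
    4: {"name": "slides.pdf", "parent": 3},
}


def path_of(node_id):
    """Walk up to the root, counting one lookup per level of depth."""
    parts = []
    lookups = 0
    while node_id is not None:
        node = nodes[node_id]
        lookups += 1
        parts.append(node["name"])
        node_id = node["parent"]
    return "/".join(reversed(parts)), lookups


full_path, lookups = path_of(4)
```

The number of lookups grows with the depth of the tree, and the depth is not known in advance, so a fixed number of joins cannot answer the question.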
![Page 36: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/36.jpg)
Options
● Neo4J
● Hybrid Technologies
– OrientDB
– ArangoDB
![Page 37: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/37.jpg)
Neo4J
● Most robust implementation
● Cypher as query language
● Server-side user-space code
● Graph algorithms
● CA
● Blueprints API
![Page 38: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/38.jpg)
Hybrid Technologies
One engine to rule them all:
– Graph layer
– Key/value layer
– Document layer
– Configurable CAP
![Page 39: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/39.jpg)
Hybrid Technologies
● ArangoDB
– Key/Value
– Document Store
– Graph Store
● OrientDB
– Document Store
– Graph Store
![Page 40: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/40.jpg)
Session Store
![Page 41: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/41.jpg)
Professor!!!!
I want to run the project website on several web servers, so user sessions have to be stored somewhere else.
What solution do you suggest?
![Page 42: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/42.jpg)
Session: Key/Value Structure
![Page 43: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/43.jpg)
Redis
● Use the Hash data structure
● Partition data over multiple servers
● Enable replication
● Take snapshots for persistence
![Page 44: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/44.jpg)
Cassandra / HBase / Riak
● More reliable than Redis
– Lower performance
● Cross-data-center replication
– Cassandra
– Riak
● REST interface
– Riak
![Page 45: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/45.jpg)
NTP vs Vector Clock
● Cassandra relies on NTP-synchronized timestamps
– Clock drift can cause unexpected results
– Spend some time on NTP
– Not just yum install ntp
● Riak uses vector clocks
– Whatever happens to the system time, you are safe!
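A vector clock orders updates by per-node counters rather than by wall-clock time. The sketch below shows the basic operations under simplifying assumptions (clocks as plain dicts, hypothetical node names "A" and "B"); it does not reproduce Riak's actual data structures.

```python
def increment(clock, node):
    """Return a copy of the clock with this node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a, b):
    """Element-wise maximum: the clock of a value that has seen both histories."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def compare(a, b):
    """Classify a relative to b: 'equal', 'before', 'after' or 'concurrent'."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


v0 = increment({}, "A")      # first write, handled by node A
v1 = increment(v0, "A")      # a later write through node A
v2 = increment(v0, "B")      # a write through node B that never saw v1
resolved = merge(v1, v2)     # clock after the conflict is reconciled
```

Here `compare(v1, v2)` reports a conflict that a timestamp comparison would silently resolve in favour of whichever clock happened to be ahead. No system time is consulted anywhere.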
![Page 46: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/46.jpg)
Time Series
![Page 47: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/47.jpg)
Professor!!!!
We have a large number of temperature sensors across the country.
Every 30 seconds they record temperature readings and send them here.
What do you suggest for storing and retrieving this many records?
![Page 48: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/48.jpg)
Time Series
A time series is a sequence of data points, typically measured at successive points in time spaced at uniform intervals.
![Page 49: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/49.jpg)
What do we need?
● Best scalability
Key/Value
● Best store/retrieve
Column Store
![Page 50: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/50.jpg)
![Page 51: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/51.jpg)
CREATE TABLE temperature (
  sensor_id INT,
  capture_day INT, -- 13930101
  capture_date BIGINT,
  temperature INT,
  PRIMARY KEY ((sensor_id, capture_day), capture_date)
);
![Page 52: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/52.jpg)
● Sensor 1, 13930101 goes to server 1
● Sensor 2, 13930101 goes to server 2
● Sensor 1, 13930102 goes to server 3
● Sensor 2, 13930102 goes to server 4
● Sensor 1, 13930103 goes to server 1
● Sensor 2, 13930103 goes to server 3
● Sensor 1, 13930104 goes to server 4
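The placement above follows from hashing the composite partition key (sensor_id, capture_day). The sketch below illustrates the principle with an MD5 hash taken modulo four servers. This is a stand-in: a real Cassandra cluster uses its own partitioner and token ring, and the function name and server count are assumptions.

```python
import hashlib


def server_for(sensor_id, capture_day, servers=4):
    """Map a (sensor_id, capture_day) partition key to a server number 1..servers."""
    key = f"{sensor_id}:{capture_day}".encode()
    token = int.from_bytes(hashlib.md5(key).digest()[:8], "big")
    return token % servers + 1


# All readings of one sensor on one day share a partition key, hence a server.
same_day = {server_for(1, 13930101) for _ in range(100)}

# Across a month, one sensor's partitions spread over several servers.
month = {server_for(1, 13930100 + day) for day in range(1, 31)}
```

Including the day in the partition key keeps each partition bounded in size and spreads a busy sensor's history across the cluster, while a query for one sensor and one day still touches a single partition.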
![Page 53: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/53.jpg)
Options
● Cassandra
● HBase

DBMS CAP BASE

Cassandra AP *

HBase CP -
![Page 54: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/54.jpg)
Sparse Columns
![Page 55: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/55.jpg)
Professor!!!!
I have a set of tables where most of the columns are NULL. It has made a mess of my queries...
Honestly, I'm at my wit's end...
![Page 56: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/56.jpg)
Sparse Columns
id user_id(FK) f1 f2 f3
1 234 null 12 null
2 542 null null 987
3 644 null null null
WHERE `f1` = 12 AND `f1` IS NOT NULL
![Page 57: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/57.jpg)
EAV Pattern
CREATE TABLE tbl (
  id SERIAL PRIMARY KEY,
  user_id INT REFERENCES "user" (id), -- "user" is a reserved word and must be quoted
  key TEXT NOT NULL,
  value TEXT NOT NULL
);
WHERE user_id = ? AND key = 'can_upload'
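The EAV pattern can be tried end to end with SQLite from the Python standard library. The sketch below is illustrative: the table names `users` and `user_settings` and the sample rows are assumptions, and SQLite stands in for whichever RDBMS is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    username TEXT NOT NULL
);
CREATE TABLE user_settings (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users (id),
    key TEXT NOT NULL,
    value TEXT NOT NULL
);
""")

conn.executemany("INSERT INTO users (id, username) VALUES (?, ?)",
                 [(234, "alice"), (542, "bob"), (644, "carol")])

# Only attributes that actually have a value get a row; NULL columns disappear.
conn.executemany("INSERT INTO user_settings (user_id, key, value) VALUES (?, ?, ?)",
                 [(234, "f2", "12"), (542, "f3", "987"), (234, "can_upload", "true")])

row = conn.execute(
    "SELECT value FROM user_settings WHERE user_id = ? AND key = 'can_upload'",
    (234,)).fetchone()
can_upload = row[0] if row else None

stored_rows = conn.execute("SELECT COUNT(*) FROM user_settings").fetchone()[0]
```

The sparse table with three mostly empty columns becomes three short rows. The trade-off is that every value is stored as text, so type checking moves into the application.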
![Page 58: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/58.jpg)
PostgreSQL: Hstore
CREATE TABLE "user" (
  id SERIAL PRIMARY KEY,
  username TEXT NOT NULL,
  setting HSTORE
);
WHERE id = ? AND setting->'can_upload' = 'true'
![Page 59: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/59.jpg)
PostgreSQL: JSON
CREATE TABLE "user" (
  id SERIAL PRIMARY KEY,
  username TEXT NOT NULL,
  setting JSON
);
WHERE id = ? AND setting->>'can_upload' = 'true'
![Page 60: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/60.jpg)
Document Store
● MongoDB
– JSON query language
– Schemaless
– CP
● BaseX
– XML
– Schemaless
– XPath and XQuery
![Page 61: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/61.jpg)
DB as a File System
![Page 62: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/62.jpg)
Professor!!!!
Everywhere it says you shouldn't store files in a database. I'm not convinced at all.
It doesn't seem like such a stupid thing to do...
![Page 63: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/63.jpg)
DB as a File System
● ACID sucks at storing files
● Storing small files
● Backup handling
![Page 64: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/64.jpg)
MongoDB: GridFS
● Designed for storing small files
– A few megabytes at most
● Crazy simple to use
● No permissions
● You can't mount it as a file system
– Only a programming API is available
![Page 65: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/65.jpg)
Riak Cloud Storage
● Extremely large files
● Amazon S3 compatible
● Comprehensive permission system
● Mountable as a file system
![Page 66: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/66.jpg)
![Page 67: Which DBMS and Why?](https://reader034.vdocument.in/reader034/viewer/2022051817/547e53045906b592718b465a/html5/thumbnails/67.jpg)
?