Burak IŞIKLI, Erkan HASPULAT
How Do You Decide Where Your Customer Was?
Who are we?
Burak IŞIKLI: Senior Software Engineer, Turkcell | burak.isikli at turkcell dot com.tr | @burakisikli | github.com/burakisikli
Erkan HASPULAT: Senior Software Engineer, Turkcell | erkan.haspulat at turkcell dot com.tr | @erkanhaspulat | github.com/ehaspulat
Turkcell
• It is the first and only Turkish company ever to be listed on the NYSE.
Our Hadoop Journey
2013-…
• 15 TB of data processed daily
• 6 TB of logs transferred daily
• 350 jobs running daily
Our Hadoop Journey
• ETL: CDR Analysis, Log Processing...
• Analytics: Fraud Analytics, Clickstream, Recommendation Engine
• Data Lake: Customer Journey
Architecture
[Architecture diagram. Sources: RDBMS, Logs, Tuna, Unix/Internal Tools. Center: HADOOP. Consumers: DB, Dashboards, Archive, Ad-Hoc Analysis, Mining, External Systems, Alarm Systems.]
Issues
• The default Python is 2.6, but Spark with IPython requires Python 2.7+
• Security & auditing issues: copying data by masking it, dynamic data masking, SOX compliance
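The slide only names the masking requirement. As one possible illustration (not necessarily the approach used at Turkcell), subscriber numbers can be replaced with a keyed, deterministic hash before data is copied out of production, so masked tables can still be joined on the masked column. The field name `msisdn`, the key and the 16-character truncation are hypothetical:

```python
import hashlib
import hmac

# Hypothetical secret key; a real deployment would load it from a key store.
MASKING_KEY = b"example-secret-key"

def mask_msisdn(msisdn: str, key: bytes = MASKING_KEY) -> str:
    """Replace a subscriber number with a keyed, deterministic pseudonym.

    The same input always yields the same token, so joins on this column
    still work, but the original number cannot be read back from the token.
    """
    digest = hmac.new(key, msisdn.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # truncated for readability; keep more bits if collisions matter

def mask_record(record: dict, sensitive_fields=("msisdn",)) -> dict:
    """Return a copy of the record with the (hypothetical) sensitive fields masked."""
    masked = dict(record)
    for field in sensitive_fields:
        if masked.get(field) is not None:
            masked[field] = mask_msisdn(str(masked[field]))
    return masked
```

Using an HMAC rather than a plain hash matters here: phone numbers have a small, guessable space, so an unkeyed hash could be reversed by brute force.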
Location Analysis
Find the subscriber’s location using cell information.
• 11 billion rows/day
• 0.5 TB/day
• 2.5 hours processing time
• Tools: Hadoop Streaming with Perl, Sqoop
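Finding a subscriber's location from cell information amounts to a join between the usage records and a cell reference table. A minimal Python sketch of that join, assuming a hypothetical reference table keyed by cell ID that is small enough to hold in memory (a map-side join); the record layout and field names are illustrative, not the production schema:

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional, Tuple

@dataclass(frozen=True)
class Cell:
    """One row of a hypothetical cell reference table."""
    cell_id: str
    city: str
    district: str
    lat: float
    lon: float

def locate(records: Iterable[Tuple[str, str, str]],
           cells: Dict[str, Cell]) -> Iterator[Tuple[str, str, Optional[Cell]]]:
    """Attach the cell's location to each (subscriber, timestamp, cell_id) record.

    Records whose cell_id is missing from the reference table get None,
    which behaves like a LEFT OUTER JOIN.
    """
    for subscriber, timestamp, cell_id in records:
        yield subscriber, timestamp, cells.get(cell_id)
```

Keeping the reference table in memory avoids a shuffle; if it grew too large for that, a reduce-side join keyed on `cell_id` would be the usual fallback.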
Recursive Process
1. Subscriber calls
2. XDR is generated
3. FTP/SCP process is started
4. Put into HDFS
Dispatcher
Dispatcher
Why not Flume? FTP, SCP, rsync? Think about 6 TB/day: rsync and FTP work serially!
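The point of the dispatcher is parallelism: rather than pushing files through a single serial rsync or FTP session, many transfers run at once. A minimal sketch using Python's `concurrent.futures`; the `transfer` callable and the worker count are hypothetical stand-ins for whatever actually copies a file (scp, `hdfs dfs -put`, etc.):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List

def dispatch(files: List[str],
             transfer: Callable[[str], None],
             workers: int = 8) -> Dict[str, str]:
    """Transfer many files concurrently instead of one after another.

    `transfer` is a hypothetical callable that copies one file and raises
    on failure. Returns a status per file so failures can be retried on
    the next cycle of the recurring process.
    """
    status: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(transfer, path): path for path in files}
        for future in as_completed(futures):
            path = futures[future]
            try:
                future.result()
                status[path] = "ok"
            except Exception as exc:  # record the failure and keep going
                status[path] = f"failed: {exc}"
    return status
```

Threads are sufficient here because each transfer spends its time waiting on network and disk I/O, not on Python bytecode.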
Dispatcher Experiment: CDRs
Every 15 secs, up to 1 MB per file, total 10 MB gzipped files
java.lang.OutOfMemoryError
PIG_HEAPSIZE=8000
Failed reallocation of scalar replaced objects
JDK-8145996
Img: http://bit.ly/1QSRkGn
Location Analysis
Join? Perl?
Mapper > Header/Trailer
JoinColumn-fileName-Rest
Img: http://bit.ly/1TrUJhv
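The mapper described here drops header and trailer records and emits the join column, the source file name and the rest of the record. The original was written in Perl; below is a Python sketch of the same idea for Hadoop Streaming. The header/trailer markers ('H'/'T' as the first character), the comma separator and the join-column position are assumptions:

```python
import os
import sys
from typing import Iterable, Iterator

def map_lines(lines: Iterable[str], file_name: str,
              join_col: int = 0, sep: str = ",") -> Iterator[str]:
    """Emit 'join_key<TAB>file_name<TAB>rest' for each data line.

    Assumes (hypothetically) that header lines start with 'H' and trailer
    lines with 'T', and that fields are comma-separated.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line[0] in ("H", "T"):
            continue  # skip empty, header and trailer records
        fields = line.split(sep)
        key = fields[join_col]
        rest = sep.join(fields[:join_col] + fields[join_col + 1:])
        yield f"{key}\t{file_name}\t{rest}"

def main() -> None:
    """Entry point for a streaming job: read stdin, write tab-separated pairs."""
    input_path = os.environ.get("mapreduce_map_input_file", "unknown")
    for out in map_lines(sys.stdin, os.path.basename(input_path)):
        sys.stdout.write(out + "\n")
```

In a real job the script would end with a call to `main()`. Hadoop Streaming exposes job properties as environment variables with dots replaced by underscores, so the current input path is typically available as `mapreduce_map_input_file` (`map_input_file` on older Hadoop 1 releases).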
A tree in the forest
Img: http://go.nasa.gov/1SX3wGl
Volume
Data size over time:
• May 2014: 1 TB
• Jul. 2014: 1.5 TB
• Nov. 2014: 2 TB
• Mar. 2016: 4.5 TB
Data size is growing too fast! What about LTE?
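To put the growth in perspective, the compound monthly growth rate between the first and last data points on this slide (1 TB in May 2014, 4.5 TB in Mar. 2016, 22 months apart) can be computed as:

```python
def monthly_growth(start_tb: float, end_tb: float, months: int) -> float:
    """Compound monthly growth rate that takes start_tb to end_tb in `months` months."""
    return (end_tb / start_tb) ** (1.0 / months) - 1.0

# May 2014 -> Mar. 2016 is 22 months; 1 TB -> 4.5 TB per the slide.
rate = monthly_growth(1.0, 4.5, 22)
print(f"{rate:.1%} per month")  # prints "7.1% per month"
```

That is roughly 7% per month, which compounds to a little more than doubling each year if the trend held.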
Adapt to the volume
Upgrade
• No space left on disk!
• Hadoop upgrade: 0.23 -> 1.3.1 -> 2.7.1
• Linear scalability

Job runtime (before -> after):
• Perl job: 212 min -> 104 min
• Pig job: 77 min -> 18 min

Cluster (before -> after):
• Nodes: 1M + 4D -> 1M + 1SM + 16D
• Disk: 15 TB -> 698 TB
• CPU: 20 cores -> 224 cores
• Memory: 1024 GB -> 1.5 TB
• Version: 0.23 -> HDP 2.3.2
Industry-Specific Analysis
Competitor Comparison
E.g. shopping center comparison in Istanbul based on city, district and demographic information (age, sex, income, job, etc.)
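A comparison like this boils down to aggregating located, demographically enriched records per venue. A minimal sketch, assuming hypothetical (subscriber, shopping center, age) rows have already been produced by earlier joins; it counts distinct visitors per center and 10-year age band:

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

def age_band(age: int) -> str:
    """Bucket an age into a 10-year band, e.g. 34 -> '30-39'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

def visitors_by_band(visits: Iterable[Tuple[str, str, int]]) -> Dict[str, Counter]:
    """Count distinct subscribers per shopping center and age band.

    `visits` holds hypothetical (subscriber_id, shopping_center, age) rows.
    A subscriber seen several times at the same center is counted once.
    """
    seen = set()
    result: Dict[str, Counter] = defaultdict(Counter)
    for subscriber, center, age in visits:
        if (subscriber, center) in seen:
            continue
        seen.add((subscriber, center))
        result[center][age_band(age)] += 1
    return dict(result)
```

The same grouping extends naturally to sex, income or job by swapping the key passed to the counter.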
Industry-Specific Analysis
But how? Perl? Hive or Pig? What else?
Industry-Specific Analysis
But how? Hive external partitions:
`ALTER TABLE t1 ADD PARTITION (DAILY_CALENDAR_ID='20160101') LOCATION '/user/…/tlc/daily_calendar_id=20160101';`
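Because each day's data lands in its own directory, partitions have to be registered day by day. A small sketch that generates the statements for a date range; the table name `t1` matches the slide, while the base path `/data/tlc` used below is a hypothetical stand-in for the elided real path. `IF NOT EXISTS` keeps reruns from failing on partitions that are already registered:

```python
from datetime import date, timedelta
from typing import List

def add_partition_statements(table: str, base_path: str,
                             start: date, end: date) -> List[str]:
    """Build one ALTER TABLE ... ADD PARTITION statement per day in [start, end].

    `base_path` is a placeholder for the table's HDFS directory.
    """
    statements = []
    day = start
    while day <= end:
        key = day.strftime("%Y%m%d")
        statements.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION (daily_calendar_id='{key}') "
            f"LOCATION '{base_path}/daily_calendar_id={key}';"
        )
        day += timedelta(days=1)
    return statements
```

For example, `add_partition_statements("t1", "/data/tlc", date(2016, 1, 1), date(2016, 1, 31))` yields one statement per day of January 2016, which can be written to a file and run with the Hive CLI.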
Movement Index
Subscribers' journeys between cities are derived from signaling data and provided as an analysis of how people travel, for:
• Airline companies
• Bus companies
• Local government
• Survey companies
Movement Index
Simple Euclidean Distance
Equation: https://en.wikipedia.org/wiki/Euclidean_distance
Finding the change of location
First, find out how close each cell is to the others, using their coordinates.
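For two cells with planar coordinates $(x_1, y_1)$ and $(x_2, y_2)$, the simple Euclidean distance is:

$$
d\big((x_1, y_1), (x_2, y_2)\big) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}
$$

If the coordinates are raw latitude/longitude, this is only an approximation (a degree of longitude shrinks with latitude), but it is usually adequate for ranking how close nearby cells are.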
Movement Index
Euclidean Distance Hive Query – Cross Join
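The Hive query compares every cell with every other cell. The same logic in Python, as a sketch: `itertools.combinations` plays the role of the self cross join (each unordered pair once), and the distance threshold is a hypothetical parameter, since the slide does not say how "close" was defined:

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

def close_cell_pairs(cells: Dict[str, Tuple[float, float]],
                     threshold: float) -> List[Tuple[str, str, float]]:
    """Return every pair of distinct cells closer than `threshold`.

    Mirrors a self CROSS JOIN filtered on distance. Each unordered pair is
    compared once, so the cost still grows quadratically with the number of cells.
    """
    pairs = []
    for (a, (xa, ya)), (b, (xb, yb)) in combinations(sorted(cells.items()), 2):
        d = math.hypot(xa - xb, ya - yb)  # Euclidean distance
        if d < threshold:
            pairs.append((a, b, d))
    return pairs
```

The quadratic cost is why this kind of cross join is usually computed once over the cell table and stored, rather than recomputed inside every daily job.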
Movement Index
Finally, all that needs to be done is another simple query.
Img: http://bit.ly/1MKAiuT
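The final query itself is not shown. One plausible shape, sketched in Python: order each subscriber's located records by time and emit a move whenever the city changes. The (subscriber, timestamp, city) layout is hypothetical; in HiveQL the same idea could be written with `LAG(city) OVER (PARTITION BY subscriber ORDER BY ts)`:

```python
from itertools import groupby
from typing import Iterable, List, Tuple

def intercity_moves(events: Iterable[Tuple[str, int, str]]) -> List[Tuple[str, str, str]]:
    """Return (subscriber, from_city, to_city) whenever a subscriber's city changes.

    `events` are hypothetical (subscriber, timestamp, city) rows, i.e. located
    records after the cell-to-city join.
    """
    moves = []
    ordered = sorted(events)  # by subscriber, then timestamp
    for subscriber, rows in groupby(ordered, key=lambda r: r[0]):
        previous = None
        for _, _, city in rows:
            if previous is not None and city != previous:
                moves.append((subscriber, previous, city))
            previous = city
    return moves
```

Counting the resulting (from_city, to_city) pairs gives a simple origin-destination view of travel between cities.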
Movement Index
But one problem!
java.io.IOException: java.lang.IllegalArgumentException: Column [daily_calendar_id] was not found in schema!
Are you kidding me!!
Img: http://bit.ly/1N3QKRQ
Movement Index
Just another bug: HIVE-11401
Workaround: `SET hive.optimize.index.filter=false;`
Permanent solution: Hive 2.0
Img: http://bit.ly/1W1USZM
Is it enough?
Img: http://bit.ly/1ZUEsm7
Ongoing Projects
Movement Prediction: Spark ML
Real-Time Location Analysis: Spark Streaming
SQL on Hadoop: Spark SQL, Impala, etc.
Img: http://bit.ly/25DxPsq
Acknowledgements
Special thanks to Caner CANAK and Uğur Cumhur ÇELİK
Img: http://bit.ly/1RVnde7
Thank You!