big data - harisfazillah jamel - startup and developer 4th meetup 5th november 2016

Upload: linuxmalaysia-malaysia

Post on 14-Apr-2017

Page 1: Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th November 2016

Big Data

Harisfazillah Jamel

Startup and Developer 4th Meetup

5th November 2016

Why Big Data?

Big Data is not only for big players.

Big Data is also for us: startups and developers.

Data is raw gold. Information about us is the end product.

Data defines us: web server logs, web page analytics and comments about our products.

What Is Big Data?

Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate to deal with them. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying, updating and information privacy. (Wikipedia)

Let's redefine big data for us.

What Is Big Data?

The four Vs:

● Volume: very big data
● Variety: multiple sources
● Velocity: data streaming in
● Veracity: accuracy of the data

Redefine Big Data For Startups

Four important terms:

● Data Sets
● Data Processing
● Analytics
● Visualization

Big Data is big. We need to focus.
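
The four terms can be read as a small pipeline, one step feeding the next. The sketch below is a minimal illustration under stated assumptions: the CSV sample is made up, the function names are hypothetical, and only the Python standard library is used. It loads a data set, cleans it, counts values (the analytic step) and draws a plain-text bar chart.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data set (made up for illustration only).
SAMPLE_CSV = """page,country
/home,MY
/Pricing,MY
/home,SG
/home,MY
/blog,
"""

def load_data_set(text):
    """Data Sets: read CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def process(rows):
    """Data Processing: drop rows with missing fields, normalise case."""
    cleaned = []
    for row in rows:
        if row.get("page") and row.get("country"):
            cleaned.append({"page": row["page"].lower(),
                            "country": row["country"].upper()})
    return cleaned

def analyse(rows, field):
    """Analytics: count how often each value of `field` occurs."""
    return Counter(row[field] for row in rows)

def visualize(counts, width=20):
    """Visualization: plain-text bar chart scaled to the largest count."""
    top = max(counts.values())
    lines = []
    for value, n in counts.most_common():
        bar = "#" * max(1, round(n / top * width))
        lines.append(f"{value:<10} {bar} {n}")
    return "\n".join(lines)

if __name__ == "__main__":
    counts = analyse(process(load_data_set(SAMPLE_CSV)), "page")
    print(visualize(counts))
```

Keeping each step a separate function makes it easy to swap one out later, for example replacing the text chart with Kibana once the data lives in Elasticsearch.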

What Should We Call Our Big Data?

● Small Data
● Startup Data
● No Data

We need to visualize our data from day 0.

It's a must.

Why Big Data?

Big data analytics examines large amounts of data to uncover hidden patterns, correlations and other insights. (SAS)

We need to find our own insights and visualize our future.

Data Sets

We don't have any data (no data), or we lack data. If we want data, we have to go and find it.

Use our own data, or open data.

We have a place to start: www.data.gov.my

Data Set: Our Own Data?

● Web server logs
  ○ IP addresses of visitors (IP2Country)
● Web access analysis
  ○ Most visited pages
● Comments from our users
  ○ Good, bad, like, dislike
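
Web server logs already answer "most visited pages" and "who visited". A minimal sketch, assuming Apache or Nginx writes the standard "common"/"combined" log format; the sample lines are made up and use reserved documentation IP ranges. Mapping each IP to a country (IP2Country) needs a GeoIP database and is left out here.

```python
import re
from collections import Counter

# Matches the leading fields of an Apache/Nginx "common" or "combined" line:
# client IP, identity, user, [time], "METHOD path protocol", status code.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)

# Made-up sample lines; IPs come from reserved documentation ranges.
SAMPLE_LOG = [
    '203.0.113.5 - - [05/Nov/2016:10:15:32 +0800] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [05/Nov/2016:10:15:40 +0800] "GET /pricing HTTP/1.1" 200 734 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [05/Nov/2016:10:16:01 +0800] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [05/Nov/2016:10:16:05 +0800] "GET /missing HTTP/1.1" 404 0 "-" "Mozilla/5.0"',
    'not a log line',
]

def summarise(lines):
    """Return (page_hits, requests_per_ip) as Counters."""
    pages, visitors = Counter(), Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip malformed lines instead of failing the whole run
        visitors[m["ip"]] += 1
        if m["status"].startswith("2"):  # count only successful (2xx) views
            pages[m["path"]] += 1
    return pages, visitors

if __name__ == "__main__":
    pages, visitors = summarise(SAMPLE_LOG)
    print("Most visited pages:", pages.most_common(3))
    print("Requests per IP:", visitors.most_common())
```

Skipping unparseable lines rather than raising is a deliberate choice: real logs often contain truncated or unusual entries.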

Issues With The Data?

Lack of usable information.

We need to collect data on our own.

This is a business opportunity for startups.

What Needs To Be Collected?

Good Bad Like Dislike

What we want to know from big data, and from any data that we analyze, is this:

GOOD BAD LIKE DISLIKE

Sentiment analysis
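
The simplest form of sentiment analysis counts positive and negative words. The sketch below assumes a tiny hypothetical English lexicon built around the four words above; it ignores negation ("not good") and sarcasm, so treat it as a starting point rather than a real classifier, which would need a much larger word list or a trained model.

```python
import re

# Hypothetical, deliberately tiny lexicon for illustration only.
POSITIVE = {"good", "great", "like", "love"}
NEGATIVE = {"bad", "poor", "dislike", "hate"}

def sentiment(comment):
    """Label a comment 'positive', 'negative' or 'neutral' by lexicon hits."""
    # Tokenise into whole words so "dislike" is never counted as "like".
    words = re.findall(r"[a-z']+", comment.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

if __name__ == "__main__":
    for c in ["Good product, I like it!", "Bad support. I dislike the new UI"]:
        print(sentiment(c), "-", c)
```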

When Who Where What Why How

When - The @timestamp is important for data analysis.

Who - Anonymity is important, but we need to know whether the user is male or female, and his or her age.

Where - Anonymity is important, but we still need the IP address to know which country, state or county the visitor is from.

What - The operating system and the browser version.

Why - The keywords that led them to us.

How - How they found out about us.
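
Several of these questions can be answered from a single "combined" access log line. The sketch below is a hypothetical helper that turns one line into a record: When becomes an ISO 8601 `@timestamp` in UTC, Where uses the IP, What keeps the user-agent string, Why extracts a search keyword from an assumed `q` query parameter in the referrer, and How keeps the referrer itself. The IP-to-country table is a made-up stand-in for a real IP2Country/GeoIP database. Who (gender, age) cannot come from logs and must be collected separately. In an ELK setup, Logstash's grok filter with the `COMBINEDAPACHELOG` pattern does similar parsing.

```python
import ipaddress
import re
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse

COMBINED_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# HYPOTHETICAL lookup table: reserved documentation ranges given pretend
# country codes. Replace with a real IP2Country/GeoIP database.
IP_RANGES = {
    ipaddress.ip_network("203.0.113.0/24"): "MY",
    ipaddress.ip_network("198.51.100.0/24"): "SG",
}

def country_of(ip):
    addr = ipaddress.ip_address(ip)
    for net, country in IP_RANGES.items():
        if addr in net:
            return country
    return "unknown"

def to_record(line):
    """Parse one combined-format line into a When/Where/What/Why/How dict."""
    m = COMBINED_RE.match(line)
    if m is None:
        return None
    # Apache time format, e.g. "05/Nov/2016:10:15:32 +0800".
    ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
    ref = m["referrer"]
    keywords = parse_qs(urlparse(ref).query).get("q", [""])[0]
    return {
        "@timestamp": ts.astimezone(timezone.utc).isoformat(),  # When
        "ip": m["ip"],                                          # Where
        "country": country_of(m["ip"]),                         # Where
        "user_agent": m["agent"],                               # What
        "keywords": keywords,                                   # Why
        "referrer": ref,                                        # How
    }
```

Normalising timestamps to UTC up front avoids mixing time zones when logs from several servers are combined.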

How To Visualize Our Data

I'm a fan of ELK:

Elasticsearch, Logstash & Kibana

ELK is one of the Big Data tools.

Index The Data With ES

Use Elasticsearch to index our data.

One common misconception: ES is not meant for storage.

Don't use ES to store our data.

Data needs to be archived elsewhere.
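
One way to follow "archive elsewhere, index in ES" is to write every record to an archive file first and only then send it to Elasticsearch, so the index can always be rebuilt. A minimal sketch, assuming a gzip'd JSON-lines archive and a local unsecured node at `http://localhost:9200`; the function names and index name are hypothetical. The body uses the `_bulk` API's newline-delimited format (an action line, then the document). The action line omits `_type`, which Elasticsearch 7 and later no longer use; the 2.x/5.x releases current in 2016 also required a `_type`.

```python
import gzip
import json
import urllib.request

def archive_and_bulk_body(records, index, archive_path):
    """Append records to a gzip'd JSON-lines archive, return a _bulk body."""
    lines = []
    with gzip.open(archive_path, "at", encoding="utf-8") as archive:
        for rec in records:
            doc = json.dumps(rec, sort_keys=True)
            archive.write(doc + "\n")  # the archive is the source of truth
            lines.append(json.dumps({"index": {"_index": index}}))  # action
            lines.append(doc)                                       # document
    # The bulk API expects newline-delimited JSON ending with a newline.
    return "\n".join(lines) + "\n"

def send_bulk(body, url="http://localhost:9200/_bulk"):
    """POST the body to Elasticsearch (assumes no auth/TLS on the node)."""
    req = urllib.request.Request(
        url, data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"}, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

If the index is lost or its mapping has to change, replaying the archive through `archive_and_bulk_body` logic (minus the archive write) re-creates it.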

Kibana

We can use Kibana to view our data in ES.

DKAN

We can store data with DKAN. DKAN is modeled on CKAN.

The open source open data platform with a full suite of cataloging, publishing and visualization features that allows organizations to easily share data with the public.

http://www.nucivic.com/dkan/

Take advantage of the DKAN Datastore API.
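
Querying the Datastore API is a plain HTTP GET. The sketch below assumes a DKAN 7.x-style site that exposes the CKAN-style search endpoint `/api/action/datastore/search.json` with `resource_id` and `limit` parameters; that path, the helper names and the example site are assumptions, so check your DKAN version's API documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

# ASSUMPTION: endpoint path as documented for DKAN 7.x; verify for your site.
SEARCH_PATH = "/api/action/datastore/search.json"

def datastore_search_url(site, resource_id, limit=5):
    """Build a Datastore search URL for one resource (hypothetical helper)."""
    query = urllib.parse.urlencode({"resource_id": resource_id, "limit": limit})
    return site.rstrip("/") + SEARCH_PATH + "?" + query

def datastore_search(site, resource_id, limit=5):
    """Fetch and decode the JSON response (needs network access)."""
    url = datastore_search_url(site, resource_id, limit)
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Separating URL building from fetching keeps the request logic easy to test without a live DKAN site.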

GeoSpatial Is Important

Our data needs spatial information (GPS coordinates).

We can use GeoServer to run our own map server.

http://geoserver.org/
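
A common way to attach GPS coordinates to records is GeoJSON (RFC 7946), which most mapping tools can read. One detail trips people up: GeoJSON stores a position as `[longitude, latitude]`, the reverse of the "lat, lon" order GPS apps usually display. A minimal sketch with hypothetical helper names; the Kuala Lumpur coordinates are approximate and only for illustration. How the data is then loaded into GeoServer (for example through a spatial database) is outside this sketch.

```python
import json

def to_feature(lat, lon, properties):
    """Wrap one GPS reading as a GeoJSON Point Feature (RFC 7946)."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError(f"invalid coordinate: lat={lat}, lon={lon}")
    return {
        "type": "Feature",
        # GeoJSON position order is [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }

def to_feature_collection(rows):
    """Turn rows with 'lat'/'lon' keys into a GeoJSON FeatureCollection."""
    features = []
    for row in rows:
        props = {k: v for k, v in row.items() if k not in ("lat", "lon")}
        features.append(to_feature(row["lat"], row["lon"], props))
    return {"type": "FeatureCollection", "features": features}

if __name__ == "__main__":
    rows = [{"lat": 3.139, "lon": 101.687, "name": "Kuala Lumpur"}]
    print(json.dumps(to_feature_collection(rows), indent=2))
```

The range check catches the most common mistake, swapped latitude and longitude, whenever the swapped value falls outside ±90.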