big data
![Page 1: Big Data](https://reader036.vdocument.in/reader036/viewer/2022062517/5681348a550346895d9b6f85/html5/thumbnails/1.jpg)
Big Data
Anton Boyko
![Page 2: Big Data](https://reader036.vdocument.in/reader036/viewer/2022062517/5681348a550346895d9b6f85/html5/thumbnails/2.jpg)
Agenda
• What is Big Data?
• Why Big Data?
• How to Big Data?
![Page 3: Big Data](https://reader036.vdocument.in/reader036/viewer/2022062517/5681348a550346895d9b6f85/html5/thumbnails/3.jpg)
What is Big Data?
Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time.
Gigabytes
Terabytes
Petabytes
…
![Page 4: Big Data](https://reader036.vdocument.in/reader036/viewer/2022062517/5681348a550346895d9b6f85/html5/thumbnails/4.jpg)
Data growth
Big Data
Volume 10x
Velocity 4.3
Variety 85%
![Page 5: Big Data](https://reader036.vdocument.in/reader036/viewer/2022062517/5681348a550346895d9b6f85/html5/thumbnails/5.jpg)
How to process Big Data?
Traditional way
Appropriate way
![Page 6: Big Data](https://reader036.vdocument.in/reader036/viewer/2022062517/5681348a550346895d9b6f85/html5/thumbnails/6.jpg)
Move data to compute
![Page 7: Big Data](https://reader036.vdocument.in/reader036/viewer/2022062517/5681348a550346895d9b6f85/html5/thumbnails/7.jpg)
Move compute to data
• Fast storage vs. fast CPU and fast networking
• Linear scalability
![Page 8: Big Data](https://reader036.vdocument.in/reader036/viewer/2022062517/5681348a550346895d9b6f85/html5/thumbnails/8.jpg)
Map/Reduce workflow
File system → Mappers (find matches) → Reducers (combine matches) → DFS temp → Mappers (invert keys and values) → Reducer (combine results) → File system
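The workflow on this slide reads as two Map/Reduce jobs chained through temporary storage. Below is a minimal in-memory sketch of that idea, assuming a simple word-matching task. `TwoStageWorkflow`, `RunJob`, and `Run` are hypothetical names, a plain list stands in for "DFS temp", and the ordering of the stages is an interpretation of the diagram rather than something the slide spells out. No Hadoop API is used.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory model of two chained Map/Reduce jobs; not a Hadoop API.
public static class TwoStageWorkflow
{
    // Runs one job: map each record, group the emitted pairs by key, reduce each group.
    public static List<KeyValuePair<TKey, TValue>> RunJob<TIn, TKey, TValue>(
        IEnumerable<TIn> input,
        Func<TIn, IEnumerable<KeyValuePair<TKey, TValue>>> map,
        Func<TKey, IEnumerable<TValue>, TValue> reduce)
    {
        return input
            .SelectMany(record => map(record))
            .GroupBy(pair => pair.Key, pair => pair.Value)
            .Select(group => new KeyValuePair<TKey, TValue>(group.Key, reduce(group.Key, group)))
            .ToList();
    }

    public static List<KeyValuePair<int, string>> Run(IEnumerable<string> lines, string needle)
    {
        // Job 1: mappers find matches, reducers combine them into per-word counts.
        // The resulting list plays the role of the temporary DFS output.
        List<KeyValuePair<string, int>> temp = RunJob<string, string, int>(
            lines,
            line => line.Split(' ')
                        .Where(word => word.Contains(needle))
                        .Select(word => new KeyValuePair<string, int>(word, 1)),
            (word, ones) => ones.Sum());

        // Job 2: mappers invert keys and values, a reducer combines words that
        // share a count; the final ordering is by count, highest first.
        return RunJob<KeyValuePair<string, int>, int, string>(
                temp,
                pair => new[] { new KeyValuePair<int, string>(pair.Value, pair.Key) },
                (count, words) => string.Join(", ", words.OrderBy(w => w, StringComparer.Ordinal)))
            .OrderByDescending(pair => pair.Key)
            .ToList();
    }

    public static void Main()
    {
        var lines = new[] { "error warn", "error info", "warn error" };

        foreach (var pair in Run(lines, "r"))
        {
            Console.WriteLine(pair.Key + "\t" + pair.Value);
        }
    }
}
```

With the sample input, "error" is matched three times and "warn" twice, so the second job lists "error" first. Inverting keys and values is what lets the second job order the results by frequency instead of by word.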
![Page 9: Big Data](https://reader036.vdocument.in/reader036/viewer/2022062517/5681348a550346895d9b6f85/html5/thumbnails/9.jpg)
Map/Reduce – how it works

    using System.Collections.Generic;
    using System.Linq;
    using System.Text.RegularExpressions;
    using Microsoft.Hadoop.MapReduce; // Microsoft .NET SDK for Hadoop

    public class NamespaceMapper : MapperBase
    {
        // Override the map method.
        public override void Map(string inputLine, MapperContext context)
        {
            // Match "using <namespace>;" directives (character class is A-Za-z, not A-za-z).
            var reg = new Regex(@"(using)\s[A-Za-z0-9_\.]*\;");
            var matches = reg.Matches(inputLine);

            foreach (Match match in matches)
            {
                // Just emit the namespaces.
                context.EmitKeyValue(match.Value, "1");
            }
        }
    }

    public class NamespaceReducer : ReducerCombinerBase
    {
        // Accepts each key and counts the occurrences.
        public override void Reduce(string key, IEnumerable<string> values, ReducerCombinerContext context)
        {
            // Write back the key with its count.
            context.EmitKeyValue(key, values.Count().ToString());
        }
    }
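To see what this mapper and reducer compute without a cluster, here is a minimal in-memory sketch that does not use the Hadoop SDK. `LocalNamespaceCount` and its methods are hypothetical names introduced for illustration, and the `GroupBy` call is a stand-in for the shuffle step that the framework performs between Map and Reduce.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical local stand-in for the mapper/reducer pair; no Hadoop involved.
public static class LocalNamespaceCount
{
    // Same idea as the slide's pattern, with the character class written as A-Za-z.
    private static readonly Regex UsingDirective =
        new Regex(@"(using)\s[A-Za-z0-9_\.]*\;");

    // Map step: emit (match, "1") for every using directive found on a line.
    public static IEnumerable<KeyValuePair<string, string>> Map(string inputLine)
    {
        foreach (Match match in UsingDirective.Matches(inputLine))
        {
            yield return new KeyValuePair<string, string>(match.Value, "1");
        }
    }

    // Reduce step: one output per key, holding the number of values seen for it.
    public static KeyValuePair<string, string> Reduce(string key, IEnumerable<string> values)
    {
        return new KeyValuePair<string, string>(key, values.Count().ToString());
    }

    // Map every line, group the emitted pairs by key (the "shuffle"), reduce each group.
    public static Dictionary<string, string> Run(IEnumerable<string> lines)
    {
        return lines
            .SelectMany(line => Map(line))
            .GroupBy(pair => pair.Key, pair => pair.Value)
            .Select(group => Reduce(group.Key, group))
            .ToDictionary(pair => pair.Key, pair => pair.Value);
    }

    public static void Main()
    {
        var lines = new[]
        {
            "using System;",
            "using System.Linq;",
            "using System;",
            "class Program { }"
        };

        foreach (var pair in Run(lines))
        {
            Console.WriteLine(pair.Key + "\t" + pair.Value);
        }
    }
}
```

For the sample lines, `using System;` is counted twice and `using System.Linq;` once; the last line emits nothing because it contains no using directive.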
![Page 10: Big Data](https://reader036.vdocument.in/reader036/viewer/2022062517/5681348a550346895d9b6f85/html5/thumbnails/10.jpg)
Traditional RDBMS vs. Map/Reduce
RDBMS
• Terabytes of data
• Static schema
• Interactive and batch access
• Nonlinear scaling
Map/Reduce
• Exabytes of data (or more)
• Dynamic schema
• Batch access only
• Linear scaling
![Page 11: Big Data](https://reader036.vdocument.in/reader036/viewer/2022062517/5681348a550346895d9b6f85/html5/thumbnails/11.jpg)
Hadoop – implementation of Map/Reduce engine
![Page 12: Big Data](https://reader036.vdocument.in/reader036/viewer/2022062517/5681348a550346895d9b6f85/html5/thumbnails/12.jpg)
Hadoop ecosystem
![Page 13: Big Data](https://reader036.vdocument.in/reader036/viewer/2022062517/5681348a550346895d9b6f85/html5/thumbnails/13.jpg)
Offering
• ODBC for Excel
• PowerPivot
• Windows Server or Windows Azure
• C#, Java, JavaScript
![Page 14: Big Data](https://reader036.vdocument.in/reader036/viewer/2022062517/5681348a550346895d9b6f85/html5/thumbnails/14.jpg)
Demo
![Page 15: Big Data](https://reader036.vdocument.in/reader036/viewer/2022062517/5681348a550346895d9b6f85/html5/thumbnails/15.jpg)
Pricing
Head Node
• Single extra large instance (8 CPU, 14 GB)
• $0.32 per hour
• $238 per month
Compute Node
• One or more large instances (4 CPU, 7 GB)
• $0.16 per hour
• $119 per month
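The monthly figures are consistent with the hourly rates multiplied by 744 hours (a 31-day month): 0.32 × 744 ≈ 238 and 0.16 × 744 ≈ 119. That multiplier is inferred from the numbers, not stated on the slide. Under that assumption, a rough sketch for estimating the cost of a whole cluster could look like this; `ClusterCost` and its members are hypothetical names.

```csharp
using System;

// Hypothetical helper for estimating cluster cost from the slide's hourly rates.
public static class ClusterCost
{
    public const decimal HeadNodeHourly = 0.32m;    // extra large instance, from the slide
    public const decimal ComputeNodeHourly = 0.16m; // large instance, from the slide
    public const int HoursPerMonth = 31 * 24;       // assumption: 744 hours per month

    // Monthly cost of one head node plus the given number of compute nodes.
    public static decimal MonthlyCost(int computeNodes)
    {
        if (computeNodes < 1)
        {
            throw new ArgumentOutOfRangeException(
                nameof(computeNodes), "A cluster needs at least one compute node.");
        }

        decimal hourly = HeadNodeHourly + computeNodes * ComputeNodeHourly;
        return hourly * HoursPerMonth;
    }

    public static void Main()
    {
        Console.WriteLine(MonthlyCost(1));
        Console.WriteLine(MonthlyCost(4));
    }
}
```

Under these assumptions the smallest cluster (one head node, one compute node) comes to $357.12 per month, and a cluster with four compute nodes to $714.24.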