TRANSCRIPT
WHAT BIG DATA CAN DO FOR AN ENTIRE NATION
Robson Serafin
Technical Support Engineer III
Dell EMC
[email protected]
2016 EMC Proven Professional Knowledge Sharing 2
“The great impact is taking place in finding ways to do things differently, where it was never possible
before, giving to these organizations a new potential of innovation driven by Software - BIG DATA.”
(Rodrigo C Gazzaneo)1
In common with business, a nation's government must deal with an "avalanche" of data to make
well-informed decisions. Putting those decisions into practice faster can deliver the right moves and
bring solutions that benefit an entire nation at several levels: economic, educational, cultural, social, etc.
The purpose of this article is to share, in a simple and accessible way, how new Computer Science
technologies such as Big Data and Cloud Computing can assist politicians and leaders in managing
the tasks of addressing issues that are predictable and likely to arise during their administration.
These tasks involve hearing and watching what is happening, correlating it with historical data
through computing-power analysis, and, from the result, taking the right direction to solve problems
and proactively prevent future collapses. As a Brazilian by birth, I base this article on what I see
every day in my country. However, all topics described herein are closely aligned with the events
that surround other societies in general, and the solutions presented and discussed in this article
can safely be applied to any nation with minimal customization.
Many aspects of Big Data and Cloud Computing use will be left out of this document, as they are
huge topics to explore. Nevertheless, it explains how Big Data and the Cloud can assist a nation in
performing routine analysis on a daily basis, and discusses the results. This article will not go deep
into the processing mechanisms or the solutions available.
Disclaimer: Examples of product design published in this document do not necessarily reflect
real government infrastructure.
Table of Contents
Explaining Big Data and Cloud Computing
Tools to handle Big Data
How Data is processed with Hadoop
A little info about the Hadoop Projects
Cloud Computing
Differences Between Traditional Data Centers and Cloud Data Centers
What Big Data can do for a Nation
Computer Science for the society
Nation Macro Statistics
Solutions Overview
Solutions: Overall view
Economy
A cup of joy means higher productivity
Education
Big Data and Cloud working as a service for Education
The value of Big Data in the Education market
Adaptive Learning
Society
What is the gain with Big Data for the Society?
Creating an Intelligent Nation, Community By Community
Culture
Security and National Defense
Health Services
Precision Medicine
Electronic patients medical records
Internet Of Things
Large Amount of Collected Data ≠ Right Collected Data
Science and Researches
Big data in biomedicine
Politics Administration
Infrastructure
Smart Cities
Giant Infrastructure behind a Nation Territory that needs powerful data analysis
Disaster Management
Transparency vs. Corruption
Open Data means more government transparency
Big Data and the Challenges for the Future
First steps to start with Big Data Solution
Few products Available and Architecture examples
Big Data Processing Architecture Design
Storage Products
Conclusion
Disclaimer: The views, processes or methodologies published in this article are those of the
authors. They do not necessarily reflect Dell EMC’s views, processes or methodologies.
Dell, EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries.
Introduction
According to Joseph Stalin's definition, a nation is a historically constituted, stable community of
people, formed on the basis of a common language, territory, economic life, and psychological
make-up manifested in a common culture.2
A nation forms its own public political-administrative system, which can be called a State3, also
known as an organized political community living under a single system of government.
It is in this context that this document was elaborated.
Big Data and Cloud Computing become relevant when one analyzes the following points:
- Large historical content
- Large numbers of people sharing all the content in a common system
- A vast territory of information that in some cases extends abroad
- Tremendous economic factors that power the economy and the exchange of goods with other nations
- Cultural content so huge it is almost impossible to measure, which can dictate several aspects of the nation's formation, some not even known or not well documented
- A highly complex societal organization
- Massive environmental decisions that affect everybody, directly and indirectly
- An education system responsible for passing knowledge to the next generation, which can be considered a pillar of current society
- Military security mechanisms to defend the nation in all sectors (politically, economically, and socially)
- A demanding health system covering all of its people
- An incredibly complex infrastructure to sustain its livelihood
- Interrelationships between communities
The points above are not static information but on-the-fly data, generated every day like a living
"organism" in constant motion. Managing it all has become a huge challenge. Government
employees responsible for different levels of society MUST have accurate data ready and available
at all times to generate the reports that enable the right decisions to be made and, furthermore, to
cross-reference them with past and present reports to predict future actions.
There is no better place to apply Computer Science and Big Data technology than a nation.
Data analysis can provide more detail about its resources and administration in a variety of ways,
from macro to micro views, either isolated or combined with cross-referenced information.
Big Data and other Computer Science strategies cannot replace politics and governance. They
should be seen as a powerful framework that can provide greater transparency and assist in
common decisions for the development and improvement of society.
With great power comes great responsibility!
Politicians are responsible for administering their nation. The use of power by some people affects
the behavior of others in several different ways, some unwanted. This is one of the reasons why
organizing ideas and applying the best approaches, with the fewest possible mistakes in the
analysis of information, is essential. The right information, clearly presented after careful analysis,
can assist governance team members in their daily tasks. The result will be a great place to live
and to interact with others in a common sense of equilibrium.
Explaining Big Data and Cloud Computing
Before we examine where Big Data Analytics and Cloud Computing can help a nation, let's first
give some background on Big Data and the Cloud and how data is processed.
Big Data4 refers to data processing so large and complex that traditional systems cannot handle it
due to limitations on several layers. The main focus of Big Data is on:
- Analyzing content in real time and cross-referencing it with historical information or other data content
- Capturing data in a variety of ways (sensors, the internet, personal devices, etc.)
- Accurate searches
- Sharing content among several sectors and organizations (education, economy, society, etc.)
- Operating on modern storage systems that can process the volume of data and make it highly available anytime, anywhere
- Fast transfer of the data volume across sectors for processing and analysis of the information
- Privacy of content, where only the right people have access
This document will show applications of Big Data and Cloud Computing – some real and some
illustrative – focusing on the public sector and research institutions. A real example of a large
amount of data comes from CERN, the nuclear research facility based in Europe, which generates
40 TB of data per second during a research phase, captured from the many sensors installed on its
appliances.
Another example, much closer to our daily activities, is an Airbus plane, whose turbines can generate
10 TB every 30 minutes – around 640 TB of logs from internal sensors in a single long flight. These
two examples give an idea of how much data is generated on a daily basis and of the scale at which
it must be handled.
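Taking the figures quoted above at face value, the scale is easy to check with a few lines of arithmetic (a quick sketch using only the rates stated in the text):

```python
# Back-of-the-envelope check of the data volumes quoted above.
cern_tb_per_second = 40                       # CERN: 40 TB/s during a research phase
cern_tb_per_minute = cern_tb_per_second * 60  # 2 400 TB accumulates in a single minute

airbus_tb_per_hour = 10 * 2                   # turbines: 10 TB every 30 minutes
flight_hours = 640 / airbus_tb_per_hour       # flight length implied by the 640 TB figure

print(cern_tb_per_minute, flight_hours)       # 2400 32.0
```

Even at these coarse rates, a single minute of CERN operation outweighs what most traditional systems could ingest in a day.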
How big is your data? – David Hellman @ Myriad Genetics
Byte of data: one grain of rice
Kilobyte: cup of rice
Megabyte: 8 bags of rice
Gigabyte: 3 container Lorries
Terabyte: 2 container ships
Petabyte: covers Manhattan
Exabyte: covers the UK (3 times)
Zettabyte: fills the Pacific Ocean
Big Data challenge points, also known as the 5 Vs
Big Data is based on 5 V's: Velocity | Volume | Veracity | Variety | Value.
From these, it is possible to perform innumerable calculations, and to collect and store a huge
volume of data, in order to develop knowledge and directions toward possible results.
Data Science or Data Analytics platforms can be broken down into three distinct parts: Acquisition,
Computation, and Serving. In other words, Collection > Processing > Results.
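The three-stage split can be sketched in a few lines. This is a minimal illustration, not a real platform: the function names and the hard-coded sensor records are invented for the example.

```python
# Minimal sketch of the Acquisition > Computation > Serving split described above.

def acquire():
    """Acquisition: pull raw readings from a source (hard-coded here)."""
    return [{"sensor": "s1", "value": 21.5},
            {"sensor": "s2", "value": 19.0},
            {"sensor": "s1", "value": 22.5}]

def compute(records):
    """Computation: aggregate the readings, here an average per sensor."""
    totals = {}
    for r in records:
        totals.setdefault(r["sensor"], []).append(r["value"])
    return {sensor: sum(vals) / len(vals) for sensor, vals in totals.items()}

def serve(result):
    """Serving: hand the result to a consumer (here, just return it)."""
    return result

print(serve(compute(acquire())))   # {'s1': 22.0, 's2': 19.0}
```

A real platform differs only in scale: acquisition becomes sensor feeds and logs, computation becomes a distributed job, and serving becomes dashboards or reports.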
Tools to handle Big Data5
Nowadays, data is generated everywhere: individuals using smartphones and uploading photos
and videos to social networks; sensor systems spread around the globe (the Internet of Things)
sending precious log information about almost everything; surveillance cameras creating hours and
hours of video; audio recording systems; and so on. Beyond the sheer amount of content generated,
the data can also be unstructured – not created under a relational database structure with tables
and columns – which adds a layer of complexity to processing. Even if it were all structured data,
the amount of information to process is so big that none of the current database applications on
supercomputers could handle it. To sort that out, there is a new range of tools created to handle
these large Big Data datasets, called Hadoop, along with its Projects. As open source software,
Hadoop is offered in numerous distributions such as Hortonworks, Cloudera, MapR, and Pivotal.
Storage companies such as Dell EMC offer products that operate with Hadoop, with a focus on
scalability to handle the data efficiently. I will present some case scenarios with the tools for Big
Data in place.
How Data is processed with Hadoop
Current supercomputers and infrastructure designs present limitations, as they do not scale to such
large amounts of data. To overcome this limitation, Google introduced a new processing model
called MapReduce. Later, Yahoo, inspired by the MapReduce papers and the Google File System,
came up with an open source toolset called Hadoop, with a special file system called HDFS, where
a dataset coming from MapReduce is spread across worker computers to be processed in a
scalable, almost limitless way. The idea behind it is not to work on super-powerful computers, but
on numerous low-cost x86 computers, also known as commodity hardware.
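The MapReduce idea can be shown with the classic word-count example. This is an illustrative sketch only: a real Hadoop job ships the map and reduce functions to worker nodes holding HDFS blocks, whereas here both phases run locally to show the shape of the computation.

```python
from itertools import groupby

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce: sum the counts for each word (after a shuffle-and-sort,
    simulated here by sorting the pairs)."""
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield (word, sum(count for _, count in group))

lines = ["big data for a nation", "big data and cloud"]
print(dict(reduce_phase(map_phase(lines))))
# {'a': 1, 'and': 1, 'big': 2, 'cloud': 1, 'data': 2, 'for': 1, 'nation': 1}
```

Because each map call touches only its own lines and each reduce call only its own word, both phases can be spread across thousands of commodity machines, which is exactly the scalability property described above.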
A basic understanding of how data is processed in Hadoop:
Hadoop is just the framework within which Data Analytics takes place, as it takes care of the
infrastructure layer needed to process a large amount of data. Previously, one needed a huge team
to support the infrastructure while constantly evaluating the need to increase performance. Fighting
the limits of the appliances while working within the IT budget was another source of headaches
for IT.
With Cloud and Big Data framework tools, the infrastructure is simplified and the IT budget can be
directed to the Data Analytics team – also known as the Data Scientists – who use predictive and
prescriptive analytics to create value in the areas requested.
Leverage "BY" Analysis
This is an exploratory technique of examining a strategic entity through its data attributes. The
analysis takes place with:
- Additional data sources
- Additional dimensional entity characteristics
- Additional areas for analytics exploration
Analytics into Action
Deliver analytics-driven scores and recommendations to the key units of the teams involved.
A little info about the Hadoop Projects
Along with a set of tools for Big Data applications, Hadoop has – in addition to its core components
MapReduce and HDFS – additional functionality operating on multiple levels, called Projects.
The most popular Projects and brief descriptions:
Hive: Data warehouse infrastructure providing data summarization, query, and analysis.
HBase: Open source, non-relational, distributed database.
Mahout: Distributed and scalable machine learning algorithms that provide recommendations based
on users' tastes.
Pig: A high-level platform for creating MapReduce programs using a language called Pig Latin.
Oozie: Workflow scheduler system to manage Hadoop jobs.
Flume: Distributed service for collecting, aggregating, and moving large amounts of log data for
online analytic applications.
Sqoop: Tool designed to efficiently transfer bulk data between Hadoop and structured data stores
such as relational databases.
Cloud Computing6
Cloud computing is a new IT model based on on-demand infrastructure – also known as converged
infrastructure – and shared services. With this technology, shared resources, data, and information
are provided to computers and other devices on demand. The main focus is sharing resources to
achieve high performance and economies of scale, similar to the electricity grid; the difference is
that it is offered over the network.
This model provides convenient, on-demand network access to a shared pool of configurable
computing resources – networks, servers, storage, applications, and services – that can be rapidly
provisioned and released with minimal management effort.
The National Institute of Standards and Technology defines Cloud Computing through five essential
characteristics:
On-demand self-service: Resources such as server time and network storage can be provisioned
as needed, automatically, without requiring human interaction.
Broad network access: The system can be accessed by heterogeneous thin or thick client
platforms (e.g. mobile phones, tablets, laptops, and workstations).
Resource pooling: System resources are pooled to serve multiple consumers through a multi-tenant
model, assigned on demand.
Rapid elasticity: Resources can be elastically provisioned and released, allowing them to scale
rapidly outward and inward, all on demand.
Measured service: Cloud systems automatically control and optimize resource use by leveraging a
metering capability at some level of abstraction appropriate to the type of service.
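The "measured service" characteristic boils down to metering usage per resource and charging only for what was consumed. A hedged sketch of that idea follows; the resource names and unit prices are invented for illustration and do not reflect any real provider's pricing.

```python
# Hypothetical per-unit prices for two metered resources.
RATES = {"cpu_hours": 0.25, "gb_stored": 0.5}

def meter(usage):
    """Return the cost per resource and the total for one usage record."""
    line_items = {res: qty * RATES[res] for res, qty in usage.items()}
    return line_items, sum(line_items.values())

items, total = meter({"cpu_hours": 100, "gb_stored": 500})
print(items, total)   # {'cpu_hours': 25.0, 'gb_stored': 250.0} 275.0
```

The key point is that the consumer sees transparent, usage-proportional cost instead of a fixed infrastructure bill, which is what distinguishes cloud economics from a traditional data center.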
Differences Between Traditional Data Centers and Cloud Data Centers
Traditional data centers:
- Aging infrastructure
- High maintenance cost
- High risk
- No strategic focus
- Siloed infrastructure
- End-of-service systems
- Complex support
Cloud data centers:
- Support business demand
- Roll out applications faster
- Improve performance
- Deliver a better end-user experience
- Minimized risk
- High availability
- Ready to go
- Fit strict project timelines
What Big Data can do for a Nation7
Having covered the basics of Big Data, Cloud Computing, and the ways data can be processed,
let's see how they can help a nation.
For any leader, data accuracy leads to more confident decision-making, and better decisions mean
greater efficiency: making the right investments and taking proactive action against future problems.
Analysis of information can reveal correlations. For instance, in the health system it can help
prevent diseases; in the security sector, combat crime; and in education, provide detailed
information on the curriculum applied and reviews to prepare the next generation of leaders, so that
they will already have the education needed to perform their activities with a sense of public welfare.
Computer Science for the society
Computer Science can act as an engine for eradicating poverty and improving quality of life in terms
of better homes, strong education outcomes, and quality health. The only requirement is to make
good use of information, building projects and simulations upon it through Big Data analytics
solutions and their results. Fields and examples from other nations where Big Data can help with
analysis and build more accurate reports for better administration are shown below.
Nation Macro Statistics
This document focuses on Brazilian macro statistics and geographic information. Before we apply
Computer Science and Big Data analysis, let's look at some macro information about the Brazilian
territorial division and geography. Going through the article, you will find areas where Big Data is
already in use, and many more where it could be deployed. The interesting bit is that they all
correlate in one way or another and interact together.
Administrative division8:
- 1 Federation Unit: the institution that rules over the state units
- 26 states in total, managed by the Federal Unit, plus the Federal District
- 5,570 cities
Geographic Information
- 8,515,767.049 km² total area
- 7,491 km of marine coast
- 200+ million inhabitants
- 4 time zones
- 6 weather types (equatorial, subtropical, tropical, semi-arid, tropical Atlantic, and tropical of altitude)
- 8 main vegetation types, all linked to the weather in their area
- 4 main soil types
Solutions Overview9
Economy
Advanced data analysis can help make decisions in real time with efficiency, performance, and
scalability. It allows organizations to achieve conformity, reduce penalties, become more efficient,
understand markets better, predict future performance, and, most importantly, be ready to handle
data diversity and increase loyalty and investment.
Education
Big Data and Cloud Computing services can bring education to a completely new model, combining
educational curricula built on the top aspects of students' profiles with great classroom planning,
studies, and discipline. Furthermore, the information needed to review results and give high-quality
education to everybody will be at hand for everyone.
Security
Big Data will provide real-time camera images that can be analyzed and combined with police
records, enabling an efficient anti-crime police force ready to operate and catch criminals. Cloud
Computing can bring a unique, scalable, high-performance platform of services to police
departments across the country. It also offers analysis for the army, air force, and navy covering
security across the whole territory, preventing unwanted situations.
Health Services
In this area, Big Data can manage the increasing growth of patient data. This allows processing
massive amounts of information in real time to combine records and detect fraud in the public health
service, maintain conformity, accelerate consultations thanks to processing speed, report on several
sub-sectors, develop cost-reduction models, improve service quality, and obtain insight for medical
analysis. Also, Cloud Computing models can offer Application as a Service usable by all hospitals
and clinics administered by the medical community.
Government
The application of Big Data analysis helps create a flexible infrastructure that can manage, protect,
and analyze large amounts of data. This helps successfully address situations of conformity,
budgetary limitation, and economic crisis, even when times are tough and social and economic
problems or natural disasters require quick, effective solutions. Manual processes consume a lot of
resources and are likely to fail; with the use of Big Data, results can be achieved that improve
services, empower small towns, and reduce costs.
Infrastructure
Here, Big Data will help understand and analyze the actual needs of public transport, electricity
generation, disaster management, and more. Learning from problems will help develop a new
strategy for the future with the lowest latency in responding to expected and unexpected
circumstances.
Solutions: Overall view
Economy
Brazil is the second largest economy in Latin America and the 7th in the world. Brazil's economy is
a mixed economy with huge natural resources. Its active economic sectors are agriculture,
livestock, mining, manufacturing, and services. Main export products per area:
- Agriculture: coffee, orange juice, soybeans, ethanol from sugar cane
- Livestock: beef, chicken, and pork
- Manufacturing: aircraft, electrical equipment, automobiles, textiles, footwear
- Mining: iron ore and steel
Agriculture10
One of the main bases of Brazil's economy is agricultural production. Five years ago, an agriculture
company called UTEVA was reaping corn, soy, beans, and wheat, but with each season their
technicians started to collect more than just crops: they also began collecting data on the
company-owned farms, whose territory totals over 3,250 hectares.
UTEVA's operational manager reported that they have over 30 gigabytes of data in reports on soil
mineral analysis, harvest mapping, rain indexes, and physical and chemical ground analysis,
ingested periodically.
Overall profitability after data analysis changed the way the business is run. Nowadays, they confirm
that their harvest and business decisions depend on collected data. All the data, evaluated together,
provided the answers on how to increase productivity and efficiency in the field.
Another important use is having the information necessary to master irrigation techniques,
fertilizing, and genetic engineering. This will allow humankind to increase production in the fields
and reduce dependency on rain periods and the natural characteristics of the soil.
E.g.: aerial mapping for soil quality analysis
In the agro-business area, Big Data is the key to achieving great results, as the information collected
can guide any business toward the correct product mix and estimate growing areas for future
investment.
There are plenty of sensors spread all over, collecting data on air, soil, wind, plant development,
etc., and the level of detail grows exponentially.
There seems to be no end to data analysis in agriculture. Precision Planting, from Monsanto
Corporation, sells and supports a product that, based on collected data, can determine the correct
spacing between seeds and the best planting depth with 99% accuracy for each area analyzed.
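The spacing side of such a recommendation is, at its core, simple agronomy arithmetic: given a target plant population and a row width, the in-row seed spacing follows directly. The sketch below is illustrative only and is not Precision Planting's actual model; the function name and sample figures are invented.

```python
def seed_spacing_cm(target_plants_per_ha, row_width_cm):
    """In-row seed spacing (cm) that yields the target population.

    One hectare is 10 000 m²; dividing by the row width gives the total
    metres of row per hectare, from which spacing follows.
    """
    row_metres_per_ha = 10_000 / (row_width_cm / 100)
    plants_per_metre = target_plants_per_ha / row_metres_per_ha
    return 100 / plants_per_metre   # centimetres between seeds

# Example: 70 000 plants/ha in 50 cm rows.
print(round(seed_spacing_cm(70_000, 50), 1))   # 28.6
```

The commercial products go much further, adjusting spacing and depth per zone from soil and yield maps, but the underlying question they answer is the one computed here.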
Another agriculture company, Stara, produces an appliance that scans corn plantations. The
scanner reads the leaves and can immediately evaluate whether the crop is ready. If it is not, it can
also provide a detailed analysis of soil nutrients, helping correct them with fertilizers where needed
and, in a short period, increase productivity to achieve the year's top harvest results.
In a nation whose main economic source is agriculture, data on the weather, as well as on the
market for its products, is vital for growth. Farmers and breeders must optimize yield, reduce waste,
maintain food safety, and understand environmental impact, supplier interaction, and product
delivery.
Livestock11
Brazil has an estimated herd of over 205 million head, and it is constantly growing. Production per
hectare increased by 25% over the last 10 years, beef production by 38%, and exports by 731%.
Even with high technology and integrated livestock-agriculture-forestry systems, pasture area
decreased by 2%, which resulted in some gaps in data collection on field distribution.
In 2010, 80% of the nearly 40 million head slaughtered went straight to the internal market, giving a
per capita consumption of 37.4 kg, while 20% were exported to over 180 countries.
The beef production process comprises a wide variety of stages, involving both highly capitalized
farmers and small producers supplying meatpacking plants with high technological standards. They
are fully able to meet all external demand, as the slaughterhouses must meet health legislation
requirements.
Types of beef: lean, grain-fed, certified, grass-fed, and marbled.
Processing plants operate in several countries: Brazil, Argentina, Uruguay, Paraguay, Chile, United
States, Australia, United Kingdom, France, Netherlands, Italy, and China.
The modern Brazilian beef industry is responsible for over US$ 5 billion in exports and one million
jobs.
The amount of data requiring processing in this area is enormous. Areas where Big Data analysis
can improve efficiency:
- In supply chains, items can be individually tracked.
- Broadcasters can analyze how viewers react to shows on social media.
- Retailers can build customer demographics by collecting details and analyzing them at scale.
- Animal monitoring, using collars that track cows' activity and detect changes in their behavior
(remember, we are talking about 205 million head if all are registered in the system). The
information collected from the collars can trigger alerts telling farmers when cows will yield the
most milk or need a vet.
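The collar trigger can be pictured as a simple anomaly check: flag any cow whose activity today deviates sharply from her own recent average. The sketch below is illustrative only; the threshold, cow ids, and activity readings are invented, and a production system would use far richer models.

```python
def flag_anomalies(history, today, threshold=0.30):
    """Return the ids of cows whose activity today deviates more than
    `threshold` (as a fraction) from their mean over the history window."""
    flagged = []
    for cow, readings in history.items():
        mean = sum(readings) / len(readings)
        if abs(today[cow] - mean) / mean > threshold:
            flagged.append(cow)
    return flagged

# Hypothetical step counts over the last three days, and today's reading.
history = {"cow_17": [820, 790, 805], "cow_42": [610, 640, 625]}
today = {"cow_17": 810, "cow_42": 410}   # cow_42 is suddenly far less active
print(flag_anomalies(history, today))    # ['cow_42']
```

At 205 million head, even this trivial per-animal rule becomes a Big Data problem purely through volume, which is exactly the point of the collar example.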
The analysis of these data can lead to an incredible boost in farm productivity and completely
change the traditional industry. Projects and data analysis are key factors contributing to profit, as
they are present in every step of production, thus contributing to efficiency and consequent overall
growth.
A cup of joy means higher productivity!12
According to Ron Shani, CEO of AKOL (Agricultural Knowledge Online), their Big Data platform lets
farmers know exactly what to do to take care of the harvest – when and how to do it – to extract the
best out of their fields, even if all they need to do meanwhile is drink a cup of coffee in the morning.
From data analysis performed with farmers in Serbia, they noticed a clear relation between drinking
coffee and rural production: the farmers who did not drink coffee in the morning were not as
productive as those who drank a cup before starting their daily chores.
Chinese authorities signed a contract with AKOL to start using a technology called "Cloud
Agriculture" for farmed fish. Through several sensors spread over the lakes, AKOL's system can tell
fish farmers when to clean the lakes, when to feed the fish, and how much food to use.
AKOL systems already operate on grape vineyards, crops, fish farms, the chicken industry,
livestock, apiculture, milk production, and numerous kibbutz and moshav agricultural platforms.
The sensors are distributed and installed in trees, vineyards, fields, cows' collars, milk extraction systems, automated food systems, and so on. All data is registered, including environmental temperature, humidity, animal food consumption, soil, and plantation data, including "Integrated Smart Pest Control".
One example applies to dairy producers: analysis can recommend extra liquid intake for the cows during a very hot day to guarantee high milk production, or offer special food supplements to prevent a pest or bacterial disease identified in the region. All the products are based on Microsoft Azure and offered on a SaaS (Software as a Service) model.
Big Data not only offers a massive raw productivity boost but also touches on cultural questions, analyzing workers' behavior in each area. After this analysis, it can make recommendations tied to worker satisfaction, such as giving them a cup of coffee before they go to work in the field. Previously, analysis at that level of detail, with results grounded in cultural information, was not possible; only Big Data algorithms can process the cultural information as well.
The idea behind AKOL is simplification where possible and a fixed price as a service, so it is not directed only at big companies but at all producers; the application was created for smartphones, thus simplifying access. All data collection is processed in cloud systems, and the results are then forwarded to the farmers.
Access to accurate information is incredibly useful not only for the producers but for the entire export mechanism, especially to the European Union. As agro-business became a global business, there are several questions and concerns about the source of commodities sent from remote locations. All regulations about the amount of pesticides used in production, the kinds of fertilizers, the kinds of food and plants given to animals, the milking model used, the storage of eggs for chicken products, and even country laws about slave labor need to be available in a few clicks.
The system also includes full compliance with Good Agricultural Practices (GAP), specific methods which, when applied to agriculture, create food for consumers or further processing that is safe and wholesome. Each product receives an ID card that contains a detailed history of production, from genetic details to delivery, with documentation at each stage of the process.
Large agro-business companies state that Big Data will certainly be responsible for the next agricultural revolution. Assistance in processing all the information collected will be the ground where this revolution takes place.
Other National Production
Brazil has also appeared to be a potential future player in the Oil & Gas industry. As natural Oil & Gas reserves worldwide have their countdown running, the news that Brazil's undersea Oil & Gas reserves could be bigger than Russia's paints a new picture for the future. However, there is a big problem, and Big Data is the new technology available to solve it: several local universities are dedicating their analytical scientists to developing programs to analyze tons of data about the soil in 3D models.
Economic growth is not based only on product extraction, sale, and profit. It is much more complex than that, and it also requires constant research and efficient production with the lowest waste, delivering the right product to the right place at the right time. Normal analytic calculation cannot correctly analyze all the items involved in this formula, nor combine historical information from the last decade. The data involves other sets of complex information such as rain periods and quantities, mineral and fertilizer supply, the best temperature for each growth period, solid logistics, and even the prediction of potential natural disasters.
Education13
Big Data and Cloud working as a service for Education
Do you want to know how a student studies at home? What are their preferences? How do they learn? In the 2014 World Cup, Germany won the championship and everybody asked what the formula for success was. The answer was simple: combine long-term planning, discipline, good players, and lots of information about what happens on the field.
The same formula applies to Education. The only thing needed is to combine good teachers, dedicated students, great classroom planning and studies, discipline and, above all, lots of information to achieve the expected results with high quality. The question is: how to collect this information?
The new customized learning platforms built on cloud design allow teachers to have all their material organized. One example: there is no longer a need for flash drives, as most of the materials used are uploaded to the cloud. Homework is self-corrected online. From student interaction with the online material, teachers may evaluate student behavior online, measure the content accessed and the material involved, and see whether questions received answers and what instructions were given to get them answered.
Moreover, the evaluation of teachers is possible as well, assessing whether the content presented meets the criteria and whether the dialogue among the students meets expectations. All concerns about what teachers offer, how this can affect the learning process, and the activities performed by the students can all be subjects of Big Data. This is Big Data serving Education.
Big Data makes it possible to understand students' aspirations. The hybrid teaching service, where students can have a mix of in-person classroom and online remote sessions, is growing exponentially in Brazil and worldwide. It allows a collaborative learning scenario where customization is possible, open to receiving input in a number of new ways.
Based on the information collected, we can learn and answer several questions, such as: Did students learn? Did the material presented interest the students? Which areas need more explanation? Analyzing previously stored data will make it easy to see all the differences in the learning process and create action plans. All this opens a wide range of opportunities that, with the proper tools, will help to develop new education standards and also customize learning processes.
The value of Big Data in the Education market
Research shows that the correct use of data analysis can provide a huge benefit to teachers,
government, and students. A university adopted a system, called the Early Warning System, to work on the proactive side rather than the reactive. It collected several variables from the students, such as academic history, grade achievement, class attendance, homework, time taken for correction, internal electronic academic material consumption, etc.
To their surprise, they found a pattern of behavior that warned them of a student's success or failure in the course taken. They came up with an action plan that included an advice program aimed at helping students find different ways to study in order to succeed. Not only did this benefit students as they succeeded in their courses, it also increased the number of students per teacher and boosted the university's finances.
In another study, students with low grades were also the students with the worst team relationships, and the lowest-graded students related mostly to other low-graded students, while students with high marks had much stronger relationships with classmates. From these results, teachers could work to promote new interactions between high-mark students and low-grade students.
Not all studies have reached a conclusion on behavior or root causes, but they brought some particular situations to attention. For example, another school with online e-books collected data about how many times e-books were accessed, how many pages were read, whether pages were skipped randomly, how many times students went back and forth, and whether there were marks in the text.
This is interesting. For some reason still under investigation, all the students that marked texts in the e-books got the worst grades. From our own life experience, we have always assumed that the more someone marked a text, the smarter this person was. However, that was not the case here. So, the
questions one may ask are: is the material not good enough? Or is the order of the chapters affecting the understanding?
Big Data's new way of analyzing Education is building a new educational model and breaking down old-fashioned ones. This is a completely new market for education, not only in Brazil but worldwide.
EXPECTED EDUCATIONAL CHANGES TILL 2020 WORLDWIDE!
Adaptive Learning14
Some US companies specialized in data analysis have created an auto-adaptive method of education based on the student's profile. The engine behind it analyzes the student and provides content in different ways as the student advances in the discipline. This model is called Adaptive Learning, and it applies to almost all disciplines.
Adaptive Learning and Data Mining are converging toward a common point where it will be possible to explore the correlation between learning and content. To adapt to users, content will be generated in pieces in different formats, such as text, video, and audio.
To understand what is happening, a US company, Knewton Software, is following individuals as they improve and comparing them to other students in the same discipline, semester after semester. The patterns in the compared results can measure levels of difficulty, format, interaction type, teachers involved, classroom type, etc., and build specific conclusions.
An example of the result could be described like this: analysis of student X could show that he achieves higher productivity when the subject studied, regardless of difficulty, is presented during morning classes through videos with recently graduated teachers. The practical point is that content will be automatically adapted to the student's profile. All this information comes from the analysis of data gathered from the study conducted on the students.
Another example: student Y had difficulties with discipline Z during his third year and got bad grades on final exam W. That analysis allows teachers to increase their effort on that specific subject before the third year. In practice, the software will generate reports with alerts and warnings for every calculated pattern.
Society (Social livelihood)15
Big Data for improved diagnosis of social conditions
In any nation, the distribution of wealth gives a macro view of poverty. However, it is almost impossible to assess and analyze the real distribution. The following case is based on an example from Brazilian society. The government presented a distribution of wealth in the country showing that 10% of the people were rich and 90% lived at different levels of poverty. These reports are created using nationally representative household surveys, which require labor and time and are conducted at long intervals, sometimes every 4 years. To fully understand the distribution of wealth, the first step is to have current, real poverty-distribution maps.
To analyze wealth distribution in society, we can make use of some known information such as:
> A set of income thresholds that can vary by family size and composition
> An income-based method created by some government programs to help people with low
incomes
> A consumption-based method to measure what households actually spend
Moreover, there is also a new model that helps to evaluate society based on mobile phone accessibility and use, which can create a large volume of data on social interactions, mobility, and more. Correlating that information with the other models, governments can deploy more accurate actions based on real social conditions.
The power of mobile phone call data records (CDRs)16 is an immeasurable source of information. CDRs allow a view of the communication and mobility patterns of people at an unprecedented scale. Such maps can facilitate an improved diagnosis of poverty and also assist public planners in initiatives with appropriate interventions, specifically at the decentralized level, to conduct poverty eradication and, as a consequence, ensure a higher quality of life.
Big Data analysis can also include gender, the urban/rural gap, or ethnic/social divisions. The accuracy of the poverty maps can assist in policy planning for inclusive and sustained growth of all sections of society. As mentioned earlier, the analysis gets much more accurate as more socio-economic indicators are added. Mapping call data records, mobility, and economic activity is just the beginning of the data collection.
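A first step in CDR-based mapping is aggregating raw call records into per-region activity indicators that a poverty model can consume. The record schema and the indicators below are assumptions for illustration, not an actual operator's CDR format.

```python
# Illustrative CDR aggregation: summarize raw call records into per-region
# call volume, mean call duration, and unique callers. Field names are
# invented for the sketch.
from collections import defaultdict

def region_indicators(cdrs):
    """Summarize CDRs into simple per-region activity indicators."""
    calls = defaultdict(int)
    seconds = defaultdict(int)
    callers = defaultdict(set)
    for r in cdrs:
        region = r["cell_region"]
        calls[region] += 1
        seconds[region] += r["duration_s"]
        callers[region].add(r["caller_id"])
    return {
        region: {
            "calls": calls[region],
            "mean_duration_s": seconds[region] / calls[region],
            "unique_callers": len(callers[region]),
        }
        for region in calls
    }

cdrs = [
    {"caller_id": "a", "cell_region": "north", "duration_s": 60},
    {"caller_id": "a", "cell_region": "north", "duration_s": 120},
    {"caller_id": "b", "cell_region": "south", "duration_s": 30},
]
print(region_indicators(cdrs))
```

Indicators like these, correlated against survey data where it exists, are what let researchers extrapolate poverty estimates to regions between survey years.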
What is the gain with Big Data for Society?17
The high deployment speed of Big Data solutions and all its implications are directly related to the benefits it produces for the modern society that uses data analysis. Research conducted by the Economist Intelligence Unit in 2012 with business executives whose companies had annual profit above US$500 million confirmed that 70% of these executives reached their objectives based on data analysis, while 45% believed that they could get even greater results if they had more access to Big Data.
If the analytic applications used in corporate business have shown extraordinary results, the public sector at all levels of government can obtain excellent results too. According to a US report, the public health system estimates a gain of over US$300 million annually. On that basis, a governmental initiative worth mentioning is the Massachusetts Big Data Initiative. The mission of the Massachusetts state project is to become the world leader in innovative Big Data solutions. Two years after its creation, the report presented is impressive and inspiring: with strong public incentives, the initiative involved 500+ companies at all stages, from startups to corporations, working in several areas such as applications, analytical tools, and data system management, as well as vertical business solutions (e.g. manufacturing, energy, retail, etc.).
Over US$200 million was applied to education for researchers and the teaching of Big Data training courses, and from that, over 5,600 professionals are being educated to implement analytical solutions that improve productivity and assertiveness in the decisions of companies and public institutions. The state project's scope has stimulated private investment in companies in the state, helping to create a virtuous circle and achieve the project's mission.
In Brazil, the gold rush (here, a "data rush") to analyze the past, predict the future, and take better decisions to improve business processes or increase the return on public investment is just starting. The results obtained so far clearly show Big Data's value for society.
Creating an Intelligent Nation, Community by Community (From I-Canada)18
An Intelligent Community Development Plan includes:
> A collaboration and process framework and governance model
> Core i-community platforms
> Benchmarks
> Sustainability planning
> Measurement models to track progress and returns on investment
Meeting Community Goals
In every community, different priorities can come up while the transformation is an ongoing process. To become an Intelligent Community, i-Community proposes a few programs that focus on the community's inherent skills and advantages. Some communities, like Singapore, have a strong manufacturing sector, and logistics is important to them. Others, such as Stratford in Canada, are building new digital media capacity based on their heritage as a creative theatrical community. Each should have its own design and application of the Intelligent Community plan, flexible but with a defined target.
Culture19
The cultural sector can join Big Data as well. At the moment in the UK, there is the audiencefinder.org web page, a free national audience data and development tool that assists cultural organizations in understanding, comparing, and applying audience insight. This web page and a few others take an approach based on cultural policy, aggregating information about cultural consumer behavior as well as the allocation of public funding and measurements of impact. The value is a two-way exchange, regular and honest, between funder and funded.
The tools used in the analysis compare behavior across demographic characteristics. The idea is to use elements of big-data-type approaches to rethink components of traditional decision-making, and data-driven approaches to drive insight and change behavior.
Big Data comes into place by analyzing social network commentary about a performance to show the connections made about the subject, the public reached, etc. Cross-referencing such analysis requires bringing data scientists from other, non-cultural fields into the sector to explore the needs and build the capacity.
That analysis benefits not only performance shows but also artists like painters or writers who need their work released and want feedback on it.
Few people in the sector would say that data analysis and art can't, or even shouldn't, be mixed. Around the world, artists are working with data in amazing ways, and people are using dashboards to track their social media feeds or drops in popularity. Assuming that the foundation of raw materials is strong and the analytics robust, data-driven decision-making could be a key element in increasing artistic impact and commercial resilience, both for individual organizations and for the sector as a whole.
Experience, creativity, and necessity are a powerful combination. Looking for new ways, such as
sentiment or semantic analysis, to measure aspects of artistic impact could also be an important
new tool for the cultural sector. The measurability of everyday life is growing at an amazing rate.
The developing expectations of audiences for personalization and the levels of service provided by
digital companies such as Amazon demonstrate that data is already a key tool for the cultural
sector.
For the Rio 2016 Olympic Games, there is an online portal available to register popular artists and literature projects, giving exposure to new writers and poets from the periphery. The registration will allow the organization team, called "Celebra", to map and distribute the artists and traditional festivals, with typical Brazilian foods from various regions, during the Games, so visitors can have much more interaction with the mixture of cultures that forms this nation.
Security and National Defense20
The public security sector is seeking new, efficient technologies worldwide to support its operations. All institutions that work in this area have one or more complex structures dedicated to intelligence agencies.
Currently, investigation schemes are divided into activities to receive, store, and process data from all sorts of sources (structured and unstructured), such as individual national registrations, social network profiles, vehicle records, and audio files, combining them to offer accurate results.
Adding complexity to this huge data lake, it is necessary to understand the links between these records and how they are geographically distributed. To achieve all this data analysis and obtain efficient results, intelligence agencies are working with powerful Big Data tools that enable all the information processing.
From the fiction movie "Minority Report" (2002) to the reality of security defense, Big Data is the new technology helping police departments in cities, states, and countries of all sizes to match individual behavior and prevent crimes, or at least provide proactive analysis that police can use against the crimes most likely to happen in a given area.
At the moment, there is a trial solution in place at the London Police Department, in which the large volume of data available on the Internet is the target of public security investigation. The solution is so effective that it is raising several concerns about limits on data collection and the ethical use of monitoring systems by authorities, to avoid misuse of the information.
Accenture developed the solution used by the police department. It takes information from individuals' Facebook pages and other data from various applications to match information and generate a risk-assessment report. According to Accenture's public security chief, the software does not predict actions; it just directs analysis and presents results for people with a high-risk combination. It is a tool designed to help the police work more efficiently.
The tests used data collected over 20 weeks, covering 5 years of historical information. In this test, the software combined information on 32 London criminal groups over 4 years and generated the probability that those people would commit new crimes. The results were compared to the incidence of crimes during the 5th year to evaluate the software's precision, and it proved to be efficient.
Big Data analysis cannot reduce crime by itself, and it should be used carefully to prevent misuse of power by the authorities. For example, misuse of information can be unfair when it targets a given group of people, classifying them as potential criminals, says the director of the NGO Big Brother Watch, which fights for the protection of civil liberties.
Image from: http://www.thewrap.com/steven-spielberg-hiring-godzilla-writer-for-minority-report-tv-series-exclusive/
In Brazil, an agreement signed between the Secretary of Public Security of Sao Paulo state and Microsoft promised that over the coming year they will make use of data analysis to help the police by warning them of criminal pattern matches. Similar solutions are offered in Spain and Singapore: the Spanish police force is using applications to identify areas with potential crimes, while in Singapore the software generates reports based on video monitoring of crowded places, street traffic, and regional commemorative events.
The company PredPol developed intelligent crime-mapping software which is in use in 12 US cities, the UK, and Uruguay. Making use of large amounts of database information, it estimates the days and hours at which crimes are most likely to happen in certain areas. According to an FBI report, the experiments are successful; for example, in Santa Cruz the system helped local police reduce assaults by 19% and in-the-act crimes by 24% in the areas indicated on the map.
Until now, traditional crime mapping could only point out areas where issues occurred in the past, while PredPol can use historical data and the analysis of new information to build a potential map of future criminal actions, according to researcher Jeff Brantingham of the University of California, Los Angeles, a co-founder of the tool. The PredPol system can predict around 200% more crimes than any current method used by police, according to reports from Los Angeles and Kent, England.
Karin Breitman, from EMC R&D in Rio de Janeiro, says that data prediction is not something new; it has been in use by the private sector for a few years, but now, due to price reductions and much greater capacity for data collection and processing, data analysis is expanding exponentially into several public sectors such as security.
The processing and combination of massive amounts of data through mathematical algorithms can reveal patterns in how people move and point out potential risk zones.
According to Carlos Tunes, IBM Brazil Big Data Executive, the use of this new technology in the security field is helping authorities work with data and dynamically generate real-time reports based on video camera images, social networks, and several sensors from IoT devices. The solution is able to look at someone and evaluate not just that specific circumstance but a whole set of information in context, as in the 2016 Olympic Games, where data analysis showed its power in preventive and reactive measures.
At the beginning of this article, one can read "With great power comes great responsibility". This sentence comes from a manager at the Technology and Society Center of FGV-Rio University. Brazil is facing growing privatization of the public security sector and a lack of basic principles on the limits of citizen surveillance, which is a bad mixture when it comes to Big Data analysis. According to this manager, a Big Data solution can bring surveillance assistance that benefits society, but for that to happen this monitoring must have limits proportional to its requirement in the real cases where it is necessary. It doesn't seem correct to monitor the entire society to catch a group of individuals.
In Sao Paulo city, a Domain Awareness System known locally as "Detecta" will integrate several databases from several public security sectors, indexing them and classifying them against predefined models. As the security surveillance cameras will also be integrated into the system, it will be able to send warning alerts based on identified criminal patterns. That will allow a better distribution of police across the city, in areas where criminal situations are most likely to happen, says Glauco Carvalho, commander of the Sao Paulo Military Police.
At the moment, the system contains data from the Transit Department and the Military Police Department. The commander gives an example: a basic search for someone in the "Detecta" system will point out all vehicles registered to that person and any criminal record that person had in the past, including a photo if one was already registered by the police. Another interesting alert, already in place, is related to abnormal movement: for example, if a large number of people run in one direction, the system immediately sends an alert to the Police Department's Operational Center, which can check the cameras around the area to validate a potential need for police.
Security in the Air21
There is huge development of systems and devices for the aerospace sector with IoT and Big Data. This allows the monitoring and data analysis of satellites, airplanes, and any flying device, which can produce potential failure or attack reports based on historical and real-time software and hardware data collection.
Several countries are already studying and building automated airplanes, which will fly predefined routes with no pilot, totally managed by a computer system. It sounds crazy, but it has been shown that in recent decades almost all civilian airplane accidents were caused by a wrong human reaction to the circumstances. These wrong reactions occur at all levels, from part replacements that sensors flagged but mechanical teams neglected, to wrong manual commands issued by pilots.
How can we guarantee that security information about individuals is not stolen or misused?
Continuous auditing mechanisms are needed for those who can access government data, with full monitoring of their activity. Those audit reports can be analyzed and related to potentially malicious context in external information: unauthorized travel, changes in credit classification, investments in new startup companies, etc. All these data collections could feed an auditing Big Data analysis to identify risks among the individuals who handle other people's personal information.
The Splunk22 system can monitor all these kinds of IT data, matching abnormal behavior and correlating it to other on-demand external data sources inside or outside an office. It can not only point out potentially malicious action by an individual but also help to distinguish an accidental policy violation.
Health Services23
Taking care of people is highly intensive and involves a huge number of variables. Imagine monitoring hospitals, doctors, clinics, etc. Sao Paulo city alone has 12 million inhabitants, 42 hospitals, and 120+ thousand registered doctors. If we look at the entire state of Sao Paulo, the numbers get bigger: 645 cities, 42+ million inhabitants, and 881 hospitals (data extracted from the 2011 census done by the Secretary of Health Services of Sao Paulo state).
The health system in Brazil is divided into two main streams: the Public Service and the Private Service. Let's concentrate on the Public Health Service24 to show a possible Big Data and Cloud application. Controlling all the expenses of the Public Health Service and applying analysis will first require the collection and registration of all the medical institutions in a single portal. From this registration, each patient that uses the Public Health system will automatically appear in the system with the date/hour of use, and any procedure performed or medical product used can be linked to the patient record. This will be a huge relational database, and from a macro perspective this information can clearly indicate real product consumption. Furthermore, if it is compared to hospital purchase invoices, it will work as an anti-fraud tool for the health service.
The data collected is not limited to patients and resource consumption; it extends to the people involved, such as nurses and doctors, the cases attended, the specializations required, the times of high volumes of people requesting assistance, the areas where more doctors should be allocated, etc. The combinations are just incredible.
The use of Big Data is bringing a full IT transformation through analytics, improving patient care, eliminating waste, and coordinating treatment and care plans. These changes can bring new information about health system administration, transparency, and the responsibilities of health service providers, as well as important patient data for disease researchers.
The 3 main areas of the Health Service where Big Data can work:
> Precision Medicine
> Electronic patient medical records
> Internet of Things
Precision Medicine
Most scientific knowledge is still based on large averages. For example, a recent analysis of stroke cases showed that the use of new oral anticoagulants reduces the risk of strokes and systemic embolic events by 19%. But the average doesn't mean every patient's risk was lowered by 19%: for some people the risk reduction was effectively 100% (they did not have the stroke), while for others it was 0% (they had the stroke anyway).
That means the new oral anticoagulants lowered the chance of strokes in the population as a whole, but the average does not show for whom the drug worked, nor whether some combination of factors produced the results. In the tests, in a group of nearly 30 thousand patients, almost a thousand had a stroke even with the medication.
Who are the people for whom it did not work? What other cause could be involved? Maybe they were women above 60, or had a particular ethnic background, or had smoked their whole lives, or lived in an area where the air could be contaminated by an industrial chemical product, etc. The reality is that nobody knows.
The objective of precision medicine is to have much more accurate information, with a full registration of the patient who will take the medicine, compared against other databases such as doctor records in an attempt to find patterns. That will not bring 100% precision, but it can double the efficiency of the analysis, and the number of lives saved can increase greatly.
Electronic patient medical records
It is critically important that clinicians, staff, and patients have information, tools, and resources at their fingertips at all points of care. Unfortunately, patient records in several countries, including Brazil, are still far from the ideal model. At the moment, there are patient records for each unit, completely isolated from the others; in some cases they are still kept on paper. Paper records are very hard to update, transfer, or easily comprehend. Furthermore, these records are badly stored due to a lack of proper management.
There is a strong movement to unify the patient record and scan all current paper records to make them electronically searchable. Big cities are already using online applications that can be accessed by other health services, which makes consultations much faster as a precise data history is at hand.
Unifying the record and using it all over the country would result in a phenomenal improvement in the entire health system. It should reduce the time taken to fill out forms and queue for consultations, and it will also help in disease research and preventive diagnoses. The online unified patient record is already in use in the UK and confirms the real benefits mentioned here.
Internet Of Things (IoT)
This is probably the most anticipated item in the Big Data world, as information is collected from almost everything, allowing precise, combined analysis. In the health system, the possibilities are incredible. For a medical caregiver, a patient's vitals and behaviors can be constantly monitored, which increases the effectiveness and efficiency of treatment. Another example is wearable devices that can report real-time heart analysis and, in the case of a potential incident, alert the person and direct them to the nearest hospital.
In the epidemiologic field, analysis can identify diseases at early stages based on patterns, or spot potential virus vectors and affected areas within a very short period, alerting local authorities to take immediate action.
Machine learning
In the data analytics area, there is already a growing segment devoted to creating algorithms that automatically process data and, based on the result, take the next action without human intervention.
A current example of machine learning is decision trees, which can be used for classification or regression depending on the variables. In Brazil, an analysis was performed to predict why a town might have an Infant Mortality Rate (IMR) below the national average (14.7 deaths per 1,000 live births).
Two extra variables were added: the proportion of women with more than 7 prenatal consultations, and the illiteracy rate, both from the year 2010. The period selected was 2008 to 2012.
For this regression-tree analysis, they used the rpart package from R. The source code used is shown below.
# Load the example dataset used in the analysis
ML <- read.csv("https://sites.google.com/site/alexandrechiave/mlexemplo/mlexemplo.csv")

# Recode the 0/1 outcome into descriptive labels and convert it to a factor,
# which rpart requires in order to build a classification tree
IMR <- ML$IMR
IMR[IMR == 0] <- "IMR below"
IMR[IMR == 1] <- "IMR above"
IMR <- factor(IMR)

prenatal <- ML$prenatal     # proportion of women with more than 7 prenatal consultations
illiteracy <- ML$illiteracy # illiteracy rate

install.packages("rpart")
install.packages("rpart.plot")
library("rpart")
library("rpart.plot")

# Fit the tree on the two predictors, plot it, and save the plot to IMR.png
model.rpart <- rpart(IMR ~ prenatal + illiteracy)
rpart.plot(model.rpart, type = 0, extra = 2, varlen = 10)
png("IMR.png")
rpart.plot(model.rpart, type = 0, extra = 2, varlen = 10)
graphics.off()
Without human intervention, the algorithm identified two predictive split points, also known as the nodes of the tree:
Proportion of women with more than 7 prenatal consultations above 67%
Illiteracy rate lower than 8.1%
The graph below shows that, making use of the two variables, the algorithm identified the correct position of the towns; the national average is 64.9% of cases (3,610 of 5,565).
The most popular machine learning methodology still shows some limitations, due to over-fitting and a possible increase in the number of spurious associations. Nevertheless, scientists expect to have these solved in the near future.
Large Amount of Collected Data ≠ Right Collected Data
Determining the relative importance of quantity versus quality of data is not an easy task. People can be divided into three groups:
Individuals without statistics knowledge
Individuals with low statistics knowledge
Individuals who work with statistics
The first group believes that the solution to research problems is to increase the amount of data collected (in an election, for example, they believe polling mistakes were caused by too few people being surveyed).
The second group believes the opposite of the Big Data approach: they think that a large amount of data causes incorrect analysis results due to sampling problems.
The third group knows that biased samples have always occupied a good part of scientists' time.
It is true that Big Data results may not represent the reality of the population, because the data sampling may not come from all population layers. For example, data from smartphones and wearable devices will come in great part from individuals in the medium-to-high economic classes. The same may occur with medical records, as not all health professionals will know how to use them.
Some traditional methodologies are being incorporated into Big Data in an attempt to solve the sampling issues. One of them is to weight each individual according to their group's representation in the population evaluated.
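One traditional methodology of this kind is post-stratification weighting. The Python sketch below, with illustrative population shares and a deliberately biased sample, shows how re-weighting pulls a sample estimate back toward the population:

```python
# Known population mix (illustrative) vs. a biased sample, e.g. data that
# comes mostly from smartphone-owning, higher-income individuals.
population_share = {"high_income": 0.2, "low_income": 0.8}
sample = [("high_income", 1.0), ("high_income", 0.9),
          ("high_income", 0.8), ("low_income", 0.2)]

def weighted_mean(sample, population_share):
    counts = {}
    for group, _ in sample:
        counts[group] = counts.get(group, 0) + 1
    n = len(sample)
    # Each respondent's weight: population share / sample share of their group.
    weights = {g: population_share[g] / (counts[g] / n) for g in counts}
    total = sum(weights[g] for g, _ in sample)
    return sum(weights[g] * value for g, value in sample) / total

naive = sum(value for _, value in sample) / len(sample)
adjusted = weighted_mean(sample, population_share)
```

The naive mean of the biased sample overstates the value; weighting each respondent by population share over sample share corrects for the over-represented group.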
Science and researchers
"If the bee disappeared off the surface of the globe then man would only have four years of life left. No more bees, no more pollination, no more plants, no more animals, no more man." - Albert Einstein
Big Data visualization reveals several worldwide changes in animal and insect behavior. Scientists are aware of climate change and know there are several changes in the environment, but some of them, like changes in bird migration, are so dramatic that they raise concerns such as: Is migration increasing? Is temperature the root cause? Or rain? Are there correlations that point to climate change?
The power of Big Data and the new models of analysis will allow scientists and researchers to provide accurate information on global warming, disease dissemination, deforestation, and plague occurrences at a level everyone can understand. Anyone will be able to join the fight to prevent bad things from happening and make a difference.
Current isolated research models are inefficient: while some subjects are heavily analyzed, others are left behind, and they just might all be related. The idea behind the new technologies is to provide government with tools to build an online catalog to which all universities that perform research programs could upload information. Authorities can study this information thoroughly and, as a result, apply investment in the area faster.
Suppose there are four education centers studying a particular disease that is spreading all over the country. The research team members could be joined together and the investment for that research distributed among them. There was a scenario like that in Brazil, starting at the end of 2015: Brazilian authorities reported high dissemination of the Zika virus all over the country, but lack of integration between the health system and research centers resulted in a massive delay in containing the epidemic.
In the example mentioned above, let us emulate the use of Big Data and new IT technologies such as Cloud Computing. Using cloud technology, a single system managed by the government health service could allow hospitals all over the country to register patients with their symptoms, and the collected data could trigger alerts to local authorities when a high volume of cases matches specific symptoms. Having that information immediately available allows authorities to make decisions quickly and apply resources where needed as immediate prevention, while other resources could be directed to universities with matching studies in the area.
This sounds like basic, simple information, but if other information were ingested into the system, such as the size of the affected cities, quality of life, economic information, and mapped travel flows, it could produce a rich and detailed report covering several points of a preventive system: increasing garbage collection in the areas where most cases were identified, immediately advising the public system to deliver leaflets containing a mosquito-combat action plan, and so on. While mosquito growth takes place within a few days, Big Data analytic reports would take hours, resulting in a fully proactive action plan to combat the epidemic at its earliest stages.
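The core alert rule of such a system can be sketched very simply. The Python example below is hypothetical: the city names, the watched symptom profile, and the threshold are all illustrative assumptions:

```python
# Watch profile and threshold are illustrative assumptions.
WATCH_SYMPTOMS = {"fever", "rash", "joint_pain"}
ALERT_THRESHOLD = 2  # matching cases per city before alerting authorities

def cities_to_alert(cases, watch=WATCH_SYMPTOMS, threshold=ALERT_THRESHOLD):
    """cases: list of (city, symptom list). A case matches the profile when it
    shows at least two watched symptoms; alert when matches exceed threshold."""
    matches = {}
    for city, symptoms in cases:
        if len(watch & set(symptoms)) >= 2:
            matches[city] = matches.get(city, 0) + 1
    return sorted(city for city, n in matches.items() if n > threshold)

cases = [
    ("Recife", ["fever", "rash", "headache"]),
    ("Recife", ["fever", "joint_pain"]),
    ("Recife", ["rash", "joint_pain"]),
    ("Manaus", ["cough"]),
]
alerts = cities_to_alert(cases)
```

In a real deployment, the extra information mentioned above (city size, travel flows, and so on) would refine both the matching rule and the threshold.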
Big data in biomedicine
Genomic data is being translated into treatment. Nowadays scientists are putting together massive volumes of information coming from genome sequencing projects, patient records, and laboratory research. This data brings a new era of alignment between technology and medicine, called "precision medicine", that can deliver treatment tailored to individual needs.
The challenges faced in the past are no longer a problem thanks to new analytical tools that can process millions of gigabytes of data, bringing clear answers to medical questions.
In parallel with genomic biomedical analysis, smartphones and other wearable devices are generating continuous flows of health data from a large number of people.
This data analysis allows a much more detailed understanding of a disease. Numerous research
organizations are assembling cloud-based 'information commons' to standardize, store and share
the data.
Political Administration25
According to McAfee and Brynjolfsson (MCAFEE and BRYNJOLFSSON, 2012)26, decisions should be made based on data analysis, and those will be the best decisions ever taken.
Making use of Big Data, administrators' decisions will be based on evidence rather than intuition. It has been shown that the more an organization bases its decisions on data analysis, the greater its operational and financial results. The same applies to any government that adopts Big Data in its administration. The results should show an incredible improvement in country management overall, providing solutions now and simulating reactions for the future.
Infrastructure – Smart Cities
Environment
Google Earth is a powerful tool that is helping Brazilian scientists track deforestation in the Amazon Forest. Unfortunately, researchers' thinking does not seem to be wide enough, as that is just a dot in the picture of what it can do.
Let's widen this picture: not only animated thin blue lines and rivulets showing which wind is blowing around the globe; on a closer look, orange dots point out fires, while a thick haze of red boxes can highlight poor air. The NGO Global Forest Watch designed a tool in the late 90s to map and help track down illegal forest fires and provide up-to-date information on where deforestation has occurred. The map uses satellite data to track forest change and provide information on forest fires around the world. It also tracks the total amount of forest cover on Earth.
New research on medical treatments using compounds extracted from the Amazon Forest is also a target for data analytics, where the concern is not only investment or profit but sustainable extraction and production overall, with no further forest destruction. Analysis of soil, water, and plant distribution, and the mapping of areas, are the keys to large-scale production with minimum impact on nature.
Smart Cities27
Smart cities are cities or communities that make use of IT and communication technologies to improve their public services, reduce costs, and improve contact between citizens and the government.
How is that possible? Big Data Analytics is the answer.
The Smart City concept gained attention over the last few years of global urbanization. Back in 2014, 54% of the world population lived in urban areas, with a growth rate of 1.84% year-over-year projected until 2020; this automatically triggers a greater need for social services.
To achieve those objectives, it is necessary to build mechanisms to collect data and process it to get correct results at the same level and volume as the data created by the exponential growth of the population.
Expertise in public management, engineering, architecture, and urbanism makes total sense when applied to the data generated by society. This is where Big Data Analytics comes in! It is the technological way for governments to understand, classify, and make correct use of the big sets of data generated by digital social media.
Examples of cities already using Big Data
Barcelona
Barcelona is an international reference for a smart city using Big Data in several ways. Using smartphone apps used by tourists, city management can control people flow by organizing police patrols in the area, for daily routine or special occasions. Streets have light/metal sensors to detect available parking spaces and direct drivers. This also helps urban mobility teams understand the patterns of vehicle flows and parking places. There are also working sensors measuring air temperature, humidity, pollution, and noise.
Singapore
In 2014, Singapore created the Smart Nation Plan, which uses Big Data to build an efficient transport system; through sensors they can detect traffic congestion and map car positions, offering better routes that avoid the affected areas via GPS. They are currently studying a new GPS-based system to charge tolls when vehicles use restricted zones. The idea is to pinpoint the exact car location, find out the distance traveled in the high-density traffic area, and charge per usage. Approximately over a million vehicles send position data and are charged on a daily basis. The system can also learn a vehicle's daily route and estimate the charges, or suggest alternative routes with different prices and travel times.
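The charging logic described, distance traveled inside a restricted zone priced per usage, can be sketched as follows (the zone names and rate are illustrative assumptions, not Singapore's actual tariff):

```python
RATE_PER_KM = 0.50  # illustrative price per kilometre inside a restricted zone

def toll(track, restricted, rate=RATE_PER_KM):
    """track: list of (km_in_segment, zone_name) from GPS position data.
    Only kilometres driven inside a restricted zone are charged."""
    return rate * sum(km for km, zone in track if zone in restricted)

# A day's travel: two segments cross the hypothetical "cbd" restricted zone.
track = [(3.0, "suburb"), (2.0, "cbd"), (1.5, "cbd"), (4.0, "suburb")]
charge = toll(track, {"cbd"})
```

The same per-segment data would let the system learn daily routes and quote alternative routes with different prices.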
London
The capital of the UK is also investing in Big Data solutions to improve the public transport system. This includes transport-card data collection combined with underground utilization, maintenance routine schedules, and people's habits to evaluate routes and estimate usage.
Another very interesting use of Big Data is 3D maps showing the cabling distribution, used to schedule maintenance with estimated times and precise interventions.
The Smart City concept is only beginning; the idea is to have it in every city in the world. The examples of intelligent cities are still isolated, so the concept is not yet in politicians' administration plans and sometimes appears only as an add-on to government campaigns several years later.
Giant infrastructure behind a nation's territory that needs powerful data analysis28
Electrical power generation is a main discussion point in any society. In Europe, many countries base their power generation on natural gas that comes from Russia through a long-distance pipeline system. Information on gas flow, pressure, temperature changes, failure monitoring, etc., already represents a huge volume of data to be taken care of. Other European countries make use of nuclear power plants, which can produce several terabytes of data just for monitoring.
Using Brazil as an example, the main electrical power sources are water dams, also called hydropower stations, whose analysis requires combining weather information, as rain is the most important source for the dams. Back in 2014/2015, due to climate change and the "El Niño" natural phenomenon, the water dams went so low that it caused a high alert, with potential power cuts all over the country.
What to do with Big Data in this scenario?
Collecting data about electrical distribution, city population growth, rain periods, and water-volume changes in the dams could provide a powerful report on how the government regulatory organization could work proactively to prepare the electrical system by combining more efficient power sources.
Today, not only Brazil but other countries too can see that they need to develop new ways to generate electricity and the next generation of power sources. At the moment, several power stations are being built on farms to make use of animal waste that produces natural gas not previously used. Houses with solar power systems installed on the roof are also helping to increase electricity production.
Analysis of several geographic locations based on sunlight combined with weather showed that several cities can produce a huge amount of electricity to power industries and businesses, the main consumers of electricity during the day, while overnight they could make use of the hydropower stations. That method could proactively reduce dam use during periods with low rain estimates. Analysis of home, industry, and business electrical consumption during the day and night is made possible by sensors installed on buildings that send real-time data to the electrical agency that controls power distribution.
Another category of power source is wind farms, located where the wind flow has been measured and a large amount of electricity can be produced efficiently.
Every minute, a wind turbine's sensors record the wind speed and the turbine's own power output, and every five minutes the information is dispatched to high-performance computers that may be 100 miles away, such as the one at the National Center for Atmospheric Research (NCAR). Artificial intelligence software crunches the data from the turbine along with data from weather satellites, weather stations, and other wind farms. As a result, it provides wind power forecasts of unprecedented accuracy, lowering energy costs.
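The first step of that pipeline, turning per-minute turbine readings into five-minute dispatch batches, can be sketched in Python (the turbine data here is made up for illustration):

```python
def five_minute_batches(readings):
    """Group per-minute (minute, wind_speed, power_output) readings into
    five-minute batches carrying average wind speed and total power output."""
    groups = {}
    for minute, wind, power in readings:
        groups.setdefault(minute // 5, []).append((wind, power))
    return {
        batch: {"avg_wind": sum(w for w, _ in rows) / len(rows),
                "total_power": sum(p for _, p in rows)}
        for batch, rows in groups.items()
    }

# Illustrative telemetry from one turbine (minute, wind in m/s, power produced).
readings = [(0, 8.0, 1.2), (1, 9.0, 1.4), (4, 10.0, 1.6), (5, 7.0, 1.0)]
batches = five_minute_batches(readings)
```

These compact batches are what would be dispatched to the forecasting computers, rather than the raw per-minute stream.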
Smart wind and solar power: Big Data and artificial intelligence are producing ultra-accurate forecasts that will make it feasible to integrate much more renewable energy into the grid.
Developing efficient power generation from renewable sources, combined with correct distribution and innovative electrical systems, can reduce, and potentially totally deactivate, nuclear power plants, reducing the risks of contamination.
Disaster Management29
Big Data analysis of weather is also key to catastrophe prevention and provides important information for proactive measures. Sensors spread across the country and worldwide, plus satellite data collected every minute, can produce an accurate map of possible catastrophes caused by tornados, floods, or earthquakes, combined with pre-mapped risk areas, so that rescue teams can be prepared in advance with the workforce to attend those areas.
Analyzing Disaster Big Data to Support Disaster Prevention with Timely and Accurate
Forecasts
Modern sensors installed on hardware appliances are constantly measuring seismic movements, and the reports are analyzed in real time by geological centers using highly accurate computer calculations. Government agencies in charge of disaster prevention used to strive to issue evacuation plans in a timely manner based on the information analyzed. Nowadays they can make decisions quickly, especially during the early stages of a disaster, helping countries evacuate areas at risk of tsunamis with advance alert warnings.
Natural disasters have been increasing in intensity over the past few years. Rainfall has increased in some places around the globe, and heavy rain and local torrential downpours cause concern among local authorities about future disasters, as they face serious damage such as floods and landslides every year during rainy periods.
An example of a real application of Big Data analysis in disaster prevention:
Fujitsu has created a solution that can predict and estimate conditions in areas where no sensors are installed, providing an advanced, next-generation disaster-prevention solution.
The technology enables one-dimensional sensor information to be expanded into two-dimensional information through simulations that use big data on past rainfall during floods along with topographical information.
Technology for Estimating the Occurrence of Disasters from Social Media Data
The technology uses natural language processing to gather comments that include disaster-related keywords, such as "flooding" and "inundation". Using a hearsay-elimination technique based on a probability model and machine learning, hearsay is removed from the collected comments by categorizing them into information based on sightings and observation, direct hearsay, and indirect hearsay. Fujitsu analyzes comments about train stations, crossroads, landmarks, and other elements in order to estimate the specific location of the disaster.
System for Estimating the Occurrence of Disasters
Tests carried out in Japan using real social media data from a flood in August 2012 showed detection of the disaster with 80% accuracy.
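A heavily simplified version of this filtering and categorization step might look like the Python sketch below; the keyword list and hearsay markers are illustrative stand-ins for the real natural-language-processing and probability models:

```python
# Illustrative stand-ins for the real NLP and probability models.
KEYWORDS = {"flooding", "inundation"}
HEARSAY_MARKERS = ("i heard", "someone said", "apparently")

def classify(comments):
    """Keep only comments containing a disaster keyword, labelling each as a
    direct sighting or as hearsay to be treated with less confidence."""
    labelled = []
    for text in comments:
        lower = text.lower()
        if not any(keyword in lower for keyword in KEYWORDS):
            continue  # no disaster keyword: discard the comment
        kind = "hearsay" if any(m in lower for m in HEARSAY_MARKERS) else "sighting"
        labelled.append((kind, text))
    return labelled

comments = [
    "Flooding at the station right now, water over the platform",
    "I heard there is inundation near the crossroads",
    "Nice weather today",
]
labelled = classify(comments)
```

Sighting reports would then be matched against landmarks to estimate the disaster location, while hearsay is down-weighted or discarded.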
Mathematical Optimization for Simulating Floods
Flood-forecasting simulation technology developed by the Public Works Research Institute predicts changes in the amount of river flow during rainfall.
The forecast program divides the country into cross-sectional squares of 500 by 500 meters and, beyond rain precipitation, also accounts for rainwater infiltration by soil type and discharges into rivers.
The technology automatically adjusts and optimizes parameters, minimizing errors between simulated discharges and measured discharges by applying mathematical optimization algorithms to a flood-forecasting simulator.
Mathematical optimization finds the best combination of parameters with a small number of calculations. It is essential to use the optimization algorithm that best fits the simulation model.
Mathematical Optimization
Rainfall and Comparison between Actual and Simulated Discharge
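As a toy illustration of this parameter fitting, the hypothetical Python sketch below (far simpler than the real simulator) chooses the runoff coefficient that minimizes the squared error between simulated and measured discharge:

```python
def simulate_discharge(rainfall, runoff_coeff):
    """Toy model: discharge is simply proportional to rainfall."""
    return [runoff_coeff * r for r in rainfall]

def sum_squared_error(simulated, measured):
    return sum((s - m) ** 2 for s, m in zip(simulated, measured))

def fit_runoff(rainfall, measured, candidates):
    """Grid search: return the candidate coefficient with the smallest error
    between simulated and measured discharge."""
    return min(candidates,
               key=lambda c: sum_squared_error(
                   simulate_discharge(rainfall, c), measured))

rainfall = [10.0, 20.0, 5.0]
measured = [6.0, 12.0, 3.0]  # consistent with a true coefficient of 0.6
best = fit_runoff(rainfall, measured, [c / 10 for c in range(1, 10)])
```

The real simulator optimizes many interacting parameters at once, so it needs smarter algorithms than an exhaustive grid, but the objective (minimizing the gap between simulation and measurement) is the same.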
Transparency vs. Corruption30
Transparency vs. corruption is a big problem worldwide, affecting almost every nation at all layers of society and business.
Big Data cannot end corruption, but it can bring much more transparency to the political-administrative system.
How to achieve this
As described above, corruption can occur through the act of an office-holder or government employee. A Big Data analysis application can be used as a simple solution by performing real-time monitoring of public-sector individuals involved with internal and external contracts, dynamically reviewing and analyzing local exchanges, bank account operations, and purchases. It can also be integrated with banking institutions in other countries, building a full diagram of their business operations.
This approach requires not only an analytical schema but also new legal mechanisms to support the monitoring technology and systems used.
As part of the Big Data design, it can operate at several levels of corruption, collecting data for much more accurate results and auditing. For example, it can be deployed at city level to capture all contracts signed by the local public sector and also to follow each contract's execution, such as a road construction.
Here is an example:
1 – A town needs a road built to connect two hospitals, due to their specialties and patient exchange, to avoid the currently congested roads during peak hours.
2 – The city politicians create a bidding announcement with the requirements for the road, such as size, construction schedule, and the estimated budget available.
3 – Construction companies that have the infrastructure skills for that kind of operation apply, attempting to win the bid for the new road with the lowest price.
4 – Not only the lowest price is involved in the deal, but also the materials used, warranty, time to complete the construction, local job positions, reports, etc.
5 – Currently, in several countries, the construction company that wins the contract is monitored only at the start and at a few stages pre-scheduled in the contract.
6 – From start to finish of the construction, there is a lack of accurate reporting by the public team following the work.
Here, a Big Data system could be used by the construction companies and local city hall officials together to report daily activities, which could be recorded by public or private CCTV cameras and be available online with a few "clicks". Local citizens could also watch and monitor the work performed, documenting it with a mobile app linked straight to a Data Lake system that updates the delivery schedule. On top of that, accurate reports and analysis of weather conditions or major changes can also be recorded, providing an up-to-date roadmap for the construction schedule.
If more accurate information is required, all invoices detailing material purchased from third-party companies can be made available online for full matching control. It sounds crazy, but computer systems could do this through analytical queries in Big Data: calculating the kind of cement used in the mixture with sand and rubble to validate that the amount bought was delivered and used to fill the estimated pavement area. All this data could be made available for any person in the community to evaluate and confirm whether the procedure was correct. That is full and clear transparency for members of society.
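The invoice-matching check described above reduces to a simple reconciliation. The Python sketch below uses made-up figures (the road dimensions, delivered volumes, and tolerance are all illustrative assumptions):

```python
TOLERANCE = 0.10  # flag discrepancies above 10% for auditing (illustrative)

def required_volume_m3(length_m, width_m, thickness_m):
    """Concrete volume needed to pave the contracted area."""
    return length_m * width_m * thickness_m

def invoiced_volume_m3(invoices):
    """invoices: list of (description, cubic_metres_delivered)."""
    return sum(m3 for _, m3 in invoices)

def flag_discrepancy(required, invoiced, tolerance=TOLERANCE):
    return abs(invoiced - required) / required > tolerance

# Hypothetical 1 km road, 7 m wide, 20 cm pavement, vs. invoiced deliveries.
required = required_volume_m3(1000.0, 7.0, 0.2)
invoices = [("concrete batch A", 900.0), ("concrete batch B", 250.0)]
suspicious = flag_discrepancy(required, invoiced_volume_m3(invoices))
```

A flagged mismatch does not prove wrongdoing; it simply routes the contract to auditors and makes the discrepancy visible to the community.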
Unfortunately, current models are far from this new analytical model built on Big Data. The leaders of a nation are fully dependent on several layers of administrators who can be corrupted, accepting money for paperwork changes, without a proper auditing solution. Even invoices get lost due to a lack of proper archiving and accurate indexing of data.
It is impossible to stop corruption entirely, but new application models and Big Data analysis systems can make corruption much harder to commit.
Transparency brings more credibility to a nation and, in turn, more economic investment for internal industry growth.
Open Data means more government transparency!
There are several projects related to "Open Data", based on a series of worldwide government initiatives to meet society's demands for transparency and efficiency in controlling public spending. One of the best-known initiatives is the Open Foundation Protocol, which aims to deploy an open standard protocol allowing efficient access to useful data for analysis of public policy actions, community actions, and operational insight inspection, all done by Big Data analytics systems.
An example of movement toward this new initiative comes from the Brazilian government, which is deploying several projects to centralize public data in government data centers and offer query access through web portals.
To achieve this new model of transparency and security, the public sector requires an innovative computer system that ingests, processes, and archives data at high speed and also makes it available at all times. Due to the amount of data to be ingested and managed, the infrastructure must be highly scalable, both to store the data and to make it available for analytical queries. The solutions that can accommodate this requirement are Cloud Computing and Data Lake systems for Big Data analytics.
The results for society and government are fully positive, as the time to look up information is reduced drastically. Furthermore, the solution is scalable, so new systems can be added as required for expansion. Big Data analytical tools can work in parallel, generating reports within minutes, showing real transparency to society and reducing waste of public money. Having that analysis at hand can help public agents make decisions with high accuracy and the fewest failures and unwanted expenses, prove correct expenses with documentation, find breaches that could lead to potential corruption schemes, audit every sector properly, etc.
Big Data and the Challenges for the Future
Several sectors, from private and public institutions, have already raised a big challenge for Big Data in the near future: privacy. The risk of confidential information being stolen and published will become more and more real. Scientists' awareness and stricter security measures are so far the best approaches to this challenge. Other methods, such as data encryption and data access restricted by the requester's clearance level, are also being studied by researchers.
Of course, we will see scandals about private data leaks due to negligence, procedural failures, or hacked systems, but scientists should always look ahead, evaluating the risks and applying corrections to these situations. The population should be informed of the huge benefits in time, money, and lives saved that Big Data analytics brings to society.
Big Data seems to be arriving at the right time due to two main factors: 1) pressure from society to have public results published faster; 2) affordable computer technology available for statistical analysis.
First steps to start with a Big Data solution31
1) Identify the opportunities to apply the Big Data
a. Understand objectives
b. Show what is possible
c. Identify the right goals
d. Prioritize and choose the beginning project
2) Clear Proof of Value
a. Deliver project result in analytics
b. Show measurable valuable points
A few available products and architecture examples
Traditional IT data centers are inefficient, hard to scale, and very hard to manage and support, creating a terrible layer of complexity for working with Big Data. Additionally, they have several limitations and performance problems that prevent easy scalability.
As mentioned throughout this document, Big Data analytics alone cannot help a nation much. It requires infrastructure and a parallel solution like Cloud Computing to facilitate implementation and the scalable growth of data processing and management.
Basic steps to start with
Identify which kinds of data will be stored
Data classification and retention levels to be applied
Data storage model with high availability, fault tolerance, and easy scalability
Archiving solutions to work in parallel with the storage models
Scalable processing systems to offer historical and real-time processing of Big Data
Application types to be used for high-volume processing
Development Team
Development and Quality Assurance System
Production System
Support Team
Data Scientists and Analytical Teams
For the public sector, for security reasons, the ideal platform is a private cloud with centralized and highly secure measures, but at the same time flexible content management, as different levels of the organization will have different access to the information.
For example, "top security" classified content, such as the year's agriculture production achievements, should be processed so that only top-level leaders see the results, and stored in separate areas with extremely limited, controlled, and audited access. Meanwhile, information about weather, education, and culture is classified as "informative" content and should be available to the public in real time.
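This split between classification levels can be sketched as a simple clearance check; the level names and records below are illustrative assumptions, shown here in Python:

```python
# Illustrative classification levels, lowest to highest clearance.
LEVELS = {"informative": 0, "restricted": 1, "top_security": 2}

def visible_records(records, clearance):
    """records: list of (classification, payload). A record is visible only
    when the requester's clearance is at or above its classification."""
    rank = LEVELS[clearance]
    return [payload for level, payload in records if LEVELS[level] <= rank]

records = [
    ("informative", "weather report"),
    ("informative", "education statistics"),
    ("top_security", "agriculture production results"),
]
public_view = visible_records(records, "informative")
leader_view = visible_records(records, "top_security")
```

In a real private cloud this check would be enforced by the platform's identity and access management, with every access logged for auditing.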
Below is an illustrative model of a private cloud data center with its Big Data infrastructure.
Based on the political administration model used in this article from the Brazilian national government (Federal > State > City), the architecture design could be:
Federal Unit
1) The owner of the Data Center facilities where Storage and Cloud servers will reside.
2) Responsible for the infrastructure
a. Facilities Distribution
b. Electrical Power
c. Cooling System
d. Communication
e. Scalability
f. Hardware provisioning for Storage and Processing
g. Maintenance
h. Security
3) Deployment of Private Cloud Solution
4) Management of Cloud Services
Cloud Data Center Site Facility Topology distribution
This topology example distributes Cloud converged infrastructure equipment across data
centers. Workloads are managed separately, since each data center serves users based on
their physical location in the territory; for example, users in the north use data center A, while
users in the south mostly use data center D. All data centers are interconnected into a single
solution, and each site can act as Disaster Recovery for another or perform offline data
processing, depending on utilization.
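The region-to-site routing and disaster-recovery pairing described above can be sketched as follows; the region names, site letters, and failover pairs are assumptions for the example.

```python
# Primary data center per region, plus a disaster-recovery pairing so
# each site can take over for another (names are illustrative).
PRIMARY_SITE = {"north": "A", "northeast": "B", "center-west": "C", "south": "D"}
DR_SITE = {"A": "D", "B": "C", "C": "B", "D": "A"}

def site_for(region: str, failed_sites=frozenset()):
    """Route a region to its primary site, or to its DR pair on failure."""
    primary = PRIMARY_SITE[region]
    return DR_SITE[primary] if primary in failed_sites else primary
```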
Each data center has a full set of Cloud, Storage, and Big Data appliances installed, running
independently from the others and providing high security and resilience for the stored data.
Here is a summary of DELL EMC, VMware, and Pivotal products that cover the entire Cloud
data center infrastructure:
> DELL EMC Storage - Enterprise-class storage products to meet all Big Data application
performance, scalability, availability, data protection, and security requirements.
> VMware Virtualization - Virtualization and platform orchestration for multiple environments
without the need for additional specific hardware.
> Pivotal HD - Open source enterprise Hadoop distribution that provides Apache Hadoop and
related project features plus Pivotal’s value-added extensions and analytics support.
> Pivotal Big Data Suite - Full suite of integrated technologies to easily create data-driven
applications that meet any data processing and advanced analytics requirement at scale.
Full Overview
Private Cloud Model
The private cloud model provides exclusive infrastructure services to a single organization
comprising multiple business units. In this case, each division of government administration is
considered a business unit.
The services offered include self-service and multi-tenancy for the states and their cities, with
the ability to provision virtual machines and platforms as required and to change computing
resources on demand. The model also offers Big Data products as a Service to the business
units. The whole service can be controlled through chargeback tools that track computing
usage and charge each unit only for the resources consumed. That may sound strange, since
the Federal organization already used taxpayer money to build the architecture; from another
perspective, however, each unit pays back charges that can be reinvested in developing the
system and qualifying personnel.
The main idea behind the Cloud system is a centralized operational design offering data
protection and services across all government units.
There are two variations of the private cloud model:
1 - On-premise private cloud, hosted by an organization within its own data centers.
2 - Externally hosted private cloud, hosted outside the organization and managed by a
third-party organization.
DELL EMC and VMware offer the VBlock solution for this Cloud architecture design.
VCE VBlock
VBlock is a full range of pre-integrated servers with shared storage, network devices,
virtualization, and management, all tied together for easy scalability. It was created to offer
extreme efficiency in Hadoop deployments.
This solution is easily deployed, meeting immediate needs for on-demand architecture scalability.
Big Data Processing Architecture Design
DELL EMC offers the Greenplum architecture, where data is automatically partitioned across
multiple 'segment' servers and each segment owns and manages a distinct portion of the
overall data. All communication goes through the network interconnect, with no disk-level
sharing or contention to worry about; this is also known as a 'shared-nothing' architecture.
MapReduce integration allows developers and DBAs to run both MapReduce and SQL in
Greenplum’s parallel data flow engine, enabling analytics on petabytes of data.
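To make the shared-nothing idea concrete, here is a minimal single-process sketch in Python: rows are hash-partitioned into segments, each segment counts its own slice independently (map), and the partial counts are merged (reduce). This only illustrates the pattern that Greenplum parallelizes across physical segment servers; it is not Greenplum code.

```python
from collections import Counter

SEGMENTS = 4  # stand-ins for Greenplum segment servers

def partition(rows):
    """Hash-distribute rows so each segment owns a distinct slice of the data."""
    buckets = [[] for _ in range(SEGMENTS)]
    for row in rows:
        buckets[hash(row) % SEGMENTS].append(row)
    return buckets

def map_reduce(rows):
    """Each segment counts its own slice (map); partials are merged (reduce)."""
    partials = [Counter(bucket) for bucket in partition(rows)]
    total = Counter()
    for partial in partials:
        total += partial  # merge step: no shared disk or state needed
    return total
```

Because each bucket is processed independently, the map step could run on separate machines with no coordination until the final merge, which is exactly what makes the architecture scale.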
Pivotal HD for Big Data processing
The architecture in the diagram below shows Pivotal HD handling real-time, interactive, and
batch processing in a single Hadoop platform.
Storage Products and Data Lake capability for high volume of data ingestion and future
growth
SAN and NAS storage array architectures were not designed to store or protect data at large
multi-petabyte capacity levels.
This new era of massive content storage requires an architecture based on an object storage
model, one that not only stores object data at these staggering capacity levels but does so at a
manageable cost point.
DELL EMC Isilon
A pillar of Big Data appliances, the Isilon appliance is a NAS storage platform with multi-protocol
support. It includes Hadoop support that eliminates inefficient storage silos and provides a
first-class security system and impressive speed.
Elastic Cloud Storage Appliance (Storage as a Service)
It is a solution for large-scale geo-distributed storage. A powerful hyper-scale, geo-distributed
object and HDFS storage platform, ECS can efficiently store billions of objects while delivering
data anywhere to any device. It also enables geo-scale analytics and multi-cloud APIs that
seamlessly connect to public clouds. It scales from petabytes to exabytes and beyond.
Conclusions
The goal of this article is to highlight several points where new technologies like Big Data and
Cloud Computing can help leaders and politicians improve daily tasks during their
administration.
From the Big Data examples shown, leaders can build their own strategy, following paths
already in use by other nations, to evaluate the present issues in today’s society and, from
there, develop plans to address them one by one efficiently.
While Big Data is a data analysis methodology enabled by recent advances in technology and
architecture, Cloud Computing offers the fastest, most secure, and most robust path to a Big
Data implementation.
Big Data analytic tools can deliver far more efficient, results-driven action plans at low cost. The
benefits are not limited to cost reduction in the sectors evaluated by the tools; they also include
preventive measures that can be applied in the shortest time. A better Nation means a better
future for the generations that come after us. It also means a better world for everybody to live
in, with better approaches to sustainability that minimize the impact on the planet’s natural
resources while still generating goods for commerce.
This article has covered the main (though not all) topics of a Nation’s daily routine that involve a
growing population, expanding use of natural resources, improving productivity, sharing
knowledge, and efficiency. The IT field’s focus is no longer on local infrastructure such as
servers or familiar local storage performance issues; it is a whole new layer of IT infrastructure
where resources are granted on demand and the processing power for huge volumes of data
analytics is fully distributed across hundreds or thousands of machines.
Nations or sectors of society that have not yet adopted a Big Data strategy can start by
engaging data science teams and making the tools mentioned here readily available to them
through Cloud services. As you may have noticed, the solutions available today can be ordered
and be ready to use within a few weeks. Data scientists will then be responsible for designing
the applications and analytic systems that analyze all the data that can be ingested.
A great deal of the information needed to start the analysis is already available from sensors,
mobile devices, and other sources, and it is essential for building a clear, up-to-date statistical
picture of a Nation.
Big Data and Cloud Computing are a whole new area of the IT industry, and they are only at the
beginning. New ideas and new areas of application will come soon.
From these statistical results, not only leaders but also others in society will be able to
contribute to action plans that benefit society as a whole.
Appendix – List of Abbreviations
App – Application
BDE – Big Data Extension
BI – Business Intelligence
CCTV – Closed-Circuit TV
CDR – Call Data Record
CRM – Customer Relationship Management
DB – Database
DCA – Data Computing Appliance, AKA “Greenplum”
DCN – Data Center and Cloud Networking
EDW – Enterprise Data Warehouse
ERP – Enterprise Resource Planning
FGV-Rio – Faculdade Getulio Vargas Rio de Janeiro (Getulio Vargas University, Rio de Janeiro)
GemFire – In-memory distributed data grid
GPS – Global Positioning System
HAWQ – Parallel SQL query engine from Pivotal
HDFS – Hadoop Distributed File System
HR – Human Resources
IMR – Infant Mortality Rate
IoT – Internet of Things
IT – Information Technology
JDBC – Java Database Connectivity
MADlib – Big Data machine learning in SQL for data scientists
MapReduce – Programming model for parallel processing based on map and reduce operations
MPI – Message Passing Interface
MPP – Massively Parallel Processing
NAS – Network Attached Storage
ODBC – Open Database Connectivity
OLAP – Online Analytical Processing
OLTP – Online Transactional Processing
OS – Operating System
PC – Personal Computer
RFID – Radio Frequency Identification tag
RDBMS – Relational Database Management System
Rpart – Recursive Partitioning and Regression Trees
SaaS – Software-as-a-Service
SAS – Statistical Analysis System
SAN – Storage Area Network
Footnotes - References
1 Gazzaneo, Rodrigo C. – “TCC – BigData”, From PDF copy. March 2015.
2 Stalin, Joseph. Statement Part “Nation“, From Wikipedia, Web. https://en.wikipedia.org/wiki/Nation.
3 “State (polity)”, From Wikipedia, Web. https://en.wikipedia.org/wiki/State_(polity).
4 “Big Data“, From Wikipedia, Web. https://en.wikipedia.org/wiki/Big_data.
5 http://www.sas.com/en_us/insights/big-data/hadoop.html.
6 “Cloud Computing”, From Wikipedia, Web, https://en.wikipedia.org/wiki/Cloud_computing
7 “Sustainable Development”. From Wikipedia, Web https://en.wikipedia.org/wiki/Sustainable_development.
8 Statistics Data about Brazilian Nation. From Wikipedia. Web. https://en.wikipedia.org/wiki/Brazil
9 http://www8.hp.com/br/pt/industries/public-sector.html?compURI=1087532#.VqwSpFMrKuU
10 “A utilização do Big Data na agropecuária”, Web. https://www.scotconsultoria.com.br/noticias/artigos/35032/A-utiliza%C3%A7%C3%A3o-do-Big-Data-na-agropecu%C3%A1ria. 4 June 2014.
- “Good Agricultural Practice”, From Wikipedia, Web. https://en.wikipedia.org/wiki/Good_agricultural_practice.
- http://www.usp.br/portalbiossistemas/?p=6510
- Mariz, Cristiano. “Da terra brotam os dados”. Web. http://exame.abril.com.br/revista-exame/edicoes/1074/noticias/da-terra-brotam-os-dados. 10 Feb 2014.
11 http://www.brazilianbeef.org.br/texto.asp?id=18.
- http://business-reporter.co.uk/2013/11/05/day-6-big-data-beef/
12 “‘Big data’ israelense ensina produtores que uma xícara de alegria significa maior produtividade”, Ministry of Economy, “State Of Israel”. Web. http://itrade.gov.il/brazil/?p=4918. 23 June 2015.
13 http://blog.qmagico.com.br/educacao/big-data-servico-da-educacao/
- https://www.linkedin.com/pulse/20140719195519-471910-o-valor-do-big-data-no-mercado-educacional
- Campos, Newton. “Ensino Adaptativo: O Big Data na Educação”. Web. http://educacao.estadao.com.br/blogs/a-educacao-no-seculo-21/ensino-adaptativo-o-big-data-na-educacao/. 26 April 2014.
14 Green-Lerman, Hillary. “Visualizing Personalized Learning”, “The Knewton Blog”, Web. https://www.knewton.com/blog/adaptive-learning/. 10 September 2015.
15 http://www.kdnuggets.com/2015/03/how-big-data-can-improve-lives-poor.html
16 http://www.brookings.edu/blogs/africa-in-focus/posts/2015/06/02-big-data-poverty-senegal
17 http://cio.com.br/opiniao/2015/09/01/o-big-data-a-servico-da-sociedade/
18 “i-Canada”, Web. http://www.icanadanetwork.ca/about-i-canada/. 2011
19 https://www.theaudienceagency.org/insight/using-the-evidence-to-reveal-opportunities-for-engagement
- http://www.rio2016.com/culture/
20 http://exame.abril.com.br/tecnologia/noticias/policia-de-sp-usara-sistema-baseado-em-big-data-para-combater-crime
- Jansen, Thiago and Matsuura, Sergio. Web. http://oglobo.globo.com/economia/tecnologia/autoridades-recorrem-controverso-cruzamento-de-dados-na-prevencao-de-crimes-14453408. 4 Nov 2014.
- http://bigdatabusiness.com.br/estrategia-politica-saiba-como-ela-pode-se-beneficiar-com-a-mineracao-de-dados-2/
21 http://embrapii.org.br/aeroespacial-e-defesa-2/
22 http://www.splunk.com/pt_br/solutions/industries/public-sector/defense-and-intelligence-agencies.html
23 http://sistema4.saude.sp.gov.br/sahe/documento/leitosredeHospitalar.pdf
24 http://www.scielo.br/scielo.php?script=sci_arttext&pid=S2237-96222015000200325
- http://www.nature.com/nature/journal/v527/n7576_supp/full/527S1a.html
- https://www.coursera.org/course/bigdatabrasil
- Infinit Healthcare. Web. http://www.infinithealthcare.com/resource-center/whats-up-in-healthcare-nov-16-22-2014/. 26 November 2014.
25 http://bigdatabusiness.com.br/category/politica/
26 Info extracted from MCAFEE and BRYNJOLFSSON, 2012 - http://www.admin-magazine.com/HPC/content/download/5604/49345/file/IDC_Big%20Data_whitepaper_final.pdf
27 http://bigdatabusiness.com.br/como-smart-cities-usam-big-data/
28 https://www.technologyreview.com/s/526541/smart-wind-and-solar-power/
29 http://www.forbes.com/sites/bernardmarr/2015/04/28/nepal-earthquake-using-big-data-in-a-crisis/#6076da81532f
- “Analyzing Disaster Big Data to Support Disaster Prevention with Timely and Accurate Forecasts“, Web. http://journal.jp.fujitsu.com/en/2015/05/29/02/. 29 May 2015.
30 “Corruption”, From Wikipedia, Web. https://en.wikipedia.org/wiki/Corruption
- http://blog.opovo.com.br/bigdata/2014/08/11/dados-abertos-mais-transparencia-para-acoes-governo/
31 http://www.emc.com/big-data/expertise.htm
- http://www.emc.com/big-data/solutions.htm
Dell EMC believes the information in this publication is accurate as of its publication date. The
information is subject to change without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” DELL EMC MAKES NO
REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE
INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Use, copying and distribution of any Dell EMC software described in this publication requires an
applicable software license.
Dell, EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries.