deteo. data science, big data expertise

DATA SCIENCE

Upload: deteo

Post on 05-Aug-2015


Data Science

Data science is the process of deriving valuable knowledge from "Big Data" consisting of structured, unstructured or semi-structured data that large enterprises produce.

Big Data

Big data is a set of techniques and technologies that operate on data sizes beyond the ability of commonly used software tools to capture and manage within a tolerable elapsed time.

Data Mining

Data mining is a process that analyzes a large amount of data to find new and hidden information that improves business efficiency. Various industries have adopted data mining in their mission-critical business processes to gain competitive advantages and help the business grow.

Machine Learning

Machine Learning is a process that gives computers the ability to learn without being explicitly programmed.

Examples: spam filtering, recommendation systems, sales predictions.

Business domains

Any kind of data analysis is based on two major components: technical tools and domain expertise. Deteo has significant practical experience in the following industries, proven by long-term cooperation with customers from:

• Banking sector
• Insurance
• Human resource management
• IT and Telecom
• Accounting
• Retail

Business challenges we can address

New possibilities for growth depend on the ability to analyze, predict and make decisions based on existing data about customers and the market:

Retail

• Market basket analysis to provide information on which product or service combinations were purchased or consumed together. This makes it possible to promote and optimize products and maximize profit.
• Analyze customer retention and locality based on recent purchase activity
• Detect fraudulent behavior in credit card or online transactions
• Clustering/Segmentation for targeted marketing
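As an illustration of the market basket analysis bullet above, here is a minimal sketch that counts how often item pairs occur together and keeps the pairs whose support (share of baskets containing the pair) reaches a threshold. The function name `pair_support`, the threshold and the sample baskets are assumptions made for this example, not data or code from a Deteo project.

```python
from collections import Counter
from itertools import combinations


def pair_support(baskets, min_support=0.5):
    """Return item pairs found together in at least min_support of all baskets."""
    pair_counts = Counter()
    for basket in baskets:
        # De-duplicate and sort so ("bread", "milk") and ("milk", "bread")
        # are counted as the same pair.
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    total = len(baskets)
    return {
        pair: count / total
        for pair, count in pair_counts.items()
        if count / total >= min_support
    }


# Hypothetical purchase data, for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "jam"],
    ["bread", "milk", "jam"],
]
frequent_pairs = pair_support(baskets, min_support=0.5)
print(frequent_pairs)
```

Real projects would typically go further (larger item sets, confidence and lift), but pair support is usually the first quantity computed.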


Bank and Insurance

• Detect risky behavior of customers
• Claim prediction based on information available from previous events
• Fraud detection

eCommerce

• Collaborative filtering and recommendation systems that automatically predict the interests of a user by collecting preference and taste information from many similar users of such systems
• Social network mining, which can be applied to both targeted marketing and sentiment analysis
• Intranet search that finds information and answers questions using data available within corporate or organizational networks
• Analysis of streaming/online data to prepare information for further processing
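The collaborative filtering bullet above can be sketched as user-based filtering: find users whose ratings resemble the target user's, then suggest items those users rated that the target has not seen. This is one common variant, chosen here for brevity; the names `cosine` and `recommend` and all ratings are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two users' rating dictionaries."""
    shared = set(a) & set(b)
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = sqrt(sum(value ** 2 for value in a.values()))
    norm_b = sqrt(sum(value ** 2 for value in b.values()))
    if dot == 0 or norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(ratings, user, top_n=2):
    """Rank items the user has not rated by similarity-weighted average rating."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine(ratings[user], other_ratings)
        if similarity <= 0:
            continue  # users with nothing in common contribute nothing
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + similarity * rating
                weights[item] = weights.get(item, 0.0) + similarity
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]


# Hypothetical ratings on a 1-5 scale, for illustration only.
ratings = {
    "ann": {"laptop": 5, "mouse": 4},
    "bob": {"laptop": 5, "mouse": 5, "keyboard": 4},
    "cid": {"novel": 5, "lamp": 4},
    "dee": {"laptop": 4, "keyboard": 5, "monitor": 2},
}
suggestions = recommend(ratings, "ann")
print(suggestions)
```

Items rated only by users with no overlap with the target user (here "cid") never enter the ranking, which is the "similar users" idea from the bullet.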

Deteo Service Offerings

Approach

Within our Data Science service offering we can carry out the following activities:

• Comprehensive review of customers’ current business, plans and systems
• Recommendations on connecting Data Science tools and approaches to customers’ existing business and IT infrastructure
• Perform Data Analysis
• Data Visualization and Advanced Reporting
• Support and Maintenance or Solution Hand Over

Initiation

• Project initiation
• Team setup
• Define business needs

Analysis

• Define business goals in technical metrics
• Analyze current infrastructure
• Analyze existing data
• Analyze level of data sensitivity
• Develop required algorithms
• Validate algorithms on small portion of data

Data Mining

• Prepare required infrastructure

• Perform data masking of sensitive data

• Run data mining algorithms
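The data masking step listed above could look roughly like the following sketch, which replaces sensitive fields with keyed hashes (pseudonymization) so that records remain joinable without exposing the original values. The field names, the key and the helper names `mask_value` and `mask_record` are assumptions for illustration; the source does not say which masking technique Deteo uses.

```python
import hashlib
import hmac


def mask_value(value, secret_key):
    """Replace a sensitive value with a short, deterministic keyed hash."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def mask_record(record, sensitive_fields, secret_key):
    """Return a copy of the record with the sensitive fields masked."""
    return {
        key: mask_value(str(value), secret_key) if key in sensitive_fields else value
        for key, value in record.items()
    }


# Hypothetical record and key, for illustration only.
secret_key = b"replace-with-a-real-secret"
record = {"customer_name": "Jane Doe", "card_number": "4111111111111111", "amount": 42.5}
masked = mask_record(record, {"customer_name", "card_number"}, secret_key)
print(masked)
```

Because the hash is deterministic for a given key, the same customer maps to the same token across files, which keeps grouping and joining possible during mining.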

Results Analysis

• Root-cause analysis

• Risks assessment

• Recommendations to fix

Reporting

• Transform mined data into graphics, charts and tables understandable for stakeholders

• Plan meeting where prepared reports are presented

Hand Over

• Prepare knowledge transfer plan

• Prepare technical and business documentation

• Provide training for customer’s experts

• Handover developed solution to customer

Iteration cycle: 3-6 weeks

Regular status meetings

Deteo Expertise

Case study: Car insurance

Business challenge

We received historical data about car accidents from an insurance company covering the last 5 years. The data was anonymized, so it contained no personal information. The customer asked us to analyze this data. There was an assumption that insurance risk was not equal for different groups of cars.

Our solution

Using the Microsoft cloud technology stack for data analysis, we ran several experiments and defined groups of cars with equal risk probability. Based on this information, the customer was able to adjust its insurance fee card: for two car groups the insurance fee was decreased by 10%, and the customer's proposition became more valuable on the market.
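To make the idea of unequal risk across car groups concrete, the sketch below estimates a claim rate per group from policy records. It is only an illustration in plain Python, not the method or tooling used in the project; the group names, the records and the function `claim_rate_by_group` are hypothetical.

```python
from collections import defaultdict


def claim_rate_by_group(policies):
    """Estimate the share of policies with a claim for each car group."""
    insured = defaultdict(int)
    claims = defaultdict(int)
    for group, had_claim in policies:
        insured[group] += 1
        claims[group] += int(had_claim)
    return {group: claims[group] / insured[group] for group in insured}


# Hypothetical (car_group, had_claim) records, for illustration only.
policies = [
    ("compact", False), ("compact", False), ("compact", True), ("compact", False),
    ("sports", True), ("sports", True), ("sports", False), ("sports", False),
]
rates = claim_rate_by_group(policies)
print(rates)
```

Groups with clearly lower estimated rates are the natural candidates for a fee reduction, although a real analysis would also need to check that the differences are statistically meaningful.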

Case study: Logs analysis

Business challenge

We received unstructured logs from a server farm that recorded server and service activity. The idea was to analyze them, find the most problematic servers and try to identify the reasons.

Our solution

Using the Apache Hadoop technology stack, we loaded and processed about 500 GB of text files. As a result, we identified the servers that failed most often and defined the most probable preconditions of the faults. The next step is to implement online log processing and analysis in order to predict server or service faults.
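The core of the logs analysis case study, finding the servers that fail most often, can be sketched in the map/reduce style that Hadoop jobs follow. The log line layout (date, time, server, level, message) and the server names are assumptions for this example; the real logs were unstructured and their format is not given in the source.

```python
from collections import Counter

FAILURE_LEVELS = ("ERROR", "FATAL")


def map_line(line):
    """Map step: emit (server, 1) for every line that records a failure."""
    parts = line.split(maxsplit=4)
    # Assumed layout: date, time, server, level, message.
    if len(parts) >= 4 and parts[3] in FAILURE_LEVELS:
        yield parts[2], 1


def count_failures(lines):
    """Reduce step: sum the emitted counts per server."""
    totals = Counter()
    for line in lines:
        for server, one in map_line(line):
            totals[server] += one
    return totals


# Hypothetical log lines, for illustration only.
sample_log = [
    "2015-03-01 12:00:05 srv-01 ERROR disk write failed",
    "2015-03-01 12:00:07 srv-02 INFO request served",
    "2015-03-01 12:01:10 srv-01 FATAL service stopped",
    "2015-03-01 12:02:30 srv-03 ERROR connection timeout",
    "malformed line",
]
worst_servers = count_failures(sample_log).most_common(2)
print(worst_servers)
```

On a cluster the map and reduce steps run in parallel over file splits; the single-process version here only shows the logic.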

Data science

• Recommendation systems
• Machine learning
• Visualization
• Data Mining

Hadoop based infrastructure

• Microsoft HD Insight
• Oracle BigData appliance
• IBM InfoSphere BigInsights

Stream processing

• IBM InfoSphere Streams
• Oracle Real-Time Decisions
• Apache Storm in MS Azure

NoSQL databases

• MongoDB
• Cassandra
• Neo4j

Tools

• Hadoop, Spark, Hive, Pig
• Azure
• R, Python, Java

Vendors

• Oracle, Microsoft, IBM
• Apache
• QlikView, Tableau

When data becomes a real problem because of its size and variety, it’s time for Big Data solutions

Trainings and certifications

Deteo’s data science team has completed the following trainings and certifications:

Coursera

• Machine Learning
• Mining Massive Datasets
• Computing for Data Analysis
• R Programming

Online Stanford University

• Statistical Learning

Other

• Hadoop: Map Reduce and Big Data
• MongoDB for Developers
• MongoDB for DBAs

Interested in learning more about our capabilities?

Please ping us at [email protected]