Deteo. Data science, Big Data expertise
Data Science
Data science is the process of deriving valuable knowledge from "Big Data" consisting of structured, unstructured or semi-structured data that large enterprises produce.
Big Data
Big data is a set of techniques and technologies that operate with data sizes beyond the ability of commonly used software tools to capture and manage within a tolerable elapsed time.
Data Mining
Data mining is a process that analyzes a large amount of data to find new and hidden information that improves business efficiency. Various industries have adopted data mining in their mission-critical business processes to gain competitive advantages and help the business grow.
Machine Learning
Machine Learning is a process that gives computers the ability to learn without being explicitly programmed.
Examples: spam filtering, recommendation systems, sales predictions.
Business domains
Any kind of data analysis is based on two major components: technical tools and domain expertise. Deteo has significant practical experience in the following industries, proven by long-term cooperation with customers from:
• Banking sector
• Insurance
• Human resource management
• IT and Telecom
• Accounting
• Retail
Business challenges we can address
New possibilities for growth depend on the ability to analyze, predict and make decisions based on existing data related to customers and the market:
Retail
• Market basket analysis to provide information on which product or service combinations were purchased or consumed together. This makes it possible to promote and optimize products and maximize profit.
• Analyze customer retention and locality based on recent purchase activity.
• Data mining helps detect fraudulent behavior with credit card or online transactions
• Clustering/Segmentation for targeted marketing
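The market basket analysis mentioned above can be illustrated with a minimal sketch. It only counts how often each pair of products appears in the same basket (the "support" of the pair); the function name `pair_support` and the toy baskets are hypothetical and are not taken from any Deteo project.

```python
from collections import Counter
from itertools import combinations


def pair_support(baskets):
    """Return {(item_a, item_b): support} for every pair bought together.

    Support is the share of baskets that contain both items.
    """
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) counts each pair once per basket, in a fixed order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: n / total for pair, n in counts.items()}


# Hypothetical toy data, invented purely for illustration
baskets = [
    ["bread", "milk"],
    ["bread", "milk", "butter"],
    ["bread", "milk"],
    ["butter", "eggs"],
]
support = pair_support(baskets)
best_pair = max(support, key=support.get)
```

Pairs with high support are candidates for joint promotion. A production setup would more likely use an association-rule algorithm such as Apriori on top of this kind of counting.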
Bank and Insurance
• Detect risky behavior of customers
• Claim prediction based on information available from previous events
• Fraud detection
eCommerce
• Collaborative filtering and recommendation systems that make automatic predictions about the interests of users by collecting preference and taste information from many similar users of such systems.
• Mining social networks, which can be applied to both targeted marketing and sentiment analysis
• Intranet search to provide capabilities to find and answer questions based on information available within corporate or organizational networks
• Analysis of streaming/online data to prepare information for further processing
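As an illustration of the collaborative filtering idea listed above, here is a minimal user-based sketch. It assumes ratings are stored as a plain dictionary per user and scores unseen items by the similarity-weighted ratings of other users; the names `cosine` and `recommend` and the sample ratings are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse rating dicts."""
    common = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(ratings, user, top_n=3):
    """Rank items the user has not rated by the sum of similarity * rating."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with no overlap in taste
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical ratings on a 1-5 scale, invented for illustration
ratings = {
    "ann": {"book": 5, "lamp": 1},
    "bob": {"book": 5, "lamp": 1, "pen": 4},
    "cid": {"book": 1, "lamp": 5, "desk": 5},
}
suggestions = recommend(ratings, "ann")
```

Here "ann" rates items much like "bob", so the item only "bob" has rated ranks above the one rated by the dissimilar user "cid". Real systems typically add rating normalization and work on sparse matrices.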
Approach
Within our Data Science service offering we can carry out the following activities:
• Comprehensive review of customers’ current business, plans and systems
• Recommendations on connecting Data science tools and approaches to customers’ existing Business and IT infrastructure
• Perform Data Analysis
• Data Visualization and Advanced Reporting
• Support and Maintenance or Solution Hand Over
Initiation
• Project initiation
• Team setup
• Define business needs
Analysis
• Define business goals in technical metrics
• Analyze current infrastructure
• Analyze existing data
• Analyze level of data sensitivity
• Develop required algorithms
• Validate algorithms on a small portion of data
Data Mining
• Prepare required infrastructure
• Perform data masking of sensitive data
• Run data mining algorithms
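The data masking step above can be sketched as follows. This is one possible approach (keyed hashing, i.e. pseudonymization), not necessarily the technique used in Deteo projects; the function `mask_record`, the field names and the key are hypothetical.

```python
import hashlib
import hmac


def mask_record(record, sensitive_fields, secret_key):
    """Return a copy of the record with sensitive fields replaced by a keyed hash.

    The same input always maps to the same token, so masked records can
    still be grouped and joined, but the original value is not exposed.
    """
    masked = dict(record)
    for field in sensitive_fields:
        if masked.get(field) is not None:
            value = str(masked[field]).encode("utf-8")
            digest = hmac.new(secret_key, value, hashlib.sha256)
            masked[field] = digest.hexdigest()[:16]  # shortened token
    return masked


# Hypothetical record and key, invented for illustration
record = {"name": "Jane Doe", "card": "4000123412341234", "amount": 120.5}
masked = mask_record(record, ["name", "card"], b"demo-key-keep-secret")
```

Non-sensitive fields such as `amount` pass through unchanged, so the mining algorithms can still run on them.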
Results Analysis
• Root-cause analysis
• Risk assessment
• Recommendations to fix
Reporting
• Transform mined data into graphics, charts and tables understandable for stakeholders
• Plan meeting where prepared reports are presented
Hand Over
• Prepare knowledge transfer plan
• Prepare technical and business documentation
• Provide training for the customer's experts
• Hand over the developed solution to the customer
Iteration cycle: 3-6 weeks
Regular status meetings
Case study: Car insurance
Business challenge
We received historical data about car accidents from an insurance company covering the last 5 years. The data was anonymized, so it contained no personal information. The customer asked us to analyze this data. There was an assumption that insurance risk was not equal for different groups of cars.
Our solution
Using the Microsoft cloud technology stack for data analysis, we ran several experiments and defined groups of cars with equal risk probability. Based on this information, the customer was able to adjust its insurance fee card, so that for two car groups the insurance fee was decreased by 10% and the customer's proposition became more valuable on the market.
Case study: Logs analysis
Business challenge
We received unstructured logs from a server farm that recorded server and service activity. The idea was to analyze them, find the most problematic servers and investigate the reasons.
Our solution
Using the Apache Hadoop technology stack, we loaded and processed about 500 GB of text files. As a result, we identified the servers that failed most often and defined the most probable preconditions of the faults. The next step is to implement online log processing and analysis in order to predict server or service faults.
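To make the logs analysis case more concrete, here is a minimal single-machine sketch of the counting step: find the servers with the most error entries. It does not reproduce the Hadoop job described in the case study; the log line format, the regular expression and the sample lines are assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed line format: "<timestamp> <server> <LEVEL> <message>"
LINE = re.compile(r"^(?P<ts>\S+)\s+(?P<server>\S+)\s+(?P<level>[A-Z]+)\s+(?P<msg>.*)$")


def failures_per_server(lines):
    """Count ERROR entries per server, skipping lines that do not parse."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line)
        if match and match.group("level") == "ERROR":
            counts[match.group("server")] += 1
    return counts


# Hypothetical log lines, invented for illustration
sample = [
    "2000-01-01T10:00:00 srv-01 INFO service started",
    "2000-01-01T10:00:05 srv-02 ERROR disk timeout",
    "2000-01-01T10:00:09 srv-02 ERROR disk timeout",
    "2000-01-01T10:01:00 srv-01 ERROR out of memory",
    "garbage line",
]
worst = failures_per_server(sample).most_common(1)
```

The per-line parse followed by a per-server count has the same shape as a map and reduce step, which is why this kind of analysis scales naturally to a Hadoop cluster.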
Hadoop based infrastructure
• Microsoft HD Insight
• Oracle BigData appliance
• IBM InfoSphere BigInsights
Tools
• Hadoop, Spark, Hive, Pig
• Azure
• R, Python, Java
Vendors
• Oracle, Microsoft, IBM
• Apache
• QlikView, Tableau
Stream processing
• IBM InfoSphere Streams
• Oracle Real-Time Decisions
• Apache Storm in MS Azure
Data science
• Recommendation systems
• Machine learning
• Visualization
• Data Mining
NoSQL databases
• MongoDB
• Cassandra
• Neo4j
When data becomes a real problem because of its size and variety, it’s time for Big Data solutions
Trainings and certifications
Deteo’s data science team has completed the following trainings and certifications:
Coursera
• Machine Learning
• Mining Massive Datasets
• Computing for Data Analysis
• R Programming
Online Stanford University
• Statistical Learning
Other
• Hadoop: Map Reduce and Big Data
• MongoDB for Developers
• MongoDB for DBAs