Deteo: Data Science and Big Data Expertise

Posted on 05-Aug-2015

Data Science
Data science is the process of deriving valuable knowledge from "Big Data": the structured, unstructured or semi-structured data that large enterprises produce.

Big Data
Big data is a set of techniques and technologies that operate on data sizes beyond the ability of commonly used software tools to capture and manage within a tolerable elapsed time.

Data Mining
Data mining is the process of analyzing large amounts of data to find new and hidden information that improves business efficiency. Various industries have adopted data mining in their mission-critical business processes to gain a competitive advantage and help their businesses grow.

Machine Learning
Machine learning gives computers the ability to learn without being explicitly programmed. Examples: spam filtering, recommendation systems, sales predictions.

Business domains
Any kind of data analysis relies on two major components: technical tools and domain expertise. Deteo has significant practical experience in the following industries, proven by long-term cooperation with customers from:
- Banking
- Insurance
- Human resource management
- IT and Telecom
- Accounting
- Retail

Business challenges we can address
New opportunities for growth depend on the ability to analyze, predict and make decisions based on existing data about customers and the market.
Retail
- Market basket analysis shows which combinations of products or services are purchased or consumed together. This makes it possible to promote and optimize products and maximize profit.
- Analysis of customer retention and locality based on recent purchase activity.
- Data mining helps detect fraudulent behavior in credit card or online transactions.
- Clustering/segmentation for targeted marketing.
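Market basket analysis is usually expressed as association rules such as "customers who buy bread also buy milk". The following is a minimal illustration, not Deteo's actual tooling. It counts item pairs across shopping baskets and reports three measures for each rule:
- support: the share of baskets that contain both items.
- confidence: the share of baskets with the first item that also contain the second.
- lift: confidence divided by the second item's overall frequency.

The function name `pair_rules` and the toy baskets are hypothetical. Production work would typically use an Apriori or FP-Growth implementation to go beyond pairs.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.2):
    """Mine pairwise association rules A -> B from a list of baskets.

    Returns (antecedent, consequent, support, confidence, lift) tuples for every
    item pair whose support is at least min_support, highest confidence first.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore repeated items within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), together in pair_counts.items():
        support = together / n  # share of baskets containing both items
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = together / item_counts[x]  # estimate of P(y | x)
            lift = confidence / (item_counts[y] / n)  # P(y | x) / P(y)
            rules.append((x, y, support, confidence, lift))
    return sorted(rules, key=lambda rule: rule[3], reverse=True)


# Hypothetical shopping baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "chips"],
    ["bread", "butter"],
]
for x, y, support, confidence, lift in pair_rules(baskets, min_support=0.5):
    print(f"{x} -> {y}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 means the two items appear together more often than independent purchases would predict. Such pairs are natural candidates for joint promotions.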
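Fraud detection on transaction data often starts with simple outlier scoring before moving to supervised models. This sketch assumes one customer's transaction amounts are available as a list. It uses a modified z-score that flags amounts far from the customer's typical spending. The score is based on the median and the median absolute deviation, so a single huge transaction does not distort the baseline. The 3.5 threshold is a common rule of thumb, not a value from this presentation, and `flag_outliers` is a hypothetical helper name.

```python
import statistics


def flag_outliers(amounts, threshold=3.5):
    """Return indices of amounts whose modified z-score exceeds threshold.

    modified z-score = 0.6745 * (x - median) / MAD, where MAD is the median
    absolute deviation from the median. Median-based statistics keep a few
    extreme transactions from inflating the baseline.
    """
    median = statistics.median(amounts)
    mad = statistics.median([abs(x - median) for x in amounts])
    if mad == 0:
        return []  # no spread in the history, so this score is undefined
    return [
        i
        for i, x in enumerate(amounts)
        if abs(0.6745 * (x - median) / mad) > threshold
    ]


# Hypothetical card history for one customer; the last payment is unusual.
history = [42.0, 38.5, 40.0, 45.0, 39.0, 41.5, 950.0]
print(flag_outliers(history))  # prints [6]
```

Flagged transactions would normally be routed to manual review rather than blocked outright, since a large purchase is not fraud by itself.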
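Clustering for targeted marketing groups customers with similar behavior. This is a minimal k-means (Lloyd's algorithm) sketch in plain Python. It assumes each customer is described by a small numeric feature tuple, such as annual spend and visits per month. The feature choice, the data and the `kmeans` helper are illustrative assumptions.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster numeric feature tuples with Lloyd's k-means algorithm.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k randomly chosen points
    labels = None
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


# Hypothetical customers: (annual spend, visits per month).
customers = [(100, 1), (120, 2), (110, 1), (900, 10), (950, 12), (880, 9)]
centroids, labels = kmeans(customers, k=2)
print(labels)
```

In practice the features should be standardized first, because here the spend column dominates the Euclidean distance. The number of clusters k is usually chosen by comparing several runs.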
Business challenges we can address
Bank and Insurance
- Detection of risky customer behavior
- Claim prediction based on information available from previous events
- Fraud detection
eCommerce
- Collaborative filtering and recommendation systems, which automatically predict a user's interests by collecting preference and taste information from many similar users.
- Mining of social networks, which can be applied both to targeted marketing and to sentiment analysis.
- Intranet search, which finds answers to questions based on the information available within a corporate or organizational network.
- Analysis of streaming/online data to prepare information for further processing.

Deteo Service Offerings

Approach
Within the Data Science service offering we can carry out the following activities:
- Comprehensive review of the customer's current business, plans and systems
- Recommendations on connecting data science tools and approaches to the customer's existing business and IT infrastructure
- Data analysis
- Data visualization and advanced reporting
- Support and maintenance, or solution hand-over

Initiation: project initiation; team setup; definition of business needs.
Analysis: define business goals in technical metrics; analyze the current infrastructure; analyze the existing data; analyze the level of data sensitivity; develop the required algorithms; validate the algorithms on a small portion of the data.
Data Mining: prepare the required infrastructure; mask sensitive data; run the data mining algorithms.
Results Analysis: root-cause analysis; risk assessment; recommendations for fixes.
Reporting: transform the mined data into graphics, charts and tables that stakeholders can understand; plan a meeting where the prepared reports are presented.
Hand Over: prepare a knowledge transfer plan; prepare technical and business documentation; train the customer's experts; hand over the developed solution to the customer.
Iteration cycle: 3-6 weeks, with regular status meetings.

Deteo Expertise
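User-based collaborative filtering, as described under eCommerce, predicts a user's interest in an item from the ratings of similar users. This sketch assumes ratings are stored as a dict of user -> {item: rating}. It measures user similarity with cosine similarity and scores unseen items by a similarity-weighted average of the neighbors' ratings. All names and data are hypothetical. Real recommenders usually add rating normalization, neighborhood limits or matrix factorization.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse rating dicts (item -> rating)."""
    common = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(ratings, user, top_n=3):
    """Rank items the user has not rated by a similarity-weighted average
    of the ratings given by other, positively similar users."""
    target = ratings[user]
    scores, weights = {}, {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(target, their)
        if sim <= 0:
            continue  # ignore users with no overlapping taste
        for item, rating in their.items():
            if item in target:
                continue  # only recommend items the user has not seen
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(((scores[i] / weights[i], i) for i in scores), reverse=True)
    return [item for _, item in ranked[:top_n]]


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "ann": {"book_a": 5, "book_b": 4},
    "bob": {"book_a": 5, "book_b": 5, "book_c": 4},
    "cid": {"book_d": 5, "book_e": 4},
}
print(recommend(ratings, "ann"))
```

Here "ann" shares no items with "cid", so only items rated by "bob" can be recommended to her.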
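The data masking step in the Data Mining phase can be done in several ways. One common option is keyed pseudonymization with HMAC-SHA256, sketched here assuming records are Python dicts and the sensitive fields are known. The same input always maps to the same token, so joins and group-bys still work. Someone without the secret key cannot recompute tokens to guess the original values. The helper name, field names and key are hypothetical, and the key itself must be kept out of the masked dataset.

```python
import hashlib
import hmac


def mask_record(record, sensitive_fields, secret_key):
    """Return a copy of record with sensitive fields replaced by keyed tokens.

    Tokens are deterministic for a given key, so masked data can still be
    joined and grouped on the masked fields.
    """
    masked = dict(record)  # never modify the caller's record
    for field in sensitive_fields:
        if masked.get(field) is not None:
            digest = hmac.new(
                secret_key, str(masked[field]).encode("utf-8"), hashlib.sha256
            ).hexdigest()
            masked[field] = digest[:16]  # shortened token for readability
    return masked


# Hypothetical insurance record; only the owner's name is treated as sensitive.
key = b"example-key-kept-outside-the-dataset"
record = {"policy_id": "P-1001", "owner_name": "Jane Doe", "car_group": "B"}
print(mask_record(record, ["owner_name"], key))
```

Truncating the digest to 16 hex characters slightly raises the chance of collisions. For very large datasets the full digest is the safer choice.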
Case study: Car insurance
Business challenge
An insurance company gave us five years of historical car accident data. The data was anonymized, so it contained no personal information. The customer asked us to analyze it, on the assumption that insurance risk differed between groups of cars.
Our solution
Using the Microsoft cloud technology stack for data analysis, we ran several experiments and defined groups of cars with equal risk probability. Based on this information, the customer adjusted its insurance rate card. For two car groups the insurance fee was decreased by 10%, which made the customer's offer more attractive on the market.

Case study: Logs analysis
Business challenge
We received unstructured logs from a server farm that recorded server and service activity. The goal was to analyze them, find the most problematic servers and investigate the reasons.
Our solution
Using the Apache Hadoop technology stack, we loaded and processed about 500 GB of text files. As a result, we identified the servers that failed most often and determined the most probable preconditions of the failures. The next step is to implement online log processing and analysis in order to predict server or service faults.

Data science: recommendation systems, machine learning, visualization, data mining
Stream processing: IBM InfoSphere Streams, Oracle Real-Time Decisions, Apache Storm in MS Azure
NoSQL databases: MongoDB, Cassandra, Neo4j
Hadoop-based infrastructure: Microsoft HDInsight, Oracle Big Data Appliance, IBM InfoSphere BigInsights
Tools: Hadoop, Spark, Hive, Pig; Azure; R, Python, Java; QlikView, Tableau
Vendors: Oracle, Microsoft, IBM, Apache
When the size and variety of your data become a real problem, it's time for Big Data solutions.
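The car insurance case study does not describe how the car groups were derived, only that Microsoft cloud tools were used. One plausible first step, shown purely as an illustration, is to compute claim frequency per car group (claims per policy-year of exposure). That frequency is then compared with the portfolio-wide frequency. Groups with a relative risk well below 1 are candidates for a lower fee. The tuple layout, group labels and numbers are hypothetical.

```python
from collections import defaultdict


def claim_frequency(policies):
    """Claims per policy-year for each car group.

    policies: iterable of (car_group, policy_years, claims) tuples.
    Returns {car_group: (frequency, relative_risk)}, where relative_risk is the
    group frequency divided by the portfolio-wide frequency.
    """
    exposure = defaultdict(float)
    claims = defaultdict(int)
    for group, years, n_claims in policies:
        exposure[group] += years
        claims[group] += n_claims
    overall = sum(claims.values()) / sum(exposure.values())
    result = {}
    for group in exposure:
        frequency = claims[group] / exposure[group]
        result[group] = (frequency, frequency / overall)
    return result


# Hypothetical portfolio: (car group, policy-years of exposure, number of claims).
policies = [("A", 10, 2), ("A", 10, 2), ("B", 20, 1), ("C", 10, 3)]
for group, (frequency, relative_risk) in sorted(claim_frequency(policies).items()):
    print(f"group {group}: {frequency:.3f} claims/year, relative risk {relative_risk:.2f}")
```

A real analysis would also check whether the differences are statistically significant given each group's exposure, since small groups produce noisy frequencies.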
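Counting failures per server, as in the log analysis case study, maps naturally onto the MapReduce model that Hadoop distributes across a cluster. A map step emits (server, 1) for each failure line, and a reduce step sums the counts per server. This single-process sketch assumes a hypothetical log format of `<date> <time> <server> <LEVEL> <message>`, with ERROR and FATAL marking failures. The real logs in the case study were unstructured and would need more parsing.

```python
from collections import Counter
from itertools import chain


def map_line(line):
    """Map step: emit (server, 1) for every line that records a failure.

    Assumes the hypothetical format: <date> <time> <server> <LEVEL> <message>.
    Lines that do not match are skipped.
    """
    parts = line.split(maxsplit=4)
    if len(parts) >= 4 and parts[3] in ("ERROR", "FATAL"):
        yield parts[2], 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per server."""
    totals = Counter()
    for server, count in pairs:
        totals[server] += count
    return totals


def most_failing(lines, top_n=3):
    """Return the top_n (server, failure_count) pairs, most failures first."""
    pairs = chain.from_iterable(map_line(line) for line in lines)
    return reduce_counts(pairs).most_common(top_n)


# Hypothetical log lines.
logs = [
    "2015-03-01 12:00:01 srv-07 ERROR disk timeout",
    "2015-03-01 12:00:05 srv-02 INFO request served",
    "2015-03-01 12:01:10 srv-07 FATAL service crashed",
    "2015-03-01 12:02:00 srv-03 ERROR out of memory",
    "malformed line",
]
print(most_failing(logs))
```

Because the map step looks at one line at a time and the reduce step only adds counts, the same logic can be split across many machines without changing the result.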
Trainings and certifications
Deteo's data science team has completed the following trainings and certifications:
- Coursera: Machine Learning; Mining Massive Datasets; Computing for Data Analysis; R Programming
- Stanford University Online: Statistical Learning
- Other: Hadoop: Map Reduce and Big Data; MongoDB for Developers; MongoDB for DBAs

Interested in learning more about our capabilities? Please ping us at [email protected]