Agility for Big Data: My journey implementing an Agile method to Big Data applications

By Charlie Cheng, posted 11-Aug-2014

TRANSCRIPT

Page 1: Agility for big data

Agility for Big Data: My journey implementing an Agile method to Big Data applications

Page 2: Agility for big data

Who I am

Page 3: Agility for big data

What is the hardest part about bringing agility to your big data applications?

Page 4: Agility for big data

“The more data you give the business, the more questions they will ask”

Jose Carlos Eiras, who served as CIO at Kraft Foods, Philip Morris, General Motors and DHL

Page 5: Agility for big data

Reporting over Workable Software

Page 6: Agility for big data

Reporting over Workable Software

• Problems experienced

• Customers don’t know what they want until they see it

• Very long feedback cycle because of waiting for quality data

• Developing workable software is much more expensive than generating a report manually

• Workable software without data to use is even more expensive

• Switching cost between tasks is high, but the switching cost between projects is even higher

• Releasing a feature to all users results in more questions coming in, whether because of data quality or for other valid reasons

• Very low product success rate, lots of resources wasted and low team spirit

Page 7: Agility for big data

Reporting over Workable Software

• Solutions

• Focus on a very specific customer group and generate reports for them

• Collect data that targets a very specific customer group, e.g. parents in the Box Hill area who work in IT (see the sketch after this list)

• Manually generated reports

• Data quality easier to control over a small amount of data

• Deliver reports to end users in the most cost-effective way, e.g. face to face, email, or open-source BI tools

• Get feedback and test hypotheses

• Focus on a subset of data while discovering the value of existing data

• Apply new methodology to a subset of data in a much more effective way

• Data quality easier to control on a subset of data

• Focus on one customer and get feedback from the client

• Test hypotheses
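As a rough sketch of this approach (pandas assumed; the column names, file names, and segment are hypothetical), pulling only the targeted slice keeps quality checks and manual reporting cheap:

```python
import pandas as pd

# Load only the columns needed to identify this customer group.
users = pd.read_csv("users.csv", usecols=["user_id", "suburb", "industry", "has_children"])

# The target segment from the slide: parents in the Box Hill area who work in IT.
segment = users[
    (users["suburb"] == "Box Hill")
    & (users["industry"] == "IT")
    & (users["has_children"])
]

# A small, manually reviewed report is enough to test the hypothesis with one customer group.
report = segment.groupby("suburb").agg(customers=("user_id", "count"))
report.to_csv("box_hill_it_parents_report.csv")
print(report)
```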

Page 8: Agility for big data

Reporting over Workable Software

• Solutions

• Data Freedom: empowering people (for example, data scientists exploring the value of the data)

• Provide an SQL-like interface for users to easily access the data (see the sketch after this list)

• Provide a semantic schema so that users can easily find the right data

• Document your data where necessary to help other people understand, decipher, and use it

• Provide easy-to-use report designers for accessing data, such as Pentaho or JasperReports

• Provide easy-to-use scheduling tools like Oozie, or general BI tools

• Mentally, developers should support other people in freely exploring data in whatever ways they like

• Where data must be accessed through developers, those developers should think about what stops other users from accessing it themselves

• Safeguard to prevent cluster overloading

• The overall result is a dramatic increase in the speed of feedback
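One way to sketch the SQL-like interface bullet (assuming Spark is available; the Parquet path, view name, and columns are hypothetical) is to register the raw data under a readable name so people can answer their own questions without going through developers:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("data-freedom").getOrCreate()

# Register the raw data under a readable name; this is one small piece of a semantic schema.
events = spark.read.parquet("/data/warehouse/events")
events.createOrReplaceTempView("customer_events")

# Anyone comfortable with SQL can now explore the data directly.
answer = spark.sql("""
    SELECT event_type, COUNT(*) AS occurrences
    FROM customer_events
    GROUP BY event_type
    ORDER BY occurrences DESC
""")
answer.show(20)
```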

Page 9: Agility for big data

Reporting over Workable Software

• More to try

• Automated data quality control (see the sketch after this list)

• Explore different ways for the customer service team to address data quality issues

• Sampling data for product discovery programs

• Explore ways to test a hypothesis even more quickly, for example customer-centric data collection and reporting

• Explore a wider scale of data freedom through web services
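A minimal sketch of automated data quality control (the column names, checks, and input file are hypothetical): run cheap checks on each incoming batch and fail loudly before bad data reaches reports or downstream jobs:

```python
import pandas as pd

def check_batch(df):
    """Return a list of data quality problems found in one batch."""
    problems = []
    if df["user_id"].isna().any():
        problems.append("missing user_id values")
    if df["user_id"].duplicated().any():
        problems.append("duplicate user_id values")
    if (df["order_total"] < 0).any():
        problems.append("negative order_total values")
    return problems

batch = pd.read_csv("daily_orders.csv")
issues = check_batch(batch)
if issues:
    # Hand the batch to the customer service / data quality queue instead of publishing it.
    raise ValueError("data quality check failed: " + "; ".join(issues))
```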

Page 10: Agility for big data

Continuous Delivery

• Continuous delivery, where to start?

• Problems: legacy systems, low unit test coverage, low functional/integration test coverage, no acceptance testing, not enough testing data, and so on…

• Start with an easy problem so that it is achievable and will help to build team trust

• Must-haves: testing data and integration test suites (see the sketch below)
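As a minimal sketch of those must-haves (pytest and pandas assumed; transform_orders is a hypothetical stand-in for one of your pipeline steps), a small hand-written fixture plus one integration-style test is enough to get started:

```python
import pandas as pd
import pytest

@pytest.fixture
def sample_orders():
    # Small, hand-written testing data covering the cases we care about.
    return pd.DataFrame({
        "order_id": [1, 2, 3],
        "amount": [10.0, 0.0, 25.5],
        "country": ["AU", "AU", "NZ"],
    })

def transform_orders(orders):
    # Stand-in for the real pipeline step under test: drop zero-amount orders
    # and add an integer cents column.
    kept = orders[orders["amount"] > 0]
    return kept.assign(amount_cents=(kept["amount"] * 100).astype(int))

def test_transform_drops_zero_amount_orders(sample_orders):
    result = transform_orders(sample_orders)
    assert len(result) == 2
    assert "amount_cents" in result.columns
```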

Page 11: Agility for big data

Continuous Delivery

• Build pipeline: dev box → build → daily build server → alpha → beta → production

• Testing data: you will never cover all scenarios, so what do you do? Use hybrid data fixtures with data that is manually produced, generated, and sampled from production

• Versioning Data

• Keep data as clean as code; refactor your data often

• Backward and forward compatibility (see the sketch after this list)

• Vertical slicing of stories, architecture, and teams

• NoSQL database engines

• Start continuous delivery for some components NOW and learn from it
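One way to sketch the backward and forward compatibility bullet (field names and defaults are hypothetical): a reader that tolerates both older records missing newer fields and newer records carrying fields it does not know about:

```python
def read_customer_record(raw):
    # Backward compatible: "region" was added in a later schema version,
    # so older records fall back to a default instead of failing.
    return {
        "customer_id": raw["customer_id"],
        "region": raw.get("region", "UNKNOWN"),
    }

# Forward compatible: extra fields written by a newer job (e.g. "loyalty_tier")
# are simply ignored by this older reader.
print(read_customer_record({"customer_id": 42}))                                          # old record
print(read_customer_record({"customer_id": 7, "region": "AU", "loyalty_tier": "gold"}))   # newer record
```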

Page 12: Agility for big data

Deployment != Release

• Separate deployment from release

• Tips

• Data batch toggles

• Feature toggles (see the sketch after this list)

• Customer/country/region releases

• Manually generated report area

• Don’t forget about “exclusive” toggles

• Leave the release up to the product manager: they decide when to release and they organize the press releases.
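A minimal sketch of keeping deployment separate from release with toggles (the toggle names, regions, and configuration source are hypothetical): the code ships everywhere, but the toggle configuration decides who actually sees it:

```python
# Toggle configuration; in practice this lives outside the code so it can be
# flipped without a redeploy.
RELEASE_CONFIG = {
    "new_reporting_dashboard": {"enabled": True, "regions": ["AU"]},   # feature toggle
    "nightly_batch_v2": {"enabled": False, "regions": []},             # data batch toggle
}

def is_released(feature, region):
    toggle = RELEASE_CONFIG.get(feature, {"enabled": False, "regions": []})
    return toggle["enabled"] and region in toggle["regions"]

# Deployed for everyone, released only to AU customers for now.
if is_released("new_reporting_dashboard", region="AU"):
    print("show the new dashboard")
else:
    print("show the old dashboard")
```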

Page 13: Agility for big data

Q&A

What is the hardest part about bringing agility to your big data applications?

Page 14: Agility for big data

My Personal Information

• LinkedIn Profile: http://au.linkedin.com/pub/charlie-cheng/24/92/978/

• Twitter: @charlie_cheng

Are you looking for training and finding it hard to select the right one?

We are running a customer discovery program on it at StudyIsFun.

Please contact me at [email protected] if you are interested.