What is a data scientist - a presentation I made to the Canberra IAPA

What is a Data Scientist? Authored By Russell Tibballs MACS CP MSR

Upload: russelljtibballs

Post on 25-Jan-2015

Category:

Data & Analytics


DESCRIPTION

A presentation I made to the Canberra IAPA meeting a couple of months back

TRANSCRIPT

Page 2: What is a data scientist - a presentation I made to the Canberra IAPA

Authored By Russell Tibballs MACS CP MSR

Caveat

This slideshow does not represent the views of the company I work for. It represents my evolving views at this point in time and is mainly intended to provoke thought and discussion.

Page 3: What is a data scientist - a presentation I made to the Canberra IAPA

Google searches for “data scientist”

2011 saw the rise of “Big Data” and the term “Data Scientist”.

2011 saw the release of Moneyball, starring Brad Pitt as a geek.

In 2012, Nate Silver correctly predicted the winner in all 50 states and the District of Columbia while the pundits were claiming Obama had lost.

Page 4: What is a data scientist - a presentation I made to the Canberra IAPA

So how is a Data Scientist portrayed - Super Human

‘Data Scientists perform data science. They use technology and skills to increase awareness, clarity and direction for those working with data. The data scientist role is here to accommodate the rapid changes that occur in our modern day environment and are bestowed the task of minimising the disruption that technology and data is having on the way we work, play and learn. Data Scientists don’t just present data, data scientists present data with an intelligence awareness of the consequences of presenting that data.’

Page 5: What is a data scientist - a presentation I made to the Canberra IAPA

Super Human Continued

A large IT company – ‘What sets the data scientist apart is strong business acumen, coupled with the ability to communicate findings to both business and IT leaders in a way that can influence how an organization approaches a business challenge. Good data scientists will not just address business problems, they will pick the right problems that have the most value to the organization.’

Page 6: What is a data scientist - a presentation I made to the Canberra IAPA

The Super Technologist

Page 7: What is a data scientist - a presentation I made to the Canberra IAPA

DATA SCIENCE

Mark Biernbaum suggests ‘Data Science is going 99% too fast’. His complaint is that the ‘science’ is not peer-reviewed and the techniques are often questionable. He believes Data Scientists should slow down, specialize, and above all have their methodologies peer-reviewed.

Page 8: What is a data scientist - a presentation I made to the Canberra IAPA

My Problem with Current Definitions

I have been to a number of industry briefings where the supplied definitions are often very ‘pie in the sky’ and elitist. The definitions are designed to indicate ‘you can’t possibly do this yourself and there is no way any of your existing staff will qualify for the role’. This may not be intended; however, it is the result.

Page 9: What is a data scientist - a presentation I made to the Canberra IAPA

So what can we do about that?

Recognise that Data Science is a science that has a broad brush stroke across all industry sectors.

Recognise that there are many specialty areas.

Recognise that it is not a technological implementation.

Recognise that there will be many levels of expertise.

Page 10: What is a data scientist - a presentation I made to the Canberra IAPA

Recognise that there are many specialty areas.

There is not one version of data science.

There is data science applicable to the research sectors of Maths, Physics, Meteorology, and Medicine that will rarely be applied elsewhere.

There is data science our friends in the NSA and local equivalents will specialise in.

There is the data science that economists and the financial sector will specialise in.

Etc, etc … ad nauseam.

Page 11: What is a data scientist - a presentation I made to the Canberra IAPA

Recognise that it is not a technological implementation.

Being able to query unstructured data in HDFS does not make you a data scientist.

Being able to analyse Splunk data does not make you a data scientist.

Being able to filter petabytes of data on an MPP RDBMS does not make you a data scientist.

Page 12: What is a data scientist - a presentation I made to the Canberra IAPA

So What is a Scientist?

Page 13: What is a data scientist - a presentation I made to the Canberra IAPA

The important aspects of any definition of a Job Title.

The most important thing to remember here is that we are talking about a Job Title, and a Job Title should be meaningful.

Secondly, what should qualify someone for that title?

Page 14: What is a data scientist - a presentation I made to the Canberra IAPA

So what is important about the title ‘Data Scientist’?

Page 15: What is a data scientist - a presentation I made to the Canberra IAPA

Where did this title originate?

‘On November 10, 1998, he (Jeff Wu) gave his inaugural lecture entitled “Statistics = Data Science?” in honor of his appointment to the H. C. Carver Collegiate Professorship in Statistics at the University of Michigan.[14] In this lecture, he first focused on the identity of statistics in science. He then characterized statistical work as data collection, data modeling and analysis, and problem solving and decision making. In conclusion, he proposed that statistics be renamed to Data Science.’

Page 16: What is a data scientist - a presentation I made to the Canberra IAPA

So What is a Scientist?

From the Oxford Dictionary: ‘A person who is studying or has expert knowledge of one or more of the natural or physical sciences: a research scientist’.

Note: A scientist is not necessarily a research scientist; they can be a practicing expert in a field.

However, all scientists share one feature: they are trained in a science, they apply the scientific method to obtain understanding of a focus of interest, and their methods and conclusions are subject to peer review.

Page 17: What is a data scientist - a presentation I made to the Canberra IAPA

A comment from a recently retired Scientist

My neighbor has recently retired after a long career as a scientist and academic. A few weekends ago we were discussing the growing exclusivity of the term scientist. In his words, ‘In the 1970s a scientist had a degree, by the mid 80s they needed honors, in the 90s they needed a masters or PhD, now they need several post-doctoral projects under their belt to be considered a ‘real’ scientist.’ However, he believes someone who is qualified (has a science degree) and who is practicing their studied discipline is a scientist.

Page 18: What is a data scientist - a presentation I made to the Canberra IAPA

A Slight Detour. What qualifies a professional?

I see the Data Scientist as a specialty of the Computer Science profession.

We have lawyers who specialise in corporate, family, criminal, and other aspects of the law.

The accounting, architecture, engineering, teaching and medical professions have several specialties and recognised levels of expertise in each field.

These professionals have academic training, and in many cases acceptance by a professional body is what makes them acceptable as professionals in the public eye. That is a model I strongly believe the ICT industry needs to adopt or at least move towards.

I believe the academic achievement makes the qualification. The acceptance by a professional body should give standing within the profession and wider community.

Page 19: What is a data scientist - a presentation I made to the Canberra IAPA

The Australian Qualifications Framework - AQF

The AQF has 10 levels:

Level 1 – Certificate I
Level 2 – Certificate II
Level 3 – Certificate III
Level 4 – Certificate IV
Level 5 – Diploma
Level 6 – Advanced Diploma, Associate Degree
Level 7 – Bachelor Degree
Level 8 – Bachelor Honors Degree, Graduate Certificate, Graduate Diploma
Level 9 – Masters Degree
Level 10 – Doctoral Degree

Page 20: What is a data scientist - a presentation I made to the Canberra IAPA

AT THE BOTTOM LEVEL OF THIS SPECTRUM OF QUALIFICATIONS

Summary: Graduates at this level will have knowledge and skills for initial work, community involvement and/or further learning

Knowledge: Graduates at this level will have foundational knowledge for everyday life, further learning and preparation for initial work

Skills: Graduates at this level will have foundational cognitive, technical and communication skills to:

• undertake defined routine activities
• identify and report simple issues and problems

Application of knowledge and skills: Graduates at this level will apply knowledge and skills to demonstrate autonomy in highly structured and stable contexts and within narrow parameters

Page 21: What is a data scientist - a presentation I made to the Canberra IAPA

At the highest level of the spectrum, AQF Level 10 – The Doctorate

Summary: Graduates at this level will have systematic and critical understanding of a complex field of learning and specialised research skills for the advancement of learning and/or for professional practice

Knowledge: Graduates at this level will have systemic and critical understanding of a substantial and complex body of knowledge at the frontier of a discipline or area of professional practice

Skills: Graduates at this level will have expert, specialised cognitive, technical and research skills in a discipline area to independently and systematically:

• engage in critical reflection, synthesis and evaluation
• develop, adapt and implement research methodologies to extend and redefine existing knowledge or professional practice
• disseminate and promote new insights to peers and the community
• generate original knowledge and understanding to make a substantial contribution to a discipline or area of professional practice

Application of knowledge and skills: Graduates at this level will apply knowledge and skills to demonstrate autonomy, authoritative judgment, adaptability and responsibility as an expert and leading practitioner or scholar

Page 22: What is a data scientist - a presentation I made to the Canberra IAPA

The Degree

Summary: Graduates at this level will have broad and coherent knowledge and skills for professional work and/or further learning

Knowledge: Graduates at this level will have broad and coherent theoretical and technical knowledge with depth in one or more disciplines or areas of practice

Skills: Graduates at this level will have well-developed cognitive, technical and communication skills to select and apply methods and technologies to:

• analyse and evaluate information to complete a range of activities
• analyse, generate and transmit solutions to unpredictable and sometimes complex problems
• transmit knowledge, skills and ideas to others

Application of knowledge and skills: Graduates at this level will apply knowledge and skills to demonstrate autonomy, well-developed judgement and responsibility:

• in contexts that require self-directed work and learning
• within broad parameters to provide specialist advice and functions

Page 23: What is a data scientist - a presentation I made to the Canberra IAPA

The Vendor’s Course

The vendor’s course will usually be about how to apply a tool to a problem.

It is not generally designed to provide you with knowledge that can be applied outside the scope of their tool’s environment.

It would generally not qualify within the AQF guidelines.

Page 24: What is a data scientist - a presentation I made to the Canberra IAPA

So how does the AQF apply to the question of Data Science?

If the person working in the field of applying ‘Data Science’ has a degree (AQF level 6 or above) in a related subject, e.g. Maths, Statistics, or Economics, or a higher degree including Grad Cert and Diplomas, they can be expected to apply knowledge and skills to demonstrate autonomy, well-developed judgment and responsibility:

• in contexts that require self-directed work and learning
• within broad parameters to provide specialist advice and functions

Page 25: What is a data scientist - a presentation I made to the Canberra IAPA

Cui Bono? Who benefits from this approach?

The public – they will have greater confidence in the profession.

The employer – they get the assurance that the employee has the skills at the right levels to do the work.

The employee – they will know what is expected of them and know they will be able to deliver.

The professional body and industry – they gain from greater public faith and confidence in the profession in general.

Page 26: What is a data scientist - a presentation I made to the Canberra IAPA

But!!!

There needs to be demand from within the industry for this to happen.

Some group like the IAPA needs to take on the responsibility of working out the professional specialisations and the required frameworks for acceptance of professionals into those specialisations.