
January, 2019

AI Hiring: Playbook

Data Scientist roles

Zinnov (Draup) Point of View

Source : DRAUP


What are the roles in AI

ROLES, SKILLS AND SAMPLE WORKLOADS

Data Scientist

Skills: Python, SparkML, machine learning APIs and computational packages (TensorFlow, Theano, PyTorch, Keras, Scikit-Learn, NumPy, SciPy, Pandas, statsmodels). Deep Learning (optional): CNN, RNN/LSTM, GAN

Sample workloads:

• Work on statistical and ML techniques and develop segments, predictive models, experimental designs and decision analysis

• Gather, manipulate and analyze large data sets from multiple sources and develop algorithms to optimize customer segmentation, customer retargeting, operational optimization etc.

Applied Data Scientist – NLP

Skills: Data Scientist skills, plus NLP libraries (NLTK, SpaCy, Gensim) and knowledge of transfer and sequential learning (RNN, LSTM)

Sample workloads:

• Develop scalable tools and leverage ML and deep learning models to solve real-world problems in areas such as Speech Recognition and NLP

• Collaborate with all lines of business and functions in the Corporate Investment Banks: Markets, Global Investment Banking, Corporate Banking, Technology and Operations

Applied Data Scientist – Computer Vision

Skills: Data Scientist skills, plus Gurobi, CPLEX, Symphony, Axioma, OpenCV, Caffe, Torch, Theano and knowledge of transfer and sequential learning (CNN, RNN, LSTM)

Sample workloads:

• Use Deep Learning techniques arising in Computer Vision

• Experimental models to leverage facial recognition for advanced security and KYC

• Video surveillance analytics

• Building face-based authentication for payment using selfies

Applied Data Scientist – Speech Recognition

Skills: Data Scientist skills, plus Kaldi, Attila, HTK, Sphinx, SRILM, OpenFST and an understanding of Dialog Management, Automatic Speech Recognition (ASR) and Audio Signal Processing

Sample workloads:

• Develop a vocal-tract-length-normalization (VTLN) training workflow and design a VTLN adaptation process for ASR

• Develop an online information retrieval system for language model adaptation for ASR

• Develop acoustic model switching based on feature classification, model adaptation, and acoustic data clustering


Job Responsibilities : Key business areas for roles in AI

SALES & MARKETING FINANCE CUSTOMER EXPERIENCE PRODUCT OPERATIONS ROLES

• Automating call distribution

• Customer Segmentation and Analytics

• Customer Sentiment Analysis

• Financial trading (High frequency trading enabled by AI)

• Predictive maintenance & replacement

• Data collection from sensors in real time

Applied Data Scientist – NLP

Applied Data Scientist – Computer Vision

• Context aware marketing

• Personalized marketing

• Customer segment analytics

• Anticipating future customer purchases and presenting offers accordingly

• Improving media buying

• Monitoring social media comments to determine overall brand affinity and issues

• Tailoring promotions (online or offline)

• Customer Service Automation

• Facial and voice-based biometrics

• Optical Character Recognition

• Customer emotion analytics

• Visual search capability

Data Scientist

• Customer Support through Chatbot

• Automated Voice Response

• Social Analytics & Automation

Applied Data Scientist – Speech Recognition

• Financial Voice assistance

• Real time speech analytics (RTSA) technology

• Parsing & Machine Translation

• Part of Speech tagging

• Leveraging machine vision to tag images, taking users’ preferences into account, and improve product discovery

• Image recognition & visual analytics

• Medical imaging insights

• Financial Analysis

• Algorithmic Trading

• Investment strategies

• Scanning legal and regulatory text for compliance issues

• Voice Authentication

• Remote KYC based on facial recognition

• Chatbots

• Design Patterns Prediction

• Autonomous Vehicles

• Part of Speech Tagging

• Speech recognition analysis


Hiring a Data Scientist is tough in the US

Software Engineer vs. Data Science:

- Attrition rate: 9% vs. 14%

- Average Salary: ~84K vs. ~110K

- Average Tenure in a role: 1.5 Years vs. 2.4 Years

Source: DRAUP Talent Module

Note: Average values derived from analysis of 100 US MSAs, updated as of Dec 2018

[Chart: AI job openings by year, 2018 to 2021]

AI Job Openings: ~100K

Global AI Job Openings: ~450K

- Average Time to Hire: 48 Days, 37 Days


AI Tools and Technologies Stack : What IT Tools should a Data Scientist know

Infrastructure & Processing: GPUs, CPUs, AWS Lambda, Azure Function, Google Function

AI Frameworks: TensorFlow, Caffe, Torch, Theano, CNTK, Keras, SciKit Learn

Applied AI as-a-Service: Image-Recognition AAS, NLP AAS, Vision AAS, Bot AAS, Speech AAS (Amazon Lex, Amazon Polly, MSFT Bot Framework, AWS Rekognition, Google Vision API)

Big Data Platform: EMR, DataBricks, Spark, NoSQL DBs

Programming Languages: R, Python, OpenCL

Integrated AI Platforms: Azure ML, AWS Deep Learning AMI, Google ML, DataBricks ML, AWS SageMaker, IBM Watson Studio, NanoNets


Different Personas of a Data Scientist – Understand what the Team needs

[Radar chart comparing the four personas (Big Data Engineer-ML, Applied ML Engineer, Core ML Engineer, Business/Data Analyst) on five axes: CS Fundamentals & Programming, Inferential and Bayesian Statistics, Data Modelling, Applied Frameworks/Libraries, Domain Understanding]

1. Big Data Engineer-ML: Has an emphasis or specialization in distributed systems and big data, with advanced programming and system-creation skills. Can also do basic to intermediate analytics.

2. Applied ML Engineer: Data scientists who focus on leveraging ML algorithms to solve NLP, speech or vision problems.

3. Core ML Engineer: Data scientists who build core techniques and frameworks for machine learning use cases.

4. Business/Data Analyst: Business- and data-focused developers who build queries and conduct statistical analysis to solve a business problem.


Skill Anatomy of a Data Scientist in Banking & Finance: What to look for in a profile

Skill categories: Foundational; Banking Specializations; Recommendatory

Education: Ph.D., Master's or Bachelor's degree in Computer Science, Statistics, Mathematics, Economics, Finance, Engineering

Languages: Must know: Python, R. Good to have: Java, SQL

Core Skillsets/Concepts: Regression Analysis, Decision Trees, Ensemble Algorithms, Neural Networks, Time Series Analytics, Clustering, Random Forest, Gradient Boosting, Text Mining

Frameworks/Libraries, Tools:

Core Skills: Scikit-Learn, TensorFlow, Caffe, PyTorch, Keras, Hadoop

Adjacent Skills: SAS, Hive, MATLAB, Spark, MXNet

Experiences:

Transformation of traditional technology areas:

• Personalized marketing: Targeted customer analytics, personalized offers on loans, cards, traditional credit scoring systems

• Customer experience: Personalized customer service, sentiment analysis, customer churn prediction, predictions for investments, Customer Lifetime Value (CLV)

• Fraud detection and risk management: Text mining and information retrieval tasks, understanding of AML/CTF legislation (Anti Money Laundering and Counter Terrorism Financing), credit risk modelling, analysing loan data and credit card history, AML pattern recognition

• Customer service automation: Machine learning enhanced chatbots, virtual customer services, voice activated financial applications

Emerging technology areas:

• Robo advisory: Algorithm-based investment advisory services by monitoring events, stock prices, bond price trends etc.

• Alternative credit scoring: Credit scoring based on large data sets such as social media footprint, non-credit payment history (e.g. rent, utility payments), employment history, education background, browsing history

Behavioral Skills: Analytical thinking, verbal communication, cross-functional understanding, investigation synopses, curiosity, challenge driven, creativity, passionate & resilient, problem solving, team player


Most used Data Scientist titles in the Banking & Finance industry

Big Data Engineer-ML

• Big Data Engineer, Big Data Developer, Big Data Application Architect, Hadoop/Spark Developer

• Big Data Engineer, Programmer Analyst- Big Data & Data Lake Integration, Big Data Developer

• Data Architect, Big Data Solution Engineering, Big Data Developer

• Big Data Developer, Big Data Hadoop Developer, Data Consultant, Hadoop Developer

Applied ML Engineer

• Technology Analyst, Fraud Analyst, Automation Specialist, Cyber Security Data Analytics, Data Scientist - Robotics

• NLP Engineer, Machine Learning / NLP Engineer, Applied Machine Learning Engineer-Fraud Prevention

• NLP And Deep Learning Consultant, NLP Software Engineer

• Technology Analyst, Stats Analyst, Algorithmic Trading Strategist

Core ML Engineer

• Machine Learning Engineer, Machine Learning Consultant

• Machine Learning Engineer, Machine Learning Consultant

• Machine Learning Consultant, Machine Learning Developer, Machine Learning Engineer

• Associate- Machine Learning, Machine Learning Strategist, Machine Learning Researcher

Business/Data Analyst

• Technical Analyst, Credit Risk Analyst, Tableau Developer, Finance Analyst, Quantitative Analyst

• Quantitative Analytics Consultant, Credit Risk Analytics Consultant, Tableau Developer

• Quantitative Research Analyst, Data Analyst, AML Data Science Analyst, Risk Management Analyst

• Quantitative Analyst, Investment Banking Analyst, Compliance Analyst, Equity Sales Strategist


What do Data Scientists look for in a job?

1. Type of projects: Data scientists usually prefer to work on high-impact projects that involve emerging banking areas rather than on ad-hoc jobs that require getting numbers from a database or ETL (Extract, Transform, Load).

Examples of appealing projects:

• Robo Advisory: Build, optimise and train new or existing ML models

• Fraud Detection: Apply neural networks to detect fraudulent activities

• Chatbots & Voice Banking: Implement conversational banking which generates real-time conversations

2. Location: Location plays a major role when data scientists choose a job, as they prefer to work in the tech hotspots of the industry they want to venture into. New York, San Francisco and Dallas are the locations preferred by data scientists in the banking and finance industry. The presence of AI and data science talent pools and digital centres is highest in these areas.

Within US: New York, San Francisco, Dallas

Outside US: London (UK), Toronto, Bangalore

3. Tools & Technologies: Data scientists prefer open-source tools and technologies, as community support and the feeling of contribution are higher there. The company's infrastructure also plays a major role: a lack of infrastructure is a major turn-off for them.

Preferred tools: TensorFlow, Spark SQL, RapidMiner, Keras, Anaconda

4. Flexibility: Data scientists prefer not to stick to a typical 9-to-5 banking job. They prefer greater flexibility in work timings, leave and dress code, and the availability of remote working.

Preference in terms of flexibility: Flexible work timing; Casual dress code; More than 30 days of leave; Work-from-home option

5. Team and guidance: Data scientists look for teams where there is guidance and thought leadership in the technology area. Learning is much higher in companies that have a balance in terms of experience than in a firm that is bottom heavy.

Finance firms with good AI leadership


Hiring Strategy: Key talent attrition & retention factors for peer employers

Factors contributing to employee experiences: Job Security & Promotions, Training & Management, Flexible Work Culture, Pay Benefits & Fair Performance

Peer Companies

Positive Factors Negative Factors

• Extensive training program and continued education support

• Progressive management

• Good place to learn technical skills

• Regular work shifts

• Good work life balance

• Flexible timings

• Good parental leave policy

• Medical/Dental insurance

• Tuition reimbursement

• Higher compensation for freshers

• Higher advancement opportunities and ease of mobility within company

• Diverse workforce

• Comprehensive training program

• Flexibility in schedule

• Remote working opportunity

• Higher paid time off

• Tuition reimbursement

• Lack of technology infrastructure compared to peers

• Lower job security

• Extensive learning programs ranging from skills-based offerings to high-potential leadership programs

• Work life balance is not good

• Short lunch and tea breaks

• Long working hours

• Good Health/Medical/Dental/Vision benefits

• Higher compensation when compared to peers

• Better career advancement opportunities only in New York office

• Constant change of leadership

• Micro-management

• Unrealistic expectations

• Remote working opportunity

• Flexible work hours

• Higher compensation for freshers

• Good family health insurance and annual bonus

• Lower job security

• Lack of advancement opportunities

• Slower growth of entry level employees

• Micro-management

• Quarterly layoffs

• No compensation for additional responsibilities

• Bottom heavy in terms of experience

• Frequent employee restructure

• Good work life balance

• Generous holiday allowance

• Employee share schemes including free shares

• Option to choose additional cash lump sum or other benefits (Retail, holiday voucher)

• No new learning, same work gets repeated

• Lack of recognition and appreciation


An optimal JD – How should HRs create one that attracts the best talent

Define Foundational Skills

• Identify core skills required: Statistics, Machine Learning, NLP, Computer Vision or Speech Recognition

• Specify programming environment: R, Python or any other

• Provide infrastructure preferences: Cloud and GPUs

• Define expected understanding of frameworks based on the team’s current tech stack

Define Specializations

• Define expected responsibilities across key business functions: Sales & Marketing, Finance, Operations, Customer Experience

• Choose expected skills across Machine Learning, Inferential Statistics, Deep Learning, NLP or Computer Vision

• Specify use cases the hire is expected to work on

Highlight key organizational values that data scientists prefer

• Culture that promotes learning and innovation

• Action-oriented and fast-paced environment

• Culture that supports risk-taking behaviour

Make it Inclusive

1. Use gender-neutral titles in job descriptions. Avoid including words in your titles like “hacker,” “rockstar,” “superhero,” “guru,” and “ninja,” and use neutral, descriptive titles like “engineer,” “project manager,” or “developer.”

2. Avoid use of gender-charged words. Examples: “Analyze” and “determine” are typically associated with male traits, while “collaborate” and “support” are considered female.

3. Use the Draup platform to screen the JD for inclusive language that doesn’t switch some people off.
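The inclusive-language check described on this slide can be approximated with a simple word-list screen. The sketch below is only an illustration: the word lists contain just the examples quoted on the slide, the function name `screen_jd` is hypothetical, and it does not represent how the Draup platform screens a JD.

```python
import re

# Word lists contain only the examples quoted on the slide; a real screen
# would need a much larger, validated vocabulary (assumption).
LOADED_TITLE_WORDS = {"hacker", "rockstar", "superhero", "guru", "ninja"}
GENDER_CODED_WORDS = {
    "analyze": "male-coded",
    "determine": "male-coded",
    "collaborate": "female-coded",
    "support": "female-coded",
}


def screen_jd(title: str, body: str) -> dict:
    """Flag loaded title words and gender-coded body words (exact matches only)."""
    title_tokens = set(re.findall(r"[a-z]+", title.lower()))
    body_tokens = set(re.findall(r"[a-z]+", body.lower()))
    return {
        "title_flags": sorted(title_tokens & LOADED_TITLE_WORDS),
        "body_flags": {
            word: GENDER_CODED_WORDS[word]
            for word in sorted(body_tokens)
            if word in GENDER_CODED_WORDS
        },
    }


if __name__ == "__main__":
    report = screen_jd(
        "Data Science Ninja",
        "You will analyze data and collaborate with product teams.",
    )
    print(report)
```

Because the match is on exact tokens, inflected forms such as "analyzes" are not flagged; stemming or a curated lexicon would be a natural next step.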


Discovery: How can HR find a pool of candidates

What are the keywords for each title

Role: Search Keywords

Data Scientist: ("Data Scientist" OR "Applied Researcher" OR "Data Modeling Scientist" OR "Data Modeling Specialist" OR "Data Modeling Engineer" OR "Data Mining Scientist" OR "Data Mining Specialist" OR "Algorithm Scientist" OR "Algorithm Engineer" OR "Algorithm Specialist")

Applied Data Scientist – NLP: ("Data Scientist" OR "Applied Researcher" OR "Data Modeling Scientist" OR "Data Modeling Specialist" OR "Data Modeling Engineer" OR "Data Mining Scientist" OR "Data Mining Specialist" OR "Algorithm Scientist" OR "Algorithm Engineer" OR "Algorithm Specialist" OR "Deep Learning") AND ("NLP" OR "Natural Language Processing" OR "NLTK" OR "NLG")

Applied Data Scientist – Computer Vision: ("Data Scientist" OR "Applied Researcher" OR "Data Modeling Scientist" OR "Data Modeling Specialist" OR "Data Modeling Engineer" OR "Data Mining Scientist" OR "Data Mining Specialist" OR "Algorithm Scientist" OR "Algorithm Engineer" OR "Algorithm Specialist" OR "Deep Learning") AND ("Computer Vision" OR "Image Processing" OR "OPENCL")

Data Scientist – Speech Recognition: ("Data Scientist" OR "Applied Scientist" OR "Data Researcher" OR "Applied Researcher" OR "Data Modeling Scientist" OR "Data Modeling Specialist" OR "Data Modeling Engineer" OR "Data Mining Scientist" OR "Data Mining Specialist" OR "Algorithm Scientist" OR "Algorithm Engineer" OR "Algorithm Specialist" OR "Deep Learning") AND ("Speech Recognition" OR "Automated Speech Recognition" OR "ASR" OR "Voice Recognition" OR "Acoustics" OR "PERL")
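Search strings of this shape (an OR group of titles, optionally AND-ed with an OR group of skill keywords) are easy to get wrong by hand, for example with mismatched quotes or repeated terms. A minimal sketch of a generator follows; the helper names `or_group` and `build_query` are hypothetical, and the exact Boolean syntax accepted by any given sourcing tool is an assumption.

```python
def or_group(terms):
    """Quote each term, drop repeats (keeping first-seen order), and join with OR."""
    unique_terms = list(dict.fromkeys(terms))
    return "(" + " OR ".join(f'"{term}"' for term in unique_terms) + ")"


def build_query(title_terms, skill_terms=None):
    """Build '(titles)' or '(titles) AND (skills)' as a Boolean search string."""
    query = or_group(title_terms)
    if skill_terms:
        query += " AND " + or_group(skill_terms)
    return query


if __name__ == "__main__":
    titles = ["Data Scientist", "Applied Researcher", "Algorithm Engineer"]
    nlp_skills = ["NLP", "Natural Language Processing", "NLTK", "NLG"]
    print(build_query(titles, nlp_skills))
```

Keeping the title list in one place and varying only the skill group mirrors how the role-specific strings on this slide differ from the generic Data Scientist string.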


Evaluate a Candidate: Draup Framework for evaluating capabilities of a Data Scientist

Parameter | Description | Weightage (out of 5)

Advanced Statistical Concepts | Understanding of Bayesian and inferential statistics, including z-test, t-test, regression, forecasting etc. | 3

Programming for Data Science | Knowledge of R or Python | 3

Knowledge of Machine Learning | Algorithms like Naïve Bayes, SVM, Decision Trees, Random Forest etc. | 5

Knowledge of Deep Learning and Neural Nets (Not Mandatory) | Deep learning frameworks: TensorFlow, Keras or Theano | 3/4/5

Big Data Skills | Knowledge of Hadoop ecosystem or distributed file systems | 2

Persuasive Communication | Ability to convey results to business stakeholders | 1

Analytical IQ & Problem Solving | Ability to break down problems | 4

Behavioral Competencies | Curiosity, challenge driven, creativity | 3

Domain Understanding | Knowledge of banking function corresponding to roles | 4

[Radar chart of the nine parameters: Draup Framework for Data Scientist Evaluation]
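One way to turn the weightages on this slide into a single comparable number is a weighted average of interviewer ratings. The sketch below is an assumption about how the weights might be combined, not a scoring method stated in the deck: the 0 to 5 rating scale, the function name `weighted_score`, and the treatment of deep learning as an optional parameter with a chosen weight of 3, 4 or 5 are all illustrative.

```python
# Weightages (out of 5) as listed on the slide. Deep learning is marked
# "Not Mandatory" with a weightage of 3/4/5, so it is handled separately.
WEIGHTS = {
    "advanced_statistical_concepts": 3,
    "programming_for_data_science": 3,
    "machine_learning": 5,
    "big_data_skills": 2,
    "persuasive_communication": 1,
    "analytical_iq_problem_solving": 4,
    "behavioral_competencies": 3,
    "domain_understanding": 4,
}


def weighted_score(ratings, deep_learning_weight=None):
    """Weighted average of ratings (assumed 0-5 scale) using the slide's weightages.

    If deep_learning_weight is given (3, 4 or 5), ratings must also contain
    a "deep_learning" entry; otherwise that parameter is ignored.
    """
    weights = dict(WEIGHTS)
    if deep_learning_weight is not None:
        if deep_learning_weight not in (3, 4, 5):
            raise ValueError("deep_learning_weight must be 3, 4 or 5")
        weights["deep_learning"] = deep_learning_weight
    missing = sorted(set(weights) - set(ratings))
    if missing:
        raise KeyError(f"missing ratings for: {missing}")
    total = sum(weights[name] * ratings[name] for name in weights)
    return total / sum(weights.values())


if __name__ == "__main__":
    candidate = {name: 4 for name in WEIGHTS}
    candidate["machine_learning"] = 5
    print(round(weighted_score(candidate), 2))
```

Normalising by the sum of the weights keeps scores on the same scale as the ratings, so candidates assessed with and without the optional deep learning parameter remain roughly comparable.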


To evaluate exceptional data scientists, look at their contributions to open-source programs and competitions

Look Beyond the Resume

Evaluate candidates holistically based on their portfolio and open-source contributions

• Kaggle has ~100,000 data scientists

• Review candidates' portfolios and contributions to challenges through Kaggle Kernels

• To hire experts, refer to the Kaggle ranking boards

• With 12.2 million members contributing to 31.1 million projects, GitHub is the largest online community of developers

• A developer’s area of interest and proficiency can be understood from the Contributions and Repositories sections of the profile.

DRAUP Platform provides an integrated view of the Kaggle, GitHub and Stack Overflow profiles of a data scientist


Data Scientist – Computer Vision: Sample Talent Profiles (1/2)

Yedidyah Dordek
Education: MSc, Machine Learning, Signal Processing

Key Skills: Python, Matlab, OpenCV, Scikit-learn, TensorFlow, Keras

• Develop computer vision and deep learning algorithms for real-time and high-speed vision systems

• Research, fast prototyping, feasibility studies, specification and implementation of product

• Customer support, defining project requirements end to end from planning to integration

Note : DRAUP’s proprietary talent module was used to analyze talent by locations and skill sets

Key Skills: Digital Image Processing, TensorFlow, Keras

• Develop sophisticated computer vision algorithms to extract relevant information from histology images and leverage that information using machine learning and statistical algorithms to predict cancer progression and response to treatment.

• Develop random forest classifiers combining texture features (local binary patterns and Haralick features) for object classification

• Develop convolutional neural network (deep learning) algorithms for automatic object detection

Sr Data Scientist – Computer Vision
Experience in Current Role: 1+ Years
Total Experience: 10+ Years

Data Scientist – Computer Vision
Experience in Current Role: 1 Year

Yu Mao
Education: MS Computer Vision, Carnegie Mellon University

Key Skills: TensorFlow, Caffe, OpenCV

• Produce data visualizations to communicate up and across the company. Design of Experiment (DOE) for engineering studies and large-scale user studies. Conduct/support data collection and analysis with other groups.

• Define feature specs and expected user experience based on data

• Build tools for analysing and visualizing data

Data Scientist – Computer Vision
Experience in Current Role: 11 Months

Shahab Arabshahi
Education: PhD Physics, Florida Institute of Technology

Key Skills: Python, SQL, Image Processing, IDL, Matlab, MPI Library

• Develop algorithms in the fields of Computer Vision, Machine Learning and Deep Learning.

• Work with the system, physics, software, qualification and applications groups

• Provide software specifications and production code on time to meet project milestones

• Engage in customer facing activities to aid algorithms' proliferation at customer sites

Senior Data Scientist – Computer Vision
Experience in Current Role: 4 Months

Nishant Verma
Education: PhD Biomedical Informatics, University of Texas


Data Scientist – Natural Language Processing: Sample Talent Profiles (2/2)

Key Skills: Java, Python, Ontology Creation, Computational Linguistics

• Develop and engineer NLP software (Java) to linguistically process large volumes of data

• Statistically evaluate the performance of in-house NLP tools using Python

• Regression testing of customized software

• Write documentation for both customized and in-house software

Data Scientist - NLP
Experience in Current Role: 3 Months

Key Skills: NLTK, SpaCy, xSQL, Bayesian Statistics

• Optimize user experience by data-mining and analysing chat transcripts between customers and tech support agents.

• Extract data with xSQL; mine data with APIs (AWS, IBM Watson, Google, Intercom)

• Apply statistical analyses and machine learning techniques such as clustering, regression, natural language processing, etc.

• Coordinate projects along data analytics life cycles, punctuated by demands from internal and external customers.

Senior Data Scientist - NLP
Experience in Current Role: 1+ Years

Wenqi Dong
Education: MS CV ML, University of Michigan

Key Skills: Gensim, RNN & LSTM

• Responsible for the implementation and evaluation of state-of-the-art algorithms for natural language processing, machine learning and combinatorial optimization.

• Maintain a hybrid model for natural language understanding in a smart home dialog system

• Set up a deep learning model for multi-domain intent and slot detection from Automatic Speech Recognition results

Data Scientist - NLP
Experience in Current Role: 9 Months
Total Experience: 2 Years

Key Skills: Perl, MapReduce, Python Text Mining

• Perform data mining to support new features and analyse large datasets to glean actionable insights

• Design classifiers and ranking algorithms and perform language processing and query analysis

• Perform ad-hoc statistical analysis and craft metrics to measure the success of the service

Data Scientist - NLP
Experience in Current Role: 1+ Years

Ebrain Mirambeau
Education: MS Computational Linguistics & NLP, University of Washington

Sharon Chou
Education: PhD Electrical Engineering, Stanford University

Bing Zhao
Education: PhD CS, Carnegie Mellon University


Sample Profiles that possess the skills to move into Fraud Intelligence Analyst roles

Summer Thompson
MBA, Wilmington University
Wilmington, Delaware
Insider Threat Monitoring (May 2017 – Present)
Recommended Progression: Fraud Analyst
Acquired Skills: Financial Analysis, Customer Service, Loans, Credit
Certifications Required:
• Certified AML Specialist
• Certified Fraud Examiner
• AML Professional Certification
• CISSP
Neighbouring Skills: Python, Data Mining, Information Security, SQL, SAS, R, Splunk, Anti Money Laundering

Keyonna Morrison
Central Piedmont Community College
Charlotte, North Carolina Area
Corporate Banking Specialist (Aug 2018 – Present)
Fraud Analyst
Acquired Skills: Leveraged Lending, Customer Service, Commercial Banking, Financial Analysis
• CISSP
Neighbouring Skills: Data Analytics, Python, Data Mining, Information Security, SQL, SAS, R, Splunk, Anti Money Laundering

Raj Devnani
Professional Accounting, Macquarie University
New York, New York
Credit Risk (Aug 2017 – Present)
Fraud Analyst
Acquired Skills: Investment Banking, Credit Analysis, Equity Research, Credit Risk
• CISSP
Neighbouring Skills: Python, Data Mining, Information Security, SQL, SAS, R, Splunk, Anti Money Laundering

Nicholas Teaster
Accounting, University of Arkansas
Fayetteville, Arkansas Area
Teller Customer Service (Jun 2016 – Present)
Fraud Analyst
Acquired Skills: Customer Service, Strategic Planning, Project Management
• CISSP
Neighbouring Skills: Data Analytics, Python, Data Mining, Information Security, SQL, SAS, R, Splunk, Anti Money Laundering


Sample Profiles that possess the skills to move into Quantitative analyst roles

Stephanie Talebli
BBA Finance, The University of New Mexico - Robert O. Anderson School of Management
Greater New York City Area
Credit Analyst (2017 – Present)
Quantitative analyst
Acquired Skills: Credit Analysis, Financial Analysis, Commercial Banking, Portfolio Management
• Chartered Financial Analyst (CFA)
• Certificate in Quantitative Finance (CQF)
Neighbouring Skills: Python, C++, C#, SQL, R, SAS, Quantitative Research, Statistical Data Analysis

Chris Cziesla
Bachelor's Degree Economics and Computer Science, Claremont McKenna College
San Francisco Bay Area
Investment Banking Analyst (Jun 2018 – Present)
Quantitative analyst
Acquired Skills: Data Analysis, Financial Analysis, Microsoft Office, Customer Service
• Chartered Financial Analyst (CFA)
• Certificate in Quantitative Finance (CQF)
Neighbouring Skills: Python, C#, SQL, R, SAS, Data Analysis

Hanchen Liang
Master of Science (M.S.) Financial Engineering, University of Michigan
Greater New York City Area
Consultant, Pricing and Valuation (Jan 2016 – Present)
Quantitative analyst
Acquired Skills: Data Analysis, Quantitative Analytics, SQL, Financial Analysis, Financial Modelling
• Chartered Financial Analyst (CFA)
• Certificate in Quantitative Finance (CQF)
Neighbouring Skills: Python, C++, C#, R, SAS, Quantitative Research, Statistical Data Analysis

Noe Bellet
Bachelor's Finance and Entrepreneurship, University of Utah
Salt Lake City, Utah
Equity Research Analyst (Jan 2019 – Present)
Quantitative analyst
Acquired Skills: Financial Analysis, Analytical Skills, Investments, Business Strategy, Microsoft Office
• Chartered Financial Analyst (CFA)
• Certificate in Quantitative Finance (CQF)
Neighbouring Skills: Python, C++, C#, SQL, R, SAS, Data Analysis


US: ~65% of AI & Big Data talent in the US is concentrated across the Bay Area & Seattle; the Central and Eastern regions’ talent is largely spread across start-ups

~280,000 AI/Big Data Talent

Bay Area: 110K+ Talent (39%)
Seattle: 63K+ Talent (22%)
Boston: 35K+ Talent (12%)
Dallas: 17K+ Talent (6%)
Austin: 10K+ Talent (4%)
Phoenix: 9K+ Talent (3%)
Others: 14%

Tech Companies: Microsoft, Amazon, Expedia, Facebook
Banks & Financial Services: Capital One, JP Morgan Chase

Tech Companies: Google, Facebook, Apple, Oracle, Uber
Banks & Financial Services: Wells Fargo, Bank of America

Tech Companies: Microsoft, Wayfair, Amazon, Google
Banks & Financial Services: Fidelity Investments, State Street Corp

Tech Companies: Microsoft, IBM, AT&T, Verizon
Banks & Financial Services: Wells Fargo, American Express

Tech Companies: IBM, Microsoft
Banks & Financial Services: Citi, Bank of America, JPMorgan Chase

Tech Companies: Dell, General Motors, Oracle
Banks & Financial Services: Charles Schwab, Citi
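The percentage shares on this slide can be sanity-checked against the headcounts. The sketch below treats each "K+" figure as an exact lower bound, which is an assumption, and the names `HUB_TALENT`, `TOTAL_TALENT` and `hub_shares` are illustrative. Under that assumption the Bay Area and Seattle together come to roughly 62%, a little under the ~65% in the headline, which may simply reflect rounding in the "K+" figures.

```python
# Headcounts as printed on the slide, with "K+" read as a lower bound (assumption).
HUB_TALENT = {
    "Bay Area": 110_000,
    "Seattle": 63_000,
    "Boston": 35_000,
    "Dallas": 17_000,
    "Austin": 10_000,
    "Phoenix": 9_000,
}
TOTAL_TALENT = 280_000  # "~280,000 AI/Big Data Talent"


def hub_shares(hub_talent, total):
    """Return each hub's share of total talent in percent, rounded to 0.1."""
    return {hub: round(count / total * 100, 1) for hub, count in hub_talent.items()}


if __name__ == "__main__":
    shares = hub_shares(HUB_TALENT, TOTAL_TALENT)
    for hub, share in shares.items():
        print(f"{hub}: {share}%")
    top_two = (HUB_TALENT["Bay Area"] + HUB_TALENT["Seattle"]) / TOTAL_TALENT * 100
    print(f"Bay Area + Seattle: {top_two:.1f}%")
```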

Top Data Science Practices

1. Higher pay: Tech giants like Amazon, Google and Facebook attract tremendous technology talent due to the lucrative compensation they offer

2. Subject matter experts and thought leadership: The presence of thought leadership and subject matter experts in an organisation helps companies retain data science talent

3. Sense of security: Tech companies are predicated on data science initiatives, which instils a sense of security in talent and also promises new challenges


Bay Area: Hotspot for mature AI & Big Data talent pool in Data Science and Data Management roles; talent cost is much higher than in other areas of the US

San Francisco Bay Area: ~110,000 employee talent; $195K Median Cost

AI/BD Talent Hubs: San Francisco, Palo Alto, Mountain View, Santa Clara, Sunnyvale, San Jose, San Mateo

[Map labels for other hotspots: Seattle, Boston, Dallas, Austin, Phoenix]

Headcount Distribution (By Skills): Data Scientist 36%; Analyst: Data Management 35%; Database Engineering 11%; Applied AI 10%; Data Architect 8%

Key Employers | Total Number | Total Employable Talent

G500 companies | | 56K+

AI/Big Data start-ups | ~1,500 | 60K+

AI/Big Data Universities | 11 |

Top Verticals: Enterprise & Consumer Software, Semi-Conductor, Consumer Electronics, BFSI

Note: DRAUP’s Talent Simulation Module. We have analysed ~2,000 tech companies and ~10,000 start-ups.


Bay Area: Data Science is the most employed role across top players, with nearly every engineering priority focused on building cross-industry AI platforms

Top 5 tech companies AI & Big Data head count ~12000

Columns: AI & Big Data Headcount, Data Scientist, Database Engineer, Applied AI, Data Architect, Analyst – Data Management

~2000 25% 4% 64%

~3000 6% 6% 16%

~4600 4%: Responsible for key AI-based product development activities across major business units such as TensorFlow, Waymo, DeepMind, Android, YouTube, Chrome, Maps etc.

~1000 21% 10% 58%: Oracle AI Platform Cloud Services team is based out of the Bay Area centre.

~1200 15% 1% 2%: Uber’s major share of talent pool responsible for fleet management services, location intelligence and the autonomous driving group is based out of the Bay Area.

Facebook’s key AI product priorities, such as DeepText, facial recognition, Oculus Advanced Development Group, Instagram & WhatsApp etc., are based out of the Bay Area

Apple’s HQ: 80% of R&D talent is out of this center. Apple’s core ML teams for Siri, Developer’s platform, iOS, iCloud, etc. are all based out of this center

66% 3% 11% 16%

2% 5%

16% 56%

1% 10%

4% 78%


Seattle: 2nd largest talent hotspot in the US, with the majority of talent in Data Management and Data Scientist roles employed across G500 employers and vertical-specific start-ups

Seattle, Washington: $185K Median Cost

[Map labels for other hotspots: Boston, Dallas, Austin, Phoenix, Bay Area]

Headcount Distribution (By Skills): Data Scientist 35%; Analyst: Data Management 30%; Database Engineering 20%; Applied AI 10%; Data Architect 5%

Key Employers | Total Number | Total Employable Talent

G500 companies | | 27K+

AI/Big Data start-ups | ~350 | 36K+

AI/Big Data Universities | 4 |

Top Verticals: Enterprise & Consumer Software and BFSI


Data

Scientist

Database

Engineer

Applied

ML

Data

Architect

Analyst –

Data

Management

~2000

Build Deep Learning tools and APIs, and contribute to open source frameworks such as MXNet and Keras. Help build industry-leading conversational technologies and machine learning systems that powers Alexa.

~300

Design, build, scale, and optimize the data infrastructure as a highly innovative service that enhance and innovate the Expedia e-Commerce ecosystems. Build full-stack data from multiple data sources leveraging cloud systems, and structured/unstructured data.

~2100

Develop new NLP capabilities and text understanding APIs in Text Analytics Cognitive Service. Extract insights from unstructured data and build predictive solutions for core NLP problems.

~200

Core Machine Learning Team in Seattle develops and optimizes various algorithms including Neural Networks, Boosted Decision Trees, Sparse Linear Models, and Deep Learning for several ranking teams including Ads, Feed, Search, Instagram and others.

~200

51% 18% 27% 3% 1%

24% 44% 15% 17% 0%

61% 23% 9% 5% 2%

66% 10% 20% 2% 4%

22% 29% 42% 7% 0%

Starbucks’ AI and analytics team in Seattle uses customers’ spending-pattern data to develop insights that help generate personalized product promotions, such as user-based reward cards.

AI & Big Data

Headcount

Top 5 tech companies AI & Big Data head count ~30000


Seattle: Microsoft Azure ML, Facebook Deep Learning and Amazon Core NLP teams are based out of Seattle; Brick and Mortar players like Starbucks offer personalised customer service by using ML technology in analytics


• Data Scientist: 40%
• Analyst (Data Management): 30%
• Database Engineering: 15%
• Data Architect: 8%
• Applied AI: 7%

Seattle

Boston


$155K Median Cost

Boston

Dallas

Austin

Phoenix

Bay Area

Key Employers Total Employable Talent

8.4K+

~ 300

10

Top Verticals

26K+

Total Number

Healthcare, Retail


AI/Big Data

start-ups

G500

companies

AI/Big Data

Universities

Boston: AI/BD employed talent is consolidated in start-ups from the Healthcare and Retail industries; Boston universities such as MIT and Northeastern University provide a mature engineering talent pool

Headcount Distribution

(By Skills)


Data Scientist | Database Engineer | Applied ML | Data Architect | Analyst – Data Management

~30 22% 0%

~130 10% 2% 5%

~72 5%

~130 2%

Wayfair launched “Search with Photo”, a new feature that leverages artificial intelligence to assist consumers in the product-buying process

~190 17% 15% 0% 13%

Creating AI/ML driven products that combine natural language understanding with batch and real time sales and service models with the goal of improving the omni-channel customer experience in a measurable way

ML/Big Data

Headcount

Top 5 tech companies AI/Big Data head count ~550

Key focus areas are Computational Biology, Computer Science, Cryptography, Machine Learning, Systems and Security

The R&D team in Boston area develops software automation infrastructure for Amazon's Kiva robotic systems in an integrated service-oriented cloud computing environment.

Embedded Systems and Mobile Apps for Android are some of the primary activities carried out in the Boston area

19%0%6%

3% 13%

13%

10% 0%

55%

72%

70%

62%

70%

16%


Boston: Along with tech giants, traditional Retail and Banking players are also key employers of Data Science skills, primarily working on the digital transformation of back-end and front-end operations


1

2

3

4

5

6

• Maturity of ML/BD courses: Maturity of the courses has been calculated by analyzing depth of courses, number of enrollments, no. of citations of publications by the professor teaching the course, etc.

• No. of ML/BD courses: Total number of ML and Big Data courses taught in the university

• No. of ML/BD publications: Number of ML and Big Data publications done by the professors/PhDs of the universities

• No. of Masters/PhDs: Number of ML and Big Data publications done by the professors/PhDs of the universities

• CoE of tech companies: If tech companies have opened a Centre of Excellence for AI or Big Data by tying up with the university

• Start-ups born: Number of start-ups born from the university

Note : The ranking shown is a sample

Note : DRAUP’s Talent Module analysed 100,000+ global universities to identify top universities and key courses in software engineering, ML and Big Data

No. of ML/BD courses

No. of ML/BD publications

No. of Masters/PhDs

CoE of tech companies

Start-ups born

Maturity of AI/ML courses

CMU Cornell UCB MIT Stanford

University Assessment: Stanford has the largest number of CoE collaborations with tech companies as well as the most mature ML courses; CMU has the largest number of ML/BD courses and publications


University Courses: Maturity analysis of course-works related to AI & Big Data skills

Intermediate Courses | Advanced Courses | Beginner Courses

Mature courses require students to complete courses on Advanced Probability Theory and Advanced Statistical Theory and a project on Advanced Data Analytics.

Intermediate courses have a prerequisite to complete one or two of the beginner courses and require students to complete projects on real-world data.

Beginner courses do not have any specific prerequisites, but prior experience in calculus, probability, statistics, etc. is recommended.

Key Courses
• Analogical Reasoning
• Decision Theory
• Fuzzy Logic
• Logic Programming
• Machine Discovery
• Machine Learning
• Planning
• Qualitative Physics and Model-based Reasoning
• Search
• Temporal Reasoning
• Philosophy of AI

Key Courses
• Cognitive Modelling
• Genetic Algorithms
• Knowledge Representation
• Computer Vision
• Non-Monotonic Reasoning
• Robotics
• Cognitive Science
• NLP

Key Courses
• AI & Manufacturing
• AI & Medicine
• AI & Legal Reasoning
• Artificial Life
• Computational Biology
• Emotion
• Neural Networks
• Distributed AI
• Integrated AI Architectures
• Intelligent Tutoring
• Expert Systems

Universities are specializing in courses with different levels of maturity in the field of AI/ ML & Data Science

Types of course on the basis of maturity level



Carnegie Mellon University

Tech Collaboration & CoEs

Top AI Awards
1. Continuously ranked amongst Top 5 schools
2. Faculty and alumni have won multiple prestigious awards and million-dollar research grants

Marquee AI Alumni

1. Andrew Ng, Prof at Stanford, co-founder of Coursera

Key Start-Ups Born Key Programme Offered

Tech Collaboration & CoEs

Major PhD & Professors profiles

Entry Level
• Introduction to Machine Intelligence
• Concepts in Artificial Intelligence
• AI, Society and Humanity

Intermediate
• Neural computation
• Cognitive robotics
• Introduction to deep learning
• Introduction to Natural language processing

Mature
• Deep reinforcement learning and control
• Vision sensors
• Human-Robot interaction
• Computational perception

CMU researchers are working with Amazon to improve Alexa.

Sony to collaborate with CMU on AI and robotics research. Initial R&D efforts will focus on food preparation, cooking and delivery

PROFILE 1
DESIGNATION: Professor, Machine Learning Department
Education: Ph.D. in Molecular Biology and Biochemistry, Ph.D. in Computer Science
No. of publications: 102
Research Areas: Machine learning, statistical methodology, large-scale computational systems and architecture, etc.
Current works: Foundations of statistical learning, framework for parallel machine learning on big data, computational and statistical analysis of genes, application of statistical learning in social networks.

PROFILE 2
DESIGNATION: Professor, Machine Learning Department
Education: Ph.D. in Learning Deep Generative Models, Masters in Optimization Algorithms for Learning
No. of publications: 52
Research Areas: Deep Learning, Probabilistic Graphical Models, and Large-scale Optimization.
Current works: Structured Control Nets for Deep Reinforcement Learning, Neural Models for Reasoning over Multiple Mentions using Coreference, Neural Map: Structured Memory for Deep Reinforcement Learning, etc.

K&L Gates, a law firm, gave $10m to CMU to study the ethics of AI

General Motors has set up a GM-CMU collaborative research lab on autonomous driving

Argo.ai: Raised $1B from Ford

Wombat Security: Acquired by Proofpoint for $225m

Petuum: Has raised over $108m

US: Leading tech giants such as Apple, Google and Amazon have collaborated with CMU for research in the field of AI, Robotics and Deep Learning



Johns Hopkins University

Tech Collaboration & CoEs:

Marquee AI Alumni:

1. President of Drive.ai

Key Start-Ups Born Key Programme Offered

Tech Collaboration & CoEs

Major PhD & Professors profiles

Entry Level
• Computer graphics
• Parallel Programming
• Digital Health and Biomedical Informatics

Intermediate
• Algorithms for Sensor-Based Robotics
• Natural language processing
• Machine Translation
• Representation Learning

Mature
• Advanced Topics in Genomic Data Analysis
• Deep Learning for Image Understanding
• Vision as Bayesian Inference
• Modern Biomedical Imaging Instrumentation and Techniques

US: Johns Hopkins has collaborated with healthcare companies such as Medopad and Bayer to leverage predictive analytics in healthcare

Acquired by Baidu

Has raised $7.7m in funding

Collaboration with UK medtech firm Medopad to leverage deep data sets for predictive analysis for at-risk patients and communities.

The Felix project, funded by the Lustgarten Foundation, aims to develop deep learning algorithms to study MR and CT images of the pancreas.

OurCrowd and The Johns Hopkins University have partnered to bring Israeli health IT start-ups to Johns Hopkins for clinical trials and technology validation.

PROFILE 1
DESIGNATION: Associate Professor, Computer Science
Education: B.S. in Computer Science, B.S. in Computer Engineering, Ph.D. in Computer Science
No. of publications: 40
Research Areas: Natural language processing, Machine learning, Health informatics, Clinical NLP, Computational Epidemiology
Current works: Social monitoring of public health, Bayesian Modeling of Lexical Resources, Multi-task Domain Adaptation for Sequence Tagging, Harmonic Grammar, Optimality Theory, and Syntax Learnability

PROFILE 2
DESIGNATION: Assistant Professor, Computer Science
Education: B.S. in Computer Science, M.S. in Computer Science, Ph.D. in Computer Science and Linguistics
No. of publications: 42
Research Areas: Natural language processing, artificial intelligence, machine learning, linguistic semantics
Current works: Neural Machine Translation Using Natural Language Inference, Neural Models of Factuality, Cross-lingual Semantic Parsing, Semantic Proto-Role Labeling



ASSOCIATIONS

International Data Engineering and Science Association
• Description: Supports the data science profession with practical resources for data professionals while improving the practice of data science, accrediting schools, and establishing model ethical codes.
• Focus Areas: Data Science, Data Engineering
• Presence in United States: Yes

Institute for Operations Research and the Management Sciences
• Description: An international community of 12,500 operations research and analytics professionals and students, with a presence in over 90 countries. Conducts meetings and conferences, helps in professional development and recognises excellence in the field of operations research and data science & analytics.
• Focus Areas: Operations Research, Management Science, Data Analytics, Business Intelligence
• Presence in United States: Yes

Association for the Advancement of Artificial Intelligence
• Description: Promotes research in and responsible use of artificial intelligence. Aims to increase public understanding of artificial intelligence, improve the teaching and training of AI practitioners, and provide guidance for research planners and funders regarding the importance and potential of current AI developments and future directions.
• Focus Areas: Artificial Intelligence
• Presence in United States: Yes

US: Top Data Science and Artificial Intelligence associations in the United States

OTHERS

