January, 2019
AI Hiring: Playbook
Data Scientist roles
Zinnov (Draup) Point of View
Source : DRAUP
What are the roles in AI

Role: Applied Data Scientist – NLP
Skills: Data Scientist skills. Additional skills required: NLP libraries (NLTK, SpaCy, Gensim). Additional knowledge of transfer and sequential learning (RNN and LSTM).
Sample workloads:
• Develop scalable tools and leverage ML and deep learning models to solve real-world problems in areas such as Speech Recognition and NLP
• Collaborate with all lines of business and functions in the Corporate Investment Banks: Markets, Global Investment Banking, Corporate Banking, Technology and Operations

Role: Applied Data Scientist – Computer Vision
Skills: Data Scientist skills. Additional skills: Gurobi, CPLEX, Symphony, Axioma, OpenCV, Caffe, Torch, Theano. Additional knowledge of transfer and sequential learning (CNN, RNN and LSTM).
Sample workloads:
• Use Deep Learning techniques arising in Computer Vision
• Experimental models to leverage facial recognition for advanced security and KYCs
• Video surveillance analytics
• Building facial based authentication for payment using selfies

Role: Data Scientist
Skills: Python, SparkML, machine learning APIs and computational packages (TensorFlow, Theano, PyTorch, Keras, Scikit-Learn, NumPy, SciPy, Pandas, statsmodels). Deep Learning (optional): CNN, RNN/LSTM, GAN.
Sample workloads:
• Work on statistical and ML techniques & develop segments, predictive models, experimental designs & decision analysis
• Gather, manipulate & analyze large data sets from multiple sources & develop algorithms to optimize customer segmentation, customer retargeting, operational optimization etc.

Role: Applied Data Scientist – Speech Recognition
Skills: Data Scientist skills. Additional skills: Kaldi, Attila, HTK, Sphinx, SRILM, OpenFST. Understanding of Dialog Management, Automatic Speech Recognition (ASR), Audio Signal Processing.
Sample workloads:
• Develop a vocal-tract-length-normalization (VTLN) training workflow and design a VTLN adaptation process for ASR
• Develop an online information retrieval system for language model adaptation for ASR
• Develop acoustic model switching based on feature classification, model adaptation, and acoustic data clustering
Job Responsibilities : Key business areas for roles in AI
SALES & MARKETING   FINANCE   CUSTOMER EXPERIENCE   PRODUCT   OPERATIONS   ROLES
• Automating call distribution
• Customer Segmentation and Analytics
• Customer Sentiment Analysis
• Financial trading (High frequency trading enabled by AI)
• Predictive maintenance & replacement
• Data collection from sensors in real time
Applied Data Scientist –NLP
Applied Data Scientist – Computer Vision
• Context aware marketing
• Personalized marketing
• Customer segment analytics
• Anticipating future customer purchases and presenting offers accordingly
• Improving media buying
• Monitoring social media comments to determine overall brand affinity and issues
• Tailoring promotions (online or offline)
• Customer Service Automation
• Facial and voice-based biometrics
• Optical Character Recognition
• Customer emotion analytics
• Visual search capability
Data Scientist
• Customer Support through Chatbot
• Automated Voice Response
• Social Analytics & Automation
Applied Data Scientist – Speech Recognition
• Financial Voice assistance
• Real time speech analytics (RTSA) technology
• Parsing & Machine Translation
• Part of Speech tagging
• Leveraging machine vision to tag images, taking users' preferences into account, and improve product discovery
• Image recognition & visual analytics
• Medical imaging insights
• Financial Analysis
• Algorithmic Trading
• Investment strategies
• Scanning legal and regulatory text for compliance issues
• Voice Authentication
• Remote KYC based on facial recognition
• Chatbots
• Design Patterns Prediction
• Autonomous Vehicles
• Part of Speech Tagging
• Speech recognition analysis
Hiring a Data Scientist is tough in the US
Software Engineer   Data Science
- Attrition rate: 9%   14%
- Average salary: ~84K   ~110K
- Average tenure in a role: 1.5 years   2.4 years
- Average time to hire: 48 days   37 days
Source: DRAUP Talent Module
Note: Average values derived from analysis of 100 US MSAs (updated as of Dec 2018)
[Chart: AI job openings, 2018 to 2021]
AI Job Openings: ~100K
Global AI Job Openings: ~450K
AI Tools and Technologies Stack: What IT Tools should a Data Scientist know
Infrastructure & Processing: GPUs, CPUs, AWS Lambda, Azure Function, Google Function
AI Frameworks: TensorFlow, Caffe, Torch, Theano, CNTK, Keras, SciKit Learn
Applied AI as-a-Service: Image-Recognition AAS, NLP AAS, Vision AAS, Bot AAS, Speech AAS (Amazon Lex, Amazon Polly, MSFT Bot Framework, AWS Rekognition, Google Vision API)
Big Data Platform: EMR, DataBricks, Spark, NoSQL DBs
Programming Languages: R, Python, OpenCL
Integrated AI Platforms: Azure ML, AWS Deep Learning AMI, Google ML, DataBricks ML, AWS SageMaker, IBM Watson Studio, NanoNets
Different Personas of a Data Scientist – Understand what the Team needs
[Figure: the four personas (Big Data Engineer - ML, Applied ML Engineer, Core ML Engineer, Business/Data Analyst) compared across CS Fundamentals & Programming, Inferential and Bayesian Statistics, Data Modelling, Applied Frameworks/Libraries, and Domain Understanding]
1. Big Data Engineer - ML: Has an emphasis or specialization in distributed systems and big data. A data engineer has advanced programming and system creation skills and can do basic to intermediate level analytics.
2. Applied ML Engineer: Data scientists who focus on leveraging ML algorithms to solve NLP, Speech or Vision problems.
3. Core ML Engineer: Data scientists who build core techniques and frameworks for machine learning use cases.
4. Business/Data Analyst: Business and data focused developers who build queries and conduct statistical analytics to solve a business problem.
Skill Anatomy of a Data Scientist in Banking & Finance: What to look for in a profile
(Groupings shown on the slide: Foundational, Banking Specializations, Recommendatory)

Education
Ph.D., Master's or Bachelor's degree in Computer Science, Statistics, Mathematics, Economics, Finance, Engineering

Languages
Must know: Python, R
Good to have: Java, SQL

Core Skillsets/Concepts
Regression Analysis, Decision Trees, Ensemble Algorithms, Neural Networks, Time Series Analytics, Clustering, Random Forest, Gradient Boosting, Text Mining

Experiences
Transformation of traditional technology areas:
• Personalized marketing: Targeted customer analytics, personalized offers on loans, cards, traditional credit scoring systems
• Customer experience: Personalized customer service, sentiment analysis, customer churn prediction, predictions for investments, Customer Lifetime Value (CLV)
• Fraud detection and risk management: Text mining and information retrieval tasks, understanding of AML/CTF legislation (Anti Money Laundering and Counter Terrorism Financing), credit risk modelling, analysing loan data and credit card history, AML pattern recognition
• Customer service automation: Machine learning enhanced chatbots, virtual customer services, voice activated financial applications
Emerging technology areas:
• Robo advisory: Algorithm-based investment advisory services by monitoring events, stock prices, bond price trends etc.
• Alternative credit scoring: Credit scoring based on large data sets such as social media footprint, non-credit payment history (e.g. rent, utility payments), employment history, education background, browsing history

Behavioral Skills
Analytical thinking, verbal communication, cross-functional understanding, investigation synopses, curiosity, challenge driven, creativity, passionate & resilient, problem solving, team player

Frameworks/Libraries, Tools
Core Skills: Scikit-Learn, TensorFlow, Caffe, PyTorch, Keras, Hadoop
Adjacent Skills: SAS, Hive, MATLAB, Spark, MXNet
Most used Data Scientist titles in the Banking & Finance industry

Big Data Engineer - ML
• Big Data Engineer, Big Data Developer, Big Data Application Architect, Hadoop/Spark Developer
• Big Data Engineer, Programmer Analyst - Big Data & Data Lake Integration, Big Data Developer
• Big Data Engineer, Big Data Architect, Big Data Solution Engineering, Big Data Developer
• Big Data Developer, Big Data Hadoop Developer, Big Data Engineer, Big Data Consultant, Hadoop Developer

Applied ML Engineer
• Technology Analyst, Fraud Analyst, Automation Specialist, Cyber Security Data Analytics, Data Scientist - Robotics
• NLP Engineer, Machine Learning / NLP Engineer, Applied Machine Learning Engineer - Fraud Prevention
• NLP And Deep Learning Consultant, NLP Software Engineer
• Technology Analyst, Stats Analyst, Algorithmic Trading Strategist

Core ML Engineer
• Machine Learning Engineer, Machine Learning Consultant
• Machine Learning Engineer, Machine Learning Consultant
• Machine Learning Consultant, Machine Learning Developer, Machine Learning Engineer
• Associate - Machine Learning, Machine Learning Strategist, Machine Learning Researcher

Business/Data Analyst
• Technical Analyst, Credit Risk Analyst, Tableau Developer, Finance Analyst, Quantitative Analyst
• Quantitative Analytics Consultant, Credit Risk Analytics Consultant, Tableau Developer
• Quantitative Research Analyst, Data Analyst, AML Data Science Analyst, Risk Management Analyst
• Quantitative Analyst, Investment Banking Analyst, Compliance Analyst, Equity Sales Strategist
What do Data Scientists look for in a job?
1. Type of projects: Data scientists usually prefer to work on high-impact projects that involve emerging banking areas rather than ad-hoc jobs that require getting numbers from a database or ETL (Extract, Transform, Load).
Examples of appealing projects
• Robo Advisory: Build, optimise and train new or existing ML models
• Fraud Detection: Apply neural networks to detect fraudulent activities
• Chatbots & Voice Banking: Implement conversational banking which generates real time conversations
2. Location: Location plays a major role when data scientists choose a job, as they prefer to work in the tech hotspots of the industry they want to venture into. New York, San Francisco and Dallas are the preferred locations for data scientists in the banking and finance industry. The presence of an AI and data science talent pool and of digital centres is highest in these areas.
Within US: New York, San Francisco, Dallas
Outside US: London (UK), Toronto, Bangalore
3. Tools & Technologies: Data scientists prefer open-source tools and technologies, as community support and the feeling of contribution are higher there. The company's infrastructure also plays a major role; a lack of infrastructure is a major turn-off for them.
Preferred tools: TensorFlow, Spark SQL, RapidMiner, Keras, Anaconda
4. Flexibility: Data scientists prefer not to stick to a typical 9-to-5 banking job. They prefer higher flexibility in terms of work timings, leave, dress code and the availability of remote working.
Preference in terms of flexibility: flexible work timing, casual dress code, more than 30 days of leave, work from home option
5. Team and guidance: Data scientists look for teams where there is guidance and thought leadership in the technology area. Learning is much higher in companies that have a balance of experience levels than in a firm that is bottom heavy.
Finance firms with good AI leadership
Factors contributing to employee experiences
Job Security & Promotions   Training & Management   Flexible Work Culture   Pay Benefits & Fair Performance
Peer Companies
Positive Factors Negative Factors
• Extensive training program and
continued education support
• Progressive management
• Good place to learn technical
skills
• Regular work shifts
• Good work life balance
• Flexible timings
• Good parental leave policy
• Medical/Dental insurance
• Tuition reimbursement
• Higher compensation for
freshers
• Higher advancement
opportunities and ease of
mobility within company
• Diverse workforce
• Comprehensive training
program
• Flexibility in schedule
• Remote working opportunity
• Higher paid time off
• Tuition reimbursement
• Lack of technology infrastructure compared to peers
• Lower job security
• Extensive learning programs
ranging from skills-based
offering and high potential
leadership programs
• Work life balance is not good
• Short lunch and tea breaks
• Long working hours
• Good Health/ Medical/ Dental/
Vision benefits
• Higher compensation when
compared to peers
• Better career advancement
opportunities only in New York
office
• Constant change of leadership
• Micro-management
• Unrealistic expectation
• Remote working opportunity
• Flexible work hours
• Higher compensation for freshers
• Good family health insurance
and annual bonus
• Lower job security
• Lack of advancement
opportunities
• Slower growth of entry level
employees
Hiring Strategy: Key talent attrition & retention factors for peer employers
• Micro-management
• Quarterly layoffs
• No compensation for additional
responsibilities
• Bottom heavy in terms of
experience
• Frequent employee restructure
• Good work life balance
• Generous holiday allowance
• Employee share schemes
including free shares
• Option to choose additional
cash lump sum or other benefits
(Retail, holiday voucher)
• No new learning, same work
gets repeated
• Lack of recognition and
appreciation
An optimal JD – How should HRs create one that attracts the best talent

Define Foundational Skills
• Identify core skills required – Statistics, Machine Learning, NLP or Computer Vision or Speech Recognition
• Specify programming environment – R or Python or any other
• Provide infrastructure preferences – Cloud and GPUs
• Define expected frameworks understanding based on team's current tech stack

Define Specializations
• Choose expected skills across Machine Learning, Inferential Statistics, Deep Learning, NLP or Computer Vision
• Specify use cases the hire is expected to work on

Define Expected Responsibilities across key business functions
- Sales & Marketing
- Finance
- Operations
- Customer Experience

Make it Inclusive
1. Use gender neutral titles in job descriptions. Avoid including words in your titles like "hacker," "rockstar," "superhero," "guru," and "ninja," and use neutral, descriptive titles like "engineer," "project manager," or "developer."
2. Avoid use of gender-charged words. Examples: "Analyze" and "determine" are typically associated with male traits, while "collaborate" and "support" are considered female.
3. Use Draup platform to screen the JD for inclusive language that doesn't switch some people off.

Highlight key organizational values that data scientists prefer
• Culture that promotes learning and innovation
• Action oriented and fast paced environment
• Culture that supports risk taking behaviour
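The inclusive-language checks above can be sketched as a simple word screen. This is a minimal illustration, not how the Draup platform works: the function name `screen_job_description` and the report layout are hypothetical, and the word lists contain only the examples named on the slide, so a real screen would need a much fuller lexicon.

```python
import re

# Title words the playbook advises against (taken from the slide text).
TITLE_WORDS_TO_AVOID = {"hacker", "rockstar", "superhero", "guru", "ninja"}

# The slide's examples of gender-charged words. Illustrative only,
# not a complete lexicon.
GENDER_CODED_WORDS = {
    "analyze": "male-associated",
    "determine": "male-associated",
    "collaborate": "female-associated",
    "support": "female-associated",
}


def screen_job_description(title, body):
    """Flag discouraged title words and gender-coded words in a JD.

    Matching is on exact lowercase tokens, so inflected forms such as
    "analyzing" are NOT caught by this sketch.
    """
    title_tokens = re.findall(r"[a-z]+", title.lower())
    body_tokens = re.findall(r"[a-z]+", body.lower())

    title_flags = sorted({t for t in title_tokens if t in TITLE_WORDS_TO_AVOID})

    coded = {}
    for token in body_tokens:
        label = GENDER_CODED_WORDS.get(token)
        if label is not None:
            coded.setdefault(label, set()).add(token)

    return {
        "title_flags": title_flags,
        "coded_words": {label: sorted(words) for label, words in coded.items()},
    }


if __name__ == "__main__":
    report = screen_job_description(
        "Data Science Ninja",
        "You will analyze data and collaborate with product teams.",
    )
    print(report)
```

A flagged word is a prompt for a human reviewer to reconsider the wording, not an automatic rejection of the job description.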
Discovery: How can HRs find a pool of candidates
What are the keywords for each title

Role: Data Scientist
Search keywords: ("Data Scientist" OR "Applied Researcher" OR "Data Modeling Scientist" OR "Data Modeling Specialist" OR "Data Modeling Engineer" OR "Data Mining Scientist" OR "Data Mining Specialist" OR "Algorithm Scientist" OR "Algorithm Engineer" OR "Algorithm Specialist")

Role: Applied Data Scientist - NLP
Search keywords: ("Data Scientist" OR "Applied Researcher" OR "Data Modeling Scientist" OR "Data Modeling Specialist" OR "Data Modeling Engineer" OR "Data Mining Scientist" OR "Data Mining Specialist" OR "Algorithm Scientist" OR "Algorithm Engineer" OR "Algorithm Specialist" OR "Deep Learning") AND ("NLP" OR "Natural Language Processing" OR "NLTK" OR "NLG")

Role: Applied Data Scientist - Computer Vision
Search keywords: ("Data Scientist" OR "Applied Researcher" OR "Data Modeling Scientist" OR "Data Modeling Specialist" OR "Data Modeling Engineer" OR "Data Mining Scientist" OR "Data Mining Specialist" OR "Algorithm Scientist" OR "Algorithm Engineer" OR "Algorithm Specialist" OR "Deep Learning") AND ("Computer Vision" OR "Image Processing" OR "OPENCL")

Role: Data Scientist – Speech Recognition
Search keywords: ("Data Scientist" OR "Applied Scientist" OR "Data Researcher" OR "Applied Researcher" OR "Data Modeling Scientist" OR "Data Modeling Specialist" OR "Data Modeling Engineer" OR "Data Mining Scientist" OR "Data Mining Specialist" OR "Algorithm Scientist" OR "Algorithm Engineer" OR "Algorithm Specialist" OR "Deep Learning") AND ("Speech Recognition" OR "Automated Speech Recognition" OR "ASR" OR "Voice Recognition" OR "Acoustics" OR "PERL")
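Search strings like the ones above are easier to maintain when generated from term lists, which also avoids mismatched quotes and repeated terms. The sketch below is one possible way to do this; the helper names `or_group` and `build_query` are hypothetical, and the output is a generic Boolean string whose exact syntax may need adjusting for a given job board or sourcing tool.

```python
def or_group(terms):
    """Quote each term, drop repeats (keeping first-seen order), join with OR."""
    unique_terms = list(dict.fromkeys(terms))
    return "(" + " OR ".join(f'"{term}"' for term in unique_terms) + ")"


def build_query(title_terms, skill_terms=None):
    """Combine a title group with an optional skill group using AND."""
    query = or_group(title_terms)
    if skill_terms:
        query += " AND " + or_group(skill_terms)
    return query


# Title terms taken from the Data Scientist row of the slide.
BASE_TITLES = [
    "Data Scientist", "Applied Researcher", "Data Modeling Scientist",
    "Data Modeling Specialist", "Data Modeling Engineer",
    "Data Mining Scientist", "Data Mining Specialist",
    "Algorithm Scientist", "Algorithm Engineer", "Algorithm Specialist",
]

# Skill terms taken from the NLP row of the slide.
NLP_TERMS = ["NLP", "Natural Language Processing", "NLTK", "NLG"]

if __name__ == "__main__":
    # The specialised rows add "Deep Learning" to the title group.
    print(build_query(BASE_TITLES + ["Deep Learning"], NLP_TERMS))
```

Keeping the title terms in one shared list means a change to the base titles carries over to every specialised query.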
Evaluate a Candidate: Draup Framework for evaluating capabilities of a Data Scientist

Parameter | Description | Weightage (out of 5)
Advanced Statistical Concepts | Understanding of Bayesian and inferential statistics, including z-test, t-test, regression, forecasting etc. | 3
Programming for Data Science | Knowledge of R or Python | 3
Knowledge of Machine Learning | Algorithms like Naïve Bayes, SVM, Decision Trees, Random Forest etc. | 5
Knowledge of Deep Learning and Neural Nets (not mandatory) | Deep learning frameworks: TensorFlow, Keras or Theano | 3/4/5
Big Data Skills | Knowledge of Hadoop ecosystem or distributed file systems | 2
Persuasive Communication | Ability to convey results to business stakeholders | 1
Analytical IQ & Problem Solving | Ability to break down problems | 4
Behavioral competencies | Curiosity, challenge driven, creativity | 3
Domain Understanding | Knowledge of banking function corresponding to roles | 4

[Figure: radar chart of the nine parameters above, titled "Draup Framework for Data Scientist Evaluation"]
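One way to apply these weightages is a weighted average of interviewer ratings. The slide gives only the weights, so the combination rule, the 0 to 5 rating scale and the function name `weighted_score` are assumptions made for illustration. Deep learning is listed as not mandatory with a weight of 3/4/5, so it is treated here as an optional parameter whose weight the caller chooses.

```python
# Weightages (out of 5) copied from the framework table.
WEIGHTS = {
    "Advanced Statistical Concepts": 3,
    "Programming for Data Science": 3,
    "Knowledge of Machine Learning": 5,
    "Big Data Skills": 2,
    "Persuasive Communication": 1,
    "Analytical IQ & Problem Solving": 4,
    "Behavioral competencies": 3,
    "Domain Understanding": 4,
}

DEEP_LEARNING = "Knowledge of Deep Learning and Neural Nets"


def weighted_score(ratings, deep_learning_weight=None):
    """Return the weighted average of per-parameter ratings.

    ratings: dict mapping parameter name to a rating on an assumed 0-5 scale.
    deep_learning_weight: None to skip the optional deep learning parameter,
        or 3, 4 or 5 (the slide's "3/4/5") to include it.
    """
    weights = dict(WEIGHTS)
    if deep_learning_weight is not None:
        if deep_learning_weight not in (3, 4, 5):
            raise ValueError("deep_learning_weight must be 3, 4 or 5")
        weights[DEEP_LEARNING] = deep_learning_weight

    missing = [name for name in weights if name not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")

    total_weight = sum(weights.values())
    return sum(weights[name] * ratings[name] for name in weights) / total_weight


if __name__ == "__main__":
    candidate = {name: 4 for name in WEIGHTS}
    candidate["Knowledge of Machine Learning"] = 5
    print(round(weighted_score(candidate), 2))
```

Because the result is normalised by the total weight, scores stay on the same 0 to 5 scale whether or not the deep learning parameter is included.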
To evaluate exceptional data scientists, look at their contributions to open-source programs and competitions
Look Beyond the Resume
Evaluate candidates holistically based on their portfolio and open source contributions
• Kaggle has ~100,000 data scientists
• Review candidates' portfolios and contributions to challenges through Kaggle Kernels
• To hire experts, refer to the Kaggle ranking boards
• With 12.2 million members contributing to 31.1 million projects, GitHub is the largest online community of developers
• A developer's area of interest and proficiency can be understood from the Contributions and Repositories sections of the profile.
DRAUP Platform provides an integrated view of the Kaggle, GitHub and Stack Overflow profiles of a data scientist
Data Scientist – Computer Vision: Sample Talent Profiles (1/2)
Yedidyah Dordek
Education: MSc, Machine Learning, Signal Processing
Key Skills: Python, Matlab, OpenCV, Scikit-learn, Tensorflow, Keras
• Develop computer vision and deep learning algorithms for real-time and high speed vision
systems
• Research, fast prototyping, feasibility studies, specification and implementation of product
• Customer support, defining project requirements end to end from planning to integration
Note : DRAUP’s proprietary talent module was used to analyze talent by locations and skill sets
Key Skills: Digital Image Processing, TensorFlow, Keras
• Develop sophisticated computer vision algorithms to extract relevant information from
histology images and leverage that information using machine learning and statistical
algorithms to predict cancer progression and response to treatment.
• Develop random forest classifiers combining texture features (local binary patterns and
Haralick features) for object classification
• Develop convolutional neural network (deep learning) algorithms for automatic object
detection
Sr Data Scientist – Computer Vision
Experience in Current Role: 1+ Years
Total Experience: 10+ Years
Data Scientist – Computer Vision
Experience in Current Role: 1 Year
Total Experience: 7+ Years
Yu Mao
Education: MS Computer Vision, Carnegie Mellon University
Key Skills: TensorFlow, Caffe, OpenCV
• Produce data visualizations to communicate up and across the company
• Design of Experiment (DOE) for engineering studies and large scale user studies
• Conduct/support data collection and analysis with other groups.
• Define feature specs and expected user experience based on data
• Build tools for analysing and visualizing data
Data Scientist – Computer Vision
Experience in Current Role: 11 Months
Total Experience: 4+ Years
Shahab Arabshahi
Education: PhD Physics, Florida Institute of Technology
Key Skills: Python, SQL, Image Processing, IDL, Matlab, MPI Library
• Develop algorithms in the fields of Computer Vision, Machine Learning and Deep Learning.
• Work with the system, physics, software, qualification and applications groups
• Provide software specifications and production code on time to meet project milestones
• Engage in customer facing activities to aid algorithms' proliferation at customer sites
Senior Data Scientist – Computer Vision
Experience in Current Role: 4 Months
Total Experience: 3+ Years
Nishant Verma
Education: PhD Biomedical Informatics, University of Texas
Data Scientist – Natural Language Processing: Sample Talent Profiles (2/2)
Key Skills: Java, Python, Ontology Creation, Computational Linguistics
• Develop and engineer NLP software (Java) to linguistically process large volumes of data
• Statistically evaluate the performance of in-house NLP tools using Python
• Regression testing of customized software
• Write documentation for both customized and in-house software
Note : DRAUP’s proprietary talent module was used to analyze talent by locations and skill sets
Data Scientist - NLP
Experience in Current Role: 3 Months
Total Experience: 6+ Years
Key Skills: NLTK, SpaCy, xSQL, Bayesian Statistics
• Optimize user experience by data-mining and analysing chat transcripts between
customers and tech support agents.
• Extract data with xSQL; mined data with APIs (AWS, IBM Watson, Google, Intercom)
• Apply statistical analyses and machine learning techniques such as clustering, regression,
natural language processing, etc.
• Coordinate projects along data analytics life cycles, punctuated by demands from internal
and external customers.
Senior Data Scientist - NLP
Experience in Current Role: 1+ Years
Total Experience: 10+ Years
Wenqi Dong
Education: MS CV ML, University of Michigan
Key Skills: Gensim, RNN & LSTM
• Responsible for the implementation and evaluation of state of the art algorithms for natural
language processing, machine learning and combinatorial optimization.
• Maintain a hybrid model for natural language understanding in smart home dialog system
• Set up a deep learning model for multi-domain intent and slots detection from Automatic
Speech Recognition results
Data Scientist - NLP
Experience in Current Role: 9 Months
Total Experience: 2 Years
Key Skills: Perl, MapReduce, Python Text Mining
• Perform data mining to support new features and analyse large datasets to glean
actionable insights
• Design classifiers and ranking algorithms and perform language processing and query
analysis
• Perform ad-hoc statistical analysis and craft metrics to measure the success of the service
Data Scientist - NLP
Experience in Current Role: 1+ Years
Total Experience: 13+ Years
Ebrain Mirambeau
Education: MS Computational Linguistics & NLP, University of Washington
Sharon Chou
Education: PhD Electrical Engineering, Stanford University
Bing Zhao
Education: PhD CS, Carnegie Mellon University
Sample Profiles that possess the skills to move into Fraud Intelligence Analyst roles

Summer Thompson
Education: MBA, Wilmington University
Location: Wilmington, Delaware
Current role: Insider Threat Monitoring (May 2017 – Present)
Recommended Progression: Fraud Analyst
Acquired Skills: Financial Analysis, Customer Service, Loans, Credit
Neighbouring Skills: Python, Data Mining, Information Security, SQL, SAS, R, Splunk, Anti Money Laundering
Certifications Required:
• Certified AML Specialist
• Certified Fraud Examiner
• AML Professional Certification
• CISSP

Keyonna Morrison
Education: Central Piedmont Community College
Location: Charlotte, North Carolina Area
Current role: Corporate Banking Specialist (Aug 2018 – Present)
Recommended Progression: Fraud Analyst
Acquired Skills: Leveraged Lending, Customer Service, Commercial Banking, Financial Analysis
Neighbouring Skills: Data Analytics, Python, Data Mining, Information Security, SQL, SAS, R, Splunk, Anti Money Laundering
Certifications Required:
• Certified AML Specialist
• Certified Fraud Examiner
• AML Professional Certification
• CISSP

Raj Devnani
Education: Professional Accounting, Macquarie University
Location: New York, New York
Current role: Credit Risk (Aug 2017 – Present)
Recommended Progression: Fraud Analyst
Acquired Skills: Investment Banking, Credit Analysis, Equity Research, Credit Risk
Neighbouring Skills: Python, Data Mining, Information Security, SQL, SAS, R, Splunk, Anti Money Laundering
Certifications Required:
• Certified AML Specialist
• Certified Fraud Examiner
• AML Professional Certification
• CISSP

Nicholas Teaster
Education: Accounting, University of Arkansas
Location: Fayetteville, Arkansas Area
Current role: Teller Customer Service (Jun 2016 – Present)
Recommended Progression: Fraud Analyst
Acquired Skills: Customer Service, Strategic Planning, Project Management
Neighbouring Skills: Data Analytics, Python, Data Mining, Information Security, SQL, SAS, R, Splunk, Anti Money Laundering
Certifications Required:
• Certified AML Specialist
• Certified Fraud Examiner
• AML Professional Certification
• CISSP
Sample Profiles that possess the skills to move into Quantitative analyst roles

Stephanie Talebli
Education: BBA Finance, The University of New Mexico - Robert O. Anderson School of Management
Location: Greater New York City Area
Current role: Credit Analyst (2017 – Present)
Recommended Progression: Quantitative analyst
Acquired Skills: Credit Analysis, Financial Analysis, Commercial Banking, Portfolio Management
Neighbouring Skills: Python, C++, C#, SQL, R, SAS, Quantitative Research, Statistical Data Analysis
Certifications Required:
• Chartered Financial Analyst (CFA)
• Certificate in Quantitative Finance (CQF)

Chris Cziesla
Education: Bachelor's Degree Economics and Computer Science, Claremont McKenna College
Location: San Francisco Bay Area
Current role: Investment Banking Analyst (Jun 2018 – Present)
Recommended Progression: Quantitative analyst
Acquired Skills: Data Analysis, Financial Analysis, Microsoft Office, Customer Service
Neighbouring Skills: Python, C#, SQL, R, SAS, Quantitative Research, Statistical Data Analysis
Certifications Required:
• Chartered Financial Analyst (CFA)
• Certificate in Quantitative Finance (CQF)

Hanchen Liang
Education: Master of Science (M.S.) Financial Engineering, University of Michigan
Location: Greater New York City Area
Current role: Consultant, Pricing and Valuation (Jan 2016 – Present)
Recommended Progression: Quantitative analyst
Acquired Skills: Data Analysis, Quantitative Analytics, SQL, Financial Analysis, Financial Modelling
Neighbouring Skills: Python, C++, C#, R, SAS, Quantitative Research, Statistical Data Analysis
Certifications Required:
• Chartered Financial Analyst (CFA)
• Certificate in Quantitative Finance (CQF)

Noe Bellet
Education: Bachelor's Finance and Entrepreneurship, University of Utah
Location: Salt Lake City, Utah
Current role: Equity Research Analyst (Jan 2019 – Present)
Recommended Progression: Quantitative analyst
Acquired Skills: Financial Analysis, Analytical Skills, Investments, Business Strategy, Microsoft Office
Neighbouring Skills: Python, C++, C#, SQL, R, SAS, Quantitative Research, Statistical Data Analysis
Certifications Required:
• Chartered Financial Analyst (CFA)
• Certificate in Quantitative Finance (CQF)
US: ~65% of AI & Big Data Talent in US is concentrated across Bay Area & Seattle; Central and Eastern region’s talent is largely spread across start-ups
Total: ~280,000 AI/Big Data talent
• Seattle: 63K+ talent (22%)
• Bay Area: 110K+ talent (39%)
• Phoenix: 9K+ talent (3%)
• Dallas: 17K+ talent (6%)
• Austin: 10K+ talent (4%)
• Boston: 35K+ talent (12%)
• Others: 14%
Tech Companies: Microsoft, Amazon, Expedia, Facebook
Banks & Financial Services:
Capital One, JP Morgan Chase
Tech Companies: Google,
Facebook, Apple, Oracle, Uber
Banks & Financial Services:
Wells Fargo, Bank of America
Tech Companies: Microsoft, Wayfair,
Amazon, Google
Banks & Financial Services:
Fidelity Investments, State Street Corp
Tech Companies: Microsoft,
IBM, AT&T, Verizon
Banks & Financial Services:
Wells Fargo, American Express
Tech Companies: IBM, Microsoft
Banks & Financial Services:
Citi, Bank of America, JPMorgan
Chase
Tech Companies: Dell,
General Motors, Oracle
Banks & Financial Services:
Charles Schwab, Citi
1. Higher pay: Tech Giants like Amazon, Google,
Facebook attract tremendous technology talent
due to the lucrative compensation they offer
2. Subject matter experts and thought
leadership: Presence of thought leadership and
subject matter experts in an organisation helps
companies retain the data science talent
3. Sense of security: Tech companies are
predicated on data science initiatives which
inculcates a sense of security within talent and
also promises new challenges
Top Data Science Practices
Bay Area: Hotspot for mature AI & Big Data talent pool in Data Science and Data Management roles; talent cost is much higher than other areas in US
San Francisco Bay Area: ~110,000 employee talent, $195K median cost
AI/BD talent hubs: San Francisco, Palo Alto, Mountain View, Santa Clara, Sunnyvale, San Jose, San Mateo
Headcount distribution (by skills): Data Scientist 36%, Analyst: Data Management 35%, Database Engineering 11%, Data Architect 8%, Applied AI 10%
Key employers (total number, total employable talent): AI/Big Data start-ups, G500 companies, AI/Big Data universities; values shown on the slide: 56K+, ~1,500, 11, 60K+
Top verticals: Enterprise & Consumer Software, Semi-Conductor, Consumer Electronics, BFSI
Other locations on the map: Seattle, Boston, Dallas, Austin, Phoenix
Note: DRAUP's Talent Simulation Module. We have analysed ~2,000 tech companies and ~10,000 start-ups.
Bay Area – x
Data Scientist   Database Engineer   Applied AI   Data Architect   Analyst – Data Management
~2000 25% 4% 64%
~3000 6% 6% 16%
~4600 4%
Responsible for key AI-based product development activities across major business units such as TensorFlow, Waymo, DeepMind, Android, YouTube, Chrome, Maps etc.
~1000 21% 10% 58%
Oracle AI Platform Cloud Services team is based out of the Bay Area centre.
~1200 15% 1% 2%
Uber's major share of talent pool responsible for fleet management services, location intelligence and the autonomous driving group is based out of Bay Area.
AI & Big Data Headcount
Top 5 tech companies AI & Big Data head count ~12000
Facebook's key AI product priorities, such as DeepText, facial recognition, Oculus Advanced Development Group, Instagram & WhatsApp etc., are based out of Bay Area
Apple's HQ - 80% of R&D talent is out of this center. Apple's core ML teams for Siri, Developer's platform, iOS, iCloud, etc. are all based out of this center
66% 3% 11% 16%
2% 5%
16% 56%
1% 10%
4% 78%
Note : DRAUP’s Talent Simulation Module. We have analysed ~2,000 tech companies and ~10,000 start-ups.
Bay Area: Data Science is the most employed role across top players with nearly every engineering priority focussed on building cross industry AI platform
Seattle: 2nd largest talent hotspot in US with majority of talent in Data Management and Data Scientist roles employed across G500 employers and vertical specific start-ups
Seattle, Washington: ~63,000 employee talent, $185K median cost
Headcount distribution (by skills): Data Scientist 35%, Analyst: Data Management 30%, Database Engineering 20%, Data Architect 5%, Applied AI 10%
Key employers (total number, total employable talent): AI/Big Data start-ups, G500 companies, AI/Big Data universities; values shown on the slide: 27K+, ~350, 4, 36K+
Top verticals: Enterprise & Consumer Software and BFSI
Other locations on the map: Boston, Dallas, Austin, Phoenix, Bay Area
Note: DRAUP's Talent Simulation Module. We have analysed ~2,000 tech companies and ~10,000 start-ups.
Data
Scientist
Database
Engineer
Applied
ML
Data
Architect
Analyst –
Data
Management
~2000
Build Deep Learning tools and APIs, and contribute to open source frameworks such as MXNet and Keras. Help build industry-leading conversational technologies and machine learning systems that powers Alexa.
~300
Design, build, scale, and optimize the data infrastructure as a highly innovative service that enhances and innovates the Expedia e-Commerce ecosystem. Build full-stack data from multiple data sources, leveraging cloud systems and structured/unstructured data.
~2100
Develop new NLP capabilities and text understanding APIs in Text Analytics Cognitive Service. Extract insights from unstructured data and build predictive solutions for core NLP problems.
~200
Core Machine Learning Team in Seattle develops and optimizes various algorithms including Neural Networks, Boosted Decision Trees, Sparse Linear Models, and Deep Learning for several ranking teams including Ads, Feed, Search, Instagram and others.
~200
51% 18% 27% 3% 1%
24% 44% 15% 17% 0%
61% 23% 9% 5% 2%
66% 10% 20% 2% 4%
22% 29% 42% 7% 0%
The Starbucks AI and analytics team in Seattle uses data on customers' spending patterns to develop insights that help generate personalized product promotions, such as user-based reward cards.
AI & Big Data
Headcount
Top 5 tech companies AI & Big Data head count ~30000
Seattle: Microsoft Azure ML, Facebook Deep Learning and Amazon Core NLP teams are based out of Seattle; Brick and Mortar players like Starbucks offer personalised customer service by using ML technology in analytics
Data Scientist: 40%
Analyst: Data Management: 30%
Database Engineering: 15%
Data Architect: 8%
Applied AI: 7%
Boston
~35,000 Employee talent
$155K Median Cost
[Map: other US talent hotspots shown: Seattle, Dallas, Austin, Phoenix, Bay Area]
Key Employers Total Employable Talent
8.4K+
~ 300
10
Top Verticals
26K+
Total Number
Healthcare, Retail
AI/Big Data
start-ups
G500
companies
AI/Big Data
Universities
Boston: AI/BD employed talent is consolidated in start-ups from the Healthcare and Retail industries; Boston universities such as MIT and Northeastern University provide a mature engineering talent pool
Headcount Distribution
(By Skills)
Columns: Data Scientist | Database Engineer | Applied ML | Data Architect | Analyst – Data Management
~30 22% 0%
~130 10% 2% 5%
~72 5%
~130 2% Wayfair launched “Search with Photo”, a new feature that leverages artificial intelligence to assist consumers in the product buying process
~190 17% 15% 0% 13%
Creating AI/ML driven products that combine natural language understanding with batch and real time sales and service models with the goal of improving the omni-channel customer experience in a measurable way
ML/Big Data
Headcount
Top 5 tech companies AI/Big Data head count ~550
Key focus areas are Computational Biology, Computer Science, Cryptography, Machine Learning, Systems and Security
The R&D team in Boston area develops software automation infrastructure for Amazon's Kiva robotic systems in an integrated service-oriented cloud computing environment.
Embedded Systems and Mobile Apps for Android are some of the primary activities carried out in the Boston area
19% 0% 6%
3% 13%
13%
10% 0%
55%
72%
70%
62%
70%
16%
Boston: Along with Tech giants, traditional Retail and Banking players are also key employers of Data Science skills primarily working for digital transformation of backend and frontend operations
Maturity of ML/BD courses: Calculated by analyzing depth of courses, number of enrollments, number of citations of publications by the professor teaching the course, etc.
No. of ML/BD courses: Total number of ML and Big Data courses taught in the university
No. of ML/BD publications: Number of ML and Big Data publications by the professors/PhDs of the university
No. of Masters/PhDs
CoE of tech companies: Whether tech companies have opened a Centre of Excellence for AI or Big Data by tying up with the university
Start-ups born: Number of start-ups born from the university
Note : The ranking shown is a sample
Note : DRAUP’s Talent Module analysed 100,000+ global universities to identify top universities and key courses in software engineering, ML and Big Data
[Chart: CMU, Cornell, UCB, MIT and Stanford ranked on No. of ML/BD courses, No. of ML/BD publications, No. of Masters/PhDs, CoE of tech companies, Start-ups born, and Maturity of AI/ML courses]
University Assessment: Stanford has the largest number of CoE collaborations with tech companies as well as the most mature ML courses; CMU has the largest number of ML/BD courses and publications
University Courses: Maturity analysis of course-works related to AI & Big Data skills
Intermediate Courses | Advanced Courses | Beginner Courses
Mature courses require students to complete courses on Advanced Probability Theory and Advanced Statistical Theory, and a project on Advanced Data Analytics
Intermediate courses have a prerequisite of one or two of the beginner courses and require students to complete projects on real-world data
Beginner courses do not have any specific prerequisites, but prior experience in calculus, probability, statistics, etc. is recommended
• Analogical Reasoning
• Decision Theory
• Fuzzy Logic
• Logic Programming
• Machine Discovery
• Machine Learning
• Planning
• Qualitative Physics and Model-based reasoning
• Search
• Temporal Reasoning
• Philosophy of AI
Key Courses
• Cognitive Modelling
• Genetic Algorithms
• Knowledge Representation
• Computer Vision
• Non-Monotonic Reasoning
• Robotics
• Cognitive Science
• NLP
Key Courses
• AI & Manufacturing
• AI & Medicine
• AI & Legal reasoning
• Artificial Life
• Computational Biology
• Emotion
• Neural Networks
• Distributed AI
• Integrated AI Architectures
• Intelligent Tutoring
• Expert Systems
Key Courses
Universities are specializing in courses with different levels of maturity in the field of AI/ML & Data Science
Types of course on the basis of maturity level
Carnegie Mellon University
Tech Collaboration & CoEs
Top AI Awards
1. Continuously ranked amongst Top 5 schools
2. Faculty and alumni have won multiple prestigious awards and million-dollar research grants
Marquee AI Alumni
1. Andrew Ng, Prof at Stanford, co-founder of Coursera
Key Start-Ups Born Key Programme Offered
Major PhD & Professors profiles
Entry Level
• Introduction to Machine Intelligence
• Concepts in Artificial Intelligence
• AI, Society and Humanity
Intermediate
• Neural computation
• Cognitive robotics
• Introduction to deep learning
• Introduction to Natural language processing
Mature
• Deep reinforcement learning and control
• Vision sensors
• Human-Robot interaction
• Computational perception
CMU researchers are working with Amazon to improve Alexa.
Sony to collaborate with CMU on AI and robotics research. Initial R&D efforts will focus on food preparation, cooking and delivery
PROFILE 1
DESIGNATION: Professor, Machine Learning Department
Education: Ph.D. in Molecular Biology and Biochemistry, Ph.D. in Computer Science
No. of publications: 102
Research Areas: Machine learning, statistical methodology, large-scale computational systems and architecture, etc.
Current works: Foundations of statistical learning, framework for parallel machine learning on big data, computational and statistical analysis of genes, application of statistical learning in social networks.
PROFILE 2
DESIGNATION: Professor, Machine Learning Department
Education: Ph.D. in Learning Deep Generative Models, Masters in Optimization Algorithms for Learning
No. of publications: 52
Research Areas: Deep Learning, Probabilistic Graphical Models, and Large-scale Optimization.
Current works: Structured Control Nets for Deep Reinforcement Learning, Neural Models for Reasoning over Multiple Mentions using Coreference, Neural Map: Structured Memory for Deep Reinforcement Learning, etc.
K&L Gates, a law firm, gave $10m to CMU to study the ethics of AI
General Motors has set up a GM-CMU collaborative research lab on autonomous driving
Argo.ai: Raised $1B from Ford
Wombat Security: Acquired by Proofpoint for $225m
Petuum: Has raised over $108m
US: Leading tech giants such as Apple, Google and Amazon have collaborated with CMU for research in the field of AI, Robotics and Deep Learning
Johns Hopkins University
Tech Collaboration & CoEs:
Marquee AI Alumni:
1. President of Drive.ai
Key Start-Ups Born Key Programme Offered
Major PhD & Professors profiles
Entry Level
• Computer graphics
• Parallel Programming
• Digital Health and Biomedical Informatics
Intermediate
• Algorithms for Sensor-Based Robotics
• Natural language processing
• Machine Translation
• Representation Learning
Mature
• Advanced Topics in Genomic Data Analysis
• Deep Learning for Image Understanding
• Vision as Bayesian Inference
• Modern Biomedical Imaging Instrumentation and Techniques
US: Johns Hopkins has collaborated with healthcare companies such as Medopad and Bayer to leverage predictive analytics in healthcare
Acquired by Baidu
Has raised $7.7m in funding
Collaboration with UK medtech firm Medopad to leverage deep data sets for predictive analysis for at-risk patients and communities.
The Felix project, funded by the Lustgarten Foundation, aims to develop deep learning algorithms to study MR and CT images of the pancreas.
OurCrowd and The Johns Hopkins University have partnered to bring Israeli health IT start-ups to Johns Hopkins for clinical trials and technology validation
PROFILE 1
DESIGNATION: Associate Professor, Computer Science
Education: B.S. in Computer Science, B.S. in Computer Engineering, Ph.D. in Computer Science
No. of publications: 40
Research Areas: Natural Language Processing, Machine Learning, Health Informatics, Clinical NLP, Computational Epidemiology
Current works: Social monitoring of public health, Bayesian Modeling of Lexical Resources, Multi-task Domain Adaptation for Sequence Tagging, Harmonic Grammar, Optimality Theory, and Syntax Learnability
PROFILE 2
DESIGNATION: Assistant Professor, Computer Science
Education: B.S. in Computer Science, M.S. in Computer Science, Ph.D. in Computer Science and Linguistics
No. of publications: 42
Research Areas: Natural language processing, artificial intelligence, machine learning, linguistic semantics
Current works: Neural Machine Translation Using Natural Language Inference, Neural Models of Factuality, Cross-lingual Semantic Parsing, Semantic Proto-Role Labeling
US: Top Data Science and Artificial Intelligence associations in the United States

ASSOCIATIONS

International Data Engineering And Science Association
Description:
• Supports the data science profession with practical resources for data professionals while improving the practice of data science, accrediting schools, and establishing model ethical codes.
Focus Areas: Data Science, Data Engineering
Presence in United States: Yes

Institute for Operations Research and the Management Sciences
Description:
• An international community of 12,500 operations research and analytics professionals and students. The association has a presence in over 90 countries
• Conducts meetings and conferences, helps in professional development and recognises excellence in the field of operations research and data science & analytics
Focus Areas: Operations Research, Management Science, Data Analytics, Business Intelligence
Presence in United States: Yes

Association for the Advancement of Artificial Intelligence
Description:
• Promotes research and responsible use of artificial intelligence
• Aims to increase public understanding of artificial intelligence, improve the teaching and training of AI practitioners, and provide guidance for research planners and funders regarding the importance and potential of current AI developments and future directions
Focus Areas: Artificial Intelligence
Presence in United States: Yes
OTHERS
www.draup.com