Mohammed AL Madhani
Setting up a Big Data Team: Best Practices
(*) perspective from Mohammed AL Madhani.
Sources used in this presentation
Building Data Science Teams, by Paco Nathan (Publisher: O'Reilly Media)
Big Data trend in Google Trends
Big Data vs. IoT: 36 different types of sensors for $30 on Amazon.com!
Big Data vs. IoT vs. data science trends
The Business Impacts of Data Science
What is a Data Scientist?
Business Intelligence vs Data Science
Some commonly used methods/models
Discrete event simulation
Queuing model
Monte Carlo simulation
Agent-based modeling
System dynamics
Game theory
Probabilities
Economic analysis (IRR, NPV, FV)
Linear regression
Stepwise regression
Logistic regression
Confidence intervals
Hypothesis testing
Statistical inference
Design of experiments
Analysis of variance
Principal component analysis (PCA)
Data mining
Forecasting
Artificial neural networks
Fuzzy logic
Expert systems
Decision trees
Markov chain
Revenue management (yield management)
Optimization
Linear programming
Integer programming
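To make the "Economic analysis (IRR, NPV, FV)" entry concrete, here is a minimal sketch of the three measures as plain Python functions. It assumes one cash flow per year, with the first flow occurring today, and rates written as decimals; the function names and the bisection approach for IRR are illustrative choices, not taken from the presentation.

```python
# Minimal sketch of NPV, FV, and IRR as plain functions.
# Assumptions (illustrative, not from the presentation): one cash flow per year,
# cash_flows[0] happens today (t = 0), and rates are decimals (0.10 means 10%).


def npv(rate: float, cash_flows: list[float]) -> float:
    """Net present value: sum of CF_t / (1 + rate) ** t for t = 0..n."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def fv(present_value: float, rate: float, periods: int) -> float:
    """Future value of a single amount compounded once per period."""
    return present_value * (1 + rate) ** periods


def irr(cash_flows: list[float], low: float = -0.99, high: float = 10.0,
        max_iter: int = 200) -> float:
    """Internal rate of return: the rate at which NPV equals zero.

    Found by bisection, assuming a conventional project (one sign change in
    the cash flows) so that NPV crosses zero exactly once in [low, high].
    """
    f_low = npv(low, cash_flows)
    if f_low * npv(high, cash_flows) > 0:
        raise ValueError("NPV does not change sign in [low, high]")
    for _ in range(max_iter):
        mid = (low + high) / 2
        f_mid = npv(mid, cash_flows)
        if f_low * f_mid <= 0:
            high = mid               # root lies in [low, mid]
        else:
            low, f_low = mid, f_mid  # root lies in [mid, high]
    return (low + high) / 2
```

For example, paying 1,000 today to receive 1,100 in one year gives `irr([-1000, 1100])` of about 0.10 (a 10% return), and `fv(1000, 0.10, 2)` is 1,210. Bisection is used here only because it is simple and robust for single-sign-change cash flows; spreadsheet tools typically use faster iterative solvers.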
Data Scientist
A data artist who turns data into action
Skills: math, computer science, statistics, domain expertise, machine learning
Experience: building data representations; using algorithms for optimization and validation; communicating with the team to ensure data availability, readiness, and completeness; and working with the data researcher to decompose the problem.
The Data Science Venn Diagram
How can we reduce traffic jams?
How can we reduce patient waiting times?
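The patient-waiting question is a natural fit for the queuing model and discrete event simulation listed among the methods above. The sketch below assumes the simplest case, an M/M/1 queue (Poisson arrivals, exponentially distributed service times, one doctor), which is an illustrative assumption rather than a model from the presentation; `mm1_metrics` and `simulate_mean_wait` are hypothetical names. The analytic formulas are cross-checked with a small simulation based on the Lindley recursion.

```python
import random


def mm1_metrics(arrival_rate: float, service_rate: float) -> dict[str, float]:
    """Steady-state M/M/1 results. Rates are per hour, so times are in hours."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival_rate must be < service_rate")
    rho = arrival_rate / service_rate          # server (doctor) utilization
    wq = rho / (service_rate - arrival_rate)   # mean wait before being seen
    w = 1 / (service_rate - arrival_rate)      # mean time in clinic (wait + service)
    return {
        "utilization": rho,
        "wait_in_queue": wq,
        "time_in_system": w,
        "patients_in_system": arrival_rate * w,  # Little's law: L = lambda * W
    }


def simulate_mean_wait(arrival_rate: float, service_rate: float,
                       n_patients: int = 200_000, seed: int = 42) -> float:
    """Simulated mean queue wait via the Lindley recursion:
    next_wait = max(0, wait + service_time - time_until_next_arrival).
    The first patient arrives to an empty clinic (wait = 0)."""
    rng = random.Random(seed)
    wait, total = 0.0, 0.0
    for _ in range(n_patients):
        total += wait
        service = rng.expovariate(service_rate)
        interarrival = rng.expovariate(arrival_rate)
        wait = max(0.0, wait + service - interarrival)
    return total / n_patients
```

With 4 arrivals per hour and a doctor who sees 5 patients per hour, the mean wait before being seen is 0.8 hours (48 minutes); raising the service rate to 6 per hour cuts it to about 0.33 hours (20 minutes). The simulation should land close to the analytic value, which is a useful sanity check before moving to richer models that drop the M/M/1 assumptions.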
UNLOCKING DATA: The data scientist's mission
Inputs: question, data → Outputs: answer, solution, ROI
Generating momentum
Description: Proofs of concept can generate the critical momentum needed to jump-start any data science capability.
Problem solving
Critical thinking
Buy-in
Necessary data
Clear ROI
Dedication & focus
Fail often and learn quickly
Limited complexity and duration
Dealing with the problem (INFORMS CAP approach)
Steps 1 and 2: Decomposition & datafication
DESIGN THINKING METHODOLOGY (Alternative Approach)
BOOZ ALLEN’S DESIGN THINKING TOOL BOX FOR ANALYTICS
Big Data Researcher
Domain expert with data science knowledge
Skills: math, computer science, domain expertise, communication, statistics
Mission: Generates low-fidelity prototypes to demonstrate applicability and test ideas quickly and cheaply before making significant investments
The Four Key Activities of Data Science
Respondents who said there weren’t enough data scientists to go around
Do Data Scientists Have What They Need?
Data preparation
“If you have perfect information or zero information, then your task is easy – it is in between those two extremes that the trouble begins.”
Maslow's hierarchy of needs could be applied to data (from highest to lowest level):
Optimized
Measured
Defined
Managed
Performed
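The Maslow analogy implies that a higher maturity level only counts once every level beneath it is in place. A minimal sketch of that idea, assuming a "staged" rule of this kind (an illustrative assumption, not a scoring method given in the presentation); `DataMaturity` and `assess` are hypothetical names, and the level order follows the pyramid above.

```python
from enum import IntEnum
from typing import Optional


class DataMaturity(IntEnum):
    """Levels from the slide, numbered from lowest (1) to highest (5)."""
    PERFORMED = 1
    MANAGED = 2
    DEFINED = 3
    MEASURED = 4
    OPTIMIZED = 5


def assess(achieved: set[DataMaturity]) -> Optional[DataMaturity]:
    """Return the highest level that is met together with all lower levels.

    Hypothetical staged rule, in the spirit of Maslow's hierarchy: a higher
    level does not count until every level beneath it is satisfied.
    Returns None when even the lowest level is missing.
    """
    reached: Optional[DataMaturity] = None
    for level in DataMaturity:  # definition order: PERFORMED .. OPTIMIZED
        if level not in achieved:
            break
        reached = level
    return reached
```

Under this rule, a team with Performed, Managed, and Measured practices but no Defined practices is assessed at Managed: the Measured practices do not lift it higher until the gap below them is closed.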
Enhance data management maturity (data preparation)
Data Management Team (data preparation): Ready to Go!
Big Data Engineer: develops and executes all data flow jobs, business rules, matching, scraping, cleaning, munging, joining, and wrangling.
Big Data Architect: responsible for data flows, data solutions, and data models and architectures.
Big Data Steward: responsible for managing data elements and their metadata, and for running the maturity model components.
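The engineer's cleaning, matching, and joining work can be illustrated with a small sketch. It uses only the Python standard library and made-up records; the functions `clean_name`, `match_key`, and `join_records` are hypothetical, and the matching rule (ignore case, punctuation, and spacing) is a deliberately crude assumption. In practice a team would likely use a data-processing library or platform and far more careful record matching.

```python
import re


def clean_name(raw: str) -> str:
    """Cleaning: collapse runs of whitespace, trim, and title-case."""
    return re.sub(r"\s+", " ", raw).strip().title()


def match_key(name: str) -> str:
    """Matching: a crude key that ignores case, punctuation, and spacing."""
    return re.sub(r"[^a-z0-9]", "", name.casefold())


def join_records(left: list[dict], right: list[dict],
                 field: str = "name") -> list[dict]:
    """Joining: inner join of two sources on the normalized match key."""
    index: dict[str, list[dict]] = {}
    for row in right:
        index.setdefault(match_key(row[field]), []).append(row)
    joined = []
    for row in left:
        for other in index.get(match_key(row[field]), []):
            merged = {**other, **row}               # left-hand values win on key clashes
            merged[field] = clean_name(row[field])  # store the cleaned name
            joined.append(merged)
    return joined
```

For example, joining `[{"name": "  jane   DOE ", "city": "Riyadh"}]` with `[{"name": "Jane Doe", "balance": 120.0}, {"name": "John Smith", "balance": 0.0}]` yields a single record, `{"name": "Jane Doe", "city": "Riyadh", "balance": 120.0}`, even though the two sources spell the name differently.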
Some Certifications
Certifying your team will increase its maturity.
1. Certified Analytics Professional (CAP)
Created in 2013 by the Institute for Operations Research and the Management Sciences (INFORMS); targeted at data scientists.
2. EMC Data Science Associate (EMCDSA)
Tests the ability to apply common techniques and tools required for big data analytics.
3. SAS Certified Predictive Modeler
Designed for SAS Enterprise Miner users who perform predictive analytics.
Tools to support the big data team
• Spreadsheet systems
• Statistical systems
• Optimization systems
• Simulation systems
• Business intelligence systems
• Data management systems
  ▪ Structured data
  ▪ Unstructured data
• Data integration systems
• Distributed data-processing platforms such as Hadoop
BOOZ ALLEN TALENT MANAGEMENT MODEL
CRISP-DM (cross-industry standard process for data mining)
Six Sigma’s DMAIC
News: Metis Bootcamp Tuition Increase
• Effective June 20, 2016, the tuition for the Metis Data Science Bootcamp in New York and San Francisco will increase to $15,500. Accepted students who have signed and returned their enrollment agreements on or before June 20, 2016 will receive the current tuition of $14,000.
• This is the first tuition increase for Metis and is the result of continued investments to ensure that our students are best prepared for careers in data science.