mohammed al madhani
Post on 27-Jan-2017
Setting up a Big Data Team: Best Practices (*) perspective from Mohammed AL Madhani.
Alright, thank you very much for having me this afternoon. It's a big pleasure and an honor to be here.
My name is Mohammed AL Madhani, Information Management Director at the Federal Demographic Council.
Sources of this presentation
I would like to start by telling you about the great sources that helped me prepare this presentation.
The first is a great guide from Booz Allen; it is available free on the internet. The second is on building data science capabilities. The third is a report from CrowdFlower.
Sources used in this presentation: Building Data Science Teams, by Paco Nathan. Publisher: O'Reilly Media
Fourth, I used the study guide for CAP, an INFORMS certificate for data scientists. Fifth, a video by Paco Nathan about building data science teams.
Big Data trend in Google trends
A quick search in Google Trends shows what people are interested in.
I started by searching for big data, and it shows that big data is a big trend.
Big Data vs IoT
36 different types of sensors = $30 on Amazon.com!
Besides big data, I searched for IoT. It shows that IoT is an even bigger trend, which really scares me, because that means another huge wave of data is coming.
Big Data vs IoT vs data science trends
Organizations want to use all their data. They have a lot of data, but they're not doing as much with it as they could.
If you want to do big data, you have to have a data scientist.
The second reason they're asking this question is that they want to know what the skill set of a data scientist is, because they're finding it hard to hire them.
If you run another search for data science, it shows that it is also trending, but not as much as IoT. So do you need to hire a PhD in mathematics or statistics? Or perhaps you can grow a data scientist within your existing organization?
The Business Impacts of Data Science
Studies show that there is a big improvement in performance when big data and data science are adopted.
What is a Data Scientist?
A data scientist is someone who makes new discoveries. That's what a scientist does: they make a hypothesis, and they try to investigate that hypothesis. In the case of a data scientist, they do it with data. They look for meaningful knowledge in the data, and they do that in a couple of different ways.
One is that they visualize the data: they look at the data, create reports, and look for patterns. That's very similar to what you might think of as a traditional business intelligence analyst or data analyst.
So that's one of the tools that data scientists use.
Business Intelligence vs Data Science
The two capabilities are additive and complementary, each offering a necessary view of business operations and the operating environment.
Some commonly used methods/models: discrete event simulation, queuing models, Monte Carlo simulation, agent-based modeling, system dynamics, game theory, probabilities, economic analysis (IRR, NPV, FV), linear regression, stepwise regression method, logistic regression, confidence intervals, hypothesis testing, statistical inferences, design of experiments, analysis of variance, principal component analysis (PCA), data mining, forecasting, artificial neural networks, fuzzy logic, expert systems, decision trees, Markov chains, revenue management (yield management), optimization, linear programming, integer programming.
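To make one item from the list concrete, here is a minimal sketch of net present value (NPV) in Python. The function name `npv`, the 8% discount rate, and the cash flows are assumptions chosen for illustration; they are not taken from the presentation.

```python
def npv(rate, cash_flows):
    """Net present value of cash flows, where cash_flows[0] occurs today (t = 0)."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


# Hypothetical project: pay 1000 now, receive 400 at the end of each of three years.
flows = [-1000.0, 400.0, 400.0, 400.0]
print(round(npv(0.08, flows), 2))
```

A positive result suggests the project earns more than the chosen discount rate; a negative one suggests it earns less.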
But what really distinguishes a data scientist is that they use algorithms, advanced algorithms, that actually run through the data looking for all this meaning.
You may have heard of machine learning algorithms, or of algorithms such as neural networks, regression, or K-means.
There are dozens of these algorithms out there, and essentially they run through the data looking for meaning. That is one of the fundamental tools of the data scientist.
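As an illustration of an algorithm "running through the data", here is a small sketch of K-means on one-dimensional data using only the Python standard library. The function `kmeans_1d`, the starting centroids, and the sample waiting times are assumptions made for this example; a real project would more likely use an established library.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Lloyd's algorithm on 1-D data with caller-supplied starting centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to its cluster mean (unchanged if empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical waiting times in minutes; two groups are visible by eye.
waits = [5, 7, 6, 40, 42, 45]
centers, groups = kmeans_1d(waits, centroids=[0, 50])
print(centers, groups)
```

The algorithm only needs the data and a guess at the number of groups; the grouping itself is the "pattern" it finds.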
Data artist of turning data into action
Skills: Math, Computer science, Statistics, Domain expertise (chart values as listed on the slide: 65%, 30%, 45%, 75%)
Experience: Doing data representations and using algorithms for optimization and validation; communicating with the team to ensure data availability, readiness, and completeness; working with the data researcher to decompose the problem.
So to use those algorithms, the data scientist has to have strong foundational knowledge in mathematics and statistics, and in some cases in computer science and the domain.
The Data Science Venn Diagram
How can we reduce traffic jams? How can we reduce waiting time for patients?
That might be a very good question for a data scientist to answer, and the data scientist would go about it by gathering all the data and running algorithms until they find a reliable pattern that can answer the question.
UNLOCKING DATA: The data scientist's mission
You give the data scientist data and a question, and they start the journey to give you an answer, along with a technology that will keep supporting that answer.
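For the patient waiting-time question, one simple starting point is a queuing model. The sketch below uses the standard M/M/1 result for mean waiting time in queue, Wq = λ / (μ(μ − λ)), where λ is the arrival rate and μ is the service rate. Treating a clinic as a single-server M/M/1 queue, and the rates used here, are assumptions for illustration only.

```python
def mm1_wait_in_queue(arrival_rate, service_rate):
    """Mean time a customer waits before service in an M/M/1 queue."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival rate must be below service rate")
    return arrival_rate / (service_rate * (service_rate - arrival_rate))


# Hypothetical clinic: 8 patients arrive per hour.
current = mm1_wait_in_queue(8, 10)   # one doctor serving 10 patients per hour
improved = mm1_wait_in_queue(8, 12)  # service sped up to 12 patients per hour
print(current * 60, improved * 60)   # mean waits in minutes
```

Under these assumptions, a 20% faster service rate cuts the mean wait by more than half, which is the kind of insight a model can provide before any investment is made.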
Generating the momentum
Description: Proofs of concept can generate the critical momentum needed to jump-start any data science capability.
Problem solving, Critical thinking, Buy-in, Necessary data, Clear ROI, Dedication & focus, Limited complexity and duration, Fail often and learn quickly
In order to generate the momentum to start working seriously on the problem, begin with a proof of concept.
Dealing with the problem (INFORMS CAP Approach)
Decomposition & Datafication
DESIGN THINKING METHODOLOGY (Alternative Approach)
BOOZ ALLEN'S DESIGN THINKING TOOLBOX FOR ANALYTICS
Big Data Researcher
Domain expert with data science knowledge
Skills: Math, Computer science, Domain expertise, Statistics (chart values as listed on the slide: 65%, 10%, 30%, 30%)
Mission: Generates low-fidelity prototypes to demonstrate applicability and test ideas quickly and cheaply before making significant investments.
The Four Key Activities of a Data Science
Respondents who said there weren't enough data scientists to go around
Do Data Scientists Have What They Need?
We actually need data scientists to spend more of their time on representation, optimization, and evaluation.
If you have perfect information or zero information, then your task is easy; it is in between those two extremes that the trouble begins.
Maslow's hierarchy of needs could be applied to data
Enhance the data management maturity (Data Preparation)
Data Management Team (data preparation part): Ready to Go!
Roles: Big Data Steward, Big Data Architect, Big Data Engineer
Responsibilities:
Develop and execute all data flow jobs, business rules, matching, scraping, cleaning, munging, joining, and wrangling.
Responsible for data flow, data solutions, data models, and architectures.
Responsible for managing the data elements and metadata. Running the maturity model components.
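The cleaning, matching, and joining work described for the data management team can be sketched in a few lines of Python. The helper names `clean_name` and `join_on_name`, the field names, and the sample records are all hypothetical; real pipelines would handle many more edge cases.

```python
def clean_name(raw):
    """Trim, collapse inner whitespace, and lowercase a name so records can be matched."""
    return " ".join(raw.split()).lower()


def join_on_name(left, right):
    """Inner-join two lists of dicts on the cleaned 'name' field."""
    index = {clean_name(r["name"]): r for r in right}
    joined = []
    for row in left:
        key = clean_name(row["name"])
        if key in index:
            # Merge both records and store the cleaned name as the shared key.
            joined.append({**index[key], **row, "name": key})
    return joined


# Hypothetical records from two systems that spell the same name differently.
patients = [{"name": "  Ali  Hassan ", "age": 41}, {"name": "Sara Omar", "age": 29}]
visits = [{"name": "ali hassan", "wait_minutes": 35}]
print(join_on_name(patients, visits))
```

Only records that match after cleaning are kept, which is why data preparation has to come before any analysis.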
Some certificates: aiming to certify your team will increase its maturity
Certified Analytics Professional (CAP): created in 2013 by the Institute for Operations Research and the Management Sciences (INFORMS) and targeted towards data scientists.
EMC Data Science Associate (EMCDSA): tests the ability to apply common techniques and tools required for big data analytics.
SAS Certified Predictive Modeler: designed for SAS Enterprise Miner users who perform predictive analytics.
Tools to support the big data team: spreadsheet systems, statistical systems, optimization systems, simulation systems, business intelligence systems, data management systems (structured data, unstructured data), data integration systems, and distributed data platforms such as Hadoop.
BOOZ ALLEN TALENT MANAGEMENT MODEL
CRISP-DM (cross-industry standard process for data mining)
Six Sigma's DMAIC
News: Metis Bootcamp Tuition Increase
Effective June 20, 2016, the tuition for the Metis Data Science Bootcamp in New York and San Francisco will increase to $15,500. Accepted students who have signed and returned their enrollment agreements on or before June 20, 2016 will receive the current tuition of $14,000. This is the first tuition increase for Metis, and is the result of continued investments to ensure that our students are best prepared for careers in data science.