c* capacity forecasting (ajay upadhyay, jyoti shandil, arun agrawal, netflix) | cassandra summit...

Capacity Forecast @ Scale CDE, Cloud Database Engineering Netflix.

Upload: datastax

Post on 13-Apr-2017


Page 1: C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix) | Cassandra Summit 2016

Capacity Forecast @ Scale
CDE, Cloud Database Engineering
Netflix


Who are we?

● CDE, Cloud Database Engineering
● Providing data stores as a service
  ○ Cassandra
  ○ Dynomite
  ○ Elasticsearch and RDS

Ajay Upadhyay, Cloud Data Architect @ Netflix
Arun Agrawal, Sr. Software Engineer @ Netflix


Agenda

● Cassandra @ Netflix
● Cassandra footprint
● Capacity planning lifecycle
● Forecasting the capacity
● Q and A


Cassandra @ Netflix

• 98% of streaming data is stored in Cassandra
• Data ranges from customer details to viewing history and streaming bookmarks to billing and payment


Cassandra Footprint

Hundreds C*


Cassandra Footprint

Thousands


Capacity Planning

• Able to predict
  – Current usage and available capacity
  – Resources needing upgrade
  – Life cycle of current configuration
  – Appropriate configuration for new and existing App/Service
• Optimize
  – Under or over utilized resource
  – Increased business productivity


Capacity Planning

Avoid:
• Impact on business
• Service or SLA disruption
• Unplanned maintenance
• Firefighting


Life Cycle

Diagram stages:
• Capture Requirement
• Requirement Analysis / feasibility
• Proxy or Simulate Requirement
• Monitoring / Trending
• New / Increased traffic
• Optimization


Capture Requirement

– IOPs and SLA
– Maintenance overhead
– Failover
– Access pattern


IOPs and SLA

Questions                         Response
Read OPS/sec [avg, peak]          5k - 10k
Read latency requirement          95th - 20ms, 99th - 100ms
Write OPS/sec [avg, peak]         1k - 2k
Write latency requirement         95th - 20ms, 99th - 100ms
Num columns / row                 100
Avg col size / or avg row size    64k
Num of rows                       100 Mil
TTL [life cycle of data]          365 Days

Diagram labels: Gutenberg publisher service, Read, Write, Data store (C*)
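The row count and row size in the requirement sheet above are enough for a first storage estimate. The following is a minimal sketch of that arithmetic, not the presenters' method. It assumes "64k" means 64 KiB per row, and it uses a hypothetical replication factor of 3, which the slides do not state. Compression, compaction headroom, and TTL expiry are ignored.

```python
# Back-of-the-envelope storage estimate from the requirement sheet.
# Assumptions (not from the slides): "64k" is 64 KiB per row,
# the replication factor is 3, and there is no compression or overhead.

ROWS = 100_000_000        # "Num of rows: 100 Mil"
ROW_BYTES = 64 * 1024     # assumed reading of "64k"
REPLICATION_FACTOR = 3    # hypothetical value for illustration
TIB = 1024 ** 4


def raw_bytes(rows: int, row_bytes: int) -> int:
    """Size of one copy of the data set, before replication."""
    return rows * row_bytes


def replicated_bytes(rows: int, row_bytes: int, rf: int) -> int:
    """Size of the data set across all replicas."""
    return raw_bytes(rows, row_bytes) * rf


one_copy = raw_bytes(ROWS, ROW_BYTES)
all_copies = replicated_bytes(ROWS, ROW_BYTES, REPLICATION_FACTOR)
print(f"one copy:     {one_copy / TIB:.2f} TiB")
print(f"all replicas: {all_copies / TIB:.2f} TiB")
```

Under these assumptions one copy is just under 6 TiB, so the replicated footprint is roughly three times that. A real estimate would also account for the maintenance overhead and failover answers captured on the following slides.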


Maintenance Overhead

Type                              Response
Repairs / Compactions             Y/N
Node replacement                  Y
Backup - Full / Incrementals      Y/N


Failover

Questions                         Response
Region failover                   Y/N
SLA in case of region failover    Y/N


Access Pattern

Questions    Response
Read         Point read
             All row readers
             Column slices
Write        Part existing row
             New rows


Proxy/Simulate Traffic

– Proxy existing traffic
– Simulate traffic
– NDBench
– Generate actual / synthetic traffic before final deployment using app


Optimization

• Cache
  - Application level
  - Fronting cache engine before C*
• Stagger R - W operations if possible


Cluster Sharding


Trend Analysis

Continuous monitoring / trending on usage pattern


New / Increased Traffic

Capacity planning cycle begins

Diagram stages:
• Capture Requirement
• Requirement Analysis / feasibility
• Proxy or Simulate Requirement
• Monitoring / Trending
• New / Increased traffic
• Optimization


Capacity Forecasting


Arun Agrawal, Sr. Software Engineer


Demo


Previous Architecture

Diagram labels: Metrics, Atlas


Pain Points

• No support for complex relationships
• Hardware failures could lead to false positives


Winston

• Bridge between Atlas and on-call
• Complex relationship modeling between metrics
• Reduce false positives
• Auto remediation platform


Lesson Learnt

• It might already be too late to fix the system.
• Reactive rather than proactive


Requirements

• Show us the trend for the clusters.
• Warn us of what is coming if the trend continues.
• Give us time to scale the cluster.
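One simple way to turn these requirements into a number is to fit a straight line to daily usage and ask how many days remain before it crosses a threshold. This is a minimal sketch under that assumption. The function names, the threshold, and the sample data are hypothetical, and the slides do not say that a linear fit was used for warnings.

```python
# Estimate lead time before a usage metric reaches a threshold,
# assuming the recent linear trend continues. Names are hypothetical.

from typing import Optional, Sequence, Tuple


def linear_fit(values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line through (day index, value) points.

    Returns (slope per day, intercept at day 0).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two daily samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until(values: Sequence[float], threshold: float) -> Optional[float]:
    """Days after the last sample until the fitted line reaches threshold.

    Returns None when the trend is flat or decreasing.
    """
    slope, intercept = linear_fit(values)
    if slope <= 0:
        return None
    last_day = len(values) - 1
    return max((threshold - intercept) / slope - last_day, 0.0)


disk_pct = [40.0, 41.0, 42.0, 43.0, 44.0]  # made-up daily disk usage, percent
print(days_until(disk_pct, 60.0))  # 16.0 days of lead time
```

The returned lead time is what gives the on-call engineer "time to scale": a warning can be raised when it drops below however long a scale-up takes.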


Architecture

Diagram labels: Automic (UC4)


Aggregation

• Daily
• Instance Level
• Cluster Level
• Instance Failures
• Adding capacity over days
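The daily, instance-level, and cluster-level roll-ups listed above can be sketched as follows. This is an illustration only: the sample layout `(day, instance_id, value)` and the function names are hypothetical. Reporting the number of instances alongside the cluster total is one assumed way to keep instance failures and capacity added over several days from looking like a change in load.

```python
# Roll raw metric samples up to daily instance-level and cluster-level values.
# The (day, instance_id, value) layout is an assumption for illustration.

from collections import defaultdict
from statistics import mean


def daily_instance_level(samples):
    """Return {(day, instance_id): mean of that instance's samples that day}."""
    buckets = defaultdict(list)
    for day, instance, value in samples:
        buckets[(day, instance)].append(value)
    return {key: mean(vals) for key, vals in buckets.items()}


def daily_cluster_level(instance_daily):
    """Return {day: (cluster total, reporting instances, per-instance mean)}."""
    per_day = defaultdict(list)
    for (day, _instance), value in instance_daily.items():
        per_day[day].append(value)
    return {
        day: (sum(vals), len(vals), sum(vals) / len(vals))
        for day, vals in sorted(per_day.items())
    }


samples = [
    ("2016-09-01", "i-a", 100.0),
    ("2016-09-01", "i-a", 110.0),
    ("2016-09-01", "i-b", 90.0),
    ("2016-09-02", "i-a", 120.0),  # i-b did not report on this day
]
instance_daily = daily_instance_level(samples)
cluster_daily = daily_cluster_level(instance_daily)
for day, (total, count, per_instance) in cluster_daily.items():
    print(day, total, count, per_instance)
```

In the sample data the cluster total drops on the second day only because one instance stopped reporting, while the per-instance mean rises. Keeping both views makes that distinction visible.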


Growth Criteria

f(x) of
– Subscriber
– Netflix content
– # Viewing Sessions


ARIMA

– AR
  • Regression on prior values
– I
  • Data values are replaced with (x(i) - x(i-1))
– MA
  • Linear combination of error terms
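To make the "I" and "AR" parts concrete, here is a minimal sketch of an ARIMA(1,1,0) forecast in plain Python: difference the series once, regress each difference on the previous one, then undo the differencing. The MA term is deliberately left out, the function names are hypothetical, and this is not the model configuration used in the talk. In practice a statistics library would fit the full model.

```python
# Minimal ARIMA(1,1,0) sketch: one differencing step plus an AR(1) fit.
# The MA part (linear combination of past error terms) is omitted.

def difference(series):
    """The "I" step: replace x(i) with x(i) - x(i-1)."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(values):
    """The "AR" step with one lag: least-squares fit of
    values[t] = c + phi * values[t-1]. Returns (c, phi)."""
    x, y = values[:-1], values[1:]
    n = len(x)
    if n < 2:
        raise ValueError("need at least three values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        # Constant input carries no lag information: predict the mean.
        return mean_y, 0.0
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx
    return mean_y - phi * mean_x, phi


def forecast(series, steps):
    """Forecast `steps` future values of the original series."""
    diffs = difference(series)
    c, phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    out = []
    for _ in range(steps):
        last_diff = c + phi * last_diff  # predicted next difference
        level += last_diff               # undo the differencing
        out.append(level)
    return out


usage = [10, 12, 14, 16, 18]  # made-up daily values with steady growth
print(forecast(usage, 3))     # [20.0, 22.0, 24.0]
```

Differencing is what lets the model follow a steadily growing metric: the raw series keeps rising, but its day-over-day changes are stable enough to regress on.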


Future

• Vector Auto Regression
• Automate manual judgement


Resources

– https://www.otexts.org/fpp/8


Q & A


You may not control all the events that happen to you, but you CAN decide not to be reduced by them.

-Maya Angelou