kickwrangler

Kickwrangler, by Alex McCauley. Predict which Kickstarter projects won’t deliver.

Upload: alexmccauley

Post on 14-Feb-2017

Category: Data & Analytics


TRANSCRIPT

Page 1: Kickwrangler

Kickwrangler

Alex McCauley

Predict which Kickstarter projects won’t deliver

Page 2: Kickwrangler

It doesn’t always work out…

Kickwrangler predicts the projects most likely not to deliver

$1,768,328,118 pledged to Kickstarter projects over 6 years

Page 3: Kickwrangler

Data Sources

Project Features

Creator data:
• # past projects (experienced?)
• # projects they have backed

Project data:
• Category
• # backers
• $ pledged vs goal (overfunded?)
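The creator and project features listed above could be assembled into one record per project. The following is a minimal sketch only: the `Project` class, its field names, and `feature_vector` are hypothetical and do not come from the slides, and the overfunding feature is assumed to be the plain ratio of dollars pledged to the goal.

```python
from dataclasses import dataclass


@dataclass
class Project:
    # All field names are hypothetical; the slides only name the quantities.
    category: str
    backers: int
    pledged_usd: float
    goal_usd: float
    creator_past_projects: int
    creator_projects_backed: int


def feature_vector(project: Project) -> dict:
    """Collect the features named on the slide into a flat dictionary."""
    # Overfunding is expressed as pledged / goal; a ratio above 1 means overfunded.
    if project.goal_usd > 0:
        funding_ratio = project.pledged_usd / project.goal_usd
    else:
        funding_ratio = 0.0
    return {
        "category": project.category,
        "backers": project.backers,
        "funding_ratio": funding_ratio,
        "creator_past_projects": project.creator_past_projects,
        "creator_projects_backed": project.creator_projects_backed,
    }


example = Project("Technology", 1200, 50000.0, 10000.0, 0, 3)
features = feature_vector(example)
```

A categorical field such as `category` would still need encoding (for example one-hot) before being passed to a classifier.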

Project Outcomes

No data/tracking of project delivery
• Sentiment analysis -> project status
• 500k tech comments

Scraped from: (image not preserved)

Project features -> predict -> project outcomes

Page 4: Kickwrangler

Algorithms: …

Pipeline (from slide diagram):

• All comments: pre-process text
• Comment blocks -> topic modeling (Latent Dirichlet Allocation) -> topics
• Curate: keep only 7 most relevant topics
• Small (1k) sample: workers (Worker 1, Worker 2, Worker 3) label sentiments -> labeled sentiments
• Linear regression: topic–sentiment mapping
• Result: sentiment vs time!
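The topic–sentiment mapping step can be sketched as an ordinary least-squares fit from per-comment topic weights to labeled sentiment scores. This is an illustration under assumptions, not the author's code: the topic weights are assumed to come from an already-fitted LDA model, the fit has no intercept term, the two toy topics and all numbers are made up, and the function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: topic weights may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_topic_sentiment(topic_weights, sentiments):
    """Least squares without intercept: sentiment ~ sum_k coef_k * weight_k."""
    k = len(topic_weights[0])
    # Normal equations: (X^T X) coef = X^T y
    xtx = [[sum(w[i] * w[j] for w in topic_weights) for j in range(k)]
           for i in range(k)]
    xty = [sum(w[i] * y for w, y in zip(topic_weights, sentiments))
           for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_sentiment(coefs, weights):
    """Apply the fitted mapping to one comment's topic weights."""
    return sum(c * w for c, w in zip(coefs, weights))


# Toy data: weights on two hypothetical topics ("complaints", "praise").
toy_weights = [(0.9, 0.1), (0.2, 0.8), (0.5, 0.5), (0.1, 0.9)]
toy_sentiments = [-0.8, 0.6, 0.0, 0.8]  # made-up worker labels
coefs = fit_topic_sentiment(toy_weights, toy_sentiments)
unlabeled_score = predict_sentiment(coefs, (0.7, 0.3))
```

Once fitted on the small labeled sample, the same coefficients can score every unlabeled comment from its topic weights alone, which is what lets 1k labels cover the full comment set.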

Page 5: Kickwrangler

Timeline of project sentiments

“refund”, “delay”

“received mine today”, “awesome”

“pledge”, “funded”

“update”, “status”
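A sentiment timeline like the one on this slide can be produced by bucketing scored comments by date. The sketch below assumes each comment already has a date and a numeric sentiment score, and it uses calendar months as the bucket; the slides do not state the bucket size, and the function name and sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def sentiment_timeline(scored_comments):
    """Average sentiment per (year, month), in chronological order.

    scored_comments: iterable of (date, sentiment_score) pairs.
    """
    buckets = defaultdict(list)
    for day, score in scored_comments:
        buckets[(day.year, day.month)].append(score)
    return {month: mean(scores) for month, scores in sorted(buckets.items())}


# Made-up scores for one hypothetical project.
comments = [
    (date(2014, 1, 5), 1.0),    # e.g. "pledge", "funded"
    (date(2014, 1, 20), 0.5),
    (date(2014, 6, 2), -1.0),   # e.g. "refund", "delay"
    (date(2014, 6, 15), -0.5),
]
timeline = sentiment_timeline(comments)
```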

Page 6: Kickwrangler

Classification algorithm

• Random Forest classifier

Classification results

Accuracy 0.88

Precision 0.6

Recall 0.3

F1 0.43 (0.24 = random)

Most important feature:
• $ pledged / $ goal – overfunded campaigns are bad

Surprisingly not important:
• Creator history (# previous projects)

Build intuition: 2-d reduction

Plot labels: Good, Bad
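As a check on the classification results on this slide: F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded slide values. The confusion-matrix counts are hypothetical, chosen only to reproduce precision 0.6 and recall 0.3. The rounded inputs give about 0.40 rather than the reported 0.43; presumably the slide's F1 was computed from unrounded precision and recall, which is an assumption.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Precision and recall from confusion-matrix counts."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 3 failed projects caught, 2 false alarms, 7 missed.
p, r = precision_recall(true_pos=3, false_pos=2, false_neg=7)
f1_from_rounded_values = f1_score(0.6, 0.3)
```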

Page 7: Kickwrangler

About me

Wireless batteries!

Green = charging

Photonic crystals, quasi-crystals, and Casimir forces

Page 8: Kickwrangler

Validation

Mechanical Turk ratings
• Remove ratings with high scatter
• Inspect subset

Topic-sentiment map
• Train linear regression on labeled comments
• Test on holdout set:

Project status
• Recall: 10/10 on “top 10 failed projects” list
• Precision: test on 20 predicted failed projects

Page 9: Kickwrangler

Validation: sentiment analysis

Mechanical Turk ratings

• Remove comments whose worker ratings have a standard deviation > 0.5 (S.D. = 1.3 is a random guess)

Topic-sentiment map

• Train linear regression on labeled comments, mapping topic content to sentiments

• Test on holdout set:
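The Mechanical Turk filter on this slide (drop comments whose ratings have a standard deviation above 0.5) might look like the sketch below. The 0.5 threshold is from the slide; everything else is an assumption: the rating scale, the use of the population rather than the sample standard deviation, the use of the mean rating as the final label, and the function and variable names.

```python
from statistics import mean, pstdev

MAX_RATING_SD = 0.5  # threshold stated on the slide


def filter_consistent_ratings(ratings_by_comment, max_sd=MAX_RATING_SD):
    """Keep comments whose worker ratings agree; label each with the mean rating."""
    kept = {}
    for comment_id, ratings in ratings_by_comment.items():
        # Population S.D. is an assumption; the slides do not say which form was used.
        if len(ratings) >= 2 and pstdev(ratings) <= max_sd:
            kept[comment_id] = mean(ratings)
    return kept


# Made-up ratings from three workers per comment.
ratings = {
    "a": [4, 4, 5],  # workers mostly agree: kept
    "b": [1, 3, 5],  # high scatter: removed
    "c": [2, 2, 2],  # perfect agreement: kept
}
labels = filter_consistent_ratings(ratings)
```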

Page 10: Kickwrangler

Validation: project labeling

• Test on 10 well-known failure cases, gets 10/10

• Test validity on random sample of 100 projects (TODO)