Data Science Screencasts: A Data Origami Review

Data Origami is a new website by Cameron Davidson-Pilon that provides data science screencasts. It is a cool idea and a cool site.

Cameron was kind enough to give me access to the site so that I could review it. I watched all of the videos I could and wrote up all my notes, and in this post you will get a sneak peek into Cameron’s new site Data Origami.

Data Origami Logo

Data Origami

Data Origami is a simple idea. It provides screencasts on topics relevant to a data scientist.

Each screencast is 9-13 minutes long and covers a narrow, specific topic. All of them use Python and are presented in an IPython notebook that combines text, mathematical equations, code and plots. The notebooks are available for download, as are the videos themselves (for desktop and mobile), along with links to further resources and relevant datasets.

At the time of writing it is a paid service at $9 a month for access to all of the screencasts, although there is one screencast available for free.

Data Origami Skill Levels

The videos assume you know how to program (Python) and that you know statistics.

The site is clean and has a Heroku feeling to it (maybe it’s the purple and the line drawings). The videos are large and good quality and the screens are not cluttered with distractions.

Who is Cameron?

If you’re looking for indicators of authority in the domain, Cameron has them.

Cam works on Data Analytics at Shopify. He’s crunching data for a big company, 9-5.

Cover of Bayesian Methods for Hackers

Cameron is the author of the self-published technical book Bayesian Methods for Hackers, which teaches an introduction to Bayesian methods using Python. It is all available on GitHub (and through the nbviewer IPython viewer) and has been popular on technical news sites such as Hacker News and Reddit (multiple times, social proof++).

Finally, Cameron is the author of lifelines, a Python package that supports survival analysis.

Both the topics of Bayesian Methods and Survival Analysis feature in his screencasts on Data Origami.

Data Science Screencasts

I slammed through all 7 screencasts and took notes. I want to respect Cam and his resource, so here is just a summary of the videos currently available:

Note that I used the word “clever” a few times. His examples are very well thought out and very cool.

UPDATE: There is a new screencast that appeared since I wrote the review.

Review

Cameron knows his stuff. I found the PCA videos less interesting personally, either because I was familiar with the content or perhaps the delivery was less polished. Diving into Bayesian uncertainty and survival analysis was awesome.

Cameron’s the boss of Bayesian. He could easily divide his book up into 10-minute chunks and I would eat it all up (hint, hint).

The videos seem to be hosted on Amazon S3, but I suffered some lag while watching. It is very possible it was the time of day I decided to watch the videos, but it was annoying at the time. Not a big deal, I could have just downloaded them and watched and I’m sure Cam will sort this out as he grows.

He is still finding his feet in terms of format. The more recent videos are a lot more polished than the early ones and a great sign of what is to come. Personally, I’d really like more “this is what we’re going to do” at the start and “this is what we did” at the end. I have to be highly caffeinated to absorb one of these videos on a first watch, even with rapid note taking. Having the screencast remind me of what we covered would be cool.

I may be somewhat of a power user. I watch all YouTube videos at 2x and take lots of notes. It would be cool if the built-in player had a 2x feature and if the account supported note taking or comments. Not a big deal, just power user features that might increment happiness.

Once he gets a lot more content in there, I can imagine checkboxes for “I’ve watched this” and even bundling of videos into content-streams.

There does not appear to be a roadmap for content at this time, really just whatever takes Cam’s fancy. This is good, in that he is passionate about whatever he’s sharing, but bad initially because we have to snap to his interests. There’s no hand holding.

Cam notes that he is releasing 2 screencasts per month, so growth of the library is bounded. This might curb burn-out (like Ryan Bates of RailsCasts), but it is only 24 per year. I power-slammed all 7 videos in one night. I expect some appetites may not be sated.

Finally, the content is pro. Some screencasts are tagged as beginner level. They’re not. You will want to know your way around data and some algorithms before diving in. If you’re still deciding what tool or library to use to run your first classifier on the iris dataset, this resource is not for you.

Summary

This is a great resource with all the signs of being a must-have, with time.

  • It’s created by a real pro, a Bayesian boss.
  • It’s too cheap (raise your prices, consider offering a year/lifetime pass for a few hundred/thousand bucks).
  • It is really for intermediate level (or higher) practitioners, say peers of Cameron or close to it.
  • It has only a dozen videos, but more will be added monthly.
  • It does not have a “follow me from a to b” roadmap, but he’s providing peeks at upcoming ‘casts.

If data is your day job, check out Data Origami and get in early to support Cameron and his vision for amazing world-class data science screencasts.
