Learnings from a Data Science Conference, Open Data Science Europe

Last week I attended Open Data Science Europe, hosted at the Novotel, London West. This is one of Europe’s largest data science conferences, with a focus on open-source tools, and it covers an incredible breadth of topics. At the conference I attended two days of machine learning training and a number of talks. Overall I found the conference to be a great learning experience; the training in particular was very high quality. The talks were largely theoretical, so I didn’t take away anything with an immediately practical business application. However, it was very useful for enriching my knowledge and for hearing about some of the cutting-edge developments in the field.

In the following post I am going to give a rundown of some of the highlights and key learnings from the three days:

Scikit Learn Training — Intro to Advanced

A framework for model validation and optimisation

This two-day course, taught by Andreas Müller — a research scientist at Columbia University and the author of the amazing cheat sheet you will probably have seen time and again — covered machine learning with scikit-learn from introductory to advanced concepts. He started by mentioning a few good resources for further learning, including the book he co-wrote, Introduction to Machine Learning with Python, and his lecture series at Columbia, which is available here. He then walked through the scikit-learn library for classification and regression, including lots of practical examples. On the second day more advanced concepts were covered, including pipelines, model evaluation, and how to handle imbalanced data.
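The pipeline material from day two is easy to make concrete. Below is a minimal sketch, not taken from the course notebooks, of chaining a scaler and a classifier into one estimator so that cross-validation re-fits the preprocessing on each training fold and so avoids leakage; the dataset and model choice here are purely illustrative.

```python
# A minimal pipeline sketch: preprocessing and model as one estimator.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Scaling and classification chained together.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Each fold re-fits the whole pipeline, so the held-out fold never
# influences the scaling statistics (no leakage).
scores = cross_val_score(pipe, X, y, cv=5)
print(f"Mean accuracy: {scores.mean():.3f}")
```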

As he works on the scikit-learn project, he mentioned some upcoming developments along the way. These include the new ColumnTransformer, which allows you to carry out pre-processing steps on different columns within a pipeline (this, he suggested, would be released in the next week or two), and a plot_tree function, which should be available next year and creates a visualisation of a tree for tree-based models.
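Both features did subsequently ship (ColumnTransformer in sklearn.compose and plot_tree in sklearn.tree). Here is a minimal sketch of the ColumnTransformer pattern he previewed; the data frame and column names are invented for illustration.

```python
# Sketch of ColumnTransformer: different preprocessing per column type.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical mixed-type data.
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "income": [40_000, 52_000, 88_000, 61_000],
    "city": ["London", "Paris", "London", "Berlin"],
})
y = [0, 0, 1, 1]

# Scale the numeric columns, one-hot encode the categorical one,
# then feed the combined features into a classifier.
preprocess = ColumnTransformer([
    ("numeric", StandardScaler(), ["age", "income"]),
    ("categorical", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])
model = make_pipeline(preprocess, LogisticRegression())
model.fit(df, y)
```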

Learning Functions, Understanding Gradient Descent, Back Propagation, and Vanishing Gradients

John D. Keller talking gradient descent

John D. Keller gave an excellent talk on the mathematics behind deep learning. It was pitched at a really good level: as someone who does not yet understand all of these concepts, I was able to follow the talk. He discussed how, in training a deep neural network, gradient descent and back propagation are used in tandem. Both concepts were explained well, and the talk included a walk-through of the equations.
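To make the gradient descent half of the idea concrete, here is a minimal NumPy sketch of it applied to a least-squares loss. The data and learning rate are invented; in a deep network, back propagation is what supplies these gradients layer by layer.

```python
# Minimal gradient descent on a least-squares loss.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
true_w = np.array([2.0, -1.0])
y = X @ true_w + rng.normal(scale=0.1, size=100)

w = np.zeros(2)
learning_rate = 0.1
for step in range(200):
    predictions = X @ w
    # Gradient of the mean squared error with respect to w.
    grad = 2 * X.T @ (predictions - y) / len(y)
    # Step downhill, against the gradient.
    w -= learning_rate * grad

print(w)  # should be close to [2.0, -1.0]
```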

For any dataset there is always a single best linear model
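This claim holds because the least-squares loss is convex: provided XᵀX is invertible, there is exactly one weight vector that minimises it, given by the normal equations:

```latex
w^{*} = \left(X^{\top} X\right)^{-1} X^{\top} y
```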

Towards Interpretable Deep Learning

Deep learning neural networks are highly powerful learning algorithms, but due to their high degree of complexity they can be difficult to understand. For example, in the slide image of the rooster above, is the algorithm predicting based on the shape of the rooster, or is it using the area around it as a context cue? Dr Wojciech Samek introduced a technique called layer-wise relevance propagation, which is able to determine the features in a particular input vector that contributed most strongly to the prediction.
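To give a flavour of the idea, here is a rough NumPy sketch of the epsilon rule of layer-wise relevance propagation for a single dense layer (bias terms omitted for simplicity). This is my own illustration of the technique, not Dr Samek's implementation: relevance assigned to each output unit is redistributed to the inputs in proportion to how much each contributed.

```python
import numpy as np

def lrp_epsilon_dense(activations, weights, relevance_out, eps=1e-6):
    """Redistribute output relevance onto the inputs of one dense layer."""
    # Contribution of each input unit to each output unit.
    z = activations[:, None] * weights            # shape (n_in, n_out)
    z_total = z.sum(axis=0)                       # shape (n_out,)
    # Stabilise the denominator with a small epsilon of matching sign.
    denom = z_total + eps * np.where(z_total >= 0, 1.0, -1.0)
    # Each input receives a share of each output's relevance,
    # proportional to its contribution; shares are summed per input.
    return (z / denom * relevance_out).sum(axis=1)

# Toy usage: three inputs, two outputs, all relevance on output 0.
a = np.array([1.0, 0.5, -0.2])
W = np.array([[0.3, -0.1],
              [0.2, 0.4],
              [-0.5, 0.1]])
print(lrp_epsilon_dense(a, W, np.array([1.0, 0.0])))  # sums to ~1.0
```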

Dr Wojciech Samek explains how this algorithm had learned the bias “old people don’t laugh”

Techniques such as these should help to develop more trust in what are currently “black box” techniques. Dr Samek gave a number of examples, including how this technique can be used to detect bias. In a project that attempted to classify images of human faces by age, he was able to determine that the algorithm had learned a bias that older people do not laugh, and was categorising the images based on whether or not the subject appeared to be laughing in the image.

From Numbers to Narrative: Data Storytelling

This was an excellent talk by Isaac Reyes on what makes a good or bad chart, and how to apply the Gestalt principles of visual perception to tell better data stories. Some key takeaways for me: use the insight and/or recommended action as the chart title (a great example of this is shown in the image above); use proximity of colour to help tell the story, for instance highlighting a word in the title in the same colour as the line on the chart; and show only the data that is absolutely relevant to the story you are telling. A minimal sketch of the first two ideas follows below.
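Here is a small matplotlib sketch of those two takeaways: the title states the insight, and it reuses the line's colour so the eye links them. The numbers are invented for illustration.

```python
# Insight-as-title with colour proximity, per the talk's takeaways.
import matplotlib.pyplot as plt

years = [2014, 2015, 2016, 2017, 2018]
signups = [120, 150, 210, 340, 520]

fig, ax = plt.subplots()
line_colour = "#d62728"
ax.plot(years, signups, color=line_colour, linewidth=2)

# The title states the insight, in the same colour as the line it
# refers to, so the two are perceived as one story.
ax.set_title("Sign-ups have more than quadrupled since 2014",
             color=line_colour, loc="left")

# Strip chart furniture that is not part of the story.
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
plt.show()
```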

Overall this was a great learning experience, and I also got to hear about some cutting-edge developments in the field. Generally the talks were pitched at a level that both newcomers and experienced members of the field would understand and find useful. Given the quantity and depth of content covered, it is probably not something I would attend every year, but it is definitely worth attending one every couple of years or so.