Marginally Interesting: Machine Learning and Data Sets

I’ve been busy taking care of my 11-month-old daughter lately, which leaves almost no time to do something as remotely useful as posting on my blog - not that I posted more often when I was still working full time. At the same time, you get a lot of ideas and potentially interesting insights now that your brain has time to idle now and then, for example while picking up toys thrown to the ground again and again.

Anyway, I lately came to think that machine learners as a whole should devote much more time to working with actual data, in particular machine learners who think of themselves as being “method guys” (yes, this also includes me). It usually works like this: You have some technique you really like a lot and you use it to extend an already existing method until you come up with something you think is really neat. It may have some interesting properties other algorithms don’t have, and you really would like to write a paper on it.

But then the problem starts, because in order to show that your extension is actually useful, you have to show that it makes a difference in practice. So you go around your group asking colleagues if they have or know of some interesting data set. We call this the “have method, need data” phenomenon.

Of course, if you had started with a concrete application in mind, you would never have to ask yourself “oh, this is great, but what is it good for?”

Also, in machine learning, the formally defined problems we have are very abstract (like minimizing the expected risk from i.i.d. drawn data points), and many of the actual challenges are really only “defined” by specific data sets.

Anyway, Data Wrangling has recently posted a huge list of links to data sets on the web, which is certainly an interesting starting point.

And yes, if you already have your method, you might also find some interesting “real world application” there.

Posted by at 2008-02-24 12:24:00 +0100
