Best Programming Language for Machine Learning

A question I get asked a lot is:

What is the best programming language for machine learning?

I’ve replied to this question many times now, and it’s about time to explore it further in a blog post.

Ultimately, the programming language you use for machine learning should reflect your own requirements and predilections. No one can meaningfully address those concerns for you.

What Languages Are Being Used

Before I give you my opinion, it is good to have a look around to see what languages and platforms are popular in self-selected communities of data analysis and machine learning professionals.

KDnuggets has had language polls forever. A recent poll is titled “What programming/statistics languages you used for an analytics / data mining / data science work in 2013”. The trends are almost identical to the previous year. The results suggest heavy use of R and Python, and of SQL for data access. SAS and MATLAB rank higher than I would have expected. I’d expect SAS accounts for larger corporate (Fortune 500) data analysis and MATLAB for engineering, research and student use.

The most popular platforms for machine learning, taken from the KDnuggets 2013 poll.

Kaggle offers machine learning competitions and has polled its user base as to the tools and programming languages used by participants in competitions. They posted results in 2011 titled Kagglers’ Favorite Tools (also see the forum discussion). The results suggested abundant use of R. The results also show good use of MATLAB and SAS, with much lower Python representation. I can attest that I prefer R over Python for competition work. It just feels as though it has more on offer in terms of data analysis and algorithm selection.

Ben Hamner, Kaggle Admin and author of the blog post above on the Kaggle blog, goes into more detail on the options when it comes to programming languages for machine learning in a forum post titled “What tools do people generally use to solve problems”.

Ben comments that MATLAB/Octave is a good language for matrix operations and can be good when working with a well defined feature matrix. Python is fragmented but comprehensive, and can be very slow unless you drop into C. He prefers Python when not working with a well defined feature matrix and uses Pandas and NLTK. On R, Ben comments that “As a general rule, if it’s found to be interesting for statisticians, it’s been implemented in R” (well said). He also complains that the R language itself is ugly and painful to work with. Finally, Ben comments that Julia doesn’t have much to offer in the way of libraries but is his new favorite language. He says it has the conciseness of languages like MATLAB and Python with the speed of C.

Anthony Goldbloom, the CEO of Kaggle, gave a presentation to the Bay Area R user group in 2011 on the popularity of R in Kaggle competitions titled Predictive modeling competitions: making data science a sport (see the powerpoint slides). The presentation slides give more detail on the use of programming languages and suggest an Other category that is nearly as large as the usage of R. It would be nice to have the raw data that was collected (why didn’t they release it to their own data community, seriously!?).

Popular programming languages on Kaggle, taken from Kaggle presentation.

John Langford, on his blog Hunch, has an excellent article on the properties of a programming language to consider when working with machine learning algorithms titled “Programming Languages for Machine Learning Implementations”. He divides the properties into concerns of speed and concerns of programmability (programming ease). He points to powerful industry standard implementations of algorithms, all in C, and comments that he has not used R or MATLAB (the post was written 8 years ago). Take some time and read some of the comments by academics and industry specialists alike. This is a deep and nuanced problem that really comes down to the specifics of the problem you are solving and the environment in which you are solving it.

Machine Learning Languages

I think of programming languages in the context of the machine learning activities I want to perform.

MATLAB/Octave

I think MATLAB is excellent for representing and working with matrices. As such, I think it’s an excellent language or platform to use when climbing into the linear algebra of a given method. I think it’s suited to learning about algorithms both superficially the first time around and deeply when you are trying to figure something out or go deep into the method. For example, it’s popular in university courses for beginners, like Andrew Ng’s Coursera Machine Learning course.

R

R is a workhorse for statistical analysis and, by extension, machine learning. Much talk is given to the learning curve, but I didn’t really see the problem. It is the platform to use to understand and explore your data using statistical methods and graphs. It has an enormous number of machine learning algorithms, including advanced implementations written by the developers of the algorithms.

I think you can explore, model and prototype with R. I think it suits one-off projects with an artifact like a set of predictions, report or research paper. For example, it is the most popular platform for machine learning competitions such as those on Kaggle.

Python

Python is a popular scientific language and a rising star for machine learning. I’d be surprised if it can take the data analysis mantle from R, but matrix handling in NumPy may challenge MATLAB, and communication tools like IPython are very attractive and a step into the future of reproducibility.

I think the SciPy stack for machine learning and data analysis can be used for one-off projects (like papers), and frameworks like scikit-learn are mature enough to be used in production systems.

Java-family/C-family

Implementing a system that uses machine learning is an engineering challenge like any other. You need good design and developed requirements. Machine learning is algorithms, not magic. When it comes to serious production implementations, you need a robust library or you customize an implementation of the algorithm for your needs.

There are robust libraries; for example, Java has Weka and Mahout. Also, note that the deeper implementations of core algorithms like regression (LIBLINEAR) and SVM (LIBSVM) are written in C and leveraged by Python and other toolkits. I think that if you are serious, you may prototype in R or Python, but you will implement in a heavier language for reasons such as execution speed and system reliability. For example, the backend of BigML is implemented in Clojure.

Other Concerns

  • Not a Programmer: If you are not a programmer (or not a confident programmer), I recommend playing with machine learning via a GUI like Weka.
  • One Language for Research and Ops: You may want to use the same language for prototyping and for production to reduce risk of not effectively transferring the results.
  • Pet Language: You may have a pet or favorite language and want to stick to it. You can implement algorithms yourself or leverage libraries. Most languages have some form of machine learning package, however primitive.
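
To give a feel for what “implement algorithms yourself” can look like in the last point, here is a minimal sketch of a k-nearest neighbors classifier in plain Python, using only the standard library. The function name knn_predict, the toy data and the choice of k are assumptions made for this illustration; they do not come from any of the polls or posts discussed above.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs, where features are
    equal-length sequences of numbers. Distance is plain Euclidean.
    """
    # Sort the training points by distance to the query and keep the k closest.
    neighbours = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    # Count the labels among those neighbours and return the most common one.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well separated clusters labelled "a" and "b".
train = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"),
    ((5.2, 4.9), "b"),
    ((4.8, 5.1), "b"),
]

print(knn_predict(train, (1.1, 1.0)))  # a
print(knn_predict(train, (5.1, 5.0)))  # b
```

A library implementation will usually be faster and better tested, so a sketch like this is mostly useful for learning, or when your language of choice has no suitable package.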

The question of machine learning programming language is popular on blogs and question and answer sites. A few choice discussions include:

What programming language do you use for machine learning and data analysis, and why do you recommend it?

I’m keen to hear your thoughts. Leave a comment.
