Summary of Open-Source Machine Learning Libraries and Projects
A curated list of awesome machine learning frameworks, libraries and software (by language). Inspired by awesome-php. Other awesome lists can be found in the awesome-awesomeness list.
If you want to contribute to this list (please do), send me a pull request or contact me @josephmisiti
For a list of free machine learning books available for download, go
C
General-Purpose Machine Learning
- Recommender - A C library for product recommendations/suggestions using collaborative filtering (CF).
- Accord-Framework - The Accord.NET Framework is a complete framework for building machine learning, computer vision, computer audition, signal processing and statistical applications.
Computer Vision
- CCV - C-based/Cached/Core Computer Vision Library, A Modern Computer Vision Library
- VLFeat - VLFeat is an open and portable library of computer vision algorithms, with a MATLAB toolbox.
C++
Computer Vision
- OpenCV - OpenCV has C++, C, Python, Java and MATLAB interfaces and supports Windows, Linux, Android and Mac OS.
- DLib - DLib has C++ and Python interfaces for face detection and training general object detectors.
- EBLearn - Eblearn is an object-oriented C++ library that implements various machine learning models
General-Purpose Machine Learning
- DLib - A suite of ML tools designed to be easy to embed in other applications
- ecogg
- shark
- Vowpal Wabbit (VW) - A fast out-of-core learning system.
- sofia-ml - Suite of fast incremental algorithms.
- Shogun - The Shogun Machine Learning Toolbox
- Caffe - A deep learning framework developed with cleanliness, readability, and speed in mind. [DEEP LEARNING]
- CXXNET - Yet another deep learning framework, with fewer than 1000 lines of core code [DEEP LEARNING]
- XGBoost - A parallelized optimized general purpose gradient boosting library.
- CUDA - A fast C++/CUDA implementation of convolutional neural networks [DEEP LEARNING]
- Stan - A probabilistic programming language implementing full Bayesian statistical inference with Hamiltonian Monte Carlo sampling
- BanditLib - A simple Multi-armed Bandit library.
- Timbl - A software package/C++ library implementing several memory-based learning algorithms, among which IB1-IG, an implementation of k-nearest neighbor classification, and IGTree, a decision-tree approximation of IB1-IG. Commonly used for NLP.
Natural Language Processing
- MIT Information Extraction Toolkit - C, C++, and Python tools for named entity recognition and relation extraction
- CRF++ - Open source implementation of Conditional Random Fields (CRFs) for segmenting/labeling sequential data & other Natural Language Processing tasks.
- BLLIP Parser - BLLIP Natural Language Parser (also known as the Charniak-Johnson parser)
- colibri-core - C++ library, command line tools, and Python binding for extracting and working with basic linguistic constructions such as n-grams and skipgrams in a quick and memory-efficient way.
- ucto - Unicode-aware regular-expression based tokeniser for various languages. Tool and C++ library. Supports FoLiA format.
- frog - Memory-based NLP suite developed for Dutch: PoS tagger, lemmatiser, dependency parser, NER, shallow parser, morphological analyser.
Speech Recognition
- Kaldi - Kaldi is a toolkit for speech recognition written in C++ and licensed under the Apache License v2.0. Kaldi is intended for use by speech recognition researchers.
Sequence Analysis
- ToPS - An object-oriented framework that facilitates the integration of probabilistic models for sequences over a user-defined alphabet.
Clojure
Natural Language Processing
- Clojure-openNLP - Natural Language Processing in Clojure (opennlp)
- Inflections-clj - Rails-like inflection library for Clojure and ClojureScript
General-Purpose Machine Learning
- Touchstone - Clojure A/B testing library
- Clojush - The Push programming language and the PushGP genetic programming system implemented in Clojure
- Infer - Inference and machine learning in Clojure
- Clj-ML - A machine learning library for Clojure built on top of Weka and friends
- Encog - Clojure wrapper for Encog (v3) (Machine-Learning framework that specialises in neural-nets)
- Fungp - A genetic programming library for Clojure
- Statistiker - Basic Machine Learning algorithms in Clojure.
- clortex - General Machine Learning library using Numenta’s Cortical Learning Algorithm
- comportex - Functionally composable Machine Learning library using Numenta’s Cortical Learning Algorithm
Data Analysis / Data Visualization
- Incanter - Incanter is a Clojure-based, R-like platform for statistical computing and graphics.
- PigPen - Map-Reduce for Clojure.
- Envision - Clojure Data Visualisation library, based on Statistiker and D3
Erlang
General-Purpose Machine Learning
- Disco - Map Reduce in Erlang
Go
Natural Language Processing
- go-porterstemmer - A native Go clean room implementation of the Porter Stemming algorithm.
- paicehusk - Golang implementation of the Paice/Husk Stemming Algorithm.
- snowball - Snowball Stemmer for Go.
- go-ngram - In-memory n-gram index with compression.
General-Purpose Machine Learning
- Go Learn - Machine Learning for Go
- go-pr - Pattern recognition package in Go lang.
- bayesian - Naive Bayesian Classification for Golang.
- go-galib - Genetic Algorithms library written in Go / golang
- Cloudforest - Ensembles of decision trees in go/golang.
- gobrain - Neural Networks written in go
Data Analysis / Data Visualization
Haskell
General-Purpose Machine Learning
- haskell-ml - Haskell implementations of various ML algorithms.
- HLearn - a suite of libraries for interpreting machine learning models according to their algebraic structure.
- hnn - Haskell Neural Network library.
- hopfield-networks - Hopfield Networks for unsupervised learning in Haskell.
Java
Natural Language Processing
- CoreNLP - Stanford CoreNLP provides a set of natural language analysis tools which can take raw English language text input and give the base forms of words
- Stanford Parser - A natural language parser is a program that works out the grammatical structure of sentences
- Stanford POS Tagger - A Part-Of-Speech Tagger (POS Tagger)
- Stanford Named Entity Recognizer - Stanford NER is a Java implementation of a Named Entity Recognizer.
- Stanford Word Segmenter - Tokenization of raw text is a standard pre-processing step for many NLP tasks.
- Tregex, Tsurgeon and Semgrex - Tregex is a utility for matching patterns in trees, based on tree relationships and regular expression matches on nodes (the name is short for "tree regular expressions").
- Stanford Phrasal - Stanford Phrasal is a state-of-the-art statistical phrase-based machine translation system, written in Java.
- Stanford English Tokenizer - A tokenizer divides text into a sequence of tokens, which roughly correspond to "words".
- Stanford Tokens Regex - A framework for defining patterns over sequences of tokens.
- Stanford Temporal Tagger - SUTime is a library for recognizing and normalizing time expressions.
- Stanford SPIED - Learning entities from unlabeled text starting with seed sets using patterns in an iterative fashion
- Stanford Topic Modeling Toolbox - Topic modeling tools for social scientists and others who wish to perform analysis on datasets
- Twitter Text Java - A Java implementation of Twitter's text processing library
- MALLET - A Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
- OpenNLP - a machine learning based toolkit for the processing of natural language text.
- LingPipe - A tool kit for processing text using computational linguistics.
- ClearTK - ClearTK provides a framework for developing statistical natural language processing (NLP) components in Java and is built on top of Apache UIMA.
- Apache cTAKES - Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) is an open-source natural language processing system for information extraction from electronic medical record clinical free-text.
General-Purpose Machine Learning
- Datumbox - Machine Learning framework for rapid development of Machine Learning and Statistical applications
- ELKI - Java toolkit for data mining. (unsupervised: clustering, outlier detection etc.)
- H2O - ML engine that supports distributed learning on data stored in HDFS.
- htm.java - General Machine Learning library using Numenta’s Cortical Learning Algorithm
- java-deeplearning - Distributed Deep Learning Platform for Java, Clojure, and Scala
- JAVA-ML - A general ML library with a common interface for all algorithms in Java
- JSAT - Numerous Machine Learning algorithms for classification, regression, and clustering.
- Mahout - Distributed machine learning
- Meka - An open source implementation of methods for multi-label classification and evaluation (extension to Weka).
- MLlib in Apache Spark - Distributed machine learning library in Spark
- Neuroph - Neuroph is a lightweight Java neural network framework
- ORYX - Simple real-time large-scale machine learning infrastructure.
- RankLib - RankLib is a library of learning to rank algorithms
- RapidMiner - RapidMiner integration into Java code
- Stanford Classifier - A classifier is a machine learning tool that will take data items and place them into one of k classes.
- WalnutiQ - object oriented model of the human brain
- Weka - Weka is a collection of machine learning algorithms for data mining tasks
Speech Recognition
- CMU Sphinx - Open-source toolkit for speech recognition, built on a pure-Java speech recognition library.
Data Analysis / Data Visualization
- Hadoop - Hadoop/HDFS
- Spark - Spark is a fast and general engine for large-scale data processing.
- Impala - Real-time Query for Hadoop
Javascript
Natural Language Processing
- Twitter-text-js - A JavaScript implementation of Twitter's text processing library
- NLP.js - NLP utilities in JavaScript and CoffeeScript
- natural - General natural language facilities for node
- Knwl.js - A Natural Language Processor in JS
- Retext - Extensible system for analysing and manipulating natural language
- TextProcessing - Sentiment analysis, stemming and lemmatization, part-of-speech tagging and chunking, phrase extraction and named entity recognition.
Data Analysis / Data Visualization
- D3.js
- dc.js
- D3xter - Straightforward plotting built on D3
- statkit - Statistics kit for JavaScript
- science.js - Scientific and statistical computing in JavaScript.
- Z3d - Easily make interactive 3d plots built on Three.js
General-Purpose Machine Learning
- Convnet.js - ConvNetJS is a Javascript library for training Deep Learning models [DEEP LEARNING]
- Clustering.js - Clustering algorithms implemented in Javascript for Node.js and the browser
- Decision Trees - NodeJS Implementation of Decision Tree using ID3 Algorithm
- Node-fann - FANN (Fast Artificial Neural Network Library) bindings for Node.js
- Kmeans.js - Simple Javascript implementation of the k-means algorithm, for node.js and the browser
- LDA.js - LDA topic modeling for node.js
- Learning.js - Javascript implementation of logistic regression/c4.5 decision tree
- Machine Learning - Machine learning library for Node.js
- Node-SVM - Support Vector Machine for nodejs
- Brain - Neural networks in JavaScript
- Bayesian-Bandit - Bayesian bandit implementation for Node and the browser.
- Synaptic - Architecture-free neural network library for node.js and the browser
- kNear - JavaScript implementation of the k nearest neighbors algorithm for supervised learning
Julia
General-Purpose Machine Learning
- PGM - A Julia framework for probabilistic graphical models.
- DA - Julia package for Regularized Discriminant Analysis
- Regression - Algorithms for regression analysis (e.g. linear regression and logistic regression)
- Local Regression - Local regression, so smooooth!
- Naive Bayes - Simple Naive Bayes implementation in Julia
- Mixed Models - A Julia package for fitting (statistical) mixed-effects models
- Simple MCMC - Basic MCMC sampler implemented in Julia
- Distance - Julia module for Distance evaluation
- Decision Tree - Decision Tree Classifier and Regressor
- Neural - A neural network in Julia
- MCMC - MCMC tools for Julia
- GLM - Generalized linear models in Julia
- GLMNet - Julia wrapper for fitting Lasso/ElasticNet GLM models using glmnet
- Clustering - Basic functions for clustering data: k-means, dp-means, etc.
- SVM - SVMs for Julia
- Kernel Density - Kernel density estimators for Julia
- NMF - A Julia package for non-negative matrix factorization
- ANN - Julia artificial neural networks
- Mocha.jl - Deep Learning framework for Julia inspired by Caffe
- XGBoost.jl - eXtreme Gradient Boosting Package in Julia
Natural Language Processing
- Topic Models - TopicModels for Julia
- Text Analysis - Julia package for text analysis
Data Analysis / Data Visualization
- Graph Layout - Graph layout algorithms in pure Julia
- Data Frames Meta - Metaprogramming tools for DataFrames
- Julia Data - library for working with tabular data in Julia
- Data Read - Read files from Stata, SAS, and SPSS
- Hypothesis Tests - Hypothesis tests for Julia
- Gadfly - Crafty statistical graphics for Julia.
- Stats - Statistical tests for Julia
- RDataSets - Julia package for loading many of the data sets available in R
- DataFrames - library for working with tabular data in Julia
- Distributions - A Julia package for probability distributions and associated functions.
- Data Arrays - Data structures that allow missing values
- Time Series - Time series toolkit for Julia
- Sampling - Basic sampling algorithms for Julia
Misc Stuff / Presentations
- DSP - Digital Signal Processing (filtering, periodograms, spectrograms, window functions).
- SignalProcessing - Signal Processing tools for Julia
- Images - An image library for Julia
Lua
General-Purpose Machine Learning
- cephes - Cephes mathematical functions library, wrapped for Torch. Provides and wraps the 180+ special mathematical functions from the Cephes mathematical library, developed by Stephen L. Moshier. It is used, among many other places, at the heart of SciPy.
- graph - Graph package for Torch
- randomkit - Numpy's randomkit, wrapped for Torch
- signal - A signal processing toolbox for Torch-7. FFT, DCT, Hilbert, cepstrums, stft
- nn - Neural Network package for Torch
- nngraph - This package provides graphical computation for nn library in Torch7.
- nnx - A completely unstable and experimental package that extends Torch's builtin nn library
- optim - An optimization library for Torch. SGD, Adagrad, Conjugate-Gradient, LBFGS, RProp and more.
- unsup - A package for unsupervised learning in Torch. Provides modules that are compatible with nn (LinearPsd, ConvPsd, AutoEncoder, ...), and self-contained algorithms (k-means, PCA).
- manifold - A package to manipulate manifolds
- svm - Torch-SVM library
- lbfgs - FFI Wrapper for liblbfgs
- vowpalwabbit - An old vowpalwabbit interface to torch.
- OpenGM - OpenGM is a C++ library for graphical modeling, and inference. The Lua bindings provide a simple way of describing graphs, from Lua, and then optimizing them with OpenGM.
- spaghetti - Spaghetti (sparse linear) module for torch7 by @MichaelMathieu
- LuaSHKit - A Lua wrapper around the locality-sensitive hashing library SHKit
- kernel smoothing - KNN, kernel-weighted average, local linear regression smoothers
- cutorch - Torch CUDA Implementation
- cunn - Torch CUDA Neural Network Implementation
- imgraph - An image/graph library for Torch. This package provides routines to construct graphs on images, segment them, build trees out of them, and convert them back to images.
- videograph - A video/graph library for Torch. This package provides routines to construct graphs on videos, segment them, build trees out of them, and convert them back to videos.
- saliency - Code and tools around integral images. A library for finding interest points based on fast integral histograms.
- stitch - Uses Hugin to stitch images and apply the same stitching to a video sequence
- sfm - A bundle adjustment/structure from motion package
- fex - A package for feature extraction in Torch. Provides SIFT and dSIFT modules.
- OverFeat - A state-of-the-art generic dense feature extractor
- Lunum
Demos and Scripts
- Core torch7 demos repository.
  - linear-regression, logistic-regression
  - face detector (training and detection as separate demos)
  - mst-based-segmenter
  - train-a-digit-classifier
  - train-autoencoder
  - optical flow demo
  - train-on-housenumbers
  - train-on-cifar
  - tracking with deep nets
  - kinect demo
  - filter-bank visualization
  - saliency-networks
- Music Tagging - Music Tagging scripts for torch7
- torch-datasets - Scripts to load several popular datasets including:
  - BSR 500
  - CIFAR-10
  - COIL
  - Street View House Numbers
  - MNIST
  - NORB
- Atari2600 - Scripts to generate a dataset with static frames from the Arcade Learning Environment
Matlab
Computer Vision
- Contourlets - MATLAB source code that implements the contourlet transform and its utility functions.
- Shearlets - MATLAB code for shearlet transform