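Anaconda Python is a completely free, enterprise-grade Python distribution for large-scale data processing, predictive analytics, and scientific computing.
Anaconda is a collection of scientific and technical Python packages, similar in scope to Python(x,y). It is a relative newcomer and has already been updated many times. It uses conda for package management, its GUI is based on PySide, and it is moderate in size while still including the scientific computing packages you would expect. Anaconda supports all major operating system platforms; installing, updating, and removing it are all easy, and everything is installed into a single directory. Anaconda currently provides distributions for four Python series (2.6.x, 2.7.x, 3.3.x, and 3.4.x), a range that other distributions cannot match.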




Part 1: Scientific Computing Packages

1. ipython

IPython provides a rich architecture for interactive computing with:
1) Powerful interactive shells (terminal and Qt-based).
2) A browser-based notebook with support for code, text, mathematical expressions, inline plots and other rich media.
3) Support for interactive data visualization and use of GUI toolkits.
4) Flexible, embeddable interpreters to load into your own projects.
5) Easy to use, high performance tools for parallel computing.

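“IPython is an interactive shell for Python that is far more convenient and more powerful than the default Python shell. It supports syntax highlighting, auto-completion, code debugging, and object introspection, supports Bash shell commands, and has many useful built-in features and functions, which makes it very easy to use.” Start IPython with the command “ipython --pylab” to enable matplotlib's interactive plotting by default, which is very convenient.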

2. numpy

NumPy is the fundamental package for scientific computing with Python. It contains among other things:
1) a powerful N-dimensional array object
2) sophisticated (broadcasting) functions
3) tools for integrating C/C++ and Fortran code
4) useful linear algebra, Fourier transform, and random number capabilities
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.

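NumPy is an almost unavoidable toolkit for scientific computing. Its most commonly used feature is probably the N-dimensional array object; it also includes a set of mature function libraries, tools for integrating C/C++ and Fortran code, and functions for linear algebra, Fourier transforms, and random number generation. NumPy provides two fundamental objects: ndarray (N-dimensional array object) and ufunc (universal function object). An ndarray is a multi-dimensional array that stores elements of a single data type, while a ufunc is a function that operates on arrays element by element.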
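The two objects described above can be illustrated with a minimal sketch. The array values and the variable names (a, offsets, shifted) are made up for illustration; np.sqrt and np.add are standard NumPy ufuncs.

```python
import numpy as np

# An ndarray stores elements of a single data type; here, 64-bit floats.
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# np.sqrt and np.add are ufuncs: they operate on arrays element by element.
roots = np.sqrt(a)

# Broadcasting: the shape-(3,) array is applied to each row of the (2, 3) array.
offsets = np.array([10.0, 20.0, 30.0])
shifted = np.add(a, offsets)

print(a.dtype, a.shape)
print(roots)
print(shifted)
```

Because the loop over elements happens inside the ufunc rather than in Python code, this style is usually both shorter and faster than an explicit for loop.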

3. scipy

SciPy refers to several related but distinct entities:
1) The SciPy Stack, a collection of open source software for scientific computing in Python, and particularly a specified set of core packages.
2) The community of people who use and develop this stack.
3) Several conferences dedicated to scientific computing in Python – SciPy, EuroSciPy and SciPy.in.
4) The SciPy library, one component of the SciPy stack, providing many numerical routines.

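4. matplotlib

matplotlib is the best-known plotting library for Python. It provides a complete set of command-style APIs similar to MATLAB's, which makes it well suited to interactive plotting. It can also be conveniently embedded in GUI applications as a plotting widget. Used together with the IPython shell, matplotlib offers a plotting experience comparable to MATLAB's.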

matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms. matplotlib can be used in Python scripts, the Python and IPython shell (à la MATLAB® or Mathematica®), web application servers, and six graphical user interface toolkits.
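As a rough illustration of the MATLAB-style command API, the sketch below draws one line plot and writes it to an in-memory PNG. The data and labels are invented for the example, and the non-interactive "Agg" backend is chosen only so that the script runs without a display; in an interactive session you would typically call plt.show() instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import matplotlib.pyplot as plt

# Made-up sample data: y = x squared.
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]

# State-machine style commands, similar in spirit to MATLAB's plot/xlabel/ylabel.
plt.figure()
plt.plot(xs, ys, "o-", label="y = x squared")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()

# Render the current figure as PNG bytes instead of opening a window.
buffer = io.BytesIO()
plt.savefig(buffer, format="png")
plt.close()
png_bytes = buffer.getvalue()
print(len(png_bytes), "bytes of PNG data")
```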

Part 2: Machine Learning and Data Mining Packages

1. Beautiful Soup

You didn’t write that awful page. You’re just trying to get some data out of it. Beautiful Soup is here to help. Since 2004, it’s been saving programmers hours or days of work on quick-turnaround screen scraping projects.


2. pandas: Python Data Analysis Library

Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.
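A small sketch of the two ideas mentioned above, a numerical table and a time series, might look like this. The city labels and temperature values are invented for illustration.

```python
import pandas as pd

# A DataFrame is a labeled table; here it is indexed by four consecutive days.
dates = pd.date_range("2014-01-01", periods=4, freq="D")
table = pd.DataFrame(
    {"city": ["A", "B", "A", "B"], "temp_c": [1.5, 3.0, 2.5, 5.0]},
    index=dates,
)

# Table-style operation: average temperature per city.
mean_by_city = table.groupby("city")["temp_c"].mean()

# Time-series operation: maximum temperature in each two-day window.
two_day_max = table["temp_c"].resample("2D").max()

print(mean_by_city)
print(two_day_max)
```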


3. scikit-learn: Machine Learning in Python

scikit-learn (formerly scikits.learn) is an open source machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines, logistic regression, naive Bayes, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.

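scikit-learn is an open source machine learning toolkit built on NumPy, SciPy, and Matplotlib. It mainly covers classification, regression, and clustering algorithms such as SVM, logistic regression, naive Bayes, random forests, and k-means. Both the code and the documentation are of high quality, and it is used in many Python projects. For example, the familiar NLTK includes a dedicated interface to scikit-learn in its classifier module, which lets you use scikit-learn's classification algorithms with your training data to train classifier models.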
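To give a feel for the fit/predict workflow, here is a minimal sketch that trains one of the classifiers listed above, logistic regression, on a tiny invented one-feature dataset. The data and variable names are assumptions made for illustration only.

```python
from sklearn.linear_model import LogisticRegression

# Invented training data: one feature, two well-separated classes.
X_train = [[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]]
y_train = [0, 0, 0, 1, 1, 1]

# Estimators share the same interface: construct, fit, then predict.
model = LogisticRegression()
model.fit(X_train, y_train)

# Classify two new points, one near each cluster.
predictions = model.predict([[0.5], [9.5]])
print(predictions)
```

Other estimators such as support vector machines or random forests follow the same construct, fit, predict pattern, so swapping algorithms usually means changing only the import and the constructor call.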

4. nltk: Natural Language Toolkit

NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, and an active discussion forum.


Part 3: Other Important Packages

1. conda

Conda is an open source package management system and environment management system for installing multiple versions of software packages and their dependencies and switching easily between them. It works on Linux, OS X and Windows, and was created for Python programs but can package and distribute any software.



2. ipython-notebook

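IPython Notebook uses an interactive computing document format based on web technologies. Why call it a document format rather than a computing tool? In fact, it is both. For interaction, Notebook uses a client/server architecture: it starts a shell server built on Tornado and uses the browser as the client. Notebook pages are saved in the JSON-based .ipynb file format, which is also one of Notebook's most attractive features. IPython Notebook uses the browser as its interface, sends requests to the IPython server in the background, and displays the results. In the browser interface, information is stored in cells. There are several cell types; the most commonly used are Markdown cells, which hold formatted text, and Code cells, which hold code.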
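Because an .ipynb file is JSON, its cell structure can be sketched with nothing but the standard json module. The layout below assumes the version 4 notebook format (a top-level "cells" list); older notebook versions organized cells differently, and the cell contents here are invented for illustration.

```python
import json

# A hand-built notebook with one Markdown cell and one Code cell.
# Field names follow the version 4 notebook format (an assumption of this sketch).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Title\n", "Some *formatted* text."],
        },
        {
            "cell_type": "code",
            "execution_count": None,  # not yet executed
            "metadata": {},
            "outputs": [],
            "source": ["print(1 + 1)"],
        },
    ],
}

# Serialize to the text that would be stored in a .ipynb file, then read it back.
text = json.dumps(notebook, indent=1)
loaded = json.loads(text)
cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)
```

Storing notebooks as plain JSON is what makes them easy to share, convert to other formats, and process with ordinary scripts.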

3. spyder


4. pyqt

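PyQt is a toolkit for building GUI applications. It is a successful blend of the Python programming language and the Qt library, one of the most powerful libraries available today. PyQt is implemented as a set of Python modules, with more than 300 classes and nearly 6,000 functions and methods. It is a cross-platform toolkit that runs on all major operating systems, including UNIX, Windows, and Mac. PyQt is dual-licensed: developers can choose between the GPL and a commercial license. Previously, the GPL version was available only on Unix; since PyQt version 4, the GPL license is available on all supported platforms.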

5. cpython



