Dataset Resources for Data Mining (Reposted)

Collected from the Internet:

1. Climate monitoring dataset: http://cdiac.ornl.gov/ftp/ndp026b

2. Several useful sites for downloading test datasets

http://www.cs.toronto.edu/~roweis/data.html
http://kdd.ics.uci.edu/summary.task.type.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/

http://www.phys.uni.torun.pl/~duch/software.html
The Reuters dataset can be found at: http://www.research.att.com/~lewis/reuters21578.html

Various datasets are available at the following address:
http://kdd.ics.uci.edu/summary.data.type.html

For text classification, another usable dataset is the one that comes with rainbow:
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html

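3. A larger collection of test datasets. Anyone writing a paper will need these, if only to check how well an algorithm performs.
Some of the links may no longer be reachable, but some of them should still work: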

Machine learning datasets collected by UCI
ftp://pami.sjtu.edu.cn/
http://www.ics.uci.edu/~mlearn//MLRepository.htm

statlib
http://liama.ia.ac.cn/SCILAB/scilabindexgb.htm
http://lib.stat.cmu.edu/

Sample databases
http://kdd.ics.uci.edu/
http://www.ics.uci.edu/~mlearn/MLRepository.html

A data mining site about investment funds
http://www.gotofund.com/index.asp

http://lans.ece.utexas.edu/~strehl/

Reuters dataset

http://www.research.att.com/~lewis/reuters21578.html

Various datasets:
http://kdd.ics.uci.edu/summary.data.type.html
http://www.mlnet.org/cgi-bin/mlnetois.pl/?File=datasets.html
http://lib.stat.cmu.edu/datasets/
http://dctc.sjtu.edu.cn/adaptive/datasets/
http://fimi.cs.helsinki.fi/data/
http://www.almaden.ibm.com/software/quest/Resources/index.shtml
http://miles.cnuce.cnr.it/~palmeri/datam/DCI/

Text classification and Web data
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html

http://www.w3.org/TR/WD-logfile-960221.html
http://www.w3.org/Daemon/User/Config/Logging.html#AccessLog
http://www.w3.org/1998/11/05/WC-workshop/Papers/bala2.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.web-caching.com/traces-logs.html
http://www-2.cs.cmu.edu/webkb
http://www.cs.auc.dk/research/DP/tdb/TimeCenter/TimeCenterPublications/TR-75.pdf
http://www.cs.cornell.edu/projects/kddcup/index.html


Time series data
http://www.stat.wisc.edu/~reinsel/bjr-data/

Test data for the Apriori algorithm
http://www.almaden.ibm.com/cs/quest/syndata.html

Links to data generators
http://www.cse.cuhk.edu.hk/~kdd/data_collection.html
http://www.almaden.ibm.com/cs/quest/syndata.html


Association:
http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar
http://www.almaden.ibm.com/software/quest/Resources/datasets/syndata.html#assocSynData

WEKA:
http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar
1. A jarfile containing 37 classification problems, originally obtained from the UCI repository
http://prdownloads.sourceforge.net/weka/datasets-UCI.jar
2. A jarfile containing 37 regression problems, obtained from various sources
http://prdownloads.sourceforge.net/weka/datasets-numeric.jar
3. A jarfile containing 30 regression datasets collected by Luis Torgo
http://prdownloads.sourceforge.net/weka/regression-datasets.jar
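
A jar file is an ordinary zip archive, so the WEKA dataset bundles above can be inspected without WEKA itself. The following is a minimal sketch in Python, assuming the archive stores its datasets as .arff files (WEKA's text format); the function names list_arff_members and read_arff_header are made up for this illustration, and the header parser is deliberately simple (it does not handle quoted attribute names that contain spaces).

```python
import io
import zipfile
from typing import List, Tuple


def list_arff_members(jar_bytes: bytes) -> List[str]:
    """Return the sorted names of all .arff entries in a jar (zip) archive."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.lower().endswith(".arff")
        )


def read_arff_header(jar_bytes: bytes, member: str) -> Tuple[str, List[str]]:
    """Return (relation name, attribute names) read from one ARFF entry.

    Only the header is scanned; parsing stops at the @data line.
    """
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as archive:
        text = archive.read(member).decode("utf-8", errors="replace")

    relation = ""
    attributes: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("%"):  # skip blanks and ARFF comments
            continue
        keyword = line.split(None, 1)[0].lower()  # ARFF keywords ignore case
        if keyword == "@data":
            break
        parts = line.split(None, 2)
        if keyword == "@relation" and len(parts) >= 2:
            relation = parts[1].strip("'\"")
        elif keyword == "@attribute" and len(parts) >= 2:
            attributes.append(parts[1].strip("'\""))
    return relation, attributes
```

To use it on a downloaded bundle, read the file in binary mode (for example open("datasets-UCI.jar", "rb").read()) and pass the bytes to list_arff_members, then call read_arff_header on any of the returned names.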

Cancer gene data:
http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi

Financial data:
http://lisp.vse.cz/pkdd99/Challenge/chall.htm

Download the Financial Data (~17.5M zipped file, ~67M unzipped data)
Download the Medical Data (~2M zipped file, ~6M unzipped data)
http://lisp.vse.cz/pkdd99/Challenge/chall.htm


Dataset links from KDnuggets:
http://www.kdnuggets.com/datasets/index.html

Another very good resource is http://kdd.ics.uci.edu/, which contains the following datasets (grouped by application area):

Direct Marketing
KDD CUP 1998 Data

GIS
Forest CoverType

Indexing
Corel Image Features
Pseudo Periodic Synthetic Time Series

Intrusion Detection
KDD CUP 1999 Data

Process Control
Synthetic Control Chart Time Series

Recommendation Systems
Entree Chicago Recommendation Data

Robots
Pioneer-1 Mobile Robot Data
Robot Execution Failures

Sign Language Recognition
Australian Sign Language Data
High-quality Australian Sign Language Data

Text Categorization
20 Newsgroups Data
Reuters-21578 Text Categorization Collection
NSF Research Awards Abstracts 1990-2003

World Wide Web
Microsoft Anonymous Web Data
MSNBC Anonymous Web Data
Syskill Webert Web Data

Here is one more, found on an overseas blog (the day before Children's Day):
http://www.fs.fed.us/fire/fuelman/
