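Dataset Resources for Data Mining -- Repost
Collected from the Internet:
1. Climate monitoring dataset: http://cdiac.ornl.gov/ftp/ndp026b
2. Several useful sites for downloading test datasets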
http://www.cs.toronto.edu/~roweis/data.html
http://kdd.ics.uci.edu/summary.task.type.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
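The Reuters dataset can be found at the following URL: http://www.research.att.com/~lewis/reuters21578.html
The following URL hosts a variety of datasets:
http://kdd.ics.uci.edu/summary.data.type.html
For text classification, another usable dataset is the rainbow dataset: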
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html
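3. I tracked down many test datasets. Anyone writing a paper will surely need them, at least to check how well an algorithm performs.
Some of these links may be unreachable, but at least some should still work:
Machine learning datasets collected by UCI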
ftp://pami.sjtu.edu.cn/
http://www.ics.uci.edu/~mlearn//MLRepository.htm
StatLib
http://liama.ia.ac.cn/SCILAB/scilabindexgb.htm
http://lib.stat.cmu.edu/
Sample databases
http://kdd.ics.uci.edu/
http://www.ics.uci.edu/~mlearn/MLRepository.html
Sites about data mining for investment funds
http://www.gotofund.com/index.asp
http://lans.ece.utexas.edu/~strehl/
Reuters dataset
Various datasets:
http://kdd.ics.uci.edu/summary.data.type.html
http://www.mlnet.org/cgi-bin/mlnetois.pl/?File=datasets.html
http://lib.stat.cmu.edu/datasets/
http://dctc.sjtu.edu.cn/adaptive/datasets/
http://fimi.cs.helsinki.fi/data/
http://www.almaden.ibm.com/software/quest/Resources/index.shtml
http://miles.cnuce.cnr.it/~palmeri/datam/DCI/
Text classification & Web
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html
http://www.w3.org/TR/WD-logfile-960221.html
http://www.w3.org/Daemon/User/Config/Logging.html#AccessLog
http://www.w3.org/1998/11/05/WC-workshop/Papers/bala2.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.web-caching.com/traces-logs.html
http://www-2.cs.cmu.edu/webkb
http://www.cs.auc.dk/research/DP/tdb/TimeCenter/TimeCenterPublications/TR-75.pdf
http://www.cs.cornell.edu/projects/kddcup/index.html
Time series data
http://www.stat.wisc.edu/~reinsel/bjr-data/
Test data for the Apriori algorithm
http://www.almaden.ibm.com/cs/quest/syndata.html
Links to data generators
http://www.cse.cuhk.edu.hk/~kdd/data_collection.html
http://www.almaden.ibm.com/cs/quest/syndata.html
Association:
http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar
http://www.almaden.ibm.com/software/quest/Resources/datasets/syndata.html#assocSynData
WEKA:
http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar
1. A jarfile containing 37 classification problems, originally obtained from the UCI repository
http://prdownloads.sourceforge.net/weka/datasets-UCI.jar
2. A jarfile containing 37 regression problems, obtained from various sources
http://prdownloads.sourceforge.net/weka/datasets-numeric.jar
3. A jarfile containing 30 regression datasets collected by Luis Torgo
http://prdownloads.sourceforge.net/weka/regression-datasets.jar
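A jar file is an ordinary ZIP archive, so once one of the jars above has been downloaded it can be inspected without Java. The sketch below is only an illustration: it assumes the datasets inside are stored as .arff files (WEKA's usual format), and the helper names and the tiny in-memory demo archive are made up for this example rather than taken from the real jars.

```python
import io
import zipfile


def list_arff_members(jar_source):
    """Return the sorted names of .arff entries in a jar (a ZIP archive).

    jar_source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(jar_source) as jar:
        return sorted(
            name for name in jar.namelist() if name.lower().endswith(".arff")
        )


def read_member_text(jar_source, member, encoding="utf-8"):
    """Read one archive member and decode it as text."""
    with zipfile.ZipFile(jar_source) as jar:
        return jar.read(member).decode(encoding, errors="replace")


def build_demo_jar():
    """Build a tiny in-memory archive so the sketch runs offline.

    The contents are hypothetical and only mimic the assumed layout.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        jar.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
        jar.writestr(
            "demo/toy.arff",
            "@relation toy\n@attribute x numeric\n@data\n1\n",
        )
    buf.seek(0)
    return buf


demo = build_demo_jar()
for name in list_arff_members(demo):
    # Show each dataset name and the first line of its ARFF header.
    print(name, "->", read_member_text(demo, name).splitlines()[0])
```

For a real download, pass the local path instead of the demo buffer, for example list_arff_members("datasets-UCI.jar"), assuming the file was saved under that name.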
Cancer gene data:
http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi
Financial data:
http://lisp.vse.cz/pkdd99/Challenge/chall.htm
Provided by another contributor:
http://www.cs.toronto.edu/~roweis/data.html
http://kdd.ics.uci.edu/summary.task.type.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.phys.uni.torun.pl/~duch/software.html
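The Reuters dataset can be found at the following URL:
http://www.research.att.com/~lewis/reuters21578.html
The following URL hosts a variety of datasets:
http://kdd.ics.uci.edu/summary.data.type.html
For text classification, another usable dataset is the rainbow dataset: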
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html
Download the Financial Data (~17.5M zipped file, ~67M unzipped data)
Download the Medical Data (~2M zipped file, ~6M unzipped data)
http://lisp.vse.cz/pkdd99/Challenge/chall.htm
Dataset links from KDnuggets:
http://www.kdnuggets.com/datasets/index.html
Another very good resource is http://kdd.ics.uci.edu/, which contains the following data resources (grouped by application area):
Direct Marketing
    KDD CUP 1998 Data
GIS
    Forest CoverType
Indexing
    Corel Image Features
    Pseudo Periodic Synthetic Time Series
Intrusion Detection
    KDD CUP 1999 Data
Process Control
    Synthetic Control Chart Time Series
Recommendation Systems
    Entree Chicago Recommendation Data
Robots
    Pioneer-1 Mobile Robot Data
    Robot Execution Failures
Sign Language Recognition
    Australian Sign Language Data
    High-quality Australian Sign Language Data
Text Categorization
    20 Newsgroups Data
    Reuters-21578 Text Categorization Collection
    NSF Research Awards Abstracts 1990-2003
World Wide Web
    Microsoft Anonymous Web Data
    MSNBC Anonymous Web Data
    Syskill Webert Web Data
Here is one more, which I found on a foreigner's blog (the day before Children's Day):
http://www.fs.fed.us/fire/fuelman/