Building a Simple Web Crawler with Qt
阿新 • Published: 2019-01-04
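1. Why write a crawler in Qt?
Honestly, efficiency is critical for a crawler, so Qt is not the best tool for writing one. My needs are lightweight, though, so I went with Qt anyway: it is cross-platform, has a good UI toolkit, and makes connecting to a database easy, so it is not a bad choice either.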
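2. What a crawler mainly involves
A basic crawler requests a URL, processes the response with regular expressions, and stores the result in a database: roughly three steps. This post only covers fetching with GET. Some data has to be requested with POST, and some is only available after logging in, which involves cookies and sessions that I have not looked into. Sending requests from multiple threads is not covered either.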
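The three steps can be sketched in plain C++ without any Qt dependency. This is only an illustration under stated assumptions: it uses std::regex instead of QRegularExpression, the names extractCells and crawl are invented here, the cell class "price" is made up, and the fetch and store hooks are hypothetical stand-ins for the HTTP GET and the database insert.

```cpp
#include <cstddef>
#include <functional>
#include <regex>
#include <string>
#include <vector>

// Step 2: pull the text of every <td class="price"> cell out of the page.
// The class name "price" is an assumption made for this sketch.
std::vector<std::string> extractCells(const std::string& html) {
    // [^<]* keeps each match inside a single cell.
    static const std::regex cellRegex("<td class=\"price\">([^<]*)</td>");
    std::vector<std::string> cells;
    for (std::sregex_iterator it(html.begin(), html.end(), cellRegex), end;
         it != end; ++it) {
        cells.push_back((*it)[1].str());
    }
    return cells;
}

// Steps 1 to 3 chained together. `fetch` stands in for the HTTP GET and
// `store` for the database insert; both are hypothetical hooks.
// Returns the number of cells that were stored.
std::size_t crawl(const std::string& url,
                  const std::function<std::string(const std::string&)>& fetch,
                  const std::function<void(const std::string&)>& store) {
    const std::string html = fetch(url);                        // step 1: request
    const std::vector<std::string> cells = extractCells(html);  // step 2: regex
    for (const std::string& cell : cells) {
        store(cell);                                            // step 3: store
    }
    return cells.size();
}
```

Passing a fetch lambda that returns a canned HTML string and a store lambda that appends to a vector lets the pipeline run with no network or database, which makes the regular expression easy to check in isolation.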
3. Requesting a URL
    #include <QNetworkAccessManager>
    #include <QNetworkReply>
    #include <QNetworkRequest>
    #include <QUrl>

    void MainWindow::on_btnStartGet_clicked()
    {
        // Parented to the window, so Qt deletes the manager together with it.
        QNetworkAccessManager *manager = new QNetworkAccessManager(this);
        connect(manager, SIGNAL(finished(QNetworkReply*)),
                this, SLOT(query(QNetworkReply*)));

        //manager->get(QNetworkRequest(QUrl(stockSource)));
        QNetworkRequest request(QUrl("http://www.baidu.com"));
        // Only the POST variant below needs this header; it is harmless for GET.
        request.setHeader(QNetworkRequest::ContentTypeHeader,
                          "application/x-www-form-urlencoded");

        // POST variant (needs #include <QUrlQuery>):
        // QUrlQuery postData;
        // postData.addQueryItem("", "5d|false|BIDU");
        // manager->post(request, postData.toString(QUrl::FullyEncoded).toUtf8());

        manager->get(request);
    }
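Next, process the response with regular expressions and store the result in the database. The code below is for reference only; it will not compile if copied as-is.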
    #include <QDate>
    #include <QDebug>
    #include <QNetworkReply>
    #include <QRegularExpression>
    #include <QSqlError>
    #include <QSqlQuery>
    #include <vector>

    void MainWindow::query(QNetworkReply* reply)
    {
        reply->deleteLater();  // the slot is responsible for releasing the reply
        if (reply->error() != QNetworkReply::NoError) {
            qDebug() << "request failed:" << reply->errorString();
            return;
        }

        const int eachCount = 200;
        std::vector<StockHistory> websiteHistorys(eachCount);
        const QString input = QString::fromUtf8(reply->readAll());
        qDebug() << input;
        qDebug() << "request finish!";

        // Date cells carry the "nowrap" attribute.
        QRegularExpression dateRegex("(?:<td class=\"yfnc_tabledata1\" nowrap align=\"right\">)(.*?)(?:</td>)");
        QRegularExpressionMatchIterator dateItr = dateRegex.globalMatch(input);
        int dateCount = 0;
        const QDate databaseMaxDate = STUtility::getMaxHistoryDate(ui->editStockId->text());
        //QDate databaseMaxDate = QDate(2015,1,1);

        // Save dates newer than the latest one already in the database.
        // The loop stops at the first date that is not newer.
        while (dateItr.hasNext() && dateCount < eachCount) {
            const QString dateString = dateItr.next().captured(1);
            qDebug() << "date:" << dateString;
            const QDate currentDate = QDate::fromString(STUtility::getValideDate(dateString), "MM dd, yyyy");
            if (currentDate <= databaseMaxDate) {
                break;
            }
            websiteHistorys[dateCount].setDate(currentDate);
            ++dateCount;
        }
        websiteHistorys.resize(dateCount);

        // Save prices. Each row has six price cells: open, high, low, close,
        // volume, and a sixth one that is ignored.
        QRegularExpression priceRegex("(?:<td class=\"yfnc_tabledata1\" align=\"right\">)(.*?)(?:</td>)");
        QRegularExpressionMatchIterator priceItr = priceRegex.globalMatch(input);
        int priceTypeIndex = 0;
        int priceIndex = 0;
        while (priceItr.hasNext() && priceIndex < dateCount) {
            const QString cell = priceItr.next().captured(1);
            qDebug() << "price:" << cell;
            StockHistory &row = websiteHistorys[priceIndex];
            switch (priceTypeIndex) {
            case 0: row.setOpen(cell.toFloat()); break;
            case 1: row.setHigh(cell.toFloat()); break;
            case 2: row.setLow(cell.toFloat()); break;
            case 3: row.setClose(cell.toFloat()); break;
            case 4: row.setVolume(QString(cell).remove(QLatin1Char(',')).toInt()); break;  // strip thousands separators
            default: break;  // sixth column: do nothing
            }
            if (++priceTypeIndex == 6) {
                priceTypeIndex = 0;
                ++priceIndex;
            }
        }

        // Insert into the database. Bound values replace the original string
        // formatting, so quoting and the text of the stock id cannot break the SQL.
        QSqlQuery insertQuery;
        insertQuery.prepare("INSERT INTO stockHistory (stockId,date,close,volume,open,high,low) "
                            "VALUES (?, ?, ?, ?, ?, ?, ?)");
        for (int i = 0; i < dateCount; ++i) {
            StockHistory &oneHistory = websiteHistorys[i];
            insertQuery.bindValue(0, ui->editStockId->text());
            insertQuery.bindValue(1, oneHistory.getDate().toString("yyyy-MM-dd"));
            insertQuery.bindValue(2, oneHistory.getClose());
            insertQuery.bindValue(3, oneHistory.getVolume());
            insertQuery.bindValue(4, oneHistory.getOpen());
            insertQuery.bindValue(5, oneHistory.getHigh());
            insertQuery.bindValue(6, oneHistory.getLow());
            if (!insertQuery.exec()) {
                qDebug() << insertQuery.lastError().text();
            } else {
                qDebug() << "insert one line success";
            }
        }
    }
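The price loop above treats the matched cells as one flat list and cycles a column index from 0 to 5 to decide which field each cell fills. The same idea can be shown on its own in plain C++. This is a sketch, not the code from the project: PriceRow, parseVolume and groupCells are names invented here, and unlike the original it uses long long for the volume and drops a trailing incomplete row.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One row of the price table. The field names are chosen for this sketch.
struct PriceRow {
    float open = 0, high = 0, low = 0, close = 0;
    long long volume = 0;
};

// Remove thousands separators before converting, e.g. "1,234,567".
// std::stoll throws if the remaining text is not a number.
long long parseVolume(const std::string& text) {
    std::string digits;
    for (char c : text) {
        if (c != ',') digits.push_back(c);
    }
    return digits.empty() ? 0 : std::stoll(digits);
}

// Cells arrive as one flat list: open, high, low, close, volume, and a
// sixth column that is ignored, repeated once per row.
// A trailing group with fewer than six cells is skipped.
std::vector<PriceRow> groupCells(const std::vector<std::string>& cells) {
    constexpr std::size_t kColumns = 6;
    std::vector<PriceRow> rows;
    for (std::size_t i = 0; i + kColumns <= cells.size(); i += kColumns) {
        PriceRow row;
        row.open = std::stof(cells[i]);
        row.high = std::stof(cells[i + 1]);
        row.low = std::stof(cells[i + 2]);
        row.close = std::stof(cells[i + 3]);
        row.volume = parseVolume(cells[i + 4]);
        // cells[i + 5] is the ignored sixth column.
        rows.push_back(row);
    }
    return rows;
}
```

Stepping through the list six cells at a time avoids the separate column counter, at the cost of requiring all cells to be collected before grouping.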
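4. Source code download for this simple crawler
Click GetData to fetch a batch of data from Yahoo Finance, then click StartAnalysis to analyze it. The results appear in the debug output.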