Manchester City News Station (2): The Crawler in the Django Framework
阿新 • Published: 2018-09-27
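The previous post covered the Manchester City news crawler script; now I move it into a Django project. Just copy the .py file into the Django directory, then import it in the view and call it. Crawling the news on a schedule later will not be hard either.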
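The scheduled crawl mentioned above can be done in many ways (cron, a task queue, a management command). As one minimal sketch that uses only the standard library, the helper below reruns a callable at a fixed interval on a daemon thread. The name run_every is hypothetical and not part of the original project; the callable passed in would presumably be the crawler's main method.

```python
import threading


def run_every(interval_seconds, job):
    """Call job() repeatedly on a daemon thread, waiting interval_seconds before each call.

    Returns a threading.Event; call .set() on it to stop the loop.
    """
    stop = threading.Event()

    def loop():
        # Event.wait returns False on timeout (time to run the job)
        # and True once stop has been set (time to exit).
        while not stop.wait(interval_seconds):
            try:
                job()
            except Exception as exc:
                # Keep the loop alive if a single crawl fails.
                print("scheduled job failed:", exc)

    threading.Thread(target=loop, daemon=True).start()
    return stop


# Usage (assumption: Man_City().main is the crawler entry point):
#     stopper = run_every(30 * 60, Man_City().main)   # crawl every 30 minutes
#     stopper.set()                                   # stop scheduling
```

Note that the first run happens only after one full interval has passed, and that a thread started inside a web process is started once per worker process, so a cron job or task queue may be the better fit in production.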
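The standalone crawler script ran without importing lxml and raised no error, but once it is inside Django it fails unless lxml is imported explicitly.
I won't go into creating the Django project or configuring URLs and static files. Since MongoDB is not among the databases Django supports by default, add these two lines to settings.py: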
    from mongoengine import connect
    connect(db='Man_City', host='localhost', port=27017)
Check the database name in my MongoDB directory and make sure the db value matches it exactly.
Create three classes in models.py:
    from django.db import models
    from mongoengine import *

    # Create your models here.
    class n_163(Document):
        title = StringField(max_length=100)
        url = StringField(max_length=64)
        pub_date = StringField(max_length=32)
        meta = {'collection': '163'}

    class sina(Document):
        title = StringField(max_length=100)
        url = StringField(max_length=64)
        pub_date = StringField(max_length=32)
        meta = {'collection': 'sina'}

    class qq(Document):
        title = StringField(max_length=100)
        url = StringField(max_length=64)
        pub_date = StringField(max_length=32)
        meta = {'collection': 'qq'}
Query the database from the view:
    from django.shortcuts import render
    from django.core.paginator import Paginator
    from cityzen.models import n_163, sina, qq
    from cityzen.Manchester_City import Man_City

    M = Man_City()

    # Create your views here.
    def get_db(request):
        # Run the crawler (this happens on every request)
        M.main()
        # Sort by date, newest first, and keep only 60 news items per source
        db_sina = sina.objects.order_by('-pub_date')[:60]
        db_163 = n_163.objects.order_by('-pub_date')[:60]
        db_qq = qq.objects.order_by('-pub_date')[:60]
        news_sina = get_page(request, db_sina)
        news_163 = get_page(request, db_163)
        news_qq = get_page(request, db_qq)
        return render(request, 'city.html',
                      {'news_sina': news_sina, 'news_163': news_163, 'news_qq': news_qq})

    def get_page(request, soup_db):
        # Django's built-in pagination, 10 news items per page
        page = request.GET.get('page', 1)
        paginator = Paginator(soup_db, 10)
        page_loaded = paginator.page(page)
        return page_loaded
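One thing to watch in get_page: all three sources read the same page query parameter, and paginator.page raises an exception when the value is not a number or is beyond the last page of a source that has fewer items. The sketch below shows the clamping idea in plain Python, independent of Django; clamp_page and page_slice are hypothetical helper names, not part of the project. Django 2.0 and later also provide Paginator.get_page, which handles invalid input in a similar spirit.

```python
import math


def clamp_page(raw_page, total_items, per_page=10):
    """Return a valid 1-based page number for a collection of total_items entries."""
    # An empty collection still has one (empty) page.
    num_pages = max(1, math.ceil(total_items / per_page))
    try:
        page = int(raw_page)
    except (TypeError, ValueError):
        # Missing or non-numeric input falls back to the first page.
        return 1
    # Values below 1 become 1; values past the end become the last page.
    return min(max(page, 1), num_pages)


def page_slice(items, raw_page, per_page=10):
    """Return (page_number, items_on_that_page) for a plain list."""
    page = clamp_page(raw_page, len(items), per_page)
    start = (page - 1) * per_page
    return page, items[start:start + per_page]


if __name__ == "__main__":
    news = ["news %d" % i for i in range(25)]
    print(page_slice(news, "99"))   # out-of-range request is served the last page
    print(page_slice(news, "abc"))  # non-numeric request is served the first page
```

With this kind of fallback, a request for a page that exists for one source but not for another still renders all three columns instead of failing.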
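The backend is basically done. For the frontend I use Semantic UI: put the three files it needs into the templates directory and reference them the same way you would jQuery. The official site has many practical templates that work after copying them and making a few changes.
city.html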
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Title</title>
  <script src="/static/Semantic-UI-CSS-master/jquery-1.11.3.min.js"></script>
  <link rel="stylesheet" href="/static/Semantic-UI-CSS-master/semantic.css">
  <script src="/static/Semantic-UI-CSS-master/semantic.js"></script>
</head>
<body>
<div style="background: url(/static/man_bg.jpg) no-repeat">
  <h1 class="ui header center aligned">Manchester City Info Station</h1>
  <div class="ui top attached tabular menu">
    <a class="active item" data-tab="first">News</a>
    <a class="item" data-tab="second">Post-match data</a>
    <a class="item" data-tab="third">Standings</a>
  </div>
  <div class="ui bottom attached active tab segment" data-tab="first">
    {# News #}
    <div class="ui equal width grid">
      <div class="column">
        <div class="ui segment" data-tab="news_first"><img src="/static/sina_sport.PNG" style="margin: auto" class="small"></div>
      </div>
      <div class="column">
        <div class="ui segment" data-tab="news_second" id="163"><img src="/static/163_sport.PNG" style="margin: auto" class="small"></div>
      </div>
      <div class="column">
        <div class="ui segment" data-tab="news_third" id="qq"><img src="/static/qq.PNG" style="margin: auto" class="small"></div>
      </div>
    </div>
    <div class="ui bottom attached active tab segment" data-tab="news_first" >
      {# Sina news content #}
      <ul>
        {% for item in news_sina %}
        <span><a href="{{ item.url }}">{{ item.title }} {{ item.pub_date }}</a></span>
        {# Semantic UI divider #}
        <div class="ui horizontal divider"></div>
        {% endfor %}
      </ul>
      {# Pagination div #}
      <div class="pagination">
        {% if news_sina.has_previous %}
        <a href="?page={{ news_sina.previous_page_number }}">pre</a>
        {% endif %}
        <span>{{ news_sina.number }} of {{ news_sina.paginator.num_pages }}</span>
        {% if news_sina.has_next %}
        <a href="?page={{ news_sina.next_page_number }}">Next</a>
        {% endif %}
      </div>
    </div>
    <div class="ui bottom attached tab segment" data-tab="news_second">
      {# NetEase (163) news content #}
      <ul>
        {% for item in news_163 %}
        <span><a href="{{ item.url }}">{{ item.title }} {{ item.pub_date }}</a></span>
        <div class="ui horizontal divider"></div>
        {% endfor %}
      </ul>
      {# Pagination div #}
      <div class="pagination">
        {% if news_163.has_previous %}
        <a href="?page={{ news_163.previous_page_number }}">pre</a>
        {% endif %}
        <span>{{ news_163.number }} of {{ news_163.paginator.num_pages }}</span>
        {% if news_163.has_next %}
        <a href="?page={{ news_163.next_page_number }}">Next</a>
        {% endif %}
      </div>
    </div>
    <div class="ui bottom attached tab segment" data-tab="news_third">
      {# Tencent (QQ) news content #}
      <ul>
        {% for item in news_qq %}
        <span><a href="{{ item.url }}">{{ item.title }} {{ item.pub_date }}</a></span>
        <div class="ui horizontal divider"></div>
        {% endfor %}
      </ul>
      {# Pagination div #}
      <div class="pagination">
        {% if news_qq.has_previous %}
        <a href="?page={{ news_qq.previous_page_number }}">pre</a>
        {% endif %}
        <span>{{ news_qq.number }} of {{ news_qq.paginator.num_pages }}</span>
        {% if news_qq.has_next %}
        <a href="?page={{ news_qq.next_page_number }}">Next</a>
        {% endif %}
      </div>
    </div>
  </div>
  <div class="ui bottom attached tab segment" data-tab="second">Post-match data content</div>
  <div class="ui bottom attached tab segment" data-tab="third">Standings content</div>
  <h4 class="ui horizontal inverted divider" style="color: black">Design by John </h4>
</div>
<script>
  $('.menu .item').tab();
  $('.column .segment').tab();
  $('#163').click(function () { });
</script>
</body>
</html>
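The preliminary result is shown below.
Summary:
Using Semantic UI really is faster and better-looking than writing the frontend code myself, though I have only used a small part of it and there are many more modules to get familiar with. I did little processing on the crawled data, only cleaning the dates: the string dates are converted into a sortable date format. Because this happens before the data is stored, retrieving and using it later is much easier. For pagination I used Django's built-in paginator directly, so the page number is exposed directly on the web page, and it also changes along with the tab when switching among the three news sites. That is annoying, so I will switch to ajax pagination later. I also plan to present Manchester City's post-match data analysis as charts, and once the Premier League table is added, a rough prototype will be more or less complete.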
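The date cleaning described in the summary is not shown in the post. As a rough sketch of the idea, the function below tries a few candidate formats and rewrites any match as a zero-padded string, so that plain string ordering (which is what sorting a StringField gives you) agrees with chronological order. The formats in KNOWN_FORMATS and the name normalize_pub_date are assumptions for illustration, not the formats the three sites actually use.

```python
from datetime import datetime

# Candidate input formats. These are assumptions, not the sites' real formats.
KNOWN_FORMATS = (
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d %H:%M",
    "%Y/%m/%d %H:%M",
    "%Y年%m月%d日 %H:%M",
    "%Y-%m-%d",
)


def normalize_pub_date(raw):
    """Convert a scraped date string to 'YYYY-MM-DD HH:MM', or None if unrecognized."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # not this format, try the next one
        # Zero-padded, most-significant-first: sorts correctly as a string.
        return parsed.strftime("%Y-%m-%d %H:%M")
    return None


if __name__ == "__main__":
    scraped = ["2018/9/7 8:05", "2018年09月27日 10:30", "2018-09-20"]
    cleaned = [normalize_pub_date(s) for s in scraped]
    # Newest first, using plain string comparison.
    print(sorted(cleaned, reverse=True))
```

Returning None for unrecognized input lets the caller decide whether to skip the item or store it with a fallback value.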