WebCrawling: YouTube Pagination in Python

阿新 • Published: 2018-12-29

A while ago I wrote a blog post about how to scrape videos from YouTube. One question I’ve been asked since is how to navigate between different pages of search results. So here’s how.

YouTube

The preamble is exactly the same as in that post:

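from bs4 import BeautifulSoup as bs
import requests

# Search results URL and the URL-encoded query ('+' stands for a space)
base = "https://www.youtube.com/results?search_query="
qstring = "boddingtons+advert"

# Download the first page of results
r = requests.get(base + qstring)

# Parse the HTML with Python's built-in parser
page = r.text
soup = bs(page, 'html.parser')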
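The query string above is URL-encoded by hand (the space is written as a plus sign). For arbitrary search terms, a minimal sketch using the standard library's urlencode might look like the following. The helper name build_search_url and the constant RESULTS_URL are made up for this example and are not part of the original code.

```python
from urllib.parse import urlencode

# Same endpoint as above, without the hard-coded parameter name.
RESULTS_URL = "https://www.youtube.com/results"


def build_search_url(terms):
    """Return the search URL for free-text terms (hypothetical helper)."""
    # urlencode escapes the value with quote_plus, so spaces become '+'.
    return RESULTS_URL + "?" + urlencode({"search_query": terms})


print(build_search_url("boddingtons advert"))
# https://www.youtube.com/results?search_query=boddingtons+advert
```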

Pagination

Next we need to find the piece of HTML that corresponds to the pagination buttons. If you print out the soup, the relevant section looks like this:

<a aria-label="Go to page 2" class="yt-uix-button vve-check yt-uix-sessionlink yt-uix-button-default yt-uix-button-size-default" data-sessionlink="itct=CAkQnKQBGAciEwjDhY_x4azXAhUUjBUKHXJHBsso9CQ" data-visibility-tracking="CAkQnKQBGAciEwjDhY_x4azXAhUUjBUKHXJHBsso9CQ" href="/results?sp=SBRQFOoDAA%253D%253D&amp;search_query=boddingtons+advert"><span class="yt-uix-button-content">Next »</span></a>

To find it using BeautifulSoup we can simply specify the ‘class’ as a filter:

buttons = soup.find_all('a',attrs={'class':"yt-uix-button vve-check yt-uix-sessionlink yt-uix-button-default yt-uix-button-size-default"})
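To check the filter without hitting the network, it can be run against a saved copy of the markup. The sketch below uses a trimmed version of the anchor shown above (the tracking attributes are dropped for brevity) and assumes the page still serves this exact class string. SAMPLE_HTML, PAGER_CLASS, sample_soup and sample_buttons are names invented for the example.

```python
from bs4 import BeautifulSoup

# The full class attribute of a pagination button, as printed above.
PAGER_CLASS = ("yt-uix-button vve-check yt-uix-sessionlink "
               "yt-uix-button-default yt-uix-button-size-default")

# Trimmed copy of the "Next" anchor, used as offline test data.
SAMPLE_HTML = (
    '<a aria-label="Go to page 2" class="' + PAGER_CLASS + '" '
    'href="/results?sp=SBRQFOoDAA%253D%253D&amp;search_query=boddingtons+advert">'
    '<span class="yt-uix-button-content">Next »</span></a>'
)

sample_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A class filter containing spaces is compared with the whole attribute value,
# so the class names must appear in exactly this order.
sample_buttons = sample_soup.find_all("a", attrs={"class": PAGER_CLASS})

for button in sample_buttons:
    print(button["aria-label"], button["href"])
```

Note that the parser decodes the &amp; entity, so the href comes back with a plain & and can be used in a URL directly.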

There are multiple pagination buttons on the page: one for each of pages 2 – 7 and finally “Next »”. Each one has its own URL, which you can print out like this:

for button in buttons:
    print(button['href'])

The “Next »” button is normally the one you’re looking for, and helpfully it is the last one in the list:

nextbutton = buttons[-1]
print(nextbutton['href'])
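The href printed here is site-relative (it starts with /results), so it has to be resolved against the page it came from before it can be requested. One way to do that is sketched below with the standard library's urljoin. The variable names are illustrative, and the href is the one from the snippet shown earlier.

```python
from urllib.parse import urljoin

# The page we are on, and the relative link taken from the "Next" button.
current_url = "https://www.youtube.com/results?search_query=boddingtons+advert"
next_href = "/results?sp=SBRQFOoDAA%253D%253D&search_query=boddingtons+advert"

# urljoin keeps the scheme and host of current_url and swaps in the new
# path and query string.
next_url = urljoin(current_url, next_href)
print(next_url)
```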

The href is relative to the site root, so we prepend https://www.youtube.com and then navigate to it by invoking the requests.get() function once again.
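Repeating these steps turns into a small loop. The sketch below is one possible arrangement rather than the original author's code: iter_result_pages, fetch and max_pages are hypothetical names, the downloader is passed in as a function so the loop can be tried without network access, and it assumes the last matching button is always the “Next” link, as described above.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# The full class attribute of a pagination button, as used above.
PAGER_CLASS = ("yt-uix-button vve-check yt-uix-sessionlink "
               "yt-uix-button-default yt-uix-button-size-default")


def iter_result_pages(fetch, start_url, max_pages=5):
    """Yield (url, soup) for each results page, following the last pager link.

    fetch is any callable that takes a URL and returns the page HTML.
    """
    url = start_url
    seen = set()
    for _ in range(max_pages):
        if url in seen:
            break  # the pager led back to a page we already visited
        seen.add(url)

        soup = BeautifulSoup(fetch(url), "html.parser")
        yield url, soup

        buttons = soup.find_all("a", attrs={"class": PAGER_CLASS})
        if not buttons:
            break  # no pagination buttons on this page, so stop

        # The href is site-relative; resolve it against the current page.
        url = urljoin(url, buttons[-1]["href"])
```

With requests, the downloader could be lambda url: requests.get(url).text, and the starting URL would be base + qstring from the preamble. The max_pages cap and the seen set are there so the loop stops even if the last button on the final page points backwards.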
