Getting the href value of every a tag on a page with Python

阿新 • Published: 2018-12-19

The code below follows the content at the link referenced in its comments:

# -*- coding:utf-8 -*-
# Python 3 (urllib.request, imported below, does not exist in Python 2.7)
# http://tieba.baidu.com/p/2460150866
# Tag operations


from bs4 import BeautifulSoup
import urllib.request
import re


# If the source is a URL, the page can be read like this:
#html_doc = "http://tieba.baidu.com/p/2460150866"
#req = urllib.request.Request(html_doc)
#webpage = urllib.request.urlopen(req)
#html = webpage.read()



html="""
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="xiaodeng"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
<a href="http://example.com/lacie" class="sister" id="xiaodeng">Lacie</a>
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
soup = BeautifulSoup(html, 'html.parser')   # document object


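# soup.a looks up the a tag, but returns only the first one
#print(soup.a)  # <a class="sister" href="http://example.com/elsie" id="xiaodeng"><!-- Elsie --></a>

# find_all('a') returns every a tag
for k in soup.find_all('a'):
    print(k)
    print(k['class'])  # class attribute of the a tag (a list, e.g. ['sister'])
    print(k['id'])     # id value of the a tag
    print(k['href'])   # href value of the a tag
    print(k.string)    # string content of the a tag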

If the <a> tag contains other tags, such as <em>..</em>, use k.get_text() to extract the text inside <a>.
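A minimal sketch of the difference between k.string and k.get_text(). The HTML snippet and variable names here are made up for illustration and are not part of the original example. Note that .string comes back as None only when the tag has more than one child (here, text plus an <em>).

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: the second link wraps part of its text in <em>.
snippet = """
<a href="http://example.com/lacie">Lacie</a>
<a href="http://example.com/tillie">Till<em>ie</em></a>
"""

soup = BeautifulSoup(snippet, 'html.parser')
plain, nested = soup.find_all('a')

# .string works when the tag has exactly one child
plain_string = plain.string
# .string is None when the tag has several children (text + <em>)
nested_string = nested.string
# get_text() joins all the text inside the tag, including nested tags
nested_text = nested.get_text()

print(plain_string)   # Lacie
print(nested_string)  # None
print(nested_text)    # Tillie
```

When the markup of the target page is not known in advance, get_text() is probably the safer default for reading link text.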


The following pattern usually works as well. It reads the attribute with get().

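from urllib.request import urlopen
from bs4 import BeautifulSoup

url = "http://tieba.baidu.com/p/2460150866"  # page to fetch (URL from the example above)
html = urlopen(url)
soup = BeautifulSoup(html, 'html.parser')
t1 = soup.find_all('a')   # every a tag on the page
print(t1)
href_list = []
for t2 in t1:
    t3 = t2.get('href')   # None when the tag has no href attribute
    href_list.append(t3)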
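One reason to prefer get() is that not every a tag carries an href. The sketch below uses a made-up snippet (not from the original post) to show that get('href') yields None for such tags, whereas indexing with ['href'] would raise KeyError. It also shows find_all('a', href=True), which skips tags that have no href.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: the second a tag is a named anchor without an href.
snippet = """
<a href="http://example.com/elsie">Elsie</a>
<a name="top">Top</a>
<a href="http://example.com/tillie">Tillie</a>
"""

soup = BeautifulSoup(snippet, 'html.parser')

# get() returns None for a missing attribute; a['href'] would raise KeyError
all_hrefs = [a.get('href') for a in soup.find_all('a')]

# href=True keeps only the a tags that actually have an href attribute
real_hrefs = [a['href'] for a in soup.find_all('a', href=True)]

print(all_hrefs)   # ['http://example.com/elsie', None, 'http://example.com/tillie']
print(real_hrefs)  # ['http://example.com/elsie', 'http://example.com/tillie']
```

Filtering with href=True keeps None entries out of the result list, which may be convenient if the links are fetched or joined later.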
