howdoi: A Simple Analysis

阿新 • Published: 2017-12-17


A brief analysis of howdoi.

I once came across the following snippet of JavaScript:

try {
    doSth();
} catch (e) {
    // on any error, jump to a Stack Overflow search for the error message
    var ask_url = "https://stackoverflow.com/search?q=";
    window.location.href = ask_url + encodeURIComponent(e);
}

howdoi essentially turns this workflow into a Python script. Its basic flow is:

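Step 1: Build a search query with the site: operator (restricted to stackoverflow.com by default).
Step 2: Query Google and take the top-ranked link from the first page of results.
Step 3: Open that link with the answers sorted by votes (highest first) and extract the code-block text from the top answer.
Step 4: If a code block is found, print it in the terminal; otherwise report that no answer was found.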
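Once the results page has been fetched, steps 1 and 2 are plain string handling. The sketch below reproduces them without any network access, assuming the result links have already been scraped from the Google page. The helper names build_search_url and pick_question_link are hypothetical; they only mirror the logic of howdoi(), _get_links, _is_question and get_link_at_pos in the listing that follows.

```python
import re
from urllib.parse import quote

# Same template as howdoi's SEARCH_URL (HTTPS variant): {0} = site, {1} = URL-quoted query
SEARCH_URL = 'https://www.google.com/search?q=site:{0}%20{1}'


def build_search_url(words, site='stackoverflow.com'):
    """Step 1 (hypothetical helper): join the words, drop '?' and restrict the search to `site`."""
    query = ' '.join(words).replace('?', '')
    return SEARCH_URL.format(site, quote(query))


def pick_question_link(links, position=1):
    """Step 2 (hypothetical helper): keep only question URLs, return the one at 1-based `position`."""
    questions = [link for link in links if re.search(r'questions/\d+/', link)]
    if not questions:
        return None
    # Like howdoi's get_link_at_pos: fall back to the last link when position is too large
    return questions[position - 1] if len(questions) >= position else questions[-1]
```

For example, build_search_url(['reverse', 'a', 'list?']) returns 'https://www.google.com/search?q=site:stackoverflow.com%20reverse%20a%20list'. Because an out-of-range position falls back to the last question instead of failing, howdoi's -n option can print the same answer more than once when fewer question links were found than requested.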

howdoi also takes care of a few other things:

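Proxy support
Caching of previous queries, so repeated lookups are faster
A configurable target site
Packaging as a Python command-line script for quick, convenient use
Syntax-highlighted, formatted code output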
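Most of these extras are driven by environment variables that the listing below reads at import time (HOWDOI_URL, HOWDOI_DISABLE_SSL, XDG_CACHE_HOME, HOWDOI_DISABLE_CACHE, HOWDOI_COLORIZE) plus the system proxy settings. The following sketch restates that configuration logic as pure functions over a plain dict so it can be tried without touching the real environment. resolve_settings and normalize_proxies are hypothetical names, not part of howdoi.

```python
import os


def resolve_settings(env, home):
    """Hypothetical helper: derive howdoi-style settings from a mapping such as os.environ."""
    # HOWDOI_DISABLE_SSL: any non-empty value switches to plain HTTP without certificate checks
    use_ssl = not env.get('HOWDOI_DISABLE_SSL')
    scheme = 'https' if use_ssl else 'http'
    # HOWDOI_URL: target Q&A site, stackoverflow.com by default
    site = env.get('HOWDOI_URL') or 'stackoverflow.com'
    # Cache directory: $XDG_CACHE_HOME/howdoi, falling back to <home>/.cache/howdoi
    cache_root = env.get('XDG_CACHE_HOME', os.path.join(home, '.cache'))
    return {
        'search_url': scheme + '://www.google.com/search?q=site:{0}%20{1}',
        'verify_ssl': use_ssl,
        'site': site,
        'cache_dir': os.path.join(cache_root, 'howdoi'),
        # HOWDOI_DISABLE_CACHE turns caching off; HOWDOI_COLORIZE forces colored output (like -c)
        'use_cache': not env.get('HOWDOI_DISABLE_CACHE'),
        'color': bool(env.get('HOWDOI_COLORIZE')),
    }


def normalize_proxies(proxies):
    """Hypothetical helper: keep http/https entries and give each proxy a scheme, as get_proxies() does."""
    filtered = {}
    for key, value in proxies.items():
        if key.startswith('http'):
            filtered[key] = value if value.startswith('http') else 'http://%s' % value
    return filtered
```

Passing the environment in as a dict, rather than reading os.environ directly as howdoi does, is only a convenience for experimenting with the logic.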

For the rest of the analysis, see the comments in the code:

#!/usr/bin/env python

######################################################
#
# howdoi - instant coding answers via the command line
# written by Benjamin Gleitzman
# inspired by Rich Jones
#
######################################################

import argparse  # parse the script's command-line arguments
import glob 
import os
import random
import re
import requests  # send HTTP(S) requests
import requests_cache
import sys
from . import __version__
# colorized, syntax-highlighted output in the terminal
from pygments import highlight
from pygments.lexers import guess_lexer, get_lexer_by_name
from pygments.formatters.terminal import TerminalFormatter
from pygments.util import ClassNotFound
# HTML parsing
from pyquery import PyQuery as pq

from requests.exceptions import ConnectionError
from requests.exceptions import SSLError

# imports compatible with both Python 2.x and Python 3.x
if sys.version < '3':
    import codecs
    from urllib import quote as url_quote
    from urllib import getproxies

    # handle unicode: http://stackoverflow.com/a/6633040/305414
    def u(x):
        return codecs.unicode_escape_decode(x)[0]
else:
    from urllib.request import getproxies
    from urllib.parse import quote as url_quote

    def u(x):
        return x

# Google search URL
if os.getenv('HOWDOI_DISABLE_SSL'):  # if this environment variable is set, use plain HTTP instead of HTTPS
    SEARCH_URL = 'http://www.google.com/search?q=site:{0}%20{1}'
    VERIFY_SSL_CERTIFICATE = False
else:
    SEARCH_URL = 'https://www.google.com/search?q=site:{0}%20{1}'
    VERIFY_SSL_CERTIFICATE = True
# target Q&A site
URL = os.getenv('HOWDOI_URL') or 'stackoverflow.com'

# browser User-Agent strings, so requests look like they come from a browser and the site doesn't block the script
USER_AGENTS = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:11.0) Gecko/20100101 Firefox/11.0',
               'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:22.0) Gecko/20100101 Firefox/22.0',
               'Mozilla/5.0 (Windows NT 6.1; rv:11.0) Gecko/20100101 Firefox/11.0',
               ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/536.5 (KHTML, like Gecko) '
                'Chrome/19.0.1084.46 Safari/536.5'),
               ('Mozilla/5.0 (Windows; Windows NT 6.1) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.46 '
                'Safari/536.5'), )
# answer output formatting
ANSWER_HEADER = u('--- Answer {0} ---\n{1}')
NO_ANSWER_MSG = '< no answer given >'

# cache file location
XDG_CACHE_DIR = os.environ.get('XDG_CACHE_HOME',
                               os.path.join(os.path.expanduser('~'), '.cache'))
CACHE_DIR = os.path.join(XDG_CACHE_DIR, 'howdoi')
CACHE_FILE = os.path.join(CACHE_DIR, 'cache{0}'.format(
    sys.version_info[0] if sys.version_info[0] == 3 else ''))

# read the system proxy settings (especially useful in mainland China, for obvious reasons)
def get_proxies():
    proxies = getproxies()
    filtered_proxies = {}
    for key, value in proxies.items():
        if key.startswith('http'):
            if not value.startswith('http'):
                filtered_proxies[key] = 'http://%s' % value
            else:
                filtered_proxies[key] = value
    return filtered_proxies


def _get_result(url):
    try:
        return requests.get(url, headers={'User-Agent': random.choice(USER_AGENTS)}, proxies=get_proxies(),
                            verify=VERIFY_SSL_CERTIFICATE).text
    except requests.exceptions.SSLError as e:
        print('[ERROR] Encountered an SSL Error. Try using HTTP instead of '
              'HTTPS by setting the environment variable "HOWDOI_DISABLE_SSL".\n')
        raise e

# get the links from the Google search results
def _get_links(query):
    result = _get_result(SEARCH_URL.format(URL, url_quote(query)))
    html = pq(result)  # parse with pyquery
    return [a.attrib['href'] for a in html('.l')] or \
        [a.attrib['href'] for a in html('.r')('a')]


def get_link_at_pos(links, position):
    if not links:
        return False

    if len(links) >= position:
        link = links[position - 1]
    else:
        link = links[-1]
    return link

# format (syntax-highlight) code for output
def _format_output(code, args):
    if not args['color']:
        return code
    lexer = None

    # try to find a lexer using the StackOverflow tags
    # or the query arguments
    for keyword in args['query'].split() + args['tags']:
        try:
            lexer = get_lexer_by_name(keyword)
            break
        except ClassNotFound:
            pass

    # no lexer found above, use the guesser
    if not lexer:
        try:
            lexer = guess_lexer(code)
        except ClassNotFound:
            return code

    return highlight(code,
                     lexer,
                     TerminalFormatter(bg='dark'))

# use a regular expression to check whether a link points to a question
def _is_question(link):
    return re.search(r'questions/\d+/', link)

# keep only the question links
def _get_questions(links):
    return [link for link in links if _is_question(link)]

# get the answer (mainly by parsing the Stack Overflow Q&A page)
def _get_answer(args, links):
    links = _get_questions(links)
    link = get_link_at_pos(links, args['pos'])
    if not link:
        return False
    if args.get('link'):
        return link
    page = _get_result(link + '?answertab=votes')
    html = pq(page)

    first_answer = html('.answer').eq(0)  # the first answer (answers are sorted by votes)
    instructions = first_answer.find('pre') or first_answer.find('code')  # <pre> and <code> tags hold the target code blocks
    args['tags'] = [t.text for t in html('.post-tag')]

    if not instructions and not args['all']:
        text = first_answer.find('.post-text').eq(0).text()
    elif args['all']:
        texts = []
        for html_tag in first_answer.items('.post-text > *'):
            current_text = html_tag.text()
            if current_text:
                if html_tag[0].tag in ['pre', 'code']:
                    texts.append(_format_output(current_text, args))
                else:
                    texts.append(current_text)
        texts.append('\n---\nAnswer from {0}'.format(link))
        text = '\n'.join(texts)
    else:
        text = _format_output(instructions.eq(0).text(), args)
    if text is None:
        text = NO_ANSWER_MSG
    text = text.strip()
    return text


def _get_instructions(args):
    links = _get_links(args['query'])

    if not links:
        return False
    answers = []
    append_header = args['num_answers'] > 1
    initial_position = args['pos']
    for answer_number in range(args['num_answers']):
        current_position = answer_number + initial_position
        args['pos'] = current_position
        answer = _get_answer(args, links)
        if not answer:
            continue
        if append_header:
            answer = ANSWER_HEADER.format(current_position, answer)
        answer += '\n'
        answers.append(answer)
    return '\n'.join(answers)

# enable the cache
def _enable_cache():
    if not os.path.exists(CACHE_DIR):
        os.makedirs(CACHE_DIR)
    requests_cache.install_cache(CACHE_FILE)

# clear the cache
def _clear_cache():
    for cache in glob.glob('{0}*'.format(CACHE_FILE)):
        os.remove(cache)

# main function of the script
def howdoi(args):
    # build the query (mainly stripping question marks)
    args['query'] = ' '.join(args['query']).replace('?', '')
    try:
        return _get_instructions(args) or 'Sorry, couldn\'t find any help with that topic\n'
    except (ConnectionError, SSLError):
        return 'Failed to establish network connection\n'

# parse the command-line arguments entered by the user
def get_parser():
    parser = argparse.ArgumentParser(description='instant coding answers via the command line')
    parser.add_argument('query', metavar='QUERY', type=str, nargs='*',
                        help='the question to answer')
    parser.add_argument('-p', '--pos', help='select answer in specified position (default: 1)', default=1, type=int)
    parser.add_argument('-a', '--all', help='display the full text of the answer',
                        action='store_true')
    parser.add_argument('-l', '--link', help='display only the answer link',
                        action='store_true')
    parser.add_argument('-c', '--color', help='enable colorized output',
                        action='store_true')
    parser.add_argument('-n', '--num-answers', help='number of answers to return', default=1, type=int)
    parser.add_argument('-C', '--clear-cache', help='clear the cache',
                        action='store_true')
    parser.add_argument('-v', '--version', help='displays the current version of howdoi',
                        action='store_true')
    return parser

# entry point
def command_line_runner():
    parser = get_parser()
    args = vars(parser.parse_args())

    # print the script version
    if args['version']:
        print(__version__)
        return
    # clear the cache
    if args['clear_cache']:
        _clear_cache()
        print('Cache cleared successfully')
        return
    # no query given: print the help text
    if not args['query']:
        parser.print_help()
        return

    # enable the cache unless HOWDOI_DISABLE_CACHE is set in the environment
    if not os.getenv('HOWDOI_DISABLE_CACHE'):
        _enable_cache()
    # colorized output
    if os.getenv('HOWDOI_COLORIZE'):
        args['color'] = True
    # on Python 2, encode the output as UTF-8; otherwise print it as-is
    if sys.version < '3':
        print(howdoi(args).encode('utf-8', 'ignore'))
    else:
        print(howdoi(args))


if __name__ == '__main__':
    command_line_runner()
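The core of step 3 is _get_answer: it requests the question page with ?answertab=votes, takes the first .answer element with pyquery, and returns the text of its first <pre> or, failing that, its first <code>. For readers without pyquery, here is a standard-library-only sketch of just that selection rule using html.parser.HTMLParser. It is not howdoi's code; FirstAnswerCode and first_answer_code are hypothetical names, and it assumes answers carry the CSS class answer, as the .answer selector in the listing does (Stack Overflow's markup may have changed since 2017).

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not affect the nesting depth
VOID_ELEMENTS = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
                 'link', 'meta', 'source', 'track', 'wbr'}


class FirstAnswerCode(HTMLParser):
    """Collect <pre> and <code> text inside the first element whose class list contains 'answer'."""

    def __init__(self):
        super().__init__()          # convert_charrefs=True by default, so &lt; etc. arrive decoded
        self.depth = 0              # nesting depth inside the first .answer element (0 = outside)
        self.done = False           # set once that element has been closed
        self.capture = None         # 'pre' or 'code' while inside a block being captured
        self.buffer = []
        self.pre_blocks = []
        self.code_blocks = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0:
            classes = (dict(attrs).get('class') or '').split()
            if 'answer' in classes:
                self.depth = 1      # entered the first answer
            return
        if tag in VOID_ELEMENTS:
            return
        self.depth += 1
        if self.capture is None and tag in ('pre', 'code'):
            self.capture = tag      # a <code> nested in a <pre> stays part of the <pre>
            self.buffer = []

    def handle_endtag(self, tag):
        if self.done or self.depth == 0 or tag in VOID_ELEMENTS:
            return
        if tag == self.capture:
            target = self.pre_blocks if tag == 'pre' else self.code_blocks
            target.append(''.join(self.buffer))
            self.capture = None
        self.depth -= 1
        if self.depth == 0:
            self.done = True        # ignore every later answer

    def handle_data(self, data):
        if self.capture is not None:
            self.buffer.append(data)


def first_answer_code(page):
    """Return the first <pre> text of the first answer, else its first <code> text, else None."""
    parser = FirstAnswerCode()
    parser.feed(page)
    parser.close()
    blocks = parser.pre_blocks or parser.code_blocks
    return blocks[0].strip() if blocks else None
```

This only stands in for the default branch (instructions.eq(0).text()); the --all and --link variants and the .post-text fallback are not covered.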
