gethostbyname() and the related data handling

阿新 • Published: 2019-01-30

gethostbyname() -- get an IP address from a domain name or host name

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

struct hostent *gethostbyname(const char *name);

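The argument is a domain name or host name, for example "www.google.cn". The return value is a pointer to a hostent structure. If the call fails, the function returns NULL.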
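Because failure is signalled only by a NULL return, it helps to report why the lookup failed. The sketch below is a minimal wrapper, assuming a glibc-style system where hstrerror() and h_errno are available through <netdb.h>; the name lookup_host is made up for this illustration.

```c
#include <netdb.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: call gethostbyname() and, on failure, print the
 * resolver's reason. gethostbyname() reports errors through h_errno,
 * not errno, so perror() would not give a useful message here. */
static struct hostent *lookup_host(const char *name)
{
    struct hostent *h = gethostbyname(name);

    if (h == NULL)
        fprintf(stderr, "gethostbyname(%s): %s\n", name, hstrerror(h_errno));
    return h;
}
```

The returned pointer refers to static storage that a later call may overwrite, so copy anything you need to keep before calling gethostbyname() again.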

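struct hostent
{
    char  *h_name;        /* official (canonical) name of the host */
    char **h_aliases;     /* NULL-terminated list of aliases */
    int    h_addrtype;    /* address type: AF_INET or AF_INET6 */
    int    h_length;      /* length of each address, in bytes */
    char **h_addr_list;   /* NULL-terminated list of addresses */
#define h_addr h_addr_list[0]   /* first address in the list */
};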

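hostent->h_name

The canonical (official) name of the host. For example, at the time of writing the canonical name of www.google.com was actually www.l.google.com.

hostent->h_aliases

The aliases of the host, stored as a NULL-terminated array of strings. www.google.com is itself an alias for Google's host. A host may have several aliases; they are extra names that make the site easier for users to remember.

hostent->h_addrtype

The type of the host's IP addresses: IPv4 (AF_INET) or IPv6 (AF_INET6).

hostent->h_length

The length of each of the host's IP addresses, in bytes.

hostent->h_addr_list

The host's IP addresses, stored as a NULL-terminated array of pointers. Note that each address is kept as raw bytes in network byte order, not as a string, so never print one directly with printf and %s. To print an address, convert it first with inet_ntop().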

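const char *inet_ntop(int af, const void *src, char *dst, socklen_t cnt);

This function converts the network address structure src, whose address family is af, into a text string and stores it in the buffer dst, which is cnt bytes long. It returns a pointer to dst. If the call fails, it returns NULL.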

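#include <arpa/inet.h>    /* inet_ntop() */
#include <netdb.h>        /* gethostbyname(), struct hostent */
#include <netinet/in.h>   /* INET6_ADDRSTRLEN */
#include <stdio.h>
#include <sys/socket.h>   /* AF_INET, AF_INET6 */

int main(int argc, char **argv)
{
    char **pptr;
    struct hostent *hptr;
    char str[INET6_ADDRSTRLEN];   /* large enough for IPv4 and IPv6 text */

    /* The host name is required; argv[1] is NULL without it. */
    if (argc != 2)
    {
        fprintf(stderr, "usage: %s hostname\n", argv[0]);
        return 1;
    }

    if ((hptr = gethostbyname(argv[1])) == NULL)
    {
        printf(" gethostbyname error for host:%s\n", argv[1]);
        return 1;
    }

    printf("official hostname:%s\n", hptr->h_name);

    /* h_aliases is a NULL-terminated array of strings. */
    for (pptr = hptr->h_aliases; *pptr != NULL; pptr++)
        printf(" alias:%s\n", *pptr);

    switch (hptr->h_addrtype)
    {
    case AF_INET:
    case AF_INET6:
        /* Each entry is a binary address, so convert it before printing. */
        for (pptr = hptr->h_addr_list; *pptr != NULL; pptr++)
            printf(" address:%s\n",
                   inet_ntop(hptr->h_addrtype, *pptr, str, sizeof(str)));
        printf(" first address: %s\n",
               inet_ntop(hptr->h_addrtype, hptr->h_addr, str, sizeof(str)));
        break;
    default:
        printf("unknown address type\n");
        break;
    }

    return 0;
}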

Compile and run

-----------------------------

# gcc test.c

# ./a.out www.baidu.com

official hostname:www.a.shifen.com

alias:www.baidu.com

address:121.14.88.11

address:121.14.89.11

first address: 121.14.88.11

Note:

On Unix/Linux, gethostbyname is commonly used to ask DNS for the IP address of a domain name. Because DNS resolution is recursive, a gethostbyname call can take a very long time before it times out. Unlike connect and read, it offers no way to set a timeout through setsockopt or select, so it often becomes the bottleneck of a program. One suggested workaround is to set a timer signal with alarm and, if the timer fires, use setjmp and longjmp to jump out of gethostbyname (I have not tried this and do not know how well it works).

In multithreaded programs gethostbyname has an even more serious problem: if one thread's gethostbyname call blocks, the other threads block at gethostbyname as well. I ran into this while writing a crawler, and it puzzled me for a long time: every crawler thread was blocked in gethostbyname, so crawling was very slow. I googled the problem for a long time without finding an answer. Today I happened to find, in my lab's Google Group, the e-book "Mining the Web - Discovering Knowledge from Hypertext Data". Its discussion of crawlers includes the following passage:

Many clients for DNS resolution are coded poorly. Most UNIX systems provide an implementation of gethostbyname (the DNS client API—application program interface), which cannot concurrently handle multiple outstanding requests. Therefore, the crawler cannot issue many resolution requests together and poll at a later time for completion of individual requests, which is critical for acceptable performance. Furthermore, if the system-provided client is used, there is no way to distribute load among a number of DNS servers. For all these reasons, many crawlers choose to include their own custom client for DNS name resolution. The Mercator crawler from Compaq System Research Center reduced the time spent in DNS from as high as 87% to a modest 25% by implementing a custom client. The ADNS asynchronous DNS client library is ideal for use in crawlers. In spite of these optimizations, a large-scale crawler will spend a substantial fraction of its network time not waiting for HTTP data transfer, but for address resolution. For every hostname that has not been resolved before (which happens frequently with crawlers), the local DNS may have to go across many network hops to fill its cache for the first time. To overlap this unavoidable delay with useful work, prefetching can be used. When a page that has just been fetched is parsed, a stream of HREFs is extracted. Right at this time, that is, even before any of the corresponding URLs are fetched, hostnames are extracted from the HREF targets, and DNS resolution requests are made to the caching server. The prefetching client is usually implemented using UDP instead of TCP, and it does not wait for resolution to be completed. The request serves only to fill the DNS cache so that resolution will be fast when the page is actually needed later on.

In short, the Unix gethostbyname cannot handle concurrent use, and this is an inherent limitation that cannot be changed. Large crawlers usually do not use gethostbyname at all; they implement their own custom DNS client instead. This makes it possible to balance load across DNS servers, and asynchronous resolution greatly speeds up DNS lookups. Such a client is usually implemented over UDP and can resolve the IP address for a URL before the crawler fetches the page. The passage also mentions adns, an open-source asynchronous DNS library, whose home page is http://www.chiark.greenend.org.uk/~ian/adns/

From all this it is clear that gethostbyname is not suitable for multithreaded programs or for other programs that need fast DNS resolution.
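For programs that only need a lookup that can safely be called from several threads, one commonly used alternative is getaddrinfo(), which POSIX requires to be thread-safe. This is a substitute technique, not the custom asynchronous client the book describes: it still blocks the calling thread and adds no timeout. The sketch below assumes a POSIX system; resolve_first is a name invented for this example.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: resolve host with getaddrinfo() and write the
 * first address found into buf as text. Returns 0 on success, -1 on
 * failure. All results live in caller-owned or heap memory, so there is
 * no shared static buffer as with gethostbyname(). */
static int resolve_first(const char *host, char *buf, socklen_t len)
{
    struct addrinfo hints, *res = NULL;
    const void *addr;
    int rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* avoid one result per socket type */

    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0) {
        fprintf(stderr, "getaddrinfo(%s): %s\n", host, gai_strerror(rc));
        return -1;
    }

    /* Pick the binary address out of the family-specific sockaddr. */
    if (res->ai_family == AF_INET) {
        addr = &((struct sockaddr_in *)res->ai_addr)->sin_addr;
    } else if (res->ai_family == AF_INET6) {
        addr = &((struct sockaddr_in6 *)res->ai_addr)->sin6_addr;
    } else {
        freeaddrinfo(res);
        return -1;
    }

    rc = inet_ntop(res->ai_family, addr, buf, len) != NULL ? 0 : -1;
    freeaddrinfo(res);   /* the result list is heap-allocated */
    return rc;
}
```

Each worker thread can call resolve_first() with its own buffer of INET6_ADDRSTRLEN bytes. Whether lookups from different threads actually overlap depends on the system's resolver, so a crawler with heavy DNS traffic may still need a dedicated asynchronous client such as adns.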
