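Learning the Linux 4.12 Kernel Network Stack (1.7): Network Device Initialization (struct net_device)
Among the data structures behind Linux network devices, struct net_device is arguably the most important. It is allocated and initialized by the corresponding network device driver and serves the kernel's networking subsystem.
1. The struct net_device kernel-doc comment
struct net_device is a large structure, so before looking at the definition, let's read the comment that precedes it: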
/**
 * struct net_device - The DEVICE structure.
 * Actually, this whole structure is a big mistake. It mixes I/O // The kernel developers themselves call this design a mistake: the structure mixes
 * data with strictly "high-level" data, and it has to know about // low-level I/O (hardware) data with high-level protocol data, so it ends up
 * almost every data structure used in the INET module. // depending on almost every data structure in the INET code.
 *
 * @name: This is the first field of the "visible" part of this structure // The interface name. The driver can supply a name or a template at registration time;
 * (i.e. as seen by users in the "Space.c" file). It is the name // with a template such as "eth%d" the kernel picks the next free number: eth0, eth1, eth2, ...
 * of the interface.
 *
 * @name_hlist: Device name hash chain, please keep it close to name[] // Node in the hash table keyed by interface name
 * @ifalias: SNMP alias // SNMP alias
 * @mem_end: Shared memory end // A device may be given a shared memory region; mem_start and mem_end bound it
 * @mem_start: Shared memory start
 * @base_addr: Device I/O address // Base I/O address of the device. The device's registers are mapped into an address range so that accessing that range accesses the hardware; base_addr is the start of that range and is typically set during probe
 * @irq: Device IRQ number // IRQ number assigned to the device
 *
 * @carrier_changes: Stats to monitor carrier on<->off transitions
 *
 * @state: Generic network queuing layer state, see netdev_state_t // Device state; an important field
 * @dev_list: The global list of network devices // Links all net_device objects (per network namespace) into one list, so every network device can be reached through it
 * @napi_list: List entry used for polling NAPI devices // If the device supports NAPI, its NAPI instances are linked here so they can be found quickly for polling
 * @unreg_list: List entry when we are unregistering the // Devices being unregistered are added to this list
 * device; see the function unregister_netdev
 * @close_list: List entry used when we are closing the device // Devices being closed are added to this list
 * @ptype_all: Device-specific packet handlers for all protocols // Handlers bound to this device that receive packets of every protocol (packet sniffers, for example); often empty
 * @ptype_specific: Device-specific, protocol-specific packet handlers
 *
 * @adj_list: Directly linked devices, like slaves for bonding
 * @features: Currently active device features // Flags describing the capabilities and features currently enabled on the interface
 * @hw_features: User-changeable features // Hardware-related features that can be toggled from user space
 *
 * @wanted_features: User-requested features
 * @vlan_features: Mask of features inheritable by VLAN devices // Features that VLAN devices stacked on this device may inherit
 *
 * @hw_enc_features: Mask of features inherited by encapsulating devices // Offloads the hardware can perform for encapsulated packets
 * This field indicates what encapsulation
 * offloads the hardware is capable of doing,
 * and drivers will need to set them appropriately.
 *
 * @mpls_features: Mask of features inheritable by MPLS
 *
 * @ifindex: interface index // Unique index the kernel assigns when the device is registered
 * @group: The group the device belongs to // The group this device belongs to
 *
 * @stats: Statistics struct, which was left as a legacy, use // Legacy interface statistics, kept for older user-space interfaces
 * rtnl_link_stats64 instead
 *
 * @rx_dropped: Dropped packets by core network, // Packets dropped by the core network stack, not by the driver
 * do not use this in drivers
 * @tx_dropped: Dropped packets by core network,
 * do not use this in drivers
 * @rx_nohandler: nohandler dropped packets by core network on
 * inactive devices, do not use this in drivers
 *
 * @wireless_handlers: List of functions to handle Wireless Extensions, // Handlers for the Wireless Extensions
 * instead of ioctl,
 * see <net/iw_handler.h> for details.
 * @wireless_data: Instance data managed by the core of wireless extensions
 *
 * @netdev_ops: Includes several pointers to callbacks, // Very important: the callbacks that operate on the device are collected here. The driver sets this pointer during initialization;
 * if one wants to override the ndo_*() functions // see struct net_device_ops for the operations that can be implemented
 * @ethtool_ops: Management operations // ethtool operations
 * @ndisc_ops: Includes callbacks for different IPv6 neighbour
 * discovery handling. Necessary for e.g. 6LoWPAN.
 * @header_ops: Includes callbacks for creating,parsing,caching,etc // Callbacks that handle the L2 header
 * of Layer 2 headers.
 *
 * @flags: Interface flags (a la BSD) // Interface state such as up/down; can be changed from user space
 * @priv_flags: Like 'flags' but invisible to userspace, // Like flags, but not visible to user space
 * see if.h for the definitions
 * @gflags: Global flags ( kept as legacy ) // Global flags, used together with flags
 * @padded: How much padding added by alloc_netdev() // Number of padding bytes alloc_netdev() inserted to align the net_device
 * @operstate: RFC2863 operstate
 * @link_mode: Mapping policy to operstate
 * @if_port: Selectable AUI, TP, ... // Rarely used today; on devices that support several media types it selects which port type is in use
 * @dma: DMA channel // DMA channel assigned to the device, if it uses one (mainly relevant to legacy hardware)
 * @mtu: Interface MTU value // Interface MTU; 1500 by default on Ethernet
 * @min_mtu: Interface Minimum MTU value
 * @max_mtu: Interface Maximum MTU value
 * @type: Interface hardware type // Hardware type of the interface; most commonly Ethernet
 * @hard_header_len: Maximum hardware header length.
 * @min_header_len: Minimum hardware header length
 *
 * @needed_headroom: Extra headroom the hardware may need, but not in all // Extra headroom the hardware may need in the skb
 * cases can this be guaranteed
 * @needed_tailroom: Extra tailroom the hardware may need, but not in all
 * cases can this be guaranteed. Some cases also use
 * LL_MAX_HEADER instead to allocate the skb
 *
 * interface address info:
 *
 * @perm_addr: Permanent hw address // Address burned into the hardware, read into this field during initialization
 * @addr_assign_type: Hw address assignment type // How the hardware address was assigned (permanent, random, set from user space, ...)
 * @addr_len: Hardware address length // 6 bytes for Ethernet (ETH_ALEN)
 * @neigh_priv_len: Used in neigh_alloc()
 * @dev_id: Used to differentiate devices that share // Rarely used. It matters when several devices share one MAC address; such products exist:
 * the same link layer address // the MAC address is the same but the hardware differs, and they work fine
 * @dev_port: Used to differentiate devices that share // Used when several network ports share the same function
 * the same function
 * @addr_list_lock: XXX: need comments on this one
 * @uc_promisc: Counter that indicates promiscuous mode // Outside promiscuous mode a NIC only accepts unicast frames addressed to itself. If it must also accept frames for other
 * has been enabled due to the need to listen to // unicast MAC addresses and the driver cannot filter on them, promiscuous mode is enabled and recorded here
 * additional unicast addresses in a device that
 * does not implement ndo_set_rx_mode()
 * @uc: unicast mac addresses // Unicast MAC addresses the device accepts
 * @mc: multicast mac addresses // Multicast MAC addresses the device accepts
 * @dev_addrs: list of device hw addresses // A device may use several MAC addresses at once; they are kept in this list
 * @queues_kset: Group of all Kobjects in the Tx and RX queues // kset holding the kobjects of the Tx and Rx queues
 * @promiscuity: Number of times the NIC is told to work in // Promiscuous-mode counter; nonzero means promiscuous mode is on
 * promiscuous mode; if it becomes 0 the NIC will
 * exit promiscuous mode
 * @allmulti: Counter, enables or disables allmulticast mode // Counter for allmulti mode; can be toggled with ifconfig
 *
 * @vlan_info: VLAN info // VLAN information
 * @dsa_ptr: dsa specific data // The following are protocol-specific pointers
 * @tipc_ptr: TIPC specific data
 * @atalk_ptr: AppleTalk link
 * @ip_ptr: IPv4 specific data
 * @dn_ptr: DECnet specific data
 * @ip6_ptr: IPv6 specific data
 * @ax25_ptr: AX.25 specific data
 * @ieee80211_ptr: IEEE 802.11 specific data, assign before registering
 *
 * @dev_addr: Hw address (before bcast, // The device's current MAC address
 * because most packets are unicast)
 *
 * @_rx: Array of RX queues // Receive-path settings follow
 * @num_rx_queues: Number of RX queues
 * allocated at register_netdev() time
 * @real_num_rx_queues: Number of RX queues currently active in device
 *
 * @rx_handler: handler for received packets // Handler invoked for received packets
 * @rx_handler_data: XXX: need comments on this one
 * @ingress_queue: XXX: need comments on this one
 * @broadcast: hw bcast address // Hardware broadcast address
 *
 * @rx_cpu_rmap: CPU reverse-mapping for RX completion interrupts,
 * indexed by RX queue number. Assigned by driver.
 * This must only be set if the ndo_rx_flow_steer
 * operation is defined
 * @index_hlist: Device index hash chain
 *
 * @_tx: Array of TX queues // Transmit-path settings follow
 * @num_tx_queues: Number of TX queues allocated at alloc_netdev_mq() time
 * @real_num_tx_queues: Number of TX queues currently active in device
 * @qdisc: Root qdisc from userspace point of view
 * @tx_queue_len: Max frames per queue allowed
 * @tx_global_lock: XXX: need comments on this one
 *
 * @xps_maps: XXX: need comments on this one
 *
 * @watchdog_timeo: Represents the timeout that is used by // Timeout value set by the driver at initialization; when the network layer decides a transmit has timed out, it calls the driver's ndo_tx_timeout handler
 * the watchdog (see dev_watchdog())
 * @watchdog_timer: List of timers
 *
 * @pcpu_refcnt: Number of references to this device // Per-CPU reference count for this device
 * @todo_list: Delayed register/unregister // The following fields relate to registration and unregistration
 * @link_watch_list: XXX: need comments on this one
 *
 * @reg_state: Register/unregister state machine
 * @dismantle: Device is going to be freed
 * @rtnl_link_state: This enum represents the phases of creating
 * a new link
 *
 * @needs_free_netdev: Should unregister perform free_netdev?
 * @priv_destructor: Called from unregister
 * @npinfo: XXX: need comments on this one
 * @nd_net: Network namespace this network device is inside
 *
 * @ml_priv: Mid-layer private // Mid-layer private data; shares a union with the per-CPU statistics pointers below
 * @lstats: Loopback statistics
 * @tstats: Tunnel statistics
 * @dstats: Dummy statistics
 * @vstats: Virtual ethernet statistics
 *
 * @garp_port: GARP // GARP (Generic Attribute Registration Protocol) port state
 * @mrp_port: MRP // MRP (Multiple Registration Protocol) port state
 *
 * @dev: Class/net/name entry // A network device is still a device, so it embeds a struct device and carries all the usual device attributes
 * @sysfs_groups: Space for optional device, statistics and wireless
 * sysfs groups
 *
 * @sysfs_rx_queue_group: Space for optional per-rx queue attributes
 * @rtnl_link_ops: Rtnl_link_ops // rtnetlink link operations
 *
 * @gso_max_size: Maximum size of generic segmentation offload
 * @gso_max_segs: Maximum number of segments that can be passed to the
 * NIC for GSO
 *
 * @dcbnl_ops: Data Center Bridging netlink ops // Data Center Bridging (DCB) netlink operations
 * @num_tc: Number of traffic classes in the net device
 * @tc_to_txq: XXX: need comments on this one
 * @prio_tc_map: XXX: need comments on this one
 *
 * @fcoe_ddp_xid: Max exchange id for FCoE LRO by ddp
 *
 * @priomap: XXX: need comments on this one
 * @phydev: Physical device may attach itself
 * for hardware timestamping
 *
 * @qdisc_tx_busylock: lockdep class annotating Qdisc->busylock spinlock
 * @qdisc_running_key: lockdep class annotating Qdisc->running seqcount
 *
 * @proto_down: protocol port state information can be sent to the
 * switch driver and used to set the phys state of the
 * switch port.
 *
 * FIXME: cleanup struct net_device such that network protocol info
 * moves out.
 */
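Several of the fields documented above are visible from user space. As a minimal sketch (a user-space program, not kernel code), the helpers below use the standard if_nametoindex() call and the SIOCGIFMTU and SIOCGIFFLAGS ioctls to read the values that correspond to ifindex, mtu and flags. The helper names query_ifindex, query_mtu and query_is_up are made up for this illustration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <net/if.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

/* Hypothetical helper: interface index (dev->ifindex), or 0 if not found. */
static unsigned int query_ifindex(const char *ifname)
{
	return if_nametoindex(ifname);
}

/* Hypothetical helper: run one SIOCGIF* ioctl for ifname; 0 on success. */
static int query_ifreq(const char *ifname, unsigned long request,
		       struct ifreq *ifr)
{
	int fd, ret;

	assert(ifname != NULL && ifr != NULL);
	memset(ifr, 0, sizeof(*ifr));
	/* The buffer was zeroed, so the name stays NUL-terminated. */
	strncpy(ifr->ifr_name, ifname, IFNAMSIZ - 1);

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;
	ret = ioctl(fd, request, ifr);
	close(fd);
	return ret < 0 ? -1 : 0;
}

/* Hypothetical helper: interface MTU (dev->mtu), or -1 on failure. */
static int query_mtu(const char *ifname)
{
	struct ifreq ifr;

	if (query_ifreq(ifname, SIOCGIFMTU, &ifr) < 0)
		return -1;
	return ifr.ifr_mtu;
}

/* Hypothetical helper: 1 if IFF_UP is set in dev->flags, 0 if not, -1 on failure. */
static int query_is_up(const char *ifname)
{
	struct ifreq ifr;

	if (query_ifreq(ifname, SIOCGIFFLAGS, &ifr) < 0)
		return -1;
	return (ifr.ifr_flags & IFF_UP) ? 1 : 0;
}
```

Calling query_mtu("br-lan") on the machine used later in this article would be expected to report the same MTU that ifconfig prints; for a name that does not exist, all three helpers report failure.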
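2. The struct net_device definition
The comment above is only an overview of struct net_device. The full definition follows. This structure is central to the networking code, so it pays to know it in as much detail as possible.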
struct net_device {
	char name[IFNAMSIZ];
	struct hlist_node name_hlist;
	char *ifalias;
	/*
	 * I/O specific fields
	 * FIXME: Merge these and struct ifmap into one
	 */
	unsigned long mem_end;
	unsigned long mem_start;
	unsigned long base_addr;
	int irq;

	atomic_t carrier_changes;

	/*
	 * Some hardware also needs these fields (state,dev_list,
	 * napi_list,unreg_list,close_list) but they are not
	 * part of the usual set specified in Space.c.
	 */

	unsigned long state;

	struct list_head dev_list;
	struct list_head napi_list;
	struct list_head unreg_list;
	struct list_head close_list;
	struct list_head ptype_all;
	struct list_head ptype_specific;

	struct {
		struct list_head upper;
		struct list_head lower;
	} adj_list;

	netdev_features_t features;
	netdev_features_t hw_features;
	netdev_features_t wanted_features;
	netdev_features_t vlan_features;
	netdev_features_t hw_enc_features;
	netdev_features_t mpls_features;
	netdev_features_t gso_partial_features;

	int ifindex;
	int group;

	struct net_device_stats stats;

	atomic_long_t rx_dropped;
	atomic_long_t tx_dropped;
	atomic_long_t rx_nohandler;

#ifdef CONFIG_WIRELESS_EXT
	const struct iw_handler_def *wireless_handlers;
	struct iw_public_data *wireless_data;
#endif
	const struct net_device_ops *netdev_ops;
	const struct ethtool_ops *ethtool_ops;
#ifdef CONFIG_NET_SWITCHDEV
	const struct switchdev_ops *switchdev_ops;
#endif
#ifdef CONFIG_NET_L3_MASTER_DEV
	const struct l3mdev_ops *l3mdev_ops;
#endif
#if IS_ENABLED(CONFIG_IPV6)
	const struct ndisc_ops *ndisc_ops;
#endif

#ifdef CONFIG_XFRM
	const struct xfrmdev_ops *xfrmdev_ops;
#endif

	const struct header_ops *header_ops;

	unsigned int flags;
	unsigned int priv_flags;

	unsigned short gflags;
	unsigned short padded;

	unsigned char operstate;
	unsigned char link_mode;

	unsigned char if_port;
	unsigned char dma;

	unsigned int mtu;
	unsigned int min_mtu;
	unsigned int max_mtu;
	unsigned short type;
	unsigned short hard_header_len;
	unsigned char min_header_len;

	unsigned short needed_headroom;
	unsigned short needed_tailroom;

	/* Interface address info. */
	unsigned char perm_addr[MAX_ADDR_LEN];
	unsigned char addr_assign_type;
	unsigned char addr_len;
	unsigned short neigh_priv_len;
	unsigned short dev_id;
	unsigned short dev_port;
	spinlock_t addr_list_lock;
	unsigned char name_assign_type;
	bool uc_promisc;
	struct netdev_hw_addr_list uc;
	struct netdev_hw_addr_list mc;
	struct netdev_hw_addr_list dev_addrs;

#ifdef CONFIG_SYSFS
	struct kset *queues_kset;
#endif
	unsigned int promiscuity;
	unsigned int allmulti;


	/* Protocol-specific pointers */

#if IS_ENABLED(CONFIG_VLAN_8021Q)
	struct vlan_info __rcu *vlan_info;
#endif
#if IS_ENABLED(CONFIG_NET_DSA)
	struct dsa_switch_tree *dsa_ptr;
#endif
#if IS_ENABLED(CONFIG_TIPC)
	struct tipc_bearer __rcu *tipc_ptr;
#endif
	void *atalk_ptr;
	struct in_device __rcu *ip_ptr;
	struct dn_dev __rcu *dn_ptr;
	struct inet6_dev __rcu *ip6_ptr;
	void *ax25_ptr;
	struct wireless_dev *ieee80211_ptr;
	struct wpan_dev *ieee802154_ptr;
#if IS_ENABLED(CONFIG_MPLS_ROUTING)
	struct mpls_dev __rcu *mpls_ptr;
#endif

	/*
	 * Cache lines mostly used on receive path (including eth_type_trans())
	 */
	/* Interface address info used in eth_type_trans() */
	unsigned char *dev_addr;

#ifdef CONFIG_SYSFS
	struct netdev_rx_queue *_rx;

	unsigned int num_rx_queues;
	unsigned int real_num_rx_queues;
#endif

	struct bpf_prog __rcu *xdp_prog;
	unsigned long gro_flush_timeout;
	rx_handler_func_t __rcu *rx_handler;
	void __rcu *rx_handler_data;

#ifdef CONFIG_NET_CLS_ACT
	struct tcf_proto __rcu *ingress_cl_list;
#endif
	struct netdev_queue __rcu *ingress_queue;
#ifdef CONFIG_NETFILTER_INGRESS
	struct nf_hook_entry __rcu *nf_hooks_ingress;
#endif

	unsigned char broadcast[MAX_ADDR_LEN];
#ifdef CONFIG_RFS_ACCEL
	struct cpu_rmap *rx_cpu_rmap;
#endif
	struct hlist_node index_hlist;

	/*
	 * Cache lines mostly used on transmit path
	 */
	struct netdev_queue *_tx ____cacheline_aligned_in_smp;
	unsigned int num_tx_queues;
	unsigned int real_num_tx_queues;
	struct Qdisc *qdisc;
#ifdef CONFIG_NET_SCHED
	DECLARE_HASHTABLE (qdisc_hash, 4);
#endif
	unsigned long tx_queue_len;
	spinlock_t tx_global_lock;
	int watchdog_timeo;

#ifdef CONFIG_XPS
	struct xps_dev_maps __rcu *xps_maps;
#endif
#ifdef CONFIG_NET_CLS_ACT
	struct tcf_proto __rcu *egress_cl_list;
#endif

	/* These may be needed for future network-power-down code. */
	struct timer_list watchdog_timer;

	int __percpu *pcpu_refcnt;
	struct list_head todo_list;

	struct list_head link_watch_list;

	enum { NETREG_UNINITIALIZED=0,
	       NETREG_REGISTERED,	/* completed register_netdevice */
	       NETREG_UNREGISTERING,	/* called unregister_netdevice */
	       NETREG_UNREGISTERED,	/* completed unregister todo */
	       NETREG_RELEASED,		/* called free_netdev */
	       NETREG_DUMMY,		/* dummy device for NAPI poll */
	} reg_state:8;

	bool dismantle;

	enum {
		RTNL_LINK_INITIALIZED,
		RTNL_LINK_INITIALIZING,
	} rtnl_link_state:16;

	bool needs_free_netdev;
	void (*priv_destructor)(struct net_device *dev);

#ifdef CONFIG_NETPOLL
	struct netpoll_info __rcu *npinfo;
#endif

	possible_net_t nd_net;

	/* mid-layer private */
	union {
		void *ml_priv;
		struct pcpu_lstats __percpu *lstats;
		struct pcpu_sw_netstats __percpu *tstats;
		struct pcpu_dstats __percpu *dstats;
		struct pcpu_vstats __percpu *vstats;
	};

#if IS_ENABLED(CONFIG_GARP)
	struct garp_port __rcu *garp_port;
#endif
#if IS_ENABLED(CONFIG_MRP)
	struct mrp_port __rcu *mrp_port;
#endif

	struct device dev;
	const struct attribute_group *sysfs_groups[4];
	const struct attribute_group *sysfs_rx_queue_group;

	const struct rtnl_link_ops *rtnl_link_ops;

	/* for setting kernel sock attribute on TCP connection setup */
#define GSO_MAX_SIZE 65536
	unsigned int gso_max_size;
#define GSO_MAX_SEGS 65535
	u16 gso_max_segs;

#ifdef CONFIG_DCB
	const struct dcbnl_rtnl_ops *dcbnl_ops;
#endif
	u8 num_tc;
	struct netdev_tc_txq tc_to_txq[TC_MAX_QUEUE];
	u8 prio_tc_map[TC_BITMASK + 1];

#if IS_ENABLED(CONFIG_FCOE)
	unsigned int fcoe_ddp_xid;
#endif
#if IS_ENABLED(CONFIG_CGROUP_NET_PRIO)
	struct netprio_map __rcu *priomap;
#endif
	struct phy_device *phydev;
	struct lock_class_key *qdisc_tx_busylock;
	struct lock_class_key *qdisc_running_key;
	bool proto_down;
};
#define to_net_dev(d) container_of(d, struct net_device, dev)
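The to_net_dev() macro at the end of the definition relies on container_of: given a pointer to the embedded struct device, it subtracts the member's offset to get back to the enclosing net_device. The following user-space sketch shows the idea with stand-in types; mock_device, mock_net_device and ifindex_from_device are invented for this illustration, and the macro here omits the type checking that the kernel version performs.

```c
#include <assert.h>
#include <stddef.h>

/* User-space version of container_of (without the kernel's type checking). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-ins for struct device and struct net_device, for illustration only. */
struct mock_device {
	int id;
};

struct mock_net_device {
	char name[16];
	int ifindex;
	struct mock_device dev;	/* embedded, like net_device.dev */
};

#define to_mock_net_dev(d) container_of(d, struct mock_net_device, dev)

/* Given only the embedded device, recover the interface index. */
static int ifindex_from_device(struct mock_device *d)
{
	assert(d != NULL);
	return to_mock_net_dev(d)->ifindex;
}
```

This is why code that only receives a struct device pointer (sysfs callbacks, for example) can still reach the network device without storing a back-pointer.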
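3. How the network-device structures are organized
The net_device structure holds all the information related to a network device and its driver. Some categories of information are factored out into other structures that hang off net_device: the IPv4 configuration, for example, lives in struct in_device (reached through ip_ptr), while the driver's private data lives in a private area allocated together with net_device.
Network devices are linked together by several lists; exactly how is covered a little later. As we saw above, net_device consists of many members, and some of those members lead to lists of their own, for example the multicast address list mc and the structure reached through ip_ptr. There is also priv: although this kernel version no longer defines an explicit priv pointer, alloc_netdev() shows that the private area is still reserved whenever the sizeof_priv passed in is greater than 0.
Let's look at one important member, ip_ptr (struct in_device __rcu *ip_ptr). It points to a struct in_device object. What does that object represent? Every network device can be assigned IP addresses, and these settings can be changed from user space. The information is specific to each interface. Not every interface needs it, but when it is configured, it is stored in the list reached through ip_ptr (the ifa_list member of struct in_device).
Let's compare the code with actual output: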
root:/# ifconfig br-lan
br-lan Link encap:Ethernet HWaddr 0A:02:8E:93:DD:3B
inet addr:192.168.1.129 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::802:8eff:fe93:dd3b/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:211672 errors:0 dropped:0 overruns:0 frame:0
TX packets:120803 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:15794642 (15.0 MiB) TX bytes:24446287 (23.3 MiB)
struct in_device {
	struct net_device *dev; // back-pointer to the owning net_device
	atomic_t refcnt; // reference count of this object
	int dead;
	struct in_ifaddr *ifa_list; /* IP ifaddr chain */

	struct ip_mc_list __rcu *mc_list; /* IP multicast filter chain */
	struct ip_mc_list __rcu * __rcu *mc_hash;

	int mc_count; /* Number of installed mcasts */
	spinlock_t mc_tomb_lock;
	struct ip_mc_list *mc_tomb;
	unsigned long mr_v1_seen;
	unsigned long mr_v2_seen;
	unsigned long mr_maxdelay;
	unsigned char mr_qrv;
	unsigned char mr_gq_running;
	unsigned char mr_ifc_count;
	struct timer_list mr_gq_timer; /* general query timer */
	struct timer_list mr_ifc_timer; /* interface change timer */

	struct neigh_parms *arp_parms;
	struct ipv4_devconf cnf;
	struct rcu_head rcu_head;
};
Why is ifa_list a list? Wouldn't a single object be enough? In practice an interface can be given several MAC addresses, and likewise several IP addresses. Note that in_device holds only the IPv4 addresses of the interface; the IPv6 address shown in the ifconfig output is kept separately, in the inet6_dev reached through ip6_ptr.
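To make the address chain concrete, here is a simplified user-space sketch of walking a list shaped like ifa_list. The mock_ifaddr and mock_in_device types keep only a few members (ifa_next, ifa_address and ifa_mask mirror names from the kernel's struct in_ifaddr), addresses are held in host byte order for readability, and the helper functions are invented for this illustration; locking and RCU are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct in_ifaddr: one IPv4 address on the chain. */
struct mock_ifaddr {
	struct mock_ifaddr *ifa_next;
	uint32_t ifa_address;	/* host byte order here, for readability */
	uint32_t ifa_mask;
};

/* Simplified stand-in for struct in_device. */
struct mock_in_device {
	struct mock_ifaddr *ifa_list;	/* head of the address chain */
};

/* Prepend an address to the chain (the kernel keeps a stricter order). */
static void mock_add_addr(struct mock_in_device *in_dev,
			  struct mock_ifaddr *ifa)
{
	assert(in_dev != NULL && ifa != NULL);
	ifa->ifa_next = in_dev->ifa_list;
	in_dev->ifa_list = ifa;
}

/* Count how many IPv4 addresses are configured on the interface. */
static size_t mock_count_addrs(const struct mock_in_device *in_dev)
{
	const struct mock_ifaddr *ifa;
	size_t n = 0;

	for (ifa = in_dev->ifa_list; ifa != NULL; ifa = ifa->ifa_next)
		n++;
	return n;
}

/* Return 1 if addr falls inside the subnet of any configured address. */
static int mock_on_link(const struct mock_in_device *in_dev, uint32_t addr)
{
	const struct mock_ifaddr *ifa;

	for (ifa = in_dev->ifa_list; ifa != NULL; ifa = ifa->ifa_next)
		if ((addr & ifa->ifa_mask) ==
		    (ifa->ifa_address & ifa->ifa_mask))
			return 1;
	return 0;
}
```

With 192.168.1.129/24 on the chain, as in the ifconfig output above, mock_on_link() would treat 192.168.1.5 as directly reachable and 172.16.0.1 as not.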
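The figure for this step shows how ip_ptr and priv relate in memory. The point to note is that the objects reached through ip_ptr are allocated separately, wherever the allocator happens to place them, whereas the priv area sits directly after the net_device structure.
1. The device-independent layer uses the in_device{} structure to hold IP address and neighbour information, albeit indirectly.
2. The network abstraction layer uses the net_device{} structure to hold what all devices have in common: name, index, addresses and so on.
3. Device-specific data is defined by the driver author and typically includes hardware transmit and receive buffers, chip register information and the like. This memory normally sits right after net_device{} and is allocated together with it when the driver creates the net_device{}. In this kernel version it is reached through netdev_priv(), which computes its address, rather than through a stored priv pointer.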
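The third point can be sketched in user space. The kernel's netdev_priv() returns the address just past the net_device, rounded up to NETDEV_ALIGN; the code below imitates that with stand-in types. Everything prefixed with mock_ or MOCK_ is invented for this illustration, and the allocation is simplified (it does not reproduce the extra alignment slack that the kernel adds to the whole block).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MOCK_NETDEV_ALIGN 32
/* Round x up to a multiple of a (a must be a power of two). */
#define MOCK_ALIGN(x, a) (((x) + ((size_t)(a) - 1)) & ~((size_t)(a) - 1))

/* Stand-in for struct net_device, for illustration only. */
struct mock_net_device {
	char name[16];
	int ifindex;
	unsigned int mtu;
};

/* Hypothetical driver-private state that lives behind the net_device. */
struct mock_drv_priv {
	unsigned long tx_packets;
	unsigned long rx_packets;
};

/* Like netdev_priv(): compute the private area's address, do not load it. */
static void *mock_netdev_priv(struct mock_net_device *dev)
{
	assert(dev != NULL);
	return (char *)dev +
	       MOCK_ALIGN(sizeof(struct mock_net_device), MOCK_NETDEV_ALIGN);
}

/* Allocate the device and its private area as one zeroed block. */
static struct mock_net_device *mock_alloc_netdev(size_t sizeof_priv)
{
	size_t size = MOCK_ALIGN(sizeof(struct mock_net_device),
				 MOCK_NETDEV_ALIGN) + sizeof_priv;

	return calloc(1, size);
}
```

A driver-like caller would allocate with mock_alloc_netdev(sizeof(struct mock_drv_priv)), fetch its state with mock_netdev_priv(dev), and release both with a single free(dev), since they are one allocation.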
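Although the private area follows net_device, it does not start at the very next byte: alignment padding is inserted between the two.
To understand this better, let's go straight to the code: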
struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
		unsigned char name_assign_type,
		void (*setup)(struct net_device *),
		unsigned int txqs, unsigned int rxqs)
{
	struct net_device *dev;
	size_t alloc_size;
	struct net_device *p;

	.......

	alloc_size = sizeof(struct net_device); // start with the size of net_device itself
	if (sizeof_priv) { // did the caller ask for a private area, and how large?
		/* ensure 32-byte alignment of private area */
		alloc_size = ALIGN(alloc_size, NETDEV_ALIGN); // round up so the private area starts aligned
		alloc_size += sizeof_priv;
	}
	/* ensure 32-byte alignment of whole construct */
	alloc_size += NETDEV_ALIGN - 1; // 32 - 1 = 31 spare bytes

	p = kvzalloc(alloc_size, GFP_KERNEL | __GFP_REPEAT); // net_device and the private area are allocated together here
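The size arithmetic in the excerpt can be checked with a small user-space sketch. netdev_alloc_size() below mirrors the lines shown, with dev_size standing in for sizeof(struct net_device). The two address helpers illustrate what the 31 spare bytes are for: to the best of our understanding, the part of the function not shown above rounds the returned pointer up to a 32-byte boundary and records the number of skipped bytes in dev->padded. The function names and the ALIGN_UP macro are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NETDEV_ALIGN 32
/* Round x up to a multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a) (((x) + ((size_t)(a) - 1)) & ~((size_t)(a) - 1))

/* Mirror of the excerpt; dev_size stands in for sizeof(struct net_device). */
static size_t netdev_alloc_size(size_t dev_size, size_t sizeof_priv)
{
	size_t alloc_size = dev_size;

	if (sizeof_priv) {
		/* private area starts on a 32-byte boundary */
		alloc_size = ALIGN_UP(alloc_size, NETDEV_ALIGN);
		alloc_size += sizeof_priv;
	}
	/* spare bytes so the whole construct can be shifted into alignment */
	alloc_size += NETDEV_ALIGN - 1;
	return alloc_size;
}

/* Round a raw allocation address up to the next NETDEV_ALIGN boundary. */
static uintptr_t netdev_align_addr(uintptr_t raw)
{
	return (raw + NETDEV_ALIGN - 1) & ~(uintptr_t)(NETDEV_ALIGN - 1);
}

/* Bytes skipped at the front of the raw block (assumed role of dev->padded). */
static size_t netdev_padding(uintptr_t raw)
{
	size_t pad = (size_t)(netdev_align_addr(raw) - raw);

	/* never more than the spare bytes added by netdev_alloc_size() */
	assert(pad < NETDEV_ALIGN);
	return pad;
}
```

For example, with a 100-byte device structure and a 16-byte private area the request is 128 + 16 + 31 = 175 bytes, and a raw block starting at address 0x1008 would be shifted by 24 bytes to 0x1020.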
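Each device allocated this way yields one net_device structure, so a system ends up with several of them.
As mentioned earlier, net_device structures are linked together by several lists. Which ones? Let's take a look.
As the figure shows, there are three:
dev_name_head: a hash table for lookup by interface name (dev->name); the corresponding function is dev_get_by_name()
dev_index_head: a hash table for lookup by interface index (dev->ifindex); the corresponding function is dev_get_by_index()
dev_base: the list walked for lookups by any other parameter, such as device type, MAC address or flags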
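The three structures can be imitated in user space as follows. This is a toy registry, not the kernel's implementation: the mock_ types, the bucket count and the string hash are arbitrary choices for the sketch, and reference counting, locking, RCU and network namespaces are all omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MOCK_HASH_SIZE 16	/* arbitrary bucket count for this sketch */

struct mock_netdev {
	char name[16];
	int ifindex;
	struct mock_netdev *name_next;	/* chain in the name hash bucket */
	struct mock_netdev *index_next;	/* chain in the index hash bucket */
	struct mock_netdev *base_next;	/* chain in the global list */
};

struct mock_registry {
	struct mock_netdev *name_head[MOCK_HASH_SIZE];	/* like dev_name_head */
	struct mock_netdev *index_head[MOCK_HASH_SIZE];	/* like dev_index_head */
	struct mock_netdev *base;			/* like dev_base */
};

/* Simple string hash, chosen only for the illustration. */
static unsigned int mock_name_hash(const char *name)
{
	unsigned int h = 0;

	while (*name)
		h = h * 31 + (unsigned char)*name++;
	return h % MOCK_HASH_SIZE;
}

/* Link one device into all three structures. */
static void mock_register(struct mock_registry *r, struct mock_netdev *dev)
{
	unsigned int nb = mock_name_hash(dev->name);
	unsigned int ib = (unsigned int)dev->ifindex % MOCK_HASH_SIZE;

	assert(r != NULL && dev != NULL);
	dev->name_next = r->name_head[nb];
	r->name_head[nb] = dev;
	dev->index_next = r->index_head[ib];
	r->index_head[ib] = dev;
	dev->base_next = r->base;
	r->base = dev;
}

/* Lookup by name: only one hash bucket is searched. */
static struct mock_netdev *mock_get_by_name(const struct mock_registry *r,
					    const char *name)
{
	struct mock_netdev *d = r->name_head[mock_name_hash(name)];

	while (d != NULL && strcmp(d->name, name) != 0)
		d = d->name_next;
	return d;
}

/* Lookup by index: only one hash bucket is searched. */
static struct mock_netdev *mock_get_by_index(const struct mock_registry *r,
					     int ifindex)
{
	struct mock_netdev *d =
		r->index_head[(unsigned int)ifindex % MOCK_HASH_SIZE];

	while (d != NULL && d->ifindex != ifindex)
		d = d->index_next;
	return d;
}

/* Anything else has to walk the global list, here simply counting entries. */
static size_t mock_count_devices(const struct mock_registry *r)
{
	const struct mock_netdev *d;
	size_t n = 0;

	for (d = r->base; d != NULL; d = d->base_next)
		n++;
	return n;
}
```

The design point is the same as in the kernel: the two most common keys get their own hash tables so that lookups stay cheap, and every other query falls back to a linear walk of the full list.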
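Now that we understand net_device, the following articles will continue with loading the device driver module, registering the device, and bringing the device up.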