Linux Storage I/O Stack (4): Overview of the SCSI Subsystem

Overview

The Linux SCSI subsystem has a three-layer architecture:

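  • Low level: the actual drivers for the physical interfaces to SCSI, for example the drivers that vendors write for their own host bus adapters (HBAs). A low-level driver discovers the SCSI devices attached to the host adapter, builds in memory the data structures the SCSI subsystem needs, and provides the message-passing interface that turns sending and receiving SCSI commands into host adapter operations.

  • Upper level: the drivers for the various SCSI device types, such as the SCSI disk driver and the SCSI tape driver. An upper-level driver claims the SCSI devices that the low-level drivers discover, assigns names to them, converts I/O for those devices into SCSI commands, and hands the commands to the low-level driver.

  • Mid level: the common service functions of the SCSI stack. The upper and low levels do their work by calling mid-level functions, and the mid level in turn calls back into functions registered by the upper and low levels for driver-specific handling.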

The Linux SCSI model

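The Linux SCSI model is a kernel abstraction. A host adapter bridges the host I/O bus (such as PCI) and the storage I/O bus (such as a SCSI bus). A computer can have several host adapters, each host adapter can control one or more SCSI buses, each bus can have several target nodes attached, and each target node can contain several logical units.

In the Linux SCSI subsystem, a target in the kernel corresponds to a SCSI disk. A SCSI disk can contain several logical units, all managed by the disk controller; these logical units are the storage devices that actually terminate I/O, and the kernel abstracts each of them as a device. A Host in the kernel corresponds to a host adapter, either a physical HBA or RAID card, or a virtual host such as the one a software iSCSI initiator creates for its connection to an iSCSI target.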

The kernel uses the four-tuple <host:channel:id:lun> to uniquely identify a SCSI logical unit. Looking at disk sda, <2:0:0:0>, in sysfs gives:

root@ubuntu16:/home/comet/Costor/bin# ls /sys/bus/scsi/devices/2\:0\:0\:0/block/sda/
alignment_offset  device             events_poll_msecs  integrity  removable  sda5    subsystem
bdi               discard_alignment  ext_range          power      ro         size    trace
capability        events             holders            queue      sda1       slaves  uevent
dev               events_async       inflight           range      sda2       stat
root@ubuntu16:/home/comet/Costor/bin# cat /sys/bus/scsi/devices/2\:0\:0\:0/block/sda/dev
8:0
root@ubuntu16:/home/comet/Costor/bin# ll /dev/sda
brw-rw---- 1 root disk 8, 0 Sep 19 11:36 /dev/sda

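  • host: the unique number of the host adapter.
  • channel: the SCSI channel number within the host adapter, maintained by the adapter firmware.
  • id: the unique identifier of the target node.
  • lun: the logical unit number within the target node.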
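
As an illustration of the four-tuple, here is a minimal user-space sketch that splits a sysfs device name such as 2:0:0:0 into its fields. The type struct hctl and the function parse_hctl are hypothetical names invented for this example; they are not kernel APIs.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper type for this example; not a kernel structure. */
struct hctl {
    unsigned int host;       /* host adapter number */
    unsigned int channel;    /* channel (bus) number on that host */
    unsigned int id;         /* target id on that channel */
    unsigned long long lun;  /* logical unit number inside the target */
};

/* Parse "host:channel:id:lun" as it appears under /sys/bus/scsi/devices.
 * Returns 0 on success, -1 if the string is not a well-formed four-tuple. */
static int parse_hctl(const char *s, struct hctl *out)
{
    char trailing;
    /* The final %c only matches if junk follows the LUN. */
    int n = sscanf(s, "%u:%u:%u:%llu%c", &out->host, &out->channel,
                   &out->id, &out->lun, &trailing);
    return n == 4 ? 0 : -1;
}
```

Calling parse_hctl("2:0:0:0", &h) fills h with host 2, channel 0, id 0 and lun 0, matching the sda example above.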

SCSI commands

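A SCSI command is defined in a Command Descriptor Block (CDB). The CDB contains an operation code that identifies the operation to perform, plus a number of operation-specific parameters.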

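Command          Purpose
Test unit ready  Check whether the device is ready for transfers
Inquiry          Request basic device information
Request sense    Request error information for the previous command
Read capacity    Request storage capacity information
Read             Read data from the device
Write            Write data to the device
Mode sense       Request mode pages (device parameters)
Mode select      Configure device parameters in a mode page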

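With roughly 60 available commands, SCSI suits many kinds of devices, including random-access devices such as disks and sequential-access devices such as tape. SCSI also provides dedicated commands for enclosure services, for example reading the current sensor state and temperature inside a storage enclosure.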
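
To make the CDB layout concrete, the following user-space sketch fills in a 10-byte READ(10) CDB: operation code 0x28 in byte 0, a big-endian logical block address in bytes 2 to 5, and a big-endian transfer length in bytes 7 and 8. The function name build_read10 and the macros are assumptions made for this example; the kernel builds its CDBs in the upper-level drivers with its own helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define READ_10_OPCODE 0x28  /* READ(10) operation code */
#define READ_10_LEN    10    /* READ(10) is a 10-byte CDB */

/* Fill cdb[] with a READ(10) for `blocks` logical blocks starting at `lba`.
 * Multi-byte CDB fields are stored big-endian. */
static void build_read10(uint8_t cdb[READ_10_LEN], uint32_t lba, uint16_t blocks)
{
    memset(cdb, 0, READ_10_LEN);     /* flags, group number, control = 0 */
    cdb[0] = READ_10_OPCODE;
    cdb[2] = (uint8_t)(lba >> 24);   /* logical block address, MSB first */
    cdb[3] = (uint8_t)(lba >> 16);
    cdb[4] = (uint8_t)(lba >> 8);
    cdb[5] = (uint8_t)lba;
    cdb[7] = (uint8_t)(blocks >> 8); /* transfer length in blocks */
    cdb[8] = (uint8_t)blocks;
}
```

The other commands in the table follow the same pattern: a fixed-size byte array whose first byte selects the operation and whose remaining bytes carry its parameters.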

Kernel data structures

Host adapter template: scsi_host_template

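The host adapter template holds what is common to all host adapters of the same model, including the request queue depth, the SCSI command handling callbacks, and the error-handling and recovery functions. When a host adapter structure is allocated, its fields are filled in from the template. Writing a SCSI low-level driver therefore starts with defining a scsi_host_template; only then can host adapters be created from it.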

struct scsi_host_template {
    struct module *module;  //the low-level driver module that implements hosts using this template
    const char *name;       //host adapter name

    int (* detect)(struct scsi_host_template *);
    int (* release)(struct Scsi_Host *);

    const char *(* info)(struct Scsi_Host *); //returns information about the HBA; optional

    int (* ioctl)(struct scsi_device *dev, int cmd, void __user *arg); //handles ioctls issued from user space; optional


#ifdef CONFIG_COMPAT
    //lets 32-bit user-space programs issue ioctls to a 64-bit kernel
    int (* compat_ioctl)(struct scsi_device *dev, int cmd, void __user *arg);
#endif

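    //queues a SCSI command to the low-level driver; called by the mid layer; mandatory
    int (* queuecommand)(struct Scsi_Host *, struct scsi_cmnd *);

    //the next five are error-handling callbacks, invoked by the mid layer in order of increasing severity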
    int (* eh_abort_handler)(struct scsi_cmnd *);        //Abort
    int (* eh_device_reset_handler)(struct scsi_cmnd *); //Device Reset
    int (* eh_target_reset_handler)(struct scsi_cmnd *); //Target Reset
    int (* eh_bus_reset_handler)(struct scsi_cmnd *);    //Bus Reset
    int (* eh_host_reset_handler)(struct scsi_cmnd *);   //Host Reset

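    //called by the mid layer when a new device is found during scanning; the low-level driver can allocate and initialize its per-device data here
    int (* slave_alloc)(struct scsi_device *);

    //called after the device has answered INQUIRY, to apply device-specific configuration
    int (* slave_configure)(struct scsi_device *);

    //called by the mid layer before a SCSI device is destroyed, to free private data allocated in slave_alloc
    void (* slave_destroy)(struct scsi_device *);

    //called by the mid layer when a new target is found, to allocate per-target private data
    int (* target_alloc)(struct scsi_target *);

    //called by the mid layer before a target is destroyed; the low-level driver frees data allocated in target_alloc
    void (* target_destroy)(struct scsi_target *);

    //for custom target scanning: the mid layer polls this function until it returns 1, meaning the scan has finished
    int (* scan_finished)(struct Scsi_Host *, unsigned long);

    //for custom target scanning: called before the scan starts
    void (* scan_start)(struct Scsi_Host *);

    //changes the queue depth of a device; returns the depth actually set
    int (* change_queue_depth)(struct scsi_device *, int);

    //returns the BIOS disk geometry (heads, sectors, cylinders) for the given device and capacity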
    int (* bios_param)(struct scsi_device *, struct block_device *,
            sector_t, int []);

    void (*unlock_native_capacity)(struct scsi_device *);

    //read and write callbacks for the host's procfs entry
    int (*show_info)(struct seq_file *, struct Scsi_Host *);
    int (*write_info)(struct Scsi_Host *, char *, int);

    //called by the mid layer when a SCSI command times out
    enum blk_eh_timer_return (*eh_timed_out)(struct scsi_cmnd *);

    //called when a host reset is requested through the sysfs attribute
    int (*host_reset)(struct Scsi_Host *shost, int reset_type);
#define SCSI_ADAPTER_RESET  1
#define SCSI_FIRMWARE_RESET 2

    const char *proc_name; //name used in the proc filesystem

    struct proc_dir_entry *proc_dir;

    int can_queue; //number of commands the host adapter can accept at the same time

    int this_id;

    /*
     * This determines the degree to which the host adapter is capable
     * of scatter-gather.
     */  //scatter-gather list limits
    unsigned short sg_tablesize;
    unsigned short sg_prot_tablesize;

    /*
     * Set this if the host adapter has limitations beside segment count.
     */ //maximum number of sectors a single SCSI command may access
    unsigned int max_sectors;

    /*
     * DMA scatter gather segment boundary limit. A segment crossing this
     * boundary will be split in two.
     */
    unsigned long dma_boundary; //DMA scatter-gather segment boundary; a segment crossing it is split in two

#define SCSI_DEFAULT_MAX_SECTORS    1024

    short cmd_per_lun;

    /*
     * present contains counter indicating how many boards of this
     * type were found when we did the scan.
     */
    unsigned char present;

    /* If use block layer to manage tags, this is tag allocation policy */
    int tag_alloc_policy;

    /*
     * Track QUEUE_FULL events and reduce queue depth on demand.
     */
    unsigned track_queue_depth:1;

    /*
     * This specifies the mode that a LLD supports.
     */
    unsigned supported_mode:2; //modes the low-level driver supports (initiator or target)

    /*
     * True if this host adapter uses unchecked DMA onto an ISA bus.
     */
    unsigned unchecked_isa_dma:1;

    unsigned use_clustering:1;

    /*
     * True for emulated SCSI host adapters (e.g. ATAPI).
     */
    unsigned emulated:1;

    /*
     * True if the low-level driver performs its own reset-settle delays.
     */
    unsigned skip_settle_delay:1;

    /* True if the controller does not support WRITE SAME */
    unsigned no_write_same:1;

    /*
     * True if asynchronous aborts are not supported
     */
    unsigned no_async_abort:1;

    /*
     * Countdown for host blocking with no commands outstanding.
     */
    unsigned int max_host_blocked; //value the host_blocked countdown starts from when the host is blocked

#define SCSI_DEFAULT_HOST_BLOCKED   7

    /*
     * Pointer to the sysfs class properties for this host, NULL terminated.
     */
    struct device_attribute **shost_attrs; //sysfs attributes of the host

    /*
     * Pointer to the SCSI device properties for this host, NULL terminated.
     */
    struct device_attribute **sdev_attrs;  //sysfs attributes of the SCSI devices on this host

    struct list_head legacy_hosts;

    u64 vendor_id;

    /*
     * Additional per-command data allocated for the driver.
     */  //SCSI command pool: commands are preallocated and kept in cmd_pool
    unsigned int cmd_size;
    struct scsi_host_cmd_pool *cmd_pool;

    /* temporary flag to disable blk-mq I/O path */
    bool disable_blk_mq;  //flag that disables the block layer's multi-queue path
};
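
The five eh_* callbacks form an escalation ladder, from aborting one command up to resetting the whole host. The sketch below models only that ordering in user space. Everything prefixed toy_ is hypothetical, and the kernel's real error handler does considerably more (it works on lists of failed commands and re-tests devices between steps), so read this as a simplified picture under those assumptions.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_SUCCESS = 0, TOY_FAILED = 1 };  /* hypothetical return codes */

struct toy_cmd { int id; };

/* Reduced stand-in for the eh_* members of scsi_host_template. */
struct toy_template {
    int (*eh_abort_handler)(struct toy_cmd *);
    int (*eh_device_reset_handler)(struct toy_cmd *);
    int (*eh_target_reset_handler)(struct toy_cmd *);
    int (*eh_bus_reset_handler)(struct toy_cmd *);
    int (*eh_host_reset_handler)(struct toy_cmd *);
};

/* Try each recovery step from least to most disruptive, skipping
 * callbacks the driver did not provide.
 * Returns the index (0..4) of the step that succeeded, or -1. */
static int toy_escalate(const struct toy_template *t, struct toy_cmd *cmd)
{
    int (*steps[5])(struct toy_cmd *) = {
        t->eh_abort_handler, t->eh_device_reset_handler,
        t->eh_target_reset_handler, t->eh_bus_reset_handler,
        t->eh_host_reset_handler,
    };
    for (int i = 0; i < 5; i++) {
        if (steps[i] && steps[i](cmd) == TOY_SUCCESS)
            return i;
    }
    return -1;
}

/* Two trivial handlers used to exercise the ladder. */
static int toy_fail(struct toy_cmd *c) { (void)c; return TOY_FAILED; }
static int toy_ok(struct toy_cmd *c)   { (void)c; return TOY_SUCCESS; }
```

With a template whose abort handler fails, whose device reset handler is missing, and whose target reset handler succeeds, toy_escalate stops at step 2 and never resets the bus or the host.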

Host adapter: Scsi_Host

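Scsi_Host describes a SCSI host adapter, which is usually a PCI expansion card or a SCSI controller chip. A host adapter can have several channels, each of which is in effect a separate SCSI bus. Each channel can connect several SCSI target nodes; how many depends on the load capacity of the SCSI bus or on limits set by the specific SCSI protocol. A physical host bus adapter sits on the host I/O bus (usually PCI); at boot the system scans the devices on the PCI bus, and the host bus adapter is allocated at that point.
The Scsi_Host structure embeds a generic device that is linked into the device list of the SCSI bus type (scsi_bus_type).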

struct Scsi_Host {
    struct list_head    __devices; //list of devices
    struct list_head    __targets; //list of target nodes

    struct scsi_host_cmd_pool *cmd_pool; //SCSI command pool
    spinlock_t      free_list_lock;   //protects free_list
    struct list_head    free_list; /* backup store of cmd structs: preallocated spare commands */
    struct list_head    starved_list; //list of starved SCSI devices

    spinlock_t      default_lock;
    spinlock_t      *host_lock;

    struct mutex        scan_mutex;/* serialize scanning activity */

    struct list_head    eh_cmd_q; //list of SCSI commands that failed and await error handling
    struct task_struct    * ehandler;  /* Error recovery thread. */
    struct completion     * eh_action; /* Wait for specific actions on the
                          host. */
    wait_queue_head_t       host_wait; //wait queue used while the host is recovering
    struct scsi_host_template *hostt;  //host adapter template
    struct scsi_transport_template *transportt; //SCSI transport-layer template

    /*
     * Area to keep a shared tag map (if needed, will be
     * NULL if not).
     */
    union {
        struct blk_queue_tag    *bqt;
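        struct blk_mq_tag_set   tag_set; //used when SCSI runs in multi-queue mode
    };
    //number of SCSI commands already dispatched to the host adapter (low-level driver)
    atomic_t host_busy;        /* commands actually active on low-level */
    atomic_t host_blocked;  //countdown while the host is blocked; dispatch resumes when it reaches zero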

    unsigned int host_failed;      /* commands that failed.
                          protected by host_lock */
    unsigned int host_eh_scheduled;    /* EH scheduled without command */

    unsigned int host_no;  /* Used for IOCTL_GET_IDLUN, /proc/scsi et al. Unique within the system */

    /* next two fields are used to bound the time spent in error handling */
    int eh_deadline;
    unsigned long last_reset; //time of the last reset


    /*
     * These three parameters can be used to allow for wide scsi,
     * and for host adapters that support multiple busses
     * The last two should be set to 1 more than the actual max id
     * or lun (e.g. 8 for SCSI parallel systems).
     */
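    unsigned int max_channel; //highest channel number on this host adapter
    unsigned int max_id;      //upper bound for target ids (one more than the highest id)
    u64 max_lun;              //upper bound for LUNs (one more than the highest LUN)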

    unsigned int unique_id;

    /*
     * The maximum length of SCSI commands that this host can accept.
     * Probably 12 for most host adapters, but could be 16 for others.
     * or 260 if the driver supports variable length cdbs.
     * For drivers that don't set this field, a value of 12 is
     * assumed.
     */
    unsigned short max_cmd_len;  //longest SCSI command the host adapter accepts
    //the following fields also exist in scsi_host_template and are copied from it
    int this_id;
    int can_queue;
    short cmd_per_lun;
    short unsigned int sg_tablesize;
    short unsigned int sg_prot_tablesize;
    unsigned int max_sectors;
    unsigned long dma_boundary;
    /*
     * In scsi-mq mode, the number of hardware queues supported by the LLD.
     *
     * Note: it is assumed that each hardware queue has a queue depth of
     * can_queue. In other words, the total queue depth per host
     * is nr_hw_queues * can_queue.
     */
    unsigned nr_hw_queues; //number of hardware queues the low-level driver supports in scsi-mq mode
    /*
     * Used to assign serial numbers to the cmds.
     * Protected by the host lock.
     */
    unsigned long cmd_serial_number;  //counter used to assign command serial numbers

    unsigned active_mode:2;           //whether the host acts as initiator or target
    unsigned unchecked_isa_dma:1;
    unsigned use_clustering:1;

    /*
     * Host has requested that no further requests come through for the
     * time being.
     */
    unsigned host_self_blocked:1; //the low-level driver has asked to block this host; the mid layer stops dispatching commands to it

    /*
     * Host uses correct SCSI ordering not PC ordering. The bit is
     * set for the minority of drivers whose authors actually read
     * the spec ;).
     */
    unsigned reverse_ordering:1;

    /* Task mgmt function in progress */
    unsigned tmf_in_progress:1;  //a task management function is running

    /* Asynchronous scan in progress */
    unsigned async_scan:1;       //an asynchronous scan is running

    /* Don't resume host in EH */
    unsigned eh_noresume:1;      //do not resume the host during error handling

    /* The controller does not support WRITE SAME */
    unsigned no_write_same:1;

    unsigned use_blk_mq:1;       //whether SCSI multi-queue mode is in use
    unsigned use_cmd_list:1;

    /* Host responded with short (<36 bytes) INQUIRY result */
    unsigned short_inquiry:1;

    /*
     * Optional work queue to be utilized by the transport
     */
    char work_q_name[20];  //name of the work queue used by the SCSI transport layer
    struct workqueue_struct *work_q;

    /*
     * Task management function work queue
     */
    struct workqueue_struct *tmf_work_q; //work queue for task management functions

    /* The transport requires the LUN bits NOT to be stored in CDB[1] */
    unsigned no_scsi2_lun_in_cdb:1;

    /*
     * Value host_blocked counts down from
     */
    unsigned int max_host_blocked; //value host_blocked is reset to when the host is blocked

    /* Protection Information */
    unsigned int prot_capabilities;
    unsigned char prot_guard_type;

    /*
     * q used for scsi_tgt msgs, async events or any other requests that
     * need to be processed in userspace
     */
    struct request_queue *uspace_req_q; //request queue for scsi_tgt messages, async events and other requests handled in user space

    /* legacy crap */
    unsigned long base;
    unsigned long io_port;   //I/O port number
    unsigned char n_io_port;
    unsigned char dma_channel;
    unsigned int  irq;


    enum scsi_host_state shost_state; //host state

    /* ldm bits */ //shost_gendev: embedded generic device; links the host into the device list of the SCSI bus type (scsi_bus_type)
    struct device       shost_gendev, shost_dev;
    //shost_dev: embedded class device; links the host into the device list of the SCSI host class (shost_class)
    /*
     * List of hosts per template.
     *
     * This is only for use by scsi_module.c for legacy templates.
     * For these access to it is synchronized implicitly by
     * module_init/module_exit.
     */
    struct list_head sht_legacy_list;

    /*
     * Points to the transport data (if any) which is allocated
     * separately
     */
    void *shost_data; //separately allocated transport data, used by the SCSI transport layer

    /*
     * Points to the physical bus device we'd use to do DMA
     * Needed just in case we have virtual hosts.
     */
    struct device *dma_dev;

    /*
     * We should ensure that this is aligned, both for better performance
     * and also because some compilers (m68k) don't automatically force
     * alignment to a long boundary.
     */ //host-adapter private data
    unsigned long hostdata[0]  /* Used for storage of host specific stuff */
        __attribute__ ((aligned (sizeof(unsigned long))));
};
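
Two details of Scsi_Host are easy to miss in the listing: several fields are copied from the template, and the driver's private data lives in the trailing hostdata array, allocated in the same block as the host itself. The user-space sketch below models both under simplified assumptions. The toy_ types and functions are hypothetical and only loosely follow what the kernel's scsi_host_alloc(sht, privsize) does.

```c
#include <assert.h>
#include <stdlib.h>

/* Reduced stand-in for scsi_host_template. */
struct toy_template {
    int can_queue;
    int this_id;
    short cmd_per_lun;
    unsigned short sg_tablesize;
};

/* Reduced stand-in for Scsi_Host. */
struct toy_host {
    const struct toy_template *hostt;  /* template this host came from */
    int can_queue;                     /* copied from the template */
    int this_id;
    short cmd_per_lun;
    unsigned short sg_tablesize;
    unsigned long hostdata[];          /* driver-private area, must be last */
};

/* Allocate a host plus privsize bytes of zeroed private data in one block,
 * then seed the host's limits from the template. */
static struct toy_host *toy_host_alloc(const struct toy_template *t, size_t privsize)
{
    struct toy_host *h = calloc(1, sizeof(*h) + privsize);
    if (!h)
        return NULL;
    h->hostt = t;
    h->can_queue = t->can_queue;
    h->this_id = t->this_id;
    h->cmd_per_lun = t->cmd_per_lun;
    h->sg_tablesize = t->sg_tablesize;
    return h;
}

/* Hypothetical accessor for the private area. */
static void *toy_host_priv(struct toy_host *h)
{
    return h->hostdata;
}
```

Because the private area shares the host's allocation, a single free(h) releases both, and the driver never has to track a second pointer.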

Target node: scsi_target

scsi_target embeds a driver-model device, which is linked into the device list of the SCSI bus type scsi_bus_type.

struct scsi_target {
    struct scsi_device  *starget_sdev_user; //the SCSI device currently doing I/O, or NULL if none
    struct list_head    siblings;  //links into the host adapter's target list
    struct list_head    devices;   //list of devices that belong to this target
    struct device       dev;       //generic device, used to join the driver model
    struct kref     reap_ref; /* last put renders target invisible; reference count of this structure */
    unsigned int        channel;   //channel this target is on
    unsigned int        id; /* target id ... replace
                     * scsi_device.id eventually */
    unsigned int        create:1; /* signal that it needs to be added */
    unsigned int        single_lun:1;   /* Indicates we should only
                         * allow I/O to one of the luns
                         * for the device at a time. */
    unsigned int        pdt_1f_for_no_lun:1;    /* PDT = 0x1f
                         * means no lun present. */
    unsigned int        no_report_luns:1;   /* Don't use
                         * REPORT LUNS for scanning. */
    unsigned int        expecting_lun_change:1; /* A device has reported
                         * a 3F/0E UA, other devices on
                         * the same target will also. */
    /* commands actually active on LLD. */
    atomic_t        target_busy;
    atomic_t        target_blocked;           //countdown while the target is blocked

    /*
     * LLDs should set this in the slave_alloc host template callout.
     * If set to zero then there is not limit.
     */
    unsigned int        can_queue;             //number of commands that can be processed at once
    unsigned int        max_target_blocked;    //value target_blocked counts down from
#define SCSI_DEFAULT_TARGET_BLOCKED 3

    char            scsi_level;                //supported SCSI specification level
    enum scsi_target_state  state;             //target state
    void            *hostdata; /* available to low-level driver */
    unsigned long       starget_data[0]; /* for the transport layer */
    /* starget_data must be the last element!!!! */
} __attribute__((aligned(sizeof(unsigned long))));

Logical device: scsi_device

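scsi_device describes a SCSI logical device, that is, a logical unit (LUN) of a SCSI disk. The device behind a scsi_device may be a SATA/SAS/SCSI disk or an SSD inside another storage system. When the operating system finds a logical device attached to a host adapter during scanning, it creates a scsi_device structure, which the SCSI upper-level drivers use to communicate with the device.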

struct scsi_device {
    struct Scsi_Host *host;  //host bus adapter this device belongs to
    struct request_queue *request_queue; //request queue

    /* the next two are protected by the host->host_lock */
    struct list_head    siblings;   /* list of all devices on this host */ //links into the host's device list
    struct list_head    same_target_siblings; /* just the devices sharing same target id */ //links into the target's device list

    atomic_t device_busy;       /* commands actually active on LLDD */
    atomic_t device_blocked;    /* Device returned QUEUE_FULL. */

    spinlock_t list_lock;
    struct list_head cmd_list;  /* queue of in use SCSI Command structures */
    struct list_head starved_entry; //links into the host adapter's "starved" list
    struct scsi_cmnd *current_cmnd; /* currently active command */
    unsigned short queue_depth; /* How deep of a queue we want */
    unsigned short max_queue_depth; /* max queue depth */
    unsigned short last_queue_full_depth; /* These two are used by */
    unsigned short last_queue_full_count; /* scsi_track_queue_full() */
    unsigned long last_queue_full_time; /* last queue full time */
    unsigned long queue_ramp_up_period; /* ramp up period in jiffies */
#define SCSI_DEFAULT_RAMP_UP_PERIOD (120 * HZ)

    unsigned long last_queue_ramp_up;   /* last queue ramp up time */

    unsigned int id, channel; //target id and channel number this scsi_device belongs to
    u64 lun;  //LUN of this device
    unsigned int manufacturer;  /* Manufacturer of device, for using
                     * vendor-specific cmd's */
    unsigned sector_size;   /* hardware sector size in bytes */

    void *hostdata;     /* available to low-level driver (private data) */
    char type;          //SCSI device type
    char scsi_level;    //supported SCSI specification level, obtained through INQUIRY
    char inq_periph_qual;   /* PQ from INQUIRY data */
    unsigned char inquiry_len;  /* valid bytes in 'inquiry' */
    unsigned char * inquiry;    /* INQUIRY response data */
    const char * vendor;        /* [back_compat] point into 'inquiry' ... */
    const char * model;     /* ... after scan; point to static string */
    const char * rev;       /* ... "nullnullnullnull" before scan */

#define SCSI_VPD_PG_LEN                255
    int vpd_pg83_len;          //VPD page 0x83 (device identification), read with INQUIRY
    unsigned char *vpd_pg83;
    int vpd_pg80_len;          //VPD page 0x80 (unit serial number), read with INQUIRY
    unsigned char *vpd_pg80;
    unsigned char current_tag;  /* current tag */
    struct scsi_target      *sdev_target;   /* used only for single_lun */

    unsigned int    sdev_bflags; /* black/white flags as also found in
                 * scsi_devinfo.[hc]. For now used only to
                 * pass settings from slave_alloc to scsi
                 * core. */
    unsigned int eh_timeout; /* Error handling timeout */
    unsigned removable:1;
    unsigned changed:1; /* Data invalid due to media change */
    unsigned busy:1;    /* Used to prevent races */
    unsigned lockable:1;    /* Able to prevent media removal */
    unsigned locked:1;      /* Media removal disabled */
    unsigned borken:1;  /* Tell the Seagate driver to be
                 * painfully slow on this device */
    unsigned disconnect:1;  /* can disconnect */
    unsigned soft_reset:1;  /* Uses soft reset option */
    unsigned sdtr:1;    /* Device supports SDTR messages (synchronous data transfer) */
    unsigned wdtr:1;    /* Device supports WDTR messages (wide, 16-bit data transfer) */
    unsigned ppr:1;     /* Device supports PPR (parallel protocol request) messages */
    unsigned tagged_supported:1;    /* Supports SCSI-II tagged queuing */
    unsigned simple_tags:1; /* simple queue tag messages are enabled */
    unsigned was_reset:1;   /* There was a bus reset on the bus for
                 * this device */
    unsigned expecting_cc_ua:1; /* Expecting a CHECK_CONDITION/UNIT_ATTN
                     * because we did a bus reset. */
    unsigned use_10_for_rw:1; /* first try 10-byte read / write */
    unsigned use_10_for_ms:1; /* first try 10-byte mode sense/select */
    unsigned no_report_opcodes:1;   /* no REPORT SUPPORTED OPERATION CODES */
    unsigned no_write_same:1;   /* no WRITE SAME command */
    unsigned use_16_for_rw:1; /* Use read/write(16) over read/write(10) */
    unsigned skip_ms_page_8:1;  /* do not use MODE SENSE page 0x08 */
    unsigned skip_ms_page_3f:1; /* do not use MODE SENSE page 0x3f */
    unsigned skip_vpd_pages:1;  /* do not read VPD pages */
    unsigned try_vpd_pages:1;   /* attempt to read VPD pages */
    unsigned use_192_bytes_for_3f:1; /* ask for 192 bytes from page 0x3f */
    unsigned no_start_on_add:1; /* do not issue start on add */
    unsigned allow_restart:1; /* issue START_UNIT in error handler */
    unsigned manage_start_stop:1;   /* Let HLD (sd) manage start/stop */
    unsigned start_stop_pwr_cond:1; /* Set power cond. in START_STOP_UNIT */
    unsigned no_uld_attach:1; /* disable connecting to upper level drivers */
    unsigned select_no_atn:1;
    unsigned fix_capacity:1;    /* READ_CAPACITY is too high by 1 */
    unsigned guess_capacity:1;  /* READ_CAPACITY might be too high by 1 */
    unsigned retry_hwerror:1;   /* Retry HARDWARE_ERROR */
    unsigned last_sector_bug:1; /* do not use multisector accesses on
                       SD_LAST_BUGGY_SECTORS */
    unsigned no_read_disc_info:1;   /* Avoid READ_DISC_INFO cmds */
    unsigned no_read_capacity_16:1; /* Avoid READ_CAPACITY_16 cmds */
    unsigned try_rc_10_first:1; /* Try READ_CAPACACITY_10 first */
    unsigned is_visible:1;  /* is the device visible in sysfs */
    unsigned wce_default_on:1;  /* Cache is ON by default */
    unsigned no_dif:1;  /* T10 PI (DIF) should be disabled */
    unsigned broken_fua:1;      /* Don't set FUA bit */
    unsigned lun_in_cdb:1;      /* Store LUN bits in CDB[1] */

    atomic_t disk_events_disable_depth; /* disable depth for disk events */

    DECLARE_BITMAP(supported_events, SDEV_EVT_MAXBITS); /* supported events */
    DECLARE_BITMAP(pending_events, SDEV_EVT_MAXBITS); /* pending events */
    struct list_head event_list;    /* asserted events */
    struct work_struct event_work;

    unsigned int max_device_blocked; /* what device_blocked counts down from  */
#define SCSI_DEFAULT_DEVICE_BLOCKED 3

    atomic_t iorequest_cnt;
    atomic_t iodone_cnt;
    atomic_t ioerr_cnt;

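    struct device       sdev_gendev, //embedded generic device, linked into the device list of the SCSI bus type (scsi_bus_type)
                sdev_dev; //embedded class device, linked into the device list of the SCSI device class (sdev_class)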

    struct execute_work ew; /* used to get process context on put */
    struct work_struct  requeue_work;

    struct scsi_device_handler *handler; //custom device handler
    void            *handler_data;

    enum scsi_device_state sdev_state;  //SCSI device state
    unsigned long       sdev_data[0];   //used by the SCSI transport layer
} __attribute__((aligned(sizeof(unsigned long))));

The kernel's SCSI command structure: scsi_cmnd

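A scsi_cmnd is created by the SCSI mid layer and passed down to the SCSI low-level driver. Every I/O request gets a scsi_cmnd, but not every scsi_cmnd is an I/O request. A scsi_cmnd is eventually turned into one concrete SCSI command. Besides the command descriptor block, it carries richer information: the data buffer, the sense data buffer, the completion callback, and the associated block-layer request. It is the context in which the mid layer executes a SCSI command.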

struct scsi_cmnd {
    struct scsi_device *device;  //the SCSI device this command belongs to
    struct list_head list;  /* scsi_cmnd participates in queue lists; links into the device's command list */
    struct list_head eh_entry; /* entry for the host eh_cmd_q */
    struct delayed_work abort_work;
    int eh_eflags;      /* Used by error handlr */

    /*
     * A SCSI Command is assigned a nonzero serial_number before passed
     * to the driver's queue command function.  The serial_number is
     * cleared when scsi_done is entered indicating that the command
     * has been completed.  It is a bug for LLDDs to use this number
     * for purposes other than printk (and even that is only useful
     * for debugging).
     */
    unsigned long serial_number; //unique serial number of the command

    /*
     * This is set to jiffies as it was when the command was first
     * allocated.  It is used to time how long the command has
     * been outstanding
     */
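    unsigned long jiffies_at_alloc; //jiffies at allocation, used to compute how long the command has been outstanding

    int retries;  //number of retries so far
    int allowed;  //number of retries allowed

    unsigned char prot_op;    //protection operation (DIF and DIX)
    unsigned char prot_type;  //DIF protection type
    unsigned char prot_flags;

    unsigned short cmd_len;   //command length
    enum dma_data_direction sc_data_direction;  //data transfer direction

    /* These elements define the operation we are about to perform */
    unsigned char *cmnd;  //the command descriptor block, in the format the SCSI specification defines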


    /* These elements define the operation we ultimately want to perform */
    struct scsi_data_buffer sdb;        //data buffer of the command
    struct scsi_data_buffer *prot_sdb;  //protection-information buffer of the command

    unsigned underflow; /* Return error if less than
                   this amount is transferred */

    unsigned transfersize;  /* Transfer unit: how much we are guaranteed to
                   transfer with each SCSI transfer
                   (ie, between disconnect /
                   reconnects.   Probably == sector
                   size */

    struct request *request;    /* The block-layer request we are
                       working on */

#define SCSI_SENSE_BUFFERSIZE   96
    unsigned char *sense_buffer;    //sense data buffer of the command
                /* obtained by REQUEST SENSE when
                 * CHECK CONDITION is received on original
                 * command (auto-sense) */

    /* Low-level done function - can be used by low-level driver to point
     *        to completion function.  Not used by mid/upper level code. */
    void (*scsi_done) (struct scsi_cmnd *); //called back when the low-level driver completes the command

    /*
     * The following fields can be written to by the host specific code.
     * Everything else should be left alone.
     */
    struct scsi_pointer SCp;    /* Scratchpad used by some host adapters */

    unsigned char *host_scribble;   /* The host adapter is allowed to
                     * call scsi_malloc and get some memory
                     * and hang it here.  The host adapter
                     * is also expected to call scsi_free
                     * to release this memory.  (The memory
                     * obtained by scsi_malloc is guaranteed
                     * to be at an address < 16Mb). */

    int result;     /* Status code from lower level driver */
    int flags;      /* Command flags */

    unsigned char tag;  /* SCSI-II queued command tag */
};
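
The sense_buffer member holds the error details a device returns after CHECK CONDITION. As a rough illustration, the user-space sketch below decodes fixed-format sense data: the response code (0x70 or 0x71) is in byte 0, the sense key in the low four bits of byte 2, and the additional sense code and qualifier in bytes 12 and 13. The names struct sense_info and decode_fixed_sense are assumptions for this example, and descriptor-format sense data (response codes 0x72 and 0x73) is deliberately rejected rather than handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this example. */
struct sense_info {
    uint8_t key;   /* sense key */
    uint8_t asc;   /* additional sense code */
    uint8_t ascq;  /* additional sense code qualifier */
};

/* Decode fixed-format sense data (response code 0x70 or 0x71).
 * Returns 0 on success, -1 if the buffer is too short or uses another format. */
static int decode_fixed_sense(const uint8_t *buf, size_t len, struct sense_info *out)
{
    if (len < 14)
        return -1;                   /* need bytes 0..13 */
    uint8_t code = buf[0] & 0x7f;    /* bit 7 is the VALID flag */
    if (code != 0x70 && code != 0x71)
        return -1;
    out->key  = buf[2] & 0x0f;
    out->asc  = buf[12];
    out->ascq = buf[13];
    return 0;
}
```

For instance, a sense key of 0x05 with additional sense code 0x24 reports an illegal request caused by an invalid field in the CDB.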

Driver: scsi_driver

struct scsi_driver {
    struct device_driver    gendrv;  //"inherits" from device_driver by embedding it

    void (*rescan)(struct device *); //callback invoked when the device is rescanned
    int (*init_command)(struct scsi_cmnd *);
    void (*uninit_command)(struct scsi_cmnd *);
    int (*done)(struct scsi_cmnd *);  //called when the low-level driver completes a command, to work out how many bytes were completed
    int (*eh_action)(struct scsi_cmnd *, int); //error-handling callback
};
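
The "inheritance" of device_driver works by embedding: code that holds a pointer to the embedded gendrv can recover the enclosing scsi_driver by subtracting the member's offset, which is what the kernel's container_of macro does. The sketch below reproduces the idea in user space with hypothetical toy_ names, so it is an illustration of the technique and not the kernel's own definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-ins for device_driver and scsi_driver. */
struct toy_device_driver {
    const char *name;
};

struct toy_scsi_driver {
    struct toy_device_driver gendrv;  /* embedded "base class" */
    int (*done)(int);                 /* a driver-specific callback */
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Downcast from the generic driver to the SCSI-specific one. */
static struct toy_scsi_driver *to_toy_scsi_driver(struct toy_device_driver *drv)
{
    return toy_container_of(drv, struct toy_scsi_driver, gendrv);
}
```

The generic driver core only ever sees the embedded structure; the SCSI layer uses the downcast to reach its own callbacks. The same arithmetic works wherever the member sits in the enclosing structure, not only at offset zero.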

Device model

  • scsi_bus_type: the bus type of the SCSI subsystem
struct bus_type scsi_bus_type = {
    .name       = "scsi",   // corresponds to /sys/bus/scsi
    .match      = scsi_bus_match,
    .uevent     = scsi_bus_uevent,
#ifdef CONFIG_PM
    .pm     = &scsi_bus_pm_ops,
#endif
};
EXPORT_SYMBOL_GPL(scsi_bus_type);
  • shost_class: the host class of the SCSI subsystem
static struct class shost_class = {
    .name       = "scsi_host",  // corresponds to /sys/class/scsi_host
    .dev_release    = scsi_host_cls_release,
};

Initialization

The SCSI subsystem is loaded when the operating system boots. Its entry point is init_scsi, registered with subsys_initcall:

static int __init init_scsi(void)
{
    int error;

    error = scsi_init_queue();  //set up the memory pools used for scatter-gather lists
    if (error)
        return error;
    error = scsi_init_procfs(); //create the SCSI-related procfs entries
    if (error)
        goto cleanup_queue;
    error = scsi_init_devinfo();//set up the dynamic SCSI device info list
    if (error)
        goto cleanup_procfs;
    error = scsi_init_hosts();  //register shost_class, creating /sys/class/scsi_host
    if (error)
        goto cleanup_devlist;
    error = scsi_init_sysctl(); //register the SCSI sysctl table
    if (error)
        goto cleanup_hosts;
    error = scsi_sysfs_register(); //register the scsi_bus_type bus type and the sdev_class class
    if (error)
        goto cleanup_sysctl;

    scsi_netlink_init(); //initialize the SCSI transport netlink interface

    printk(KERN_NOTICE "SCSI subsystem initialized\n");
    return 0;

cleanup_sysctl:
    scsi_exit_sysctl();
cleanup_hosts:
    scsi_exit_hosts();
cleanup_devlist:
    scsi_exit_devinfo();
cleanup_procfs:
    scsi_exit_procfs();
cleanup_queue:
    scsi_exit_queue();
    printk(KERN_ERR "SCSI subsystem failed to initialize, error = %d\n",
           -error);
    return error;
}
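
init_scsi relies on the common kernel idiom of unwinding with goto labels: each failure jumps to a label that undoes only the stages that had already succeeded, in reverse order. A reduced user-space sketch of the same control flow follows. The three stages and the fail_at parameter are hypothetical and exist only so the behaviour can be exercised.

```c
#include <assert.h>

#define NSTAGES 3
static int stage_up[NSTAGES];  /* 1 while a stage is initialized */

/* Bring up stage i, or fail if i equals fail_at. */
static int stage_init(int i, int fail_at)
{
    if (i == fail_at)
        return -1;
    stage_up[i] = 1;
    return 0;
}

/* Tear down stage i. */
static void stage_exit(int i)
{
    stage_up[i] = 0;
}

/* Initialize stages 0..2 in order. On failure, undo only the stages that
 * had already succeeded, newest first. Pass fail_at = -1 for no failure. */
static int toy_init(int fail_at)
{
    int error;

    error = stage_init(0, fail_at);
    if (error)
        return error;          /* nothing to undo yet */
    error = stage_init(1, fail_at);
    if (error)
        goto cleanup_0;
    error = stage_init(2, fail_at);
    if (error)
        goto cleanup_1;
    return 0;

cleanup_1:
    stage_exit(1);             /* falls through to the older stages */
cleanup_0:
    stage_exit(0);
    return error;
}
```

The labels are listed newest first so that control falls through them, which keeps each failure path to a single goto and makes it hard to forget a cleanup step when a new stage is added.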

scsi_init_hosts registers shost_class, the class that the SCSI subsystem's host adapters belong to:

int scsi_init_hosts(void)
{
    return class_register(&shost_class);
}

scsi_sysfs_register registers the SCSI bus type scsi_bus_type and sdev_class, the class that SCSI devices belong to:

int scsi_sysfs_register(void)
{
    int error;

    error = bus_register(&scsi_bus_type);
    if (!error) {
        error = class_register(&sdev_class);
        if (error)
            bus_unregister(&scsi_bus_type);
    }

    return error;
}

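SCSI low-level drivers are written for host adapters, so a host adapter must be added when a low-level driver is loaded. There are two ways to add one: (1) during driver binding, when the PCI subsystem scans the bus, or (2) manually. All host adapters with a hardware PCI interface use the first way. Adding a host adapter takes two steps:
1. Allocate the host adapter data structure sc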