A Brief Analysis of the Linux File System

阿新 • Published: 2019-01-17

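I have been reading APUE recently, and one of its chapters covers file systems, so here is a short summary of the Linux virtual file system (VFS). It includes some kernel source, but it does not go very deep.

In the previous blog post we ran into a few data structures for the first time. As before, I start from what can be observed and work inward. We already know that the process descriptor refers to a file-system-related structure, "struct files_struct". Besides that one there is also: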

struct fs_struct {
	int users;
	spinlock_t lock;
	seqcount_t seq;
	int umask;
	int in_exec;
	struct path root, pwd;
};

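This structure is pointed to by the fs field of the process descriptor and is defined in include/linux/fs_struct.h. It mainly holds the process's current working directory (pwd) and its root directory (root), along with its umask.

Another structure is: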

struct nsproxy {
	atomic_t count;
	struct uts_namespace *uts_ns;
	struct ipc_namespace *ipc_ns;
	struct mnt_namespace *mnt_ns;
	struct pid_namespace *pid_ns_for_children;
	struct net 	     *net_ns;
};

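This structure is defined in include/linux/nsproxy.h. I will set Linux process namespaces aside for now and come back to the topic later.

Looking at the two structures above, neither has much to do with basic file operations such as read and write, so it is back to struct files_struct. Here is its definition again: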

struct files_struct {
  /*
   * read mostly part
   */
	atomic_t count;
	struct fdtable __rcu *fdt;
	struct fdtable fdtab;
  /*
   * written part on a separate cache line in SMP
   */
	spinlock_t file_lock ____cacheline_aligned_in_smp;
	int next_fd;
	unsigned long close_on_exec_init[1];
	unsigned long open_fds_init[1];
	struct file __rcu * fd_array[NR_OPEN_DEFAULT];
};

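The following is taken from LKD (Linux Kernel Development). I cannot verify it with a test program, because these structures live inside the kernel and I do not yet know how to debug the kernel.

The fd_array array holds pointers to the open file objects. Because fd_array has room for only NR_OPEN_DEFAULT entries, once a process opens more file objects than that, the kernel allocates a new, larger array and makes fdt point to it. We briefly analyzed "struct fdtable" earlier; here it is again: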

struct fdtable {
	unsigned int max_fds;
	struct file __rcu **fd;      /* current fd array */
	unsigned long *close_on_exec;
	unsigned long *open_fds;
	struct rcu_head rcu;
};

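Here fd plays the same role as fd_array: both point to the open file objects.

Now that file objects have come up, let us study them in detail. According to the material I have read so far (Linux Kernel Development and Understanding the Linux Kernel), the virtual file system (VFS) has four primary object types: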
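
A file descriptor is just an index into this table, and POSIX requires open to return the lowest-numbered free slot. The sketch below shows that from user space. It assumes a single-threaded process and a readable /dev/null; the function name lowest_free_fd_is_reused is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Opens /dev/null twice, closes the first descriptor, and opens again.
 * Returns 1 if the third open() reuses the slot that was just freed.
 */
int lowest_free_fd_is_reused(void)
{
    int a = open("/dev/null", O_RDONLY);
    int b = open("/dev/null", O_RDONLY);
    int c, reused;

    if (a < 0 || b < 0) {
        if (a >= 0)
            close(a);
        if (b >= 0)
            close(b);
        return 0;
    }

    close(a);                           /* slot a in the table is free again */
    c = open("/dev/null", O_RDONLY);    /* lowest free slot is a */
    reused = (c == a);

    if (c >= 0)
        close(c);
    close(b);
    return reused;
}
```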

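The superblock object, which represents a specific mounted file system.
The inode object, which represents a specific file.
The dentry (directory entry) object, which represents a directory entry, a single component of a path.
The file object, which represents a file opened by a process.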

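Here I borrow a figure from Understanding the Linux Kernel that shows how these four object types relate to one another.

[Figure: Linux_inode_file_dentry]

Start with struct file. Its basic definition is: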

struct file {
	union {
		struct llist_node	fu_llist;
		struct rcu_head 	fu_rcuhead;
	} f_u;
	struct path		f_path;
	struct inode		*f_inode;	/* cached value */
	const struct file_operations	*f_op;

	/*
	 * Protects f_ep_links, f_flags.
	 * Must not be taken from IRQ context.
	 */
	spinlock_t		f_lock;
	atomic_long_t		f_count;
	unsigned int 		f_flags;
	fmode_t			f_mode;
	struct mutex		f_pos_lock;
	loff_t			f_pos;
	struct fown_struct	f_owner;
	const struct cred	*f_cred;
	struct file_ra_state	f_ra;

	u64			f_version;
#ifdef CONFIG_SECURITY
	void			*f_security;
#endif
	/* needed for tty driver, and maybe others */
	void			*private_data;

#ifdef CONFIG_EPOLL
	/* Used by fs/eventpoll.c to link all the hooks to this file */
	struct list_head	f_ep_links;
	struct list_head	f_tfile_llink;
#endif /* #ifdef CONFIG_EPOLL */
	struct address_space	*f_mapping;
} __attribute__((aligned(4)));	/* lest something weird decides that 2 is OK */

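The file object is the in-memory representation of an open file. The object (not the physical file) is created by the open system call and destroyed by the close system call, and all of these file-related calls are really methods defined in the file operations table. Because several processes can open and operate on the same file at the same time, a single file may have several file objects. A file object represents an open file only from a process's point of view; it points back to a dentry object, and it is really the dentry object that represents the open file itself. Each call to open yields a new file descriptor. Even when the same process opens the same file twice, the two descriptors differ and refer to different file objects through fd_array. So the file objects for a file are not unique, but for a given path name the corresponding inode and dentry are.

Three fields matter most here: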
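
One way to see that two open calls create two file objects is to look at the file offset, which is stored in the file object (f_pos). The sketch below is a user-space experiment under a few assumptions: /tmp is writable, and the function name offsets_follow_file_objects is made up for this example. It also uses dup, which is not discussed above; dup gives a second descriptor for the same file object, so the offset is shared.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * fd1 and fd2 come from two separate opens of the same file,
 * fd3 comes from dup(fd1).
 * Returns 1 if fd2 keeps its own offset while fd3 shares fd1's offset.
 */
int offsets_follow_file_objects(void)
{
    char path[] = "/tmp/vfs_demo_XXXXXX";
    char buf[4];
    int fd1, fd2, fd3, ok = 0;

    fd1 = mkstemp(path);                /* first file object, read/write */
    if (fd1 < 0)
        return 0;
    if (write(fd1, "abcdefgh", 8) != 8 || lseek(fd1, 0, SEEK_SET) != 0)
        goto out1;

    fd2 = open(path, O_RDONLY);         /* second, independent file object */
    if (fd2 < 0)
        goto out1;
    fd3 = dup(fd1);                     /* new descriptor, same file object */
    if (fd3 < 0)
        goto out2;

    if (read(fd1, buf, 4) == 4) {       /* advances the offset behind fd1 */
        off_t o2 = lseek(fd2, 0, SEEK_CUR);
        off_t o3 = lseek(fd3, 0, SEEK_CUR);
        ok = (o2 == 0 && o3 == 4);
    }
    close(fd3);
out2:
    close(fd2);
out1:
    close(fd1);
    unlink(path);
    return ok;
}
```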

	struct path		f_path;
	struct inode		*f_inode;	/* cached value */
	const struct file_operations	*f_op;

First, f_path. Its type is defined in include/linux/path.h:

struct path {
	struct vfsmount *mnt;
	struct dentry *dentry;
};

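Through this structure the file object is linked to a dentry object.

Next, the f_inode field. It is a pointer to an inode object, which differs from the figure above: here the file object has a direct link to the inode object. It also differs from Linux Kernel Development and Understanding the Linux Kernel, where the file object has no such field, so it was probably added after the 2.6 kernels those books describe.

The comment gives a hint about what f_inode is for: it is probably a cached pointer to the inode, so that the inode can be reached directly without going through the dentry object.

Next is struct file_operations. The f_op field points to this table, which defines all the operations on a file object: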

struct file_operations {
    struct module *owner;
    loff_t (*llseek) (struct file *, loff_t, int);
    ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
    ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
    ssize_t (*aio_read) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
    ssize_t (*aio_write) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
    ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
    ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
    int (*iterate) (struct file *, struct dir_context *);
    unsigned int (*poll) (struct file *, struct poll_table_struct *);
    long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
    long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
    int (*mmap) (struct file *, struct vm_area_struct *);
    void (*mremap)(struct file *, struct vm_area_struct *);
    int (*open) (struct inode *, struct file *);
    int (*flush) (struct file *, fl_owner_t id);
    int (*release) (struct inode *, struct file *);
    int (*fsync) (struct file *, loff_t, loff_t, int datasync);
    int (*aio_fsync) (struct kiocb *, int datasync);
    int (*fasync) (int, struct file *, int);
    int (*lock) (struct file *, int, struct file_lock *);
    ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
    unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
    int (*check_flags)(int);
    int (*flock) (struct file *, int, struct file_lock *);
    ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
    ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int);
    int (*setlease)(struct file *, long, struct file_lock **, void **);
    long (*fallocate)(struct file *file, int mode, loff_t offset,
              loff_t len);
    void (*show_fdinfo)(struct seq_file *m, struct file *f);
};

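Note that the member here is named simply read. It is invoked on behalf of the read system call, whose kernel entry point, as we saw earlier, is named sys_read.

Even this brief look at the file object shows that the VFS implementation is largely object-oriented: each object bundles the data it operates on together with the functions that operate on that data.

Having looked at the file object, move one layer down to the dentry object. VFS treats directories as files, so a given path may contain both directory files and regular files, and every component of the path is represented by an inode object. Although inodes could represent all of them uniformly, VFS often has to perform directory-specific operations such as path name lookup. Path name lookup must resolve every component of a path, checking that it is valid and then going on to find the next component. To make lookup easier, VFS introduces the dentry. Each dentry represents one specific component of a path. One point must be clear: every component of a path, regular files included, is a dentry object. In the path /bin/vi, for example, the components /, bin, and vi are all dentries. Resolving a path and walking its components is not a trivial exercise: it involves time-consuming string comparisons and tedious code. The dentry object makes this process easier. (I do not fully understand this point yet, and I cannot say what things would look like without dentry objects.)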
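
To make the "data plus a table of methods" idea concrete, here is a small user-space model. It is only an illustration, not kernel code: every name starting with toy_ or mem_ is hypothetical, and the read signature is simplified compared with the real struct file_operations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_file;

/* Hypothetical operations table, modeled loosely on struct file_operations. */
struct toy_file_operations {
    long (*read)(struct toy_file *f, char *buf, size_t len);
};

/* Hypothetical file object: its data plus a pointer to its methods. */
struct toy_file {
    const struct toy_file_operations *f_op;
    const char *data;   /* backing bytes */
    size_t size;        /* number of backing bytes */
    size_t f_pos;       /* current offset */
};

/* One concrete implementation of read, for an in-memory "file system". */
static long mem_read(struct toy_file *f, char *buf, size_t len)
{
    size_t left = f->size - f->f_pos;

    if (len > left)
        len = left;
    memcpy(buf, f->data + f->f_pos, len);
    f->f_pos += len;
    return (long)len;
}

static const struct toy_file_operations mem_fops = { .read = mem_read };

/* Generic layer: knows nothing about the implementation, only calls f_op. */
long toy_vfs_read(struct toy_file *f, char *buf, size_t len)
{
    if (f == NULL || f->f_op == NULL || f->f_op->read == NULL)
        return -1;
    return f->f_op->read(f, buf, len);
}

/* Builds a toy file object over a string. */
struct toy_file toy_open_mem(const char *s)
{
    struct toy_file f = { &mem_fops, s, strlen(s), 0 };
    return f;
}
```

A caller only ever uses toy_vfs_read; supporting another kind of file means supplying another operations table, which is roughly how each real file system plugs its own methods into f_op.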
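
The idea that every component of a path is its own dentry can be sketched by splitting a path the way a lookup would visit it. This is a user-space illustration only; split_path, MAX_NAME, and the fixed-size output array are assumptions of this example and have nothing to do with how the kernel stores names.

```c
#include <assert.h>
#include <string.h>

#define MAX_NAME 64

/*
 * Splits an absolute path into the components a path walk would visit.
 * The leading "/" counts as the first component (the root directory).
 * Returns the number of components stored in out, or -1 on bad input.
 */
int split_path(const char *path, char out[][MAX_NAME], int max)
{
    const char *p;
    int n = 0;

    if (path == NULL || path[0] != '/' || max < 1)
        return -1;
    strcpy(out[n++], "/");              /* the root directory */

    p = path;
    while (*p != '\0') {
        size_t len;

        p += strspn(p, "/");            /* skip one or more slashes */
        len = strcspn(p, "/");          /* length of the next component */
        if (len == 0)
            break;                      /* only trailing slashes were left */
        if (n == max || len >= MAX_NAME)
            return -1;
        memcpy(out[n], p, len);
        out[n][len] = '\0';
        n++;
        p += len;
    }
    return n;
}
```

For "/bin/vi" this yields three components: "/", "bin", and "vi". In the kernel each of them would be looked up, and cached, as a separate dentry.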

Back to the topic. The dentry object is defined in include/linux/dcache.h as follows.

struct dentry {
	/* RCU lookup touched fields */
	unsigned int d_flags;		/* protected by d_lock */
	seqcount_t d_seq;		/* per dentry seqlock */
	struct hlist_bl_node d_hash;	/* lookup hash list */
	struct dentry *d_parent;	/* parent directory */
	struct qstr d_name;
	struct inode *d_inode;		/* Where the name belongs to - NULL is
					 * negative */
	unsigned char d_iname[DNAME_INLINE_LEN];	/* small names */

	/* Ref lookup also touches following */
	struct lockref d_lockref;	/* per-dentry lock and refcount */
	const struct dentry_operations *d_op;
	struct super_block *d_sb;	/* The root of the dentry tree */
	unsigned long d_time;		/* used by d_revalidate */
	void *d_fsdata;			/* fs-specific data */

	struct list_head d_lru;		/* LRU list */
	struct list_head d_child;	/* child of parent list */
	struct list_head d_subdirs;	/* our children */
	/*
	 * d_alias and d_rcu can share memory
	 */
	union {
		struct hlist_node d_alias;	/* inode alias list */
	 	struct rcu_head d_rcu;
	} d_u;
};

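A dentry object has no corresponding on-disk data structure; VFS creates it on the fly from the string form of a path name. Because dentry objects are not actually stored on disk, the dentry structure has no flag indicating whether it has been modified (that is, whether it is dirty and needs to be written back to disk).

The following is quoted directly from Linux Kernel Development and Understanding the Linux Kernel.

A dentry object is in one of three states: used, unused, or negative.

A used dentry corresponds to a valid inode (d_inode points to that inode) and has one or more users (d_count is positive; in the structure shown above the count is kept inside d_lockref). Its contents cannot be discarded.
An unused dentry corresponds to a valid inode, but VFS is not currently using it (d_count is 0). It still points to a valid object and is kept in the cache so that it can be reused if needed, which makes path lookups faster. Its contents may be discarded when memory has to be reclaimed.
A negative dentry has no valid inode (d_inode is NULL), because the inode was deleted or the path is not correct. The dentry is still kept in the dentry cache so that later lookups of the same name can be resolved quickly. Its contents can likewise be discarded when necessary.

The text above mentioned the dentry cache, so here is a brief look at it.

Reading a directory entry from disk and building the corresponding dentry object takes considerable time, and an entry that has just been used may well be needed again, so it pays to keep it in memory. To get the most out of these dentry objects, Linux uses a dentry cache, which consists of two kinds of data structures: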
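
The three states reduce to two tests, one on d_inode and one on the usage count. The following is a simplified model of the description above, not kernel code: toy_inode, toy_dentry, and classify_dentry are hypothetical, and the real count is kept inside d_lockref.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for the kernel structures. */
struct toy_inode {
    unsigned long i_ino;
};

struct toy_dentry {
    struct toy_inode *d_inode;  /* NULL means the dentry is negative */
    unsigned int d_count;       /* number of current users */
};

enum toy_state { DENTRY_USED, DENTRY_UNUSED, DENTRY_NEGATIVE };

/* Classifies a dentry according to the three states described above. */
enum toy_state classify_dentry(const struct toy_dentry *d)
{
    if (d->d_inode == NULL)
        return DENTRY_NEGATIVE;     /* no valid inode behind the name */
    if (d->d_count > 0)
        return DENTRY_USED;         /* valid inode, at least one user */
    return DENTRY_UNUSED;           /* valid inode, kept only as a cache entry */
}
```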

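A set of dentry objects in the used, unused, or negative state.
A hash table from which the dentry object for a given file name and directory can be obtained quickly. If the requested object is not in the dentry cache, the lookup returns a null value.

Every used dentry object is inserted into a doubly linked list headed by the i_dentry field of the corresponding inode object (a list is needed because an inode may be associated with several hard links). The d_alias field of the dentry object stores the addresses of the adjacent elements in that list. The book describes both fields as struct list_head; in the structures shown in this post they are struct hlist_head (i_dentry) and struct hlist_node (d_alias).

Unused and negative dentry objects are inserted into a doubly linked "least recently used" (LRU) list. Because entries are always inserted at the head, the entries near the head are newer than those near the tail. Whenever the kernel shrinks the dentry cache, negative dentry objects move toward the tail of the LRU list, so these objects are gradually freed.

The hash table and its hash function are used to resolve a given path into the corresponding dentry object quickly.

Next, a quick look at the dentry operations: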

struct dentry_operations {
	int (*d_revalidate)(struct dentry *, unsigned int);
	int (*d_weak_revalidate)(struct dentry *, unsigned int);
	int (*d_hash)(const struct dentry *, struct qstr *);
	int (*d_compare)(const struct dentry *, const struct dentry *,
			unsigned int, const char *, const struct qstr *);
	int (*d_delete)(const struct dentry *);
	void (*d_release)(struct dentry *);
	void (*d_prune)(struct dentry *);
	void (*d_iput)(struct dentry *, struct inode *);
	char *(*d_dname)(struct dentry *, char *, int);
	struct vfsmount *(*d_automount)(struct path *);
	int (*d_manage)(struct dentry *, bool);
	struct inode *(*d_select_inode)(struct dentry *, unsigned);
} ____cacheline_aligned;

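While studying the dentry object we came across two more objects: the inode object and the superblock object. Start with the inode object.

From the analysis so far we have a rough idea of the design approach behind the VFS, namely OOP, so I will follow the same line: first the class members, then the class operations. struct inode is defined in include/linux/fs.h as follows.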

struct inode {
	umode_t			i_mode;
	unsigned short		i_opflags;
	kuid_t			i_uid;
	kgid_t			i_gid;
	unsigned int		i_flags;

#ifdef CONFIG_FS_POSIX_ACL
	struct posix_acl	*i_acl;
	struct posix_acl	*i_default_acl;
#endif

	const struct inode_operations	*i_op;
	struct super_block	*i_sb;
	struct address_space	*i_mapping;

#ifdef CONFIG_SECURITY
	void			*i_security;
#endif

	/* Stat data, not accessed from path walking */
	unsigned long		i_ino;
	/*
	 * Filesystems may only read i_nlink directly.  They shall use the
	 * following functions for modification:
	 *
	 *    (set|clear|inc|drop)_nlink
	 *    inode_(inc|dec)_link_count
	 */
	union {
		const unsigned int i_nlink;
		unsigned int __i_nlink;	/* number of hard links */
	};
	dev_t			i_rdev;
	loff_t			i_size;
	struct timespec		i_atime;
	struct timespec		i_mtime;
	struct timespec		i_ctime;
	spinlock_t		i_lock;	/* i_blocks, i_bytes, maybe i_size */
	unsigned short          i_bytes;
	unsigned int		i_blkbits;
	blkcnt_t		i_blocks;

#ifdef __NEED_I_SIZE_ORDERED
	seqcount_t		i_size_seqcount;
#endif

	/* Misc */
	unsigned long		i_state;
	struct mutex		i_mutex;

	unsigned long		dirtied_when;	/* jiffies of first dirtying */

	struct hlist_node	i_hash;
	struct list_head	i_wb_list;	/* backing dev IO list */
	struct list_head	i_lru;		/* inode LRU list */
	struct list_head	i_sb_list;
	union {
		struct hlist_head	i_dentry;
		struct rcu_head		i_rcu;
	};
	u64			i_version;
	atomic_t		i_count;	/* reference counter */
	atomic_t		i_dio_count;
	atomic_t		i_writecount;
#ifdef CONFIG_IMA
	atomic_t		i_readcount; /* struct files open RO */
#endif
	const struct file_operations	*i_fop;	/* former ->i_op->default_file_ops */
	struct file_lock	*i_flock;
	struct address_space	i_data;
	struct list_head	i_devices;
	union {
		struct pipe_inode_info	*i_pipe;
		struct block_device	*i_bdev;
		struct cdev		*i_cdev;
	};

	__u32			i_generation;

#ifdef CONFIG_FSNOTIFY
	__u32			i_fsnotify_mask; /* all events this inode cares about */
	struct hlist_head	i_fsnotify_marks;
#endif

	void			*i_private; /* fs or device private pointer */
};

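The inode object contains all the information the kernel needs to operate on a file or directory. For Unix-style file systems this information can be read directly from the on-disk inode (the inode object lives in memory, whereas the on-disk inode is a data structure that actually exists on disk). If a file system has no inodes, it must extract this information (the information used to operate on files and directories) from wherever it is stored on disk. File systems without inodes usually store the descriptive information as part of the file itself; unlike Unix-style file systems, they do not keep data and control information apart. However the control information is stored, an inode object must be built in memory for the file system to use. An inode represents one file in the file system, but the inode object is created in memory only when the file is accessed.

Three fields matter most:

	unsigned long		i_state;
	const struct inode_operations	*i_op;
	struct super_block	*i_sb;

Take them one at a time. First, i_state, which describes the state of the inode object relative to its on-disk inode. File operations are applied to the inode object first, and the on-disk inode is then updated according to the changed state. The following state flags exist: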
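
Some inode fields are visible from user space through stat, for example the inode number (i_ino) and the hard link count (i_nlink). The sketch below creates a file and a hard link to it and checks that both names lead to the same inode. It assumes /tmp is writable and that its file system supports hard links; the function name hard_links_share_inode is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Creates a temporary file and a hard link to it, then stats both names.
 * Returns 1 if both names report the same inode and a link count of 2.
 */
int hard_links_share_inode(void)
{
    char path[] = "/tmp/vfs_inode_XXXXXX";
    char link_path[sizeof path + 8];
    struct stat a, b;
    int fd, ok = 0;

    fd = mkstemp(path);                 /* creates the file (and its inode) */
    if (fd < 0)
        return 0;
    close(fd);

    snprintf(link_path, sizeof link_path, "%s.lnk", path);
    if (link(path, link_path) == 0) {   /* second name for the same inode */
        if (stat(path, &a) == 0 && stat(link_path, &b) == 0)
            ok = (a.st_dev == b.st_dev && a.st_ino == b.st_ino &&
                  a.st_nlink == 2 && b.st_nlink == 2);
        unlink(link_path);
    }
    unlink(path);
    return ok;
}
```

The two names are two directory entries, but there is only one inode behind them, which is why the inode needs a list of dentries rather than a single pointer.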

#define I_DIRTY_SYNC		(1 << 0)
#define I_DIRTY_DATASYNC	(1 << 1)
#define I_DIRTY_PAGES		(1 << 2)
#define __I_NEW			3
#define I_NEW			(1 << __I_NEW)
#define I_WILL_FREE		(1 << 4)
#define I_FREEING		(1 << 5)
#define I_CLEAR			(1 << 6)
#define __I_SYNC		7
#define I_SYNC			(1 << __I_SYNC)
#define I_REFERENCED		(1 << 8)
#define __I_DIO_WAKEUP		9
#define I_DIO_WAKEUP		(1 << __I_DIO_WAKEUP)
#define I_LINKABLE		(1 << 10)

#define I_DIRTY (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES) /* the inode is "dirty"; the on-disk contents must be updated */

i_op is the table of all the operations of the inode object. It is also defined in include/linux/fs.h:
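
Since i_state is a bit mask, testing for a dirty inode is a single AND against I_DIRTY. The sketch below copies the three dirty flags from the listing above; the helper functions toy_inode_is_dirty and toy_inode_mark_clean are hypothetical and only illustrate the bit manipulation, not the kernel's write-back logic.

```c
#include <assert.h>

/* Flag values copied from the listing above. */
#define I_DIRTY_SYNC      (1 << 0)
#define I_DIRTY_DATASYNC  (1 << 1)
#define I_DIRTY_PAGES     (1 << 2)
#define I_DIRTY (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES)

/* Hypothetical helper: nonzero if any of the three dirty bits is set. */
int toy_inode_is_dirty(unsigned long i_state)
{
    return (i_state & I_DIRTY) != 0;
}

/* Hypothetical helper: clears the dirty bits and keeps all other bits. */
unsigned long toy_inode_mark_clean(unsigned long i_state)
{
    return i_state & ~(unsigned long)I_DIRTY;
}
```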

struct inode_operations {
	struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
	void * (*follow_link) (struct dentry *, struct nameidata *);
	int (*permission) (struct inode *, int);
	struct posix_acl * (*get_acl)(struct inode *, int);

	int (*readlink) (struct dentry *, char __user *,int);
	void (*put_link) (struct dentry *, struct nameidata *, void *);

	int (*create) (struct inode *,struct dentry *, umode_t, bool);
	int (*link) (struct dentry *,struct inode *,struct dentry *);
	int (*unlink) (struct inode *,struct dentry *);
	int (*symlink) (struct inode *,struct dentry *,const char *);
	int (*mkdir) (struct inode *,struct dentry *,umode_t);
	int (*rmdir) (struct inode *,struct dentry *);
	int (*mknod) (struct inode *,struct dentry *,umode_t,dev_t);
	int (*rename) (struct inode *, struct dentry *,
			struct inode *, struct dentry *);
	int (*rename2) (struct inode *, struct dentry *,
			struct inode *, struct dentry *, unsigned int);
	int (*setattr) (struct dentry *, struct iattr *);
	int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *);
	int (*setxattr) (struct dentry *, const char *,const void *,size_t,int);
	ssize_t (*getxattr) (struct dentry *, const char *, void *, size_t);
	ssize_t (*listxattr) (struct dentry *, char *, size_t);
	int (*removexattr) (struct dentry *, const char *);
	int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start,
		      u64 len);
	int (*update_time)(struct inode *, struct timespec *, int);
	int (*atomic_open)(struct inode *, struct dentry *,
			   struct file *, unsigned open_flag,
			   umode_t create_mode, int *opened);
	int (*tmpfile) (struct inode *, struct dentry *, umode_t);
	int (*set_acl)(struct inode *, struct posix_acl *, int);

	/* WARNING: probably going away soon, do not use! */
} ____cacheline_aligned;

Last comes the object for the file system itself, i_sb (the superblock object). Here is its definition:

struct super_block {
	struct list_head	s_list;		/* Keep this first */
	dev_t			s_dev;		/* search index; _not_ kdev_t */
	unsigned char		s_blocksize_bits;
	unsigned long		s_blocksize;
	loff_t			s_maxbytes;	/* Max file size */
	struct file_system_type	*s_type;
	const struct super_operations	*s_op;
	const struct dquot_operations	*dq_op;
	const struct quotactl_ops	*s_qcop;
	const struct export_operations *s_export_op;
	unsigned long		s_flags;
	unsigned long		s_iflags;	/* internal SB_I_* flags */
	unsigned long		s_magic;
	struct dentry		*s_root;
	struct rw_semaphore	s_umount;
	int			s_count;
	atomic_t		s_active;
#ifdef CONFIG_SECURITY
	void                    *s_security;
#endif
	const struct xattr_handler **s_xattr;

	struct list_head	s_inodes;	/* all inodes */
	struct hlist_bl_head	s_anon;		/* anonymous dentries for (nfs) exporting */
	struct list_head	s_mounts;	/* list of mounts; _not_ for fs use */
	struct block_device	*s_bdev;
	struct backing_dev_info *s_bdi;
	struct mtd_info		*s_mtd;
	struct hlist_node	s_instances;
	unsigned int		s_quota_types;	/* Bitmask of supported quota types */
	struct quota_info	s_dquot;	/* Diskquota specific options */

	struct sb_writers	s_writers;

	char s_id[32];				/* Informational name */
	u8 s_uuid[16];				/* UUID */

	void 			*s_fs_info;	/* Filesystem private info */
	unsigned int		s_max_links;
	fmode_t			s_mode;

	/* Granularity of c/m/atime in ns.
	   Cannot be worse than a second */
	u32		   s_time_gran;

	/*
	 * The next field is for VFS *only*. No filesystems have any business
	 * even looking at it. You had been warned.
	 */
	struct mutex s_vfs_rename_mutex;	/* Kludge */

	/*
	 * Filesystem subtype.  If non-empty the filesystem type field
	 * in /proc/mounts will be "type.subtype"
	 */
	char *s_subtype;

	/*
	 * Saved mount options for lazy filesystems using
	 * generic_show_options()
	 */
	char __rcu *s_options;
	const struct dentry_operations *s_d_op; /* default d_op for dentries */

	/*
	 * Saved pool identifier for cleancache (-1 means none)
	 */
	int cleancache_poolid;

	struct shrinker s_shrink;	/* per-sb shrinker handle */

	/* Number of inodes with nlink == 0 but still referenced */
	atomic_long_t s_remove_count;

	/* Being remounted read-only */
	int s_readonly_remount;

	/* AIO completions deferred from interrupt context */
	struct workqueue_struct *s_dio_done_wq;
	struct hlist_head s_pins;

	/*
	 * Keep the lru lists last in the structure so they always sit on their
	 * own individual cachelines.
	 */
	struct list_lru		s_dentry_lru ____cacheline_aligned_in_smp;
	struct list_lru		s_inode_lru ____cacheline_aligned_in_smp;
	struct rcu_head		rcu;

	/*
	 * Indicates how deep in a filesystem stack this SB is
	 */
	int s_stack_depth;
};

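Every file system must implement the superblock object. It stores information about a specific file system and usually corresponds to the file system superblock or file system control block kept in a special sector on disk. File systems that are not disk-based (memory-based file systems such as sysfs) create the superblock on the fly and keep it in memory.

Finally, the superblock operations, also defined in include/linux/fs.h.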
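
Some per-file-system information can be queried from user space with the Linux statfs call. The sketch below reads the file system type and block size for a path. My assumption, not verified against the kernel source here, is that f_type is typically the same magic number a file system keeps in s_magic and that f_bsize usually reflects its block size. The function name fs_magic_and_block_size is hypothetical.

```c
#include <assert.h>
#include <sys/vfs.h>

/*
 * Asks the kernel about the file system that contains path.
 * On success stores the type (magic number) and block size and returns 0;
 * returns -1 if statfs() fails.
 */
int fs_magic_and_block_size(const char *path, long *magic, long *block_size)
{
    struct statfs st;

    if (statfs(path, &st) != 0)
        return -1;
    *magic = (long)st.f_type;
    *block_size = (long)st.f_bsize;
    return 0;
}
```

Calling it on two paths that live on different mounted file systems, for example "/" and "/proc", usually gives two different magic numbers, one per superblock.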

struct super_operations {
   	struct inode *(*alloc_inode)(struct super_block *sb);
	void (*destroy_inode)(struct inode *);

   	void (*dirty_inode) (struct inode *, int flags);
	int (*write_inode) (struct inode *, struct writeback_control *wbc);
	int (*drop_inode) (struct inode *);
	void (*evict_inode) (struct inode *);
	void (*put_super) (struct super_block *);
	int (*sync_fs)(struct super_block *sb, int wait);
	int (*freeze_super) (struct super_block *);
	int (*freeze_fs) (struct super_block *);
	int (*thaw_super) (struct super_block *);
	int (*unfreeze_fs) (struct super_block *);
	int (*statfs) (struct dentry *, struct kstatfs *);
	int (*remount_fs) (struct super_block *, int *, char *);
	void (*umount_begin) (struct super_block *);

	int (*show_options)(struct seq_file *, struct dentry *);
	int (*show_devname)(struct seq_file *, struct dentry *);
	int (*show_path)(struct seq_file *, struct dentry *);
	int (*show_stats)(struct seq_file *, struct dentry *);
#ifdef CONFIG_QUOTA
	ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t);
	ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
	struct dquot **(*get_dquots)(struct inode *);
#endif
	int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
	long (*nr_cached_objects)(struct super_block *, int);
	long (*free_cached_objects)(struct super_block *, long, int);
};

After all that I have confused even myself, so here is a short summary of the four objects.

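Superblock object: stores information about a mounted file system. For disk-based file systems it usually corresponds to a file system control block stored on disk.
Inode object: stores general information about a specific file. For disk-based file systems it usually corresponds to a file control block stored on disk. Each inode object has an inode number that uniquely identifies the file within the file system.
File object: stores information about the interaction between an open file and a process. This information exists in kernel memory only while a process has the file open, so the file object has no corresponding image in the actual file system (as opposed to the virtual file system).
Dentry object: stores information about the link between a directory entry (that is, a particular name of a file) and the corresponding file. The dentry object likewise has no corresponding image in the actual file system.

While studying the file system we also met the "dentry cache"; a similar one is the "inode cache". Both are kinds of "disk cache". A disk cache is a software mechanism that lets the kernel keep in RAM some information that normally lives on disk, so that further accesses to that data are fast and do not need slow disk access.

Related concepts are the "hardware cache" and the "memory cache"; I will analyze them in detail when I come across them.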

In particular, the figure on page 15 uses a concrete example to illustrate the relationships among the four file system objects above.
