Section 3 File I/O
starting: 2017/12/12
3.1 introduction
This section describes the functions available for file I/O. Most file I/O on a UNIX system can be performed using only five functions: open, read, write, lseek, and close. We then examine the effect of various buffer sizes on the read and write functions. Atomic operations become important when we describe the sharing of resources among multiple processes.
3.2 File Descriptors
One: to the kernel, all open files are referred to by file descriptors. Two: a file descriptor is a non-negative integer. Three: when we open an existing file or create a new file, the kernel returns a file descriptor to the process. Four: to read or write a file, we identify it with the file descriptor returned by open or creat, passing it as an argument to read or write.
By convention, UNIX systems associate file descriptor 0 with the standard input of a process, 1 with the standard output, and 2 with the standard error. To improve readability, the magic numbers 0, 1, and 2 should be replaced with the symbolic constants STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO, defined in <unistd.h>.
File descriptors range from 0 through OPEN_MAX - 1.
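A minimal sketch of both conventions, assuming a POSIX system: it writes through STDOUT_FILENO rather than the bare number 1, and asks for the descriptor limit at run time with sysconf(_SC_OPEN_MAX), because OPEN_MAX itself may not be defined in <limits.h> on every system. The function names are hypothetical.

```c
#include <string.h>
#include <unistd.h>

/* Write msg to standard output via the symbolic constant STDOUT_FILENO (1).
 * Returns the byte count reported by write(), or -1 on error. */
ssize_t say_to_stdout(const char *msg)
{
    return write(STDOUT_FILENO, msg, strlen(msg));
}

/* Per-process limit on open descriptors, queried at run time.
 * Valid descriptors are then 0 .. max_open_files() - 1.
 * sysconf() returns -1 if the limit is indeterminate. */
long max_open_files(void)
{
    return sysconf(_SC_OPEN_MAX);
}
```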
3.3 open and openat functions
A file is opened or created by calling either the open or the openat function.
#include<fcntl.h>
int open(const char* path, int flag, ... /** mode_t mode **/);
int openat(int fd, const char* path, int flag, ... /** mode_t mode **/);
These functions have a multitude of options, which are specified by the flag argument. This argument is formed by ORing together one or more constants from the <fcntl.h> header.
Exactly one of the following constants must be specified: O_RDONLY (read only), O_WRONLY (write only), O_RDWR (read and write), O_EXEC (execute only), O_SEARCH (search only; applies to directories).
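The following constants are optional: O_APPEND (append to the end of the file on each write), O_CLOEXEC (set the FD_CLOEXEC file descriptor flag), O_CREAT (create the file if it doesn't exist; requires a third argument to open, the mode, which specifies the access permission bits of the new file), O_EXCL (generate an error if O_CREAT is also specified and the file already exists; the test for existence and the creation of the file are one atomic operation).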
The file descriptor returned by open and openat is guaranteed to be the lowest-numbered unused descriptor.
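The lowest-numbered rule can be observed directly. This sketch (hypothetical function name; /dev/null is used only as a convenient file that always exists) opens two descriptors, closes the lower one and opens again; the freed number should come back rather than a number above the second descriptor.

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if the descriptor freed by close() is handed out again by the
 * next open(), 0 if not, -1 on error. */
int lowest_fd_reused(void)
{
    int a = open("/dev/null", O_RDONLY);   /* lowest free slot right now */
    int b = open("/dev/null", O_RDONLY);   /* next one up stays busy */
    if (a < 0 || b < 0)
        return -1;
    close(a);                              /* free the lower slot again */
    int c = open("/dev/null", O_RDONLY);   /* should reuse a, not go above b */
    int reused = (c == a);
    close(b);
    if (c >= 0)
        close(c);
    return reused;
}
```

Every descriptor below a was already in use when a was opened, so after close(a) that number is again the lowest free one; the check therefore does not depend on how many descriptors the process had open beforehand.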
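#include <fcntl.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* O_EXCL guarantees an atomic operation: the existence check and the creation happen together */
    int fd = open("1.c", O_WRONLY | O_CREAT | O_EXCL, 0666);
    if (fd < 0) {
        printf("errno = %d (%s)\n", errno, strerror(errno));  /* EEXIST if 1.c already exists */
        return 1;
    }
    close(fd);
    return 0;
}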
openat: the fd argument distinguishes three cases.
1. If path is an absolute pathname, fd is ignored and openat behaves like open.
2. If path is a relative pathname, fd specifies the starting location in the file system where the relative pathname is to be evaluated; fd is obtained by opening that directory.
3. If path is a relative pathname and fd is AT_FDCWD, the pathname is evaluated starting in the current working directory, and openat behaves like open.
3.3 filename and pathname truncation
Question: what happens if NAME_MAX is 14 and we try to create a new file in the current directory with a filename containing 15 characters? Either the filename is silently truncated beyond the 14th character, or an error is returned (errno ENAMETOOLONG). With POSIX.1, the constant _POSIX_NO_TRUNC determines whether long filenames and long components of pathnames are truncated or an error is returned. We use fpathconf or pathconf to query a directory to see which behavior is supported.
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        printf("usage: a.out <dirname>\n");
        return 1;
    }
    printf("directory: %s\n", argv[1]);

#ifdef _POSIX_NO_TRUNC
    /* compile-time constant from <unistd.h> */
    printf("_POSIX_NO_TRUNC value: %d\n", _POSIX_NO_TRUNC);
#else
    printf("_POSIX_NO_TRUNC not defined\n");
#endif

#ifdef _PC_PATH_MAX
    long max_path = pathconf(argv[1], _PC_PATH_MAX);   /* pathconf returns long */
    printf("PATH_MAX for %s: %ld\n", argv[1], max_path);
#else
    printf("_PC_PATH_MAX not supported\n");
#endif
    return 0;
}
3.4 creat function
#include<fcntl.h>
int creat(const char* path, mode_t mode); Returns: file descriptor opened for write-only if OK, -1 on error.
Note that this function is equivalent to open(path, O_WRONLY | O_CREAT | O_TRUNC, mode). One deficiency with creat is that the file is opened only for writing, so a new file created with creat must be closed and reopened before it can be read. A better way is to use open:
fd = open(path, O_RDWR | O_CREAT | O_TRUNC, mode);
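A sketch of why O_RDWR matters here, with a hypothetical helper and scratch path: with one open call the program can write the new file, lseek back to the start, and read it again on the same descriptor, which creat alone would not allow.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Create (or truncate) path for reading and writing, write msg, seek back
 * to the start and read it into out. Returns bytes read back, or -1. */
ssize_t write_then_read(const char *path, const char *msg, char *out, size_t outlen)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    ssize_t n = -1;
    size_t len = strlen(msg);
    if (write(fd, msg, len) == (ssize_t)len && lseek(fd, 0, SEEK_SET) == 0)
        n = read(fd, out, outlen);   /* same descriptor: no close/reopen needed */
    close(fd);
    return n;
}
```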
3.5 close function
#include<unistd.h>
int close(int fd); return 0 if OK, -1 on error.
When a process terminates, all of its open files are closed automatically by the kernel.
3.6 lseek function
#include<unistd.h>
off_t lseek(int fd, off_t offset, int whence); return: new file offset if OK, -1 on error.
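whence: SEEK_SET (the offset is set to offset bytes from the beginning of the file), SEEK_CUR (the offset is set to its current value plus offset, which can be positive or negative), SEEK_END (the offset is set to the size of the file plus offset, which can be positive or negative).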
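Because a successful call to lseek returns the new file offset, we can seek zero bytes from the current position to determine the current offset. lseek only records the current file offset within the kernel; it does not cause any I/O to take place. This offset is then used by the next read or write operation. If the descriptor refers to a pipe, FIFO, or socket, lseek returns -1 and sets errno to ESPIPE.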
The file's offset can be greater than the file's current size, in which case the next write to the file will extend the file. This creates a hole in the file: any bytes that have not been written are read back as 0, and the hole is not required to have disk blocks allocated for it.
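The hole can be produced with nothing more than lseek and write. This sketch (hypothetical path and function names; the offsets are arbitrary) writes 10 bytes, seeks to offset 16384, and writes 10 more, then reads individual bytes back: bytes inside the hole read as 0. Whether the hole actually saves disk blocks depends on the file system, so the sketch only checks the size and contents.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create path with a hole between offsets 10 and 16384.
 * Returns the resulting file size, or -1 on error. */
off_t make_hole(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    off_t size = -1;
    struct stat st;
    if (write(fd, "abcdefghij", 10) == 10 &&
        lseek(fd, 16384, SEEK_SET) == 16384 &&   /* jump past end of file */
        write(fd, "ABCDEFGHIJ", 10) == 10 &&     /* this write extends the file */
        fstat(fd, &st) == 0)
        size = st.st_size;
    close(fd);
    return size;
}

/* Read the single byte at offset off; returns its value, or -1 on error. */
int byte_at(const char *path, off_t off)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    unsigned char c;
    int r = -1;
    if (lseek(fd, off, SEEK_SET) == off && read(fd, &c, 1) == 1)
        r = c;
    close(fd);
    return r;
}
```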
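The size of off_t, and therefore the largest possible file offset, depends on the programming environment. An application can check whether, for example, the environment with 32-bit int, long, pointer, and off_t types is supported through the _POSIX_V7_ILP32_OFF32 constant, or at run time with sysconf(_SC_V7_ILP32_OFF32).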
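A quick way to see which environment a program was compiled for is to look at the width of off_t, which bounds lseek offsets. The sketch below (hypothetical names) also queries _SC_V7_ILP32_OFF32, guarded by #ifdef since not every system defines that name.

```c
#include <limits.h>
#include <sys/types.h>
#include <unistd.h>

/* Width of off_t in bits for this compilation environment. With a 32-bit
 * off_t, lseek offsets (and so file sizes) stop at 2^31 - 1 bytes. */
int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}

/* Run-time check for the 32-bit int/long/pointer/off_t environment.
 * sysconf() returns -1 if that environment is not supported; the
 * #ifdef covers systems that do not define the name at all. */
long ilp32_off32_support(void)
{
#ifdef _SC_V7_ILP32_OFF32
    return sysconf(_SC_V7_ILP32_OFF32);
#else
    return -1;
#endif
}
```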
3.7 read function: read from an open file
#include<unistd.h>
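ssize_t read(int fd, void* buffer /** generic pointer **/, size_t nbytes); Returns: number of bytes read, 0 if end of file, -1 on error. The number of bytes read can be less than nbytes, for example when end of file is reached first or when reading from a pipe or terminal.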
3.8 write function: write to an open file
#include<unistd.h>
ssize_t write(int fd, const void* buffer, size_t nbytes); Returns: number of bytes written if OK, -1 on error. The return value is usually equal to nbytes; otherwise, an error has occurred. A common cause for a write error is either filling up a disk or exceeding the file size limit for a given process.
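Because write may return a count smaller than nbytes (for example on a nonblocking descriptor, or if a signal arrives after some data has been transferred), code that must write a whole buffer usually loops. A sketch of such a loop under the hypothetical name write_all: EINTR is retried, while errors such as ENOSPC (disk full) or EFBIG (file size limit exceeded) are returned to the caller.

```c
#include <errno.h>
#include <unistd.h>

/* Keep calling write() until all n bytes are written.
 * Returns n on success, -1 on error (errno is left set by write). */
ssize_t write_all(int fd, const void *buf, size_t n)
{
    const char *p = buf;
    size_t left = n;
    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;              /* real error, e.g. ENOSPC or EFBIG */
        }
        p += w;                     /* partial write: advance and go again */
        left -= (size_t)w;
    }
    return (ssize_t)n;
}
```

On the read side, a return of 0 means end of file: once every write end of a pipe is closed, read on the read end returns 0.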
3.9 I/O efficiency
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

#define BUFFSIZE 4096

int main(void)
{
    ssize_t n;
    char buffer[BUFFSIZE];

    /* copy standard input to standard output, BUFFSIZE bytes at a time */
    while ((n = read(STDIN_FILENO, buffer, BUFFSIZE)) > 0)
        if (write(STDOUT_FILENO, buffer, n) != n) {
            fprintf(stderr, "write error\n");
            exit(1);
        }
    if (n < 0) {
        fprintf(stderr, "read error\n");
        exit(1);
    }
    return 0;
}
Some caveats apply to this program:
One: it reads from standard input and writes to standard output, so the user can redirect them (for example, ./a.out < infile > outfile). Two: it does not close the input or output file, because when the process terminates, the kernel closes all of its open file descriptors. Three: it works for both text files and binary files, since there is no difference between the two to the UNIX kernel.
Let's run the program using different values for BUFFSIZE. In the book's measurements, once BUFFSIZE reaches 4096, increasing the buffer size further has little positive effect on the system CPU time. Most file systems support some kind of read-ahead to improve performance: when sequential reads are detected, the system tries to read in more data than an application requests, assuming that the application will read it shortly.
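Timing results depend on the machine, but one effect of BUFFSIZE is easy to count: the number of read system calls. This hypothetical helper drains a descriptor with reads of at most bufsize bytes and reports how many read calls that took, including the final one that returns 0. For a regular file of 8192 bytes, a 4096-byte buffer needs 3 calls, while a 1-byte buffer needs 8193.

```c
#include <unistd.h>

/* Drain fd with read() calls of at most bufsize bytes and return how many
 * read() calls were made, including the final one that returns 0 at end
 * of file. Returns -1 on a read error or an unsupported bufsize. */
long count_reads(int fd, size_t bufsize)
{
    char buf[65536];                    /* scratch space; data is discarded */
    if (bufsize == 0 || bufsize > sizeof buf)
        return -1;
    long calls = 0;
    ssize_t n;
    do {
        n = read(fd, buf, bufsize);
        calls++;
    } while (n > 0);
    return n < 0 ? -1 : calls;
}
```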
3.10 file sharing
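Why: the UNIX System supports the sharing of open files among different processes. How: the kernel uses three data structures to represent an open file.
1. Process table entry: each process has an entry in the process table, which contains a table of open file descriptors. Each descriptor entry holds (a) the file descriptor flags and (b) a pointer to a file table entry.
2. File table entry: the kernel maintains a file table for all open files. Each entry holds (a) the file status flags (read, write, append, sync, nonblocking, and so on), (b) the current file offset, and (c) a pointer to the v-node table entry for the file.
3. V-node structure: each open file (or device) has a v-node that (a) contains the type of file and pointers to functions that operate on the file, and (b) for most files, also contains the i-node for the file.
The i-node structure contains the owner of the file, the size of the file, pointers to where the actual data blocks for the file are located on disk, and so on.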
Case 1: if two independent processes have the same file open, each process gets its own file table entry (so each process has its own current offset for the file), but only a single v-node table entry is required for a given file.
Case 2: more than one file descriptor entry can point to the same file table entry, as happens with dup. This also happens after fork, when the parent and child share the same file table entry for each open descriptor.
Note the difference between the file descriptor flags and the file status flags: the former apply only to a single descriptor in a single process, whereas the latter apply to all descriptors in any process that point to the given file table entry.
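The two cases can be told apart by watching the current file offset. A sketch with hypothetical names and path: the file is opened twice (two file table entries), and one descriptor is also duplicated with dup (shared file table entry). After writing 4 bytes through the first descriptor, the independently opened one still sits at offset 0 while the duplicate sees offset 4.

```c
#include <fcntl.h>
#include <unistd.h>

/* Reports the offsets seen by an independent open and by a dup after
 * writing 4 bytes through the original descriptor. Returns 0 or -1. */
int offset_sharing(const char *path, off_t *indep, off_t *shared)
{
    int rc = -1;
    int a = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    int b = open(path, O_RDWR);          /* second open(): its own file table entry */
    int c = (a >= 0) ? dup(a) : -1;      /* dup(): shares a's file table entry */

    if (a >= 0 && b >= 0 && c >= 0 && write(a, "abcd", 4) == 4) {
        *indep = lseek(b, 0, SEEK_CUR);  /* b's offset was not touched */
        *shared = lseek(c, 0, SEEK_CUR); /* c sees the offset a just advanced */
        rc = 0;
    }
    if (a >= 0) close(a);
    if (b >= 0) close(b);
    if (c >= 0) close(c);
    return rc;
}
```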
3.11 atomic operations
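When a process positions the file with lseek and then calls read or write, these are two separate function calls. There is always the possibility that the kernel might temporarily suspend the process between the two function calls, letting another process that shares the file modify it in between.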
The Single UNIX Specification includes two functions that allow applications to seek and perform I/O atomically: pread and pwrite.
case 1: lseek and read or write as an atomic operation.
#include<unistd.h>
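ssize_t pread(int fd, void* buf, size_t nbytes, off_t offset); Returns: number of bytes read, 0 if end of file, -1 on error
ssize_t pwrite(int fd, const void* buf, size_t nbytes, off_t offset); Returns: number of bytes written if OK, -1 on error
Calling pread is equivalent to calling lseek followed by read, except that the two steps are one atomic operation and the current file offset is not updated; pwrite is the same for lseek followed by write.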
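A sketch of positional I/O (hypothetical helper and path): pwrite places 3 bytes at offset 10 and pread reads them back without any lseek call; afterwards the descriptor's own file offset is still 0, because pread and pwrite do not update it.

```c
#include <fcntl.h>
#include <unistd.h>

/* Write "XYZ" at offset 10, read it back at the same offset, and report
 * the descriptor's current offset via *cur. Returns 0 if the data matches. */
int positional_io(const char *path, off_t *cur)
{
    char buf[3];
    int rc = -1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (pwrite(fd, "XYZ", 3, 10) == 3 &&       /* seek + write in one call */
        pread(fd, buf, 3, 10) == 3 &&          /* seek + read in one call */
        buf[0] == 'X' && buf[1] == 'Y' && buf[2] == 'Z')
        rc = 0;
    *cur = lseek(fd, 0, SEEK_CUR);             /* still 0: offset untouched */
    close(fd);
    return rc;
}
```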
case 2:creating a file
O_CREAT | O_EXCL makes the test for existence and the creation one step. Without it, we might try:
/* NOT atomic: another process may create the file between open and creat */
if ((fd = open(path, O_WRONLY)) < 0) {
    if (errno == ENOENT) {
        if ((fd = creat(path, mode)) < 0)
            printf("creat fail\n");
    } else {
        printf("open fail\n");
    }
}
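If another process creates the file between the open and the creat and writes data to it, that data is erased when creat truncates the file. Combining the test for existence and the creation into a single atomic operation avoids this problem. In general, an atomic operation is composed of multiple steps: if it is performed atomically, either all the steps are performed (on success) or none are performed (on failure).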
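One way to fix the race in the fragment above without losing the open-or-create behavior is sketched below (hypothetical name open_or_create): the creation step uses O_CREAT | O_EXCL, and if another process wins the race (EEXIST), the loop simply opens the file that process created instead of truncating it. If the caller does not need to know who created the file, a single open(path, O_WRONLY | O_CREAT, mode) is also atomic.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Open path for writing, creating it if it does not exist.
 * *created is set to 1 if this call created the file, 0 otherwise.
 * Returns the descriptor, or -1 with errno set. */
int open_or_create(const char *path, mode_t mode, int *created)
{
    for (;;) {
        int fd = open(path, O_WRONLY);                        /* already there? */
        if (fd >= 0) {
            *created = 0;
            return fd;
        }
        if (errno != ENOENT)
            return -1;                                        /* e.g. EACCES */
        fd = open(path, O_WRONLY | O_CREAT | O_EXCL, mode);   /* atomic test-and-create */
        if (fd >= 0) {
            *created = 1;
            return fd;
        }
        if (errno != EEXIST)
            return -1;
        /* another process created it between our two calls: loop and open it */
    }
}
```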
3.12 dup and dup2 functions: duplicating descriptors
#include<unistd.h>
int dup(int fd);
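int dup2(int fd, int fd2); Both return: new file descriptor if OK, -1 on error
The new descriptor returned by dup is guaranteed to be the lowest-numbered available file descriptor. With dup2, we specify the value of the new descriptor with fd2: if fd2 is already open, it is first closed; if fd equals fd2, dup2 returns fd2 without closing it. The new descriptor shares the same file table entry as fd, and its close-on-exec file descriptor flag (FD_CLOEXEC) is always cleared by the dup functions.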
3.13 sync, fsync, and fdatasync functions
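Traditional implementations of the UNIX System have a buffer cache or page cache in the kernel through which most disk I/O passes. When we write data to a file, the data is normally copied into one of these buffers and queued for writing to disk at some later time. This is called delayed write.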
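#include<unistd.h> // keep the disk consistent with the contents of the buffer cache
int fsync(int fd); // waits until the data (and file attributes) of fd have actually been written to disk
int fdatasync(int fd); /** only the data portions of a file **/ Both return: 0 if OK, -1 on error
void sync(void); /** queues all modified buffers for writing; need not wait for the writes to finish **/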
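A sketch of using fsync or fdatasync after a write, assuming a system that provides fdatasync (Linux does; it is an optional part of POSIX). The helper name and path are hypothetical. close does not flush the kernel's buffers, so the explicit call is what waits for the record to reach the disk.

```c
#include <fcntl.h>
#include <unistd.h>

/* Append rec to path and wait until it is on disk.
 * data_only != 0 uses fdatasync (data plus only the metadata needed to
 * read it back); otherwise fsync (data and all file attributes).
 * Returns 0 on success, -1 on error. */
int append_durably(const char *path, const char *rec, size_t len, int data_only)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;
    int rc = -1;
    if (write(fd, rec, len) == (ssize_t)len)
        rc = data_only ? fdatasync(fd) : fsync(fd);
    close(fd);      /* close() alone does not imply the data is on disk */
    return rc;
}
```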
3.14 fcntl function
The fcntl function can change the properties of a file that is already open.
#include<fcntl.h> /** fcntl = file control **/
int fcntl(int fd, int cmd, ... /* int arg */); Returns: depends on cmd if OK, -1 on error
The fcntl function is used for five different purposes.
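Case 1: duplicate an existing descriptor (cmd = F_DUPFD or cmd = F_DUPFD_CLOEXEC). The new descriptor is the lowest-numbered available descriptor greater than or equal to the third argument; F_DUPFD clears its FD_CLOEXEC file descriptor flag, while F_DUPFD_CLOEXEC sets it.
Case 2: get/set file descriptor flags (cmd = F_GETFD or cmd = F_SETFD). Currently only one file descriptor flag is defined: the FD_CLOEXEC flag.
Case 3: get/set file status flags (cmd = F_GETFL or cmd = F_SETFL). The file status flags include O_RDONLY, O_WRONLY, O_RDWR, O_EXEC, O_SEARCH, O_APPEND, O_NONBLOCK, O_SYNC, O_DSYNC, O_RSYNC, and so on. The five access-mode flags (O_RDONLY, O_WRONLY, O_RDWR, O_EXEC, O_SEARCH) are not separate bits, must be extracted with the O_ACCMODE mask, and cannot be changed. The only flags F_SETFL can change are O_APPEND, O_NONBLOCK, O_SYNC, O_DSYNC, O_RSYNC, O_FSYNC, and O_ASYNC. O_SYNC makes writes go synchronously to disk, but exactly how it behaves differs from system to system.
Case 4: get/set asynchronous I/O ownership (cmd = F_GETOWN or cmd = F_SETOWN).
Case 5: get/set record locks (cmd = F_GETLK, F_SETLK, or F_SETLKW).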
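#include <fcntl.h>
#include <stdio.h>

/* flags are file status flags to turn on */
void set_fl(int fd, int flags)
{
    int val;

    if ((val = fcntl(fd, F_GETFL, 0)) < 0) {
        perror("fcntl F_GETFL");
        return;
    }
    val |= flags;       /* turn on flags; a clr_fl would use val &= ~flags */
    if (fcntl(fd, F_SETFL, val) < 0)
        perror("fcntl F_SETFL");
}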
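Two hypothetical helpers for reading the file status flags: access_mode masks the F_GETFL result with O_ACCMODE because the access-mode values are not separate bits, and has_status_flag tests a single-bit flag such as O_APPEND.

```c
#include <fcntl.h>
#include <unistd.h>

/* Access mode of fd (O_RDONLY, O_WRONLY or O_RDWR), or -1 on error.
 * Mask with O_ACCMODE first: O_RDONLY is typically 0, not a bit. */
int access_mode(int fd)
{
    int val = fcntl(fd, F_GETFL, 0);
    return val < 0 ? -1 : (val & O_ACCMODE);
}

/* 1 if a single-bit status flag (e.g. O_APPEND, O_NONBLOCK) is set on fd,
 * 0 if not, -1 on error. */
int has_status_flag(int fd, int flag)
{
    int val = fcntl(fd, F_GETFL, 0);
    if (val < 0)
        return -1;
    return (val & flag) != 0;
}
```

To turn a flag on, read the flags with F_GETFL, OR in the new flag, and write them back with F_SETFL, the same get-modify-set pattern as set_fl.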
3.15 ioctl function
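#include<unistd.h> /** System V **/
#include<sys/ioctl.h> /** Linux and BSD **/
int ioctl(int fd, int request, ...); Returns: -1 on error, something else if OK. There is usually only one more argument, typically a pointer to a variable or a structure. ioctl is the catch-all for I/O operations that cannot be expressed with the other functions in this chapter; terminal I/O is its biggest user.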
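As an example of the extra argument being a pointer, the sketch below asks how many bytes are waiting to be read on a descriptor with FIONREAD. FIONREAD is not specified by POSIX; this assumes a system that provides it through <sys/ioctl.h>, such as Linux, the BSDs, or macOS. The function name is hypothetical.

```c
#include <sys/ioctl.h>
#include <unistd.h>

/* Number of bytes that can be read from fd without blocking, or -1 on error.
 * ioctl stores the count through the int pointer passed as third argument. */
int bytes_pending(int fd)
{
    int n = 0;
    if (ioctl(fd, FIONREAD, &n) < 0)
        return -1;
    return n;
}
```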
3.16 /dev/fd
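Opening the file /dev/fd/n is equivalent to duplicating descriptor n, assuming that descriptor n is open. The mode given to open must be a subset of the mode with which descriptor n was originally opened (some systems ignore the mode instead). Linux is an exception: there /dev/fd/n is a symbolic link to the underlying file, so opening it reopens the file rather than duplicating the descriptor.
Calling creat on /dev/fd/0 still returns a file descriptor, but it cannot be used for any I/O that descriptor 0 itself does not allow: the mode can only be a subset of the original mode.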
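Question:
1. "Unbuffered" only means that the user process has no buffer: each read or write invokes a system call, but the kernel still buffers the data.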