Reading Complex CSV Files with a Tcl Script

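I wrote a test tool in Tcl/Tk and needed a Tcl script to read CSV files. In a complex CSV file, however, any cell may contain commas, double quotes, or line breaks, including line breaks inside a quoted cell, which makes the file hard to parse. Most of the examples I found online read the input one character at a time; after using them for a while, I found them rather slow. After some study I wrote my own Tcl code for reading CSV files, shown below: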

proc readCSV { channel { header 1 } { symbol , }} {
	set quote 0	
	set data [ split [ read -nonewline $channel ] "\n" ]
	foreach line $data {
		set quote [ expr { $quote + [ regexp -all \" $line ]}]
		if { [ expr { $quote % 2 }] == "0" } {
			set quote 0
			append row_temp $line
			set row_temp [ split $row_temp $symbol ]	
			foreach section $row_temp {
				set quote [ expr { $quote + [ regexp -all \" $section ]}]
				if { [ expr { $quote % 2 }] == "0" } {
					append cell_temp $section
					set cell_temp [ regsub {"(.*)"} $cell_temp {\1} ]
					lappend cell $cell_temp
					unset cell_temp
					set quote 0
				} else {
					append cell_temp $section$symbol
				}
			}
			lappend final [ regsub -all {""} $cell \" ]
			unset cell
			unset row_temp
		} else {
			append row_temp $line\n
		}
	}
	# generate array if needed, or return $final here
	set row [ llength $final ]
	set column [ llength [ lindex $final 0 ]]
	if { $header == 1 } {
		for { set i 0 } { $i < $row } { incr i } {		
			for { set j 0 } { $j < $column } { incr j } {
				set csvData([ lindex [ lindex $final 0 ] $j ],$i) [ lindex [ lindex $final $i ] $j ]
			}
		}
	} else {
		for { set i 0 } { $i < $row } { incr i } {		
			for { set j 0 } { $j < $column } { incr j } {
				set csvData($i,$j) [ lindex [ lindex $final $i ] $j ]
			}
		}
	}
	return [ array get csvData ]
}
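
The core idea of the proc above is quote parity: a physical line only completes a record once the running count of double quotes is even. The following is a minimal sketch of just that step, not the full parser; the helper name quotesBalanced and the sample lines are assumptions made for illustration.

```tcl
# Hypothetical helper: returns 1 when the text holds an even number of
# double quotes, meaning no quoted cell is still open.
proc quotesBalanced {text} {
    expr {[regexp -all {"} $text] % 2 == 0}
}

# Four physical lines that form two logical records; the first record
# has a quoted cell spanning three lines.
set physicalLines [list \
    {Li_si,14,"Address2, xxxx} \
    {aaaa""} \
    {bbbbb"".,"} \
    {Wang_wu,15,Address3}]

set records {}
set pending ""
foreach line $physicalLines {
    append pending $line
    if {[quotesBalanced $pending]} {
        # Even quote count: the record is complete.
        lappend records $pending
        set pending ""
    } else {
        # Odd quote count: the line break belongs to the open cell.
        append pending "\n"
    }
}
puts [llength $records]   ;# 2
```

The same parity test is then applied a second time inside each record, to decide whether a separator ends a cell or sits inside a quoted one.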

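The function returns the contents of an array as a key/value list (the result of array get), which the caller loads with array set. By default the first row of the CSV file is treated as the header and the separator is ","; both can be changed.

It handles cells that contain commas (","), double quotes, and line breaks ("\n").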
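
To see why these characters matter, the short sketch below applies a naive split to one record. The record itself is a made-up example: a single logical row whose second cell contains a comma, an escaped quote pair, and a line break.

```tcl
# Hypothetical record: 2 cells, the second one quoted.
set record "Zhang_san,\"Address1, \"\"Main\"\" St\nFloor 2\""

# Naive approach: split on line breaks, then on commas.
set lines [split $record "\n"]
set firstLineCells [split [lindex $lines 0] ,]

puts [llength $lines]           ;# 2 physical lines for 1 logical record
puts [llength $firstLineCells]  ;# 3 fragments instead of 2 cells
```

Both counts are wrong for the data, which is why the parser has to track whether it is currently inside a quoted cell.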

Example:

The following prints a cell addressed by header and row number:

set csv [ open c:/testcase.csv {RDWR} ]
array set csvData [ readCSV $csv ]
puts $csvData(Name,1)    ;# assume there is a cell containing "Name" at first row.

The following prints a cell addressed by row number and column number:

set csv [ open c:/testcase.csv {RDWR} ]
array set csvData [ readCSV $csv 0 ]
puts $csvData(3,1)   
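
Because header mode stores each cell under a "header,row" key, a whole column can be walked with a simple loop. The sketch below uses a small hand-built array as a stand-in for the result of readCSV, so the values are assumptions for illustration only.

```tcl
# Hand-built stand-in for: array set csvData [ readCSV $csv ]
array set csvData {
    Name,0 Name      Age,0 Age
    Name,1 Zhang_san Age,1 13
    Name,2 Li_si     Age,2 14
}

# Count rows from the keys of one column; row 0 holds the header itself.
set rowCount [llength [array names csvData Name,*]]

set names {}
for {set i 1} {$i < $rowCount} {incr i} {
    lappend names $csvData(Name,$i)
}
puts $names   ;# Zhang_san Li_si
```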

Efficiency:
In testing, processing a 2000 x 4 test-case file took about 100 ms.

-----------------------------------

CPU: Dual-Core 3.20GHz

Memory: 2G

System Type: 32bit

-----------------------------------
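
The post does not say how the timings were taken. One common way to measure this kind of thing in Tcl is the built-in time command, sketched below with a stand-in workload (the proc workload is hypothetical and only splits simple lines; it is not the parser above).

```tcl
# Stand-in workload: split 2000 four-cell lines on commas.
set lines [lrepeat 2000 "a,b,c,d"]
proc workload {lines} {
    set out {}
    foreach line $lines { lappend out [split $line ,] }
    return $out
}

# time returns a string of the form "<n> microseconds per iteration";
# the first word is the average over the requested number of runs.
set result [time { workload $lines } 10]
set msPerRun [expr {[lindex $result 0] / 1000.0}]
puts [format "%.3f ms per run" $msPerRun]
```

Averaging over several iterations smooths out one-off costs such as bytecode compilation on the first call.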

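Tcl has a dedicated package for processing CSV files, called csv, so I compared the two. When both return the processed data as a list, this function is somewhat faster.

Usage of the csv package: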

package require csv
package require struct::queue

set csv [ open c:/testcase.csv {RDWR} ]

::struct::queue q
::csv::read2queue $csv q
set final [ q peek [ q size ]]

Capacity    readCSV    csv package    File size
2000*4      103ms      170ms          768KB
2000*8      200ms      335ms          1534KB
2000*16     382ms      770ms          3065KB
2000*32     760ms      2088ms         6127KB
2000*64     1501ms     6411ms         12252KB
2000*128    2995ms     21841ms        24501KB

Output:

The output matches the CSV file contents as displayed in Excel.

Class form:

package require Itcl

itcl::class readCSV {
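	# Instance variables, so that each object keeps its own data and cursor
	# (with "common", all objects of the class would share them).
	variable final
	variable anchor 1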
	constructor { path } {
		set quote 0
		set channel [ open $path {RDWR} ]
		set data [ split [ read -nonewline $channel ] "\n" ]
		close $channel
			foreach line $data {
				set quote [ expr { $quote + [ regexp -all \" $line ]}]
				if { [ expr { $quote % 2 }] == "0" } {
					set quote 0
					append row_temp $line
					set row_temp [ split $row_temp , ]	
					foreach section $row_temp {
						set quote [ expr { $quote + [ regexp -all \" $section ]}]
						if { [ expr { $quote % 2 }] == "0" } {
							append cell_temp $section
							set cell_temp [ regsub {"(.*)"} $cell_temp {\1} ]
							lappend cell $cell_temp
							unset cell_temp
							set quote 0
						} else {
							append cell_temp $section,
						}
					}
					lappend final [ regsub -all {""} $cell \" ]
					unset cell
					unset row_temp
				} else {
					append row_temp $line\n
				}
			}
	}
	
	method getCell { row col } {
		return [ lindex [ lindex $final $row ] $col ]
	}
	
	method getValue { header } {
		set col [ lsearch [ lindex $final 0 ] $header ]
		return [ getCell $anchor $col ]
	}
	
	method next { } {
		if { [ done ] == 0 } {
			incr anchor
		}
	}
	
	method pre { } {
		if { $anchor > 1 } {
			incr anchor -1
		}
	}
	
	method end { } {
		set anchor [ expr {[ llength $final ]-1}]
	}
	
	method done { } {
		if { $anchor == [ expr {[ llength $final ]-1} ]} {
			return 1
		} else {
			return 0
		}
	}
	
	method reset { } {
		set anchor 1
	}
	
}	

Sample CSV file contents used in the example below (multi-line Address cells are shown indented):

Name       Age  Address
Zhang_san  13   Address1:
                1. aaaaa
                2. aaad "bbbb",
                3. bacad,
                adfa"aaa".
Li_si      14   Address2, xxxx
                aaaa"
                bbbbb".,
Wang_wu    15   Address3

Example:
readCSV f c:/csvfile.csv
f getValue Name
output:

Zhang_san

f next
f getValue Name
output:

Li_si

f pre
f getValue Name
f end
f getValue Name
f getCell 1 0
output:

Zhang_san

Wang_wu

Zhang_san