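Reading Complex CSV Files with a Tcl Script
阿新 • Published: 2019-01-01
I wrote a test tool in Tcl/Tk that needs to read CSV files from a Tcl script. In a complex CSV file, however, a cell may contain commas, double quotes, line breaks, or line breaks inside a double-quoted string, all of which make parsing awkward. Most of the examples I found online read the input one character at a time; after using them for a while I found them rather slow. After some study I wrote my own Tcl code for reading CSV files, shown below: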
proc readCSV { channel { header 1 } { symbol , } } {
    set quote 0
    set final {}
    set row_temp ""
    set data [ split [ read -nonewline $channel ] "\n" ]
    foreach line $data {
        # Running count of double quotes in the current record.
        incr quote [ regexp -all \" $line ]
        if { $quote % 2 == 0 } {
            # Even count: every quoted cell is closed, so the record is complete.
            set quote 0
            append row_temp $line
            set cell {}
            set cell_temp ""
            foreach section [ split $row_temp $symbol ] {
                incr quote [ regexp -all \" $section ]
                if { $quote % 2 == 0 } {
                    append cell_temp $section
                    # Strip the enclosing quotes, then collapse doubled quotes.
                    set cell_temp [ regsub {"(.*)"} $cell_temp {\1} ]
                    lappend cell [ string map {{""} {"}} $cell_temp ]
                    set cell_temp ""
                    set quote 0
                } else {
                    # Odd count: the separator was inside a quoted cell, so put it back.
                    append cell_temp $section$symbol
                }
            }
            lappend final $cell
            set row_temp ""
        } else {
            # Odd count: a quoted cell continues on the next line.
            append row_temp $line\n
        }
    }
    # generate array if needed, or return $final here
    set row [ llength $final ]
    set column [ llength [ lindex $final 0 ] ]
    if { $header == 1 } {
        for { set i 0 } { $i < $row } { incr i } {
            for { set j 0 } { $j < $column } { incr j } {
                set csvData([ lindex [ lindex $final 0 ] $j ],$i) [ lindex [ lindex $final $i ] $j ]
            }
        }
    } else {
        for { set i 0 } { $i < $row } { incr i } {
            for { set j 0 } { $j < $column } { incr j } {
                set csvData($i,$j) [ lindex [ lindex $final $i ] $j ]
            }
        }
    }
    return [ array get csvData ]
}
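The proc returns the array contents as a key/value list (from array get), which the caller loads with array set. By default the first row of the CSV file is treated as the header and the separator is ","; both can be changed.
It handles cells that contain commas (,), double quotes ("), and line breaks (\n).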
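The multi-line handling rests on counting double quotes: a physical line (or a comma-separated fragment) with an odd number of quotes ends inside a quoted cell, so more text has to be appended before it is complete. A minimal sketch of that check follows; the helper name quotesBalanced is made up for illustration and is not part of readCSV.

```tcl
# Hypothetical helper (not part of readCSV): true when text holds an
# even number of double quotes, i.e. no quoted cell is left open.
proc quotesBalanced { text } {
    return [ expr { [ regexp -all {"} $text ] % 2 == 0 } ]
}

puts [ quotesBalanced {a,"b,c",d} ]        ;# prints 1
puts [ quotesBalanced {a,"first line} ]    ;# prints 0
puts [ quotesBalanced "a,\"first line\nsecond line\",d" ]  ;# prints 1
```

The same test is applied twice in readCSV: once per line to find where a record ends, and once per fragment to decide whether a comma was a real separator.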
Example:
The following prints a cell addressed by header and row number:
set csv [ open c:/testcase.csv r ]
array set csvData [ readCSV $csv ]
close $csv
puts $csvData(Name,1) ;# assumes the first row has a cell containing "Name"
The following prints a cell addressed by row number and column number:
set csv [ open c:/testcase.csv r ]
array set csvData [ readCSV $csv 0 ]
close $csv
puts $csvData(3,1)
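The two key layouts are easiest to see on a small in-memory table. This sketch builds both arrays from a hand-written list of rows, mirroring what readCSV does after parsing; the sample values and the names rows, byHeader, and byIndex are illustrative only.

```tcl
# Illustrative data: one header row followed by two data rows.
set rows {
    {Name Age}
    {Zhang_san 13}
    {Li_si 14}
}
set headers [lindex $rows 0]

set i 0
foreach row $rows {
    set j 0
    foreach name $headers value $row {
        set byHeader($name,$i) $value   ;# header mode: key is header,row
        set byIndex($i,$j) $value       ;# index mode: key is row,column
        incr j
    }
    incr i
}

puts $byHeader(Age,2)   ;# prints 14
puts $byIndex(1,0)      ;# prints Zhang_san
```

In header mode, row 0 holds the header names themselves, so data rows start at index 1.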
Efficiency:
In my tests, processing a 2000 x 4 test-case file takes about 100 ms.
-----------------------------------
CPU: Dual-Core 3.20GHz
Memory: 2G
System Type: 32bit
-----------------------------------
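Tcl also has a dedicated package for CSV files, called csv (part of Tcllib), so I compared the two. When both return the processed data as a list, this proc is somewhat faster.
Using the csv package: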
package require csv
package require struct::queue
set csv [ open c:/testcase.csv r ]
::struct::queue q
::csv::read2queue $csv q
set final [ q peek [ q size ]]
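read2queue loads the whole file in one call. For record-at-a-time processing, the same package also provides csv::iscomplete and csv::split. The sketch below assumes Tcllib is installed and uses made-up sample lines; it joins physical lines until a record is complete and then splits it.

```tcl
package require csv

# Made-up input: the first record spans two physical lines.
set lines [list {Li_si,14,"Address2,} {second line"} {Wang_wu,15,Address3}]

set pending ""
set rows {}
foreach line $lines {
    append pending $line
    if { [csv::iscomplete $pending] } {
        lappend rows [csv::split $pending]
        set pending ""
    } else {
        # A quoted cell is still open: restore the line break and keep reading.
        append pending "\n"
    }
}

puts [llength $rows]
foreach row $rows {
    puts [llength $row]
}
```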
Capacity (rows*columns) | readCSV | csv package | File size |
---|---|---|---|
2000*4 | 103ms | 170ms | 768KB |
2000*8 | 200ms | 335ms | 1534KB |
2000*16 | 382ms | 770ms | 3065KB |
2000*32 | 760ms | 2088ms | 6127KB |
2000*64 | 1501ms | 6411ms | 12252KB |
2000*128 | 2995ms | 21841ms | 24501KB |
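One way to collect timings of this kind is Tcl's built-in time command. This is not necessarily how the figures above were obtained. The sketch assumes Tcl 8.6 (for file tempfile), generates made-up test data, and measures loadRows, a hypothetical stand-in for whichever parser is being timed.

```tcl
# Write a throwaway 2000 x 4 CSV file (made-up content).
set out [file tempfile path]
for {set i 0} {$i < 2000} {incr i} {
    puts $out "name$i,$i,\"addr, $i\",x"
}
close $out

# Hypothetical stand-in for the parser being measured.
proc loadRows { path } {
    set in [open $path r]
    set rows [split [read -nonewline $in] "\n"]
    close $in
    return $rows
}

# time runs the script 10 times and reports microseconds per iteration.
set result [time { loadRows $path } 10]
set ms [expr { [lindex $result 0] / 1000.0 }]
puts [format "%.3f ms per run" $ms]

file delete $path
```

Averaging over several iterations smooths out one-off costs such as the first read from disk.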
Output:
The output matches the content of the CSV file as shown in Excel.
As a class:
package require Itcl
itcl::class readCSV {
    # Per-object state; "common" would share one copy across all objects.
    variable final {}
    variable anchor 1
    constructor { path } {
        set quote 0
        set row_temp ""
        set channel [ open $path r ]
        set data [ split [ read -nonewline $channel ] "\n" ]
        close $channel
        foreach line $data {
            # Running count of double quotes in the current record.
            incr quote [ regexp -all \" $line ]
            if { $quote % 2 == 0 } {
                # Even count: the record is complete.
                set quote 0
                append row_temp $line
                set cell {}
                set cell_temp ""
                foreach section [ split $row_temp , ] {
                    incr quote [ regexp -all \" $section ]
                    if { $quote % 2 == 0 } {
                        append cell_temp $section
                        # Strip the enclosing quotes, then collapse doubled quotes.
                        set cell_temp [ regsub {"(.*)"} $cell_temp {\1} ]
                        lappend cell [ string map {{""} {"}} $cell_temp ]
                        set cell_temp ""
                        set quote 0
                    } else {
                        # The comma was inside a quoted cell, so put it back.
                        append cell_temp $section,
                    }
                }
                lappend final $cell
                set row_temp ""
            } else {
                # A quoted cell continues on the next line.
                append row_temp $line\n
            }
        }
    }
    # Return the cell at the given zero-based row and column.
    method getCell { row col } {
        return [ lindex [ lindex $final $row ] $col ]
    }
    # Return the cell under the given header in the current row.
    method getValue { header } {
        set col [ lsearch -exact [ lindex $final 0 ] $header ]
        return [ getCell $anchor $col ]
    }
    # Move to the next row unless already on the last one.
    method next { } {
        if { [ done ] == 0 } {
            incr anchor
        }
    }
    # Move to the previous row unless already on the first data row.
    method pre { } {
        if { $anchor > 1 } {
            incr anchor -1
        }
    }
    # Jump to the last row.
    method end { } {
        set anchor [ expr { [ llength $final ] - 1 } ]
    }
    # Return 1 when the current row is the last one, otherwise 0.
    method done { } {
        return [ expr { $anchor == [ llength $final ] - 1 } ]
    }
    # Go back to the first data row.
    method reset { } {
        set anchor 1
    }
}
Name | Age | Address |
---|---|---|
Zhang_san | 13 | Address1: 1. aaaaa 2. aaad "bbbb", 3. bacad, adfa"aaa". |
Li_si | 14 | Address2, xxxx aaaa" bbbbb"., |
Wang_wu | 15 | Address3 |
readCSV f c:/csvfile.csv
f getValue Name
output:
Zhang_san
f next
f getValue Name
output:
Li_si
f pre
f getValue Name
f end
f getValue Name
f getCell 1 0
output:
Zhang_san
Wang_wu
Zhang_san