用 shell 指令碼做日誌清洗
問題的提出
公司有一個使用者行為分析系統,可以記錄使用者在使用公司產品過程中的一系列操作軌跡,便於分析產品使用情況以便優化產品 UI 介面佈局。這套系統有點類似於 Google Analyse(GA),所不同的是,為了讓使用者把產品用起來,公司會將使用者行為與優惠活動聯絡起來,例如購買產品後一段時間內如果使用時長達到一定標準,就能從銷售那裡領取小禮品,初衷是為了讓使用者把產品用起來。可是這個活動一經推出,我這邊電話就被打爆了,因為經常有使用者感覺自己在用產品了,但是銷售給他反饋的時長資訊卻沒有增長。於是實施同事會要求我們去排查問題,可能很多情況下,就是使用者沒聯網、或者防火牆設定的太嚴格了導致資料上不來、甚至是後臺服務掛了導致資料沒及時分析……其實 90% 以上的問題和客戶端沒關係,而是後臺在某個環節丟失了資料,但是作為查問題的第一個環節,客戶端開發往往會被頂到前線充當 call center 的角色,浪費大量個人時間不說,效率也是極低的。於是自然而然就會想到,能不能做一個分析工具,自動從日誌裡提出關鍵資料,做成直觀的圖表展示給實施人員,可以一眼就能定位出是客戶端還是後臺的問題,從而達成初步排 (甩) 查 (鍋) 的目的?
問題的解決
要能從日誌中提取出行為資料(專業術語稱為埋點)並根據時間順序繪製一幅活躍圖的根基,是日誌中要有相應的記錄,巧的是,之前為了排查問題,已經在日誌中輸出了大量這樣的資訊:
2020-08-31 10:15:12.531 2248 2564 [1] ########################################### 2020-08-31 10:15:12.531 2248 2564 [2] Service starting ! 2020-08-31 10:15:14.668 2248 2564 [1] collecting OS information... 2020-08-31 10:15:14.949 2248 2564 [1] TimeZone: 480 2020-08-31 10:15:14.949 2248 2564 [1] CountryCode: 86 2020-08-31 10:15:14.949 2248 2564 [1] OSLanugage Name: 1033 2020-08-31 10:15:14.980 2248 2564 [1] Version: 6.1.7601 2020-08-31 10:15:14.980 2248 2564 [1] OSArchitectur: 64-bit 2020-08-31 10:15:14.980 2248 2564 [1] OperatingSystemSKU: 1 2020-08-31 10:15:14.980 2248 2564 [2] Start to init Db. 2020-08-31 10:15:14.980 2248 2564 [2] local device id: 7aaa811b-f720-46b9-bdf3-e6b186cdcc77 2020-08-31 10:15:14.980 2248 2564 [2] sqlite3 database threadsafe = 1 2020-08-31 10:15:15.043 2248 2564 [2] Init Db done. 2020-08-31 10:15:15.043 2248 2564 [2] Start Event Worker. 2020-08-31 10:15:15.043 2248 2564 [2] Start App Worker. 2020-08-31 10:15:15.043 2248 2564 [2] Start Trace Worker. 2020-08-31 10:15:15.043 2248 2564 [2] HB Started. 2020-08-31 10:15:15.043 2248 2564 [1] Local Server Start work done. 2020-08-31 10:15:15.043 2248 3888 [2] Local Server started. 2020-08-31 10:15:15.043 2248 2564 [2] delete files: C:\ProgramData\Glodon\GUX\GCI\\GUX_*.session 2020-08-31 10:15:15.043 2248 2564 [2] start to read strategy from DB 2020-08-31 10:15:15.043 2248 2564 [2] add item 'GMD2017' to aggregation list 2020-08-31 10:15:15.043 2248 2564 [2] add item 'GMJ2017' to aggregation list 2020-08-31 10:15:15.043 2248 2564 [2] add item 'GFYCM2017' to aggregation list 2020-08-31 10:15:15.043 2248 2564 [2] add item 'GAQ2017' to aggregation list 2020-08-31 10:15:15.043 2248 2564 [2] add item 'GJH2017' to aggregation list 2020-08-31 10:15:15.043 2248 2564 [2] add item 'GMJ' to aggregation list 2020-08-31 10:15:15.043 2248 2564 [2] add item 'GBS2017' to aggregation list 2020-08-31 10:15:15.043 2248 2564 [2] add item 'GBCB2017' to aggregation list 2020-08-31 10:15:15.043 2248 2564 [2] add item 'GFYC2017' to aggregation list 2020-08-31 10:15:15.043 2248 2564 [2] add item 'GSJ2017' to aggregation list 2020-08-31 10:15:15.043 2248 2564 [2] add item 'GFYC' to aggregation list 2020-08-31 10:15:15.043 2248 2564 [2] add item 'GFYQ' to aggregation list 2020-08-31 10:15:15.043 2248 2564 [2] Read Local App Strategy Done. 2020-08-31 10:15:15.043 2248 2564 [2] Start GMTService done.Version=2.11.0.1777 2020-08-31 10:15:15.043 2248 3724 [2] Adjust Time thread start. 2020-08-31 10:15:15.058 2248 3076 [2] gcm started 2020-08-31 10:15:15.058 2248 3076 [2] Start to Get Strategy. 2020-08-31 10:15:15.058 2248 1748 [2] GUX Online failed, code=4 2020-08-31 10:15:15.089 2248 424 [2] GUX Online succeed. 2020-08-31 10:15:15.121 2248 3076 [1] http get: /v3/server_status?type=102&data_version=2.4, result 200 2020-08-31 10:15:15.121 2248 3076 [4] WinHttpQueryHeaders failed, flag = 0x20000005, errno = 12150 2020-08-31 10:15:15.121 2248 3076 [2] query content length failed, try with transfer-encoding.. 2020-08-31 10:15:15.121 2248 3076 [2] Start to process received message 102 2020-08-31 10:15:15.121 2248 3076 [2] Recv POST strategy. 2020-08-31 10:15:15.121 2248 3076 [2] start post: 1, period: 300 2020-08-31 10:15:15.401 2248 3076 [1] http get: /v3/server_status?type=103&data_version=2.4, result 200 2020-08-31 10:15:15.401 2248 3076 [4] WinHttpQueryHeaders failed, flag = 0x20000005, errno = 12150 2020-08-31 10:15:15.401 2248 3076 [2] query content length failed, try with transfer-encoding.. 2020-08-31 10:15:15.401 2248 3076 [2] Start to process received message 103 2020-08-31 10:15:15.401 2248 3076 [2] Recv TIME strategy. 2020-08-31 10:15:15.401 2248 3076 [2] Time shift: 0 2020-08-31 10:15:15.464 2248 3076 [1] http get: /v3/server_status?type=100&data_version=2.4&md5=ba16886ef2cf522245ad6f0dd20380c958e834ff7bb54d7a2d46fbf416806462, result 200 2020-08-31 10:15:15.464 2248 3076 [4] WinHttpQueryHeaders failed, flag = 0x20000005, errno = 12150 2020-08-31 10:15:15.464 2248 3076 [2] query content length failed, try with transfer-encoding.. 2020-08-31 10:15:15.776 2248 3076 [2] Start to process received message 100 2020-08-31 10:15:15.776 2248 3076 [2] Recv APP strategy. 2020-08-31 10:15:15.776 2248 3076 [2] GrandDog [140]: 1, 1, 0 2020-08-31 10:15:15.776 2248 3076 [2] CubicostTRB [178]: 1, 1, 0 2020-08-31 10:15:15.776 2248 3076 [2] GTJ2017 [78]: 1, 1, 0 2020-08-31 10:15:15.776 2248 3076 [2] GMD2017 [137]: 1, 1, 1 2020-08-31 10:15:15.776 2248 3076 [2] GDraw [180]: 1, 1, 0 2020-08-31 10:15:15.776 2248 3076 [2] GLC [276]: 1, 1, 0 2020-08-31 10:15:15.776 2248 3076 [2] GUX [164]: 1, 1, 0 2020-08-31 10:15:15.776 2248 3076 [2] GCCP5 [67]: 1, 1, 0 2020-08-31 10:15:15.776 2248 3076 [2] GCCP6 [261]: 1, 1, 0 2020-08-31 10:15:15.776 2248 3076 [2] TME [17]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GWS [25]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] MOZIDIFFER [36]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GMJ [40]: 1, 0, 1 2020-08-31 10:15:15.776 2248 3076 [2] GCL2013 [44]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GGJ2013 [45]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] MD_GMA [56]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GDQ2015 [75]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GQI2017 [76]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GJG2015 [77]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GMP2016 [80]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] Revit2GFC4GMP [83]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GTJ2017CAD [100]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GYZB2017 [112]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] BIM5D_PC [114]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GFYCM [115]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GBCB [125]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] CubicostTAS [128]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GMD [129]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GAQ2017 [131]: 1, 0, 1 2020-08-31 10:15:15.776 2248 3076 [2] GBCB2017 [132]: 1, 0, 1 2020-08-31 10:15:15.776 2248 3076 [2] GBS2017 [133]: 1, 0, 1 2020-08-31 10:15:15.776 2248 3076 [2] GFYC2017 [134]: 1, 0, 1 2020-08-31 10:15:15.776 2248 3076 [2] GFYCM2017 [135]: 1, 0, 1 2020-08-31 10:15:15.776 2248 3076 [2] GMJ2017 [136]: 1, 0, 1 2020-08-31 10:15:15.776 2248 3076 [2] GSJ2017 [138]: 1, 0, 1 2020-08-31 10:15:15.776 2248 3076 [2] GJH2017 [139]: 1, 0, 1 2020-08-31 10:15:15.776 2248 3076 [2] TeamViewer [142]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] ZPert [148]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GBS [160]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GIR_C [162]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] TBQ2017 [163]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GYJC2017 [167]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GSXGZT2016 [177]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] TBQD [181]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] TTED [182]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] TCFD [183]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GSCApp [188]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GFYC [200]: 1, 0, 1 2020-08-31 10:15:15.776 2248 3076 [2] GDQ2017 [207]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GO [217]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] AppGbmp [218]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GQI2018 [222]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GDS2017 [226]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GLDTCS [228]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] TenderGo [231]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GDQ2018 [232]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] SectionManual [233]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] BeamGo [234]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GJG2018 [235]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] RevitViewer [236]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] BIM5D_PC_TEST [237]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] BIM5D_PC_TRIAL [238]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GEC5 [239]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GFYQ [240]: 1, 0, 1 2020-08-31 10:15:15.776 2248 3076 [2] RoadDesigner [241]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] CECS100G [242]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GBES [243]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] Ceshi [244]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] dpUpdate [245]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GFY4 [246]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GGPT [248]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GMA2020 [249]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] JZYK [250]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GVB5 [251]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GHW5 [252]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GUp [253]: 1, 0, 1 2020-08-31 10:15:15.776 2248 3076 [2] BIM_COST [254]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GICP5 [255]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] bim5d_basic [256]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GWH5 [257]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GFY4_2019 [258]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GDD2019 [259]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GCCP5_ShanDong_64 [260]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GSC6 [262]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GCCP6_WP [263]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GEB6 [264]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GSH6 [265]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GTech2019 [266]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GPC5 [267]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GTJ2021 [268]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GDE2019 [269]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] CubicostTIO [270]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GCA5 [271]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GLC5 [272]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GMT5 [273]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GCN5 [274]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GHC5 [275]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GVB6 [277]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GJG2021 [278]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GJG [279]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GAP [280]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GSTP [281]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] TRS2021 [283]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] TMEC [284]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] CubicostTMEC [285]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GGF5 [286]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GRE5 [287]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [2] GA_CloudPlugin [310]: 1, 0, 0 2020-08-31 10:15:15.776 2248 3076 [1] Parse App-Id Done. 2020-08-31 10:15:15.869 2248 3076 [2] start to read strategy from DB 2020-08-31 10:15:15.869 2248 3076 [2] add item 'GMD2017' to aggregation list 2020-08-31 10:15:15.869 2248 3076 [2] add item 'GMJ' to aggregation list 2020-08-31 10:15:15.869 2248 3076 [2] add item 'GAQ2017' to aggregation list 2020-08-31 10:15:15.869 2248 3076 [2] add item 'GBCB2017' to aggregation list 2020-08-31 10:15:15.869 2248 3076 [2] add item 'GBS2017' to aggregation list 2020-08-31 10:15:15.869 2248 3076 [2] add item 'GFYC2017' to aggregation list 2020-08-31 10:15:15.869 2248 3076 [2] add item 'GFYCM2017' to aggregation list 2020-08-31 10:15:15.869 2248 3076 [2] add item 'GMJ2017' to aggregation list 2020-08-31 10:15:15.869 2248 3076 [2] add item 'GSJ2017' to aggregation list 2020-08-31 10:15:15.869 2248 3076 [2] add item 'GJH2017' to aggregation list 2020-08-31 10:15:15.869 2248 3076 [2] add item 'GFYC' to aggregation list 2020-08-31 10:15:15.869 2248 3076 [2] add item 'GFYQ' to aggregation list 2020-08-31 10:15:15.869 2248 3076 [2] add item 'GUp' to aggregation list 2020-08-31 10:15:15.869 2248 3076 [2] Read Local App Strategy Done. 2020-08-31 10:15:15.947 2248 3076 [1] http get: /v3/server_status?type=101&data_version=2.4, result 200 2020-08-31 10:15:15.947 2248 3076 [4] WinHttpQueryHeaders failed, flag = 0x20000005, errno = 12150 2020-08-31 10:15:15.947 2248 3076 [2] query content length failed, try with transfer-encoding.. 2020-08-31 10:15:16.275 2248 3076 [2] Start to process received message 101 2020-08-31 10:15:16.275 2248 3076 [2] Recv KEY strategy. 2020-08-31 10:15:16.275 2248 2984 [1] start posting local information... 2020-08-31 10:15:16.275 2248 2984 [1] User: yunh, Domain: WIN-AS1KN0S7NAJ 2020-08-31 10:15:16.431 2248 2984 [1] User Profile: 2020-08-31 10:15:16.525 2248 2984 [2] load user profile OK 2020-08-31 10:15:16.525 2248 2984 [2] impersonate logged on user OK 2020-08-31 10:15:16.525 2248 2984 [2] revert to self OK 2020-08-31 10:15:16.525 2248 2984 [2] dpi user = yunh, current = 96 2020-08-31 10:15:16.805 2248 2984 [2] dpi infact = 96 2020-08-31 10:15:16.805 2248 2984 [2] dpi scale = 100% 2020-08-31 10:15:16.805 2248 2984 [4] Excute SQL Error: [19] constraint failed SQL=insert into DpiSettingTbl (user, infact, setting, scale) values('yunh','96','96','100') 2020-08-31 10:15:16.805 2248 2984 [3] insert dpi setting for yunh failed, try updating 2020-08-31 10:15:16.868 2248 2984 [1] 2020-08-31 10:15:16.868 2248 2984 [1] environment strings: 2020-08-31 10:15:16.868 2248 2984 [1] ALLUSERSPROFILE=C:\ProgramData 2020-08-31 10:15:16.868 2248 2984 [1] APPDATA=C:\Users\yunh\AppData\Roaming 2020-08-31 10:15:16.868 2248 2984 [1] CommonProgramFiles=C:\Program Files (x86)\Common Files 2020-08-31 10:15:16.868 2248 2984 [1] CommonProgramFiles(x86)=C:\Program Files (x86)\Common Files 2020-08-31 10:15:16.868 2248 2984 [1] CommonProgramW6432=C:\Program Files\Common Files 2020-08-31 10:15:16.868 2248 2984 [1] COMPUTERNAME=WIN-AS1KN0S7NAJ 2020-08-31 10:15:16.868 2248 2984 [1] ComSpec=C:\Windows\system32\cmd.exe 2020-08-31 10:15:16.868 2248 2984 [1] FP_NO_HOST_CHECK=NO 2020-08-31 10:15:16.868 2248 2984 [1] GLODON_COMMON_APPDATA=C:\ProgramData 2020-08-31 10:15:16.868 2248 2984 [1] GLODON_COMMON_DESKTOP=C:\Users\Public\Desktop 2020-08-31 10:15:16.868 2248 2984 [1] GLODON_COMMON_DOCUMENT=C:\Users\Public\Documents 2020-08-31 10:15:16.868 2248 2984 [1] GLODON_COMMON_FILES_32=C:\Program Files (x86)\Common Files 2020-08-31 10:15:16.868 2248 2984 [1] GLODON_COMMON_FILES_64=C:\Program Files\Common Files 2020-08-31 10:15:16.868 2248 2984 [1] GLODON_PROGRAM_FILES_32=C:\Program Files (x86) 2020-08-31 10:15:16.868 2248 2984 [1] GLODON_PROGRAM_FILES_64=C:\Program Files 2020-08-31 10:15:16.868 2248 2984 [1] GLODON_SYSTEM_32=C:\Windows\SysWOW64 2020-08-31 10:15:16.868 2248 2984 [1] GLODON_SYSTEM_64=C:\Windows\system32 2020-08-31 10:15:16.868 2248 2984 [1] GLODON_SYSTEM_DRIVE=C: 2020-08-31 10:15:16.868 2248 2984 [1] GLODON_WINDOWS_DIR=C:\Windows 2020-08-31 10:15:16.868 2248 2984 [1] HOMEDRIVE=C: 2020-08-31 10:15:16.868 2248 2984 [1] HOMEPATH=\Users\yunh 2020-08-31 10:15:16.868 2248 2984 [1] LOCALAPPDATA=C:\Users\yunh\AppData\Local 2020-08-31 10:15:16.868 2248 2984 [1] LOGONSERVER=\\WIN-AS1KN0S7NAJ 2020-08-31 10:15:16.868 2248 2984 [1] NUMBER_OF_PROCESSORS=1 2020-08-31 10:15:16.868 2248 2984 [1] OS=Windows_NT 2020-08-31 10:15:16.868 2248 2984 [1] Path=C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\ 2020-08-31 10:15:16.868 2248 2984 [1] PATHEXT=.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC 2020-08-31 10:15:16.868 2248 2984 [1] PROCESSOR_ARCHITECTURE=AMD64 2020-08-31 10:15:16.868 2248 2984 [1] PROCESSOR_IDENTIFIER=Intel64 Family 6 Model 60 Stepping 3, GenuineIntel 2020-08-31 10:15:16.868 2248 2984 [1] PROCESSOR_LEVEL=6 2020-08-31 10:15:16.868 2248 2984 [1] PROCESSOR_REVISION=3c03 2020-08-31 10:15:16.868 2248 2984 [1] ProgramData=C:\ProgramData 2020-08-31 10:15:16.868 2248 2984 [1] ProgramFiles=C:\Program Files (x86) 2020-08-31 10:15:16.868 2248 2984 [1] ProgramFiles(x86)=C:\Program Files (x86) 2020-08-31 10:15:16.868 2248 2984 [1] ProgramW6432=C:\Program Files 2020-08-31 10:15:16.868 2248 2984 [1] PSModulePath=C:\Windows\system32\WindowsPowerShell\v1.0\Modules\ 2020-08-31 10:15:16.868 2248 2984 [1] PUBLIC=C:\Users\Public 2020-08-31 10:15:16.868 2248 2984 [1] SystemDrive=C: 2020-08-31 10:15:16.868 2248 2984 [1] SystemRoot=C:\Windows 2020-08-31 10:15:16.868 2248 2984 [1] TEMP=C:\Users\yunh\AppData\Local\Temp 2020-08-31 10:15:16.868 2248 2984 [1] TMP=C:\Users\yunh\AppData\Local\Temp 2020-08-31 10:15:16.868 2248 2984 [1] USERDOMAIN=WIN-AS1KN0S7NAJ 2020-08-31 10:15:16.868 2248 2984 [1] USERNAME=yunh 2020-08-31 10:15:16.868 2248 2984 [1] USERPROFILE=C:\Users\yunh 2020-08-31 10:15:16.868 2248 2984 [1] windir=C:\Windows 2020-08-31 10:15:16.868 2248 2984 [1] windows_tracing_flags=3 2020-08-31 10:15:16.868 2248 2984 [1] windows_tracing_logfile=C:\BVTBin\Tests\installpackage\csilogfile.log 2020-08-31 10:15:16.868 2248 2984 [1] 2020-08-31 10:15:21.673 2248 2984 [2] first got hwid: {C4F752B9-EF31-580A-B75C-67E300000000} 2020-08-31 10:15:21.673 2248 2984 [2] first got ssid: 2020-08-31 10:15:21.673 2248 2984 [1] local info:{ "metrics": { "os": "Windows", "os_version": "7", "os_version1": "6.1.7601", "os_edition": "Ultimate Edition", "dev_type": "PC", "resolution": "1920x1080", "sysbits": "64", "cpu_sum": "Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz 3.60GHz", "total_mem": "2048MB", "language": "English (United States)", "pctype": "unknown", "language_id": "1033", "timezone": "8", "dpiscale": "100" }, "mem": { "total_mem": "2048MB" }, "videocards": [ { "name": "VMware SVGA 3D", "manufacturer": "VMware, Inc.", "chipset": "VMware Virtual SVGA 3D Graphics Adapter", "dac_type": "n\/a", "memory": "1024MB", "inuse": "1", "pixelbits": "32", "monitor_count": "1", "driverversion": "8.15.1.33", "driverdate": "20151016000000.000000-000", "pnpdeviceid": "PCI\\VEN_15AD&DEV_0405&SUBSYS_040515AD&REV_00\\3&2B8E0B4B&0&78", "caption": "VMware SVGA 3D", "frequency": "60Hz", "resolution": "1920x1080" } ], "cpu": { "name": "Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz", "manufacturer": "GenuineIntel", "physicalcores": "1", "logicalcores": "1", "frequency": "3.60GHz", "address_width": "64", "voltage": "33" }, "baseboard": { "manufacturer": "Intel Corporation", "product": "440BX Desktop Reference Platform", "sn": "None" }, "dpisetting": [ { "user": "yunh", "infact": "96", "setting": "96", "scale": "100" } ], "msg_type": "7", "app_name": "GUX", "app_version": "2.11.0.1777", "device_id": "7aaa811b-f720-46b9-bdf3-e6b186cdcc77", "mt_devid": "{C4F752B9-EF31-580A-B75C-67E300000000}", "ssid": "", "timestamp": "1598840121673", "sessionid": "", "extra": "" } 2020-08-31 10:15:21.673 2248 2984 [2] Read device info succeed. 2020-08-31 10:15:21.673 2248 2984 [2] local hardware changed, prepare to post system information 2020-08-31 10:15:21.673 2248 2984 [2] first got ssid: 2020-08-31 10:15:21.766 2248 2984 [1] http post: /v3/do, result 200 2020-08-31 10:15:21.766 2248 2984 [1] 6 : finish 2020-08-31 10:15:21.766 2248 2984 [2] Post local information succeed.. 2020-08-31 10:15:21.766 2248 2984 [1] collectLocalInfoData end... 2020-08-31 10:15:44.650 2248 3888 [2] local client accepted. 2020-08-31 10:15:44.681 2248 2936 [2] Send sync msg to Client, socketNo=1, len=46 2020-08-31 10:15:44.681 2248 2936 [2] GDraw start, appid = B84c955afdfe63ae1acf0e1a45f6dcc9e4fb64a3, inst = 1473 2020-08-31 10:15:44.697 2248 3800 [2] add event 3 (1) 2020-08-31 10:15:44.697 2248 3360 [4] Excute SQL Error: [21] library routine called out of sequence SQL=insert into B84c955afdfe63ae1acf0e1a45f6dcc9e4fb64a3(type, key , count , data, timestamp ) values('5','啟動','1','{"dognum":"","fname":"Æô¶¯","fngroup":"VisDraw","gid":"","pcode":"-110000","query":{},"sessionid":"3680"} ','1598840144697') 2020-08-31 10:15:44.697 2248 3360 [1] add action GDraw.啟動 (0) 2020-08-31 10:15:44.697 2248 3360 [2] create action table B84c955afdfe63ae1acf0e1a45f6dcc9e4fb64a3 (1). 2020-08-31 10:15:44.712 2248 3360 [2] insert 1 record into db by transaction elapsed 10 ms 2020-08-31 10:15:44.837 2248 3360 [1] add action GDraw.新建檔案 (1) 2020-08-31 10:15:44.837 2248 3360 [2] insert 1 record into db by transaction elapsed 1 ms 2020-08-31 10:15:45.071 2248 3076 [2] Save Timer Expired.Start to Save Aggregation Data. 2020-08-31 10:15:48.254 2248 3360 [1] add action GDraw.新增弧 (1) 2020-08-31 10:15:48.254 2248 3360 [2] insert 1 record into db by transaction elapsed 2 ms 2020-08-31 10:15:53.683 2248 3360 [1] add action GDraw.新增長方形 (1) 2020-08-31 10:15:53.683 2248 3360 [2] insert 1 record into db by transaction elapsed 2 ms 2020-08-31 10:15:57.349 2248 3360 [1] add action GDraw.新增圓 (1) 2020-08-31 10:15:57.349 2248 3360 [2] insert 1 record into db by transaction elapsed 2 ms 2020-08-31 10:16:01.389 2248 1992 [2] GLC heart beat timeout, appid = B21170b9e920c9c0fb57374ee21d0b71af1adf3d, inst = 2 2020-08-31 10:16:01.389 2248 1992 [2] GDraw heart beat timeout, appid = Bb3e5dfebfa0f02d987a93865bc9aed3e81dee09, inst = 3 2020-08-31 10:16:01.389 2248 3800 [2] add event 4 (1) 2020-08-31 10:16:01.389 2248 3800 [2] add event 4 (1) 2020-08-31 10:16:03.854 2248 3360 [1] add action GDraw.新增圓 (1) 2020-08-31 10:16:03.869 2248 3360 [2] insert 1 record into db by transaction elapsed 3 ms 2020-08-31 10:16:10.843 2248 3360 [1] add action GDraw.新增橢圓 (1) 2020-08-31 10:16:10.843 2248 3360 [2] insert 1 record into db by transaction elapsed 3 ms 2020-08-31 10:16:14.290 2248 3360 [1] add action GDraw.新增多邊形 (1) 2020-08-31 10:16:14.290 2248 3360 [2] insert 1 record into db by transaction elapsed 2 ms 2020-08-31 10:16:19.251 2248 3360 [1] add action GDraw.新增文字 (1) 2020-08-31 10:16:19.251 2248 3360 [2] insert 1 record into db by transaction elapsed 3 ms 2020-08-31 10:16:24.274 2248 3360 [1] add action GDraw.新增直線 (1) 2020-08-31 10:16:24.274 2248 3360 [2] insert 1 record into db by transaction elapsed 3 ms 2020-08-31 10:16:37.363 2248 3360 [1] add action GDraw.設定文字 (1) 2020-08-31 10:16:37.378 2248 3360 [2] insert 1 record into db by transaction elapsed 3 ms 2020-08-31 10:16:37.378 2248 3360 [1] add action GDraw.設定文字 (1) 2020-08-31 10:16:37.378 2248 3360 [1] add action GDraw.設定文字 (1) 2020-08-31 10:16:37.394 2248 3360 [2] insert 2 record into db by transaction elapsed 17 ms 2020-08-31 10:16:41.325 2248 3360 [1] add action GDraw.置底橢圓 (1) 2020-08-31 10:16:41.325 2248 3360 [1] add action GDraw.置底文字 (1) 2020-08-31 10:16:41.325 2248 3360 [1] add action GDraw.置底直線 (1) 2020-08-31 10:16:41.341 2248 3360 [2] insert 3 record into db by transaction elapsed 3 ms 2020-08-31 10:16:45.943 2248 3360 [1] add action GDraw.移動橢圓 (1) 2020-08-31 10:16:45.943 2248 3360 [2] insert 1 record into db by transaction elapsed 2 ms 2020-08-31 10:16:49.390 2248 3360 [1] add action GDraw.刪除圓 (1) 2020-08-31 10:16:49.390 2248 3360 [2] insert 1 record into db by transaction elapsed 2 ms 2020-08-31 10:16:52.963 2248 3360 [1] add action GDraw.刪除圓 (1) 2020-08-31 10:16:52.978 2248 3360 [2] insert 1 record into db by transaction elapsed 3 ms 2020-08-31 10:16:55.162 2248 3360 [1] add action GDraw.貼上圓 (1) 2020-08-31 10:16:55.162 2248 3360 [2] insert 1 record into db by transaction elapsed 2 ms 2020-08-31 10:16:59.873 2248 3360 [1] add action GDraw.調整圓 (1) 2020-08-31 10:16:59.873 2248 3360 [2] insert 1 record into db by transaction elapsed 2 ms 2020-08-31 10:17:04.475 2248 3360 [1] add action GDraw.調整長方形 (1) 2020-08-31 10:17:04.475 2248 3360 [2] insert 1 record into db by transaction elapsed 4 ms 2020-08-31 10:17:07.798 2248 3360 [1] add action GDraw.調整多邊形 (1) 2020-08-31 10:17:07.798 2248 3360 [2] insert 1 record into db by transaction elapsed 3 ms 2020-08-31 10:17:11.511 2248 3360 [1] add action GDraw.調整弧 (1) 2020-08-31 10:17:11.511 2248 3360 [2] insert 1 record into db by transaction elapsed 2 ms 2020-08-31 10:17:16.019 2248 3360 [1] add action GDraw.調整直線 (1) 2020-08-31 10:17:16.035 2248 3360 [2] insert 1 record into db by transaction elapsed 3 ms 2020-08-31 10:17:44.973 2248 3360 [1] add action GDraw.設定文字 (1) 2020-08-31 10:17:44.973 2248 3360 [1] add action GDraw.設定文字 (1) 2020-08-31 10:17:44.973 2248 3360 [1] add action GDraw.設定文字 (1) 2020-08-31 10:17:44.989 2248 3360 [2] insert 3 record into db by transaction elapsed 7 ms 2020-08-31 10:17:48.140 2248 3360 [1] add action GDraw.調整文字 (1) 2020-08-31 10:17:48.156 2248 3360 [2] insert 1 record into db by transaction elapsed 3 ms 2020-08-31 10:17:55.924 2248 3360 [1] add action GDraw.刪除文字 (1) 2020-08-31 10:17:55.924 2248 3360 [2] insert 1 record into db by transaction elapsed 3 ms 2020-08-31 10:18:07.188 2248 3360 [1] add action GDraw.儲存檔案 (1) 2020-08-31 10:18:07.188 2248 3360 [2] insert 1 record into db by transaction elapsed 4 ms 2020-08-31 10:18:08.326 2248 3360 [1] add action GDraw.關閉檔案 (1) 2020-08-31 10:18:08.326 2248 3360 [2] insert 1 record into db by transaction elapsed 2 ms 2020-08-31 10:18:08.576 2248 3888 [4] local client disconnected: End of file 2020-08-31 10:18:08.576 2248 2936 [2] GDraw stop, appid = B84c955afdfe63ae1acf0e1a45f6dcc9e4fb64a3, inst = 1473 2020-08-31 10:18:08.576 2248 3360 [1] add action GDraw.退出 (1) 2020-08-31 10:18:08.576 2248 3360 [2] insert 1 record into db by transaction elapsed 3 ms 2020-08-31 10:18:08.576 2248 3800 [2] add event 4 (1) ……
可以看到關鍵日誌的格式類似如下:
2020-08-31 10:17:48.140 2248 3360 [1] add action GDraw.調整文字 (1)
分別對應的欄位是:
日期 時間(帶毫秒) 程序號 執行緒號 [日誌級別] add action 產品標識.操作 (入隊結果)
所以我們大體可以分兩步完成這個工作,一是提取關鍵日誌,二是根據資料繪製活躍圖。
提取日誌資料
因為日誌裡夾雜了太多不相關的資料,我們需要先進行一波清洗,這個是 sed 的拿手好戲:
$ sed -n '/add action .*\..*/p' gux_log.1.txt 2020-08-31 10:15:44.697 2248 3360 [1] add action GDraw.啟動 (0) 2020-08-31 10:15:44.837 2248 3360 [1] add action GDraw.新建檔案 (1) 2020-08-31 10:15:48.254 2248 3360 [1] add action GDraw.新增弧 (1) 2020-08-31 10:15:53.683 2248 3360 [1] add action GDraw.新增長方形 (1) 2020-08-31 10:15:57.349 2248 3360 [1] add action GDraw.新增圓 (1) 2020-08-31 10:16:03.854 2248 3360 [1] add action GDraw.新增圓 (1) 2020-08-31 10:16:10.843 2248 3360 [1] add action GDraw.新增橢圓 (1) 2020-08-31 10:16:14.290 2248 3360 [1] add action GDraw.新增多邊形 (1) 2020-08-31 10:16:19.251 2248 3360 [1] add action GDraw.新增文字 (1) 2020-08-31 10:16:24.274 2248 3360 [1] add action GDraw.新增直線 (1) 2020-08-31 10:16:37.363 2248 3360 [1] add action GDraw.設定文字 (1) 2020-08-31 10:16:37.378 2248 3360 [1] add action GDraw.設定文字 (1) 2020-08-31 10:16:37.378 2248 3360 [1] add action GDraw.設定文字 (1) 2020-08-31 10:16:41.325 2248 3360 [1] add action GDraw.置底橢圓 (1) 2020-08-31 10:16:41.325 2248 3360 [1] add action GDraw.置底文字 (1) 2020-08-31 10:16:41.325 2248 3360 [1] add action GDraw.置底直線 (1) 2020-08-31 10:16:45.943 2248 3360 [1] add action GDraw.移動橢圓 (1) 2020-08-31 10:16:49.390 2248 3360 [1] add action GDraw.刪除圓 (1) 2020-08-31 10:16:52.963 2248 3360 [1] add action GDraw.刪除圓 (1) 2020-08-31 10:16:55.162 2248 3360 [1] add action GDraw.貼上圓 (1) 2020-08-31 10:16:59.873 2248 3360 [1] add action GDraw.調整圓 (1) 2020-08-31 10:17:04.475 2248 3360 [1] add action GDraw.調整長方形 (1) 2020-08-31 10:17:07.798 2248 3360 [1] add action GDraw.調整多邊形 (1) 2020-08-31 10:17:11.511 2248 3360 [1] add action GDraw.調整弧 (1) 2020-08-31 10:17:16.019 2248 3360 [1] add action GDraw.調整直線 (1) 2020-08-31 10:17:44.973 2248 3360 [1] add action GDraw.設定文字 (1) 2020-08-31 10:17:44.973 2248 3360 [1] add action GDraw.設定文字 (1) 2020-08-31 10:17:44.973 2248 3360 [1] add action GDraw.設定文字 (1) 2020-08-31 10:17:48.140 2248 3360 [1] add action GDraw.調整文字 (1) 2020-08-31 10:17:55.924 2248 3360 [1] add action GDraw.刪除文字 (1) 2020-08-31 10:18:07.188 2248 3360 [1] add action GDraw.儲存檔案 (1) 2020-08-31 10:18:08.326 2248 3360 [1] add action GDraw.關閉檔案 (1) 2020-08-31 10:18:08.576 2248 3360 [1] add action GDraw.退出 (1) 2020-08-31 10:20:14.546 2248 3360 [1] add action GDraw.timer_duration (0)
接下來要從清洗後的日誌中提取日期、時間、程序號、產品標識、操作 5 個欄位到該產品標識對應的檔案中,這個是 awk 的分內活兒:
sed -n '/add action *.*/p' gux_log.1.txt | awk -F "[. ]+" '{ print $1,$2,$4,$9,$10 >> "plot/"$9".txt"; a[$9]++ } END { for (i in a) print i " : " a[i] }'
因為產品標識與操作是用 '.' 號分隔的,所以要在分隔符中加入點號 (預設為空格),不過這樣一來,就會把時間和毫秒也分成兩個欄位,所以我們要提取的欄位下標(從 1 開始)就分別變成了日期(1)、時間(2)、程序號(4)、產品標識(9)、操作(10)。提取到的資料行會寫入對應的產品標識命名的檔案中 ("plot/GDraw.txt"),注意這裡的 $9 不能包含在引號中,否則不起作用 (awk 語法補習一下)。並遞增對應的產品標識埋點資料,最後將所有的產品統計資訊打印出來:
GDraw : 34
同時在 plot 子目錄下可以找到以 GDraw 命名的資料檔案 (GDraw.txt):
$ cat plot/GDraw.txt 2020-08-31 10:15:44 2248 GDraw 啟動 2020-08-31 10:15:44 2248 GDraw 新建檔案 2020-08-31 10:15:48 2248 GDraw 新增弧 2020-08-31 10:15:53 2248 GDraw 新增長方形 2020-08-31 10:15:57 2248 GDraw 新增圓 2020-08-31 10:16:03 2248 GDraw 新增圓 2020-08-31 10:16:10 2248 GDraw 新增橢圓 2020-08-31 10:16:14 2248 GDraw 新增多邊形 2020-08-31 10:16:19 2248 GDraw 新增文字 2020-08-31 10:16:24 2248 GDraw 新增直線 2020-08-31 10:16:37 2248 GDraw 設定文字 2020-08-31 10:16:37 2248 GDraw 設定文字 2020-08-31 10:16:37 2248 GDraw 設定文字 2020-08-31 10:16:41 2248 GDraw 置底橢圓 2020-08-31 10:16:41 2248 GDraw 置底文字 2020-08-31 10:16:41 2248 GDraw 置底直線 2020-08-31 10:16:45 2248 GDraw 移動橢圓 2020-08-31 10:16:49 2248 GDraw 刪除圓 2020-08-31 10:16:52 2248 GDraw 刪除圓 2020-08-31 10:16:55 2248 GDraw 貼上圓 2020-08-31 10:16:59 2248 GDraw 調整圓 2020-08-31 10:17:04 2248 GDraw 調整長方形 2020-08-31 10:17:07 2248 GDraw 調整多邊形 2020-08-31 10:17:11 2248 GDraw 調整弧 2020-08-31 10:17:16 2248 GDraw 調整直線 2020-08-31 10:17:44 2248 GDraw 設定文字 2020-08-31 10:17:44 2248 GDraw 設定文字 2020-08-31 10:17:44 2248 GDraw 設定文字 2020-08-31 10:17:48 2248 GDraw 調整文字 2020-08-31 10:17:55 2248 GDraw 刪除文字 2020-08-31 10:18:07 2248 GDraw 儲存檔案 2020-08-31 10:18:08 2248 GDraw 關閉檔案 2020-08-31 10:18:08 2248 GDraw 退出 2020-08-31 10:20:14 2248 GDraw timer_duration
過濾掉了不相關的內容,顯得清爽多了。這還僅僅是一個產品的埋點資料,如果有多個產品的話,就可以看到更有趣的一些輸出,例如:
GDraw : 93 GTJ2021 : 17 GCCP5 : 70 GCCP6 : 154
每個產品會單獨統計埋點數,同時在 plot 目錄下生成每個產品的資料檔案。
不同的日誌格式
以為上面就解決了日誌清洗的所有問題?No!因為程式碼的不斷的調整,曾經出現過三種格式的日誌,下面分別列舉:
v2 [1] 2019-09-19 00:39:24 10772 5992 add action GDQ2018.合併 (1) v3 2020-06-11 13:38:34.666 17272 17712 [1] add action GCCP5.timer_duration (1) v3.1 2020-06-11 13:38:34.666 17272 17712 gmtsvc [1] add action GCCP5.timer_duration (1)
具體說我們剛才處理的日誌其實是 v3 版本的,另外還有 v2 版本,這個是最老的日誌,與 v3 的區別是日誌級別是放在最前面的,時間沒有毫秒值;還有 v3.1 版本,這個是最新的日誌,與 v3 的區別是日誌級別之前加了一個模組名稱欄位。這些欄位順序的調整與額外欄位的增加,會導致我們讀取時需要選擇的欄位下標不同,例如下面的 awk 語句分別對應上面三種版本的日誌:
# v2 $ sed -n '/add action *.*/p' gux_log.1.txt | awk -F "[. ]+" '{ print $2,$3,$4,$8,$9 >> "plot/"$8".txt"; a[$8]++ } END { for (i in a) print i " : " a[i] }' # v3 $ sed -n '/add action *.*/p' gux_log.1.txt | awk -F "[. ]+" '{ print $1,$2,$4,$9,$10 >> "plot/"$9".txt"; a[$9]++ } END { for (i in a) print i " : " a[i] }' # v3.1 $ sed -n '/add action *.*/p' gux_log.1.txt | awk -F "[. ]+" '{ print $1,$2,$4,$10,$11 >> "plot/"$10".txt"; a[$10]++ } END { for (i in a) print i " : " a[i] }'
它們的輸出都是一樣的。好在 awk 支援條件判斷,我們完全可以在一條語句裡進行不同版本的日誌處理:
$ sed -n '/add action *.*/p' gux_log.1.txt | awk -F "[. ]+" '{ if ( $1 ~ /\[[0-9]]/ ) { print $2,$3,$4,$8,$9 >> "plot/"$8".txt"; a[$8]++ } else if ( $9 == "action" ) { print $1,$2,$4,$10,$11 >> "plot/"$10".txt"; a[$10]++ } else { print $1,$2,$4,$9,$10 >> "plot/"$9".txt"; a[$9]++ } } END { for (i in a) print i " : " a[i] }'
被壓扁在一行了,單獨提取 awk 的部分列在下面看著更清晰一些:
1 { 2 if ( $1 ~ /\[[0-9]]/ ) { print $2,$3,$4,$8,$9 >> "plot/"$8".txt"; a[$8]++ } 3 else if ( $9 == "action" ) { print $1,$2,$4,$10,$11 >> "plot/"$10".txt"; a[$10]++ } 4 else { print $1,$2,$4,$9,$10 >> "plot/"$9".txt"; a[$9]++ } 5 } 6 7 END { for (i in a) print i " : " a[i] }
新增的部分主要在條件判斷的地方,如果第 1 個欄位符合正則式則走第一個條件的語句,這條正則式是說“方括號括起來的數字”,就是[1]、[2]、[3]……啦,這是用來匹配 v2 日誌的;如果第 9 個欄位為 “action",則走第二個條件的語句,這是用來匹配 v3.1 日誌的;剩下的 else 就當然是用來區別 v3 日誌的啦。這樣一來即使一個檔案中有不同版本的日誌,也能輕鬆處理。
繪製埋點活躍圖
其實經過日誌清洗後,得到的各產品”純純“的資料,就已經可以交差了,因為沒有什麼是比資料更準確的啦。但是考慮到我面對的人可能是連 Windows 服務在哪裡控制都找不到的人,我決定更進一步,通過各產品的資料繪製一幅直觀的埋點圖,來給使用者一個傻瓜式的體驗(誰叫我懶),先上效果圖:
這個圖的橫軸表示時間,縱軸表示會話,可以認為一個產品啟動到關閉整個過程為一次會話 (通過程序 ID 值代表)。如果對應的時間段有資料點的話,就表示該產品在那個時間活躍過。以上圖為例,如果我們想看 GCCP5 這個產品在哪天活躍過,就可以在圖中查詢圖例對應顏色的點,可以看到,大部分集中在 6.10-6.13 這個區間,其它時間段比較少。又例如,我想檢視 8.21-8.27 之間有無產品活躍,從圖中看到這段時間區間是一片空白,所以可以確定使用者在這個時間段沒有使用公司的產品。那麼這張圖是怎麼生成的呢,這就需要用到 gnuplot 了,其實我之前寫過的文章有對這個工具使用方法的詳細說明,感興趣的可以參考這篇《檢視部落格園積分與排名趨勢圖的工具》。下面直接上程式碼,首先是 shell 指令碼部分 (guxplot.sh):
1 #! /bin/bash 2 arg="logs='" 3 logs=$(cd plot; find . -type f -name "*.txt") 4 if [ -z "$logs" ]; then 5 echo "no data stripped, exit plotting!" 6 exit 7 fi 8 9 for log in $logs 10 do 11 # skip ./ 12 tmp=${log:2} 13 # emit .txt 14 tmp=${tmp%.*} 15 echo $tmp 16 arg="$arg$tmp " 17 done 18 19 arg="$arg'" 20 gnuplot -e "logpath='$1'" -e "$arg" ./gux.plt 21 # mspaint only know \ other than / 22 #mspaint "plot/gux.png" 23 mspaint "plot\\gux.png" &
這個指令碼的核心是遍歷之前生成資料的 plot 子目錄,將其中 .txt 結尾的資料檔案取出,依次放入 arg 引數中,將其作為一個引數 (logs) 傳遞給 gnuplot 指令碼 (gux.plt) 去執行。另外它還有一個輸入引數 ($1),也作為一個引數 (logpath) 傳遞給了 gnuplot 指令碼。下面來看繪製程式碼 (gux.plt):
1 #! /usr/bin/gnuplot 2 set terminal png size 1080,720 3 set title logpath 4 set output "plot/gux.png" 5 set grid 6 7 set xdata time 8 set timefmt "%Y-%m-%d %H:%M:%S" 9 set format x "%m/%d" 10 set xtic rotate by 315 # 270 + 45 11 set xtics 86400*3 # three day a big tic 12 set mxtics 3 # every day a small tic 13 14 # plot in double Y axises 15 set xlabel "day(s)" 16 set ylabel "session(pid)" 17 18 #plot "log.txt" using 2:4 with points pt 1 title "score" 19 plot for [log in logs] "plot/".log.".txt" using 1:3 w points pt 5 title log 20 21 quit
簡單解讀一下:
- 進行一些基本設定 (line 2-6);
- 設定橫座標 (line 7-12),包括設定時間軸、時間格式、橫座標刻度等等;
- 設定座標名稱 (line 15,16);
- 最後一條語句完成所有繪製 (line 19)。
重點說明一下最後一條,正常的資料繪製是如 line 18 所示,有輸入資料 (log.txt)、有使用的資料列 (using 2:4)、有展現形式 (with points pt 1)、有圖例名稱 (score)。它和 19 行的區別在於,後者是一條迴圈語句,它遍歷 logs 引數中的所有值並將當前值存放在 log 變數中 (其實就是產品標識啦),後面它用這個變數拼接出了輸入的資料來源 (例如 plot/GDraw.txt,注意 gnuplot 是使用 '.' 來連線字串與變數的);最後還用這個變數命名了圖例的名稱 (title log);其它和 18 行沒有什麼區別。如果對上一個指令碼 guxplot.sh 還有印象的話,它一共傳遞給這個指令碼兩個引數:logs 和 logpath,剛剛說明的是前者,後者是用來設定標題的,這樣就能清楚得知資料來源。
把一切串起來
關鍵指令碼搞清楚之後,我們就可以做總成了,下面是驅動”一切“的那個指令碼 (guxstrip.sh) :
1 #! /bin/bash 2 use_local=1 3 gdpdir=$(pwd) 4 logs=$(find . -type f -name "gux_log.*.txt") 5 if [ -z "$logs" ]; then 6 use_local=0 7 echo "no log found under current directory, try system..." 8 gdpdir="/cygdrive/c/ProgramData/Glodon/GUX/3.0" 9 iswinxp=$(Wmic OS Get Caption | grep "XP") 10 if [ ! -z "$iswinxp" ]; then 11 echo "winxp detected!" 12 gdpdir="/cygdrive/c/Documents and Settings/All Users/Application Data/Glodon/GUX/3.0" 13 fi 14 15 logs=$(find "$gdpdir" -type f -name "gux_log.*.txt") 16 if [ -z "$logs" ]; then 17 echo "no log found, nothing to do, quit" 18 exit; 19 fi 20 fi 21 22 n=1 23 sel=-1 24 echo "0: all" 25 for log in $logs 26 do 27 echo "$n: $log $(stat -c '%s' $log) bytes" 28 n=$(($n+1)) 29 done 30 31 until [ $sel -ge 0 ] && [ $sel -lt $n ] 32 do 33 echo -n "which log do you want to strip: (0) " 34 read sel 35 if [ -z "$sel" ]; then 36 sel=0 37 fi 38 done 39 40 rm -rf plot 41 mkdir plot 42 43 n=0 44 for log in $logs 45 do 46 if (( $sel == 0 )) || (( $n == $sel - 1 )); then 47 echo "handle log: $log" 48 echo "handle log: $log" >> "plot/plot.log" 49 sed -n '/add action .*\..*/p' "$log" > "plot/gux.data" 50 if [ ! -s "plot/gux.data" ]; then 51 echo "no valid data in that file, skip" 52 n=$(($n+1)) 53 continue; 54 fi 55 # 2.0 log format 56 # 57 # [1] 2019-09-19 00:39:24 10772 5992 add action GDQ2018.合併 (1) 58 # 59 # 3.0 log format v1 60 # 61 # 2020-06-11 13:38:34.666 17272 17712 [1] add action GCCP5.timer_duration : xxx 62 # 63 # 3.0 log format v2 64 # 65 # 2020-06-11 13:38:34.666 17272 17712 gmtsvc [1] add action GCCP5.timer_duration : xxx 66 # 67 # to adapt all of them 68 awk -F "[. ]+" '{ if ( $1 ~ /\[[0-9]]/ ) { print $2,$3,$4,$8,$9 >> "plot/"$8".txt"; a[$8]++ } else if ( $9 == "action" ) { print $1,$2,$4,$10,$11 >> "plot/"$10".txt"; a[$10]++ } else { print $1,$2,$4,$9,$10 >> "plot/"$9".txt"; a[$9]++ } } END { for (i in a) print i " : " a[i] }' "plot/gux.data" >> "plot/plot.log" 69 70 # prepare the remote file for later compress 71 if [ "$use_local" -eq 0 ]; then 72 cp "$log" "./plot/${log##*/}.log" 73 fi 74 fi 75 n=$(($n+1)) 76 done 77 78 logs=$(find plot -type f -name "*.txt") 79 if [ -z "$logs" ]; then 80 echo "no data stripped, exit plotting!" 81 exit 82 fi 83 84 echo -e "\nstart ploting" 85 bash.exe guxplot.sh "$gdpdir" # 2>>error.txt 86 87 # last pack them up! 88 echo -e "\nstart packing" 89 rm plot.tar.gz 90 rm "plot/gux.data" 91 92 # tar can NOT find gzip in our situation 93 # so use them separately 94 #tar cvzf plot.tar.gz plot 95 tar cvf plot.tar plot 96 gzip plot.tar 97 98 echo "work done!"
簡單解讀一下:
- 首先判斷輸入日誌來源 (line 2-20),如果在系統目錄下找不到日誌,就在本目錄查詢,這樣便於直接分析獲取到的日誌,而不是去使用者機器上裝一個工具。如果都找不到,就沒得可分析,直接退出。這裡系統的話由於WinXP 和 Win7+ 存放的目錄不一樣,所以要區分一下,注意這個指令碼雖然是 shell 指令碼,但是是執行在 Windows 系統上的,所以一些 windows 的工具和命令也是可以直接呼叫的 (Wmic);
- 接著將查詢到符合條件的日誌名稱 一 一 列出,供使用者選擇,使用者可以直接回車 (或輸入 0) 表示處理所有日誌輸入,也可以輸入一個序號,表示只處理序號對應的日誌。當用戶只關心最近的日誌資料的話,可以使用後面這個功能來縮小處理的日誌時間範圍,使生成的活躍圖更精確一些;
- 分別處理各個日誌 (line 43-76),用到了我們前面介紹的 sed 與 awk,注意新增資料時使用的是追加 (>>),這樣可以避免覆蓋前面生成的資料。同時如果日誌不在當前目錄下面,會自動將日誌也複製到打包目錄,方便萬一有問題時繼續進行分析;
- 如果沒有任何資料檔案生成 (line 78-82),表示日誌中沒有有效的資料,沒得分析,退出;
- 繪製活躍圖 (line 84,85);
- 將所有的資料和圖表壓縮成一個 tar 包 (line 87-96),方便後續提交。這時裡本來想用 'tar cvzf‘ 一步生成,結果 tar 老是報 gzip 找不到,雖然我已經提供了這個 exe,後來拆分成兩步 (tar + gzip) 就好了,可能是 tar 有它自己的查詢方式,在移植到 windows 環境後被破壞掉了。
下面是這個指令碼的執行截圖:
執行結束後會自動彈出埋點活躍圖。
結語
其實前面稍微提到了一點,就是我是在 Windows 環境下執行 shell 指令碼的,使用的是 msys2,一種類似 cygwin 的東東,我前幾篇文章都有提到過。最終這個日誌清洗工具做好後,我只需要把我用到的 msys2 命令提取出來,就可以在新的機器上運行了,既不用安裝,也不用註冊,完全是個”綠色“版本啊。下面一張工具內容的截圖為證:
其中 exe 除了 bash.exe 外,均為指令碼中用到的命令;dll 基本是各個 exe 依賴的檔案,這個是通過試執行報錯後一個一個從 msys2 系統扒過來的;bat、sh 與 plt 基本就是我們的指令碼檔案啦。整個工具佔用空間 23.3 MB,壓縮後僅僅 10 MB 多一點,非常便於給使用者交付,我想這也是 msys2 系統相比於各種指令碼語言最大的優勢 —— 不需要安裝龐大的語言虛擬機器或執行時庫,用到什麼再添什麼,比較小巧靈活,適合我這種簡單的任務。
參考
[1]. gnuplot圖例legend設定
[2]. gnuplot函式畫圖
[3]. 談談gnuplot(四十四):for 迴圈
[4]. command line - 如何將命令列引數傳遞給gnuplot?
[5]. gnuplot的常用技巧
[6]. (4) - 談談gnuplot(十四):第二座標軸
[7]. Using gnuplot to display data in your Web pages
[8]. Chapter 5. Plotting
[9]. gnuplot 學習小計3-字串 讀取檔案
[10]. gnuplot 讓您的資料視覺化
[11]. awk模糊匹配統計行數命令的詳解
[12]. AWK 簡明教程
[13]. awk呼叫系統命令引數傳遞的問題
&n