SAS: Quickly Deleting All Empty Columns from a Dataset
阿新 • Published: 2020-12-27
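Background: Before the real data analysis begins, you need to decide how to extract useful information from the raw dataset. In the data we typically receive, not every field/variable actually contains information, so quickly deleting the empty columns and keeping only the columns that hold data greatly reduces the workload and avoids wasted time.
Case: data_08_1 stores surgery-related information, with 2479 variables and 19262 observations in total. Many variables are empty for the entire column. To find suitable information for analysis, the plan is to delete the empty columns and keep the columns that contain data before deciding what to analyze next.
Approach 1: For each variable, count the rows in which it is missing. If that count equals the total number of rows, the entire column is empty and should be deleted.
Solution 1: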
/* Write one row per missing cell, recording the name of the variable */
data temp;
    set data_08_1;
    array arr1{*} _numeric_;
    array arr2{*} _character_;
    do i=1 to dim(arr1);
        if missing(arr1(i))=1 then do;
            var=vname(arr1(i));
            output;
        end;
    end;
    do i=1 to dim(arr2);
        if missing(arr2(i))=1 then do;
            var=vname(arr2(i));
            output;
        end;
    end;
    keep var;
run;

/* Number of missing rows per variable */
proc sql;
    create table miss as
        select var, count(*) as frequency
        from temp
        group by var;
quit;

/* Total number of rows */
proc sql;
    select count(*) into:ct from data_08_1;
quit;

/* Variables that are missing in every row */
proc sql;
    select var into:delete separated by ","
        from miss
        where frequency=&ct.;
quit;

/* Drop those variables */
proc sql;
    alter table data_08_1 drop &delete.;
quit;
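The counting rule behind approach 1 can be checked on a small dataset before running it on the full table. The sketch below is only an illustration: the dataset toy_count and its variables a, b and c are made up for this example and do not come from data_08_1.

```sas
/* Hypothetical toy data: a has values, b (numeric) and c (character)
   are missing in every row */
data toy_count;
    length c $8;
    input a b;
    c = ' ';
    datalines;
1 .
2 .
. .
;
run;

proc sql;
    /* missing() returns 1 for a missing value and 0 otherwise,
       so the sum is the number of missing rows per variable */
    select count(*)        as n_rows,
           sum(missing(a)) as miss_a,
           sum(missing(b)) as miss_b,
           sum(missing(c)) as miss_c
    from toy_count;
quit;
```

Here n_rows is 3, miss_a is 1, and miss_b and miss_c are both 3, so only b and c meet the "missing count equals row count" condition and would be dropped.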
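Approach 2: Use the max function in SQL to take the maximum of each variable. If the maximum is missing, the entire column can be treated as empty, so delete every column whose maximum is missing.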
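As a minimal sketch of why this works: the max summary function in PROC SQL ignores missing values, so it can only return a missing result when every row of the column is missing. The dataset toy_max and its variables x, y and z are hypothetical and used purely for illustration.

```sas
/* Hypothetical toy data: x has values, y (numeric) and z (character)
   are missing in every row */
data toy_max;
    length z $8;
    input x y;
    z = ' ';
    datalines;
10 .
. .
30 .
;
run;

proc sql;
    /* One output row: the maximum of each column */
    select max(x) as x,
           max(y) as y,
           max(z) as z
    from toy_max;
quit;
```

The single result row has x = 30, while y is missing and z is blank, which marks y and z as empty columns. The same pattern applies to both numeric and character variables, which is why one expression per variable is enough.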
Solution 2:
proc sql;
    /* varnum is restricted in each query because, without the restriction,
       the macro variable would exceed the maximum length of 65534 and be truncated */
    select cat("max(",strip(name),") as ",strip(name)) into:vlist1 separated by ","
        from sashelp.vcolumn
        where libname="WORK" and memname="DATA_08_1" and varnum le 1000;
    select cat("max(",strip(name),") as ",strip(name)) into:vlist2 separated by ","
        from sashelp.vcolumn
        where libname="WORK" and memname="DATA_08_1" and varnum gt 1000 and varnum le 2000;
    select cat("max(",strip(name),") as ",strip(name)) into:vlist3 separated by ","
        from sashelp.vcolumn
        where libname="WORK" and memname="DATA_08_1" and varnum gt 2000;
quit;

/* One row per table, holding the maximum of each variable */
proc sql;
    create table temp1 as select &vlist1. from data_08_1;
    create table temp2 as select &vlist2. from data_08_1;
    create table temp3 as select &vlist3. from data_08_1;
quit;

/* Combine the three one-row tables side by side */
data temp;
    merge temp1 temp2 temp3;
run;

/* Collect the names of the variables whose maximum is missing */
data miss;
    set temp;
    array arr1{*} _numeric_;
    array arr2{*} _character_;
    do i=1 to dim(arr1);
        if missing(arr1(i))=1 then do;
            var=vname(arr1(i));
            output;
        end;
    end;
    do i=1 to dim(arr2);
        if missing(arr2(i))=1 then do;
            var=vname(arr2(i));
            output;
        end;
    end;
    keep var;
run;

proc sql;
    select var into:delete separated by "," from miss;
quit;

/* Drop those variables */
proc sql;
    alter table data_08_1 drop &delete.;
quit;
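One edge case worth guarding against: if no column is entirely empty, miss has zero rows, the INTO clause never assigns the macro variable delete, and the final alter table statement fails. The sketch below wraps the last two steps in a macro that skips the drop in that situation. The macro name drop_empty and its parameters ds and list are hypothetical and not part of the original solution; the sketch assumes that the table passed as list contains exactly the names to drop in a column called var, as miss does in solution 2.

```sas
/* Hypothetical helper: drop the variables listed in &list. from &ds.,
   but only when the list is not empty */
%macro drop_empty(ds=data_08_1, list=miss);
    %local delete;   /* a new local macro variable starts out empty */

    proc sql noprint;
        /* With zero rows in &list., delete is left untouched (empty) */
        select var into :delete separated by ","
            from &list.;
    quit;

    %if %length(%superq(delete)) > 0 %then %do;
        proc sql;
            alter table &ds. drop &delete.;
        quit;
    %end;
    %else %put NOTE: No all-missing columns found in &ds..;
%mend drop_empty;

%drop_empty(ds=data_08_1, list=miss)
```

%superq is used so that the commas inside the list are not interpreted by the macro processor when the length is tested.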