
max group by all columns_SAS: quickly dropping all empty columns from a dataset

Tags: max group by all columns


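Background: before the real data analysis begins, we need to decide how to extract useful information from the raw dataset. In the data we typically receive, not every field or variable actually carries information. Quickly dropping the empty columns and keeping only the ones that contain data greatly reduces the workload and avoids spending time where it is not needed.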

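Example: data_08_1 stores surgery-related information, with 2,479 variables and 19,262 observations in total. Many of the variables are empty for the entire column. To find information suitable for analysis, we want to drop the empty columns and keep the columns that contain data, and then decide what to analyze next.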

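Approach 1: count the rows in which each variable is missing. If the number of missing rows equals the total number of rows, the entire column is empty and should be dropped.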
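As a quick sanity check of this idea, here is a minimal sketch on a small made-up dataset. The dataset `demo_count` and its variables `id`, `x`, and `y` are hypothetical and are not part of data_08_1. The sketch relies on the fact that `count(col)` in PROC SQL counts only non-missing values, so `count(*) - count(col)` gives the number of missing rows for that column.

```sas
/* Hypothetical toy data: x is missing in every row, y in only one row */
data demo_count;
  input id x y $;
  datalines;
1 . a
2 . .
3 . b
;
run;

/* count(col) skips missing values, so the difference is the missing count */
proc sql;
  select count(*)            as n_total,
         count(*) - count(x) as x_missing,
         count(*) - count(y) as y_missing
    from demo_count;
quit;
```

In this toy case `x_missing` equals `n_total`, so `x` would be dropped, while `y` has some missing rows but is kept.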

Solution 1

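    /* Emit one row per (observation, variable) pair whose value is missing */
    data temp;
      set data_08_1;
      array arr1{*} _numeric_;
      array arr2{*} _character_;
      do i = 1 to dim(arr1);
        if missing(arr1(i)) then do;
          var = vname(arr1(i));
          output;
        end;
      end;
      do i = 1 to dim(arr2);
        if missing(arr2(i)) then do;
          var = vname(arr2(i));
          output;
        end;
      end;
      keep var;
    run;

    /* Count the missing values per variable */
    proc sql;
      create table miss as
        select var, count(*) as frequency
          from temp
          group by var;
    quit;

    /* Total number of observations */
    proc sql;
      select count(*) into :ct from data_08_1;
    quit;

    /* Variables whose missing count equals the total are entirely empty */
    proc sql;
      select var into :delete separated by ","
        from miss where frequency = &ct.;
    quit;

    /* Drop the empty columns from the original table */
    proc sql;
      alter table data_08_1 drop &delete.;
    quit;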

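Approach 2: use the SQL max function to take the maximum of each variable. If the maximum is missing, the entire column can be considered empty, so we drop the columns whose maximum is missing.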
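To see why this works, here is a minimal sketch on a small made-up dataset. The dataset `demo_max` and its variables are hypothetical. The PROC SQL `max` summary function ignores missing values, for both numeric and character columns, so it returns a missing result only when every value in the column is missing.

```sas
/* Hypothetical toy data: x is missing in every row, y in only one row */
data demo_max;
  input id x y $;
  datalines;
1 . a
2 . .
3 . b
;
run;

/* max() skips missing values; only an all-missing column yields missing */
proc sql;
  create table demo_max_result as
    select max(id) as id,
           max(x)  as x,
           max(y)  as y
      from demo_max;
quit;

proc print data=demo_max_result;
run;
```

In the one-row result, only `x` is missing, which marks it as the column to drop. Because the result has a single row regardless of how many observations the input has, the later missing-value check is cheap.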

Solution 2

    /* Build "max(name) as name" lists, split by varnum into three macro variables. */
    /* varnum is restricted because, without the split, the macro variable would    */
    /* exceed the maximum length of 65534 and be truncated.                         */
    proc sql;
      select cat("max(", strip(name), ") as ", strip(name))
        into :vlist1 separated by ","
        from sashelp.vcolumn
        where libname = "WORK" and memname = "DATA_08_1" and varnum le 1000;
      select cat("max(", strip(name), ") as ", strip(name))
        into :vlist2 separated by ","
        from sashelp.vcolumn
        where libname = "WORK" and memname = "DATA_08_1"
          and varnum gt 1000 and varnum le 2000;
      select cat("max(", strip(name), ") as ", strip(name))
        into :vlist3 separated by ","
        from sashelp.vcolumn
        where libname = "WORK" and memname = "DATA_08_1" and varnum gt 2000;
    quit;

    /* One row per table, holding the maximum of each variable */
    proc sql;
      create table temp1 as select &vlist1. from data_08_1;
      create table temp2 as select &vlist2. from data_08_1;
      create table temp3 as select &vlist3. from data_08_1;
    quit;

    /* Combine the three one-row tables side by side */
    data temp;
      merge temp1 temp2 temp3;
    run;

    /* List every variable whose maximum is missing, i.e. an empty column */
    data miss;
      set temp;
      array arr1{*} _numeric_;
      array arr2{*} _character_;
      do i = 1 to dim(arr1);
        if missing(arr1(i)) then do;
          var = vname(arr1(i));
          output;
        end;
      end;
      do i = 1 to dim(arr2);
        if missing(arr2(i)) then do;
          var = vname(arr2(i));
          output;
        end;
      end;
      keep var;
    run;

    proc sql;
      select var into :delete separated by "," from miss;
    quit;

    /* Drop the empty columns from the original table */
    proc sql;
      alter table data_08_1 drop &delete.;
    quit;

