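Stata: A Simple Way to Generate and Test Dummy-Variable Interaction Terms
阿新 • Published: 2019-01-13
Introduction: dummy variables and interaction terms
When data fall into groups or ordered levels, dummy variables and interaction terms are commonly used to examine structural differences between those groups or levels.
Example: does marriage produce a structural difference in women's wages?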
sysuse nlsw88.dta, clear
sum
    Variable |        Obs        Mean    Std. Dev.        Min        Max
-------------+----------------------------------------------------------
      idcode |      2,246    2612.654     1480.864          1       5159
         age |      2,246    39.15316     3.060002         34         46
        race |      2,246    1.282725     .4754413          1          3
     married |      2,246    .6420303     .4795099          0          1
never_marr~d |      2,246    .1041852     .3055687          0          1
-------------+----------------------------------------------------------
       grade |      2,244    13.09893     2.521246          0         18
    collgrad |      2,246    .2368655     .4252538          0          1
       south |      2,246    .4194123     .4935728          0          1
        smsa |      2,246    .7039181     .4566292          0          1
      c_city |      2,246    .2916296     .4546139          0          1
-------------+----------------------------------------------------------
    industry |      2,232    8.189516     3.010875          1         12
  occupation |      2,237    4.642825     3.408897          1         13
       union |      1,878    .2454739     .4304825          0          1
        wage |      2,246    7.766949     5.755523   1.004952   40.74659
       hours |      2,242    37.21811     10.50914          1         80
-------------+----------------------------------------------------------
     ttl_exp |      2,246    12.53498     4.610208   .1153846   28.88461
      tenure |      2,231     5.97785     5.510331          0   25.91667
Basic model
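Judging from the regression commands that follow, the model being estimated can be written roughly as below. The symbols β, δ and ε are notation assumed here for illustration; they do not come from the original post.

```latex
\begin{aligned}
\text{wage}_i = {} & \beta_0 + \beta_1\,\text{tenure}_i + \beta_2\,\text{hours}_i + \beta_3\,\text{ttl\_exp}_i \\
& + \delta_0\,\text{married}_i
  + \delta_1\,(\text{married}_i \times \text{tenure}_i)
  + \delta_2\,(\text{married}_i \times \text{hours}_i)
  + \delta_3\,(\text{married}_i \times \text{ttl\_exp}_i)
  + \varepsilon_i
\end{aligned}
```

In this notation, the joint test on the three interaction terms corresponds to the null hypothesis δ1 = δ2 = δ3 = 0, that is, the slopes on tenure, hours and ttl_exp are the same for married and unmarried women.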
The cumbersome way: adding the dummy and its interaction terms by hand
gen marriedtenure = married*tenure
gen marriedhours = married*hours
gen marriedttl = married*ttl_exp
reg wage tenure hours ttl_exp married*
test marriedtenure marriedhours marriedttl
Source | SS df MS
-------------+----------------------------------
Model | 6140.31754 7 877.188219
Residual | 67880.4931 2,219 30.5905782
-------------+----------------------------------
Total | 74020.8106 2,226 33.2528349
Number of obs = 2,227
F(7, 2219) = 28.68
Prob > F = 0.0000
R-squared = 0.0830
Adj R-squared = 0.0801
Root MSE = 5.5309
-----------------------------------------------
wage | Coef. Std. Err. t
--------------+--------------------------------
tenure | .1048823 .0412746 2.54
hours | .0874067 .0222925 3.92
ttl_exp | .2183548 .0515089 4.24
married | 1.029717 1.12407 0.92
marriedtenure | -.110726 .0532406 -2.08
marriedhours | -.0418236 .0261311 -1.60
marriedttl | .0869538 .0652744 1.33
_cons | 1.208404 .9551692 1.27
-----------------------------------------------
------------------------------------------------
wage | P>|t| [95% Conf. Interval]
--------------+---------------------------------
tenure | 0.011 .0239415 .1858232
hours | 0.000 .0436904 .1311231
ttl_exp | 0.000 .1173441 .3193655
married | 0.360 -1.174622 3.234056
marriedtenure | 0.038 -.2151326 -.0063194
marriedhours | 0.110 -.0930675 .0094204
marriedttl | 0.183 -.0410515 .214959
_cons | 0.206 -.6647154 3.081522
------------------------------------------------
( 1) marriedtenure = 0
( 2) marriedhours = 0
( 3) marriedttl = 0
F( 3, 2219) = 2.31
Prob > F = 0.0748
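If the interaction terms do have to be created by hand, a loop at least avoids repeating the gen line for every regressor. The following is a minimal sketch; the int_ prefix is a naming choice made here for illustration and does not appear in the original post.

```stata
sysuse nlsw88.dta, clear

* Build married x regressor for each continuous variable in one loop.
* The int_ prefix is a hypothetical naming convention.
foreach v of varlist tenure hours ttl_exp {
    generate int_`v' = married * `v'
}

* Same regression and joint test as above, using the looped variables.
regress wage tenure hours ttl_exp married int_*
test int_tenure int_hours int_ttl_exp
```

Because this fits the same model as the hand-written version, it should reproduce the joint F test with 3 numerator degrees of freedom. The generated variables still clutter the dataset, which is the problem the next section addresses.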
The convenient way: factor-variable notation
For more uses and details of factor-variable notation, see fvvarlist.
help fvvarlist
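As a quick orientation to the operators documented there, the sketch below fits the same model twice, once with ## and once with the terms written out. It uses only one continuous regressor to keep the output short; that simplification is made here and is not part of the original example.

```stata
sysuse nlsw88.dta, clear

* i. marks a categorical variable, c. marks a continuous one.
* ## requests both main effects and their interaction.
regress wage i.married##c.tenure

* # requests the interaction only, so the main effects are listed explicitly.
* This longer specification fits the same model as the line above.
regress wage i.married c.tenure i.married#c.tenure
```

The c. prefix matters: inside an interaction, Stata treats an unprefixed variable as categorical, and a variable with non-integer values such as tenure cannot be used as a factor variable.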
The short version
global cx "tenure hours ttl_exp"
reg wage i.married##c.($cx)
testparm i.married#c.($cx)
Source | SS df MS
-------------+----------------------------------
Model | 6140.31754 7 877.188219
Residual | 67880.4931 2,219 30.5905782
-------------+----------------------------------
Total | 74020.8106 2,226 33.2528349
Number of obs = 2,227
F(7, 2219) = 28.68
Prob > F = 0.0000
R-squared = 0.0830
Adj R-squared = 0.0801
Root MSE = 5.5309
----------------------------------------------------
wage | Coef. Std. Err. t
------------------+---------------------------------
married |
married | 1.029717 1.12407 0.92
tenure | .1048823 .0412746 2.54
hours | .0874067 .0222925 3.92
ttl_exp | .2183548 .0515089 4.24
|
married#c.tenure |
married | -.110726 .0532406 -2.08
|
married#c.hours |
married | -.0418236 .0261311 -1.60
|
married#c.ttl_exp |
married | .0869538 .0652744 1.33
|
_cons | 1.208404 .9551692 1.27
----------------------------------------------------
---------------------------------------------------
wage | P>|t| [95% Conf. Interval]
------------------+--------------------------------
married |
married | 0.360 -1.174622 3.234056
tenure | 0.011 .0239415 .1858232
hours | 0.000 .0436904 .1311231
ttl_exp | 0.000 .1173441 .3193655
|
married#c.tenure |
married | 0.038 -.2151326 -.0063194
|
married#c.hours |
married | 0.110 -.0930675 .0094204
|
married#c.ttl_exp |
married | 0.183 -.0410515 .214959
|
_cons | 0.206 -.6647154 3.081522
---------------------------------------------------
( 1) 1.married#c.tenure = 0
( 2) 1.married#c.hours = 0
( 3) 1.married#c.ttl_exp = 0
F( 3, 2219) = 2.31
Prob > F = 0.0748
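One way to read these coefficients: the main-effect slope applies to unmarried women, and the slope for married women is the main effect plus the interaction. From the table above, the tenure slope is about 0.105 for unmarried women and about 0.105 - 0.111 = -0.006 for married women. The sketch below checks this with lincom and with separate regressions by group; both steps are additions made here and are not part of the original post.

```stata
sysuse nlsw88.dta, clear
regress wage i.married##c.(tenure hours ttl_exp)

* Tenure slope for unmarried women: the main-effect coefficient.
lincom tenure

* Tenure slope for married women: main effect plus interaction.
lincom tenure + 1.married#c.tenure

* Cross-check: in a fully interacted model the point estimates match
* those from separate regressions on each group.
regress wage tenure hours ttl_exp if married == 0
regress wage tenure hours ttl_exp if married == 1
```

The point estimates agree across the two approaches, but the standard errors generally differ, because the pooled regression assumes a single error variance for both groups.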
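Note that testparm, not test, is the command to use here.
test expects individual coefficient names (such as 1.married#c.tenure) rather than a varlist, so it does not expand the parenthesised shorthand c.($cx).
To use test, each interaction coefficient has to be written out:
test 1.married#c.tenure 1.married#c.hours 1.married#c.ttl_exp
which is far longer and more error-prone.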
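The two forms can be compared directly. The sketch below stores the F statistic from each command and prints them side by side; the scalar names F_testparm and F_test are chosen here for illustration.

```stata
sysuse nlsw88.dta, clear
global cx "tenure hours ttl_exp"
regress wage i.married##c.($cx)

* testparm takes a varlist, so the parenthesised shorthand is expanded.
testparm i.married#c.($cx)
scalar F_testparm = r(F)

* test takes coefficient names, so each one is spelled out.
test 1.married#c.tenure 1.married#c.hours 1.married#c.ttl_exp
scalar F_test = r(F)

display "testparm F = " F_testparm "   test F = " F_test

* Replay the regression with a legend showing the exact coefficient names.
regress, coeflegend
```

The coeflegend replay is useful whenever the name that test expects for a factor-variable coefficient is unclear.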
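Summary
- Factor-variable notation makes it much easier to generate dummy-variable interaction terms.
- It works in both estimation and testing commands; for joint tests, use testparm in place of test.
- The more regressors there are, the greater the saving.
- The variables to be interacted can be stored in a global macro with the global command and then referenced with $, which cuts the amount of code considerably.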
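The same pattern also works with a local macro, which some users prefer because it does not persist beyond the do-file that defines it. The sketch below is a variation under that assumption; adding grade to the list is purely illustrative and is not in the original post.

```stata
sysuse nlsw88.dta, clear

* A local macro holds the list of variables to interact with married.
* grade is added here only to show that extending the list is a one-word change.
local cx "tenure hours ttl_exp grade"

regress wage i.married##c.(`cx')
testparm i.married#c.(`cx')
```

Run the block as a whole (for example from a do-file), since a local macro defined in one interactive do-file selection is not visible to the next.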
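About us
- Stata 連享會 (WeChat public account: StataChina) was founded by Professor Lian Yujun's team at Sun Yat-sen University to share Stata tips and experience on a regular basis.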