Welcome to DaxinPai's Blog

阿新 • Published: 2019-01-22

In data analysis, when you run into a statistics question, you can basically follow the table below to choose a method:
[Figure: statistics method]

(Image found online; original source unknown.)

So the first step is to determine whether the data follow a normal distribution. There are four methods:

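Plot a histogram of the data and look at the overlaid normal curve. This is a rough method, not a hard-and-fast criterion; once you have looked at enough of them, you can generally tell which ones are normal and which are not.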
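To make the histogram check a little less subjective, here is a minimal sketch (Python, standard library only) that bins the data and prints, next to each bar, the count that a normal curve with the sample's mean and SD would predict for that bin. The function name `histogram_vs_normal`, the equal-width bins, and the sample values are illustrative assumptions, not something from the original post.

```python
from statistics import NormalDist, mean, stdev


# Hypothetical helper for illustration; not from the original post.
def histogram_vs_normal(data, bins=5):
    """Compare equal-width histogram counts with a fitted normal curve.

    Returns (left_edge, right_edge, observed, expected) per bin, where
    `expected` is the count a normal distribution with the sample's mean
    and standard deviation would place in that bin.
    """
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("data must contain at least two distinct values")
    width = (hi - lo) / bins
    # Pin the last edge to max(data) so rounding never drops the maximum.
    edges = [lo + i * width for i in range(bins)] + [hi]
    observed = [0] * bins
    for x in data:
        # Values equal to max(data) are counted in the last bin.
        observed[min(int((x - lo) / width), bins - 1)] += 1
    fitted = NormalDist(mean(data), stdev(data))
    n = len(data)
    return [
        (edges[i], edges[i + 1], observed[i],
         n * (fitted.cdf(edges[i + 1]) - fitted.cdf(edges[i])))
        for i in range(bins)
    ]


sample = [4.1, 4.8, 5.0, 5.2, 5.3, 5.5, 5.6, 5.9, 6.2, 6.9]
for left, right, obs, exp in histogram_vs_normal(sample, bins=4):
    # One '#' per observation; the fitted-normal count is shown alongside.
    print(f"{left:5.2f}-{right:5.2f} {'#' * obs:<6} expected ~{exp:.1f}")
```

Large, systematic gaps between the bars and the expected counts suggest non-normality. With small samples the counts are noisy, so this remains a rough visual check.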
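Look at skewness and kurtosis:
Skewness measures whether a distribution is symmetric.
A distribution can be positively skewed (long right tail, with the bulk of the data on the left) or negatively skewed (long left tail, with the bulk of the data on the right).
Kurtosis measures how sharply peaked the distribution is.
It tells you whether the curve has a bell shape, as opposed to, for example, a flat or a sharply peaked one.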
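For a normal distribution, skewness and excess kurtosis (kurtosis minus 3) are both 0, so the farther they are from 0, the less normal the data. But how far from 0 can they be before we stop treating the data as normal? That is hard to say, which is why formal statistical tests of normality are used.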
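As a minimal sketch of the skewness and kurtosis check, the function below computes the moment-based sample skewness g1 and excess kurtosis g2 using only the Python standard library. The function name is a hypothetical choice. The moment-based formulas are one common convention. Statistics packages often report bias-adjusted variants, so their numbers can differ somewhat for small samples.

```python
from statistics import fmean


# Hypothetical helper for illustration; not from the original post.
def skewness_kurtosis(data):
    """Moment-based sample skewness (g1) and excess kurtosis (g2).

    Both are close to 0 for normally distributed data.
    """
    n = len(data)
    xbar = fmean(data)
    # Central moments m2, m3, m4 (dividing by n, not n - 1).
    m2 = sum((x - xbar) ** 2 for x in data) / n
    m3 = sum((x - xbar) ** 3 for x in data) / n
    m4 = sum((x - xbar) ** 4 for x in data) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0  # subtract 3 so that a normal distribution gives 0
    return g1, g2


print(skewness_kurtosis([1, 2, 3, 4, 5]))    # symmetric, so g1 is 0
print(skewness_kurtosis([1, 1, 1, 2, 10]))   # long right tail, so g1 > 0
```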
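Kolmogorov-Smirnov (K-S) test and Shapiro-Wilk (S-W) test
These check normality by comparing your data to a normal distribution with the same mean and standard deviation as your sample.
If the test is NOT significant (p > 0.05), the data can be treated as normal; if it is significant (p < 0.05), the data are non-normal.
Note that the larger the sample, the more likely the test is to come out significant, even for small departures from normality.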
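As a sketch of the K-S idea described above, the function below computes the K-S statistic D against a normal distribution fitted with the sample's own mean and SD. D is the largest vertical gap between the empirical CDF and the fitted normal CDF. The function name is hypothetical, and only the statistic is computed here, not a p-value. Because the mean and SD are estimated from the same data, the classic K-S p-value tables do not apply as-is. The Lilliefors correction is the usual remedy. For complete tests, a library routine such as `scipy.stats.shapiro` (Shapiro-Wilk) is the practical route.

```python
from statistics import NormalDist, mean, stdev


# Hypothetical helper for illustration; not from the original post.
def ks_statistic_vs_fitted_normal(data):
    """K-S D statistic against a normal with the sample's mean and SD."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = fitted.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


print(ks_statistic_vs_fitted_normal([1] * 10 + [2, 3, 50]))  # clearly non-normal
```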
Another method is graphical: plot the data as points on a "Normal Q-Q Plot".
The black line indicates the values your sample should adhere to if the distribution were normal. The dots are your actual data. If the dots fall exactly on the black line, your data are normal; if they deviate from the black line, your data are non-normal.
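The points on a normal Q-Q plot can be computed directly, as in this minimal sketch. It pairs each sorted observation with the quantile that a normal distribution with the sample's mean and SD predicts at plotting position (i - 0.5)/n. The reference line is therefore y = x. The function name and the (i - 0.5)/n plotting positions are assumptions; plotting tools use several slightly different conventions.

```python
from statistics import NormalDist, mean, stdev


# Hypothetical helper for illustration; not from the original post.
def normal_qq_points(data):
    """Return (theoretical, observed) pairs for a normal Q-Q plot."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    # Plotting position (i - 0.5) / n for the i-th smallest observation.
    return [(fitted.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]


for theoretical, observed in normal_qq_points([3.0, 1.0, 2.0, 5.0, 4.0]):
    # Points close to theoretical == observed lie on the reference line.
    print(f"{theoretical:6.2f} {observed:6.2f}")
```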

Some situations where the data are clearly not normally distributed:
when the outcome is an ordinal variable or a rank,
when there are definite outliers, or
when the outcome has clear limits of detection.

The second step is to check for homogeneity of variance:

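First, what is homogeneity of variance? It means that every population has the same variance. It is an underlying assumption of ANOVA. Mathematically: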
$H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_n^2$

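There are three main tests for homogeneity of variance:
Bartlett’s Test
Levene’s Test
Brown-Forsythe Test
An F test (the ratio of two sample variances) can also be used to check homogeneity of variance.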
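As a sketch of two of these checks, here is the Levene W statistic and the two-sample variance ratio in plain Python. Levene's test computes each observation's absolute deviation from its group mean. The Brown-Forsythe variant uses the group median instead. It then runs a one-way ANOVA F ratio on those deviations, and under H0 the result W is compared against an F(k - 1, N - k) distribution. The function names are hypothetical, and neither function computes p-values.

```python
from statistics import fmean, median, variance


# Hypothetical helper for illustration; not from the original post.
def levene_w(groups, center="mean"):
    """Levene's W (center="mean") or Brown-Forsythe W (center="median")."""
    centre_fn = fmean if center == "mean" else median
    # Absolute deviation of each observation from its own group's centre.
    z = []
    for g in groups:
        c = centre_fn(g)
        z.append([abs(y - c) for y in g])
    k = len(groups)
    n_total = sum(len(g) for g in z)
    z_means = [fmean(g) for g in z]
    z_grand = sum(sum(g) for g in z) / n_total
    between = sum(len(g) * (m - z_grand) ** 2 for g, m in zip(z, z_means))
    within = sum((v - m) ** 2 for g, m in zip(z, z_means) for v in g)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical helper for illustration; not from the original post.
def variance_ratio(a, b):
    """Two-sample F statistic: larger sample variance over the smaller one."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)


narrow, wide = [4, 5, 5, 5, 6], [0, 3, 5, 7, 10]
print(levene_w([narrow, wide]), levene_w([narrow, wide], center="median"))
print(variance_ratio(narrow, wide))
```

Using the median makes the Brown-Forsythe variant more robust when the data are skewed.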

Parametric vs. nonparametric:

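What does nonparametric mean?
A nonparametric test is distribution-free: it does not assume the data follow a particular distribution such as the normal. It is the counterpart of a parametric test and is the usual choice when variances are unequal or the data are not normal.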

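From the above, it follows:
When should you use nonparametric tests?
1. When variances are unequal or the data are not normal
2. When the data are nominal or ordinal (a common case of non-normal data)
3. When the sample is very small
Of course, each group must still be randomly assigned.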

In general, a nonparametric test is less powerful than the corresponding parametric test.
For very small samples, however, nonparametric tests can be as powerful as their parametric counterparts.

Q: How does sample size relate to the choice of nonparametric tests?
• The validity of the unpaired t-test is not seriously compromised by violating the assumption of equality of variance IF n1 = n2.
• If sample sizes are unequal, differences in variance can affect the accuracy of the t-statistic.
(From course material)

Common tests explained

Parametric tests:

T-Test for Independent Samples:
Equal variances, unpaired
A t-test compares two means.
"Independent" means an unpaired t-test.
It can be used both when variances are equal and when they are unequal:

[Figure: Unpaired T-Test 2]
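The two versions of the unpaired t-test described above can be sketched as follows. The equal-variance (Student) version pools the two sample variances. The unequal-variance (Welch) version keeps them separate and uses Welch-Satterthwaite degrees of freedom. The function names are hypothetical, and only the statistic and degrees of freedom are returned. For p-values, a library routine such as `scipy.stats.ttest_ind(a, b, equal_var=False)` (Welch) is the practical choice.

```python
import math
from statistics import fmean, variance


# Hypothetical helper for illustration; not from the original post.
def pooled_t(a, b):
    """Student's unpaired t (equal variances). Returns (t, df)."""
    n1, n2 = len(a), len(b)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (fmean(a) - fmean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Hypothetical helper for illustration; not from the original post.
def welch_t(a, b):
    """Welch's unpaired t (unequal variances). Returns (t, df)."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a) / n1, variance(b) / n2
    t = (fmean(a) - fmean(b)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


print(pooled_t([5, 6, 7, 8, 9], [1, 2, 3, 4, 5]))
print(welch_t([5, 6, 7, 8, 9], [1, 2, 3, 4, 5]))
```

When n1 = n2 the two t statistics coincide, and only the degrees of freedom can differ. This is consistent with the course note that unequal variances matter less when the group sizes are equal.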

T-Test for Paired Samples:

Subjects may be matched on relevant variables (age, twins, etc.), or serve as their own controls.
[Figure: Paired T-Test]
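A paired t-test amounts to a one-sample t-test on the within-pair differences, as in this minimal sketch. The function name and the before/after framing are illustrative assumptions. A library routine such as `scipy.stats.ttest_rel` also provides the p-value.

```python
import math
from statistics import fmean, stdev


# Hypothetical helper for illustration; not from the original post.
def paired_t(before, after):
    """Paired t-test on within-pair differences. Returns (t, df)."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [y - x for x, y in zip(before, after)]  # after minus before
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = fmean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


print(paired_t([10, 12, 14, 16], [11, 14, 15, 18]))
```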

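One-Way Analysis of Variance for Independent Samples:
ANOVA compares the means of more than two groups.
"One-way" refers to a single independent variable (factor) with 3 or more levels, i.e. one variable that takes a different level in each group.
It is based on the F test:
[Figures: F-Test, F-Test2]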

A significant F-ratio does NOT indicate each group is different from all other groups.
It only tells us that there is a significant difference between at least 2 of the means (largest vs smallest).
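The F-ratio behind one-way ANOVA is the between-group mean square divided by the within-group mean square. Here is a minimal sketch that computes it. The function name is hypothetical and no p-value is computed; a library routine such as `scipy.stats.f_oneway` returns both the statistic and the p-value.

```python
from statistics import fmean


# Hypothetical helper for illustration; not from the original post.
def one_way_anova_f(groups):
    """One-way ANOVA F ratio. Returns (F, df_between, df_within)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [fmean(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_b, df_w = k - 1, n_total - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w


print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

As the note above says, a large F only shows that at least two means differ. Post hoc pairwise comparisons are needed to find out which ones.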

(To be continued)
