Seaborn Distribution Data Visualization: Box Plots
阿新 • Published: 2022-01-07
Box Plots
boxplot()
sns.boxplot(
x=None,
y=None,
hue=None,
data=None,
order=None,
hue_order=None,
orient=None,
color=None,
palette=None,
saturation=0.75,
width=0.8,
dodge=True,
fliersize=5,
linewidth=None,
whis=1.5,
notch=False,
ax=None,
**kwargs,
)
Docstring:
Draw a box plot to show distributions with respect to categories.
A box plot (or box-and-whisker plot) shows the distribution of quantitative
data in a way that facilitates comparisons between variables or across
levels of a categorical variable. The box shows the quartiles of the
dataset while the whiskers extend to show the rest of the distribution,
except for points that are determined to be "outliers" using a method
that is a function of the inter-quartile range.
Input data can be passed in a variety of formats, including:
- Vectors of data represented as lists, numpy arrays, or pandas Series
objects passed directly to the ``x``, ``y``, and/or ``hue`` parameters.
- A "long-form" DataFrame, in which case the ``x``, ``y``, and ``hue``
variables will determine how the data are plotted.
- A "wide-form" DataFrame, such that each numeric column will be plotted.
- An array or list of vectors.
In most cases, it is possible to use numpy or Python objects, but pandas
objects are preferable because the associated names will be used to
annotate the axes. Additionally, you can use Categorical types for the
grouping variables to control the order of plot elements.
This function always treats one of the variables as categorical and
draws data at ordinal positions (0, 1, ... n) on the relevant axis, even
when the data has a numeric or date type.
See the :ref:`tutorial <categorical_tutorial>` for more information.
Parameters
----------
x, y, hue : names of variables in ``data`` or vector data, optional
Inputs for plotting long-form data. See examples for interpretation.
data : DataFrame, array, or list of arrays, optional
Dataset for plotting. If ``x`` and ``y`` are absent, this is
interpreted as wide-form. Otherwise it is expected to be long-form.
order, hue_order : lists of strings, optional
Order to plot the categorical levels in, otherwise the levels are
inferred from the data objects.
orient : "v" | "h", optional
Orientation of the plot (vertical or horizontal). This is usually
inferred from the dtype of the input variables, but can be used to
specify when the "categorical" variable is a numeric or when plotting
wide-form data.
color : matplotlib color, optional
Color for all of the elements, or seed for a gradient palette.
palette : palette name, list, or dict, optional
Colors to use for the different levels of the ``hue`` variable. Should
be something that can be interpreted by :func:`color_palette`, or a
dictionary mapping hue levels to matplotlib colors.
saturation : float, optional
Proportion of the original saturation to draw colors at. Large patches
often look better with slightly desaturated colors, but set this to
``1`` if you want the plot colors to perfectly match the input color
spec.
width : float, optional
Width of a full element when not using hue nesting, or width of all the
elements for one level of the major grouping variable.
dodge : bool, optional
When hue nesting is used, whether elements should be shifted along the
categorical axis.
fliersize : float, optional
Size of the markers used to indicate outlier observations.
linewidth : float, optional
Width of the gray lines that frame the plot elements.
whis : float, optional
Proportion of the IQR past the low and high quartiles to extend the
plot whiskers. Points outside this range will be identified as
outliers.
notch : boolean, optional
Whether to "notch" the box to indicate a confidence interval for the
median. There are several other parameters that can control how the
notches are drawn; see the ``plt.boxplot`` help for more information
on them.
ax : matplotlib Axes, optional
Axes object to draw the plot onto, otherwise uses the current Axes.
kwargs : key, value mappings
Other keyword arguments are passed through to ``plt.boxplot`` at draw
time.
Returns
-------
ax : matplotlib Axes
Returns the Axes object with the plot drawn onto it.
See Also
--------
violinplot : A combination of boxplot and kernel density estimation.
stripplot : A scatterplot where one variable is categorical. Can be used
in conjunction with other plots to show each observation.
swarmplot : A categorical scatterplot where the points do not overlap. Can
be used with other plots to show each observation.
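The whisker rule described in the docstring can be reproduced by hand. The following is a minimal sketch using only the Python standard library, so it runs without seaborn. The helper name `whisker_stats` and the sample values are made up for illustration, and the quartiles use linear interpolation (`method="inclusive"`), which is assumed here to match what the plotting library computes.

```python
import statistics

def whisker_stats(values, whis=1.5):
    """Return quartiles, whisker ends, and outliers (illustrative helper, not a seaborn API)."""
    data = sorted(values)
    # Quartiles with linear interpolation between neighbouring observations.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Fences sit whis * IQR beyond the lower and upper quartiles.
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    # Whiskers end at the most extreme observations still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": outliers,
    }

# Hypothetical bill amounts, not taken from the tips dataset.
sample = [10, 12, 13, 15, 16, 18, 20, 21, 48]
stats = whisker_stats(sample)
print(stats)
```

With `whis=1.5` the value 48 falls outside the upper fence and would be drawn as an outlier marker. Raising `whis` widens the fences and pulls such points back inside the whiskers.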
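import seaborn as sns

# Set the plot style
sns.set_style('white')
# Load the data
tip_datas = sns.load_dataset('tips', data_home='seaborn-data')
# Draw a traditional box plot
sns.boxplot(x='day', y='total_bill', data=tip_datas,
linewidth=2, # line width
width=0.8, # box width as a fraction of the category spacing
fliersize=3, # size of the outlier markers
palette='hls', # color palette
whis=1.5, # whisker length as a multiple of the IQR
notch=True, # notch the box around the median
order=['Thur','Fri','Sat','Sun'], # select the categories and set their order
)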
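With `notch=True` the box is pinched around the median to indicate a confidence interval. As far as I know, Matplotlib's default notch uses the approximation median ± 1.57 × IQR / √n; the sketch below assumes that formula and uses only the standard library. The helper name `notch_interval` and the sample values are hypothetical.

```python
import math
import statistics

def notch_interval(values):
    """Approximate notch limits around the median (assumed formula, illustrative helper)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    # Half-width shrinks with the square root of the sample size.
    half_width = 1.57 * (q3 - q1) / math.sqrt(len(values))
    return median - half_width, median + half_width

# Hypothetical bill amounts, not taken from the tips dataset.
sample = [10, 12, 13, 15, 16, 18, 20, 21, 48]
low, high = notch_interval(sample)
print(f"notch from {low:.2f} to {high:.2f}")
```

When the notches of two boxes do not overlap, that is commonly read as a hint that the medians differ, though it is a rough visual guide rather than a formal test.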
# Draw the box plot
sns.boxplot(x='day', y='total_bill', data=tip_datas,
linewidth=2,
width=0.8,
fliersize=3,
palette='hls',
whis=1.5,
notch=True,
order=['Thur','Fri','Sat','Sun'],
)
# Overlay a swarm plot of the individual observations
sns.swarmplot(x='day', y='total_bill', data=tip_datas, color='k', size=3, alpha=0.8)
# Draw box plots, using the hue parameter to split each category further
sns.boxplot(x='day', y='total_bill', data=tip_datas,
linewidth=2,
width=0.8,
fliersize=3,
palette='hls',
whis=1.5,
notch=True,
order=['Thur','Fri','Sat','Sun'],
hue='smoker',
)
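Setting `hue` nests a second categorical variable inside each category on the x-axis, so one box is drawn per (day, smoker) pair. The sketch below mimics that grouping with the standard library only. The rows are made-up values rather than records from the tips dataset, and `group_medians` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import median

def group_medians(rows):
    """Group (day, smoker, total_bill) rows by (day, smoker) and return each group's median."""
    groups = defaultdict(list)
    for day, smoker, total_bill in rows:
        groups[(day, smoker)].append(total_bill)
    return {key: median(bills) for key, bills in groups.items()}

# Hypothetical rows for illustration only.
rows = [
    ("Thur", "Yes", 19.0), ("Thur", "Yes", 21.0),
    ("Thur", "No", 15.0), ("Thur", "No", 17.0), ("Thur", "No", 16.0),
    ("Fri", "Yes", 22.0), ("Fri", "No", 12.0),
]
medians = group_medians(rows)
for key in sorted(medians):
    print(key, medians[key])
```

Each key in the result corresponds to one box in the hue-nested plot, and the value is where that box's median line would sit.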
violinplot()
sns.violinplot(x='day', y='total_bill', data=tip_datas,
linewidth=2,
width=0.8,
palette='hls',
order=['Thur','Fri','Sat','Sun'],
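scale='area', # violin width scaling: area = equal area, count = scaled by number of observations, width = equal width (renamed density_norm in seaborn 0.13)
gridsize=50, # number of grid points for the density estimate; higher values give a smoother outline
inner='box', # what to draw inside the violin: "box", "quartile", "point", "stick", or None
bw=0.8 # kernel bandwidth (degree of smoothing); can usually be left unset (replaced by bw_method and bw_adjust in seaborn 0.13)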
)
sns.violinplot(x='day', y='total_bill', data=tip_datas,
linewidth=2,
width=0.8,
palette='hls',
order=['Thur','Fri','Sat','Sun'],
scale='width',
gridsize=50,
inner='quartile',
bw=0.8
)
sns.violinplot(x='day', y='total_bill', data=tip_datas,
linewidth=2,
width=0.8,
palette='hls',
order=['Thur','Fri','Sat','Sun'],
scale='width',
gridsize=50,
inner='point',
bw=0.8
)
sns.violinplot(x='day', y='total_bill', data=tip_datas,
linewidth=2,
width=0.8,
palette='hls',
order=['Thur','Fri','Sat','Sun'],
scale='width',
gridsize=50,
inner='stick',
bw=0.8
)
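The outline of a violin is a kernel density estimate evaluated on a grid of points, which is what `gridsize` and `bw` influence. The following is a simplified sketch of that idea using only the standard library, not seaborn's actual implementation. It assumes a Gaussian kernel, treats a scalar `bw` as a factor multiplied by the sample standard deviation, and extends the grid a fixed number of bandwidths past the data. The function name `violin_density`, its `extend` parameter, and the sample values are made up.

```python
import math
import statistics

def violin_density(values, bw=0.8, gridsize=50, extend=3.0):
    """Evaluate a Gaussian kernel density estimate on an evenly spaced grid (illustrative)."""
    # Assumption: the bandwidth is the bw factor times the sample standard deviation.
    h = bw * statistics.stdev(values)
    lo = min(values) - extend * h
    hi = max(values) + extend * h
    step = (hi - lo) / (gridsize - 1)
    grid = [lo + i * step for i in range(gridsize)]
    norm = len(values) * h * math.sqrt(2 * math.pi)
    # Each grid point sums one Gaussian bump per observation.
    density = [
        sum(math.exp(-0.5 * ((g - v) / h) ** 2) for v in values) / norm
        for g in grid
    ]
    return grid, density

# Hypothetical bill amounts, not taken from the tips dataset.
sample = [10, 12, 13, 15, 16, 18, 20, 21, 48]
grid, density = violin_density(sample)

# Trapezoidal rule: the estimated density should integrate to roughly 1.
area = sum(
    (density[i] + density[i + 1]) / 2 * (grid[i + 1] - grid[i])
    for i in range(len(grid) - 1)
)
print(round(area, 3))
```

A larger `gridsize` samples the same curve at more points, which smooths the drawn outline, while a larger `bw` widens each kernel and smooths the estimate itself.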
boxenplot()
sns.boxenplot(
x=None,
y=None,
hue=None,
data=None,
order=None,
hue_order=None,
orient=None,
color=None,
palette=None,
saturation=0.75,
width=0.8,
dodge=True,
k_depth='proportion',
linewidth=None,
scale='exponential',
outlier_prop=None,
ax=None,
**kwargs,
)
Docstring:
Draw an enhanced box plot for larger datasets.
This style of plot was originally named a "letter value" plot because it
shows a large number of quantiles that are defined as "letter values". It
is similar to a box plot in plotting a nonparametric representation of a
distribution in which all features correspond to actual observations. By
plotting more quantiles, it provides more information about the shape of
the distribution, particularly in the tails. For a more extensive
explanation, you can read the paper that introduced the plot:
https://vita.had.co.nz/papers/letter-value-plot.html
Input data can be passed in a variety of formats, including:
- Vectors of data represented as lists, numpy arrays, or pandas Series
objects passed directly to the ``x``, ``y``, and/or ``hue`` parameters.
- A "long-form" DataFrame, in which case the ``x``, ``y``, and ``hue``
variables will determine how the data are plotted.
- A "wide-form" DataFrame, such that each numeric column will be plotted.
- An array or list of vectors.
In most cases, it is possible to use numpy or Python objects, but pandas
objects are preferable because the associated names will be used to
annotate the axes. Additionally, you can use Categorical types for the
grouping variables to control the order of plot elements.
This function always treats one of the variables as categorical and
draws data at ordinal positions (0, 1, ... n) on the relevant axis, even
when the data has a numeric or date type.
See the :ref:`tutorial <categorical_tutorial>` for more information.
Parameters
----------
x, y, hue : names of variables in ``data`` or vector data, optional
Inputs for plotting long-form data. See examples for interpretation.
data : DataFrame, array, or list of arrays, optional
Dataset for plotting. If ``x`` and ``y`` are absent, this is
interpreted as wide-form. Otherwise it is expected to be long-form.
order, hue_order : lists of strings, optional
Order to plot the categorical levels in, otherwise the levels are
inferred from the data objects.
orient : "v" | "h", optional
Orientation of the plot (vertical or horizontal). This is usually
inferred from the dtype of the input variables, but can be used to
specify when the "categorical" variable is a numeric or when plotting
wide-form data.
color : matplotlib color, optional
Color for all of the elements, or seed for a gradient palette.
palette : palette name, list, or dict, optional
Colors to use for the different levels of the ``hue`` variable. Should
be something that can be interpreted by :func:`color_palette`, or a
dictionary mapping hue levels to matplotlib colors.
saturation : float, optional
Proportion of the original saturation to draw colors at. Large patches
often look better with slightly desaturated colors, but set this to
``1`` if you want the plot colors to perfectly match the input color
spec.
width : float, optional
Width of a full element when not using hue nesting, or width of all the
elements for one level of the major grouping variable.
dodge : bool, optional
When hue nesting is used, whether elements should be shifted along the
categorical axis.
k_depth : "proportion" | "tukey" | "trustworthy", optional
The number of boxes, and by extension number of percentiles, to draw.
All methods are detailed in Wickham's paper. Each makes different
assumptions about the number of outliers and leverages different
statistical properties.
linewidth : float, optional
Width of the gray lines that frame the plot elements.
scale : "linear" | "exponential" | "area"
Method to use for the width of the letter value boxes. All give similar
results visually. "linear" reduces the width by a constant linear
factor, "exponential" uses the proportion of data not covered, "area"
is proportional to the percentage of data covered.
outlier_prop : float, optional
Proportion of data believed to be outliers. Used in conjunction with
k_depth to determine the number of percentiles to draw. Defaults to
0.007 as a proportion of outliers. Should be in range [0, 1].
ax : matplotlib Axes, optional
Axes object to draw the plot onto, otherwise uses the current Axes.
kwargs : key, value mappings
Other keyword arguments are passed through to ``plt.plot`` and
``plt.scatter`` at draw time.
Returns
-------
ax : matplotlib Axes
Returns the Axes object with the plot drawn onto it.
See Also
--------
violinplot : A combination of boxplot and kernel density estimation.
boxplot : A traditional box-and-whisker plot with a similar API.
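The "letter values" mentioned in the docstring are quantiles at successively halved tail proportions: the median (1/2), the fourths (1/4), the eighths (1/8), and so on. The sketch below computes them for a fixed depth with the standard library only. The depth is passed in by hand here rather than derived from a `k_depth` rule, and `quantile` and `letter_values` are hypothetical helpers that use linear interpolation.

```python
def quantile(sorted_values, q):
    """Linearly interpolated quantile of pre-sorted data, with 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac

def letter_values(values, depth):
    """Return (tail proportion, lower value, upper value) for depths 1..depth."""
    data = sorted(values)
    result = []
    for i in range(1, depth + 1):
        tail = 0.5 ** i  # 1/2, 1/4, 1/8, ...
        result.append((tail, quantile(data, tail), quantile(data, 1 - tail)))
    return result

# Evenly spaced illustrative data: 0, 1, ..., 100.
data = list(range(101))
levels = letter_values(data, depth=4)
for tail, low, high in levels:
    print(f"tail {tail:.4f}: {low:.2f} .. {high:.2f}")
```

Each pair of lower and upper values at the same depth corresponds to one nested box in a boxen plot, so a greater depth reveals more of the tails.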
# Simple single-variable plot
ax = sns.boxenplot(x=tip_datas['total_bill'])
# Boxen plot of one variable grouped by a category
ax = sns.boxenplot(x='day', y='total_bill', data=tip_datas)
# Grouped boxen plot, split further by hue (smoker)
ax = sns.boxenplot(x='day', y='total_bill',
data=tip_datas,hue='smoker'
)
# Grouped boxen plot, split further by hue (time), with thicker lines
ax = sns.boxenplot(x='day', y='total_bill',
data=tip_datas,hue='time',
linewidth=2.5)
# Boxen plot with an explicit category order, set via order
ax = sns.boxenplot(x='time', y='tip',
data=tip_datas,order=['Dinner','Lunch']
)
ax = sns.boxenplot(x='day', y='total_bill',
data=tip_datas)
# Overlay a strip plot of the individual observations
ax = sns.stripplot(x='day', y='total_bill', data=tip_datas,
size=4,jitter=True, color="gray"
)
# Horizontal boxen plots of wide-form data, set via orient
iris_datas = sns.load_dataset('iris', data_home='seaborn-data')
ax = sns.boxenplot(data=iris_datas, orient='h')
# Faceted boxen plots, one column per level of time
g = sns.catplot(x="sex", y="total_bill",
hue="smoker", col="time",
data=tip_datas, kind="boxen",
height=4, aspect=.7)
# Other parameters: scale and k_depth (scale was renamed width_method in seaborn 0.13)
sns.boxenplot(x='day', y='total_bill', data=tip_datas,
width=0.8,
linewidth=12,
scale='area', # box width method: "linear", "exponential", or "area"
k_depth='proportion', # rule for the number of boxes: "proportion", "tukey", or "trustworthy"
)
sns.boxenplot(x='day', y='total_bill', data=tip_datas,
width=0.8,
linewidth=12,
scale='linear', # box width method: "linear", "exponential", or "area"
k_depth='proportion', # rule for the number of boxes: "proportion", "tukey", or "trustworthy"
)
sns.boxenplot(x='day', y='total_bill', data=tip_datas,
width=0.8,
linewidth=12,
scale='exponential', # box width method: "linear", "exponential", or "area"
k_depth='proportion', # rule for the number of boxes: "proportion", "tukey", or "trustworthy"
)