
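Seaborn Distribution Data Visualization: Scatter Distribution Plots

Scatter distribution plots

These plots combine a scatter plot with histogram-style marginal distributions.

jointplot()

Draws a plot of two variables with bivariate and univariate graphs. It is built on top of JointGrid().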

sns.jointplot(
    x,
    y,
    data=None,
    kind='scatter',
    stat_func=None,
    color=None,
    height=6,
    ratio=5,
    space=0.2,
    dropna=True,
    xlim=None,
    ylim=None,
    joint_kws=None,
    marginal_kws=None,
    annot_kws=None,
    **kwargs,
)
Docstring:
Draw a plot of two variables with bivariate and univariate graphs.

This function provides a convenient interface to the :class:`JointGrid`
class, with several canned plot kinds. This is intended to be a fairly
lightweight wrapper; if you need more flexibility, you should use
:class:`JointGrid` directly.

Parameters
----------
x, y : strings or vectors
    Data or names of variables in ``data``.
data : DataFrame, optional
    DataFrame when ``x`` and ``y`` are variable names.
kind : { "scatter" | "reg" | "resid" | "kde" | "hex" }, optional
    Kind of plot to draw.
stat_func : callable or None, optional
    *Deprecated*
color : matplotlib color, optional
    Color used for the plot elements.
height : numeric, optional
    Size of the figure (it will be square).
ratio : numeric, optional
    Ratio of joint axes height to marginal axes height.
space : numeric, optional
    Space between the joint and marginal axes
dropna : bool, optional
    If True, remove observations that are missing from ``x`` and ``y``.
{x, y}lim : two-tuples, optional
    Axis limits to set before plotting.
{joint, marginal, annot}_kws : dicts, optional
    Additional keyword arguments for the plot components.
kwargs : key, value pairings
    Additional keyword arguments are passed to the function used to
    draw the plot on the joint Axes, superseding items in the
    ``joint_kws`` dictionary.

Returns
-------
grid : :class:`JointGrid`
    :class:`JointGrid` object with the plot on it.

See Also
--------
JointGrid : The Grid class used for drawing this plot. Use it directly if
            you need more flexibility.
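# Combined scatter and distribution plot with jointplot()
# Imports are not shown in the original; `sci` is assumed to be an alias for scipy.stats
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats as sci

# Build a DataFrame of random data
rs = np.random.RandomState(3)
df = pd.DataFrame(rs.randn(200,2), columns=['A','B'])

# Draw the combined scatter and distribution plot with jointplot()
sns.jointplot(x=df['A'], y=df['B'],     # data for the x and y axes
              data=df,                  # source DataFrame
              color='k',
              s=50, edgecolor='w', linewidth=1,  # marker size, edge color and edge width (scatter only)
              kind='scatter',                    # default kind is "scatter"; others are "reg", "resid", "kde", "hex"
              space=0.2,                         # space between the joint and marginal axes
              height=8,                          # figure size (the figure is square)
              ratio=5,                           # ratio of joint axes height to marginal axes height
              stat_func= sci.pearsonr,           # Pearson correlation coefficient (stat_func is deprecated, see the docstring)
              marginal_kws=dict(bins=15, rug=True))    # keyword arguments for the marginal plots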

sns.jointplot(x=df['A'], y=df['B'],
              data=df,
              color='k',
              kind='reg',             # reg adds a linear regression line
              height=8,
              ratio=5,
              stat_func= sci.pearsonr, 
              marginal_kws=dict(bins=15, rug=True))

sns.jointplot(x=df['A'], y=df['B'],
              data=df,
              color='k',
              kind='resid',             # resid draws a residual plot
              height=8,
              ratio=5, 
              marginal_kws=dict(bins=15, rug=True))

sns.jointplot(x=df['A'], y=df['B'],
              data=df,
              color='k',
              kind='kde',             # kde draws a density plot
              height=8,
              ratio=5)

sns.jointplot(x=df['A'], y=df['B'],
              data=df,
              color='k',
              kind='hex',             # hex draws a hexagonal-bin (honeycomb) plot
              height=8,
              ratio=5)

g = sns.jointplot(x=df['A'], y=df['B'],
              data=df,
              color='k',
              kind='kde',             # kde draws a density plot
              height=8,
              ratio=5,
              shade_lowest=False)

# Overlay a scatter plot on the joint axes (c: color, s: marker size)
g.plot_joint(plt.scatter, c='w', s=10, linewidth=1, marker='+')

JointGrid()

Creates a figure grid for drawing a bivariate plot with marginal univariate plots. It serves the same purpose as jointplot() but is more flexible.

sns.JointGrid(
    x,
    y,
    data=None,
    height=6,
    ratio=5,
    space=0.2,
    dropna=True,
    xlim=None,
    ylim=None,
    size=None,
)
Docstring:      Grid for drawing a bivariate plot with marginal univariate plots.
Init docstring:
Set up the grid of subplots.

Parameters
----------
x, y : strings or vectors
    Data or names of variables in ``data``.
data : DataFrame, optional
    DataFrame when ``x`` and ``y`` are variable names.
height : numeric
    Size of each side of the figure in inches (it will be square).
ratio : numeric
    Ratio of joint axes size to marginal axes height.
space : numeric, optional
    Space between the joint and marginal axes
dropna : bool, optional
    If True, remove observations that are missing from `x` and `y`.
{x, y}lim : two-tuples, optional
    Axis limits to set before plotting.

See Also
--------
jointplot : High-level interface for drawing bivariate plots with
            several different default plot kinds.
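
To make the ratio and space parameters more concrete, the sketch below builds a JointGrid with ratio=3 and space=0 and then measures the heights of the joint axes and the top marginal axes. The DataFrame demo and its column names u and v are hypothetical; the measured value is printed without a claimed result, since the exact figure layout may vary between versions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (an assumption for illustration)
rs = np.random.RandomState(1)
demo = pd.DataFrame({"u": rs.randn(150), "v": rs.randn(150)})

# ratio=3 makes the marginal axes relatively tall; space=0 removes the gap
g = sns.JointGrid(x="u", y="v", data=demo, height=5, ratio=3, space=0)
g.plot_joint(plt.scatter, s=15, alpha=.5)
g.ax_marg_x.hist(demo["u"], bins=15, alpha=.5)
g.ax_marg_y.hist(demo["v"], bins=15, alpha=.5, orientation="horizontal")

# Compare the heights of the joint axes and the top marginal axes
joint_h = g.ax_joint.get_position().height
marg_h = g.ax_marg_x.get_position().height
print(joint_h / marg_h)
```

The printed quotient can be compared with the ratio argument to see how the grid divides the figure.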
# Set the style
sns.set_style('white')
# Load the data
tip_datas = sns.load_dataset('tips', data_home='seaborn-data')

# Create the plotting grid, which has three parts: one joint (main) axes and two marginal axes
g = sns.JointGrid(x='total_bill', y='tip', data=tip_datas)

# Joint axes: scatter plot
g.plot_joint(plt.scatter, color='m', edgecolor='w', alpha=.3)

# Marginal axes: histograms along the x and y axes
g.ax_marg_x.hist(tip_datas['total_bill'], color='b', alpha=.3)
g.ax_marg_y.hist(tip_datas['tip'], color='r', alpha=.3,
                 orientation='horizontal')

# Annotate with the correlation coefficient
# (JointGrid.annotate is not available in newer seaborn releases)
from scipy import stats
g.annotate(stats.pearsonr)

# Draw grid lines
plt.grid(linestyle='--')

g = sns.JointGrid(x='total_bill', y='tip', data=tip_datas)
g = g.plot_joint(plt.scatter, color='g', s=40, edgecolor='white')
plt.grid(linestyle='--')
# Draw both marginal plots with one function so that they share the same style
g.plot_marginals(sns.distplot, kde=True, color='g')

g = sns.JointGrid(x='total_bill', y='tip', data=tip_datas)
# Draw a density plot on the joint axes
g = g.plot_joint(sns.kdeplot, cmap='Reds_r')
plt.grid(linestyle='--')
# Draw both marginal plots with one function so that they share the same style
g.plot_marginals(sns.distplot, kde=True, color='g')

pairplot()

Plots pairwise relationships in a dataset, for example as a scatter-plot matrix. It is built on top of PairGrid().

sns.pairplot(
    data,
    hue=None,
    hue_order=None,
    palette=None,
    vars=None,
    x_vars=None,
    y_vars=None,
    kind='scatter',
    diag_kind='auto',
    markers=None,
    height=2.5,
    aspect=1,
    dropna=True,
    plot_kws=None,
    diag_kws=None,
    grid_kws=None,
    size=None,
)
Docstring:
Plot pairwise relationships in a dataset.

By default, this function will create a grid of Axes such that each
variable in ``data`` will be shared in the y-axis across a single row and
in the x-axis across a single column. The diagonal Axes are treated
differently, drawing a plot to show the univariate distribution of the data
for the variable in that column.

It is also possible to show a subset of variables or plot different
variables on the rows and columns.

This is a high-level interface for :class:`PairGrid` that is intended to
make it easy to draw a few common styles. You should use :class:`PairGrid`
directly if you need more flexibility.

Parameters
----------
data : DataFrame
    Tidy (long-form) dataframe where each column is a variable and
    each row is an observation.
hue : string (variable name), optional
    Variable in ``data`` to map plot aspects to different colors.
hue_order : list of strings
    Order for the levels of the hue variable in the palette
palette : dict or seaborn color palette
    Set of colors for mapping the ``hue`` variable. If a dict, keys
    should be values  in the ``hue`` variable.
vars : list of variable names, optional
    Variables within ``data`` to use, otherwise use every column with
    a numeric datatype.
{x, y}_vars : lists of variable names, optional
    Variables within ``data`` to use separately for the rows and
    columns of the figure; i.e. to make a non-square plot.
kind : {'scatter', 'reg'}, optional
    Kind of plot for the non-identity relationships.
diag_kind : {'auto', 'hist', 'kde'}, optional
    Kind of plot for the diagonal subplots. The default depends on whether
    ``"hue"`` is used or not.
markers : single matplotlib marker code or list, optional
    Either the marker to use for all datapoints or a list of markers with
    a length the same as the number of levels in the hue variable so that
    differently colored points will also have different scatterplot
    markers.
height : scalar, optional
    Height (in inches) of each facet.
aspect : scalar, optional
    Aspect * height gives the width (in inches) of each facet.
dropna : boolean, optional
    Drop missing values from the data before plotting.
{plot, diag, grid}_kws : dicts, optional
    Dictionaries of keyword arguments.

Returns
-------
grid : PairGrid
    Returns the underlying ``PairGrid`` instance for further tweaking.

See Also
--------
PairGrid : Subplot grid for more flexible plotting of pairwise
           relationships.
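
The docstring mentions that different variables can be placed on the rows and columns. A minimal sketch of that option follows, using x_vars and y_vars to build a non-square grid. The DataFrame demo and its columns a, b and c are made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (an assumption for illustration): three numeric columns
rs = np.random.RandomState(2)
demo = pd.DataFrame(rs.randn(60, 3), columns=["a", "b", "c"])

# Columns "a" and "b" go on the x-axes, column "c" on the y-axis of the single row
g = sns.pairplot(demo, x_vars=["a", "b"], y_vars=["c"],
                 kind="scatter", height=2.5)

# The grid has one row (y_vars) and two columns (x_vars)
print(g.axes.shape)  # (1, 2)
```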
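# Load the iris dataset
i_datas = sns.load_dataset('iris', data_home='seaborn-data')
i_datas

# Scatter-plot matrix
sns.pairplot(i_datas,
             kind='scatter',                 # plot kind (scatter plot: scatter, regression plot: reg)
             diag_kind='hist',               # plot kind on the diagonal (histogram: hist, density plot: kde)
             hue='species',                  # color the points by a categorical column
             palette='husl',                 # color palette
             markers=['o','s','D'],          # marker styles, one per hue level
             height=2)                       # height of each facet in inches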
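
# Regression-plot matrix
sns.pairplot(i_datas,
             kind='reg',                     # plot kind (scatter plot: scatter, regression plot: reg)
             diag_kind='kde',                # plot kind on the diagonal (histogram: hist, density plot: kde)
             hue='species',                  # color the points by a categorical column
             palette='husl',                 # color palette
             markers=['o','s','D'],          # marker styles, one per hue level
             height=2)                       # height of each facet in inches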

# Select a subset of variables with vars
g = sns.pairplot(i_datas, vars=['sepal_width', 'sepal_length'],
                 kind='reg', diag_kind='kde',
                 hue='species', palette='husl')

# Combined parameter settings
sns.pairplot(i_datas, diag_kind='kde', markers='+', hue='species',
             # keyword arguments for the scatter plots
             plot_kws=dict(s=50, edgecolor='b', linewidth=1),
             # keyword arguments for the diagonal plots
             diag_kws=dict(shade=True))

PairGrid()

Plots pairwise relationships in a dataset, for example as a plot matrix. It is more flexible than pairplot().

sns.PairGrid(
    data,
    hue=None,
    hue_order=None,
    palette=None,
    hue_kws=None,
    vars=None,
    x_vars=None,
    y_vars=None,
    diag_sharey=True,
    height=2.5,
    aspect=1,
    despine=True,
    dropna=True,
    size=None,
)
Docstring:     
Subplot grid for plotting pairwise relationships in a dataset.

This class maps each variable in a dataset onto a column and row in a
grid of multiple axes. Different axes-level plotting functions can be
used to draw bivariate plots in the upper and lower triangles, and the
marginal distribution of each variable can be shown on the diagonal.

It can also represent an additional level of conditionalization with the
``hue`` parameter, which plots different subsets of data in different
colors. This uses color to resolve elements on a third dimension, but
only draws subsets on top of each other and will not tailor the ``hue``
parameter for the specific visualization the way that axes-level functions
that accept ``hue`` will.

See the :ref:`tutorial <grid_tutorial>` for more information.
Init docstring:
Initialize the plot figure and PairGrid object.

Parameters
----------
data : DataFrame
    Tidy (long-form) dataframe where each column is a variable and
    each row is an observation.
hue : string (variable name), optional
    Variable in ``data`` to map plot aspects to different colors.
hue_order : list of strings
    Order for the levels of the hue variable in the palette
palette : dict or seaborn color palette
    Set of colors for mapping the ``hue`` variable. If a dict, keys
    should be values  in the ``hue`` variable.
hue_kws : dictionary of param -> list of values mapping
    Other keyword arguments to insert into the plotting call to let
    other plot attributes vary across levels of the hue variable (e.g.
    the markers in a scatterplot).
vars : list of variable names, optional
    Variables within ``data`` to use, otherwise use every column with
    a numeric datatype.
{x, y}_vars : lists of variable names, optional
    Variables within ``data`` to use separately for the rows and
    columns of the figure; i.e. to make a non-square plot.
height : scalar, optional
    Height (in inches) of each facet.
aspect : scalar, optional
    Aspect * height gives the width (in inches) of each facet.
despine : boolean, optional
    Remove the top and right spines from the plots.
dropna : boolean, optional
    Drop missing values from the data before plotting.

See Also
--------
pairplot : Easily drawing common uses of :class:`PairGrid`.
FacetGrid : Subplot grid for plotting conditional relationships.
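
The hue_kws parameter described above can be sketched as follows: each hue level receives its own marker when the plotting function is called. The DataFrame demo, the column group and the labels g1 and g2 are hypothetical names chosen for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (an assumption for illustration): two numeric columns, two groups
rs = np.random.RandomState(4)
demo = pd.DataFrame(rs.randn(80, 2), columns=["a", "b"])
demo["group"] = np.repeat(["g1", "g2"], 40)

# hue_kws maps a plotting keyword to one value per hue level
g = sns.PairGrid(demo, hue="group", vars=["a", "b"],
                 hue_kws={"marker": ["o", "s"]})

# Draw the same scatter function on every subplot, then add the legend
g.map(plt.scatter, s=20)
g.add_legend()
```

Each list in hue_kws should have as many entries as there are hue levels, in the same order as the levels.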
# Create a grid of subplots for the four variables given in vars
g = sns.PairGrid(i_datas, hue='species', palette='hls',
                 vars=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'])

# Plots on the diagonal
g.map_diag(plt.hist,
           histtype='step',             # options: 'bar', 'barstacked', 'step', 'stepfilled'
           linewidth=1)

# Plots off the diagonal
g.map_offdiag(plt.scatter, s=40, linewidth=1)

# Add a legend
g.add_legend()

g = sns.PairGrid(i_datas)

# Plots on the main diagonal
g.map_diag(sns.kdeplot)

# Plots in the upper triangle
g.map_upper(plt.scatter)

# Plots in the lower triangle
g.map_lower(sns.kdeplot, cmap='Blues_d')
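
The flexibility of PairGrid also extends to user-defined functions. The sketch below passes a hypothetical helper, corr_label, to map_upper so that the upper triangle shows a correlation coefficient instead of a plot. The helper, the DataFrame demo and its columns are assumptions made for this example and are not part of seaborn.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

def corr_label(x, y, **kwargs):
    """Hypothetical helper: write the Pearson r of x and y in the current axes."""
    r = np.corrcoef(x, y)[0, 1]
    ax = plt.gca()  # the grid makes the target axes current before calling this
    ax.annotate("r = {:.2f}".format(r), xy=(0.5, 0.5),
                xycoords="axes fraction", ha="center", va="center")

# Synthetic data (an assumption for illustration): three numeric columns
rs = np.random.RandomState(5)
demo = pd.DataFrame(rs.randn(100, 3), columns=["a", "b", "c"])

g = sns.PairGrid(demo)
g.map_diag(plt.hist, bins=12)      # histograms on the diagonal
g.map_lower(plt.scatter, s=10)     # scatter plots in the lower triangle
g.map_upper(corr_label)            # correlation text in the upper triangle
```

The helper accepts **kwargs because the grid passes extra keywords such as color and label to whatever function it is given.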