
C3-Probability and Information Theory

  • probability==> degree of belief
  • frequentist probability==> directly related to the rates at which events occur.
  • Bayesian probability==> related to qualitative levels of certainty
  • random variable==> a variable that can take on different values randomly.
    • discrete:has a finite or countably infinite number of states
    • continuous:is associated with a real value.
  • probability distribution==> a description of how likely a random variable or set of random variables is to take on each of its possible states.
    • probability mass function(PMF)==> a probability distribution over discrete variable
      • PMF maps from a state of random variable to the probability of that random variable taking on that state.
      • $P(\text{x}=x)$, or $\text{x}\sim P(\text{x})$ to specify which distribution x follows
      • the domain of $P$ must be the set of all possible states of $\text{x}$
      • $\forall x\in\text{x},\ 0\leq P(x)\leq 1$
      • $\sum_{x\in\text{x}}P(x)=1$
    • joint probability distribution==> a probability distribution over many variables
      • $P(\text{x}=x,\text{y}=y)$, or $P(x,y)$ for brevity
    • probability density function(PDF)==> a probability distribution over continuous random variable
      • the domain of $p$ must be the set of all possible states of $\text{x}$
      • $\forall x\in\text{x},\ p(x)\geq 0$
      • $\int p(x)\,dx=1$
      • example: the uniform density $u(x;a,b)$, where $b>a$; for all $x\notin[a,b]$, $u(x;a,b)=0$, and within $[a,b]$, $u(x;a,b)=\frac{1}{b-a}$. Written $\text{x}\sim U(a,b)$.
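As a quick numerical check of these properties, here is a minimal numpy sketch (the function name `u`, the bounds, and the grid size are illustrative, not from the text) that evaluates the uniform density and verifies it integrates to 1:

```python
import numpy as np

# Uniform density u(x; a, b): 1/(b-a) inside [a, b], 0 elsewhere.
def u(x, a, b):
    x = np.asarray(x, dtype=float)
    return np.where((x >= a) & (x <= b), 1.0 / (b - a), 0.0)

# Check normalization with a midpoint-rule Riemann sum over [a, b].
a, b = 2.0, 5.0
N = 100_000
dx = (b - a) / N
xs = a + (np.arange(N) + 0.5) * dx
integral = np.sum(u(xs, a, b)) * dx
print(round(integral, 6))  # → 1.0
```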
  • Marginal Probability
    • The probability distribution over the subset.
    • For a discrete random variable: given $P(\text{x},\text{y})$, find $P(\text{x})$ with the sum rule: $\forall x\in\text{x},\ P(\text{x}=x)=\sum_y P(\text{x}=x,\text{y}=y)$.
    • For a continuous variable: $p(x)=\int p(x,y)\,dy$
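The discrete sum rule is just a sum over one axis of a joint table. A small sketch with a made-up joint distribution (the numbers are arbitrary, for illustration only):

```python
import numpy as np

# Toy joint distribution P(x, y): rows index states of x, columns states of y.
P_xy = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.15, 0.20]])
assert np.isclose(P_xy.sum(), 1.0)  # a valid joint sums to 1

# Sum rule: marginalize y out by summing over its axis.
P_x = P_xy.sum(axis=1)
assert np.allclose(P_x, [0.4, 0.6])
```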
  • Conditional Probability
    • $P(\text{y}=y\mid\text{x}=x)=\frac{P(\text{y}=y,\text{x}=x)}{P(\text{x}=x)}$
    • intervention query==> compute the consequences of an action (the domain of causal modeling)
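Conditioning on x amounts to dividing each row of the joint table by the corresponding marginal. A sketch using the same kind of toy table (numbers are arbitrary):

```python
import numpy as np

# Toy joint P(x, y): rows index x, columns index y.
P_xy = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.15, 0.20]])

# P(y | x) = P(x, y) / P(x); keepdims lets broadcasting divide row-wise.
P_x = P_xy.sum(axis=1, keepdims=True)
P_y_given_x = P_xy / P_x

# Each row of P(y | x) is now a valid distribution over y.
assert np.allclose(P_y_given_x.sum(axis=1), 1.0)
```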
  • The Chain Rule of Conditional Probabilities
    • $P(\text{x}^{(1)},\ldots,\text{x}^{(n)})=P(\text{x}^{(1)})\prod_{i=2}^{n}P(\text{x}^{(i)}\mid\text{x}^{(1)},\ldots,\text{x}^{(i-1)})$
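The chain rule can be verified numerically on an arbitrary joint distribution over three binary variables (the distribution here is random, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary joint distribution over three binary variables x1, x2, x3,
# normalized so all entries sum to 1.
P = rng.random((2, 2, 2))
P /= P.sum()

# Chain rule: P(x1, x2, x3) = P(x1) * P(x2 | x1) * P(x3 | x1, x2).
P1 = P.sum(axis=(1, 2))                          # P(x1)
P2_given_1 = P.sum(axis=2) / P1[:, None]         # P(x2 | x1)
P3_given_12 = P / P.sum(axis=2, keepdims=True)   # P(x3 | x1, x2)

reconstructed = P1[:, None, None] * P2_given_1[:, :, None] * P3_given_12
assert np.allclose(reconstructed, P)
```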
  • Independence:
    • $\forall x\in\text{x},y\in\text{y},\ p(\text{x}=x,\text{y}=y)=p(\text{x}=x)p(\text{y}=y)$
    • shorthand: $\text{x}\perp\text{y}$
  • Conditional Independence:
    • $\forall x\in\text{x},y\in\text{y},z\in\text{z},\ p(\text{x}=x,\text{y}=y\mid\text{z}=z)=p(\text{x}=x\mid\text{z}=z)\,p(\text{y}=y\mid\text{z}=z)$
    • shorthand: $\text{x}\perp\text{y}\mid\text{z}$
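Independence as factorization is easy to see in a table: building the joint as an outer product of two marginals guarantees $p(x,y)=p(x)p(y)$. A minimal sketch (the marginals are made up):

```python
import numpy as np

# Construct an independent joint: P(x, y) = P(x) P(y) via an outer product.
P_x = np.array([0.3, 0.7])
P_y = np.array([0.2, 0.5, 0.3])
P_xy = np.outer(P_x, P_y)

# The joint factorizes into the product of its own marginals.
marg_x = P_xy.sum(axis=1, keepdims=True)
marg_y = P_xy.sum(axis=0, keepdims=True)
assert np.allclose(P_xy, marg_x * marg_y)
```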
  • Expectation
    • For discrete variables: $\mathbb{E}_{\text{x}\sim P}[f(x)]=\sum_x P(x)f(x)$
    • For continuous variables: $\mathbb{E}_{\text{x}\sim p}[f(x)]=\int p(x)f(x)\,dx$
    • Expectation is linear: $\mathbb{E}_{\text{x}}[\alpha f(x)+\beta g(x)]=\alpha\mathbb{E}_{\text{x}}[f(x)]+\beta\mathbb{E}_{\text{x}}[g(x)]$
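Both the discrete definition and the linearity property can be checked on a small toy distribution (the states, probabilities, and functions are arbitrary):

```python
import numpy as np

# Discrete expectation E[f(x)] = sum_x P(x) f(x).
x = np.array([0.0, 1.0, 2.0])
P = np.array([0.2, 0.5, 0.3])

f = x ** 2
g = x + 1.0

E_f = np.sum(P * f)
E_g = np.sum(P * g)

# Linearity: E[a*f + b*g] = a*E[f] + b*E[g].
a, b = 2.0, -3.0
assert np.isclose(np.sum(P * (a * f + b * g)), a * E_f + b * E_g)
```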
  • Variance
    • $\text{Var}(f(x))=\mathbb{E}\big[(f(x)-\mathbb{E}[f(x)])^2\big]$
    • the square root of the variance is known as the standard deviation.
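A small sketch computing the variance of a toy discrete distribution, also confirming the equivalent form $\text{Var}(x)=\mathbb{E}[x^2]-\mathbb{E}[x]^2$ (the numbers are arbitrary):

```python
import numpy as np

# Var(x) = E[(x - E[x])^2] for a discrete toy distribution.
x = np.array([0.0, 1.0, 2.0])
P = np.array([0.2, 0.5, 0.3])

mean = np.sum(P * x)
var = np.sum(P * (x - mean) ** 2)
std = np.sqrt(var)  # the standard deviation

# Equivalent form: Var(x) = E[x^2] - E[x]^2.
assert np.isclose(var, np.sum(P * x ** 2) - mean ** 2)
```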
  • Covariance
    • $\text{Cov}(f(x),g(y))=\mathbb{E}\big[(f(x)-\mathbb{E}[f(x)])(g(y)-\mathbb{E}[g(y)])\big]$
    • measures how much two values are linearly related to each other, as well as the scale of these variables.
    • high absolute value: the values change a lot and are far from their respective means
    • positive: both variables tend to take relatively high values simultaneously
    • negative: one variable tends to be high when the other is low
    • covariance vs. independence: independence implies zero covariance, but zero covariance does not imply independence (zero covariance rules out only linear dependence)
    • covariance matrix:
      • for a random vector $\mathbf{x}\in\mathbb{R}^n$: $\text{Cov}(\mathbf{x})_{i,j}=\text{Cov}(\text{x}_i,\text{x}_j)$
      • the diagonal elements give the variance: $\text{Cov}(\text{x}_i,\text{x}_i)=\text{Var}(\text{x}_i)$
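Two of the points above can be checked numerically: zero covariance does not imply independence (take $y=x^2$ with x symmetric around 0), and the diagonal of a covariance matrix holds the variances. A sketch (the example distributions are illustrative):

```python
import numpy as np

# x uniform on {-1, 0, 1} and y = x^2: fully dependent, yet Cov(x, y) = 0.
x = np.array([-1.0, 0.0, 1.0])
P = np.array([1 / 3, 1 / 3, 1 / 3])
y = x ** 2

E_x = np.sum(P * x)
E_y = np.sum(P * y)
cov = np.sum(P * (x - E_x) * (y - E_y))
assert np.isclose(cov, 0.0)  # zero covariance despite full dependence

# Covariance matrix of a random vector: diagonal elements are the variances.
samples = np.random.default_rng(0).normal(size=(3, 1000))
C = np.cov(samples)  # np.cov uses ddof=1 by default
assert np.allclose(np.diag(C), samples.var(axis=1, ddof=1))
```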