Econet / Entropy

Free Energy Principle, Active Inference, Energy

Ambiguity = Expected Value of Knowledge Gained = Entropy

The expected value of the knowledge gained (information learned, surprisal, self-information, information content) {$I(p)$} of a random variable {$Z$} taking values {$i\in [n]$} with probabilities {$p_i$} is the ambiguity (entropy) {$H(Z)=\sum_{i\in [n]}p_i I(p_i) = \sum_{i\in [n]}p_i\log \frac{1}{p_i}$}, as illustrated in the sketch after the list below.

  • Low ambiguity = the knowledge gained is expected to be little
  • High ambiguity = the knowledge gained is expected to be large
  • There is no such thing as negative ambiguity. That would mean that knowledge is expected to be lost. Yet life seems to decrease ambiguity, which means knowledge is shed into the environment.
  • Probability implies an investigatory window in which a question (of what will happen) is replaced by an answer (of what did happen), indicating the passage of time (an event) and implying an observer and the knowledge they gain.
  • Entropy differs from energy in that {$P(x,y)$} in the latter gets replaced by {$Q(x)$}, yielding a degeneracy whereby nothing is achieved, as when the model equals what it is modeling.
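
For a concrete illustration, here is a minimal Python sketch (my own example, not part of this page) that computes the ambiguity {$H(Z)=\sum_i p_i\log \frac{1}{p_i}$}, in natural logarithm units, for a uniform and a nearly certain distribution:

    import math

    def ambiguity(probs):
        # Entropy H = sum of p * log(1/p); terms with p = 0 contribute nothing.
        return sum(p * math.log(1.0 / p) for p in probs if p > 0)

    uniform = [1/4, 1/4, 1/4, 1/4]      # high ambiguity: every outcome is equally surprising
    skewed = [0.97, 0.01, 0.01, 0.01]   # low ambiguity: the outcome is almost certain

    print(ambiguity(uniform))  # log 4, about 1.386
    print(ambiguity(skewed))   # about 0.168, much smaller

The uniform distribution maximizes the expected knowledge gained, while the nearly certain distribution yields very little.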

Axioms for {$I(p)$}

  • {$I(1)=0$}: If an event is certain, then there is no knowledge gained.
  • If {$p_1 > p_2$} then {$I(p_1) < I(p_2)$}: If the probability increases, then the knowledge gained decreases.
  • {$I(p_1p_2)=I(p_1)+I(p_2)$}: The knowledge gained from two independent events is the sum of the knowledge gained from each of them.
  • {$I(p)$} is a twice continuously differentiable function of {$p$}: Knowledge gained, as a function of probability, is twice continuously differentiable.
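
As a quick numerical check (a sketch of my own, in Python), the function {$I(p)=\log\frac{1}{p}$} satisfies the first three axioms:

    import math

    def I(p):
        # Knowledge gained (surprisal) of an event with probability p, in nats.
        return math.log(1.0 / p)

    assert I(1.0) == 0.0                            # a certain event yields no knowledge
    assert I(0.8) < I(0.2)                          # higher probability, less knowledge gained
    p1, p2 = 0.3, 0.5
    assert math.isclose(I(p1 * p2), I(p1) + I(p2))  # knowledge from independent events adds

It is also twice continuously differentiable on {$(0,1]$}.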

Given:

  • a finite set {$ [n]$}
  • a partition {$E = E_1 \sqcup E_2 \sqcup \dots \sqcup E_{r-1} \sqcup E_r$} of {$ [n]$}. Note that {$\sum_{j=1}^r |E_j| = n$}.
  • a uniform distribution {$Y$} on the finite set {$ [n]$}. For all {$i\in [n]$} we have {$Y(i)=\frac{1}{n}$}. Thus {$\sum_{i=1}^n \frac{1}{n} = 1$}.
  • a distribution {$Z$} on {$ [r]$} recording which part of {$E$} a value of {$ [n]$} falls into: if {$i\in E_j$} then {$Z$} takes the value {$j$}, so {$Z(j)=p_j=\frac{|E_j|}{n}$}. Note that {$\sum_{j=1}^r p_j = \sum_{j=1}^r\frac{|E_j|}{n}=1$}.

Note that a single value having {$k$} chances of being chosen, each with probability {$\frac{1}{n}$}, is equivalent to choosing one of {$k$} values each with probability {$\frac{1}{n}$}. It's just a matter of relabeling.
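
To make this setup concrete, here is a small Python sketch (the values {$n=6$} and the three-part partition are my own example):

    from fractions import Fraction

    n = 6
    E = [[1, 2, 3], [4, 5], [6]]            # a partition of [n] into r = 3 parts
    assert sum(len(Ej) for Ej in E) == n    # the part sizes add up to n

    Y = [Fraction(1, n)] * n                # uniform distribution on [n]
    Z = [Fraction(len(Ej), n) for Ej in E]  # p_j = |E_j| / n
    assert sum(Y) == 1 and sum(Z) == 1
    print(Z)  # [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

Merging the {$|E_j|$} equally likely values of {$Y$} within a part into the single value {$j$} of {$Z$} is exactly the relabeling described above.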

Therefore:

{$H[Y]=H[Z]+H[Y|Z]$}. By construction: we construct {$Y$} by starting with {$Z$} and then taking a uniform distribution within each part of {$E$}.
 {$=H[Z]+\sum_{i\in [r]}p_i H[Y|Z=i]$}. The key point: breaking up the conditional entropy in terms of the probability of each value of {$Z$}.
 {$=H[Z]+\sum_{i\in [r]}p_i H[Y|Y\in E_i]$}. Rethinking {$Z$} in terms of {$Y$}.
 {$=H[Z]+\sum_{i\in [r]}p_i\log |E_i|$}. The entropy of a uniform distribution is given by the logarithm.
 {$=H[Z]+\sum_{i\in [r]}p_i\log (p_i n)$}. Because {$|E_i|=p_i n$}.
 {$=H[Z]+\sum_{i\in [r]}p_i(\log p_i + \log n)$}. The log of a product is the sum of the logs.
 {$=H[Z]+(\log n)\sum_{i\in [r]}p_i + \sum_{i\in [r]}p_i\log p_i$}. Distributing and pulling out the constant.
 {$=H[Z]+\log n + \sum_{i\in [r]}p_i\log p_i$}. The probabilities sum to one: {$\sum_{i\in [r]}p_i=1$}.
{$\log n = H[Z]+\log n + \sum_{i\in [r]}p_i\log p_i$}. The entropy of the uniform distribution is {$H[Y]=\log n$}.
{$H[Z] = -\sum_{i\in [r]}p_i\log p_i$}. Subtracting {$\log n$} from both sides.
{$H[Z] = \sum_{i\in [r]}p_i\log \frac{1}{p_i}$}. Inverting {$p_i$} inside the logarithm removes the minus sign.
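
The whole chain can be verified numerically. A hedged sketch in Python, continuing the example above (the partition is again my own choice):

    import math

    n = 6
    E = [[1, 2, 3], [4, 5], [6]]               # same example partition as before
    p = [len(Ej) / n for Ej in E]              # p_j = |E_j| / n

    H_Y = math.log(n)                          # entropy of the uniform distribution on [n]
    H_Z = sum(pj * math.log(1.0 / pj) for pj in p)
    H_Y_given_Z = sum(pj * math.log(len(Ej)) for pj, Ej in zip(p, E))  # sum of p_i log|E_i|

    assert math.isclose(H_Y, H_Z + H_Y_given_Z)  # H[Y] = H[Z] + H[Y|Z]

This confirms that {$H[Z]=\sum_{i\in [r]}p_i\log \frac{1}{p_i}$} is exactly what remains when the within-part entropy {$\sum_{i\in [r]}p_i\log |E_i|$} is removed from {$H[Y]=\log n$}.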

Sources