$$
  
and the distribution of $y$ values is hypothesised to follow the $\chi^2_1$ distribution, a.k.a. the Porter-Thomas distribution. In the following figure we see an example of $B$ values plotted as a histogram, scaled to the height of the PT distribution to show the resemblance.

{{ :science:phd-notes:v50_porter_thomas_j_e1_m1.png |}}
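
For the record, a figure like this can be cooked up along the following lines. Here ''B'' is just fake $\chi^2_1$ draws standing in for real transition strengths, and the histogram is normalised to unit area rather than scaled to the PT peak, which makes for the same visual comparison:

<code>
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import chi2

B = chi2.rvs(df=1, size=5000)   # stand-in for the real B values
y = B/np.mean(B)                # scale to unit mean

x = np.linspace(0.01, 8, 500)
plt.hist(y, bins=100, density=True, label="scaled B values")
plt.plot(x, chi2.pdf(x, df=1), label="PT distribution")
plt.legend()
plt.show()
</code>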

==== Porter-Thomas fluctuations ====
... is really just a fancy way of saying how much we expect the $y$ values to vary. The PDF of the PT distribution is given by

$$
g(x) = \dfrac{1}{\sqrt{2 \pi x}}e^{-x/2}, \quad x > 0,
$$

with a mean of 1 and a variance of 2. Just check [[https://en.wikipedia.org/wiki/Chi-squared_distribution|the Wikipedia page]] if you don't believe me.
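
Or, if you would rather not take Wikipedia's word for it either, SciPy can report the first two moments directly:

<code>
>>> from scipy.stats import chi2
>>> mean, var = chi2.stats(df=1, moments='mv')
>>> float(mean), float(var)
(1.0, 2.0)
</code>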

Let us now invoke the almighty Central Limit Theorem (CLT)! Draw $n$ values from the PT distribution and name them $X_1, \dots, X_n$. Suppose we want to know the sample average

$$
\bar{X}_n = \dfrac{X_1 + \dots + X_n}{n}.
$$

The law of large numbers tells us that the sample average will converge to the expected value $\mu$ as $n$ goes to infinity. The CLT states that as $n$ gets larger, the distribution of $\bar{X}_n$ gets arbitrarily close to the normal distribution with a mean of 1 and a variance of $2/n$ (recall that the PT distribution itself has a mean of 1 and a variance of 2).

Let us quickly check that this is true! Let's say that $n = 1000$, and with some quick Python magic:

<code>
>>> from scipy.stats import chi2
>>> n = 1000
>>> sum(chi2.rvs(df=1, size=n))/n
1.013582747288161
</code>

Pretty close to 1, that is. The CLT also says the variance of $\bar{X}_n$ should be $2/n$, so let us check that while we are at it:

<code>
>>> import numpy as np
>>> draws = [sum(chi2.rvs(df=1, size=n))/n for _ in range(100000)]
>>> np.mean(draws), np.var(draws), 2/n
(0.9999389145605803, 0.002000149052396594, 0.002)
</code>
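
If you would rather see the bell curve than read off moments, a histogram of the sample means can be compared to the predicted normal density. A quick matplotlib sketch (with fewer repetitions than above, to keep it snappy):

<code>
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import chi2, norm

n = 1000
draws = [np.mean(chi2.rvs(df=1, size=n)) for _ in range(10000)]

x = np.linspace(0.8, 1.2, 400)
plt.hist(draws, bins=60, density=True, label="sample means")
plt.plot(x, norm.pdf(x, loc=1, scale=np.sqrt(2/n)), label="CLT prediction")
plt.legend()
plt.show()
</code>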

Mic drop?

Now! How can we use this information to determine how much $y$ should vary? And what does //vary// even mean here? Vary-ance, maybe. If $y$ is PT-distributed, then $y$ has a variance of 2. The variance is a measure of dispersion: a measure of how far a set of numbers is spread out from their average value. In mathematical terms, the variance of a random variable $X$ is the expected value of the squared deviation from its mean $\mu$:

$$
\text{Var}(X) = E[(X - \mu)^2].
$$
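
To tie the definition back to the code above: ''np.var'' computes exactly this sample quantity. A quick sketch, with $X$ once again played by $\chi^2_1$ draws:

<code>
import numpy as np
from scipy.stats import chi2

x = chi2.rvs(df=1, size=1_000_000)
mu = np.mean(x)
var_by_definition = np.mean((x - mu)**2)   # E[(X - mu)^2], estimated from samples
print(var_by_definition, np.var(x))        # the two agree, and both land close to 2
</code>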

So maybe what we want is to check that the variance of the $B$ distribution is (close to) 2? We can also draw a bunch of values from the distribution and check that the variance of the means of many such $n$-value draws is indeed close to $2/n$, as the CLT predicts.
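
The first check might look something like this (the array ''B'' is hypothetical here; the real, suitably normalised transition strengths would go in its place):

<code>
import numpy as np
from scipy.stats import chi2

# Hypothetical stand-in for the real B values.
B = chi2.rvs(df=1, size=2000)
y = B/np.mean(B)        # scale to unit mean

print(np.var(y))        # PT predicts a value in the vicinity of 2
</code>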