If an event space consists of two equal-probability events \( p(E_1) = p(E_2) = 0.5 \), then
$$ I_{E_1} = I_{E_2} = \log_2 \frac{1}{0.5} = -\log_2 0.5 = 1 \, \text{[bit]}. $$
So the unit of information is the information value associated with a single choice between two equally probable events.
If the event system consists of ''n'' events and all of them have the same probability, then the probability of any single event is:

$$ p_E = \frac{1}{n} $$
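
In this case (a step that follows directly from the definition of information content above) the information carried by any single event is:

$$ I_E = \log_2 \frac{1}{p_E} = \log_2 n. $$

For example, with \( n = 8 \) equally likely events, each outcome carries \( \log_2 8 = 3 \) bits.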
Thus, redundancy is minimal when the probabilities of the events are equal.
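
As a quick check (not spelled out above): with equal probabilities \( p_i = 1/n \) the entropy reaches its maximum, so the redundancy vanishes:

$$ H = \sum_{i=1}^{n} \frac{1}{n} \log_2 n = \log_2 n = H_{\text{max}}, \qquad R = 1 - \frac{H}{H_{\text{max}}} = 0. $$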

**Example:** consider an event system consisting of four events:

$$ E = \{E_1, E_2, E_3, E_4\}, $$

and the probabilities of the individual events are as follows:

$$ p = \{0.5, 0.25, 0.2, 0.05\}. $$
+ | |||
+ | The individual information content for the states of the system are: | ||
+ | |||
+ | $$ I_{E_1} = -\log_2 0.5 = 1 \, \text{[bit]}, | ||
+ | $$ I_{E_2} = -\log_2 0.25 = 2 \, \text{[bit]}, | ||
+ | $$ I_{E_3} = -\log_2 0.2 = 2.32 \, \text{[bit]}, | ||
+ | $$ I_{E_4} = -\log_2 0.05 = 4.32 \, \text{[bit]}, | ||
+ | |||
+ | **What is the entropy of the message set?** | ||
+ | |||
+ | $$ H_E = \sum_{i=1}^{4} p_i \cdot I_{E_i} = 0.5 \cdot 1 + 0.25 \cdot 2 + 0.2 \cdot 2.32 + 0.05 \cdot 4.32 = 1.68 \, \text{[bit]}. $$ | ||
+ | |||
+ | **What is the redundancy of the message set?** | ||
+ | |||
+ | Let's calculate the maximum entropy: | ||
+ | |||
+ | $$ H_{\text{max}} = \log_2 n = \log_2 4 = 2 [bit]$$ | ||
+ | |||
+ | Then, substituting into the formula for redundancy: | ||
+ | |||
+ | $$ R = 1 - \frac{H}{H_{\text{max}}} = 1 - \frac{1.68}{2} = 0.16, $$ | ||
+ | |||
+ | which means the redundancy of the event system is approximately 16%. | ||
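
The same numbers can be reproduced programmatically. The following is a minimal sketch (not part of the original page; names are illustrative) that computes the entropy and redundancy of the example distribution:

<sxh c>
#include <stdio.h>
#include <math.h>

int main(void)
{
    /* probabilities of the four events from the example above */
    double p[] = { 0.5, 0.25, 0.2, 0.05 };
    int n = sizeof(p) / sizeof(p[0]);

    double H = 0.0;
    for (int i = 0; i < n; i++)
    {
        H += -p[i] * log2(p[i]); /* Shannon entropy in bits */
    }

    double Hmax = log2((double) n); /* maximum entropy of n equally probable events */
    double R = 1.0 - H / Hmax;      /* redundancy */

    /* prints: H = 1.68 bit, Hmax = 2.00 bit, R = 16% */
    printf("H = %.2f bit, Hmax = %.2f bit, R = %.0f%%\n", H, Hmax, R * 100.0);
    return 0;
}
</sxh>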
+ | |||
+ | ==== Example of Entropy calculation ==== | ||
+ | |||
+ | Why do not store raw password strings in compiled code? | ||
+ | |||
<sxh c>
#include <stdio.h>
#include <string.h>
#include <math.h>

float calculateEntropy(unsigned int bytes[], int length);

/* English text with an embedded password-like segment. The original sample
   was truncated after "yQ%v?"; the rest of the high-entropy segment and the
   end of the sentence are reconstructed for illustration. */
char sample[] = "Some poetry types are unique to particular cultures and genres and respond to yQ%v?8sK!2xW@pL#7 characteristics of the language in which the poet writes.";

int main()
{
    unsigned int byteCounter[256];
    const int windowWidth = 20;

    /* slide a 20-byte window over the sample */
    for (int i = 0; i < (int)sizeof(sample) - windowWidth; i++)
    {
        memset(byteCounter, 0, sizeof(byteCounter));

        char *p = &sample[i];
        char *end = &sample[i + windowWidth];

        /* count byte frequencies inside the current window */
        while (p != end)
        {
            byteCounter[(unsigned char)(*p++)]++;
        }
        float entropy = calculateEntropy(byteCounter, windowWidth);
        printf("%d: %f\n", i, entropy); /* window start index and its entropy */
    }

    return 0;
}

float calculateEntropy(unsigned int bytes[], int length)
{
    float entropy = 0.0f;

    for (int i = 0; i < 256; i++)
    {
        if (bytes[i] != 0)
        {
            /* relative frequency of byte value i in the window */
            float freq = (float) bytes[i] / (float) length;
            entropy += -freq * log2f(freq);
        }
    }
    return entropy;
}
</sxh>
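
To try it, compile with the math library linked (assuming the file is saved as ''entropy.c''): ''gcc entropy.c -o entropy -lm''. Windows that overlap the random-looking segment report entropies close to the \( \log_2 20 \approx 4.32 \) bit ceiling of a 20-byte window, while windows of plain English text stay noticeably lower, so an embedded secret shows up as a spike in the output.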