If an event space consists of two equal-probability events \( p(E_1) = p(E_2) = 0.5 \), then
$$ I_{E_1} = I_{E_2} = \log_2 \frac{1}{0.5} = -\log_2 0.5 = 1 \, \text{[bit]}. $$
So the unit of information is the information value associated with a single choice between two equally probable events.
If the event system consists of ''n'' events and all of them have the same probability, then the probability of any single event is:

$$ p_E = \frac{1}{n} $$
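
In this case (a step that follows directly from the definition of information content above) the information carried by any single event is:

$$ I_E = \log_2 \frac{1}{p_E} = \log_2 n. $$

For example, with \( n = 8 \) equally likely events, each outcome carries \( \log_2 8 = 3 \) bits.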
Thus, redundancy is minimal when the probabilities of the events are equal.
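
As a quick check (not spelled out above): with equal probabilities \( p_i = 1/n \) the entropy reaches its maximum, so the redundancy vanishes:

$$ H = \sum_{i=1}^{n} \frac{1}{n} \log_2 n = \log_2 n = H_{\text{max}}, \qquad R = 1 - \frac{H}{H_{\text{max}}} = 0. $$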

**Example:** consider an event system consisting of four events:

$$ E = \{E_1, E_2, E_3, E_4\}, $$

and the probabilities of the individual events are as follows:

$$ p = \{0.5, 0.25, 0.2, 0.05\}. $$
+ | |||
+ | The individual information content for the states of the system are: | ||
+ | |||
+ | $$ I_{E_1} = -\log_2 0.5 = 1 \, \text{[bit]}, | ||
+ | $$ I_{E_2} = -\log_2 0.25 = 2 \, \text{[bit]}, | ||
+ | $$ I_{E_3} = -\log_2 0.2 = 2.32 \, \text{[bit]}, | ||
+ | $$ I_{E_4} = -\log_2 0.05 = 4.32 \, \text{[bit]}, | ||
+ | |||
+ | **What is the entropy of the message set?** | ||
+ | |||
+ | $$ H_E = \sum_{i=1}^{4} p_i \cdot I_{E_i} = 0.5 \cdot 1 + 0.25 \cdot 2 + 0.2 \cdot 2.32 + 0.05 \cdot 4.32 = 1.68 \, \text{[bit]}. $$ | ||
+ | |||
+ | **What is the redundancy of the message set?** | ||
+ | |||
+ | Let's calculate the maximum entropy: | ||
+ | |||
+ | $$ H_{\text{max}} = \log_2 n = \log_2 4 = 2 [bit]$$ | ||
+ | |||
+ | Then, substituting into the formula for redundancy: | ||
+ | |||
+ | $$ R = 1 - \frac{H}{H_{\text{max}}} = 1 - \frac{1.68}{2} = 0.16, $$ | ||
+ | |||
+ | which means the redundancy of the event system is approximately 16%. | ||
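
The same numbers can be reproduced programmatically. The following is a minimal sketch (not part of the original page; names are illustrative) that computes the entropy and redundancy of the example distribution:

<sxh c>
#include <stdio.h>
#include <math.h>

int main(void)
{
    /* probabilities of the four events from the example above */
    double p[] = { 0.5, 0.25, 0.2, 0.05 };
    int n = sizeof(p) / sizeof(p[0]);

    double H = 0.0;
    for (int i = 0; i < n; i++)
    {
        H += -p[i] * log2(p[i]); /* Shannon entropy in bits */
    }

    double Hmax = log2((double) n); /* maximum entropy of n equally probable events */
    double R = 1.0 - H / Hmax;      /* redundancy */

    /* prints: H = 1.68 bit, Hmax = 2.00 bit, R = 16% */
    printf("H = %.2f bit, Hmax = %.2f bit, R = %.0f%%\n", H, Hmax, R * 100.0);
    return 0;
}
</sxh>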
+ | |||
+ | ==== Example of Entropy calculation ==== | ||
+ | |||
+ | Why do not store raw password strings in compiled code? | ||
+ | |||
<sxh c>
#include <stdio.h>
#include <string.h>
#include <math.h>

float calculateEntropy(unsigned int bytes[], int length);

/* English text with an embedded password-like segment. The original sample
   was truncated after "yQ%v?"; the rest of the high-entropy segment and the
   end of the sentence are reconstructed for illustration. */
char sample[] = "Some poetry types are unique to particular cultures and genres and respond to yQ%v?8sK!2xW@pL#7 characteristics of the language in which the poet writes.";

int main()
{
    unsigned int byteCounter[256];
    const int windowWidth = 20;

    /* slide a 20-byte window over the sample */
    for (int i = 0; i < (int)sizeof(sample) - windowWidth; i++)
    {
        memset(byteCounter, 0, sizeof(byteCounter));

        char *p = &sample[i];
        char *end = &sample[i + windowWidth];

        /* count byte frequencies inside the current window */
        while (p != end)
        {
            byteCounter[(unsigned char)(*p++)]++;
        }
        float entropy = calculateEntropy(byteCounter, windowWidth);
        printf("%d: %f\n", i, entropy); /* window start index and its entropy */
    }

    return 0;
}

float calculateEntropy(unsigned int bytes[], int length)
{
    float entropy = 0.0f;

    for (int i = 0; i < 256; i++)
    {
        if (bytes[i] != 0)
        {
            /* relative frequency of byte value i in the window */
            float freq = (float) bytes[i] / (float) length;
            entropy += -freq * log2f(freq);
        }
    }
    return entropy;
}
</sxh>
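
To try it, compile with the math library linked (assuming the file is saved as ''entropy.c''): ''gcc entropy.c -o entropy -lm''. Windows that overlap the random-looking segment report entropies close to the \( \log_2 20 \approx 4.32 \) bit ceiling of a 20-byte window, while windows of plain English text stay noticeably lower, so an embedded secret shows up as a spike in the output.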