$$ I_E = \log_2 \frac{1}{p_E} = -\log_2( p_E ) \ [bit] $$
Shannon used the logarithm to measure information because only the logarithmic function makes the information of independent events additive. If two independent events \(A\) and \(B\) occur, their joint probability is \( p(A,B) = p(A) \cdot p(B) \).

We expect the total information to add up:

$$ I(A,B) = I(A) + I(B) $$

Only the logarithm satisfies this property:

$$ I(p) = -\log p \quad \Rightarrow \quad I(A,B) = -\log(p(A)\,p(B)) = -\log p(A) - \log p(B) = I(A) + I(B) $$

If we used, for example, \( I(p) = 1/p \), the values of independent events would multiply instead of adding.

The properties of the logarithm function therefore play an important role in modeling the quantitative properties of information.
If an event space consists of two equal-probability events \( p(E_1) = p(E_2) = 0.5 \), then
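The additivity of the logarithmic measure can be checked numerically with a few lines of C (a minimal sketch; the helper name `information` is ours, not from the text above):

<sxh c>
#include <stdio.h>
#include <math.h>

/* Information content of an event with probability p, in bits. */
double information(double p) {
    return -log2(p);
}

int main(void) {
    double pA = 0.5, pB = 0.5;

    /* Each equal-probability event carries exactly 1 bit. */
    printf("I(A)      = %.3f bit\n", information(pA));                  /* 1.000 */

    /* Additivity: the information of the joint event equals the sum. */
    printf("I(A,B)    = %.3f bit\n", information(pA * pB));             /* 2.000 */
    printf("I(A)+I(B) = %.3f bit\n", information(pA) + information(pB)); /* 2.000 */
    return 0;
}
</sxh>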
The average information content of the set of messages is called the //entropy// of the message set.
$$ H_E = \sum_{i=1}^n p_i \cdot I_{E_i} = \sum_{i=1}^n p_i \cdot \log_2 \frac{1}{p_i} = - \sum_{i=1}^n p_i \cdot \log_2 p_i \ [bit] $$
**Example**:
Entropy can also be viewed as a measure of the information "uncertainty" of a message source: the harder the next symbol is to predict, the higher the entropy.
| + | |||
| + | Example: | ||
| + | |||
| + | * If a source always sends the same letter (“AAAAA…”), | ||
| + | * If every letter occurs with equal probability (e.g., random characters), | ||
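The two extremes can be checked with a short C sketch (the helper `entropyOf` is our own illustrative name, not part of the example program below):

<sxh c>
#include <stdio.h>
#include <string.h>
#include <math.h>

/* Shannon entropy of a byte string, in bits per symbol. */
double entropyOf(const char *s) {
    int counts[256] = {0};
    int n = (int)strlen(s);
    for (int i = 0; i < n; i++)
        counts[(unsigned char)s[i]]++;
    double h = 0.0;
    for (int i = 0; i < 256; i++) {
        if (counts[i] > 0) {
            double p = (double)counts[i] / n;
            h -= p * log2(p);
        }
    }
    return h;
}

int main(void) {
    /* One repeated symbol: entropy is 0 bit. */
    printf("H(\"AAAAA\")    = %.3f bit\n", entropyOf("AAAAA"));    /* 0.000 */
    /* 8 distinct, equally frequent symbols: entropy is log2(8) = 3 bit. */
    printf("H(\"ABCDEFGH\") = %.3f bit\n", entropyOf("ABCDEFGH")); /* 3.000 */
    return 0;
}
</sxh>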
This concept is crucial in various fields, including //data compression//.
==== Example of Entropy calculation ====
| + | |||
| + | Why do not store raw password strings in compiled code? | ||
<sxh c>
#include <stdio.h>
#include <string.h>
#include <math.h>

float calculateEntropy(unsigned char counts[], int length);

int main(void) {
    /* Ordinary English text with an embedded high-entropy,
       password-like fragment ("yQ%v?"). */
    const char sample[] =
        "Some poetry types are unique to particular cultures and genres "
        "and respond to yQ%v? characteristics of the language in which the "
        "poet writes. Readers accustomed to identifying poetry with Dante, "
        "Goethe, Mickiewicz, or Rumi may think of it as written in lines "
        "based on rhyme and regular meter. There are, however, traditions, "
        "such as Biblical poetry and alliterative verse, that use other "
        "means to create rhythm and euphony. Much modern poetry reflects "
        "a critique of poetic tradition, testing the principle of euphony "
        "itself or altogether forgoing rhyme or set rhythm.";

    const int windowWidth = 20;
    unsigned char counts[256];
    int sampleLength = (int)strlen(sample);

    /* Slide a 20-byte window over the text and print the entropy of
       each window; windows covering the random fragment stand out. */
    for (int start = 0; start <= sampleLength - windowWidth; start++) {
        memset(counts, 0, sizeof(counts));
        for (int j = 0; j < windowWidth; j++) {
            unsigned char c = (unsigned char)sample[start + j];
            counts[c]++;
        }
        printf("%3d: %.3f\n", start, calculateEntropy(counts, windowWidth));
    }
    return 0;
}

float calculateEntropy(unsigned char counts[], int length) {
    float entropy = 0.0f;
    for (int i = 0; i < 256; i++) {
        if (counts[i] > 0) {
            float freq = (float)counts[i] / (float)length;
            entropy -= freq * log2f(freq);
        }
    }
    return entropy;
}
</sxh>
tanszek/oktatas/techcomm/information.1728973751.txt.gz · Last modified: 2024/10/15 06:29 by knehez
