So the f function was selected according to Shannon's proposal:

$$ I_E = \log_2 \frac{1}{p_E} = -\log_2( p_E ) [bit] $$

The properties of the logarithm function play an important role in modeling the quantitative properties of information.
If an event space consists of two equally probable events, p(E1) = p(E2) = 0.5, then

$$ I_{E_1} = I_{E_2} = \log_2 \frac{1}{0.5} = - \log_2 0.5 = 1 [bit] $$

So the unit of information is the news value associated with the simplest choice between two equally probable events.

If the event system consists of ''n'' events and all of these events have the same probability, then the probability of any single event is:

$$ p_E = \frac{1}{n} $$
In this case, the news value associated with any of these events is:

$$ I_E = \log_2 \frac{1}{p_E} = \log_2 (n) [bit] $$
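As a simple illustration of this formula, selecting one symbol from an alphabet of, say, 32 equally probable symbols carries

$$ I_E = \log_2 32 = 5 [bit] $$

of information.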
==== Entropy ====
$$ H_E = \sum_{i=1}^{n} p_i \cdot I_{E_i} = \sum_{i=1}^{n} p_i \cdot \log_2 \frac{1}{p_i} = - \sum_{i=1}^{n} p_i \cdot \log_2 p_i $$

**Example**: if the event space consists of two events with probabilities p1 and (1 - p1), the entropy is:

$$ H_E = - \left[ p_1 \cdot \log_2 p_1 + (1 - p_1) \cdot \log_2 (1 - p_1) \right] $$

This function can be represented as follows:

(Figure: entropy of a two-event message set as a function of p1.)
We can see that entropy is highest when the two events are equally likely. In general, in this model, entropy is low when the event system includes events with very low probabilities.
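For instance, evaluating the two-event entropy above at a strongly unbalanced distribution, say p1 = 0.9, gives

$$ H_E = -\left[ 0.9 \cdot \log_2 0.9 + 0.1 \cdot \log_2 0.1 \right] \approx 0.47 [bit], $$

well below the 1 bit obtained at p1 = 0.5.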
Entropy can also be viewed as a measure of the information "content" of a message set: the higher the entropy, the more information a message carries on average.

This concept is crucial in various fields, including //data compression//.
==== Redundancy ====

The average information content of a message set describing an equally probable, completely random set of events is the highest. In contrast, the average information content of a message set describing a completely ordered, i.e., fully known, event set is the lowest.

A probability distribution that deviates from the maximum possible entropy leads to a redundant message set.

The measure of redundancy is:

$$ R = \frac{H_{max} - H}{H_{max}} = 1 - \frac{H}{H_{max}} $$
If the event space consists of //n// equally probable events:

$$ H_{max} = \log_2 n \qquad \text{and} \qquad R = 1 - \frac{H(p_1, ..., p_n)}{\log_2 n} $$
Redundancy plays a significant role in information theory: it is what makes reliable communication of messages over a noisy channel possible. The redundancy of human verbal communication is typically more than 30%. The changes in redundancy for a two-event message set are illustrated in the figure below:

(Figure: redundancy of a two-event message set as a function of p1.)
Thus, redundancy is minimal when the probabilities of the events are equal.
**Example**: let the event system consist of four events,

$$ E = \{E_1, E_2, E_3, E_4\}, $$

and the probabilities of the individual events are as follows:

$$ p = \{0.5, 0.25, 0.2, 0.05\}. $$
The individual information content of each state of the system is:

$$ I_{E_1} = -\log_2 0.5 = 1 [bit], $$
$$ I_{E_2} = -\log_2 0.25 = 2 [bit], $$
$$ I_{E_3} = -\log_2 0.2 = 2.32 [bit], $$
$$ I_{E_4} = -\log_2 0.05 = 4.32 [bit]. $$
**What is the entropy of the message set?**

$$ H_E = \sum_{i=1}^{4} p_i \cdot I_{E_i} = 0.5 \cdot 1 + 0.25 \cdot 2 + 0.2 \cdot 2.32 + 0.05 \cdot 4.32 = 1.68 [bit]. $$
**What is the redundancy of the message set?**

Let's calculate the maximum entropy:

$$ H_{max} = \log_2 n = \log_2 4 = 2 [bit] $$
Then, substituting into the formula for redundancy:

$$ R = 1 - \frac{H}{H_{max}} = 1 - \frac{1.68}{2} = 0.16, $$

which means the redundancy of the event system is approximately 16%.
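The same calculation can also be sketched in C. This is a minimal, self-contained version, not part of the original example; the probability values are taken from the worked example above and all names are illustrative:

<sxh c>
#include <stdio.h>
#include <math.h>

int main()
{
    /* probabilities of the four events from the example above */
    double p[] = { 0.5, 0.25, 0.2, 0.05 };
    int n = sizeof(p) / sizeof(p[0]);

    /* entropy: H = -sum(p_i * log2(p_i)) */
    double H = 0.0;
    for (int i = 0; i < n; i++)
    {
        H += -p[i] * log2(p[i]);
    }

    /* maximum entropy of n equally probable events, and redundancy */
    double Hmax = log2((double)n);
    double R = 1.0 - H / Hmax;

    printf("H = %.2f bit, Hmax = %.2f bit, R = %.2f\n", H, Hmax, R);
    return 0;
}
</sxh>

Compiled with a C99 compiler (linking the math library), it should print H = 1.68 bit, Hmax = 2.00 bit and R = 0.16, matching the values above.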
==== Example of Entropy calculation ====

Why should we not store raw password strings in compiled code? Random, password-like character sequences have much higher entropy than natural-language text, so a sliding-window entropy scan of a binary can reveal where such a secret is embedded:
<sxh c>
#include <stdio.h>
#include <string.h>
#include <math.h>

float calculateEntropy(unsigned int bytes[], int length);

/* ordinary English text followed by a random, password-like fragment ("yQ%v?") */
char sample[] = "Some poetry types are unique to particular cultures and genres and respond to yQ%v?";

int main()
{
    unsigned int byteCounter[256];
    const int windowWidth = 20;

    /* slide a 20-byte window over the sample and measure the entropy of each window */
    for (int i = 0; i < sizeof(sample) - windowWidth; i++)
    {
        memset(byteCounter, 0, sizeof(byteCounter));

        char *p = &sample[i];
        char *end = &sample[i + windowWidth];

        /* count byte frequencies within the current window */
        while (p != end)
        {
            byteCounter[(unsigned char)(*p++)]++;
        }
        float entropy = calculateEntropy(byteCounter, windowWidth);
        printf("window at offset %d: entropy = %f\n", i, entropy);
    }
    return 0;
}

float calculateEntropy(unsigned int bytes[], int length)
{
    float entropy = 0.0f;

    for (int i = 0; i < 256; i++)
    {
        if (bytes[i] != 0)
        {
            float freq = (float) bytes[i] / (float) length;
            entropy += -freq * log2f(freq);
        }
    }
    return entropy;
}
</sxh>
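In the printed output, the windows that overlap the random fragment should show higher entropy than the windows containing only English text. This is the same sliding-window technique that secret-scanning and malware-analysis tools use to locate embedded keys and passwords in binaries.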