Entropy measures the average information content of a probability distribution. Shannon's 1948 paper showed that a measure of information which is continuous, monotonically increasing in the number of equally likely outcomes, and self-consistent can take only one functional form. The talk works through these axioms and the self-consistency equation, then turns to the physical entropy established by Clausius and how it was reconciled with the statistical properties of a microscopic system. Physical entropy is approached through its rate of change with energy.

In a 'random experiment' or process with n outcomes, the ith outcome occurs with probability p_i. The information associated with observing that outcome is defined as the logarithm of one over p_i, so rare outcomes convey more information. To illustrate this, the speaker gives the example of the game of 16, in which the listener identifies an integer between 0 and 15 by asking four yes/no questions, each answer being equally likely and revealing one bit.

In the game of 16, the logarithm base is two and the unit of information is the bit. The Kraft-McMillan theorem is cited for the point that a short alphabet needs longer words to encode the same information, while a longer alphabet can use shorter words. Submarine is a simplified version of Battleships, with a 16-square board and a submarine occupying a single square. On the first guess a miss has probability 15/16 and reveals about 0.09 bits, while a hit has probability 1/16 and reveals 4 bits. On the second turn a miss has probability 14/15 and reveals about 0.0995 bits, while a hit reveals about 3.91 bits; added to the first-turn miss, that comes to exactly 4 bits.

Shannon's 1948 paper established entropy as the average information content of a probability distribution. The Submarine game gives the intuition: a hit on the first guess has probability 1/16. If the submarine is found on the nth turn, the information from the n - 1 misses and the final hit is a sum of logarithms of reciprocal probabilities, and the product inside the logarithm telescopes to 16. The total is therefore 4 bits whichever turn the hit occurs on, which matches the intuition that the submarine's coordinate is a number between 0 and 15. Shannon's measure of information is required to be continuous, monotonically increasing and self-consistent.

Shannon showed that only one functional form of H(p) satisfies his axioms for a given probability distribution p. The proof rests mainly on self-consistency, with the continuity and monotonic increasing properties in supporting roles. The monotonically increasing axiom is stated in terms of A(n), the information revealed when all n outcomes are equally likely, so that A(n) = H(1/n, ..., 1/n) with n arguments. A non-trivial example is given to illustrate the self-consistency property.

Shannon's axioms of continuity, monotonic increase and self-consistency are discussed, with the latter applied to a total of N outcomes organized into n groups. Self-consistency gives A(s^m) = m A(s), and it lets A(N) be written as H(p_1, p_2, ..., p_n) plus a sum over the groups. He shows that A applied to an integer must be a constant times the logarithm of that integer, and combining this with the sum over the n_i yields H(p). To prove the logarithmic form, the argument finds, for given s and t^n, an m such that s^m and s^(m+1) sandwich t^n; taking logs and dividing by n log s gives m/n ≤ log t / log s ≤ (m+1)/n. Applying the information measure A to the same sandwich gives |m/n - A(t)/A(s)| ≤ epsilon. For any N, there exists a pair (m, n) with n ≥ N satisfying the inequality.

Clausius established the concept of physical entropy, which the talk approaches through its rate of change with energy. This is related to the macroscopic properties of a gas at constant volume and particle number, and it was reconciled with the statistical properties of the microscopic description of the system. On the information side, the self-consistency equation was used to show that a continuous function agreeing with C(t) on the rationals must equal it everywhere. This equation has relevance to physics. The word 'entropy' itself was coined by Clausius.

Russell Goiter works for a financial technology company called Zaffin. He has a background in physics and is interested in statistical and singular learning theory. He gives a guided tour of the concept of entropy, looking at it first from an information-theoretic point of view and then from a physical one. Physical entropy and information entropy are intimately related, and Goiter believes they are actually the same thing. He starts by discussing information: a 'random experiment' or process has n outcomes, the ith of which occurs with probability p_i. He then looks at how this relates to entropy.

The speaker introduces a definition of the information associated with an outcome that occurs with a given probability. The definition is the logarithm of one over the probability: the logarithm makes information additive for independent events, and the reciprocal ensures that rare outcomes convey more information. To illustrate this, the speaker gives the example of the game of 16, where the speaker is thinking of an integer between 0 and 15 and the listener must guess it by asking questions. This is an example of how the definition of information works.
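
The definition above can be tried out numerically. The following is a minimal Python sketch, not code from the talk: the function name `self_information` and the example probabilities are assumptions chosen for illustration.

```python
import math


def self_information(p, base=2.0):
    """Information of an outcome with probability p; base 2 gives bits."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return math.log(1.0 / p) / math.log(base)


# Independent outcomes: probabilities multiply, so information adds.
p_i, q_j = 0.25, 0.125  # illustrative values, not taken from the talk
joint_bits = self_information(p_i * q_j)
summed_bits = self_information(p_i) + self_information(q_j)
print(joint_bits, summed_bits)  # the two agree up to rounding

# A rare outcome carries more information than a common one.
print(self_information(0.01) > self_information(0.99))
```

Changing `base` to `math.e` or 10 changes only the unit (nats or digits), not the ordering of outcomes by information.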

A game of 16 was played, where each question revealed one of the bits in the binary representation of the number. Each answer had probability one half, so the information gained per question was one bit. The logarithm base used was two, and the unit of information was the bit. The Kraft-McMillan theorem was mentioned, with the remark that if the alphabet is short then longer words are required to encode the same information, while if the alphabet is long then shorter words suffice.
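
The four questions can be written out as code. This is a small Python sketch under the assumption that the questions are exactly the ones read out in the talk (n ≥ 8, n mod 8 ≥ 4, n mod 4 ≥ 2, n mod 2 > 0); the function names are hypothetical.

```python
def game_of_16_answers(n):
    """Yes/no answers to the four questions for a secret integer n in 0..15."""
    if not 0 <= n <= 15:
        raise ValueError("n must be between 0 and 15")
    return [
        n >= 8,      # most significant bit
        n % 8 >= 4,  # second bit
        n % 4 >= 2,  # third bit
        n % 2 > 0,   # least significant bit
    ]


def decode(answers):
    """Rebuild the integer from the answers, reading them as binary digits."""
    value = 0
    for answer in answers:
        value = 2 * value + int(answer)
    return value


for secret in (13, 6):  # the two values played through in the talk
    answers = game_of_16_answers(secret)
    bits = "".join("1" if a else "0" for a in answers)
    print(secret, bits, decode(answers))
```

Each answer splits the remaining candidates in half, which is why every question is worth one bit and four questions always suffice.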

Submarine is a simplified version of the board game Battleships. The board consists of 16 squares and the submarine is one square in size. The probability of a miss on the first guess is 15/16 and the information revealed by a miss is about 0.09 bits. If the first guess is a hit, the probability is 1/16 and the information revealed is 4 bits. On the second turn, the probability of a miss is 14/15 and the information revealed is about 0.0995 bits. If the second guess is a hit, the information revealed is about 3.91 bits, which added to the first-turn miss comes to 4 bits in total.
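
The figures for the first two turns can be reproduced with a short Python sketch. It assumes every earlier guess was a miss and that each remaining square is equally likely to hold the submarine; the function names are hypothetical.

```python
import math

SQUARES = 16  # board size used in the talk


def miss_information(turn):
    """Bits revealed by a miss on a 1-based turn, given all earlier misses."""
    remaining = SQUARES - (turn - 1)
    if remaining < 2:
        raise ValueError("a miss is impossible once one square remains")
    p_miss = (remaining - 1) / remaining
    return -math.log2(p_miss)


def hit_information(turn):
    """Bits revealed by a hit on a 1-based turn, given all earlier misses."""
    remaining = SQUARES - (turn - 1)
    if remaining < 1:
        raise ValueError("turn is past the end of the game")
    return math.log2(remaining)  # log2(1 / (1 / remaining))


print(round(miss_information(1), 4), hit_information(1))
print(round(miss_information(2), 4), round(hit_information(2), 4))
print(miss_information(1) + hit_information(2))  # miss then hit
```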

The speaker continues with the game of Submarine, where the player needs to find a submarine in a 4x4 grid. After 8 misses, one bit of information has been accumulated; after 12 misses, two bits; and after 14 misses, three bits. By the 15th turn, all four bits of information have been accumulated, even if that turn is a miss. These special turns, where half, three quarters or more of the squares have been eliminated, give round numbers of accumulated information. The speaker explains that the information from n - 1 misses followed by a hit on the nth turn is a sum of logs of the reciprocal miss probabilities and the reciprocal hit probability, and that it always totals four bits.
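
The special turns and the telescoping sum can be checked with a few lines of Python. This is an illustrative sketch with hypothetical function names, assuming a 16-square board and uniformly likely remaining squares.

```python
import math


def accumulated_miss_information(misses, squares=16):
    """Total bits learned from `misses` consecutive misses."""
    if not 0 <= misses < squares:
        raise ValueError("misses must be between 0 and squares - 1")
    # Each term is log2(1 / p_miss) for one turn; the product telescopes.
    return sum(
        math.log2((squares - k) / (squares - k - 1)) for k in range(misses)
    )


def total_information_when_found(turn, squares=16):
    """Bits from turn - 1 misses followed by a hit on the given turn."""
    misses = turn - 1
    hit_bits = math.log2(squares - misses)
    return accumulated_miss_information(misses, squares) + hit_bits


for misses in (8, 12, 14, 15):  # the special turns from the talk
    print(misses, round(accumulated_miss_information(misses), 6))

# The grand total does not depend on which turn the hit happens on.
print({round(total_information_when_found(t), 6) for t in range(1, 17)})
```

In closed form the accumulated information after k misses is log2(16 / (16 - k)), which is why only the start and the end of the product survive.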

Shannon's 1948 paper showed that entropy is the average, or expected, information content of a probability distribution, and that it is the only measure that is continuous, monotonically increasing and self-consistent. Intuitively, it can be understood by considering the Submarine game, where the probability of finding the submarine on the first guess is 1/16. By the end of the game, the information learned is 4 bits whichever turn the hit occurs on, which matches the intuition that the submarine's coordinate is a number between 0 and 15.
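
Entropy as an expected value of log(1/p_i) can be sketched in Python as follows. The function name `entropy` and the handling of zero probabilities (skipped, by the usual convention) are assumptions of this sketch rather than details given in the talk.

```python
import math


def entropy(probs, base=2.0):
    """Expected information content: sum of p * log(1 / p) over outcomes."""
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Outcomes with p == 0 contribute nothing and are skipped.
    nats = sum(p * math.log(1.0 / p) for p in probs if p > 0)
    return nats / math.log(base)


uniform_16 = [1 / 16] * 16      # submarine equally likely in any square
print(entropy(uniform_16))      # average information in its location

first_guess = [1 / 16, 15 / 16]  # hit or miss on the first turn
print(entropy(first_guess))      # expected bits from one guess
```

The second value is well below one bit, because the likely outcome (a miss) is barely informative and the informative outcome (a hit) is rare.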

Shannon showed that only one functional form of H(p) satisfies his axioms for a given probability distribution. The key axiom is self-consistency, meaning that H(p) should give the same total when the same information is revealed in stages. For example, three outcomes with probabilities one-half, one-third and one-sixth can be reached in two stages: first a choice between two equally likely outcomes, then, on one branch, a choice with probabilities two-thirds and one-third, since a half of two-thirds is one-third and a half of one-third is one-sixth. Self-consistency requires H(1/2, 1/3, 1/6) = H(1/2, 1/2) + (1/2) H(2/3, 1/3). The form is also based on the continuity property and the monotonic increasing property.
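
The staged example can be verified numerically once the final formula for H is known. The Python sketch below assumes the standard form H(p) = sum of p log2(1/p), which the talk arrives at later; it checks the identity rather than deriving it, and the helper name `H` is chosen here for illustration.

```python
import math


def H(*probs):
    """Shannon entropy in bits of the given probabilities."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)


# All three outcomes revealed at once.
direct = H(1 / 2, 1 / 3, 1 / 6)

# Same outcomes revealed in two stages: a fair choice first, then,
# half of the time, a second choice with probabilities 2/3 and 1/3.
staged = H(1 / 2, 1 / 2) + (1 / 2) * H(2 / 3, 1 / 3)

print(direct, staged)  # equal up to floating-point rounding
```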

For the monotonically increasing axiom, Shannon defines A(n), the information revealed when all n outcomes are equally likely, so that A(n) = H(1/n, ..., 1/n) with n arguments. The proof of the theorem is short, taking up roughly one page in an appendix. To illustrate the self-consistency property, the speaker draws a non-trivial multi-stage example with probabilities p1, p2, q1, q2 and so on. This is the sort of thing Shannon had in mind.
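
The multi-stage example can be written as a single equation. The LaTeX below is a reconstruction from the spoken description, assuming a tree in which the first choice is between p_1 and p_2, the p_1 branch splits into q_1, q_2, and the p_2 branch splits into r_1, r_2, r_3, which in turn split into s = (s_1, s_2, s_3), t = (t_1, t_2) and u = (u_1, u_2).

```latex
\begin{align*}
H(p_1 q_1,\; p_1 q_2,\; p_2 r_1 s_1,\; \dots,\; p_2 r_3 u_2)
  &= H(p_1, p_2) + p_1\, H(q_1, q_2) \\
  &\quad + p_2 \Bigl[\, H(r_1, r_2, r_3)
       + r_1\, H(\vec{s}) + r_2\, H(\vec{t}) + r_3\, H(\vec{u}) \Bigr]
\end{align*}
```

The left-hand side lists the probability of each complete route through the tree; the right-hand side adds the information of each stage, weighted by the probability of reaching it.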

Shannon's axioms 1, 2 and 3 of continuity, monotonic increase and self-consistency are recapped. The focus is on self-consistency, a key consequence of which is A(s^m) = m A(s). It is then applied to a total of N outcomes organized into n groups, with n_i outcomes in the ith group and probability p_i of hitting the ith group. Self-consistency lets A(N) be written as H(p_1, p_2, ..., p_n), which is the functional form to be discovered, plus a sum over the groups of p_i times the same functional form applied within each group.
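
The property A(s^m) = m A(s) can be checked by hand for the case drawn in the talk, s = 2 and m = 3. The LaTeX below is a sketch of that bookkeeping, assuming eight equally likely outcomes reached by three successive binary choices; the inductive step at the end is a standard generalization and is not quoted from Shannon's paper.

```latex
\begin{align*}
A(8) &= H\!\left(\tfrac18, \dots, \tfrac18\right) \\
     &= A(2)
        + \tfrac12 \Bigl[ A(2) + \tfrac12 A(2) + \tfrac12 A(2) \Bigr]
        + \tfrac12 \Bigl[ A(2) + \tfrac12 A(2) + \tfrac12 A(2) \Bigr] \\
     &= A(2) + 2\,A(2) = 3\,A(2). \\[4pt]
\text{In general:}\quad
A(s^m) &= A(s) + s \cdot \tfrac1s\, A(s^{m-1}) = A(s) + A(s^{m-1})
  \;\Longrightarrow\; A(s^m) = m\,A(s).
\end{align*}
```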

Shannon's second axiom requires the measure to be monotonically increasing. He shows that A applied to some integer must be a constant times the logarithm of that integer, and the constant must be positive because A is increasing. Substituting this into the grouped form of self-consistency and combining the sum over the n_i with the logarithm of N leads to H(p) = -C sum_i p_i log p_i, where the constant C can be absorbed into the base of the logarithm. The only missing piece is where the logarithmic form of A comes from, which is explained in the second part of the transcript.
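
The step from the grouped form of self-consistency to the entropy formula can be laid out explicitly. The LaTeX below is a reconstruction under the assumptions stated in the talk: N = n_1 + ... + n_n equally likely outcomes, group probabilities p_i = n_i / N, and A(n) = C log n with C > 0.

```latex
\begin{align*}
A(N) &= H(p_1, \dots, p_n) + \sum_i p_i\, A(n_i),
  \qquad p_i = \frac{n_i}{N}, \quad N = \sum_i n_i \\
C \log N &= H(p_1, \dots, p_n) + C \sum_i p_i \log n_i \\
H(p_1, \dots, p_n)
  &= C \Bigl( \sum_i p_i \Bigr) \log N - C \sum_i p_i \log n_i
  && \text{(inserting } 1 = \textstyle\sum_i p_i \text{)} \\
  &= -C \sum_i p_i \log \frac{n_i}{N}
   = -C \sum_i p_i \log p_i .
\end{align*}
```

This establishes the formula for rational p_i; the continuity axiom extends it to real-valued probabilities, and C only fixes the unit.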

The speaker follows Shannon's argument: given t^n, for any other s one can find an m such that s^m and s^(m+1) sandwich t^n, where both s and t are positive rational numbers. Taking logs of both sides and dividing by n log s gives m/n ≤ log t / log s ≤ (m+1)/n. Applying the information measure A to the same sandwich, with n large enough that 1/n is at most epsilon, gives |m/n - A(t)/A(s)| ≤ epsilon. For any N, there exists a pair (m, n) with n ≥ N satisfying the inequality.
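
The sandwich argument can be summarized in three lines. The LaTeX below is a sketch that follows the structure of Shannon's appendix and takes s and t to be integers of at least 2 and n at least 1/epsilon; those conditions are assumptions of this sketch and may differ from the wording used in the talk.

```latex
\begin{align*}
s^m \le t^n < s^{m+1}
  &\;\Longrightarrow\; \frac{m}{n} \le \frac{\log t}{\log s} < \frac{m+1}{n}
  \;\Longrightarrow\; \left| \frac{m}{n} - \frac{\log t}{\log s} \right|
      < \frac{1}{n} \le \varepsilon, \\
A(s^m) \le A(t^n) \le A(s^{m+1})
  &\;\Longrightarrow\; m\,A(s) \le n\,A(t) \le (m+1)\,A(s)
  \;\Longrightarrow\; \left| \frac{m}{n} - \frac{A(t)}{A(s)} \right|
      \le \frac{1}{n} \le \varepsilon, \\
\left| \frac{A(t)}{A(s)} - \frac{\log t}{\log s} \right| &\le 2\varepsilon
  \ \text{ for every } \varepsilon > 0
  \;\Longrightarrow\; \frac{A(t)}{\log t} = \frac{A(s)}{\log s} = C .
\end{align*}
```

The second line uses monotonicity to carry the sandwich through A, and A(s^m) = m A(s) to pull the exponents out.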

Given two arbitrary functions of s and t, if the first inequality holds, then by multiplying another s, the left-hand side can become larger than the right-hand side of the st inequality. This means that the left-hand side must be zero, regardless of the size of n or the epsilon. This implies that the two arbitrary functions of s and t must be equal.

The speaker discussed the self-consistency equation in which a continuous function C(t) is defined for arbitrary s and t. This function is equal to a constant, and the speaker used the density of the rationals in the reals to show that a continuous function agreeing with C(t) on the rationals must be equal. The speaker then hinted that this equation has relevance to physics.

Entropy is a concept established by Clausius, and the talk describes it through the rate of change of physical entropy with energy. It is related to the macroscopic properties of a gas at constant volume and particle number. This relationship was reconciled with the statistical properties of the microscopic description of the system. Entropy is an amazing example of a coherent theory that developed from confusing beginnings and a lack of foundations. The word itself was coined by Clausius.

As it's my first time speaking in one of these seminars, I thought I'd give a quick introduction. My name is Russell Goiter. I'm not in academia; I work for a company called Zaffin, a financial technology company. My team applies mathematics to various problems in finance, including pricing derivatives but also helping banks understand their data better. I come from physics originally; that's my academic background. I'm not a working physicist these days, and I've never been an expert in statistical mechanics, but I am very interested in statistical and singular learning theory. I think the connection with stat mech is fascinating, and entropy is a concept that is at the heart of that connection in various ways. So my goal here is to work on our intuition for the concept of entropy. As you can see in the notes that I posted in the SLT channel, I have drawn together a few of my favorite sources on the topic, and I'm hoping to give a bit of a guided tour. I hope it's useful, and maybe some sort of light relief or entertainment, or complementary somehow to the main thread of this seminar, which is more mathematical. So, the overall structure: physical and information entropy. The overall game plan is to do this one first, looking at entropy from an information-theoretic point of view, and then do this one. By the time we get to the latter stages of number two, connections between the two will be clear and emerging, but I think we can focus further on those connections and look at how intimately they are related. Without wanting to give you any spoilers (well, those words aren't really well chosen; here's a huge spoiler): I think they aren't really different things. They are the same thing, and I think that will come through quite strongly as we look at the topic. Okay, so let's start with not entropy but information. I have n outcomes of some, in quotes, random experiment or some process, and I have some probabilities: the probability of the ith outcome is p_i, where i is an integer [inaudible]. And I want to say that the information associated with

observing the ith outcome, which occurs with probability p_i, is the logarithm of one over p_i. I'm giving you a plausible definition of the information associated with that outcome; it doesn't come from anywhere deeper than us wanting an additive measure for compound events. So if my experiment actually has n times m outcomes with probabilities p_i and q_j, and these events are independent, then the probability of the (i, j)th outcome is the product of p_i and q_j. If I have a logarithm in my definition of information, then the extra information I learn by discovering the q_j part of that product is added to the information I've got from the p_i part. That's what the logarithm does for us. And the reciprocal here ensures that surprises are teachable moments, as it were: there's more information when we're surprised. If what we see happening is what always happens, then we haven't learned as much as if what happens is an unusual event. So rare outcomes convey more information. This was covered implicitly in an earlier seminar on Shannon's source coding theorem, with a nice follow-up on the Kraft-McMillan theorem, but I listened to both and I still found that I benefited from going through a couple of examples to understand the sense in which this definition of information makes sense. So I'd like to quickly take you through those examples right now. Maybe a brief pause for any questions or comments so far. You're all still there? We're okay? [Audience] Yeah, okay. Maybe it's a bit ahead of time, but I guess what you've motivated is to have some increasing function of one over p_i there, but it need not be the identity. Is there any good physical reason to want one over p_i as opposed to some other function of that? [Speaker] Yes, and we'll get there soon. I think the ultimate formula that we know and love for the information entropy of a distribution is uniquely determined by Shannon's three axioms, so I think the answer lies in those three axioms. So the first example, to build our intuition muscles for this idea of information, is similar to 20 questions: it is the game of 16. I'm thinking of an integer n from 0 to 15, and you need to guess what I'm thinking

of, based on asking questions. The questions that you should ask are as follows. The first is: is n greater than or equal to eight? I'm going to play it through with a couple of values here; we'll do 13 and 6. So that's a yes and that's a no. Okay, and then n mod eight: is that an acceptable notation among you people for the modulus? [Audience] I've never seen that before; that's interesting. [Speaker] Okay, good. In that case I'm going to get rid of it and use [inaudible]: is n mod 8 greater than or equal to four? Is n mod 4 greater than or equal to two? And is n mod 2 greater than zero? So the remainder after dividing eight into 13 is 5, and that is greater than four, so that's a yes. 13 mod 4 is one, which is less than two, so that's a no. And then n mod 2 is one, so that's a yes. Just for speed, from my notes, for 6 I have no, yes, yes, no. In each question we are revealing one of the bits in the binary representation of these numbers: 13 is 1101 and 6 is 0110. If each integer choice that I make in this game is equally probable, then the outcome of each question has the same probability p_i of one half, and so h_i is log 2, which, if my logarithm base is two, is an amount of information of one bit. I haven't put a two here yet, and I don't need to, of course; that's just units. If it's e, I've heard the unit called the nat, n-a-t, for natural-logarithm-based information unit. And the Kraft-McMillan theorem says that if your alphabet is short then you need longer words, but if you have a very long alphabet then you can get away with shorter words to encode the same information. We could use the logarithm to the base 10 here, and then I suppose the unit of information would be the digit; we could use 26 as the base of the logarithm, in which case the unit might be the letter, the Roman or Latin alphabet letter, and so on and so forth. But I think bits is a pretty good choice here, so I'm going to put a two. Okay, so that's the game of 16. That's a kind of starter, and again, maybe I won't leave the board, but I'll pause for any questions or thoughts before moving on to a less trivial

case, where the p_i aren't all the same, which I think is a bit more instructive; but this is perhaps a good warm-up. Okay, so I'm going to hit the red X, leave that board and have a new board. Great. Actually, would you mind, Dan, putting the first image up here? (Yeah, sure.) So this is the game of Submarine, and it's very similar to the game of Battleships. Can I take a quick poll among you: who knows this game, Battleships? It's a board game. Okay, good. So this is a simplified version of Battleships where a submarine, which is one square in size, is hiding somewhere in the board. It's one-sided again, so you just keep guessing squares and I tell you, one by one, whether you've got a hit or a miss. That's a miss, that's an X, and then the check here is a hit, so we actually know that the submarine is in that square. There are 16 squares, and so the probability of a miss, the X, on the first guess is 15/16, and the information revealed by a miss on the first guess is log 2 of one over that, so of 16 over 15, which is about 0.09 bits. It's about a tenth of a bit that you learn from that outcome. In contrast, if on the first turn you're lucky enough to get a hit, well, that occurs with probability 1/16, and the information derived from that outcome is log 2 of 1 over 1/16, which is log 2 of 16, so four bits. That makes sense, because the location of the submarine can be represented, parametrized I suppose, by a number from 0 to 15, as in our previous game, and we know that we can represent such a number with four bits. Now on the second turn we can keep going. If we didn't get a hit on the first turn, the probability of a miss on the second turn is 14/15, because we've eliminated one of the squares on the first try. Let's suppose that we guess here; we got a little more information. h_x2 turns out to be log to the base 2 of 15 over 14, which is another approximately tenth of a bit; it turns out to be 0.0995-something. Now, if we're lucky enough to get a hit on the second turn, then the information we learn from that is log 2 of 15 over 1, which is about 3.91 bits, and if you add that to the first-turn miss you get exactly four bits again, because you've discovered the location of the

submarine: you've learned a number between 0 and 15. So would you mind, Dan, the next image on the next board? We're just going to cut to the chase and jump to the eighth turn and consider another few special turns. I'm sorry, I think I'm going to have to leave and come back. Oh no, I have it, great. Okay, so what we know at this state of the board is: I've got one, two, three, four, five, six, seven, eight X's. We know that the submarine is not in half of the squares, and if you compute this you get one bit, it turns out, which again makes sense: it's like the first of the four questions in the game of 16. So this is the eighth turn. By the twelfth turn (nine, ten, eleven, twelve) you get two bits, because we've only got a quarter of the squares left, and I think the 14th turn gives you three bits of information. So we'll just add those in green here and here, 12 and 14, and so there's one bit left to learn in these remaining squares. So there are some special turns. Sorry, let me just put that up here; it's not going to take too long. By the 15th turn we've accumulated all four bits of information, because even if the 15th turn is a miss, we know where the submarine must be. Okay, I'll make some space for myself here. So you can see that the total accumulated information from misses is some real number, according to the base of the logarithm (it's two here), and the logarithm of 13 over 12 or whatever is not a particularly round number; but there are special turns, where we have eliminated half the squares, three quarters of the squares and so on, where the total accumulated information is these nice round numbers. But of course we know that when you find the submarine you have learned four bits' worth of information, and it doesn't matter what turn that happens on. I can accumulate the information from n minus 1 misses and then add a hit on the nth turn. That gives me the sum of logs, which is the log of a product of 1 over p_i, and I'm adding the log of one over the hit probability, so that can go into the product as well. I'm sorry, that's not quite right: these are the probability of a miss on the ith turn and the probability of a hit

on the nth. In this case we can start writing these down: one over p_x1, we saw, was 16/15, and then we had 15/14, and so on, and then ultimately we get to one over one over n, which is n over one, and the previous term is n plus one over n. So there is much cancellation here, and we end up with just the start and the end, which of course is four bits. So that works out and matches our intuition, which is that by the time we've learned where the submarine is, we've nailed this number from the first game: the coordinate, if you like, of the submarine, which is a number from 0 to 15. [Audience] That's a really satisfying example, the cancellations. [Speaker] Yes. So that's my intuitive feel for why that definition of information makes sense, and now we're ready to use it. Maybe I'll do it on the same board, given also the board risk. Oh no, I won't; it's worth a new board. So here we go. Sorry, I'm experiencing operator challenges: I forget that when I touch the board it goes into edit mode. So now we have entropy as average information content in a distribution. I was just saying that I think the minus sign here and the log are doing most of the heavy lifting. The only new thing here is this; we already understand everything to do with little h_i, and the minus sign and the log are really all that's going on there, and we've just added one new ingredient. So that's it: entropy makes sense to me intuitively as the expected, or average, information content available in a given random variable's probability distribution. Okay. There's nothing particularly rigorous here; we've just motivated the three things in the blue rings. But what Shannon did in his 1948 paper (it's listed on the first page of the notes; it's one of my sources) is he showed that if you assume that there is a measure of information that is continuous, that is monotonically increasing as a function of the number of possible outcomes, and self-consistent in a certain sense

which we'll get to in a second, then there is only one functional form that satisfies those, given a probability distribution p_i. And really it's this one, self-consistency, where almost all the action is. His proof is discrete, and he uses the continuity property to extrapolate from rational numbers to real numbers; it's a kind of afterthought. The monotonic increasing property is used (I don't want to undersell it), but the main action is in this self-consistency. What I thought I might do, and maybe I should get your feedback here: I can either go through the whole proof, or I can give you a bit of a primer that should be useful if you go and look at his original paper, which by the way is really clear. Sometimes when you read historical papers, like Einstein's papers, they are in a different sort of notation, a different language from today's expression of them, but Shannon's spoke very clearly to me as a reader 80 years on. So let me start with the guided tour, the primer part, and if that's enough then maybe we can move on; but if there's an appetite to go deeper, maybe we can choose to do that. Okay, so the example that Shannon has in his paper, which is the same example that Jaynes uses, by the way, when he goes through this in his book, Probability Theory: The Logic of Science, is this. He says: let's say you have three possibilities, three outcomes, with probabilities one-half, one-third and one-sixth (sorry, I'm just looking at this; it looked wrong for a moment, but it's not), versus the same information ultimately being revealed, but in stages. So first we have equally likely outcomes, one half and one half, and then a two-thirds, one-third subsequent choice, event, random experiment. One half of two-thirds is one-third, and one half of one-third is one-sixth, so we get to the same probabilities ultimately, but they're revealed in two stages. Self-consistency, to Shannon, means that whatever the functional form of H(p) is as a function of the first set of probabilities, the first distribution here, that should be the same as the information gained from the first stage, plus a half (because it happens half the time) times the information gained

from the second experiment, if you like. He has some notation, used throughout, which is particularly relevant here in the monotonically increasing axiom: if all of the n outcomes are equally likely, then p_i is one over n for all i, and we're going to define an A(n), which is the information revealed in that case. So it's A(n) that is monotonically increasing. Next board. (We can't look at that pair of boards together; okay, that's fine.) [Audience] How long is the proof, roughly? [Speaker] The setup is on one page and the actual proof is in an appendix; it's really short, it's like one page. [Audience] I have not read the proof of this theorem, so I'm quite interested to hear. [Speaker] Okay. For me it took a little while to get my head around what he meant by the self-consistency property, so I'm just going to draw. I feel slightly indecisive about whether this is worth your time or not, but I think I should probably just chance it and draw it. [Audience] You can never go wrong with a picture. [Speaker] Okay, so I just wanted to draw a more complicated, non-trivial example of the kind of thing that he might have had in mind, with some probabilities here: r1, r2, r3; this will be s1, s2, s3; t1, t2; u1, u2. So here he's saying that H of all the routes (the first route is p1 q1, the second route is p1 q2, the third route is p2 r1 s1, and so on, all the way to the last route, which is p2 r3 u2) must be equal to the following. The first stage is H(p1, p2). Plus, with probability p1, we've got H(q1, q2) up top there (I'm dropping the p, but that's maybe okay). And then, with probability p2, we've got the same thing nested: H(r1, r2, r3), plus, with probability r1, H of all the s's (let me write vector s as a shorthand), and r2 H(vector t), and r3 H(vector u). Okay, so that's what he means by self-consistent; I think that's the sort of thing he had in mind. And the proof: if you are able to look at the previous board and see the quantity A(n) defined, which was H(1/n, ..., 1/n), n times (my n's need a bit more discipline there), well, he doesn't show, he just uses

the fact that a to the S sorry a of s to the m is M times a to the s and I think it's even probably worth a quick picture for that as well which is this one foreign [Music] so this is um m it's not m equals one m equals two m equals three uh this is 2 to the 3 equals eight right so so m is sorry um this is stage one and stage two and stage three um and the total number of stages is M which is three right so s is 2. m is 3 and of course s to the m is eight in this example so if we take the direct route right and go straight there like that then we have h of one over eight dot dot one over eight with commas Maybe right so that's that's a uh to the sorry a of s to the m in this example and if we apply the same as the above we get well let's use the A's it's a of 2. for the first choice and then with probability half we have um the second stage right which gives us an a of two but within that so you know just as we had um inside this square bracket here we have the Third um stage uh and we have another copy of this so I'm going to do a little more color coding in a second so these come from here and that these and these come from here here here and here and if you add all that up it ends up being three a of two which in this example is uh is m a of s which is what we were trying to well which is what is is what we're trying to show and we you know it's only one example um but I think you can um see how it generalizes uh to any s and any m all right so I will go to the next board we're already looking at them both or become wise right yes good here I am so just to recap then right where are we we've got Shannon's axioms one two and three uh continuity uh monotonically increasing and self-consistent and the self-consistency is where we've been sort of focusing and that something quite key there is um uh is the is this um property of pages to the m equaling m a s o lordy sorry I can do better okay so then what he says is uh consider a total of n outcomes but organized 
into little n groups, with n_i outcomes in the i-th group, and with the probability that you hit the i-th group being that, p_i = n_i / N. So we can apply the self-consistency property to that. We can say that A(N), big N, so that's going all the way to the large number of micro-outcomes, if you like, can be written as H(p_1, p_2, ..., p_n). In other words, this is the thing we want, this is H(p), we want to discover that functional form. Plus, well, now we sum over

the groups, with probability p_i, the same functional form applied at the next stage or the next level, A(n_i). Now there's something that he needs and shows separately, which I do have in my notes on page 7A, but I'm just going to state it for now and get to the punch line, and then we can decide whether to go through that part or move on. So what he states, well, he shows, is that the functional form of A applied to some integer must be some constant times the logarithm of that integer. So that's a little bit magical at this point; I'm just introducing it. The increasingness of the functional form means that this constant must be positive. That's where the second axiom, the fact that it's an increasing function, comes in, and the monotonicity is also used in order to show that A must have a logarithmic form. But if it does, then we can apply it to this. We can say that we must have C log N, big N, in other words the sum over the n_i, on the left, and that must be equal to the thing we want plus, well, C times the sum over i of p_i log n_i. And so H(p) must be equal to, well, knowing the answer, let's pull the minus sign up front, and we have, sorry, delete, delete, what's happening? Oh, that's right. Okay, so this isn't a function of i: once you've summed over the i's there's no i dependence there. So that's a plus C log N. So I have this, right, so I can insert one in the form of that in here, and I have log of that same thing. In fact I don't really know why I didn't just call this N first time round, so I'll do that now. And by the time I combine these in this way, this of course is p_i, and because this C is positive I can absorb it into the logarithm with a suitable choice of base and therefore unit. Okay, so we've arrived. We started from Shannon's axioms, we understood with some examples what the third one was talking about, and based on what's in the
box in the middle here, this one, we have the answer. So the only missing piece is where that comes from. I'm dying to know what the other things were. How long will that take to explain? It's one board if I arrange the writing thoughtfully. Yeah, I guess I would vote for going through that. I guess that means maybe you will defer some other things until the second part of this. Do you mind doing that? Okay, I mean, it's up to you. No, let's do it. Cool. All right, so here we go.
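The grouping identity and the 8-outcome picture above can be checked numerically. The following is a minimal sketch, assuming C = 1 and base-2 logarithms (so the unit is the bit); the function names `A` and `H` and the group sizes are illustrative choices, not taken from Shannon's paper.

```python
import math

def A(k):
    """Information of k equally likely outcomes, assuming A(k) = log2(k) (C = 1, bits)."""
    return math.log2(k)

def H(p):
    """Shannon entropy in bits: H(p) = -sum_i p_i log2(p_i), skipping zero terms."""
    return -sum(x * math.log2(x) for x in p if x > 0)

# Hypothetical grouping: N = 16 outcomes split into groups of sizes n_i.
group_sizes = [1, 3, 4, 8]
N = sum(group_sizes)
p = [n_i / N for n_i in group_sizes]  # p_i = n_i / N

# Self-consistency: A(N) = H(p) + sum_i p_i * A(n_i)
lhs = A(N)
rhs = H(p) + sum(p_i * A(n_i) for p_i, n_i in zip(p, group_sizes))

# The 8-outcome picture: a uniform choice over s**m outcomes carries m * A(s).
s, m = 2, 3
uniform_entropy = H([1 / s**m] * s**m)

print(lhs, rhs)
print(uniform_entropy, m * A(s))
```

Changing `group_sizes` to any other list of positive integers should leave the two sides of the first comparison equal up to floating-point rounding, which is the content of the third axiom once A is logarithmic.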

Okay, the boards seem to be playing ball, which is nice. Okay, so we've got A(k) is C log k. All right, so that was in the magic box. So what Shannon says, I believe, is that for, I think, rational s and t, given t to the n I can find an m that sandwiches t to the n between s to the m and s to the m plus one, for any other s. I guess you want these both to be positive rational numbers, right? I do. So that's his starting point, and he takes logs of both sides: m log s is less than or equal to n log t, which is less than or equal to (m + 1) log s. I think he would divide by n log s to give m over n is less than or equal to log t over log s, which is less than or equal to, well, (m + 1) over n. And he says that he's going to do this for an arbitrarily large n and call 1 over n epsilon. So that means the absolute difference between m over n and log t over log s is less than or equal to epsilon. So then he goes back to the beginning and applies the information measure to these values, so A(s^m), and because it's monotonic these inequalities are preserved, and because we know that A(s^m) is m A(s) through the third axiom, the self-consistency property, we can write that as m A(s) is less than or equal to n A(t), and then similarly we can divide by n A(s), cancel, cancel. So just for a better visual let me just write this in. We have something that basically looks the same as we had before, and so from that the absolute value of m over n minus A(t) over A(s) is less than or equal to epsilon. All right, so if this is m over n, say, and that is log t over log s, and that's our epsilon, then the worst case, I suppose, is that A(t) over A(s) is over here, and that's another epsilon. So I can subtract these two to give the absolute value of A(t) over A(s) minus log t over log s is less than or equal to 2 epsilon. But the two is no big deal because I can run this whole thing
for an arbitrarily large n and therefore an arbitrarily small epsilon. The original statement is that for any, maybe say, capital N, there exists a pair little m, little n, with little n greater than or equal to capital N, and little m and n satisfying that inequality, right? Maybe somehow you need to be able to make little n big and have m work alongside that little n, is that right? I mean, it's certainly not true that this will work ... Yeah, so you're focusing on that, right? That's right, yeah.
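The sandwich step can be tried out directly. This is a minimal sketch that covers only integer s ≥ 2 and t ≥ 1 (the rational case discussed here is not addressed); the helper name `sandwich` and the values s = 2, t = 10 are illustrative assumptions. In that integer case an m exists for every n, because the powers of s grow without bound, so the loop below always terminates.

```python
import math

def sandwich(s, t, n):
    """Return m with s**m <= t**n < s**(m + 1), for integers s >= 2, t >= 1, n >= 1."""
    target = t ** n
    m = 0
    power = 1  # invariant: power == s**m and power <= target
    while power * s <= target:
        power *= s
        m += 1
    return m

s, t = 2, 10
ratio = math.log(t) / math.log(s)  # log t / log s
for n in (1, 10, 100, 1000):
    m = sandwich(s, t, n)
    # Exact integer check of the sandwich inequality.
    assert s ** m <= t ** n < s ** (m + 1)
    gap = abs(m / n - ratio)
    # The gap between m/n and log t / log s is at most epsilon = 1/n.
    print(n, m, gap, gap <= 1 / n)
```

Exact integer arithmetic is used for the inequality itself, so the only floating-point step is the comparison against log t / log s.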

Yeah, I mean, if s and t are the same then ... it's certainly not true, if I take s and t arbitrary, that there always exists a pair little m and little n, right, that satisfies the red box. Well, I guess you need to worry about things like t equals one. In our case s and t are both less than one, right? We don't know ... oh, sorry, yes, in my example s was two. I think s and t are quite large. Yeah, that's right. Okay, so they're greater than one, if they're the number of possible outcomes in the distribution. So if t is 10, say, and s ... these are all quite large. Yeah, you're right to question it, and to be honest I haven't thought through what's in the red box. It was just stated as a starting point as far as I can tell, but maybe there's something else in the paper that gives more background, so maybe that's a bit of homework for next time; I can have a look. I guess you just need the ... yeah, all right. I think all you need is that you can find a pair little m, little n with n larger than a given integer, and then the argument works, and that I believe. So yeah. Yes, I think it's fun, it's clever. Obviously I have to think of doing this. Yes, indeed. Yeah, so I get lost in the four degrees of freedom, the s and the t and the m and the n. I can well believe that if there's an s and a t and an m and an n such that the first inequality holds then, you know, by multiplying by another s I make the left-hand side now bigger than the right-hand side of the s-t inequality from the left there. But anyway, let me perhaps dig into that, ready for next time. So I guess ... sorry. Sorry, no, I was just gonna say, so it seems like we're basically done now though. I think so, yeah. I mean, basically, let's call that zero, we're among friends, and n can be as big as we like, and so that's
an equal sign there. Even among enemies: the left-hand side is independent of the n or the epsilon, so since it's less than or equal to any given arbitrarily small real number, it must be zero. So yeah, right. And so, given that, if it is zero then I can write this down, I suppose. Well, in fact, if that is true, given that I've got an arbitrary function of t on the left and an arbitrary function of s on the right, then they must both

equal a constant, or at least something that doesn't depend on s or t, and those are the only variables, so I think we've got our box now. That last bit I'm curious about. So it's definitely true that A(t) over log t ... oh, I see, no, that literally says that it's a constant function because it's for arbitrary s and t. Sure, yeah. I remember first meeting that argument when separating variables in partial differential equations, for example. So this part doesn't use the self-consistency at all, right? So this is just ... well, we haven't used continuity yet; I guess you're about to do that, as you said, to get from rational to real. We have used the self-consistency here. Thank you, oh, of course, yes. Yeah, and I don't know what to say about going from rational to real, other than, in fact, I don't fully understand it, to be honest. It's like a throwaway sentence at the end. Yeah, this part I can explain. So you've assumed the existence of the function A, is that right? So if A is a continuous function, and on all rational numbers it agrees with the continuous function C log, yeah, two continuous functions which agree on a dense subset must be equal. So the rationals are dense in the reals, so you're done. Right, okay, awesome. That was so clever, I'm so glad you presented this. I hadn't seen this before, that's very nice. Yeah, and I'm looking at the time, and as I look in my notes the next board would be the physical entropy part of the story. So we have spent longer than I thought, but maybe it was worthwhile, and so maybe next time we can do physical entropy and see how far we get in the connections between the two. Yeah, sounds great. Any questions about this part for Russell? Seems not so far. What was next in the notes? Sorry, I didn't quite follow what you said. I was thinking ... yeah, what would you like to do in the five to ten minutes we sort of have left? You want us, um,
I suppose it's not obvious at all from what you've said so far that this has any relevance whatsoever to physics, right? Why, yeah, what is the secret to why that will be the case? Well, maybe the best thing to do with the last five minutes is sort of tantalize and hint in that direction. So I think to fully explain ... well, I don't know, maybe I can do more than that, because tantalizing and hinting will just be ... but I can't do it all. So let's see what I can do. So we're now on

physical entropy, right. So there's something else, something which people call entropy, which has very different roots, and this is it. I'm just going to write down a formula that says inverse temperature is the rate of change of something called physical entropy, S, with energy. And the easiest physical system to consider here is a gas, so let's say that it's at constant volume and constant number of particles, and this is the internal energy, which just means: ignore external fields, essentially, the fact that the gas might be on a planet experiencing gravity or subject to some external, I don't know, electric field, say, which starts interfering with the charged particles in the molecules of the gas. All that sort of stuff, ignore, and just consider the energy in this gas that it would have in a complete vacuum with no other physical influences coming from the outside. All right, so there's a thermodynamic story to do with Carnot engines and Clausius, and it was Clausius really who established the existence of something called entropy. He even chose the name. And this was before the existence of atoms was discovered, right; this was based on very general arguments to do with trying to make steam engines better, essentially. And so it's really not a priori obvious at all that S is the same thing as our H, but the clue is in the subscript, or the lack of subscript, right. So everything on the left over here is to do with macroscopic measurable properties of the gas in your piston, without knowledge of any microstructure or microstates. It's when you look at this picture on the left in terms of the statistical properties of the microscopic description of the system that the two ideas are reconciled. So that's the
journey that we're on. Yeah, I find it astonishing that physicists can find some coherent theory in the state of such confusion and lack of foundations. Thermodynamics is an amazing example of that. I would have just given up if I were in their position. It's just so difficult to see through the mass of stuff they had in front of them and come up with a concept like entropy. It's really amazing. I just looked up the etymology of the word. I don't know if it was used colloquially before Clausius, maybe not. I