Entropy is an important concept in both physics and information theory. In this video, we explore the link between entropy and the ideal gas system, and how Shannon's formula can be used to calculate the average information content of a discrete probability distribution. We will also discuss the Kullback-Leibler divergence, the Boltzmann constant, and how to calculate temperature and entropy. Join us for a thorough exploration of entropy and its implications.

The speaker discusses entropy from a physical viewpoint, linking Shannon's formula and information theory to Clausius and Carnot's discovery of entropy. This involves discussing microstates and how to calculate the probability of two gases being in equilibrium. Shannon's formula describes the average information content of a discrete probability distribution and gives an interpretation of the Kullback-Leibler divergence. This leads to an interpretation of entropy in physics, via the rate of change of entropy with respect to the internal energy of an ideal gas system, which defines inverse temperature. This is linked to the Boltzmann constant, which is used to define temperature and entropy.

The speaker is continuing their seminar on entropy, focusing on the physical viewpoint. They recap the information theory part from the first session and explain that the information content of observing the i-th outcome of a system is well described by the logarithm of one over that outcome's probability. The logarithm makes information additive: the information conveyed by a compound event with probability p_i q_j is the sum of the information of the two parts. This proposal makes intuitive sense in two ways: when the outcome is the usual one, less information is learned, and when the outcome is surprising, more information is learned.
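These two properties can be sketched in a few lines of Python; the probabilities below are made-up values chosen purely for illustration:

```python
import math

def information_content(p, base=2):
    """Self-information h = log(1/p); base 2 gives bits."""
    return math.log(1.0 / p, base)

# A usual, expected outcome conveys little information;
# a surprising one conveys a lot.
print(information_content(0.9))   # low: the outcome was expected
print(information_content(0.01))  # high: the outcome was surprising

# Additivity: an independent compound event with probability p*q
# conveys h(p) + h(q), because the log turns products into sums.
p, q = 0.25, 0.5
assert math.isclose(information_content(p * q),
                    information_content(p) + information_content(q))
```

The choice of base only sets the units (bits for base 2, nats for natural logarithms), as discussed later in the session.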

Shannon's result was then discussed: the average information content of a discrete probability distribution, and how this formula is uniquely constrained by Shannon's axioms. The formula also gives an interpretation of the Kullback-Leibler divergence as an expected difference in information content. Shannon's result was then used to talk about entropy in physics, in terms of the rate of change of entropy with respect to the internal energy of an ideal gas system. To illustrate this further, a diagram of the Carnot cycle was shown.
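Shannon's formula and the KL divergence can be sketched as follows; the two example distributions are invented for illustration:

```python
import math

def shannon_entropy(p, base=2):
    """Average information content H(p) = -sum p_i log p_i."""
    return -sum(pi * math.log(pi, base) for pi in p if pi > 0)

def kl_divergence(p, q, base=2):
    """Expected difference in information content between model q
    and truth p: D(p || q) = sum p_i log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q) if pi > 0)

uniform = [0.25] * 4
skewed = [0.7, 0.1, 0.1, 0.1]

print(shannon_entropy(uniform))        # 2.0 bits: maximal for four outcomes
print(shannon_entropy(skewed))         # less than 2.0 bits
print(kl_divergence(skewed, uniform))  # nonnegative; zero iff p equals q
```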

Clausius and Carnot were trying to understand how steam engines worked and make them better. They discovered a quantity that tracks the state of the gas in the piston as it moves through the Carnot cycle. Clausius named this mysterious property entropy. Shannon's formula, probability distributions and counting were very far from this discovery. The link between knowledge and macrostates was then established: macrostates are what is knowable, while microstates are not knowable, and therefore not part of knowledge.

The transcript discusses the concept of microstates in an ideal gas. A microstate is defined as the position and momentum of each particle in a gas at a given time. Omega is the number of microstates compatible with a given internal energy, volume and number of particles. The transcript then considers a mixture of two gases and their respective microstates.

Boltzmann's and Clausius's definitions of entropy can be connected by considering the probability of a certain split of energy in terms of microstates. Maximising the product of Omega of E1 and Omega of E2 over splits of the total energy gives the most probable division of energy, i.e. the equilibrium of the two gases. This leads to a quantity, beta, which is equal to 1 over the Boltzmann constant times temperature. By considering the microstates, this quantity can be used to define temperature and make contact with Clausius's definition of entropy.
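The maximisation described above can be written out. With the total energy $E = E_1 + E_2$ fixed, one maximises the number of joint microstates $\Omega(E_1)\,\Omega(E - E_1)$ over $E_1$:

```latex
\frac{d}{dE_1}\bigl[\Omega(E_1)\,\Omega(E - E_1)\bigr] = 0
\quad\Longrightarrow\quad
\frac{d}{dE_1}\ln\Omega(E_1) \;=\; \frac{d}{dE_2}\ln\Omega(E_2) \;\equiv\; \beta .
```

Choosing $\beta = 1/(k_B T)$ and following Boltzmann's definition $S = k_B \ln \Omega$ then gives $1/T = \partial S / \partial E$, which is Clausius's relation.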

The transcript discusses the concept of microstates and how they can be counted. It explains that microstates are defined by what we don't know, and that by measuring the positions and momenta of particles in phase space to within some resolution, the number of microstates can be estimated. This is done by dividing the factorial of the total number of particles by the product of the factorials of the occupation numbers of each phase-space region, and applying Stirling's approximation. This gives an expression for the number of microstates, which is then plugged into Boltzmann's equation to calculate the entropy.
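The count described above is the multinomial coefficient Omega = N! / (n_1! ... n_k!). A small sketch comparing the exact count against the Stirling estimate, using toy occupation numbers invented for illustration:

```python
import math

def log_omega_exact(ns):
    """ln Omega for Omega = N! / (n_1! ... n_k!), via log-gamma."""
    N = sum(ns)
    return math.lgamma(N + 1) - sum(math.lgamma(n + 1) for n in ns)

def log_omega_stirling(ns):
    """Same quantity via ln n! ~ n ln n - n; the linear terms cancel,
    leaving -N * sum p_i ln p_i with p_i = n_i / N."""
    N = sum(ns)
    return -N * sum((n / N) * math.log(n / N) for n in ns if n > 0)

# Occupation numbers of k = 4 phase-space cells (illustrative values).
ns = [4000, 3000, 2000, 1000]
print(log_omega_exact(ns), log_omega_stirling(ns))  # close for large N
```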

The transcript discusses how the resulting expression relates to Shannon's formula for information entropy in bits. It begins by comparing a toy value of a million particles to realistic values such as Avogadro's number, to justify Stirling's approximation, and then works through the numerator and denominator of the formula. It simplifies the result by cancelling terms and extracting a factor of N, identifies n_i over N as the probability of finding a particle in the i-th group, and shows that the result is Shannon's formula up to a constant. Finally, it explains how this connects a definition of information from information theory to something in physics based on macro-level information about the system.
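The cancellation described above, written out with natural logarithms and $p_i = n_i/N$:

```latex
\frac{S}{k_B} = \ln\frac{N!}{n_1!\cdots n_k!}
\;\approx\; \bigl(N\ln N - N\bigr) - \sum_{i=1}^{k}\bigl(n_i\ln n_i - n_i\bigr)
= N\ln N - \sum_i n_i \ln n_i
= -N\sum_i \frac{n_i}{N}\,\ln\frac{n_i}{N}
= -N\sum_i p_i \ln p_i ,
```

which is Shannon's formula up to the constant factor $k_B N$ and the base of the logarithm.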

The transcript discusses the application of statistics and probabilistic thinking to thermodynamics. It questions why energy is the macrostate variable that is relevant in thermodynamics, and whether other choices of macrostate would result in the same theory. The underlying philosophical foundations of thermodynamics are not completely resolved, but there are not arbitrarily many extensive macroscopic quantities that can be chosen. As Russell mentioned earlier, "the energy" is undefined in some sense, and Callen's book on thermodynamics addresses this to some extent.

Thermodynamics can be founded in two ways: either on statistical mechanics, or on a set of high-level axioms. There is debate about which founding is correct, and it is not clear there is a satisfactory resolution. The confusion could perhaps be resolved if it were discovered that a few very compressed variables describing the system suffice to determine almost its entire future evolution: knowing the state of those variables would predict a great deal about the system.

okay so um welcome everybody to part two of um this seminar on entropy the concept of entropy let's see if I can write the word um so I did post a link to the notes that I put together when preparing for this on the SLT channel in Discord um so you can feel free to grab those and uh kind of follow along look ahead um catch my mistakes or my hesitations or whatever as I try to recall them because it was a while ago now um the overall um thrust of this um endeavor right now is to get a conceptual feel you know build kind of intuitive muscle for the concept of entropy um and come at it in two ways one is the information theoretic uh view and the other is the physical view the physics view of entropy um in the first session we basically got through the information theory part um and so this uh remaining um 54 minutes is going to be focusing on the physical uh viewpoint but really trying to make the point that they aren't um so different after all so just to briefly recap what we said uh last time um I said if you have a discrete probability distribution such that the probability of the i-th um outcome is p i then the information um content of observing that your system has resulted in the i-th outcome is well described by the logarithm of one over that probability and the one over p part um kind of says that you learn more when you're surprised sorry when the outcome is what usually happens then you haven't learned as much as um when uh you're surprised um you know it's like last time we had a game of Submarine which is a little bit like Battleships and if you happen to guess straight away where um the opponent's um battleship or submarine is then you've learned a whole lot you've determined uh where it is um and there's only one of them there are many more empty squares on the board and so if you you know if you miss then you haven't learned
quite so much you've just eliminated that one square and then the log achieves additivity right the idea that um the information conveyed by um a compound event with probability p i q j is the sum of h i and um h j if you like or the information content it's the sum of the logs of the inverses of those two probabilities and um I mean so far that's just a kind of proposal for the definition of the concept of information um and it makes intuitive sense in those two ways
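In symbols, the proposal being recapped here is:

```latex
h_i = \log\frac{1}{p_i}, \qquad
h(p_i q_j) = \log\frac{1}{p_i q_j}
           = \log\frac{1}{p_i} + \log\frac{1}{q_j}
           = h_i + h_j .
```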

um but it um it lets us write down the average or expected information content of this discrete probability distribution which of course is this when you substitute for h i and we then looked at Shannon's axioms and Shannon's derivation of that formula which um kind of puts it on a more rigorous footing okay so um so you know let me draw a box around Shannon's result that's where we got to last time and that's really um you know what information theory has to say about um about uh entropy right um because entropy as the average information content of a distribution and also as um the uh expression this formula that is uniquely constrained um by Shannon's axioms um it also you know uh gives you a nice interpretation of the Kullback-Leibler divergence right as the expected difference in information right h i for the truth minus the h i for the model um the expectation of that over the truth the true distribution is the KL divergence okay so that's um the quick recap of last time any questions or thoughts before we move on okay that's a no so um this is information theory why did I write it so erratically I guess and now we start on a physics journey but it's a round trip right um we've walked kind of this path we're now going to go back to the beginning and walk a different path but pretty soon it's going to converge something like that so um there's an idea of entropy um in physics as S where the rate of change of this thing with respect to the internal energy of the system gives inverse temperature and let's just talk about ideal gases um because life is easiest in that context it doesn't mean this doesn't apply to other systems but it'll just be easiest to work in that place you know so if I hold the volume and the number of particles in the gas constant and consider only the internal energy so not uh the um gravitational potential energy that the whole thing has because it's in a jar
on a shelf or something but just the um you know no no external Fields um having any effect just an ideal gas in in a container in space somewhere very far from everything so just that energy you know temperature well inverse temperature um is actually is defined if you like by the rate of change of something s with respect to internal energy now to give you an idea of how um this arises I'd like to show you a diagram of the Carnot cycle and I'm hoping that Dan can take that image and paste it on the next board
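The defining relation being written on the board here, with the volume $V$ and particle number $N$ held fixed, is:

```latex
\frac{1}{T} \;=\; \left(\frac{\partial S}{\partial E}\right)_{V,N} .
```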

I will try I may get the wrong image so this might take two attempts but we'll see okay I think that's the right one you should be able to see it shortly fantastic I can see it okay so given the time and everything let's not dwell here um but the story is I mean to be honest I've never felt um really comfortable here like I know what engines do I know how an internal combustion engine works I know how a steam engine works and Carnot and others were trying to understand how steam engines worked and make them better and work out how well they could theoretically work and devised um well the concepts of thermodynamic reversibility and the concept of this cycle as the most efficient um way to turn heat into work or vice versa um what uh well Clausius I think I guess and so I'm just going to go back to the first board and use the space there so Carnot um and others well ultimately leading to Clausius found that there is some quantity uh they found that the change um in the internal energy divided by the temperature at a given temperature so you know we're not putting so much energy in that the temperature changes appreciably there's an infinitesimal change of energy he discovered that you know the uh the integral of that over the cycle is zero in other words there's some variable that uh tracks the um point at which the um the gas in the piston undergoing the Carnot cycle is at as it moves through the cycle this was before the knowledge of atoms uh this was just sort of magic as far as I can understand in terms of a kind of scientific discovery uh maybe if I knew more about the history of this I'd have some idea of how they were able to figure this out um but you know Clausius named this this S he chose the name entropy um and um you know as a kind of mysterious property um that uh well that is defined um by this okay so we're very far like we are
now sort of here on the journey if you like right we're very far from um Shannon's formula um we're also very far from uh probability distributions and sort of counting anything um but in two steps though two um boards probably we're going to get back um to uh to that world so let's uh just move over to this board okay and um talk about micro and macro states so you know conceptually and this already we see a link with knowledge because essentially macro states are what you know or what is knowable

what you've measured and microstates the microstate of the system is all of the details you don't know all right so there's a link with knowledge with information theory kind of already at least at a loose level um that's not a cow in in meta uni is it because there aren't any that's a real one you can hear that you can hear the cow that means my audio settings are incorrect so that's good okay yeah can you still hear me I can still hear you yes [Music] okay now I can mute successfully and you won't have to listen to the cow but I'm glad you pointed that out because the audio was coming through the wrong output I think I might miss the cow but um it's being fed so it'll shut up so yeah so uh you know at some level this is like known and this is unknown um but to make it as concrete as possible because I remember being a bit mystified by what the concept of a microstate was um when I was younger um to make it as concrete as possible in our ideal gas um there are n point particles each one has a position in 3D space so three coordinates and each one has a momentum and if you can write down at one instant of time where every particle is and what its momentum is so in what direction it's moving and how fast then that is one microstate of the gas but already you can see that it's a kind of arbitrary distinction because I mean no gas is ideal um even if it's just hydrogen atoms there's a proton and an electron there so there's some other stuff other sort of electron energy level states and you know all sorts of um you know QCD type um gluon stuff happening inside the nucleus maybe and all that is there as well but we're ignoring that um but if we could know that too I mean we could choose to include that in the microstate as well um so let's just have in our minds the positions and momenta of all the particles of the gas and define Omega as a function of the internal energy the volume of the gas and the number of particles as the number of microstates compatible with in other words sharing the same internal energy volume and number of particles right so there are many many possible microstates in general but not all of them have the same energy and volume for a given number of particles the number that do is Big Omega and I'm just going to drop um I'm going to drop the V and the N and just focus on the E for now and I'd like to consider a mixture of two gases

with total energy um Big E and right so E1 plus E2 equals E that's the total energy of the two gases together and if I happen to know the macrostate of one of the gases being E1 the energy of the first gas is E1 um and the energy of the second gas is E2 then I can um write down this product so this is the total number of microstates compatible with um this division of energy between the two gases E1 and E2 now I want to say that actually it's the same type of gas like it doesn't need to be a fundamentally different type of gas so there's no Omega 1 and Omega 2 there it's just Omega it's the same distribution of microstates over energy and I can write it of course like this sorry E1 given that they sum to E and if I multiply this by some constant C then I have the probability of um finding the two gases or the mixture with this split of energy and in equilibrium um I would expect this probability to be maximized right so I'm going to differentiate and set it equal to zero so that gives me my constant out front um and then Omega of E2 times d Omega of E1 by dE1 plus um Omega of E1 times d Omega of E minus E1 by dE1 and that second derivative is equal to um well minus d Omega by dE2 given that E2 is E minus E1 right and so if I divide both sides by the product of Omega of E1 and Omega of E2 um then I get d by dE1 log Omega of E1 is equal to d by dE2 log Omega of E2 in equilibrium all right so there's some quantity beta which is d by d energy of log number of microstates with that energy which is the same for um systems in equilibrium and you know if we choose beta equal to one over the Boltzmann constant times temperature and follow Boltzmann and define S as k log Omega then on the left hand side I have 1 over kT and on the right hand side I have um 1 over k times dS by dE so by considering microstates um the microstates compatible with you know by considering the probability uh and the most likely um uh what have we done we wrote down the probability um of a certain split of energy in terms of microstates and the most probable uh division led us to this um yeah this quantity beta which led us to temperature and made contact with um with Clausius's definition of entropy so this one is Boltzmann it's on his tombstone well with a W where I have Omega and this one is Clausius okay quick pause for questions and or thoughts before uh going like one level deeper

into microstates and starting to count them okay let's go sorry where did the log come from just now the log came from so if I divide by Omega of E1 Omega of E2 Omega of E1 Omega of E2 then I yeah right thanks sorry yeah no problem not at all yeah and then it's one over then it's uh yep there it is okay so continuing the theme of microstates are kind of defined by what we don't know let's imagine that we tried to find out right so if we do start to measure the positions of the gas particles if we do start to measure the momenta of them in other words if we start to measure the locations of the particles in phase space then inevitably we have some kind of resolution and our um number of um microstates looks something like this so consider that the total number of um particles is Big N uh which is the sum from i equals 1 to k of n i where n i is the number of particles uh in the i-th position or with the i-th momentum or in the i-th phase space region or within the error of the i-th measurement because it always has some kind of error if you're acquiring knowledge through that mechanism right so if there are Big N total particles then I can arrange them in N factorial ways but if I'm only able to determine which of the k groups they're in if you like then within the first group I can make all of those rearrangements and reduce the total number of microstates by that factor same for the second group and so on until we get to the k-th group okay so that's quite a general um you know expression for microstates uh it's not specific to an ideal gas or even a gas okay so let's plug that into S over k B equals log Omega right and so thanks to um Stirling's approximation which is uh the log of a factorial is n log n minus n plus something that goes as log n right which is a really good approximation I mean this is a natural logarithm but if we just pretend it's a base 10 logarithm for a moment then for the log of a million factorial um N is a million so log N is six so the first term N log N is six million minus a million is five million um and the next term is like six relative to five million you know so that's not a bad um number of digits to get right particularly when in typical systems like Avogadro's number is 10 to the 23.
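The back-of-envelope estimate above can be checked directly in Python, using natural logarithms and computing ln(N!) via `math.lgamma`:

```python
import math

N = 1_000_000
exact = math.lgamma(N + 1)        # ln(N!) to machine precision
stirling = N * math.log(N) - N    # leading Stirling terms: N ln N - N

# The neglected next term grows only like log N.
correction = 0.5 * math.log(2 * math.pi * N)

print(exact)       # about 12.8 million
print(stirling)    # agrees with exact to about one part in a million
print(correction)  # about 7.8, tiny next to the first two terms
```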

right so a million is small fry compared to realistic values here anyway so we plug in and the numerator gives us Big N log Big N minus N uh and then the first term in the denominator gives us minus n 1 log n 1 and then plus n 1 and similarly for all the others so I'm just going to write the sum separately because of a cancellation that's going to come up right so um the sum of the n i equals N by definition and so that cancels with that which is nice and then let's take a factor of N out front and we get um log hmm if you don't mind I'm going to swap the order of terms so we'll do minus this sum of n i over N log n i right and n i over N is the at least frequentist probability of um finding of what um it's the probability of finding a particle in the i-th group and we'll call that p i so you can kind of see where we're going maybe we're going to get back to Shannon's formula pretty soon then I've got a plus um log N um but I can throw a um sum of p i in there just as a factor like that because it's equal to one and write this as minus N times the sum um of p i multiplied by log n i minus log N which is log of p i and so we have that S is minus k B N times the sum of p i log p i given that p i is n i over N compared with H of p which is minus the same sum of p i log p i well we were measuring information entropy in bits um so a base 2 logarithm as opposed to nats which we've had to use uh because of Stirling's approximation I mean it's only a constant um there's a different I guess Stirling approximation with a base 2 logarithm but there's only a constant difference right so the constants only determine units um so they don't really matter right so I drew this picture and I said we're going to head off in a different direction but pretty soon we're going to converge um and I think you know here was one over T uh equals dS by dE and we're getting closer over here with S uh equals k log W and beta equals one over kT but now I think we're pretty much here right we've converged with the information theoretic route that's the physics route because we have the same formula apart from units so how do you feel about that um it's up to you what have we got um so a thing about that that has confused me ever since I heard it for the first time is that this connects a certain definition of information from information theory to something in physics that's based on well if we have certain macro information about the system but don't know certain facts about its microstate

then we get the same thing which makes sense to me both of these are in the end applications of um statistics and probabilistic thinking if you will but the Carnot cycle thing and then the laws of thermodynamics that you get out of this they talk about the actual evolution of physical systems which I mean for this thing here we have a prior on what we happen to know about the system on the macro level we know for example its total energy and pretty much nothing else but how do you pull out that this is the kind of prior that is somehow relevant if you want a theory of thermodynamics that predicts the world why not some other prior like I don't always know the energy state of a system that I apply the laws of thermodynamics to yet it continues to obey them anyway yeah I um I mean you're using the word prior and you know we as singular learning theory people and Bayesian inference people we know what that means in that context um but that's maybe a red herring I mean I think your question stands I think your question probes like if you choose different macro variables do you get a different sort of view of entropy or version of entropy do I understand you partially I would say it seems to me a given that you would get um a somewhat different entropy I mean entropy is dependent on your state of knowledge of the system what is interesting to me is how apparently some specific choices for the information that real observers have namely macrostates some priors correspond to giving you a theory of thermodynamics that seems to be awfully good at predicting things like when stuff turns into gases and I would assume that some other entropy derived from some other macrostate that I arbitrarily make up would not do this and this confuses me somewhat yeah this is a foundational question in thermodynamics that I wouldn't say has a satisfactory resolution so Callen's book on thermodynamics which is the reference we're partly going off here addresses this to some extent but you can read whole volumes on the sort of underlying philosophical foundations of thermodynamics and not come away feeling this is completely resolved I can make two remarks the first is that there aren't arbitrarily many extensive macroscopic quantities that you can choose of course the energy is undefined in some sense because as Russell was saying earlier you can choose to pay attention to rest

mass or to some other form of energy right there is no such thing as the energy okay but maybe you hope to account for all the forms of change in energy that are relevant to the physical processes under consideration if you don't do that your theory won't work if you do it will and so it's kind of a statement partly about there being physical processes that we care about in which only certain kinds of changes in energy like chemical reactions or changes in volume are present and as to the kinds of extensive quantities that we consider Callen addresses that in one of the appendices he talks about symmetry principles and it's a little bit controversial I would say but you can argue that many of the extensive quantities that we consider in the setup somehow have an origin in symmetry principles okay so maybe they're somewhat canonical uh and then you can take two foundational attitudes you can say thermodynamics is founded in statistical mechanics and that you start from like the Boltzmann distribution and work up or you can say you start with kind of high level axioms about observable stuff which is how Callen would start and how many physicists would think about it so you set up a system of axioms that say uh there are these things called extensive quantities and there's something called the entropy which has such and such a property and then from those high level principles you derive these formulas like the counting of microstates here for an ideal gas but who's to say which kind of founding is correct for thermodynamics they interact and you know which one is somehow the correct view depends partly on what you like so yeah I think what I'm saying is that I'm not sure that this confusion if that's the right word for it that you're expressing really has a satisfactory resolution at least that I understand I think it might- I'm sorry I suspect it has one but oh
sorry I tried to say you go ahead okay yeah I suspect it has one but it would not surprise me if it has yet to be discovered or at least written up in an understandable format uh my guess from this would be that like from the actual physics side sort of there are some um very compressed like just a few scalar variables about the system that if the system is big enough determine almost its entire future evolution and so if you know the state of these variables you can predict a whole lot about the system