In this video, we explore the normalized loss K_n and prove the main formula using real analytic manifolds and functions. We use a converging power series to define a real analytic function, and a triangle inequality to show that the modulus of a contour integral is bounded. We then use a dominated convergence argument to prove the theorem, leading to the conclusion that the expectation of a linear combination of features can be expressed as a double sum.

Given a truth q(x), a model p(x|w), data drawn from the truth, and a realizability assumption, the main goal is to study the normalized loss K_n. Main Theorem 1 is an extension of existing results: K_n(w) converges almost surely to its expectation K(w), and its fluctuation is normally distributed when K(w) is greater than epsilon. Glivenko–Cantelli-type theorems from empirical process theory show that these results hold uniformly in w. The condition K(w) > epsilon is needed because the variance of the limiting distribution vanishes on the set W_0.
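As a concrete illustration (my own toy example, not from the talk): take the truth q = N(0,1) and the model p(x|w) = N(w,1), so that f(x,w) = log q(x) - log p(x|w) = -w·x + w²/2 and K(w) = E_q[f] = w²/2. A minimal sketch of the law-of-large-numbers part of the statement:

```python
import numpy as np

# Toy realizable setup: truth q = N(0,1), model p(x|w) = N(w,1).
# Then f(x,w) = log q(x) - log p(x|w) = -w*x + w^2/2 and K(w) = w^2/2.
rng = np.random.default_rng(0)
w = 1.5
K_true = w**2 / 2

for n in [100, 10_000, 1_000_000]:
    x = rng.standard_normal(n)           # i.i.d. draws from the truth q
    K_n = np.mean(-w * x + w**2 / 2)     # empirical normalized loss K_n(w)
    print(n, K_n)
# K_n -> K(w) = w^2/2 = 1.125 almost surely as n grows (law of large numbers)
```

The CLT part of the theorem says that, for fixed w with K(w) > epsilon, sqrt(n)·(K_n(w) - K(w)) is asymptotically normal; here its variance is w²·Var(x), which indeed vanishes as w approaches the zero set W_0 = {0}.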

Hironaka proved resolution of singularities, and Atiyah in 1970 wrote a paper which Watanabe reads as giving the result that if f is analytic, then f(g(u)) = a(u)·u^k, where a(u) has an analytic reciprocal 1/a(u). This closely parallels the result the speaker is trying to prove: the psi random variable can be re-parametrized, by introducing a new manifold and a parametrization function, so that it converges to a Gaussian process everywhere. The speaker wants to get a handle on the learning behavior near W_0, which is inside the excluded set W_ε.

A theorem states that a real analytic function defined in an open neighborhood admits a resolution of singularities. The paper claims that a resolution of a real analytic function has a complexification, but does not explicitly prove this. The speaker then proposes to prove Main Formula 1 carefully, and will start by stating the assumptions. They also note the role of the Bernstein–Sato b-function in Watanabe's original route, where the convergence property of psi in the resolved space is used implicitly.

A d-dimensional real analytic manifold U is used to apply resolution to a non-negative real analytic function K in a neighborhood of its zero set W_0. The resolution gives a set of triples (W_i, U_i, g_i) whose W_i cover the zero set of K in W. Since W is compact, a finite subcover of these triples is taken, and the gluing lemma for analytic functions is invoked to resolve the function K in the whole neighborhood.
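In each local chart, the statement being glued has the normal crossing form quoted from the resolution theorem (multi-index notation; the multi-indices k and h depend on the chart):

```latex
K\bigl(g(u)\bigr) = u^{2k} = u_1^{2k_1}\cdots u_d^{2k_d},
\qquad
\lvert g'(u)\rvert = b(u)\,u^{h} = b(u)\,u_1^{h_1}\cdots u_d^{h_d},
\qquad b(u)\ \text{analytic},\ b(u)\neq 0 .
```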

We have a real analytic manifold M and a function f which is L^s(q)-valued and analytic on W_ε, a subset of M containing W_0, the zero set of K. We assume that infinity is not an analytic singularity of f; otherwise we must choose a large enough compact set in M and close it. We then interpret the resolution triples as charts g and obtain K(g(u)) = u^{2k} in each chart. Finally, q is the measure defined by the density of the true distribution.

A converging power series is used to define a real analytic function. The main lemma states that, under the re-parameterization w = g(u), the normalizing function in psi remains analytic after division: a(x,u) is defined as the ratio of the reparameterized f(x,g(u)) and u^k, where K∘g, the pullback of K by g, equals u^{2k}. This ratio is not a well-defined function at u = 0, but the division can be carried out first and the result evaluated as an analytic function.

A real analytic function f can be expressed as a power series in w; composed with the real analytic map g, it is again a power series in u. The series can be factored into two terms: the part with α ≥ k (componentwise) gives a(x,u)·u^k with a analytic, and the rest gives b(x,u)·u^k, where b contains negative powers u^{α−k}. The goal is to show that b(x,u) is identically zero, which is done by observing that u^{2k} = K(g(u)) equals the integral of (e^{−f(x,g(u))} + f(x,g(u)) − 1) q(x) dx: since e^{−f} equals p(x|g(u)) over q(x), the q(x) cancels, and both the p and q integrals equal one.
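The identity being used, K(w) = ∫ (e^{−f} + f − 1) q(x) dx, holds because e^{−f} q = p integrates to one and q integrates to one. A quick numerical sanity check in a toy Gaussian model (my own example; q = N(0,1), p(x|w) = N(w,1), so f(x,w) = −w·x + w²/2):

```python
import numpy as np

# Toy model: q = N(0,1), p(x|w) = N(w,1), so f(x,w) = log(q/p) = -w*x + w^2/2.
# Check that K(w) = E_q[f] equals the integral of (e^{-f} + f - 1) q dx.
w = 0.7
x = np.linspace(-10.0, 10.0, 200_001)
dx = x[1] - x[0]
q = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # true density
f = -w * x + w**2 / 2                        # log likelihood ratio

K = np.sum(f * q) * dx                           # direct definition: E_q[f]
K_alt = np.sum((np.exp(-f) + f - 1) * q) * dx    # e^{-f} q = p integrates to 1

print(K, K_alt)  # both approximate K(w) = w**2 / 2 = 0.245
```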

The identity ∫ (e^{−f} + f − 1) q dx = ∫ (f²/2) e^{−tf} q dx follows from a Taylor expansion with Lagrange remainder, where the second derivative is evaluated at a point between 0 and the argument. Dividing by u^{2k} and shrinking the integration region to the set where the supremum over u of |f(x,g(u))| is at most L yields a bound which forces b(x,u) to vanish.
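Written out, the second-order Taylor expansion of e^{−f} about 0 with Lagrange remainder gives, for some t in (0,1) depending on f:

```latex
e^{-f} = 1 - f + \frac{f^{2}}{2}\,e^{-tf}
\qquad\Longrightarrow\qquad
e^{-f} + f - 1 = \frac{f^{2}}{2}\,e^{-tf},
\qquad\text{so}\qquad
u^{2k} = \int \frac{f(x,g(u))^{2}}{2}\, e^{-t\, f(x,g(u))}\, q(x)\, dx .
```

Dividing both sides by u^{2k} and writing f(x,g(u))/u^k = a(x,u) + b(x,u) gives the expression the contradiction argument starts from.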

A contradiction is found when integrating (a(x,u) + b(x,u))² q(x) dx. Since b(x,u) is the sum over α < k of a_α(x) u^{α−k}, if the a_α(x) in this series are not all zero, then as u goes to zero, b(x,u)² goes to infinity and cannot be bounded by 2e^L. Therefore b(x,u) must be identically zero.

The speaker is discussing a potential issue with gluing multiple patches together in order to construct a single analytic function. It is not obvious that this can be done, as there are multiple resolutions on different open sets which must be reconciled. It is suggested that this could be done sequentially, with each patch being resolved one at a time, but the speaker is not sure if this is the right approach. It is noted that the theorem does not actually require a global resolution, as coordinates can be chosen locally.

A professor is discussing a theorem which states that the fluctuation of the learning process in a local chart is equal to an expression involving an empirical process defined on the resolved space. The expression involves a factor u^k, defined via an integral involving f, which can be factored out. This is proven by cancelling u^k on both sides, as it does not depend on the integration variable.

The expectation over the entire data set x_1 up to x_n equals the expectation of a(x,u)·a(x,v) multiplied by u^k v^k, known as the correlation function in the stochastic process literature. The expectation is linear and can be pushed into the sum, reducing it to the expectation over x_i of a(x_i, u), which is independent of the summation index. The u^k cancels, and the expression equals zero.

A theorem is presented that states the expectation of a linear combination of features can be expressed as a double sum. Using the independence assumption, the cross terms cancel and the non-cross terms are each equal to one over n, leading to an expression for the variance of the Gaussian process at the pullback of W_0.

The gray book does not mention the need for complexification of a real analytic function; however, Theorem 6.3 states that at the pullback of the zero set of K, the variance of the empirical process is equal to two. This is achieved by normalizing by the factor u^k, which allows the empirical process to be defined on the resolved space.

A real analytic function on R^d can be complexified so that the resolution extends to C^d. This is done by finding a resolution map and using the Cauchy integral formula, whose multivariate version expresses the coefficient a_α(x) as 1/(2π√(−1))^d times a contour integral. This can be used to prove that the expectation over x of a(x,u)² is equal to two.
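The multivariate Cauchy integral formula referred to here, together with the resulting coefficient bound (Cauchy's estimate), reads, for circular contours of radii r_1, ..., r_d around w:

```latex
a_\alpha(x) \;=\; \frac{1}{(2\pi\sqrt{-1})^{d}}
\oint\!\cdots\!\oint \frac{f(x,z)}{(z-w)^{\alpha+1}}\; dz_1\cdots dz_d,
\qquad
\lvert a_\alpha(x)\rvert \;\le\; \frac{M(x)}{r^{\alpha}},
```

where M(x) bounds |f(x,z)| on the polydisc and r^α = r_1^{α_1}···r_d^{α_d}.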

A formula is written for the expansion near W_0, integrated against dz_1 up to dz_d. Using the triangle inequality, this is bounded by 1/(2π)^d times the integral of |f(x,z)|/|z − w|^{α+1}, where the contours are circles of radii r_1 up to r_d. The inequality is used to take the modulus inside the contour integral, and an expression M(x) is defined as a bound on |f(x,z)| over a larger domain where f(x,·) is defined. The contour integral is then bounded above by M(x)/r^α.

In order to prove that the expectation over x of a(x,u)² is constant, the triangle inequality is used to show that the modulus of each term is bounded, and a geometric series in each component is summed to something finite. The same trick as in the proof of the main lemma is used to arrive at an expression e^{−t·a(x,u)·u^k}, which is substituted for f(x,g(u)), with t between zero and one, and this factor is bounded by a constant.
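The geometric series step: once the Cauchy estimate |a_α(x)| ≤ M(x)/r^α holds, for |u_i| < r_i the power series is dominated componentwise by a product of geometric series,

```latex
\sum_{\alpha}\lvert a_\alpha(x)\rvert\,\lvert u^{\alpha}\rvert
\;\le\; M(x)\sum_{\alpha}\prod_{i=1}^{d}\Bigl(\frac{\lvert u_i\rvert}{r_i}\Bigr)^{\alpha_i}
\;=\; M(x)\prod_{i=1}^{d}\frac{1}{1-\lvert u_i\rvert/r_i}
\;<\;\infty .
```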

In order to prove the theorem, a dominated convergence argument is applied to the integrand, which is bounded by c·M(x)²/r^{2k} multiplied by the maximum of either one or an expression involving q and p. The final assumption is that this bounding expression is integrable. If so, the limit of the integral as u^k approaches zero can be taken inside, and the factor of two can be moved to the other side of the equation, producing the desired result.

Okay, right. I'm just going to go through what I prepared, and hopefully that won't take too much time. I did cut out a few corollaries and so on that require the theorems about empirical processes, and an understanding of empirical processes; for now we're just focusing on the main parts.

So let's remind ourselves what those things are. What we have here is: a truth q(x) and a model p(x|w) with w in some parameter space W, and i.i.d. data D_n = x_1, ..., x_n, where each one of them is drawn from the truth. For this talk we are assuming realizability, that is, q(x) = p(x|w) for some w in W_0. The main thing we want to do is to study K_n, which is equal to that, and as we discussed before we should think of it as the normalized loss. We should be able to push the whole proof through by replacing that part with L_n(w) if we are talking about the non-realizable, essentially unique case, but I haven't done that, so which parts of the corollaries fail to go through remains to be seen.

Throughout we'll denote by f(x,w) the log likelihood ratio, log of q over p, and therefore K_n(w) is just the average over the data of f. Let me state Main Theorem 1 as a kind of extension of existing results. The law of large numbers tells us that for any w, K_n converges almost surely to its expectation E_x[f(x,w)], which is equal to the familiar K(w) function. And the central limit theorem says that this pointwise convergence is normally distributed: for w such that K(w) is greater than epsilon (we need this condition), if we normalize it properly, as with any central limit theorem result, that converges in distribution to a normal with mean zero and some variance. The reason we require this extra condition is that the expression for that variance at the set W_0 is equal to zero, and then it's not a normal distribution; it's undefined there. Now, theorems from empirical processes, of Glivenko–Cantelli type, say that both results hold uniformly. Well, for the first one it holds uniformly

everywhere; for the second one it holds uniformly in w outside of some small W_epsilon set, where W_epsilon is the set where K is less than or equal to epsilon. This immediately implies that if we normalize this expression by this factor (and if someone wants to work with me to figure out the exact expression of this standard deviation, I'm guessing that this is dividing out the standard deviation near W_0), you will see that when we prove Main Formula 1 and look at W_0, but in the pullback instead, in U_0, this denominator makes the standard deviation exactly a constant instead of something depending on w near W_0. For now we have that this random variable converges in distribution to a Gaussian process on W, but outside of W_epsilon. The thing is, that's not what we want. What we want is to get a handle on the learning behavior near W_0, which is inside this W_epsilon set; we want to be near it, not bounded away from it. So let's go to the next board. The goal is to show that we can re-parametrize by introducing a new manifold and a parametrization function, w = g(u), such that under this re-parametrization the psi random variable converges to a Gaussian process everywhere, in the new, larger space. By the way, I found a little historical remark, Remark 2.5 in the gray book, and it might be indicative of how these things get discovered, I'm not sure. The history seems to be: Hironaka proved resolution, and then Atiyah in 1970 wrote a paper which I think some of you might be very familiar with, the one on division of distributions using resolution. But Watanabe seems to read that paper as giving the result that if f is analytic, then f(g(u)) = a(u)·u^k where 1/a(u) is also analytic, so a has a reciprocal. This follows very closely the result that we will be proving today. I read through this paper, but I would love to work through it

with someone; if someone knows how to read that paper and get this conclusion, I would very much like to know. I think it's not in there, at least not exactly in the form we want. It's definitely not exactly Formula 1, because we are dealing with a sort of distribution-valued analytic function, but I can't even find results about f(x, ·) being analytic in there, so I'm not sure how he read it that way. He also claims that this paper shows that a resolution of a real analytic function has a complexification, but the paper actually just claims that Hironaka already proved a version that can be complexified and extended to the whole complex space. Again, Hironaka's paper is probably too large to pinpoint exactly where these things are. The complexification will be important, but probably not for us today, though we might get there today. And then Watanabe, in what I think was the first paper in 1999, gives us SLT, but interestingly it uses the Bernstein–Sato b-function first and not Formula 1; Formula 1 is not explicitly proven there, but the convergence property of psi in the resolved space is used implicitly. Okay, so let's prove Main Formula 1 carefully. The way I'm going to do that is to build things up at a more bottom-up level, where I state the assumptions, and I'll write "assumption" in red as we use them, just to show where the assumptions are important. Because, if you recall, in the gray book there is a whole series of conditions called the fundamental conditions, and those things are assumed at different places. Okay, let's keep using this board. I'm going to start from resolution of singularities. I think this is the theorem quoted as Theorem 2.3 in the gray book, and I think it's the version stated clearly and explicated by Atiyah in his 1970 paper. So, Theorem 2.3 (again, all of this is in the gray book): let K(w) (I'm overloading the K here; in this theorem K(w) is just any real analytic function) be defined in an open neighborhood

of zero in R^d, with K(0) = 0 and K non-negative and non-constant, so it's strictly positive somewhere. Then there exists a triple (W, U, g), where g is a function from U to W; W is an open neighborhood of zero in R^d, possibly smaller than where K is defined; U is a real analytic manifold of dimension d; and g is proper and real analytic. Furthermore, g is an analytic isomorphism, one-to-one and onto, away from the zero set of K and its pullback. The important thing for us is that pulling K back by g gives us u^{2k}, and g itself has Jacobian b(u)·u^h, so K becomes normal crossing and the Jacobian is normal crossing as well, with b(u) analytic and nowhere zero. So let's go to the next board. We want to apply the resolution to K(w). Let me write it in the notation we have today: we want to apply it to this function, which means here we assume K is real analytic. The fact that it is non-negative follows from the properties of K, and we are focusing on a neighborhood of W_0, which is equal to the zero set of K. The idea is that we will be applying resolution at every point of W_0. Let me draw a picture: that's W, and I'll use green for the good parameters, the zero set of K. For each point in it we will get a triple (W, U, g); at another point we'll get (W', U', g'), and so on, and then we are going to tile the whole of W_0 with the resolution triples. Here's where we assume W is compact, and therefore W_0 is compact as well, because it is a closed subset of a compact set. From the construction we get a set of triples (U_i, W_i, g_i), the resolution triples we get from the theorem (remember the quoted theorem requires K to be zero at the origin, so we translate each point of W_0 to zero). This set of W_i is an open cover for W_0, and by compactness we take a finite subcover, say W_1 to W_k. Then we invoke the gluing lemma for analytic functions, which I actually haven't seen proved myself, but I've seen a proof

for smooth functions, and I think it's not too different; I think people using sheaf theory and so on have a very slick proof of this. Correct me if I'm wrong. Okay, so glue into a global resolution: we just interpret these as charts g, where M is now a d-dimensional real analytic manifold, resolving a neighborhood of W_0, namely W_epsilon, and epsilon here can be determined by looking at epsilon-balls within this finite subcover. And then we have, in local charts, that for every point in W_0 there is a W_i up there such that K(g(u)) = u^{2k}. Next, let me squeeze it down here: we are going to assume (this is another assumption) that this f function is an L^s(q)-valued analytic function on W_epsilon, where q is the measure on R^d defined by the density of the true distribution, and where s is greater than or equal to a fixed exponent. So L^s(q) is the Banach space of functions under the L^s norm. Specifically, this means (let me go to the next board) that for any w in W_epsilon^R (I'm introducing new notation: this is an open set in R^d containing W_epsilon)... let me draw a picture. Sorry, if we can somehow think of a line as d-dimensional R^d; if you are uncomfortable with that, just think of d = 1. We have a subset, which we assume to be compact, which is W. By the way, about the compactness assumption here, there is a remark that says that if we can prove that the parameter at infinity is not an analytic singularity of f, so that f still remains real analytic at infinity, then we can just consider the d-dimensional real projective space RP^d. Otherwise, in practice, we will have to choose a large enough set in R^d and close it there. So that's the compact W, and then we have our W_0, the zero set of K, and we are considering a compact set around it, which is W_epsilon, with non-empty interior, by the way. And now we are saying that, because the property of being

real analytic is defined by a converging power series with a radius of convergence, it's an open condition, so there is a W_epsilon^R, an open set containing W_epsilon, such that everywhere in that set the f function is represented by a converging power series indexed by N^d (the alphas are multi-indices): f(x,w) equals the sum over alpha of a_alpha(x)·(w − w_0)^alpha, using multi-index notation everywhere. This is absolutely convergent in some neighborhood of w_0, and these a_alpha functions are L^s(q) functions. Okay, so next comes the main lemma; it should be a theorem by itself, and it's Theorem 6.1 in the gray book. It says that under the re-parameterization w = g(u) we can divide this function f (which, if we recall, is the normalization we want in the psi function) and it remains analytic after division. Sorry, I shouldn't write it that way; that's a terrible statement, though it's kind of how I think about it. What I should say is that we want to show that if we define a(x,u) as the division of the reparameterized f by u^k (recall that the pullback of K by g is u^{2k}, and the square root of that is just u^k), then this is analytic. That function isn't fine, right? I mean, it isn't the easiest form to understand. This is clearly what it sort of says, but for the formal statement you want to say that there exists some a such that f equals u^k times a. I guess that's how you interpret saying it's analytic, but both f(x,g(u)) and u^k exist, so I'm thinking about dividing. Well, the thing is, I guess this is maybe not an uncommon thing to write in math, but that ratio, literally interpreted as a function, means you'd have to evaluate the numerator and denominator, and that expression is not a well-defined function at zero. Of course, we know what you mean is that you can carry out the division as a function and then evaluate the result; that's the proper way to say it in complex analysis. It's like sine of u over u, which is not defined at zero

until you define it. Right, thanks for the correction, yes, that's what I mean. So the proper statement is that there exists a real analytic a(x,u) such that f = a(x,u)·u^k. Okay, so let's go ahead and prove this. By the way, these expressions are all in local charts. We know that in a local chart, f(x,g(u)), by the fact that f is real analytic, can be expanded as a power series centered at w_0. After composition with g (because g itself is real analytic, and the composition of real analytic functions is again real analytic), we get a power series expansion in u. We will factor this into two terms. Recall that we want to, morally, divide this by u^k, so we want to factor u^k out. One term collects the indices that can be divided by u^k without going negative, meaning alpha_i is greater than or equal to k_i for every i in the multi-index, so that none of the divisions gives you one over u to the something. That gives us a series of functions times u^{alpha − k}, where none of the exponents is negative, times u^k, plus another term which is everything else: the sum over alpha less than k of a_alpha(x)·u^{alpha − k}, times u^k. We will call the first term a(x,u); that's the analytic function whose existence we want, and it's clearly analytic because those are all non-negative powers. We'll call the other one b(x,u), and now our goal is to show that b(x,u) is identically zero, so that the non-analytic part vanishes. Let's go back to the first board. We will do that by making the following observation, and I think we are familiar with this trick from last week. We know that in local coordinates u^{2k} is K(g(u)), which is equal, by definition, to this integral, and by a trick that might be familiar to us now, this is equal to the integral of (e^{−f(x,g(u))} + f(x,g(u)) − 1) q(x) dx. Just to remind ourselves: because f(x,g(u)) is equal to log of q(x) over p, e to the negative of that gives us p(x|g(u)) over q(x), and that q(x) cancels with the q(x)

in the integrating measure. p(x|g(u)) is a density in x and integrates to 1, and q(x) multiplied by 1 integrates to 1, and those cancel, and we recover the original expression. So those two expressions are equal, and that in turn equals another integral: (f²/2)·e^{−t·f(x,g(u))} q(x) dx for some t between zero and one. That comes from a Taylor expansion with a Lagrange-type remainder. The second-order mean value theorem with Lagrange remainder says that e^{−f} equals 1 minus f plus the second derivative evaluated at some alpha between 0 and the argument, over 2 factorial, times f²; the second derivative of e^{−x} is positive e^{−x}, and we are expanding around zero. If we set alpha equal to t times f, then the condition on alpha turns into t being between zero and one. Therefore e^{−f} − 1 + f = (f²/2)·e^{−tf}, and that's how we get the expression we have above. Now, dividing both sides by u^{2k}: for u^{2k} not equal to zero, we divide both sides by u^{2k} and get 1 equals the integral of one half times (f(x,g(u)) divided by u^k), all squared (so the u^k is absorbed there), times e^{−t·f(x,g(u))} q(x) dx. Then we are going to restrict the integration region. Previously we had an expression for f as two terms times u^k, and the u^k can be divided out, so f(x,g(u))/u^k here is equal to a(x,u) + b(x,u), and we want a bound that somehow says b(x,u) is zero. We want to remove the influence of this term, and we do that by an inequality: we shrink the integration region. Choose an L: for arbitrary L, define a smaller integration region, the set of x where the supremum over u of |f(x,g(u))| is less than or equal to L. First notice that everything in the integrand is positive, and therefore restricting to a smaller domain makes things smaller. Then notice that this term can

be made smaller if we make the exponent as negative as possible, that is, make t and f as large as possible. In this integration domain the largest possible value for f is L, and the largest possible value for t is one, as we said above. So this is greater than or equal to the integral of one half (a(x,u) + b(x,u))² e^{−L} q(x) dx. Okay, next board. Let me rewrite what we have so far: we have the integral of (a(x,u) + b(x,u))² q(x) dx (sorry, I forgot the integration region, which has now changed), and moving everything to the other side, this is bounded above by 2e^L. But this integral is also bounded below by the following expression. The inequality comes from noting that the difference between the two sides, (a + b)² minus (a half b² minus a²), if we do the algebra, is equal to 2 times (a² plus ab plus a quarter b²), which is equal to 2 times (a plus a half b)². Because it's a squared value, it's always greater than or equal to zero, so (a + b)² is greater than a half b² minus a²; that's where this comes from. And since the integral of a(x,u)² q(x) dx is bounded, this is greater than the integral of a half b(x,u)² q(x) dx minus a bounded term. And then we have almost arrived at a contradiction. Recall that b(x,u) is equal to the sum over alpha less than k of a_alpha(x)·u^{alpha − k}, where these terms all have some factor one over u_i^{k_i − alpha_i} with k_i − alpha_i positive. That means that if b(x,u) is not identically zero, that is, if the a_alpha(x) in this series with alpha less than k are not all zero, then as u goes to zero, as we get closer and closer to the origin, b(x,u)² goes to infinity and therefore cannot be bounded by 2e^L. So that's the contradiction we need, and we have shown that b(x,u) is equal to zero. "Did you die?" "That's right, he found a contradiction and vanished." [Laughter] I remember being a bit unsure about this

gluing-of-resolutions stuff; I'll have to ask Gibman about it. It seems like the compactness hypothesis does it for us, right? I think you were worried about potentially gluing infinitely many patches together.
— Maybe, maybe. Well, it's not obvious that you can do this, right? Maybe it's an elementary technique, but I don't know it. These resolutions on the different open sets — there can be more than one resolution, right? So it's not like you take the unique resolution; you pick a resolution, and there are overlaps that you have to take care of in order to glue those functions into a single function, and then you have to check that the glued function remains analytic and is still a resolution. So maybe it's not just the finiteness of the patches. I think there's something going on there; maybe it is a standard thing, it's just not something that I know.
— I guess in algebraic geometry — you know, not over the reals, not analytic geometry — this is not a problem. Why does this sort of thing not come up?
— I mean, usually we don't construct it that way, by gluing; we do it sequentially.
— That's right, that's what it is, exactly. So then we can do the same here, right? You resolve one patch... I mean, there's an algorithm to do this, right, and the algorithm is sequential, and you'll get a resolution, whatever that happens to be. Why don't we just take that approach here?
— I have a feeling it's just that Watanabe couldn't find the right thing to cite in the literature, because the exact thing he wants to cite certainly isn't in Hironaka's paper — it's maybe implicit there. So I think there's actually kind of a gap in his book, where he only talks about the local resolutions.
— Well, do you even really need the global resolution? You could just say: I'm going to treat one point of W_0 at a time, do this asymptotic expansion there. The RLCT doesn't require a global resolution, right? Whenever you write down coordinates the way that we do with these monomials, you've chosen some coordinates, and that's local — that won't necessarily work at some other far-away point.
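Going back to the contradiction argument above: the elementary inequality it relies on, (a+b)² ≥ ½b² − a², follows from the identity (a+b)² − (½b² − a²) = 2(a + ½b)². A quick numerical sanity check of both facts (plain Python, not from the seminar):

```python
import random

def gap(a, b):
    # (a + b)^2 minus the claimed lower bound b^2/2 - a^2
    return (a + b) ** 2 - (0.5 * b ** 2 - a ** 2)

random.seed(0)
for _ in range(10_000):
    a = random.uniform(-10, 10)
    b = random.uniform(-10, 10)
    # the gap is exactly 2*(a + b/2)^2, hence nonnegative
    assert abs(gap(a, b) - 2 * (a + 0.5 * b) ** 2) < 1e-9
    assert gap(a, b) >= -1e-9
```

Integrating this pointwise inequality against q(x) gives exactly the lower bound used in the proof.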

— So maybe it's okay, maybe it's okay.
— Yeah, I got back in, it seems. They both got kicked and then had trouble rejoining.
— Hey guys. Well, it happened to Matt as well, so it seems like it's Roblox's fault, right? Roblox was slow when I tried to rejoin, like, the entire episode. Sorry about that.
— No, no, nothing you can do. Please continue.
— Right, so the punch line is: f(x, g(u)) is equal to a(x,u), as defined above, times u^k, locally — in any local chart, at any point of M. Okay, that's the main thing. To prove Main Theorem 1 — Dan, do you want me to continue with a few corollaries following this?
— Okay, cool, yeah, let's do that. Maybe, sorry to bug you, but attach a speaker so the orb follows.
— Of course. Okay, this is the board with a lot of assumptions written on it, but let us retain that board and go to the next one, so that we retain all the assumptions. Okay, at the fourth board: a corollary. First, an important corollary, which is that the expectation over x — so over the true distribution — of the function a that we just defined is u^k. That's not hard to prove, because we know that u^{2k} is equal to K(g(u)), which by definition is that integral involving f. But we have just proven that f(x, g(u)) has a factor of u^k, and if we cancel u^k on both sides — which we can, since it doesn't depend on x, the integration variable — we get u^k = ∫ q(x) a(x,u) dx, which is just the expectation of a(x,u) at each point u. Okay. And that allows us to prove Main Theorem 1, which says: in a local chart, as we have constructed above, of the resolved space, the quantity that we want to study — the fluctuation of the learning process — is equal to K_n(g(u)) = u^{2k} − (u^k/√n) ξ_n(u), where ξ_n(u) is an empirical process defined on the resolved space. So ξ_n is written as an empirical process: it has the usual form of a function applied to the data set, which are i.i.d. draws from the same distribution, minus its expectation — and we know that E_x[a(x,u)] = u^k, so ξ_n(u) = (1/√n) Σᵢ (u^k − a(xᵢ, u)). And this is

the proof is perhaps just a combination of the things we have done before. Just notice that if we multiply ξ_n(u) by u^k/√n — so multiply the expression below — the 1/√n becomes 1/n, and if we push u^k into the sum we get u^{2k} − (1/n) Σᵢ a(xᵢ,u) u^k. But a(xᵢ,u) u^k = f(xᵢ, g(u)), so that part of the sum becomes K_n(g(u)), while the constant part of the sum becomes n · u^{2k} divided by n, which is u^{2k}. So K_n(g(u)) = u^{2k} − (u^k/√n) ξ_n(u), and we recover the expression that we want. Okay. Let's go back to the first board and talk about the empirical process itself now. Another corollary: the expectation — and this is the expectation over D_n, over the entire data set x₁, …, x_n, because ξ_n depends on the average over all the data in the data set — this mean E_{D_n}[ξ_n(u)] is equal to zero. And the correlation of this empirical process — I think in the stochastic process literature this is called the correlation function, though I must say I'm not very familiar with that; they usually express it as a kernel, something like that — is E_{D_n}[ξ_n(u) ξ_n(v)] = E_x[a(x,u) a(x,v)] − u^k v^k. The proof of the first part is not too hard. For E_{D_n}[ξ_n(u)], we use the expression from before: we have the 1/√n, and expectation is linear, so we push it into the sum and get E_{D_n}[u^k − a(xᵢ, u)] in each term — the expectation of a constant being the constant itself, let me make my brackets clear. Now a(xᵢ,u) only depends on the xᵢ in D_n, so all the other expectations just integrate to one, and this reduces to E_{xᵢ}[a(xᵢ, u)]. But the xᵢ are i.i.d. copies of x, so this equals E_x[a(x,u)], independent of the summation variable — and we have proven that this is u^k. So u^k cancels u^k and each term is zero. Done, first part. Okay, second part: E_{D_n}[ξ_n(u) ξ_n(v)]. Again we're going to multiply out this expression and then push the expectation through by linearity.
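The algebra of Main Theorem 1 and the two corollary identities just stated can be checked in a toy model. Here a(x,u) is a made-up stand-in with E_x[a(x,u)] = u^k for x ~ N(0,1) — purely illustrative assumptions, not the a(x,u) constructed in the seminar:

```python
import math
import random

random.seed(1)
k = 2

def a(x, u):
    # hypothetical stand-in with E_x[a(x,u)] = u^k when x ~ N(0,1)
    return u ** k + x * math.sin(u)

def xi(xs, u):
    # empirical process: (1/sqrt(n)) * sum_i (u^k - a(x_i, u))
    return sum(u ** k - a(x, u) for x in xs) / math.sqrt(len(xs))

# (1) exact algebraic identity: K_n(g(u)) = u^{2k} - (u^k/sqrt(n)) * xi_n(u)
u, n = 0.7, 500
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
K_n = sum(a(x, u) * u ** k for x in xs) / n      # (1/n) sum_i f(x_i, g(u))
assert abs(K_n - (u ** (2 * k) - u ** k / math.sqrt(n) * xi(xs, u))) < 1e-9

# (2) Monte Carlo: E[xi_n(u)] = 0 and
#     E[xi_n(u) xi_n(v)] = E[a(x,u)a(x,v)] - u^k v^k  (= sin(u)sin(v) here)
v, n, reps = 0.9, 50, 10_000
mean_u, cov_uv = 0.0, 0.0
for _ in range(reps):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    mean_u += xi(xs, u)
    cov_uv += xi(xs, u) * xi(xs, v)
mean_u, cov_uv = mean_u / reps, cov_uv / reps
assert abs(mean_u) < 0.05
assert abs(cov_uv - math.sin(u) * math.sin(v)) < 0.05
```

Part (1) holds exactly for any function a, since it is a finite rearrangement; part (2) uses that E_x[a(x,u)] = u^k in this toy model.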

By linearity, the 1/√n's combine to 1/n and we get a double sum over i = 1,…,n and j = 1,…,n of expectations — okay, let me be careful about that. Again by the same reasoning, the expectation over the whole data set reduces to expectations over the individual observations. When we multiply out (u^k − a(xᵢ,u))(v^k − a(x_j,v)), the j index always goes with v, so with x_j. Now for the cross terms, where i ≠ j: the two factors are independent, and each has mean zero — since E_x[a(x,u)] = u^k — so all the cross terms cancel, is what I'm saying (sorry, I might be hand-waving a bit with the accounting of the cross terms). And for the diagonal terms, i = j, there are exactly n of those, and by the i.i.d. assumption we can absorb them into the same expectation, each equal to E_x[a(x,u) a(x,v)] − u^k v^k. So the cross terms cancel, we've only got one sum, nothing inside depends on the summation variable, and the 1/n out front cancels the n: the whole thing equals E_x[a(x,u) a(x,v)] − u^k v^k, the expression we want. Alright, I think I might stop there in terms of corollaries — maybe we should go into the working session — but I want to comment on the assumptions a little bit. We have only assumed that W is compact, K is analytic, and f(x,w) is real analytic in an open neighborhood of W_0. If we want to push further: first of all, we claim that this empirical process ξ_n(u) is defined on the resolved space, but we want to show that it converges to a Gaussian process on all of the resolved space. And the next theorem we should prove is to show that, at the pullback of W_0, this covariance

function at U_0 is well defined — it's not like previously, where we had to stay bounded away from it. And for that, the upgrade is to assume that f(x, w) is not just real analytic but can be complexified, meaning you can substitute a complex variable into the power series, and the result is holomorphic as a function on (an open subset of) C^d, complex d-dimensional space.
— The green book kind of papers over this, or does it not need this hypothesis? It seems like in the green book there's no mention of this real-versus-complex business and extending functions to C.
— I think the green book doesn't even mention it; it quotes this as a theorem, from Atiyah's paper, isn't it?
— That's fine, yeah. It always struck me — to be honest, I, for example, never checked that tanh networks actually satisfy these conditions. Probably it's not difficult. He never really shows, in his examples, how to check these more technical conditions. I assume it's okay, but I didn't check; it seems like it could in principle be checked. Continue with the next theorem to see exactly how this is used.
— Yeah, I think it's a good idea. I was going to say anyway that I think maybe we can just skip the working session today, so maybe you should just finish off what you're saying and go on. What do you think?
— Yeah, sounds good. Okay, cool — sorry to hijack the whole session.
— It's not like we will keel over and die if we don't get our fix of workings.
— Okay, I'm going to continue with Theorem 6.3 — I'm skipping Theorem 6.2 — because, basically, I'm isolating assumptions here, and the part of Theorem 6.3 that doesn't involve taking n to infinity and the convergence of the process requires the complexification assumption. Let me do that. Theorem 6.3: assuming what we have assumed just before this, we are going to claim that at U_0, which is the pullback of the zero set of K, the variance of the empirical process is equal to 2 — a constant. So that's kind of the answer to the question we had before, where we had an empirical process that is not defined on the zero set itself; but on the resolved space it is defined, and that is why we introduced that normalization — after normalizing by that factor we get a

constant value for the variance. Okay, so, proof. Using the equality we have from before, this is the same as checking that E_x[a(x,u)²] = 2: recall the expression E_{D_n}[ξ_n(u) ξ_n(v)] = E_x[a(x,u) a(x,v)] − u^k v^k; but u^k = 0 on this set, and we take v = u, so we get the expression we want — we want to check that this equals 2. (By the way, I can't access my notes at the moment, because they're under my iPad and opening them would disconnect me from Roblox, so I'm going off memory here.) Okay. So f(x, g(u)), as we have seen before, locally in the coordinate charts, is equal to a(x,u) u^k. And here is where we assume that f(x, w) is not only a real analytic function on R^d. This is the picture we had before: in R^d we have our compact W, with W_0 inside it, surrounded by W_ε, a compact set, and f is real analytic on this W_ε. What we're doing now is extending everything into C^d: we're saying that there is an open set in C^d — call it W_ε^(C) — containing the real set W_ε, and f(x, ·) is holomorphic, meaning complex analytic, on W_ε^(C). These are all open sets. And here is where the complexification comes in: as Atiyah mentioned, and Watanabe quotes, once we find a resolution map for the real analytic function, it can be complexified to a resolution in C^d. I don't know the details of how that works, but the way we use it is just to say that in local coordinates these are now complex variables — that's the upshot of this whole construction. And the main point is that once we have this, we can use the Cauchy integral formula to get an expression for a_α(x) — that's a_α of x, not of u. Because by the Cauchy integral formula, in the multivariable version, this is equal to 1/(2πi)^d — where i is the square root of minus one, and the d-th power is there because it's the multivariate version — times the

closed contour integral over contours C_1 × … × C_d — circular contours, with radii r_1, …, r_d respectively, all strictly inside the radius of convergence of f. Then we just write down Cauchy's formula; we're expanding near W_0, so let me write the version at w_0:
a_α(x) = (1/(2πi)^d) ∮ f(x, z) / (z − w_0)^{α+1} dz_1 ⋯ dz_d,
where α + 1 means plus 1 in every coordinate of the multi-index — multi-index notation is in use here. And then what we need is to bound this a_α(x). By the triangle inequality,
|a_α(x)| ≤ (1/(2π)^d) ∮ |f(x, z)| / |z − w_0|^{α+1} |dz| —
the i has disappeared because of the absolute value. Recall that these are circular contours — Cartesian products of circular contours — so |z_i − w_{0,i}| = r_i on each C_i, and in multi-index notation the denominator is just r^{α+1}. Then we use another inequality in the same direction: take the supremum of |f| over the contour, which is then independent of the integration variable, so we can pull it out front. Let me be careful and make this an inequality as well: I'll take the supremum over a larger domain and call that M(x), so
M(x) = sup over W_ε^(C) of |f(x, z)| —
the whole complex domain where f(x, ·) is defined. So |a_α(x)| ≤ (M(x)/(2π)^d) ∮ |dz| / r^{α+1}; but each of these integrals is exactly the length of the corresponding contour, which is 2π r_i, so the contour integral contributes (2π)^d r. The (2π)^d cancels, and the r cancels the plus 1, so
|a_α(x)| ≤ M(x) / r^α.
Okay, let's go to the next board.

Let me add in red: assume f(x, w) is holomorphic there. And let's work on the last board, because I want to retain all the red. Okay. So, again, our goal is to prove that E_x[a(x,u)²] is constant — we want to get a grip on this integral and show that it is equal to 2 everywhere on U_0, for all u in U_0. The way we're going to do this is to show that it equals 2 away from U_0 but arbitrarily close to it, and then use dominated convergence to show the same thing holds in the limit. Okay, so we have an expression for a(x,u) from before:
a(x,u) = Σ_{α ≥ k} a_α(x) u^{α−k}.
From the bound we have on a_α(x), by the triangle inequality again,
|a(x,u)| ≤ Σ_{α ≥ k} (M(x)/r^α) |u|^{α−k},
where r is less than the radius of convergence of f, and we can take |u| ≤ R componentwise, where R < r covers the local chart we are looking at at the moment. The point is that the modulus of u is bounded, so this is ≤ (M(x)/r^k) Σ_{α ≥ k} (|u|/r)^{α−k} ≤ c · M(x)/r^k for some constant c — am I missing — oh yeah, sorry: the fact that we are close to U_0 is important here, meaning u is as small as we want. We are summing an infinite number of terms over multi-indices, but all the ratios |u_j|/r_j are less than one, so we can invoke the geometric series in each component and sum to something finite. Okay. So, as before, we will arrive at the expression above via the same trick as in the proof of the main formula, which gives us
2 = ∫ q(x) a(x,u)² e^{−t · a(x,u) u^k} dx,
the same expression as we arrived at before, except we have substituted f(x, g(u)) = a(x,u) u^k; here t is between 0 and 1 — it comes from the Lagrange remainder — and it's a constant.
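The geometric-series step above can likewise be checked in one variable: if |a_α| ≤ M/r^α, then for |u| < r the tail Σ_{α ≥ k} |a_α| |u|^{α−k} is at most (M/r^k) · 1/(1 − |u|/r). Again with toy coefficients a_α = 1/α! and M = e^r (illustrative assumptions, not the seminar's f):

```python
import math

r, k = 2.0, 3
M = math.exp(r)          # valid M: 1/alpha! <= exp(r)/r^alpha for every alpha

for u in [0.1, 0.5, 1.0, 1.5, 1.9]:
    # tail a(u) = sum_{alpha >= k} a_alpha * u^(alpha-k), truncated far past convergence
    tail = sum(u ** (alpha - k) / math.factorial(alpha) for alpha in range(k, 60))
    bound = (M / r ** k) / (1.0 - u / r)   # geometric series in u/r
    assert tail <= bound
```

In d variables the same computation runs componentwise, giving the product of d geometric series — the constant c of the proof.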

And recall that we have divided by u^{2k} on both sides here — there was a u^{2k} in there, which we divided out, and that requires u^k ≠ 0, so we are not on U_0, but we can be arbitrarily close to it. The next thing we do is a dominated convergence argument on the integrand: the dominated convergence theorem says that if the integrand is bounded by some integrable function, then we can take the limit inside the integral. So the integrand, by the bound we have above, is
a(x,u)² e^{−t · a(x,u) u^k} q(x) ≤ (c² M(x)²/r^{2k}) × max(1, e^{−f(x,g(u))}) × q(x),
the square coming from the square on a, and recalling that a(x,u) u^k is just f(x, g(u)). And we are away from U_0 at the moment — g is an analytic isomorphism away from the zero set — so since e^{−f} = p(x|w)/q(x), we can bound the last two factors by the maximum of q(x) and the supremum over w of p(x|w). And this is where the final assumption in Fundamental Condition I comes in, which is that
∫ M(x)² sup_w p(x|w) dx < ∞.
I'm not sure how to get from this maximum over two things to exactly that — I know that one of them does not depend on w — but that's the final assumption. And then the dominated convergence theorem: if this dominating function is integrable, then we can take the limit as u^k → 0 inside the integral. The e^{−t·a u^k} factor goes to 1, and the whole thing becomes the integral we want, except the 2 gets pulled to the other side, giving E_x[a(x,u)²] = 2 on U_0. That's how we bring home this theorem. Sorry, I'm a bit stuck on going from there to that, but we can always just take it as the assumption that that thing is integrable.
— Sorry, I had to rejoin because my audio stopped working, but what was the story behind this?
— Yeah, I was also wondering about this hypothesis with the M(x)² — I guess this is where it gets used, but I didn't follow how.
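To illustrate the dominated-convergence step in isolation: take a hypothetical a(x) = 1 + x² ≥ 0 (independent of u, purely for illustration), q the standard normal density, and t = 1/2. The integrand a(x)² e^{−t·a(x)·u^k} q(x) is dominated by the integrable function a(x)² q(x), so the integral tends to E[a(x)²] = E[(1 + x²)²] = 6 as u → 0:

```python
import math

k, t = 2, 0.5
q = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)   # N(0,1) density
a = lambda x: 1 + x * x                                       # hypothetical a >= 0

def integral(u, lo=-10.0, hi=10.0, n=4000):
    # midpoint rule for the integral of a(x)^2 * exp(-t * a(x) * u^k) * q(x)
    h = (hi - lo) / n
    return sum(a(x) ** 2 * math.exp(-t * a(x) * u ** k) * q(x) * h
               for x in (lo + (m + 0.5) * h for m in range(n)))

vals = [integral(u) for u in (1.0, 0.3, 0.1, 0.001)]
assert all(x <= y for x, y in zip(vals, vals[1:]))   # increases toward the limit
assert abs(vals[-1] - 6.0) < 1e-2                    # limit is E[(1+x^2)^2] = 6
```

The monotone increase as u → 0 reflects the exponent −t·a·u^k rising to 0; the domination is what licenses exchanging limit and integral, exactly as in the theorem.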