WARNING: Summaries are generated by a large language model and may be inaccurate. We suggest that you use the synopsis, short and long summaries only as a loose guide to the topics discussed.
The model may attribute to the speaker or other participants views that they do not in fact hold. It may also attribute to the speaker views expressed by other participants, or vice versa. The raw transcript (see the bottom of the page) is likely to be a more accurate representation of the seminar content, except
for errors at the level of individual words during transcription.
This video explains how to calculate the generalization error of a machine learning algorithm using algebraic methods. Coloring methods and Newton diagrams are used to find the resolution of singularity and calculate ROCT. An example of a two-dimensional polynomial is provided, along with an overview of Newton polytopes, fans, rational cones, and smooth cones. Edmund's suggestion of using a toric variety and its Newton polygon to define the variety is also discussed, along with the need for further research on the theory.
This paper proposes the use of coloring methods or Newton diagrams to calculate the generalization error of a learning machine. Algebraic methods are used to find the resolution of singularity and calculate the ROCT. Bayesian generalization error is asymptotically related to Lambda on N minus 1 and log n, plus smaller order terms. Newton polygon and turret method can be used to evaluate similar integrals by constructing a resolution map G and finding its pullback by G of H. An example is provided of a two-dimensional polynomial with terms 5, 3, 2 and 0.
Newton polytope is a convex hull of points which forms a region bounded by two faces. A fan is created by the normal vectors to the faces, which consists of 3,2 for Gamma 1, 2,3 for Gamma 2 and 1,0 for the vertical face. A subset gamma of a polytope is a face if there exists a vector B such that the inner product of alpha and B is less than or equal to the inner product of alpha prime and B for all other alpha prime in the polytope. This creates a cone, which is a set of all such factors Beta and R2D that satisfy this condition. Rational cones are generated by a set of non-negative linear combinations and have a generator with integer coordinates and minimum length. Simplicial and smooth cones can also be generated if the minimal generators of the cone are linearly independent and span the lattice, respectively.
A collection of cones forms a fan called F, which can be refined to a locally complete and smooth fan F'. Maximal current is a concept related to smooth polynomials, defined by generators and a blowout map which maps the polynomial to R to D. A map G is defined in a local chart by two vectors, and the refinement of the fan is based on the construction of a joint parallelogram. The Jacobian of the map is given by a determinant of a Matrix multiplied by U1, and this transformation is used to evaluate the zeta function for computing the local ROCT logo in each chart.
In order to find the global RCT, one must first find the local RCT in each local chart by factoring out the common factor for each coordinate and computing the column sum of the Jacobian. Blowout maps can be used to desingularize a function if it is non-degenerate and Yamazaki's paper outlines a subset of cases where this is possible. Machines can be used to convert an equivalent polynomial into computing real roots and a simple shift can resolve the issue if the polynomial is degenerate. This is an improvement in the resolution procedure, speeding up the process.
Edmund discussed the process of simplifying a polynomial to find a resolution of a Kullback-Leibler divergence. This involves changing the coordinate system to make sure the polynomial is non-degenerate and finding an equivalent polynomial. He suggested that the correct view is to get a toric variety out of the divergence, looking at its Newton polygon and using that data to define the variety. He noted that the non-degeneracy condition should be set at the variety level, and that further research into the theory is necessary to find the equivalent polynomial for a neural network.
The paper proposes the use of coloring methods or Newton diagrams to calculate the generalization error (ROCT) of a learning machine. A polynomial, h(w), is used to represent the divergence between the truth and the model. Algebraic methods are used to find the resolution of singularity and calculate the ROCT as the equivalence preserves it. This is done using an elementary example and a mixture of binomial model to show exact expressions of the ROCT.
The Bayesian generalization error (BG) as a function of data set size (n) is asymptotically related to Lambda on N minus 1 and log n, plus smaller order terms. Lambda M is the largest pole location on the real axis, and N is the order of the data function. Newton polygon and turret method can be used to evaluate similar integrals, by constructing a resolution map G and finding its pullback by G of H, which is normal crossing. This involves finding a polynomial where the Newton polytope is the convex hull of the set of non-zero coefficients in the polynomial. An example of this is a two-dimensional polynomial with terms 5, 3, 2 and 0.
Newton polytope is the convex hull of points which form a region bounded by two faces, Gamma 1 and Gamma 2. Producing a fan, which consists of the normal vectors to the faces, creates an intuitive picture. The fan is a collection of vectors, where the normal vector for Gamma 1 is 3,2, for Gamma 2 is 2,3, and for the vertical face is 1,0. The lengths of the vectors are not irrelevant, but the terminology in the paper uses orthogonal vectors.
A subset gamma of a polytope is a face if there exists a vector B such that the inner product of alpha and B is less than or equal to the inner product of alpha prime and B for all other alpha prime in the polytope. A cone for a face is the set of all such factors Beta and R2D that satisfy this condition. This can be used to define the Newton polytope and fan, which is a collection of cones. These cones can be used to help explain the usefulness of the Target method.
A rational cone is generated by a set of non-negative linear combinations and has a generator with integer coordinates and minimum length. If the minimal generators of the cone are linearly independent, it is called simplicial. If they also span the lattice, it is called smooth.
A collection of cones forms a fan, called F, if the face of each cone is also in F and the interception of any two cones is also in F. A refinement, F', of a fan is the cones of F' partitioned off F. If the union of the collection of cones is non-negative, then the fan is locally complete. If a smooth, locally complete fan is found, a smooth toric variety can be defined with local structure. Charts can be found which cover the toric variety.
Maximal current is a concept related to smooth polynomials, defined by a choice of d generators which are linearly independent and spanning. A blowout map is then defined which maps the polynomial to R to D, and this map is stated in local charts given by the maximal current. If the polynomial satisfies certain non-degeneracy conditions, the fan can be refined until it is locally complete and smooth, and the map will desingularize the original polynomial.
A map G is defined in a local chart by two vectors, 3,2 and 1,1. The refinement of the fan is based on the construction of a joint parallelogram and includes any vectors inside it. The Jacobian of the map is given by a determinant of the Matrix age multiplied by U1, the sum of V1j minus one, the sum of Vd j minus one and the sum of J's. This transformation is used to evaluate the zeta function, which is important for computing the local ROCT logo in each chart.
In order to find the global RCT, one must first find the local RCT in each local chart. To do this, one must factor out the common factor for each coordinate and then compute the column sum of the Jacobian, divided by the exponent of each coordinate. The lowest RCT among these will be the local RCT in that chart, and the lowest RCT among all local charts will be the global RCT. The multiplicity of the RCT is equal to the number of coordinates.
Blowout maps can be used to desingularize a function if the function is non-degenerate, meaning the zeros of its partial derivatives are completely contained in the coordinate axes. This condition can be satisfied with an equivalent transformation if needed. Yamazaki's paper outlines a subset of cases where this is possible, and has been used to find the resolution map for binomial mixtures.
A machine can be used to convert an equivalent polynomial into computing real roots without needing to be an expert in renormalization. An algorithm exists which searches all possible paths of recursive flow out, but there is no known bound on this. Subsequent people have found a bound, but the complexity of the algorithm is exponential. If the polynomial is degenerate, a simple shift can resolve the issue. This is an improvement in the resolution procedure, speeding up the process.
Finding the equivalent polynomial for a neural network is a difficult task. It requires understanding which sub-variety to blow up and what part of the Taylor series expansion is relevant. A paper discusses these methods in connection with one hidden layer 10h, however it is not clear that this method subsumes the one hidden layer 10h blow-ups. Finding the equivalent polynomial involves a lot of work and even after that, it is still not clear whether the non-degeneracy condition is satisfied.
Edmund discussed the process of simplifying a polynomial to find a resolution of a Kullback-Leibler divergence. He explained that the process involves changing the coordinate system to make sure the polynomial is non-degenerate and finding an equivalent polynomial. He suggested that the correct view is to get a toric variety out of the divergence, looking at its Newton polygon and using that data to define the variety. He noted that the non-degeneracy condition should be set at the variety level. He suggested further research into the theory is necessary.
right so um what kind of everyone so um the plan today is kind of going through a paper in uh as much depth uh uh as I prepared basically um which is kind of not much because I got stuck on various points but um we could um look at it um together um the the paper is um let me write it on the board um it's um foreign in 2010 and the idea the main point of the paper is to um propose um the usage of coloring methods or Newton diagrams in getting the rrct of learning machine so the paper title is um not analysis but it's the same topic um of BG where BG is the version generalization error um with Newton diagram um but the plan is for me to work through sort of their example um sort of Elementary example of using Newton diagram and um and later the paper worked through a kind of substantial example but um which is the mixture of binomial model to show that you can get exact expression of um of the ROCT using Torrid resolution um Let's see we can how about we can go um with that um right so um let's go through um the theory and why Target method might be interesting for special cases of singular machines and while we're going to Theory we can simultaneously use an example in the paper as as concrete realization of the theory so this the setup is the familiar one um you have the the usual um the usual model um which will denote and true density Q and prior uh and let okay let h of w um be a polynomial that is equivalent to the function from parameter to the care Divergence between the truth and model right and I think some of us might be familiar with um the usage of the term equivalent here which is defined as f a function f of w is equivalent to gfw um if there is this Epsilon and C1 C2 or positive constant um fetch that when um in a neighborhood of the zero of f you have the following bound the paper introduces the notation because this is kind of Upper and Lower bound kind of situation being quick but not caring about um constants it uses this um this symbol for um upper lower Bound for the for this equivalence right so the the whole point of introducing this this equivalence is that if we can manage to reduce um this K function the scale Divergence function before that we would um that that is kind of the main Central object in singular learning theory if we manage to reduce that to an equivalent polynomial h of w then we can use algebraic method to find the resolution of Singularity and hence fine um the irct because this equivalence preserves RCT right and
so SLT tell tells us that um the Bayesian generalization error BG as function of n where n is the data set size is asymptotically Lambda on N minus and minus 1 and log n um plus smaller smaller order you get right and Lambda M um is the largest hole largest as in the the location of the pole where it occur on the complex plane but it's a is restricted to the um real axis so it's ordered so its largest among them and N is the order of the pole um of the data function so this um will have been easy to evaluate is there is a change of coordinate or we could ask for something better or like a resolution map I.E a bi-rational um proper map G so that the pullback by G of H is normal Crossing so that this uh integral becomes U to the 2K Z um be of u u to the h d u um locally okay so how would um have a Target Target method help with this so okay um I'm citing um not only the paper but also um shall we link thesis 2001 and um paper in 1977 um that sort of uses Newton Newton polygon and turret method to evaluate similar integrals okay so the point is there is some Clause of polynomials in which we can deploy a systematic method of finding the resolution map G and the construction of such um the construction of such a map and the Criterion under which it is the construction is valid goes are used as the the concept of Newton polygon so let's introduce Newton public polygon suppose we have um of polynomial so w is a Vector in R to the D and Alpha is a multi-index entity so using multi-index notation that's equal to W1 to the alpha 1. D to the alpha t the C Alpha is a coefficient in the polynomial the Newton polytope is the following convex how so denied that P the Newton public type of f p of f is equal to well the convex Hull in R to the D of the following set so Alpha in so the set of index in the lattice and to the D such that um the coefficient in that polynomial is not equal to zero so is the exponents of the non-zero terms in the polynomial and the Newton polytube is just the convex how in r d of that we'll see a an example now sorry example um so if hfw the if we somehow manage to find a terrier W for learning machine that's equal to the following polynomial this is a two-dimensional example so that we can plot things on a plane look at this my test ability to draw um so this first term here would contrib would contribute the multi-index fives zero so um that's that point there the second term is three three so that one and then we have two two um um and then uh zero five
okay right [Music] um so the Newton polytope is the convex Hull of all these points so it's the region bounded by so is this region and might not be obvious the way I draw it but these two there is two um sorry this is coal a face so Gamma 1 and Gamma 2 referring to the red um line in the middle is a face so these um this extended line there it's also a face for the Politoed but it is not compact um so is this vertical line at both so we have two compact Bays well really there is another Zero Dimensional phase but that doesn't contribute later so the main players would be those two faced and I'm not sure if this is the correct um interpretation but it's it's it's almost as if this point in the interior of the collector is um something like higher order term um or sort of irrelevant um foreign okay so and then we can also another relevant feature would be the um so um another relevant feature would be the um so from here we can produce kind of a dull picture which is called The Fan and I think I'm just going to give the I'm sorry for more than the definition um later on but um intuitively the the face is um sort of the Border um of the Newton polygon and uh the fan is uh the fan of this polytope is consists of the normal vectors to the faces so the normal Vector for this phase if you come for for Gamma 1 would be uh the vector three two so three two is this vector and then for gamma 2 it is um two three so to three this Factor and then there is another Vector the normal Vector for this is just one zero for the vertical phase so there's another for this one is the one pointing straight up um so that constitutes um the fact well really the fan is um is the uh uh it's a collection of current span like these um vectors but that's defined that um later actually let's let's do that now okay so that's kind of an intuitive picture I see there's a right so yeah let me yeah regarding the length let me Define that uh uh um put some formal definition down and the length does uh sorry oops yeah Russell asks if the lengths are irrelevant I assume you mean of these um these vectors in the fan I think they're not but the terminology in the paper is saying something like orthogonal vectors that's not really uh they're not arbitrary orthogonal vectors so right in yes they are not and I have to hunt for uh formal definition elsewhere and to to be honest I'm I'm a little bit confused by uh sort of different literature here and uh the usage of various different um formalism to Define things so uh do tell
me when you see something that that doesn't um that that contradicts each other but I think the length um matters a little bit as let's let me Define things and we can discuss that right so um yeah I was um thinking of the blue um vectors but I really didn't want to uh kill the flow or anything I can wait no this is not a kind of a talk presentation per se but like um please do interrupt and discuss but that's a good question and I hope I will answer that now in the the next few definitions so um as a subset um gamma so let me Define the faces and formally phase a subset gamma of of the the polytope um is a phase uh if um okay thank you so um the subset gamma is a face is um if there is a vector B so almost think of it as a what not almost it is a kind of a dual Factor um it's a well it's a vector B beta such that Theta R to the D such that gamma is equal to all the points in the Politoed that achieve some kind of minimum um so it's such an alpha inner product with b and the inner product is just the euclidean dot product um is less than equal to the infinite um over all other such Alpha Prime in the Poetry um right so um okay um and all right a cone for a face um come up is um the set of all such factors um beta and r2d um that joke that um that satisfy the um the condition above um so for instance um beta cannot be um cannot be anywhere outside of the a positive author so in the two in a two-dimensional case it can't be any way outside outside the first quadrant otherwise the um uh otherwise the the instrument would just be um doesn't that doesn't exist right and uh for beta equal to the orthogonal vector you can see the orthogonal vector of a particular face you can see that um the the face would just be um the points that lie on the face um so so I guess you're right the the based on this expression the yeah the length of beta itself doesn't matter doesn't matter right yeah you can in fact just make this like unit Vector um and it will Define the same set okay right now given this Y is having um these useful so now I'm going to make kind of a series of definition um following shall we wins this is to to sort of point out why Target method might be useful whenever it can be applied so um so let's in general let's um look at the door picture so which is why we we Define um so we we first defined the Newton polytope Define the fan which is sort of a collection of of cones um um but now let's try to see how those coins might help um so a sequence of um uh definitions
a cone um generated uh by a set of vectors so those will be the generators of the code is the set of positive linear combination actually non-negative in a combination so the current is equal to um such that the coefficients are all non-negative so is the V i's are not just on RD but it's actually restricted on the lattice on entity D then we say um the current is rational in particular any uh sort of vectors on its Edge has rational slope so array is a one dimensional okay meaning is generated by one such generator and a rational array so meaning not only is it generated by one such generator but the generator can be chosen so that it lie on the lattice in that case a rational array will have uh has a minimal generator so meaning I.E generator of um meaning them length so it's kind of um uh um sorry um it's a generator of minimum length um generator in and of entity so a lattice generator it's so so it's a generator still in the lattice so it's still um having um integer coordinates um but it has minimum length so um another question here yeah so the way is defined right now the noise of rational only depends on the generators uh yes yes I assume the implicit statement is that it's a rational current if there exists a sequence of generators yeah yeah so you can choose the the generators to be uh to to have integer coordinates yes yeah I know I'm just being pedantic here I guess yeah I try to understand the sort of um uh according to Independent definition of orders but I haven't get very far um okay let me uh push on so um a rational clone so uh a rational current would have um so if it is generated by um so if you look at its generator you can choose them to be uh you can choose them to be having integer coordinates and those individual generators can can be uh well defined array and therefore have a minimum generator and therefore the whole cone itself um has has a set of minimal generators okay um promise is getting somewhere um and if the main minimal generators for a cone um so uh is linearly independent then is called its simplicial okay so these are just sort of definition for now if it if they um I'm in export if they form uh that basis for um for the lattice then day then the cone is called sniff right so if they are uh if if they are linearly independent then they are causing visual if and in addition they spend um uh the letters as well then they are called uh then then the current itself is is called smooth and I and uh I'm interested to know what's the
um the the the variety um so this will uh uh this this cone would Define sort of a variety this data work together Define authoric variety and this probably have some this this is not this um uh definitions probably have their classic meaning um but I'm not sure if um I see the connection yet okay all right so now we get back to um to the construction that construction of cones that come from Newton polygon so a collection of cone is a fan if um the face of every current um let me give this a name a fan f is the face of every cone is also in F and the interception of any two parents is also in F so that um this is Satisfied by um the the fan constructed from a Newton polygon um now if the support of the fan so if you um is the collection of Clone um the union of the collection of cone uh is the um it's a non-negative often so R greater than recorded zero to the D um then the fan is locally uh complete it's called locally complete um foreign F Prime is a refinement okay here's the main um resolution of residential Singularity Construction so the construction of a resolution map becomes uh this refinement process and a refinement f Prime of a fan of is um the cones of F Prime um our partitions um curtains off f okay um let me go back to the pictures that we draw just now to illustrate this so in the paper the um the construction of a refinement is stated as the following so if you have uh so pick um pick any two um the only two vectors the orthogonal factors just now so the blue vectors and draw um sorry for example pick um this vector and this vector okay look at the parallel parallel program the span so that's parallel again and if you include um so any uh any vet any lattice point that is included in this parallelogram for example this point here well that gets added into the refine um the refining fan so if you pick um if you pick this vector and that Vector you will add another point so this point will be 2 1. and you have repeat this one and this one you will have ADD um one two as well so this is now a subdivided plan okay let's go back yeah okay so uh now comes kind of the punch line hopefully so if we manage to find uh smooth locally complete um fan f then we can Define a smooth Tariq variety has to note that the toric variety of that fan f um and not only that we also know that they um they are covered by um so we know that sort of that local structure as well um um I find if I open I find charts you and each of them these open assigned charts um is
home homeopathic to R to the D covered by open Define charts and these charts are um defined by Maximo currents Sigma in the fan okay so uh maximal current meaning the uh um the highest dimensional code so if we are talking um since we are talking about um smooth um so the maximum current will be D dimensional um if um if you are dealing with d variables um so this maximum currents are defined by a choice of d generators um degenerators of the code so these such vectors that that is linearly independent um and spanning okay okay um and there is [Music] um and and not only that um let me go to maybe finish up in the next spot not only that this construction also defined a blowout map let's say pi for this that goes from p of f to R to D um okay but this this map stated in charts so set it in those local charts given by the maximum current um okay okay so Sigma is the maximum current um generated by um who you want uh V1 V2 VD so in this particular chart correspond to this cone um the map of both restricted to this chart is given by um sending so you will have um U1 you d map to W1 WT um and the map is defined by W1 is equal to U1 to the V 1 1 right where V is um the the vector and v11 is the first um um the first element of that back belt and YouTube to the V2 one so on u d to the v d one and W2 is the same it's the same except you start it's using the second element of each of those factors Etc okay right okay so um the the domain um theorem here is that if so going through the the steps we have a polynomial H that we want to find for which we want to find its resolution map we look at its Newton polygon if H satisfies satisfy some non-degeneracy condition which we will state in a moment uh if it satisfies that you can Define the Define the fan using the Newton polygon refine it until it satisfies um the statement of this of this construction here so that it's a locally complete and smooth we find the fan until until it's satisfied that and then Define this map this blowout map using um these charts that is given by the refined current and Define this block map this way then this map will desingularize the original um polynomial page okay um so let's let's see what it meant for our example uh polynomial page so just now we have um so it does the desingularization in one step then um that that is correct um the the complexity of this algorithm comes from uh well the refinement and later on you need to find all combination of the this maximal current and then to find the rlct you need to
basically compute the the local ROCT logo in each every chart and find the minimum one yeah so for uh for our um uh for our example just just now let's let's just write again h of w 1 to the five plus W1 q w 2 Q W 2 squared plus W 2 to the fifth and um we found that the the the the cone um the the fan is uh thank you um the fan can be represented um by the following so this is just a sort of a data that that's can represent the fan so it's all the blue arrows just now okay um so this is not standard notation I'm just saying that using this four Vector it's the fan generated by this for uh factors and then the refine fan is uh based on the construction of a joint parallelogram and include any vectors inside it is the same thing uh together with one two and two one okay so that's the refinement and uh and let's see what the block map is in each local chart so to to do that the prescription is pick any two vectors two because we are dealing with d equal to two two variables and therefore the maximum the maximum current is generated by two um two generators and uh well two generators that that is not um uh two two vectors that is not you know linearly dependent so that it's not actually maximum dimensional um so let's say we pick um three two and one one so we pick um this and sorry not three two uh oh there's another Vector here that's one one yeah which is uh which is generated by the by these two factors um with the polar grain construction so if we pick those two um vectors and put them in the Matrix a equal to put them as Columns of The Matrix 3 2 and 1 1. then you define the map in that local chart as W1 equal to U1 so you read U2 One um you read row wise okay another nice things about uh about this blog chart is that it's the Jacobian of uh of this map here is given by um let me go back and go back to the previous board and just write out the um has a nice um expression so it's a determinant of the Matrix age in the other part there which is which consists of all the vectors V's the generators Vector these are both put in columns uh and then multiply by U1 to the um sum of v1j um minus one u d sum of v d j minus one sum of J's so those sums are just sums over the values of those vectors right and uh so if you recall back in our expression of evaluating the zeta function the the Jacobian of the change of variable is important as well right so here um applying that um uh we get so let's just concretely do this transformation and we will see that H um so that's called this map G
of U equal to W1 W2 um HG of U is equal to if you substitute things back in uh back in the polynomial approach you get um so for the first time how many uh you get 15 lot of U1 um and five log id2 in the second term you get 3 times 3 sturdy nine plus two times three six fifteen Lord of U1 um two so I'm just going to copy this now okay right so um and the next step of um finding the LCT once you get this kind of substitution is just to ask um so um what's the so in this expression what is the common factor for each coordinate so for U1 what's the common factor well the common factor here the um is U to the 10. so you can factor out U2 U to 10 animes because that's the lowest of the term um for U2 it is um U so U2 to the fourth right and the the the magic well because of the theorem before uh is is that the remaining term is uh is a non-zero function in the neighborhood of zero so to see that you can we can do the factorization and see that this is equal to in other words this makes things normal Crossing plus one plus U2 right so in the neighborhood of um U1 U2 equal to zero the remaining term is not zero and and and yeah so in SLT notation that 10 there is equal to 2K1 and that 4 there is 2k2 um in this local chart okay and how do you find the rlct in this local chart well it's equal to um uh if you work through the data function it's equal to um uh Lambda so what's Lambda 1 so Lambda 1 corresponding to the RCT for for U1 not rcd the Lambda one plus one two to the first coordinate U1 is equal to well it's um the contribution of the um the contribution of the uh Jacobian is um well recall that it's um it's the it's the column sum so for U1 is the first column if the column sum e plus two well there's a minus one but there is also because of the integration in the um in the data function there's a path pass on there divided by the 2K1 divided by the exponent of um of U1 so that's equal to um a half and if you compute the same thing for U2 you get one plus one um divided by 4 which is also a half right so the lowest among this will be your local RCT in this chart and well it's it's both both of them is a half so the ROCT um in this local chart is equal to the half with a multiplicity equal to two so that's the um that's the procedure to find local RCT and then the thing to find the global velocity would just be for each such pair of generators to do the same procedure and find the lowest rrctmundo okay finally I think I should mention what's the um condition under which this um
this is applicable so so need non degeneracy okay condition um on yeah so the um the theorem that um that the blowout map defined on the um locally complete smooth fan is uh can this de-singularize the um the function as only um apply when f is uh none degenerate and F is non-degenerate that means that um non-degenerate is phase polynomial so the degenerative of f only depends on um these phase polynomials and these phase polynomials are F gamma for each compact face of the Newton polygon of f so um and F gamma so if gamma is uh contains the lattice Point um so each compact face has um has a finite number of lattice points F gamma is defined to be um C alpha one two three so uh where um the the coefficients come from the original come from the original uh polynomial right so if okay sorry that this sentence is out of order but um f is not degenerate is um these faces um has zero doesn't have zero outside of the coordinate axis so this means that um sorry that okay let me restate this so Define um and then Define the place we're going to be following f is non-degenerate if o f gamma so satisfies um countdown W1 um f gamma equals zero um so um so the zeros of the derivatives the partial derivatives down two the set of zeros dot dot capital D this set of zero is completely contained in um the set W1 W2 W the IE the coordinate axis yep so if your polynomial satisfied this condition then the fan generated by its Newton polygon can be refined and the blowout map of the refined fan will Define a desingularization map in the way we described before now [Music] um this doesn't mean that the original um the original H uh if you want to desingularize if you want to apply this method for um for a polynomial you could apply some sort of equivalent transformation until you get a non-degenerate um app and that can happen in so in the paper in the yamazaki paper there is uh it points out a subset of such cases where that is possible and it has been used rather liberally to find the resolution map for the binomial mixture yeah I think uh I will stop there sorry this is um not very um well prepared hopefully it still has kind of a coherent structure um it's it was very useful thanks Edmund yeah it's uh it's difficult because it's the the combinatorial structure in the in the sort of toric picture doesn't have obvious geometric content until you've sort of gone through a whole bunch of theory so it's difficult to present in a way where it sort of has the conceptual content is very clear but
um the overall point is something that anybody working on SLT should see which is that at least in some cases there's a a kind of well-packaged machine which will get you from an equivalent polynomial to Computing rlcts without having to be some kind of whiz at renormalization sorry uh resolution um yeah so I think the a comment in the paper is is worth mentioning which is um the the original approved for resolution of Singularity relies only on blow up and say that there exists an algorithm you to just search all possible sort of recursive blowout path of recursive flow out um and you would eventually Arise at a normal Crossing phone but there is no bound um that so here on that proved that that algorithm will stop but we don't know what's the bound on on that I think subsequent people have subsequently people probably have one bounce but um this particular method has a kind of a well-known bound algorithmic complexity bound um though it's kind of [Applause] um is not polynomial I think the paper quoted as um K to the D I think where K is um so D is the number of variables um and uh I forgot what K is so when D is 500 billion we're fine yeah yeah so it's not it's not great that it's exponential right so um yeah and the other comment would be in in any places where you find that your your polynomial is is degenerate you can try to apply so there isn't there is a second algorithm where if if there is a particular way that it is degenerate um then there is a simple sort of a fine shift oh no it's a simple shift that will make it that would at least resolve that issue but um but if it is not then you can actually do the blow up the the usual uh blob in uh by defining uh uh projective code in it um transformation uh blowout and then see if you have any luck in there and uh to to make uh to make the following uh after the blowout would you have non-degenary polynomial in in a particular coordinates of the block below um space um and I repeat um so it is at least an improvement I guess um an extra heuristic an extra subroutine in the resolution procedure that speed things up yeah it's also a way into go ahead oh sorry I was just gonna say that um my question before you spoke was a all of the F's that come out of the uh uh I guess machine you know even single layer neural network s but I guess the answer is no sometimes they are degenerate but this degeneracy condition seems to be coordinate dependent and so you can like shift out of it that's that's is that what you're
saying yeah right is there a way to reframe this so that it's called an independent so we can know that like that maybe there are just some functions that fundamentally bad then you can't just shift out of it I this is this is me um not knowing what I'm talking about but this I get a sense that um whether or not um uh the the this like change of um like detecting um um which so this is um so like it's the the degeneracy depends on just the face right and it sort of depends on the the the the relevant term in the um in the Taylor series expansion of of the of the function um and sort of figuring out what um what part of it what part of the Taylor series um is relevant seems to be sort of the core content of hironica's proof like knowing which sub um variety to blow up okay this is I this is me getting a vague feeling I have no nothing to back up package yourself um as far as I know the answer to your question is But to answer your question I um no I I at least don't know um what's the coordinate independent way of saying this yeah me either I think it's also there is a paper that discusses these kinds of methods in connection with one hidden layer 10 10 H but I don't fully understand that paper so I wouldn't right now be able to say maybe Edmund has understood in the last week enough to uh say something else but I wouldn't be able to say right now that the the usual equivalent polynomial that we come up with satisfies this non-degeneracy condition one of us should check but uh there are there is a paper that states this and applies these kinds of techniques to to some extent do what uh Yogi and Watanabe did in their paper on deriving the rlct for those neural networks but this is a still still somewhat up in the air at least from my point of view so it's not it's not clear that you can perhaps perhaps it's clear for zero true distribution or zero true Network but in the general case I it's not it's not clear that this method subsumes the one hidden layer 10h blow-ups but that's a something we're eager to settle one way or the other yeah so even in this in the binomial mixture uh example in the paper the the situation is well the a lot of work actually goes into finding the equivalent polynomial um so that's that's kind of the bulk of the problem and I guess expenses that have done that for the 10th network but even after that um to like it's um oh so I should say that even after that two steps um uh sort of use liberally um throughout the the procedure one is
to um to change coordinate to make sure that uh whatever is whatever the polynomial it is dealing with in that stage is non-degenerate and the other step is basically finding another equivalent polynomial to the polynomial at that stage um so so it seems like that procedure um it's still needed um to um to simplify the program enough until finally um the this whatever is presented today can be dropped in and say that yes that requirement will finally give us the resolution um yeah so so the the the question of whether or not um starting from the original um original k l Divergence and even after having a equivalent polynomial it is not just throwing it to the to this method and it will work yeah the search for a a um a machine to do it for us continues any other questions for Edmund do you understand the non-degeneracy condition like in terms of the equivalent polynomial or geometrically or in any other way um regardless of the continuation of your question but do you continue sorry um I was well I was sort of petering out but I was going to say sort of outside of it seems that we've taken quite a journey to get to the faces and we're in that sort of context of faces and so I can sort of interpret it there but I just yeah I guess I'm yearning for um some other uh view of it I I think that the correct view would just be um getting a toric variety out of this um so from okay from our care Divergence get an equivalent polynomial and um and Define uh looking looking at its possibly the Newton polygon and using that data to Define authoric um variety out of that and then all the geometric notion all subsequent um operation should be done on at the variety of variety level um uh so that you don't need to reference um coordinates um obviously when you're doing um the computation it will still be in coordinates but um the this condition should be set at a variety level um then correct an event just completely sure I don't know where this condition comes in presumably when you sit down and read the proof that the that blow up that you defined is non-singular it's uh it's because some kinds of singularities are will be resolved by that map and the ones that won't uh ones that are covered by this condition but I I honestly don't know it would be maybe that's something we should look at in the future I think it's worth digging further into this part of the theory I mean I guess this seminar hasn't done that much algebraic geometry um so far uh I think it's probably appropriate for there to be a bit more