WARNING: Summaries are generated by a large language model and may be inaccurate. We suggest that you use the synopsis, short and long summaries only as a loose guide to the topics discussed. The model may attribute to the speaker or other participants views that they do not in fact hold. It may also attribute to the speaker views expressed by other participants, or vice versa. The raw transcript (see the bottom of the page) is likely to be a more accurate representation of the seminar content, except for errors at the level of individual words during transcription.

Synopsis


A Gröbner basis is an important tool for calculating an upper bound on the real log canonical threshold. In this video, learn how to use division graphs, Taylor series and the Generic Division Algorithm to find a polynomial upper bound for a simple neural network. Discover how to calculate coefficients and real numbers associated to adjacent pairs of indices in a chain, and how to use the notation K alpha beta to sum all paths of length 2 to infinity from beta to alpha in a division graph.

Short Summary


The Hillary basis theorem states that the ideal generated by all polynomials is the same as the ideal generated by some finite number. In order to calculate an upper bound on the real log canonical threshold, the loss function for a statistical model must be bounded by a polynomial. To do this, the quotients must abound for all weights, and the polynomials must be rewritten as a Gröbner basis for a chosen monomial model. This is done by taking the sum of all multi-indices such that k is in capital d alpha, and adding a sum over all integers from 2 to infinity, and for each m, a sum over all descending chains of multi-indices of length n. Averaging factors, coefficients and real numbers associated to each adjacent pair of indices in the chain are also taken. Finally, the polynomial of the monomial w to the theta m on the waiting term of g k is taken.
Division graphs are a visual representation of tau factors, used to simplify expressions. K alpha beta is a sum of all paths of length 2 to infinity from beta to alpha in a division graph, with constants from the expression for the ajks. This notation can be used to find a polynomial upper bound for a simple neural network, which can be applied to the same polynomials for 10h networks to find a bound on the kl divergence.
Taylor series is a method used to calculate a true function which is zero everywhere. It is composed of polynomials and a function of x, and the ideal of the sequence of polynomials is equal to the ideal generated by f0 and f1. The Grammar basis of the ideal is composed of three polynomials and the leading terms of each must be calculated to use the Generic Division Algorithm. This algorithm calculates the quotients by summing over pairs of indices where the leading term of the Grammar basis divides the coefficient of the polynomial. Tau alpha beta is a coefficient of the monomial w to the alpha, and is calculated by summing tau alpha beta one and tau alpha beta two.
In the division graph, indices Alpha and Beta are used to calculate aj zero. Alpha is set to 1 2 j + 1 0 0 and Beta is set to a multiple of 1 1 0 0. The coefficient of f is a factorial and w to the Beta is the leading term of g zero. Alpha has a path to Beta if Gamma = 0,2j + 2j,0 + n1(-2,2) + n2(-1,1). B^2CD does not divide the endpoint of the step, as all indices in the path have their third component as 0. The only indices Beta which appear in the expression for one are those which are divisible by a b.

Long Summary


We have been trying to find a way to take the loss function for a statistical model and bound it by a polynomial to calculate an upper bound on the real log canonical threshold. We discussed the Hillary basis theorem which says the ideal generated by all polynomials is the same as the ideal generated by some finite number. We have been looking at the hardest one to check, the quotients, which must abound for all weights. If these conditions are met, equation one will hold and we can rewrite the polynomials as a Gröbner basis for a chosen monomial model.
The quotient of the kth element of the grammar basis can be written as the sum over all multi-indices such that k is in capital d alpha. This is followed by the coefficient of w to the alpha in the polynomial f j and the monomial w to the alpha divided by the leading term of g k. Additionally, a sum over all integers from 2 to infinity is added, and for each m, a sum over all descending chains of multi-indices of length n is added, such that the leading term of g k divides the last element in that chain. Averaging factors for each of the indices in the chain are then taken, as well as the coefficient of the first element in the chain and a collection of real numbers associated to each adjacent pair of indices in the chain. Finally, the polynomial of the monomial w to the theta m on the waiting term of g k is taken.
Division graphs are a useful tool for visualising and simplifying expressions involving tau factors. A division graph is an oriented graph with vertices representing multi-indices in n variables, and arrows between two multi-indices if the corresponding tau factor is non-zero. The expression for ajk is a sum over all descending chains of multi-indices of length m. The only non-zero terms in this expression are those associated with a path in the division graph, which is a sum over all paths of length m in the graph. Each arrow in the path has a tau factor associated with it.
K alpha beta is a sum of all paths of length 2 to infinity from beta to alpha in a division graph, with constants from the expression for the ajks. The expression for the quotients a jk is a sum over all pairs of indices where the leading term of j k divides w to the beta, with the coefficient of alpha in fj and the monomial w to the beta on the leading term of g k. This notation can be used to find a polynomial upper bound for a simple neural network given by the regression function f of x w equals a sine b x plus c sine dx. This example is useful to study because the polynomials that come out are the same for simple 10h networks, allowing for a bound on the kl divergence to be found.
Taylor series is used to calculate the true function which is zero everywhere. The constant is split up between polynomials and a function of x, which is important for convergence conditions. The ideal of the sequence of polynomials is equal to the ideal generated by f0 and f1, which is shown using a recursive statement.
A Grammar basis for an ideal is composed of three polynomials, and the leading terms of each polynomial must be calculated in order to calculate the quotients using the Generic Division Algorithm. The quotients are calculated by summing over pairs of indices where the leading term of the Grammar basis divides the coefficient of the polynomial, multiplied by the monomial of the coefficient divided by the leading term of the Grammar basis. This is done for each of the three polynomials in the Grammar basis, and the Division Graph of the Grammar basis must be calculated in order to calculate the quotients.
Tau alpha beta is a coefficient of the monomial w to the alpha, when the leading term of g1 divides w to the alpha. If alpha is greater than beta, then tau alpha beta 1 is the coefficient of beta in the monomial w to the alpha plus w to the alpha on a b times c d. Tau alpha beta two is the coefficient of beta in the monomial w to the alpha plus one minus one minus one one. Tau alpha beta is the sum of tau alpha beta one and tau alpha beta two.
In the division graph, any index beta can have up to three possible arrows out of it: beta + 1, -1, -1; beta + 0, 2, -2; and beta + 1, -1, -1, 1. The quotients of the first element of the gravity basis g0 will be calculated. fj alpha is only non-zero for two specific monomials: alpha b, a b to the two j plus one; and alpha 0, 0, 1, two j plus one. The expression at the top can be rewritten accordingly.
Alpha and Beta are multi-indices used to calculate aj zero in the division graph. Alpha is set to 1 2 j + 1 0 0 and Beta is set to a multiple of 1 1 0 0. The coefficient of f is a factorial and w to the Beta is the leading term of g zero. The only indices Beta which appear in the expression for one are those which are divisible by a b, and there can be a path in the division graph from Beta to Alpha only if Alpha and Beta lie on the same lattice bound by three vectors v zero, v one and v three.
Alpha has a path to Beta if Gamma = 0,2j + 2j,0 + n1(-2,2) + n2(-1,1) where n1 can be from 0 to j and n2 must be 0. To be an edge in the division graph, B^2CD must divide the endpoint of the step, but all indices in the path have their third component as 0, so B^2CD does not divide.
In this talk, it was shown that for a two neuron network with sign activations, there is a positive constant c1 such that the KL divergence is bounded. This was demonstrated by showing that the sum of the squares of the gravity basis elements is bounded on any compact space. Additionally, it was shown that the quotient of the polynomials aj0 is b to the two j on the square root of 2j plus 1 factorial. This series was shown to converge for all values of b, and the same result was found for the other elements of the Grab and basis. This concludes the talk, demonstrating the upper bound holds.

Raw Transcript


hey hey everyone um yeah so as dan said this will be um part three of the series on talks on from analytic to algebraic so um i'll begin by reminding everyone where we'd gotten to in the last couple of talks and what it was that we were trying to achieve so um uh what we've been trying to do the whole time is find a way to take the callback language divergence so the loss function for a statistical model and bound it by a polynomial and the reason we want to do this is that it would let us calculate a upper bound on the real log canonical threshold for that model um and so i'm hoping to find a method to bound it by a polynomial of this form on the right i'll call this equation one um where these aren't just any polynomials but instead if we take the model's regression function f of x w where w is the weight and take the difference between that and the true function which generates the true probability distribution um then if we can write that as some sort of series where each term is a product of a polynomial in the weights and a function of the input so that's where we're getting these polynomials from and where we get the finite number of them uh the capital j is from the albert basis theorem which we discussed um at the start so which the hillary basis theorem says that the ideal generated by all of those um infinitely or that whole sequence of polynomials is the same as the ideal generated by some finite number okay um so from there a couple talks ago we talked about grammar bases and how once we've found a finite basis for this ideal uh we are free to swap it with a grabner basis for a chosen monomial model and that won't change any of the properties like the rlct so um we'll rewrite um g as g0 up to gs for as um a graven of basis for this ideal and throughout the last couple talks and for this talk we'll be using uh graded lexicographic monomial order um okay so what we found fairly early on in the series of talks is that uh equation one will hold given some convergence conditions on the um on the ejx things and also the fjws uh which i'll skip those two conversions but the sort of the hardest one to check which we've been looking at in the most detail is if the following series so these polynomials are called the quotients uh i'll remind everyone what they are in a second uh but if these series abounded for all w and w um so where h ajk is called equation and they come from being able to write each polynomial in our sequence in terms of the growth in the basis
um but importantly we sort of talked about last week how these quotients aren't aren't going to be unique they could be lots of different things that could give an expression that looks the same and that can make it hard to check this convergence so in particular one way you could get different quotients is by using the division algorithm to calculate them and if you um change the way these revenue basis elements are ordered that will change the equations that are output and that's because the division algorithm at least that makes a choice based on that order um so i'll go to the second board um but because of that issue we investigated the generic division algorithm which is similar to the standard one but instead averages over all the divisors at each step and using that generic division algorithm we were able to find some fairly complicated looking expressions for equations so i'll write that out and then try and um remind everyone what all the uh notation in the following session is so the quotient of the kth element of the grammar basis can be written as the sum over all multi-indices such that k is in capital d alpha so that's another way of saying that the waiting term of g k divides w to the alpha um well lowercase d alpha is the size of that set capital d alpha then we have the coefficient of w to the alpha in the polynomial f j and then we have the monomial w to the alpha divided by the leading term of g k and then we have a sum over all integers from 2 to infinity um and then for each m we add up over all descending chains of multi-indices of length n such that the leading term of g k divides the last element in that chain um then we have all the sort of called average you can call averaging factors for each of those indices in the chain um we take the coefficient of the first element in the chain then we have a collection of real numbers associated to each adjacent pair of indices in the chain um and i'll yeah i'll remind everyone what how these are defined in a second and then the polynomial of the monomial w to the theta m on the waiting term of g k so just to remind everyone um d alpha is uh the set that records which elements of the gravity basis which of those have leading terms which divide w to the alpha and the lowercase d alpha is just the size of that set and then for the tau um factors we first defined given a pair of multi-indices alpha and beta and then index from our congruent basis uh we define that as we first look at the polynomial w to the
alpha divided by the leading term of g k times all of g k and we take the coefficient of w to the beta in that polynomial uh and so this is only defined if k is in d alpha in other words if the leading term of g k divides w to the alpha so that this fraction makes sense and then when we write t alpha beta without any specific third index that is adding up all over all the um basis elements these leading terms divide w to the alpha and we add up those tau alpha beta k factors so both of these things are real numbers um so there's obviously a lot of sort of i mean this is a fairly difficult uh expression to interpret there's quite a bit going on there but i'll go to the next board to talk about division graphs which are a useful way to visualize what's going on in those expressions and so and using the division graph we can also write those expressions in a simpler looking way um okay so given a graph in the basis which will again write as g0 up to gs ah so the division graph is an oriented graph where the vertices are just all multi-indices in n variables and we have an arrow from an index beta to another index alpha um if that tau of beta factor is non-zero uh i'll just mention that um from the definition tau alpha beta can only be non-zero if alpha is greater than or equal to beta in the chosen monomial order and so that means in the division graph you're only ever going to have arrows in non-decreasing directions okay so in the expression for the question ajk we had a sum over all chains all descending chains of multi-indices of length m i know i'll just write out being summoned again okay so um if any of the tau uh beta i beta i plus one factors are zero then the whole term for that chain will be zero so what that's saying is that um the only non-zero terms in this expression um they are ones which um have a chain of indices or multi-indices which form a path in the division graph so yeah so for this product to be non-zero each um pair of adjacent indices have to be connected by an arrow in the division graph so the only remaining terms in this sum are those which are associated to a path in that graph um so in other words this uh sum can be thought of as being a sum over all paths of length m in the division graph so we can just visualize like one of those terms that has a bunch of indices in the division graph and arrows going along in one direction and each arrow has a tau factor okay so moving over to the last board um given two indices alpha and beta
um was where alpha is strictly greater than theta we define k alpha beta as the sum of all paths of length 2 to infinity of r where we add up over all descending chains of multi-indices where the first element in that chain is alpha and the last one is theta um and we include um a bunch of the constants from the expression in uh for the ajks so in other words this is a sum over all paths from beta to alpha in the division graph and we also define k of alpha as one on lowercase d alpha if d alpha is non-zero and zero if the alpha is zero um okay so using uh i i will mention that so the only thing that k alpha beta depends on is um these i guess all these real numbers which in turn only depend on the group the choice of growth and basis so they don't have um anything to do with like the specific fj's so you can think of all of this like the division graph and these k alpha beta things being um it's coming from a graph and basis and nothing else but using this notation we can write rewrite the expression for the quotients a jk as the sum over all pairs of indices where the leading term of j k divides w to the beta and then we just have the we absorb all the complicated notation into k alpha beta then we have the coefficient of alpha in fj and the monomial w to the beta on the leading term of g k so this looks a lot simpler uh i will say that then all the work in doing a calculation is sort of hidden in this factor and so [Music] we are i'll go back to the first board and i think we've got all the definitions we need to actually prove a polynomial upper bound for a simple neural network so um this this um example comes from watanabe's book uh algebraic young june singular learning theory and so it's example 3.2 in his book um so in that example he doesn't uh study it in order to get a polynomial up about i think his focus is more on looking at the um zero sets of the uh kl divergence but we'll use the example to try and yeah find another bound on the kl divergence so the model is given by the regression function f of x w equals a sine b x plus c sine dx where so the input is x and the weights are a b c and d um so this can be thought of as a simple neural network with two neurons and sine activation functions which i guess isn't very uh it's not really a thing that people use in practice um but it's still useful to study this example because it turns out a lot of the polynomials that come out are the same for simple 10h networks and so you can use this example to sort of do most of
the work for the rotanage for starting a turnage network and we also say that the true function which generates the true distribution is just zero everywhere so then we can look at the taylor series for um the model minus the true function and so just using the 10 series of sine what that comes out to be is we just have the usual constants at the front with some factorials um well then each time we take a derivative we can sign one of uh the c and so we're doing the decision we've said that we're taking the taylor series with respect to the input variable x and computing it about x is equals zero um and so then yeah each time you take a derivative with respect to x um a factor of b and d come out so that comes out to be a b to the 2 j plus 1 plus c d to the 2 j plus 1 times x to the 2 j plus 1. and so this is exactly the form that we needed to be in for um i guess remark 7.6 which i guess started this whole series of talks and it's a form i guess we've been assuming the whole time where each term is a product between a polynomial in the weights and a linearly independent function of the input um so where fj of w is 1 on the square root of 2j plus 1 factorial times a b to the t j plus one plus c d e to the two j plus one and e j of x is um minus one to the j um square root of two j plus one factorial x to the two j plus one and so you might have noticed uh there seems to be some level of choice on how we split up this uh constant between the polynomials and the function of x um and so this choice actually turns out to be sort of important um in the first lecture i think it was one of the con we talked about how we need some convergence conditions on the series of ejs and the series of fj's which i've got to skip doing those calculations for this example but i will say when we make this choice of um this choice to split up the constant this way those work out uh i guess the interesting thing is that if you sort of split it up differently those uh convergence conditions might uphold so it can take i guess yeah just a bit of trial and error just see what works um but anyway so we have now have a sequence of polynomials the fj's and what another um he shows that the first two generate the ideal of the whole sequence so the ideal generated by all of those polynomials is equal to the idea generated by f0 and f1 uh and the way he shows that is he finds a recursive statement uh writing f k plus one in terms of uh f zero f one and f k minus one so sort of unraveling that recursive statement
uh you can see that only f0 and f1 uh matter so i'll go to the second board um so what that tells us is f0 which is also i'm gonna actually i'm gonna drop the constants out the front for f0 and f1 um it doesn't matter too much just because uh we're just doing changing it for a finite number um but so i'll say f zero is a b plus c d and f one is ab3 plus cd3 so this is the basis for that ideal on the previous slide um okay so all our methods using the generic division of our algorithm need us to be using a grabner basis for created lexicographic order um so we can calculate that using an algorithm called book workers algorithm oh there's um lots of computer algebra packages that can calculate graph the bases or automatically so a grammar basis for that ideal is g which has three polynomials the same one the first one is actually the same as f zero it's a b plus c d while the second one is um b squared c c d minus c d three and the third one is a c d three plus b c squared d squared um so we'll write these as g0 g1 and g2 and so all our calculations are going to need us to be using the leading terms of these basis elements and so yeah we're using graded lexicographic order which i'll just remind everyone it orders monomials first based on the their total degree and if two binomials have the same total degree it compares them using basically um alphabetical order so the leading term of g0 is a b the leading term of g1 is b squared cd and the leading term of g uh g2 is acg3 okay so then following on from what we looked at before uh we need to calculate the quotients um using this basis and so they were the expression where you got to just a couple slides ago was that a jk is the sum over all pairs of indices where um the leading of g k divides um w to the beta uh then we have this k alpha beta function which was the sum over all paths from beta to alpha in the division graph the coefficient of w to the alpha in f j and then the monomial w to the beta divided by the leading term of g k so we need to calculate this for k equals zero one and two because we have those three elements of our gravity basis and j equals zero up to infinity being the sequence of the fj polynomials so to calculate this i guess the main thing we want to look need to calculate is this k alpha beta thing and that requires us to uh understand the division graph for that crop no basis or in other words we want to calculate tau alpha beta for each pair of multi indices so let's go to the next board and start
computing some of those okay so let's first look at tau alpha beta 1 um so that exists uh i guess yeah by definition that exists when the leading term of g1 divides w to the alpha in other words a b divides w to the alpha um as a multi-index a b can be written as w to the one one zero so this is the same as saying alpha has to equal gamma plus one one zero zero or some multi-index gamma okay so let's assume that a b divides w to the alpha then tau alpha beta 1 is w d alpha on i just saw this should be talking about g 0 based on the way i've numbered the [Music] dragon bases on the previous slide okay so wc alpha divided by the leading term g zero then we multiply that by all of j zero and take the coefficient of beta in that um so writing that out that's uh w to the alpha plus w to the alpha on a b times c d then we take the curve of beta um i should have said at the start we are assuming alpha is greater than beta so obviously uh this first term we can get we can drop because it is not equal to w to the beta so this is the coefficient of theta in the monomial w to the alpha on a b times c d uh writing everything uh in terms of multi-indices that's w to the gamma plus one one zero zero on w to the one one zero zero times double e to the zero zero one one now we take the coefficient of beta that comes out as w to the gamma plus zero zero one one that's just a monomial uh so the coefficient of w to the beta in that monomial is one if gamma plus zero zero one one equals beta and zero otherwise so we can write this in terms of a chronic delta thing um writing this as delta of beta equals gamma plus 0 0 one one ah so this this is just a way of writing the above um uh the above the quality um we can plug alpha back in in place of gamma so we can replace gamma with alpha minus one one zero zero and rearrange the what's inside the delta and that turns out to be delta of alpha equals beta plus one one minus one minus one okay um and then going to the last board so if we repeat if we were to repeat the same sort of calculation we get that tau alpha beta two is minus delta alpha equals zero to zero minus two ah sorry plus b uh let me erase some stuff there so plus beta and the one for the third l oh i just realized i have i messed up again the uh notation of the graph no basis um that should be tau alpha theta one okay and tau alpha beta two is um delta alpha equals beta plus one minus one minus one one and so uh tau alpha beta is the sum over all k's in d alpha tau alpha beta okay so we can write that
as um so i'll write delta zero alpha which equals one if the leading term of g zero divides w to the alpha and is zero otherwise um times the tau alpha beta zero um what we found for that then we have delta one alpha which is one when um the leading term of g1 divides w to the alpha and zero otherwise oh that's a zero and then a similar thing for the final element of the gravity basis so looking at this if we take some multi-index beta which we can look at this and think about all right which alphas could uh make this tau alpha beta thing non-zero and there's only three so alpha sorry alpha could be beta plus one one minus one minus one or beta plus o two o minus two the plus one minus one minus one one so what that tells us is that any index beta can have um three possible arrows out of it in the division graph so in the division graph if we have beta there's at most three things that could have arrows from it specifically theta plus one one minus one minus one theta plus o two o minus two and um theta plus one minus one minus one one uh so i mean all three of these arrows don't have to exist though uh and that's because of these other deltas based on whether the leading term of g0 or g1 or g to divide the endpoint of that arrow okay so now we have those tail terms we can try and actually do the calculation i'm thinking i'm running a bit short on time i'm just wondering dan is it okay if i go like five to ten minutes over yeah no problem we gotta we gotta break after this sure thank you um okay so what we will calculate um we won't calculate all the quotients uh because they'll take quite a while but what we will calculate is the quotients of the first element of the grammar basis g zero and so just to remind everyone of the expression for that it was the sum over all these um okay alpha theta things the coefficients of f j and these monomials okay um so the first thing we can do is we can look at f j and that was uh one on square root of two j plus one factorial a b to the two j plus one plus c d to the two j plus one so there's only two terms in fj and so that means this fj alpha is only non-zero for those two specific monomials so fj alpha is um one on the square root of two j plus one factorial if alpha is one two j plus one zero zero so that's in other words alpha b a b to the two j plus one or if alpha is zero zero one two g plus one and it's zero otherwise so there's only two of these alphas in the expression at the top that matter so we can rewrite
aj zero as sum where we first set alpha is 1 2 j plus 1 0 0 we add up over all smaller indices beta and so this zero in being in d beta thing we can write that as such that a b divides w to the beta um just writing out the expression then the just from above the coefficient from f is this factorial then we have w to the beta on a b which is the leading term of g zero i'll call this term this sum term one and the second one is where we set alpha as zero zero one two j plus one so alpha is c d to the two j plus one term and then everything that appears in that sum is pretty much the same thing oops although this should be zero zero one two j plus one i'll call this one part two uh i will only calculate part one though and the computation of uh expression two is pretty similar so going to the second board okay so we're computing one so let's just set alpha equals one two j plus one zero zero okay so what the process is going to be is we're going to find we're going to use the expression for tau alpha beta to work out first okay which uh indices theta that is smaller than alpha can have paths to alpha in the division graph um and then we're gonna look a bit more look at that in a bit more detail and realize that we can actually rule out most of those potential paths um and then from there it will be quite simple to compute the equation um aj zero okay so first we have that k alpha alpha is one on lower case d alpha so that's the number of uh leading terms in the grammar basis which divide w to the alpha so in this case w to the alpha is a b to the two j plus one while our leading terms were a b b squared c d and a c d 3 and the only one that divides w to the alpha is a b and so that equals one um [Music] meanwhile the only indices beta which appear in the expression for one they are those which um are divisible by a b so um in other words that can only happen if theta can be written as gamma plus 1 1 0 0 for some multi-index gamma um meanwhile if we look at the expression for uh tau alpha beta which is on the spinal board um uh i'll just say alpha prime so this alpha prime is different from the alpha we set fixed up up there but the expression for a general tau alpha fan beta that tells us uh there can be a path um in the division graph from beta to alpha only if alpha and beta lie on the same lattice bound by the three vectors um which i'll write v zero which is one one minus one minus one v one which is o two o minus two and v three which is one minus one minus one one so
in other words for for any path from beta to alpha to possibly exist we need to be able to write alpha equals beta plus n1 times v zero sorry n zero should say times c zero plus n one times v one plus m 2 times v 2. the sum non-negative integers and 0 and 1 and n 2. uh and so alpha was one two j plus one zero zero we had the condition that theta has to equal gamma plus one one zero zero so we can plug that in for beta here and rearrange to solve for gamma and what that gives is that gamma would have to equal zero two j plus two j zero zero plus n one times minus one so we're moving everything over to the other side so just sort the signs oh sorry n zero and two and one sorry o minus two o two plus n two minus one one one nice one and so um [Music] gamma had to have b and multi-index so all of its components had to be non-negative uh that can only happen if um and 0 is zero and n2 is zero otherwise if either frozen on zero the first component of gamma would be negative meanwhile n1 can be anywhere from zero up to j um [Music] okay so what that tells us is [Applause] that um beta which the indices beta which may have paths to alpha ah we can write them we can index them by theta and one they are alpha which was one two j plus one zero zero plus n one o minus two o two um where n one can be zero up to j um but since we were assuming beta is strictly less than alpha we can cut out the n1 equals zero case ah okay and so from a to n1 to get to alpha the uh only possibly the only potential part like path that could exist would be going n1 steps along the vector v1 okay um so what we found is which indices could have paths to alpha and what those paths could be but um the tau term for beta going to beta plus v2 that if we go back to our definition of what tau was is just this delta one um no sorry v1 instead of e2 there um theta plus v1 so what that is is minus 1 if the leading term of g1 which was b squared cd if that divides the endpoint of that step so beta plus v one and zero otherwise so for one of these steps in the above sketch to actually be an error in the division graph we need a b squared c d to divide the endpoint but everything all the indices in this path have have their third component zero um so beta n1 starts at like uh as this starts with component 0 and v1 going along the vector v1 doesn't change the third component um well b squared cd is divisible by c um which in other words is has its third component is one and so b squared c d doesn't divide
um beta n1 plus let's say k b1 for any k so [Music] uh what we can say is this path doesn't exist [Laughter] so what we found is that given uh our choice of alpha which was one two j plus one zero zero there aren't any actual there aren't actually any smaller indices beta with paths to alpha in the division graph so what that tells us is that only the k alpha alpha term contributes to to uh expression one um [Music] and so what that comes out as is that expression one is k alpha alpha times the coefficient of alpha in fj which was this factorial um times uh w to the alpha which is a b to the 2 j plus 1 on the leading term of g 0 which is a b now moving to the final bar um we saw that k alpha alpha was one so that means part one of the expression for the quotient is v to the two j on square root of two j plus one factorial ah okay so then you can do a similar calculation for part two of the expression um and what you get from that is that that whole second sum is actually zero so from that we get that the quotient aj zero is just b to the two j so b to the t j on the square root of 2j plus 1 factorial we can go and check the convergence so we add up all those quotients we've just found for you first square them and what that is is the sum from 0 to infinity of b to the four j on two j plus one vector o and so you can use the ratio test to show that this converges for all values of b um i think you can use yeah you can do something like the first draft and test to show that this is bounded on all compact are white spaces and if you do the same type of calculation or similar calculations you get the same result for this series involving the quotients for the um for g one and g two the other elements of the grab and basis so you can get the the same result the sum of all their squares is bounded on any compact way space so i'll quickly run back to the first board to conclude the talk okay so from all that uh we've shown everything that we need for the upper bound to hold so we can conclude for this model which is the two neuron network with sign activations there's some positive constants c1 so that the kl divergence is smaller than uh then we just square each of the gravity basis elements um but we found we saw that once we have a finite basis we're free to swap it without changing the whipshits equivalents so we can go back to our original basis of the idea so there's some other constant which gives us a bound where our polynomials are just these a b or c d and a b three plus c d three it's