This video explains the Generic Division Algorithm and its application in finding a polynomial that satisfies a certain property. It uses Grammar bases, the Hilbert basis theorem, and Bookburger's algorithm to divide an element of a polynomial ideal by a group of basis elements. The algorithm produces an expression involving the leading term of the basis element, the coefficient of x to the alpha divided by the leading term, and a sum over all chains of length two to infinity. The output is a remainder and quotients that satisfy the equation of the original polynomial.

The Hilbert basis theorem states that a polynomial ideal can be finitely generated. The polynomial division algorithm can be used to find values of the quotients in the series, but it can output a non-zero remainder which is not helpful for the calculations. Grammar bases are used to fix this issue and are defined by the set of leading terms of the polynomial ideal. Bookburger's algorithm can be used to turn a finite generating set into a Grammar basis, and a lemma states that the sum of the RCT of two sets is equal if they generate the same polynomial ideal.

The Generic Division Algorithm is an alternative to the Standard Division Algorithm for finding a bounding polynomial for the tail divergence of a model that can be written in a series form. It takes a polynomial and a set of divisors as inputs and produces quotients and a remainder as outputs. The algorithm begins by setting the quotients and remainder to zero and the intermediate remainder to the polynomial. It then defines a set, Lambda, which contains the largest monomial in the polynomial and all other monomials smaller than it. At each step, the algorithm subtracts the leading term of the polynomial divided by the leading term of the divisor times the divisor, and averages over all divisors. The aim is to make the division algorithm work independently of any chosen order.

The Generic Division Algorithm is used to find a polynomial that satisfies a certain property. It begins by setting the initial polynomial p0 equal to the original polynomial f and subtracting the average of all divisors whose leading term divides x^alpha1 from f to find p1. This process is repeated for each i step, subtracting the sum of all divisors whose leading terms divide x^alpha2 from p1 and multiplying the coefficient of x^alpha2 in p1 by the sum of tau alpha1 alpha2j terms. The result is a remainder and quotients that satisfy the equation of the original polynomial. Notation is used to calculate the coefficient of x to the theta in a polynomial.

A general expression for polynomials is proposed which involves a sum of single indices, distinct pairs of indices, and triples of distinct indices in descending order. This expression is proved by induction and used for dividing an element of a polynomial ideal by a group of basis elements using the generic division algorithm. The output of the algorithm is an expression for each equation, involving the leading term of the basis element, the coefficient of x to the alpha divided by the leading term, and a sum over all chains of length two to infinity. P1, P2, and P3 are polynomials calculated by subtracting, adding, and subtracting sums of divisors, leading terms, and averaging factors, with coefficients of x to the alpha 2, alpha 1, and alpha 3 respectively.

In this talk, the speaker is continuing a series of talks about finding a polynomial that is equivalent to the Quebec Libra divergence or loss function for a statistical model. The Hilbert basis theorem was used to find conditions which guarantee that the KL divergence for a model is bounded by a polynomial. To check this condition, the speaker discussed the polynomial division algorithm which is used to find values of the quotients in the series. However, the algorithm has some issues which need to be addressed.

A polynomial can be in the ideal generated by f1 to fj, however the division algorithm may output a non-zero remainder, which is not helpful for the calculations. Grammar bases are used to fix this issue and are defined by the set of leading terms of the polynomial ideal. This set may not equal the ideal generated by the polynomials, and the Hilbert basis theorem states that the ideal can be finitely generated.

A Grammar basis for a polynomial ideal is a generating set that has the property that the leading terms of the generating set generate the leading term ideal. Bookburger's algorithm can be used to turn a finite generating set into a Grammar basis. If an element of the ideal is divided by a Grammar basis, the output will always be a zero remainder. A lemma states that if two finite generating sets are used to generate the same polynomial ideal, two constants exist such that the sum of the RCT of the two sets is equal.

Given a model that can be written in a series form, a process to find a bounding polynomial for the tail divergence can be done by first finding a finite generating set. Then, the polynomials in the sequence can be divided by the generating set to get expressions of the form h a zero g0. Lastly, the sum of the squares of these expressions can be checked to see if it is bounded by some positive real number. However, the equations are not unique and the order of the divisors affects the division algorithm.

The Generic Division Algorithm is a modification of the standard polynomial division algorithm, developed by Dan. It uses a graded lexicographic monomial order, as it has the property that there are only finitely many smaller monomials. The algorithm is used to check the convergence of a series, however it is unclear whether different orders of divisors change the convergence. The aim is to make the division algorithm work independently of any chosen order.

The Generic Division Algorithm is an alternative to the Standard Division Algorithm. It takes a polynomial f and a set of divisors as inputs and produces quotients a0 to a s and a remainder as outputs. The algorithm begins by setting the quotients and remainder to zero and the intermediate remainder e to f. It then defines a set, Lambda, which contains the largest monomial in f and all other monomials smaller than it. At each step, the algorithm subtracts the leading term of p divided by the leading term of fj times fj, where fj is the first divisor whose leading term divides the leading term of p, and averages over all divisors.

The Generic Division Algorithm works similarly to the Standard Division Algorithm, outputting a remainder and quotients that satisfy the equation of the original polynomial. The remainder consists of monomials that are not divisible by any of the leading terms of the divisors. The algorithm is independent of the order of the divisors and can be used to find general expressions for equations, which can be used in statistical models to check the upper bound conditions. Notation is used to calculate the coefficient of x to the theta in a polynomial.

An algorithm is defined with the goal of finding a polynomial p that satisfies a certain property. The algorithm begins by setting p0 equal to f and then calculating p1 by subtracting the average of all divisors whose leading term divides x^alpha1 from f. Using this expression, an expression for p2 can be calculated by subtracting the sum of all divisors whose leading terms divide x^alpha2 from p1. This expression is found by multiplying the coefficient of x^alpha2 in p1 by the sum of tau alpha1 alpha2j terms. This process can be repeated to calculate pi for any i steps in the algorithm.

P1, P2, and P3 are polynomials which are calculated by subtracting, adding, and subtracting sums of divisors, leading terms, and averaging factors. Each polynomial has coefficients of x to the alpha 2, alpha 1, and alpha 3 respectively. P2 and P3 also have terms which are sums of pairs and triples of integers between 1 and 3 respectively, each with associated averaging factors and tau terms.

A general expression for polynomials in p after i steps is proposed, which involves a sum of single indices between one and i, distinct pairs of indices, and triples of distinct indices in descending order. This expression is proved by induction, and is used to divide an element of a polynomial ideal by a group of basis elements using the generic division algorithm. The output of the algorithm is an expression for each equation, which involves a sum over all multi-indices such that the leading term of the basis element divides the alpha, the coefficient of x to the alpha divided by the leading term of the basis element, and a sum over all chains of length two to infinity.

today's talk's going to be continuing on from the series of talks i've been giving so far about trying to um find a polynomial that's equivalent to the quebec libra divergence or the loss function for a statistical model so in the first talk we introduced polynomial rings in the hilbert basis theorem which we then used in the second talk to find conditions which guarantee that the kl divergence for a model is um bounded by a polynomial so i'll just remind everyone of the main points of that so the title of the talk is continuing on from before it's from analytic to algebraic part two okay so uh we had like given a statistical model we had the analytic cost function the kl divergence and we'd say in in the second talk that when um you can write your model in terms of a certain series and when uh parts of that series converge you get an upper bound um of the following form or the callback lab where divergence is bounded by an infinite sum of polynomials and so the problem we've been trying to understand is when can we bound that by the sum over a finite subset of those polynomials and so the argument we had for finding such a bound was that the hilbert basis theorem tells us that um we have so we have infinitely many polynomials they'll the ideal they generate will have a finite generating set and so each of those polynomials can be written in terms of the first say capital j of that sequence where these a terms are also polynomials which are called p quotients and so the condition that lets us guarantee this second upper bound uh we found that if you can add up the squares of all those quotients and if you can bound that by some real number for all parameters in the weight space then you can guarantee that this second upper bound holds and so you can bound the kl divergence by a polynomial and so to be able to check this condition we need some methods of actually finding out what these quotients are or finding specific values of them to so we can plug them into this series and see if it converges and is bounded um and so last week i talked about the polynomial division algorithm um which is sort of the main way we would try to find values of the quotients so we'd take each fj and divide it by that finite subset f 0 to f capital j to try and find the quotients um but at the end of the last talk i mentioned that the polynomial division algorithm has some problems so the polynomial division algorithm um has some issues so specifically we can't use it straight away to

find these questions exactly so um it may be true that you have a polynomial in the ideal generated by say f1 to fj and so that just says that you can find there exists an expression of this form but the division algorithm might output um basically a non-zero remainder so it might output an expression of the i'll give you quotients sorry to interrupt spencer uh irritating technical point the thing above your head is about to obscure the board the microphone indicator maybe if you could stand on top of the orb or nearby the orb then it won't be thanks oh let's see so if i yep that's perfect that's good okay so um the division algorithm my output questions that give you an expression that looks like this so it oops wrong thing um it where it has a non-zero remainder uh and so if this happens we can't really use these quotients in the series to check convergence for enough about um so i'll go over to the next board and just remind you i think of an example i think i briefly talked about at the end of last like last talk um so an example of this happening was the polynomial p equals minus y squared plus its head f1 equals xy plus z and f2 equals x plus y so it's true that p is in the ideal generated by f1 and f2 specifically p can be written as f1 minus y times f2 however the division algorithm um outputs p equals zero f one plus zero f two plus p uh so p is the whole remainder um and this is if you're using say lexicographic or graded lexicographic monomial order um and so yeah if something like this happens it's not very helpful for the calculations we're trying to do um and so to try and prevent this problem uh grammar bases are often used so i think they were defined um to basically fix this issue so i'll define them here so there's a first a couple of definitions we need before i introduce grab their bases so um also for now i'll assume that we've picked a monomial order and holding it fixed um and so i guess yeah for the rest of for later parts of the talk i'll be using graded lexicographic order um so if i is a polynomial ideal we define the set of leading terms of i to be [Music] the set containing the leading term of each polynomial that's in i and so we can also think about the ideal that the set of leading terms um generates so what can happen uh if so by the hilbert base the serum h polynomial ideal can be finitely generated so if i say generated by these polynomials um it may be that the idea of leading terms of i does not equal the ideal generated by

the leading terms of the um the generators right f0 up to fj so this is actually what happens in the example at the top of this board so the leading term of p is minus y squared while the leading terms of f1 and f2 x y and x neither of these two monomials divide minus y squared and so the leading term of p doesn't lie in the ideal generated by x y and x and so this is really the cause of the issues that can happen with the division algorithm um so a grammar basis for the ideal i is a generating set for i so that just that means that the ideal generated by those polynomials equals i but one which has the property that the leading terms of that generating set uh generate the reading term ideal file so i'm just going to move over to the next board um and so a couple of properties of cravener bases grapefruit bases um firstly is that they always exist uh so there's a if you have a finite generating set for a polynomial ideal there's an algorithm called um book burgers algorithm which lets you add new elements to that generating set and turn into grammar bases and the other property which is the important one for us is that if f is in the uh ideal generated by a gravina basis also if we have a grouping basis for an ideal and we take an element in that ideal um dividing f by uh that basis always um outputs a zero remainder so it always outputs quotients which satisfy f equals q zero g zero up to q s g s with no remainder and so that's uh exactly what we need to be able to find the quotients and then check the convergence of the series um in the problem we're studying um but just a slight technical point first um the polynomials that we're looking at in our problem so these are the fj of w's they come from if you remember in the second talk they come from writing the regression function for a model minus the true functions as a series of polynomials times uh linearly independent functions of the input um so [Music] coming from that series there's no guarantee that if you take a take the finite subset of those polynomials which generate whole ideal there's no guarantee that that is a graphic basis um but the following lemma tells us that we can take that generating set and replace it with by a graphic basis without having any effect on the rlct which of um on that of on the rct which is the final thing that we're trying to calculate so the lemma says that if you have two finite generating sets for this same polynomial ideal then there exists two constants so that c1 times the sum

over the f j squared is less than or equal to the sum over the gjs squared and that's bounded by c2 times the sum over the first generating sets um in other words that's saying that the two sums or these two polynomials are equivalent according to the uh the lipschitz equivalence definition um from previously um and so that means that if we can instead do the calculation uh using questions to check the up about we can um sorry we can instead check the upper bound for using this grommet basis and that will be equivalent to the upper bound using the basis of the polynomials coming from this writing this series up here okay so i'll just go over to the final board so at the moment it looks like given a model that we can write in a series of that form uh the process we could do to try and find a bounding polynomial for the um tail divergence is first find a finite generating set which i should mention is actually can be quite difficult to actually do uh it's only the whole basis there are only guarantees that one exists but it's actually difficult to work out which what actually works uh in actual examples um then we can swap that with a group of basis [Laughter] then we can divide each um so we have a sequence of infinitely many polynomials we can divide each one by i'm just going to write that group of basis as the set capital g we can divide that each um polynomial in our sequence by the graphing basis g to get expressions of the form h a zero g0 and so on and lastly we can check once we've found all those quotients we can check the when we add up all their squares we can check whether that is bounded by some positive real number um so at this point we weren't really sure whether to go ahead and try this with examples of models um or whether we needed to first do some more work so that the main sort of issue that caused us to stumble is that the equations are not unique so there could be lots of different sets of quotient polynomials that allow they'll let you write expressions like this and so [Music] i guess an example of how that can happen is and specifically the division algorithm depends strongly on the chosen order of the group of the the divisors [Laughter] so um if you remember how the division algorithm is sort of set up you start by writing the divisors in order on the left and the first thing you do is you look at the leading term of the polynomial you're dividing and check first whether the leading term of the first divisor divides it if it does

you use that to cancel it out if it doesn't you go on to the next divisor and see whether its leading term can be used to cancel the leading time of p and so if you were to reorder these divisors you'd get a [Music] you'd usually get a different set of quotients out from the algorithm um and so since we're trying to check the convergence of this series uh it was quite unclear whether uh say if we did compute it for a an example it would be unclear whether if it conversion was bounded we wouldn't know if that was just a fluke because of the order of the devices we happen to use and similarly if it didn't converge uh we also wouldn't really know whether that just happened to also be a fluke um so instead we first investigated ways of making the division algorithm work independently of any chosen order um and so i'll introduce what's called the generic division algorithm for that uh i should say at the moment it's unclear whether these uh the following methods are necessary or crucial for doing calculations um we don't really yet know how much of an impact the this non-uniqueness can have on the convergence um i guess yeah i should say i haven't found any examples where choosing different quotients uh does change the convergence but i also haven't been able to prove that it doesn't matter either so i guess yeah quite a bit of it's unclear but uh i guess regardless of that the i think the maths underpinning the generic division algorithm is still quite interesting and it is interesting to see it work in examples um so i'm going to go back to the first thought to introduce that okay so the generic division algorithm um it's a modification of the standard polynomial division algorithm um so this was i think developed by dan several years ago for another project and while we were studying this problem dan mentioned that then realized that oh maybe this could be quite useful here um so for the rest of the talk i'm going to use graded lexicographic monomial order uh you can use any monomial order that has the property that if you pick any monomial there's only finitely many smaller monomials and so that's the case in any graded order because there's only finitely many monomials of each degree so in the standard uh division algorithm rhythm um if we have the uh what's called the intermediate remainder p so that's just what whatever is left over on the right hand side of the calculation while you're doing long division uh so that's what's left that's the sort of um you know polynomial that's

left that at each step um what we do at each step in the standard division algorithm is we subtract uh the leading term of p divided by the leading term of fj times fj where fj is the first divisor which which um whose leading term divides the leading term of p so um in the generic division algorithm uh instead of just choosing one divisor and subtracting this expression from the intermediate remainder p um we instead average over all the devices that work so [Music] um in the generic division algorithm we subtract so i'll explain what this notation is in a second so capital d alpha is the set of all integers so this index runs over the divisors and the set d alpha just records which um i just realized the way i've written it doesn't make sense um sorry i'll just fix that up okay so instead of writing the waiting time of p up here i'll write um p alpha x to the alpha so p alpha is the coefficient of the monomial x to the alpha in the polynomial p and so then d of capital d alpha is the set that records which of the divisors leading terms divides the monomial x to the alpha and little d alpha is just the number of elements in that set so when we subtract this polynomial from p we're essentially doing the same thing as in the standard division algorithm but instead of taking a single diversities we're averaging over all the divisors which um have leading terms which divide that above p ah and so i'll go to the next board and write the whole algorithm out [Laughter] so the inputs to the algorithm are the [Music] devices [Laughter] and also a polynomial f which is the polynomial that we're dividing by the devices and so the algorithm outputs quotients so write them as a0 up to a s and it also outputs a remainder um okay so the first thing that the algorithm uses is it defines a set called capital lambda which first starts with the largest monomial in f so f is a linear combination of different monomials and so we just take the largest monomial f and record its index which we'll write as alpha one and then this set contains every other monomial sorry every monomial which is smaller than alpha one uh and since there's finitely many and the smallest one will be zero we can just write that set as alpha one up to alpha q um so the algorithm begins by setting the quotients all to zero and the remainder is zero and setting this intermediate remainder e as all of f which is the polynomial we're dividing and that's just the index which keeps track of what step we're at

and so we at each step we will increase that index and we'll subtract from p um that average that i described in the previous board um so uh also um for a step to occur we also need that um there exists some divisor whose leading term divides x to the alpha if there are no devices that divide x to the alpha i we just move on to the next um index in the set capital lambda so we replace the intermediate remainder by subtracting the average of all those polynomials using each of the um divisors whose leading terms divide x to the alpha i and then we also update each quotient based on that subtraction so for each um divisor that is used in that previous step we update the quotient for it uh and so then these this like for loop and while loop uh yeah we then sorry we increase the value of v into index i and then the while we end and then round space i'll go to the next board um the algorithm ah that's the final value of p as the remainder and it outputs the final values of the equations uh so the following proposition says that this generic division algorithm works just as well as the standard division algorithm so firstly the remainder and quotients that it outputs satisfy the equation that the original polynomial equals the quotients times the divisors plus the remainder and the remainder satisfies the same conditions that it does for the standard division algorithm so r consists of [Laughter] monomials um none of which are divisible by any of the waiting terms of the devices [Laughter] and additionally if one of the quotients times the divisors is non-zero then the leading term of the original polynomial we're dividing is always greater than or equal to the leading term of that equation times divisor um and since each step of the algorithm we use the average over each divisor the output of the algorithm is then completely independent of any chosen order of the devices uh and so one thing that this algorithm lets us do is sort of uh work backwards and [Music] find expressions find general expressions for equations and so then we can use these expressions in actual examples of statistical models to try and check the [Music] upper bound conditions so first a bit of notation which we'll be using so if the reading term of f j divides x to the alpha we write tau alpha theta j as so we have this polynomial x the alpha divided by the leading term of f j times all of the divisor f j we can take that polynomial and check the coefficient of x to the [Music] theta in that polynomial and so we write

that as tau alpha beta j um so that's a real number and then we define tau alpha beta with no j as the sum over all of the divisors whose leading term divides x to the alpha um we add up all those tau alpha theta j factors um and if uh if the set d alpha is empty uh so in other words no divisors leading terms divide x to the alpha then we set um tau alpha theta as zero so i'll go over to the final so let's write um pi for the value of p after i steps in the algorithm and so from how the algorithm works uh so we have we start with p0 equals f and then just by looking at the what the algorithm does in the first step we have that p1 is equal to f minus one on the alpha one so the minus the average over all of the divisors whose lean term divides x g alpha one and then we have this polynomial f alpha one x to the alpha 1 divided by the leading term of each divisor times the whole divisor so i'll refer to this 1 and d alpha 1 as a averaging factor okay so then using this we can try and [Music] calculate p2 and p3 and so on and doing so will let us figure out an expression for pi in general so the first thing we need is the coefficient of x to the alpha 2 in p1 and so using the above expression this will be the coefficient of x to the alpha 2 in f minus f alpha 1 on the averaging factor times the sum um of over the divisors and then we have the next the alpha one on the leading term of times f j and we take the coefficient in that polynomial of x to the alpha 2. uh but the term the coefficient of x d alpha 2 of in the polynomial within the parentheses is exactly the um tau alpha 1 alpha 2 j um term that we defined on the previous board so this can be written as f alpha 2 minus f alpha 1 on d of one times the sum over the divisors in the other one time uh we sum over tau alpha one alpha two j and then if you remember this whole sum just equals tau alpha 1 alpha 2 without the j subscript okay and so the reason for calculating this expression for the coefficient of alpha 1 in sorry the coefficient of alpha 2 in e1 is that we can use it to compute an expression for e2 so the value of p at the next step in the algorithm so p2 equals p1 minus the sum over divisors whose leading terms divide x to the alpha two um and we have the averaging factor for that sum which is the alpha 2 um the coefficient of p1 sorry the coefficient of alpha 2 and p1 times x to the alpha 2 on the wing term of the divisors times the times the whole divisor so we can take this expression we found for

the coefficient of alpha 2 and p1 and plug it in here and so if we go back to the first board after a few steps of rearranging what that comes out as is p2 equals f minus the sum over we have a new index t which just sums from one to two um and then we sum over those averages and we have these familiar looking polynomials each summoned and then there's also a sum over the uh divisors these link terms divide x the alpha 2. the coefficient of alpha one and f and then we have a pair of those averaging factors and then this tau alpha one alpha two and so this is coming from just before the uh in the coefficient of alpha two in p1 we saw that there was this tau alpha 1 alpha 2 vector um and then this is all multiplied by the polynomial x to the alpha 2 divided by the weighting term of f j times all of the divisor fj so we can keep going um so we can see what the coefficient of x to the alpha 3 in p2 is and then plug that into the expression for p3 i'll skip the working but what it comes out as is p3 is the uh it equals f minus we have a sum of uh index t equals one to three and then this term is very similar to the this term in the expression for p 2. so we add up all the over all the divisors these leading terms divide alpha 1 or alpha 2 or alpha 3 and then we have the same polynomials as in the patron case uh then there's another term which is then the sum over all pairs of integers or all distinct pairs of integers in ascending order between 1 and 3. um and the sum is then very similar to this second term in p2 so we sum over the divisors whose leading terms divide the uh alpha t where alpha t is uh comes from the larger integer there and we then have a pair of averaging factors associated to each of those integers we take the coefficient of f related to the the smaller integer and then we have the tau term um coming from the pair of indices and then so p1 we subtracted one sum from f p2 we added and subtracted two different sums and so for p3 we then actually have a third sum which um is the sum over all uh divisors whose waiting terms divide x to the alpha three and this sum each term has a triple of um averaging factors associated to each of the indices alpha one alpha two and alpha three we take the coefficient of f associated with these molar the first of those indices and then we have two pairs of these tau terms associated to each adjacent pair of indices and then the usual polynomial at the end okay so from these first three um values of p1 p2 and p3 pattern seems to

be forming um so so in p3 the first sum involves all single indices between alpha 1 and alpha 3. the second sum involves all uh distinct pairs of indices um like alpha s and alpha t and the third sum involves all uh triples of distinct indices when placed in descending order so we can try and gather from those three examples a general pattern um and so we propose that the expression for p after i steps is f and we subtract from it the sum of over all single into indices between one and i and then we have this usual sum over polynomial so we've seen quite a lot now do and then we add up over uh what we can call all chains of multi-indices of lengths m equals 2 up to length i so i guess what i should say is a chain of indices is a collection of multi-indices in descending order according to graded lexicographic order and where the first intercept beta 1 is smaller than alpha 1 which is the leading term of f and the final indices is larger oh greater than or equal to alpha i um and then we sum over each divisor whose leading term divides the smallest indice in the chain um and then there's a as we saw in e1 p2 and p3 there's a alternating sign based on the length of the chain we include the averaging factor for each indicy in the chain and then the tau term for each pair of adjacent indices in the chain and then we also have the coefficient of the first indices in the chain and then the polynomial involving the final indices in the chain of final multi-index i should say so this composition can be proved just by um induction and so it's a similar process of like what we did for finding p1 p2 and p3 um and so i'll just go to the next board to finish off the talk uh and so i'll sort of talk uh discuss the whole point of finding these this general expression for p i [Laughter] so if i is a polynomial ideal um and g0 up to gs is a group in a basis five then dividing say an element of i by this square the basis using the generic division algorithm [Laughter] so that outputs questions which satisfy an expression a0 g0 up to a s gs where now we specifically have expressions for each of those equations aj so aj is the sum over all multi-indices such that the bleeding term of gj actually i'll write that as um such that j is in d alpha so in other words the waiting term of g j divides x to the alpha one one d alpha is the averaging term the coefficient of x the alpha in f x to the alpha divided by the leading term of gj and then we have a sum over all chains of length two to infinity i'll discuss