Gaussian integration is cool

https://rohangautam.github.io/blog/chebyshev_gauss/

202•beansbeansbeans•8mo ago

Comments

constantcrying•8mo ago

A good introduction to the basics.

What is also worth pointing out and which was somewhat glanced over is the close connection between the weight function and the polynomials. For different weight functions you get different classes of orthogonal polynomials. Orthogonal has to be understood in relation to the scalar product given by integrating with respect to the weight function as well.

Interestingly Gauss-Hermite integrates on the entire real line, so from -infinity to infinity. So the choice of weight function also influences the choice of integration domain.

creata•8mo ago

Sorry if this is a stupid question, but is there a direct link between the choice of weight function and the applications of the polynomial?

Like, is it possible to infer that Chebyshev polynomials would be useful in approximation theory using only the fact that they're orthogonal wrt the Wigner semicircle (U_n) or arcsine (T_n) distribution?

constantcrying•8mo ago

Yes. Precisely that they are orthogonal means that they are suitable.

If you are familiar with the Fourier series, the same principle can be applied to approximating with polynomials.

In both cases the crucial point is that you can form an orthogonal subspace, onto which you can project the function to be approximated.

For polynomials it is this: https://en.m.wikipedia.org/wiki/Polynomial_chaos

sfpotter•8mo ago

What you're saying isn't wrong, per se, but...

There are polynomials that aren't orthogonal that are suitable for numerics: both the Bernstein basis and the monomial basis are used very often and neither are orthogonal. (Well, you could pick a weight function that makes them orthogonal, but...!)

The fact of their orthogonality is crucial, but when you work with Chebyshev polynomials, it is very unlikely you are doing an orthogonal (L2) projection! Instead, you would normally use Chebyshev interpolation: 1) interpolate at either the Type-I or Type-II Chebyshev nodes, 2) use the DCT to compute the Chebyshev series coefficients. The fact that you can do this is related to the weight function, but it isn't an L2 procedure. Like I mentioned in my other post, the Chebyshev weight function is maybe more of an artifact of the Chebyshev polynomials' intimate relation to the Fourier series.

I am also not totally sure what polynomial chaos has to do with any of this. PC is a term of art in uncertainty quantification, and this is all just basic numerical analysis. If you have a series in orthgonal polynomials, if you want to call it something fancy, you might call it a Fourier series, but usually there is no fancy term...

constantcrying•8mo ago

I think you are very confused. My post is about approximation. Obviously other areas use other polynomials or the same polynomials for other reasons.

In this case it is about the principle of approximation by orthogonal projection, which is quite common in different fields of mathematics. Here you create an approximation of a target by projecting it onto an orthogonal subspace. This is what the Fourier series is about, an orthogonal projection. Choosing e.g. the Chebychev Polynomials instead of the complex exponential gives you an Approximation onto the orthogonal space of e.g. Chebychev polynomials.

The same principle applies e.g. when you are computing an SVD for a low rank approximation. That is another case of orthogonal projection.

>Instead, you would normally use Chebyshev interpolation

What you do not understand is that this is the same thing. The distinction you describe does not exist, these are the same things, just different perspectives. That they are the same easily follows from the uniqueness of polynomials, which are fully determined by their interpolation points. These aren't distinct ideas, there is a greater principle behind them and that you are using some other algorithm to compute the Approximation does not matter at all.

>I am also not totally sure what polynomial chaos has to do with any of this.

It is the exact same thing. Projection onto an orthogonal subspace of polynomials. Just that you choose the polynomials with regard to a random variable. So you get an approximation with good statistical properties.

sfpotter•8mo ago

No, I am not confused. :-) I am just trying to help you understand some basics of numerical analysis.

> What you do not understand is that this is the same thing.

It is not the same thing.

You can express an analytic function f(x) in a convergent (on [-1, 1]) Chebyshev series: f(x) = \sum_{n=0}^\infty a_n T_n(x). You can then truncate it keeping N+1 terms, giving a degree N polynomial. Call it f_N.

Alternatively, you can interpolate f at at N+1 Chebyshev nodes and use a DCT to compute the corresponding Chebyshev series coefficients. Call the resulting polynomial p_N.

In general, f_N and p_N are not the same polynomial.

Furthermore, computing the coefficients of f_N is much more expensive than computing the coefficients of p_N. For f_N, you need to evaluate N+1 integral which may be quite expensive indeed if you want to get digits. For p_N, you simply evaluate f at N+1 nodes, compute a DCT in O(N log N) time, and the result is the coefficients of p_N up to rounding error.

In practice, people do not compute the coefficients of f_N, they compute the coefficients of p_N. Nevertheless, f_N and p_N are essentially as good as each other when it comes to approximation.

constantcrying•8mo ago

At this point I really hate to ask. Do you know what "orthogonal subspace" means and what a projection is?

sfpotter•8mo ago

Ah, shucks. I can see you're getting upset and defensive. Sorry... Yes, it should be clear from everything I've written that I'm quite clear on the definition of these.

If you would like to read what I'm saying but from a more authoritative reference that you feel you can trust, you can just take a look at Trefethen's "Approximation Theory and Approximation Practice". I'm just quoting contents of Chapter 4 at you.

Again, like I said in my first response to you, what you're saying isn't wrong, it just misses the mark a bit. If you want to compute the L2 projection of a function onto the orthogonal subspace of degree N Chebyshev polynomials, you would need to evaluate a rather expensive integral to compute the coefficients. It's expensive because it requires the use of adaptive integration... many function evaluations per coefficient! Bad!

On the other hand, you could just do polynomial interpolation using either of the degree N Chebyshev nodes (Type-I or Type-II). This requires only N+1 functions evaluations. Only one function evaluation per coefficient. Good!

And, again, since the the polynomial so constructed is not the same polynomial as the one obtained via L2 projection mentioned in paragraph 3 above, this interpolation procedure cannot be regarded as a projection! I guess you could call it an "approximate projection". It agrees quite closely with the L2 projection, and has essentially the same approximation power. This is why Chebyshev polynomials are so useful in practice for approximation, and why e.g. Legendre polynomials are much less useful (they do not have a convenient fast transform).

Anyway, I hope this helps! It's a beautiful subject and a lot of fun to work on.

sfpotter•8mo ago

Chebyshev polynomials are useful in approximation theory because they're the minimax polynomials. The remainder of polynomial interpolation can be given in terms of the nodal polynomial, which is the polynomial with the interpolation nodes as zeros. Minimizing the maximum error then leads to the Chebyshev polynomials. This is a basic fact in numerical analysis that has tons of derivations online and in books.

The weight function shows the Chebyshev polynomials' relation to the Fourier series . But they are not what you would usually think of as a good candidate for L2 approximation on the interval. Normally you'd use Legendre polynomials, since they have w = 1, but they are a much less convenient basis than Chebyshev for numerics.

creata•8mo ago

True, and there are plenty of other reasons Chebyshev polynomials are convenient, too.

But I guess what I was asking was: is there some kind of abstract argument why the semicircle distribution would be appropriate in this context?

For example, you have abstract arguments like the central limit theorem that explain (in some loose sense) why the normal distribution is everywhere.

I guess the semicircle might more-or-less be the only way to get something where interpolation uses the DFT (by projecting points evenly spaced on the complex unit circle onto [-1, 1]), but I dunno, that motivation feels too many steps removed.

sfpotter•8mo ago

If there is, I'm not aware of it. Orthogonal polynomials come up in random matrix theory. Maybe there's something there?

But your last paragraph is exactly it... it is a "basic" fact but the consequences are profound.

efavdb•8mo ago

Could you guys recommend an easy book on this topic?

sfpotter•8mo ago

Which topic?

efavdb•8mo ago

Some sort of numerical analysis book that covers these topics - minimax approx, quadrature etc. I’ve read on these separately but am curious what other sorts of things would be covered in courses including that.

4ce0b361•8mo ago

I'd suggest: Trefethen, Lloyd N., Approximation theory and approximation practice (Extended edition), SIAM, Philadelphia, PA (2020), ISBN 978-1-611975-93-2.

sfpotter•8mo ago

I would check out "An Introduction to Numerical Analysis" by Suli and Mayers or "Approximation Theory and Approximation Practice" by Trefethen. The former covers all the major intro numerical analysis topics in a format that is suitable for someone with ~undergrad math or engineering backgrounds. The latter goes deep into Chebyshev approximation (and some related topics). It is also very accessible but is much more specialized.

seanhunter•8mo ago

I thought when I first saw the title that it was going to be about the Gaussian integral[1] which has to be one of the coolest results in all of maths.

That is, the integral from - to + infinity of e^(-x^2) dx = sqrt(pi).

I remember being given this as an exercise and just being totally shocked by how beautiful it was as a result (when I eventually managed to work out how to evaluate it).

[1] https://mathworld.wolfram.com/GaussianIntegral.html

niklasbuschmann•8mo ago

Gaussian integrals are also pretty much the basis of quantum field theory in the path integral formalism, where Isserlis's theorem is the analog to Wick's theorem in the operator formalism.

srean•8mo ago

Indeed.

It's the gateway drug to Laplace's method (Laplace approximation), mean field theory, perturbation theory, ... QFT.

https://en.m.wikipedia.org/wiki/Laplace%27s_method

chermi•8mo ago

Maybe I should just read the wiki you linked, but I guess I'm confused on how this is different than steepest descent? I'm a physicist by training so maybe we just call it something different?

dawnofdusk•8mo ago

They are the same for physicists who analytically continue everything. Steepest descent is technically for integrals in the complex plane, and Laplace's method is for integrating over the real line.

srean•8mo ago

It is Laplace's method of steepest decent. When the same method is used to approximate probability density function, for example the posterior probability density, it's called Laplace's approximation of the density.

The wikipedia link would have made things quite clear :)

constantcrying•8mo ago

There is a relationship here, in the case of Gauß-Hermite Integration, where the weight function is exactly e^(-x^2) the weights have to add up sqrt(pi), because the integral is exact for the constant 1 polynomial.

extrabajs•8mo ago

What is Fig. 1 showing? Is it the value of the integral compared with two approximations? Would it not be more interesting to show the error of the approximations instead? Asking for a friend who isn’t computing a lot of integrals.

mturmon•8mo ago

Fig 1 could use a rethink. It uses log scale, but the dynamic range of the y-axis is tiny, so the log transform isn't doing anything.

It would be better shown as a table with 3 numbers. Or, maybe two columns, one for integral value and one for error, as you suggest.

rd11235•8mo ago

Yeah - my guess is this was just a very roundabout solution for setting axis limits.

(For some reason, plt.bar was used instead of plt.plot, so the y axis would start at 0 by default, making all results look the same. But when the log scale is applied, the lower y limit becomes the data’s minimum. So, because the dynamic range is so low, the end result is visually identical to having just set y limits using the original linear scale).

Anyhow for anyone interested the values for those 3 points are 2.0000 (exact), 1.9671 (trapezoid), and 1.9998 (gaussian). The relatives errors are 1.6% vs. 0.01%.

tim-kt•8mo ago

No, the integral cannot be "expressed" as a sum over weights and function evaluations (with a "="), it can be approximated with this idea. If you fix any n+1 nodes, interpolate your function, and integrate your polynomial, you will get this sum where the weights are integrals over (Lagrange) basis polynomials. By construction, you can compute the integral of polynomials up to degree n exactly. Now, if you choose the nodes in a particular way (namely, as the zeros of some polynomials), you can increase this to up to 2n+1. What I'm getting at is that the Gaussian integration is not estimating the integrals of polynomials of degree 2n+1, but it's evaluating them exactly.

Onavo•8mo ago

Convolution?

tim-kt•8mo ago

What about it?

beansbeansbeans•8mo ago

You're totally correct, my writeup in that section definitely was not clear. I've updated the blog, hopefully it's better. I've also given you a shoutout in the end of the post in my edit log, if that's cool with you

tim-kt•8mo ago

Sure, thanks. I also saw this sentence, which was not there in the last version (I think):

> That is to say, with n nodes, gaussian integration will approximate your function's integral with a higher order polynomial than a basic technique would - resulting in more accuracy.

This is not really the case, Gaussian integration is still just interpolation on n nodes, but the way of choosing the nodes increases the integration's exactness degree to 2n-1. It's actually more interesting that Gaussian integration does not require any more work in terms of interpolation, but we just choose our nodes better. (Actually, computing the nodes is sometimes more work, but we can do that once and use them forever.)

adrian_b•8mo ago

You are right and almost all methods of numerical integration (in any case all those that are useful and I am aware of) are equivalent to approximating the target function with another simpler function, which is then integrated using an exact formula.

So the estimation error is introduced at the step where a function is approximated with another function, which is usually chosen as either a polynomial or a polynomial spline (composed of straight line segments for the simplest trapezoidal integration), not at the actual integration.

Fortunately, for well-behaved functions, when they are approximated by a suitable simpler function, the errors that this approximation introduces in the values of function integrals are smaller than the errors in the interpolated values of the function, which are in turn smaller than the errors in the values estimated at some point for the derivative of the function (using the same approximating simpler function).

tim-kt•8mo ago

> [...] almost all methods of numerical integration (in any case all those that are useful and I am aware of) are equivalent to approximating the target function with another simpler function, which is then integrated using an exact formula.

The key property of quadrature formulas (i.e. numerical integration formulas) is the degree of exactness, which just says up to which degree we can integrate polynomials exactly. The (convergence of the) error of the quadrature depends on this exactness degree.

If you approximate the integral using a sum of n+1 weights and function evaluations, then any quadrature that has exactness degree n or better is in fact an interpolatory quadrature, that is, it is equivalent to interpolating your function on the n+1 nodes and integrating the polynomial. You can check this by (exactly) integrating the Lagrange basis polynomials, through which you can express the interpolation polynomial.

sfpotter•8mo ago

Something interesting here is that although Clenshaw-Curtis rules with N nodes only exactly integrate degree N-1 polynomials while Gauss rules integrate degree 2N-1, they are frequently competitive with Gauss rules when integrating general analytic functions. So while the space of polynomials being integrated is crucial, there are nevertheless many possible interpolatory quadrature rules and how they behave is not completely straightforward.

drmikeando•8mo ago

My mind was exploded by this somewhat similar technique https://en.wikipedia.org/wiki/Tanh-sinh_quadrature - it uses a similar transformation of domain, but uses some properties of optimal quadrature for infinite sequences in the complex plane to produce quickly convergent integrals for many simple and pathological cases.

actinium226•8mo ago

While I love seeing mathy articles on HN, the lead image in this article is terrible.

At first glance I see we're talking about numerical integration so I assuming the red part is this method that is being discussed and that it's much better than the other two. Then I look at the axis, which the caption notes is log scale, and see that it goes from 1.995 to 2. Uh-oh, is someone trying to inflate performance by cutting off the origin? Big no-no, but then wait a minute, this is ground truth compared to two approximations. So actually the one on the right is better. But the middle one is still accurate to within 0.25%. And why is it log scale?

Anyway point is, there's lots of room for improvement!

In particular it's probably not worth putting ground truth as a separate bar, just plot the error of the two methods, then you don't need to cut off the original. And ditch the log scale.

beansbeansbeans•8mo ago

Yeah - I totally agree. Some others on the thread pointed it out as well. I ended up replacing it with a table additionally having the error percentages. Thanks for the fair critique! I've also mentioned your username in the edit logs at the end of the blog if you don't mind.

algorithmsRcool•8mo ago

Did anyone else's antivirus complain about an exploit on this page?

---EDIT---

I'm about 98% sure this blog has a browser hijack embedded in it targeted at windows+MSEDGE browsers that attempted to launch a malicious powershell script to covertly screen record the target machine

beansbeansbeans•8mo ago

That's a major claim. The only thing different in this blog post from my others is that I've embedded an executable python notebook in an iframe. It's a marimo notebook that runs code using WASM in your browser. That project is open source too, with no exploit as far as I know.

The code for my blog is here : https://github.com/RohanGautam/rohangautam.github.io

If you could point to anything specific to support that claim, would be nice.

algorithmsRcool•8mo ago

I would be happy to be wrong on this one. But I've gotten two pretty convincing threat notifications when visiting the page from the Sentinel One antivirus platform saying that my msedge process had been compromised by a known exploit.

I'll try to get more details.

I should note, I do not believe the site is malicious, but i am worried about 3rd party compromise of the site without the owner's knowledge

beansbeansbeans•8mo ago

I see, that's strange. Yeah, feel free to share the details/logs with me - you can open an issue on my blog's repo with the relevant details and system info.

However, I'm still suspecting it's something specific to your antivirus not knowing what to do with WASM code(which is used on this page). I found something similar on Reddit: https://www.reddit.com/r/eaglercraft/s/heVtPy60lG. I wonder if that's the issue.

BLanen•8mo ago

You should not be using antivirus browser plugins anyway.

algorithmsRcool•8mo ago

This was not from a browser plugin, this was from my system antivirus

Malipeddi•8mo ago

This is a neat and accessible introduction to the topic. Nice work! I want to mention nesting quadratures like Gauss-Kronod like Clenshaw-Curtis/Fejér. The nesting means that the quadrature points for lower order are included in the points required for higher orders. This allows you to increase your order of accuracy if required with fewer total evaluations.

Show HN: LocalGPT – A local-first AI assistant in Rust with persistent memory

Haskell for all: Beyond agentic coding

SectorC: A C Compiler in 512 bytes (2023)

Speed up responses with fast mode

Software factories and the agentic moment

Homeland Security Spying on Reddit Users

Brookhaven Lab's RHIC concludes 25-year run with final collisions

Hoot: Scheme on WebAssembly

LLMs as the new high level language

Stories from 25 Years of Software Development

Total Surface Area Required to Fuel the World with Solar (2009)

First Proof

Vocal Guide – belt sing without killing yourself

Show HN: I saw this cool navigation reveal, so I made a simple HTML+CSS version

Al Lowe on model trains, funny deaths and working with Disney

FDA intends to take action against non-FDA-approved GLP-1 drugs

Vouch

Show HN: A luma dependent chroma compression algorithm (image compression)

Show HN: Axiomeer – An open marketplace for AI agents

Start all of your commands with a comma (2009)

The AI boom is causing shortages everywhere else

Microsoft account bugs locked me out of Notepad – Are thin clients ruining PCs?

I write games in C (yes, C) (2016)

Learning from context is harder than we thought

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

Selection rather than prediction

Where did all the starships go?

The F Word

Unseen Footage of Atari Battlezone Arcade Cabinet Production

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

Show HN: LocalGPT – A local-first AI assistant in Rust with persistent memory

Haskell for all: Beyond agentic coding

SectorC: A C Compiler in 512 bytes (2023)

Speed up responses with fast mode

Software factories and the agentic moment

Homeland Security Spying on Reddit Users

Brookhaven Lab's RHIC concludes 25-year run with final collisions

Hoot: Scheme on WebAssembly

LLMs as the new high level language

Stories from 25 Years of Software Development

Total Surface Area Required to Fuel the World with Solar (2009)

First Proof

Vocal Guide – belt sing without killing yourself

Show HN: I saw this cool navigation reveal, so I made a simple HTML+CSS version

Al Lowe on model trains, funny deaths and working with Disney

FDA intends to take action against non-FDA-approved GLP-1 drugs

Vouch

Show HN: A luma dependent chroma compression algorithm (image compression)

Show HN: Axiomeer – An open marketplace for AI agents

Start all of your commands with a comma (2009)

The AI boom is causing shortages everywhere else

Microsoft account bugs locked me out of Notepad – Are thin clients ruining PCs?

I write games in C (yes, C) (2016)

Learning from context is harder than we thought

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

Selection rather than prediction

Where did all the starships go?

The F Word

Unseen Footage of Atari Battlezone Arcade Cabinet Production

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

Gaussian integration is cool

Comments