From what little I have read so far, it seems that the issue of texture filtering is essentially a matter of Moiré patterns arising from a zero-fill-factor sampler, when the ground truth should be an integration over the projected area. (Moiré patterns occur when the fill factor of the sampling kernel is less than unity; the effective sampling limit is then set by the spatial frequency of the "gaps" between samples rather than by the sample interval itself.)
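To convince myself of that, here is a toy 1-D comparison (the checker texture, the stride value, and the helper names are all just illustrative): point-sampling a fine checker at a stride near its period produces a spurious low-frequency beat, while averaging over each pixel's footprint stays near the true mean.

```cpp
// Toy 1-D demo: point-sampling a fine checker vs. averaging over the
// pixel footprint. The setup is illustrative, not a real texture pipeline.
#include <cmath>
#include <cstdio>

// "Texture": a square wave with period 2 texels (one row of a checkerboard).
double texel(double u) { return (static_cast<long>(std::floor(u)) % 2 + 2) % 2; }

int main() {
    const double stride = 2.25;  // texels per pixel, slightly off the period
    for (int px = 0; px < 8; ++px) {
        double u = px * stride;
        // Zero-fill-factor sampling: one point per pixel. The output flips
        // at a spurious low "beat" frequency -- the Moire pattern.
        double point = texel(u);
        // Ground truth: integrate the texture over the pixel's footprint.
        double sum = 0;
        const int n = 64;
        for (int i = 0; i < n; ++i) sum += texel(u + stride * (i + 0.5) / n);
        printf("pixel %d: point=%.0f  footprint avg=%.2f\n", px, point, sum / n);
    }
}
```

The point samples come out as a slow 0,0,0,0,1,1,1,1 beat even though the underlying pattern flips every texel, while the footprint average hovers around 0.5.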
Since a square pixel generally projects to a trapezoid on the texture surface, it takes only four points to determine the set of texels being integrated within that trapezoid (think StarCraft 2's minimap frame). In that case we could simply reverse-rasterize it into a "texel fragment" that is the intersection of the trapezoid footprint and the polygon boundary, and just average up the values.
Why then does anisotropic filtering need to send out multiple probes, when four points would suffice and the resulting fragment is contiguous data anyway? It seems like all this extra complexity arises simply from working in the texel-aligned grid rather than in pixel-projected coordinates?
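For concreteness, here is a minimal CPU-side sketch of the scheme I mean, assuming the four projected corners are already available (say, from the UV derivatives); vec2 and the helper names are mine, not any real API. It approximates the integral by averaging every texel whose center falls inside the quad:

```cpp
// Sketch of "reverse-rasterize the pixel footprint": average every texel
// whose center lies inside the projected quad.
#include <algorithm>
#include <cmath>
#include <vector>

struct vec2 { double x, y; };

static double cross(vec2 a, vec2 b) { return a.x * b.y - a.y * b.x; }

// True if p is inside the convex quad q[0..3] (corners given in CCW order).
static bool insideQuad(const vec2 q[4], vec2 p) {
    for (int i = 0; i < 4; ++i) {
        vec2 e{q[(i + 1) % 4].x - q[i].x, q[(i + 1) % 4].y - q[i].y};
        vec2 v{p.x - q[i].x, p.y - q[i].y};
        if (cross(e, v) < 0) return false;
    }
    return true;
}

// quad: the pixel's four projected corners, in texel coordinates.
// tex/w/h: the texture. Returns the mean of the covered texels.
double footprintAverage(const vec2 quad[4],
                        const std::vector<double>& tex, int w, int h) {
    double minX = quad[0].x, maxX = quad[0].x;
    double minY = quad[0].y, maxY = quad[0].y;
    for (int i = 1; i < 4; ++i) {
        minX = std::min(minX, quad[i].x); maxX = std::max(maxX, quad[i].x);
        minY = std::min(minY, quad[i].y); maxY = std::max(maxY, quad[i].y);
    }
    double sum = 0; int count = 0;
    for (int y = std::max(0, (int)std::floor(minY));
         y <= std::min(h - 1, (int)std::ceil(maxY)); ++y)
        for (int x = std::max(0, (int)std::floor(minX));
             x <= std::min(w - 1, (int)std::ceil(maxX)); ++x)
            if (insideQuad(quad, vec2{x + 0.5, y + 0.5})) {
                sum += tex[y * w + x];
                ++count;
            }
    return count ? sum / count : 0.0;
}
```

This weights every covered texel equally; an exact version would weight each texel by its clipped area instead.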
But when rendering an object far away, the pixel trapezoid will cover many texels in the texture map. If you just picked the 4 corners, you'd get shimmering Moiré patterns. If you tried to average across all the texels, it would probably take longer than just random sampling.
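For reference, here is a rough CPU-side analogue of the probe scheme hardware anisotropic filtering uses (sampleMip and the parameter names are mine, not an actual API): the mip level comes from the short axis of the footprint, and a few probes are spaced along the long axis, so the cost is bounded no matter how many texels the trapezoid covers.

```cpp
// Probe-based anisotropic sampling: pick the mip level from the footprint's
// short axis, then space a few probes along its long axis.
#include <algorithm>
#include <cmath>

struct vec2 { double x, y; };

// Placeholder for a trilinear fetch: a procedural checker that ignores lod.
double sampleMip(vec2 uv, double /*lod*/) {
    long c = (long)std::floor(uv.x) + (long)std::floor(uv.y);
    return (c % 2 + 2) % 2;
}

double anisotropicSample(vec2 uv, vec2 dUVdx, vec2 dUVdy, int maxProbes) {
    double lenX = std::hypot(dUVdx.x, dUVdx.y);  // footprint extent along screen x
    double lenY = std::hypot(dUVdy.x, dUVdy.y);  // footprint extent along screen y
    vec2 major = (lenX >= lenY) ? dUVdx : dUVdy; // long axis of the trapezoid
    double majorLen = std::max(lenX, lenY);
    double minorLen = std::max(std::min(lenX, lenY), 1e-8);

    // One probe per minor-axis width along the major axis, capped like hardware.
    int probes = std::max(1, std::min(maxProbes,
                                      (int)std::ceil(majorLen / minorLen)));
    // Each probe only needs to cover majorLen/probes texels.
    double lod = std::log2(std::max(majorLen / probes, 1e-8));

    double sum = 0.0;
    for (int i = 0; i < probes; ++i) {
        double t = (i + 0.5) / probes - 0.5;     // spread probes from -0.5 to 0.5
        sum += sampleMip(vec2{uv.x + major.x * t, uv.y + major.y * t}, lod);
    }
    return sum / probes;
}
```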
You could probably implement this in a GLSL shader. The math for calculating the overlap between a trapezoid and the texel grid is conceptually simple, though it'll be tedious to write out all the cases.
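If you want exact coverage weights, a generic half-plane clip (Sutherland-Hodgman) against each texel cell spares you from enumerating the cases by hand. A CPU-side sketch, with helper names of my own choosing:

```cpp
// Exact area of overlap between a convex quad and one texel cell, via
// Sutherland-Hodgman clipping followed by the shoelace formula.
#include <cmath>
#include <functional>
#include <vector>

struct vec2 { double x, y; };

// Keep the part of `poly` where dist(p) >= 0 (one Sutherland-Hodgman pass).
static std::vector<vec2> clipHalfPlane(const std::vector<vec2>& poly,
                                       const std::function<double(vec2)>& dist) {
    std::vector<vec2> out;
    for (size_t i = 0; i < poly.size(); ++i) {
        vec2 a = poly[i], b = poly[(i + 1) % poly.size()];
        double da = dist(a), db = dist(b);
        if (da >= 0) out.push_back(a);
        if ((da >= 0) != (db >= 0)) {          // edge crosses the boundary
            double t = da / (da - db);
            out.push_back({a.x + t * (b.x - a.x), a.y + t * (b.y - a.y)});
        }
    }
    return out;
}

// Area of the quad `quad` inside the texel cell [x, x+1] x [y, y+1].
double texelCoverage(const std::vector<vec2>& quad, int x, int y) {
    std::vector<vec2> p = quad;
    p = clipHalfPlane(p, [&](vec2 v) { return v.x - x; });        // left edge
    p = clipHalfPlane(p, [&](vec2 v) { return (x + 1) - v.x; });  // right edge
    p = clipHalfPlane(p, [&](vec2 v) { return v.y - y; });        // bottom edge
    p = clipHalfPlane(p, [&](vec2 v) { return (y + 1) - v.y; });  // top edge
    double area = 0;                           // shoelace formula
    for (size_t i = 0; i < p.size(); ++i) {
        vec2 a = p[i], b = p[(i + 1) % p.size()];
        area += a.x * b.y - a.y * b.x;
    }
    return std::fabs(area) * 0.5;
}
```

The filtered value is then the sum of coverage times texel value over the footprint's bounding box, divided by the quad's total area.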