Care to share any of your favorite recent results?
Are there big fields still making gains behind all this computing surge?
Papers and references (a page maintained by a central academic in the world of CRDTs): https://crdt.tech
A group researching how CRDTs can be used to build interesting collaborative (and async) applications: https://www.inkandswitch.com
A few of the major open source implementations, mostly for rich text editing or JSON-like data structures:
- Yjs: https://github.com/yjs/yjs
- Automerge: https://github.com/automerge/automerge
- Peritext: https://www.inkandswitch.com/peritext/
- Diamond types: https://github.com/josephg/diamond-types
People building eventually consistent database syncing with them:
- https://electric-sql.com (Postgres <-> SQLite)
- https://vlcn.io (SQLite <-> SQLite)
Open source collaborative servers (coordination, persistence, presence):
- https://github.com/ueberdosis/hocuspocus
0: https://egraphs-good.github.io/
1: https://github.com/cfallin/rfcs/blob/main/accepted/cranelift...
Basically: given an indexed set of a million (billion, trillion, ...) roughly-1000-dimensional vectors and a query vector of the same dimension, find the closest vector in the set. This is "nearest neighbor" search, and there have been increasingly sophisticated approaches:
https://haystackconf.com/us2022/talk-6/
It has spawned companies like Weaviate, Pinecone, and a half dozen others.
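For reference, the exact brute-force baseline that all the fancier approximate indexes are trying to beat is just a linear scan over squared distances; a rough C++ sketch:

    #include <cstddef>
    #include <limits>
    #include <vector>

    // Exact nearest neighbor by linear scan: O(N * d) per query, which is why
    // approximate indexes exist once N hits millions or billions of ~1000-d vectors.
    std::size_t nearest(const std::vector<std::vector<float>>& index,
                        const std::vector<float>& query) {
        std::size_t best = 0;
        float best_dist = std::numeric_limits<float>::max();
        for (std::size_t i = 0; i < index.size(); ++i) {
            float dist = 0.0f;
            for (std::size_t j = 0; j < query.size(); ++j) {
                float diff = index[i][j] - query[j];
                dist += diff * diff;            // squared L2 distance
            }
            if (dist < best_dist) { best_dist = dist; best = i; }
        }
        return best;                            // index of the closest vector
    }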
BlockQuicksort: How Branch Mispredictions don't affect Quicksort (2016) https://arxiv.org/abs/1604.06697
Quicksort is still the algorithm of choice for general-purpose internal unstable sorting. On modern hardware, quicksort spends a lot of time recovering from branch mispredictions, because its comparison branches are inherently unpredictable. The problem is so severe that, with branchful quicksort, an uneven split with more recursion is faster than an even split, because it makes the branches more predictable! The exact penalty is microarchitecture specific, but one report found a 20:80 split to be optimal.
So... you should use branchless quicksort! The technique was pioneered by an implementation called BlockQuicksort, but pdqsort (pattern-defeating quicksort) https://arxiv.org/abs/2106.05123 is also popular; it incorporates the branchless technique along with other enhancements.
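A minimal sketch of the branchless idea (a simplified Lomuto-style partition, not the exact block scheme BlockQuicksort or pdqsort use): the comparison result becomes arithmetic instead of a conditional jump, so there is nothing to mispredict.

    #include <cstddef>

    // Partition a[0..n) so that elements < pivot end up in a[0..j); returns j.
    // The swap is unconditional and the comparison feeds an index increment,
    // so the inner loop contains no data-dependent branch.
    std::size_t partition_branchless(int* a, std::size_t n, int pivot) {
        std::size_t j = 0;
        for (std::size_t i = 0; i < n; ++i) {
            int x = a[i];
            a[i] = a[j];
            a[j] = x;
            j += (x < pivot);   // 0 or 1, typically compiled to setcc + add
        }
        return j;
    }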
Designing a Fast, Efficient, Cache-friendly Hash Table, Step by Step (2017) https://abseil.io/blog/20180927-swisstables
The hash table is the workhorse of data structures, and while improvements are possible for specific usage patterns, open addressing with linear probing is still the fastest general-purpose implementation. The so-called "Swiss table" combines a cache-friendly metadata layout with SIMD-enhanced probing; the two optimizations reinforce each other and are particularly effective in combination.
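Roughly, the SIMD-probing half looks like this (heavily simplified; the real tables also handle deletion markers, group alignment, and the split of the hash into high and low bits): one control byte per slot, and 16 of them are matched against the lookup hash fragment at once.

    #include <cstdint>
    #include <emmintrin.h>   // SSE2 intrinsics (x86 only)

    // One control byte per slot: 0x80 means "empty", otherwise it holds the low
    // 7 bits of the element's hash. A lookup compares a whole group of 16
    // control bytes against the 7-bit fragment of the query hash in parallel;
    // each set bit in the returned mask is a slot whose full key is then checked.
    uint32_t match_group(const uint8_t* ctrl, uint8_t h2) {
        __m128i group  = _mm_loadu_si128(reinterpret_cast<const __m128i*>(ctrl));
        __m128i needle = _mm_set1_epi8(static_cast<char>(h2));
        return static_cast<uint32_t>(_mm_movemask_epi8(_mm_cmpeq_epi8(group, needle)));
    }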
https://web.stanford.edu/class/cs168/index.html
I didn't have time to go through all of this, but the parts that I did read were very interesting to me.
It's not even cutting-edge 202x material; most of it is probably new-ish results from ~2000 onwards.
- Eventual consistency [2]
- Refinement types [3], [4]
[1] https://www2.cs.uh.edu/~ceick/6340/Streams.pdf
[2] https://www.microsoft.com/en-us/research/wp-content/uploads/...
[3] https://ranjitjhala.github.io/static/trust_but_verify.pdf
[4] https://ranjitjhala.github.io/static/refinement_types_for_ty...
What I mean is: in the comparison model, you start with an empty heap and a length-n sequence of instructions like "insert element x" and "remove minimum element".
There's an algorithm that can tell you in O(n) time which items are still left in the heap at the end (and thus also which items were removed at some point).
The algorithm can't tell you exactly _when_ items were removed, because that would be equivalent to sorting, and sorting requires Ω(n log n) comparisons in the comparison model.
I've worked on this problem on and off for about a decade now as a little hobby project. I would have been happy to prove it impossible or to find a solution. Interestingly, it is possible and the solution is relatively simple, too.
(I have no clue whether this is useful for anything. In practice you probably just run the obvious O(n log n) algorithm which probably has much better constant factors. I'm doing this purely for the theory.)
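For concreteness, that obvious O(n log n) baseline is just replaying the instruction sequence against a real heap; a C++ sketch (the Op encoding here is made up for illustration):

    #include <functional>
    #include <queue>
    #include <vector>

    // Hypothetical instruction encoding: an insert carries a value, a delete-min does not.
    struct Op { bool is_insert; int value; };

    // Replay the sequence and report which items are left in the heap.
    std::vector<int> remaining_after(const std::vector<Op>& ops) {
        std::priority_queue<int, std::vector<int>, std::greater<int>> heap;  // min-heap
        for (const Op& op : ops) {
            if (op.is_insert)       heap.push(op.value);
            else if (!heap.empty()) heap.pop();
        }
        std::vector<int> left;
        while (!heap.empty()) { left.push_back(heap.top()); heap.pop(); }
        return left;  // draining the heap like this also sorts, hence n log n overall
    }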
The data structure / algorithm I found is expressible in a purely functional setting, and (spoiler alert) builds on top of Chazelle's Soft Heap.
- better zero knowledge proofs/arguments
- better consensus protocols (my expertise!)
- multiparty computation is nearing practicality, as is fully homomorphic encryption
Also, lots of fascinating theoretical progress on primitives such as indistinguishability obfuscation, post-quantum schemes, quantum crypto, etc.
Obligatory Quanta link: https://www.quantamagazine.org/finally-a-fast-algorithm-for-...
https://en.m.wikipedia.org/wiki/Combinatorial_optimization
Algorithms to solve these, both exact methods and heuristics, are super useful in real-world applications, and very underappreciated by the HN crowd IMO.
Parsing with errors that are useful for the human is definitely an open research topic.
Capabilities have seen interesting work, in particular in relation to effect handlers.
HyperLogLog and other data sketches keep seeing incremental progress. I have not had time to read it all, but Otmar Ertl has achieved a massive reduction in memory use with UltraLogLog, and I expect a paper on it soon. The code is already up.
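For anyone who hasn't looked inside these sketches, the core of plain HyperLogLog is tiny; a bare-bones C++ sketch (update rule plus the uncorrected estimator only, using a GCC/Clang builtin for leading zeros; the published versions add bias/range corrections, and UltraLogLog packs its registers far more tightly):

    #include <cmath>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Minimal HyperLogLog: 2^p registers; each register remembers the maximum
    // "rank" (position of the first 1 bit) seen among hashes routed to it.
    struct HyperLogLog {
        int p;                                   // e.g. 4 <= p <= 16
        std::vector<uint8_t> reg;
        explicit HyperLogLog(int p_) : p(p_), reg(std::size_t{1} << p_, 0) {}

        void add(uint64_t hash) {                // hash must already be well mixed
            uint64_t idx  = hash >> (64 - p);    // top p bits pick the register
            uint64_t rest = hash << p;           // remaining bits give the rank
            uint8_t rank = rest ? uint8_t(__builtin_clzll(rest) + 1)   // GCC/Clang builtin
                                : uint8_t(64 - p + 1);
            if (rank > reg[idx]) reg[idx] = rank;
        }

        double raw_estimate() const {            // no small/large-range correction
            double m = double(reg.size());
            double sum = 0.0;
            for (uint8_t r : reg) sum += std::ldexp(1.0, -int(r));  // 2^-r
            return (0.7213 / (1.0 + 1.079 / m)) * m * m / sum;      // alpha_m for m >= 128
        }
    };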
[1] https://github.com/terminusdb/terminusdb/blob/dev/docs/white...
It is a cacheline-sized 'container' (CLC) of machine-word-length records, with one record used to store the record order and the remaining bits used for metadata. So you can project different kinds of container semantics, such as FIFO or LRU -- any ordered-set semantics -- onto this meta record. Using arrays of CLCs you can create e.g. a segmented LRU, where the per-item overhead is substantially less than in a conventional pointer-based data structure, which is naturally suited to concurrent operations (for example by assigning a range of the array to distinct worker threads), and where operations touch only a cache line or two. The LRU (or whatever) semantics in aggregate will be probabilistic, as the LRU order is deterministic per unit container only. It is very fast :)
https://github.com/alphazero/libclc/blob/master/include/libc...
https://github.com/alphazero/libclc/blob/master/include/libc...
As for the non-deterministic aspects: since container semantics (e.g. LRU order) are only applied at the unit level, the overall cache is ~LRU. We can strictly quantify the 'ordering error' by observing the age of items in FIFO mode as they are removed: for a deterministic container the age of an item equals the total capacity of the queue, while for a segmented array of FIFO queues the age has an effectively Gaussian distribution around the capacity (number of units x unit capacity). But since these containers can use as few as 9 bits per entry vs. 24 or more bytes for pointer-based solutions (which use linked lists), the capacity of an array of CLCs will be much greater for the same allocation of memory, so the 'short-lived' tail of the distribution will actually outlive items in a strict queue using the same exact memory. Additional techniques, such as n-ary hashing and low-order 'clock' bits at the container level, can tighten this distribution significantly (i.e. ~LRU -> LRU) via better loading factors.
https://docs.dolthub.com/architecture/storage-engine/prolly-...
In a nutshell: B+ trees that are stable w.r.t. insertion order. Great if you want to test them for equivalence/deduplicate them efficiently.
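The core trick is easy to sketch: node boundaries are chosen by hashing the entries themselves, so the same set of keys always chunks the same way no matter what order it was built in. (Illustrative only; the constant and the use of std::hash are made up, and real prolly trees apply this level by level to build the whole tree.)

    #include <cstdint>
    #include <functional>
    #include <string>
    #include <vector>

    // Split a sorted key sequence into leaf nodes using content-defined
    // boundaries: a key ends its node when its hash hits a target pattern.
    std::vector<std::vector<std::string>> chunk(const std::vector<std::string>& sorted_keys) {
        const std::uint64_t target = 64;         // ~64 keys per node on average (assumption)
        std::vector<std::vector<std::string>> nodes(1);
        for (const std::string& k : sorted_keys) {
            nodes.back().push_back(k);
            if (std::hash<std::string>{}(k) % target == 0)
                nodes.emplace_back();            // boundary decided by the data itself
        }
        if (nodes.back().empty()) nodes.pop_back();
        return nodes;
    }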
Implemented in Rust: https://crates.io/crates/differential-dataflow
[1] https://www.siam.org/Portals/0/Conferences/About_SIAM_Confer...
2022 Paper https://arxiv.org/abs/2203.00671
For example, the constraints may be:
1) The graph is read left to right. This informs the layout such that those spiral and circular graph layout algorithms that blow out the graph do not work.
2) Try to keep certain types of nodes aligned, so that they lie on the same horizontal line.
3) Minimize edge crossings.
4) Have the edges be not straight lines but paths made up of horizontal and vertical segments. This probably requires some path finding as well.
5) Be able to live-update, in the sense that inserting a new node or edge does not drastically re-lay out the graph for reasonable inserts.
https://egraphs-good.github.io/
(Incidentally, union-find structures are also great to know about. But they're not exactly "new".)
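For anyone who hasn't met them: a textbook union-find with path compression and union by size is about a dozen lines and gives near-constant amortized merges and lookups, which is exactly what e-graphs lean on.

    #include <cstddef>
    #include <numeric>
    #include <utility>
    #include <vector>

    struct UnionFind {
        std::vector<std::size_t> parent, size;
        explicit UnionFind(std::size_t n) : parent(n), size(n, 1) {
            std::iota(parent.begin(), parent.end(), std::size_t{0});
        }
        std::size_t find(std::size_t x) {
            while (parent[x] != x) {
                parent[x] = parent[parent[x]];      // path halving
                x = parent[x];
            }
            return x;
        }
        bool unite(std::size_t a, std::size_t b) {  // returns false if already merged
            a = find(a); b = find(b);
            if (a == b) return false;
            if (size[a] < size[b]) std::swap(a, b); // union by size
            parent[b] = a;
            size[a] += size[b];
            return true;
        }
    };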
Hyperdimensional computing combines LLM-style AI with Lisp-style AI, making it possible to code within a latent space. Pretty weird, but cool.
Electric Clojure is pretty groundbreaking, possibly similar to React's impact once it gets ported to a more popular platform.
I think there are real opportunities going forward to build dedicated compilers and the corresponding runtimes for distributed computing and related algorithms and data structures. Really, anything that is extremely error-prone, difficult, or complex would benefit from a runtime and compiler that simply solves the problem. MLIR has good infrastructure for that.
Interprocedural data flow analysis is now scaling well in the LLVM/MLIR context: DFI: An Interprocedural Value-Flow Analysis Framework that Scales to Large Codebases
> We introduce algorithms that use predictions from machine learning applied to the input to circumvent worst-case analysis. We aim for algorithms that have near optimal performance when these predictions are good, but recover the prediction-less worst case behavior when the predictions have large errors.
An example is using learning to decide which variable to discretize next in a branch-and-bound integer linear programming solver. The algorithm might be extremely similar to the classic version, but the analysis is new.
https://arxiv.org/abs/2006.09123
More broadly, I think there's a lot of work on applying more refined analysis to classic algorithms - online settings, smoothed analysis, etc.
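A toy illustration of that "good when the prediction is good, never worse than the classic algorithm" pattern (not from the paper above): search a sorted array starting from a predicted position, doubling a window around the guess before binary searching inside it. The cost is O(log error) for good predictions and O(log n) in the worst case.

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Returns the lower_bound index of `key` in sorted `a`, using `guess` as a
    // (possibly learned) prediction of where it lives.
    std::size_t predicted_lower_bound(const std::vector<int>& a, int key, std::size_t guess) {
        const std::size_t n = a.size();
        if (n == 0) return 0;
        std::size_t lo = std::min(guess, n - 1), hi = lo + 1, w = 1;
        // Grow the window until a[lo-1] < key (or lo == 0) and a[hi-1] >= key (or hi == n),
        // which guarantees the answer lies in [lo, hi].
        while ((lo > 0 && a[lo - 1] >= key) || (hi < n && a[hi - 1] < key)) {
            lo = (lo > w) ? lo - w : 0;
            hi = std::min(hi + w, n);
            w *= 2;
        }
        auto it = std::lower_bound(a.begin() + std::ptrdiff_t(lo),
                                   a.begin() + std::ptrdiff_t(hi), key);
        return std::size_t(it - a.begin());
    }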
https://probablydance.com/2023/04/27/beautiful-branchless-bi...
But I had to learn them and employ them a lot lately; they're very good for resource-constrained environments.
Just as an example, when you look at land parcels, you often get these irritating slivers (or splinters, even more apt because they are a pain and you have to dig them out) of land. You and I can look at a plot and point one out, but to a computer ...
Well, one of the hallmarks of a splinter is that there's an awful lot of fencing but very little land. Or put another way, a lot of perimeter to area enclosed. Simple division doesn't suffice here, because the end result ought to be a dimensionless number so that scale doesn't interfere. Alright, perimeter squared divided by area. The lowest you can get is a little over twelve, for a circle. A square gets you sixteen. But a splinter might give you a value well over a hundred.
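A quick sketch of that ratio for a GIS-style point list, assuming a simple closed polygon (shoelace formula for the area, summed edge lengths for the perimeter):

    #include <cmath>
    #include <cstddef>
    #include <vector>

    struct Point { double x, y; };

    // perimeter^2 / area: dimensionless, ~12.57 (4*pi) for a circle, 16 for a
    // square, and very large for long thin slivers. Degenerate (zero-area)
    // polygons will return infinity.
    double sliver_score(const std::vector<Point>& poly) {
        double twice_area = 0.0, perimeter = 0.0;
        const std::size_t n = poly.size();
        for (std::size_t i = 0; i < n; ++i) {
            const Point& a = poly[i];
            const Point& b = poly[(i + 1) % n];      // wrap around to close the ring
            twice_area += a.x * b.y - b.x * a.y;     // shoelace term
            perimeter  += std::hypot(b.x - a.x, b.y - a.y);
        }
        const double area = std::fabs(twice_area) / 2.0;
        return perimeter * perimeter / area;
    }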
Now, this sounds nice. You've got some values that are no-gos (anything under twelve indicates an error), a reasonable bottom limit, a range where a human ought to take a look, and then a third range where you just say, "Yeah, that's a splinter alright, no need to bother a human." Sounds good. But then ... Wyoming.
Scale does end up mattering. A parcel that was two feet across and a hundred long has the same "sliver value" as one that is two hundred feet across and ten thousand feet long. Yet the latter is still quite useful. You can run a road through it and tack on some strip malls. So it isn't useless. Okay, so you say "that's great, we'll just set a lower bound on the shortest side, anything bigger than that and you'll have to reconsider."
... but we've assumed that we have handy little rectangles and that we know what the sides are. It's GIS. We have a bag of points tracing the outline of a polygon. It isn't necessarily just four points. Each side might have a bunch of points that are roughly co-linear. You and I can look at it and say "Yeah, that's a rectangle alright," but the computer cannot. And so here I am trying to develop an algorithm to determine how many sides a human would see versus just this list of points. All so I can better define my splinters.
Problems like this abound and they're quite interesting to think about. It's one of my favorite parts of programming and it requires some deep introspection, some meta-cognition: how do I recognize this, anyway? What is going on in my mind that I can do this?
My opinion is that cache-oblivious programming is a dead end: the constant factors by which cache-aware algorithms win are significant and worth taking.
I'm currently re-studying them just because I'm prepping for interviews. I wonder if whiteboarding DS/algo problems is going to be a thing of the past too, though.
I would imagine there are certainly new data structures and algorithms in the works to aid ML training efficiency.