"Estimating the number of unseen species: A bird in the hand is worth log(n) in the bush" https://arxiv.org/abs/1511.07428 https://www.pnas.org/content/113/47/13283
It deals with the classic, and wonderful, question of "If I go and catch 100 birds, and they're from 20 different species, how many species are left uncaught?" There's more one can say about that than it might first appear and it has plenty of applications. But mostly I just love the name. Apparently PNAS had them change it for the final publication, sadly.
https://www.cambridge.org/core/services/aop-cambridge-core/c...
I've always hated build systems. Stuff cobbled together that barely works, yet a necessary step towards working software. This paper showed me there's hope. If we take build systems seriously, we can come up with something much better than most systems out there.
Non-invasive early detection of cancer four years before conventional diagnosis using a blood test
https://www.nature.com/articles/s41467-020-17316-z
Major breakthrough in cell-free diagnostics. The methylation pattern of DNA can be used to identify early-stage cancer, i.e. circulating tumor DNA (ctDNA) has a distinct methylation pattern.
The results are based on data from a ten-year study, which must have cost a fortune to run.
1. Attention is not Explanation (https://arxiv.org/abs/1902.10186)
2. Attention is not not Explanation (https://arxiv.org/abs/1908.04626)
Goes to show the complete lack of agreement among researchers in the explainability space. Most popular packages (AllenNLP, Google LIT, Captum) use saliency-based methods (e.g. integrated gradients) or attention. The community has fundamental disagreements over whether these capture anything equivalent to importance as humans would understand it.
An entire community of fairness, ethics, and computational social science research is built on top of conclusions drawn using these methods. It is a shame that so much money is poured into these fields, yet there does not seem to be as strong a thrust to explore the most fundamental questions themselves.
(my 2 cents: I like SHAP and the stuff coming out of Bin Yu's and Tommi Jaakkola's labs better... but my opinion too is based on intuition without any real rigor)
There are lots of good bits, such as: 'On the practical side, not only is there no age at which humans are performing at peak on all cognitive tasks, there may not be an age at which humans perform at peak on most cognitive tasks. Studies that compare the young or elderly to “normal adults” must carefully select the “normal” population.' (italics in original)
This seems to me to comport with the research suggesting that most or all of the variance in IQ across the life span can be accounted for by controlling for mental processing speed; i.e., you are generally faster when you are younger, but you are not more correct when you are younger.
The idea that you can achieve the same practical effect as a 3x replication factor in a distributed system while increasing the cost of data storage by only 1.6x, by leveraging some clever information theory tricks, is mind-bending to me.
If you're operating a large Ceph cluster, or you're Google/Amazon/Microsoft running GCS/S3/ABS, and you needed 50PB of HDDs before, you only need 27PB now (if implementing this).
The cost savings and environmental impact reduction this allows for are truly enormous; I'm surprised how little attention this paper has gotten in the wild.
[0] https://www.microsoft.com/en-us/research/wp-content/uploads/...
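The arithmetic behind those numbers is worth making explicit. A minimal sketch, with illustrative (k, m) parameters rather than the paper's exact code construction:

```python
def storage_overhead(k: int, m: int) -> float:
    """Raw bytes stored per byte of user data for a (k, m) erasure code:
    k data fragments plus m parity fragments, tolerating any m losses."""
    return (k + m) / k

# 3-way replication is the degenerate code with 1 data + 2 parity copies.
replication = storage_overhead(1, 2)      # 3.0x, tolerates 2 losses

# A wide code such as 10 data + 6 parity fragments (illustrative numbers,
# not necessarily the paper's scheme) also tolerates up to 6 losses:
erasure = storage_overhead(10, 6)         # 1.6x

# The 50PB -> 27PB figure from the comment above:
user_data = 50 / replication              # ~16.7PB of actual user data
needed_now = user_data * erasure          # ~26.7PB of raw disk
print(f"{replication:.1f}x vs {erasure:.1f}x: {needed_now:.1f}PB of raw disk")
```

The trade-off the clever coding theory buys back is reconstruction cost: rebuilding a lost fragment touches many surviving fragments instead of one replica.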
And the earlier paper “A Polymorphic Type System for Extensible Records and Variants” https://web.cecs.pdx.edu/~mpj/pubs/96-3.pdf
Row types are magically good: they serve either records or variants (aka sum types aka enums) equally well and both polymorphically. They’re duals. Here’s a diagram.
               Construction                  Inspection
  Records      {x:1} : {x:Int}               r.x — r : {x:Int|r}
               [closed]                      [open; note the row variable r]
  Variants     ‘Just 1 : ...                 case v of ‘Just 0 -> ...
               [open; note the row var v]    [closed]
Neither has to be declared ahead of time, making them a perfect fit for the balance between play and serious work in my programming language.
https://arxiv.org/abs/1706.03762
It's from 2017 but I first read it this year. This is the paper that defined the "transformer" architecture for deep neural nets. Over the past few years, transformers have become a more and more common architecture, most notably with GPT-3 but also in other domains besides text generation. The fundamental principle behind the transformer is attention: the model relates every pair of positions in a length-n input, so the computation scales as O(n^2) in sequence length, but the number of learned parameters does not grow with n.
If you are interested in GPT-3 and want to read something beyond the GPT-3 paper itself, I think this is the best paper to read to get an understanding of this transformer architecture.
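For a concrete feel for the core operation, here is a minimal NumPy sketch of the scaled dot-product attention from the paper (toy dimensions, and the learned Q/K/V projection matrices are omitted for brevity):

```python
import numpy as np

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, computed row-wise."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n, n): all position pairs
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ V                              # (n, d_v)

rng = np.random.default_rng(0)
n, d = 5, 8                                         # 5 positions, 8-dim vectors
X = rng.normal(size=(n, d))
out = attention(X, X, X)                            # self-attention
print(out.shape)                                    # (5, 8)
```

Note the (n, n) weight matrix: the computation is quadratic in sequence length, while the parameters (the projections producing Q, K, V in the real model) are fixed-size.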
1) Michael, C. J., Acklin, D., & Scheuerman, J. (2020). On interactive machine learning and the potential of cognitive feedback. ArXiv:2003.10365 [Cs]. http://arxiv.org/abs/2003.10365
2) Denton, E., Hanna, A., Amironesei, R., Smart, A., Nicole, H., & Scheuerman, M. K. (2020). Bringing the people back in: Contesting benchmark machine learning datasets. ArXiv:2007.07399 [Cs]. http://arxiv.org/abs/2007.07399
3) Jo, E. S., & Gebru, T. (2020). Lessons from archives: Strategies for collecting sociocultural data in machine learning. Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 306–316. https://doi.org/10.1145/3351095.3372829
Also a great read related to IML tooling for audio recognition:
1) Ishibashi, T., Nakao, Y., & Sugano, Y. (2020). Investigating audio data visualization for interactive sound recognition. Proceedings of the 25th International Conference on Intelligent User Interfaces, 67–77. https://doi.org/10.1145/3377325.3377483
http://www.pnas.org/lookup/doi/10.1073/pnas.1915006117
You might think that it's possible to use machine learning to predict whether people will be successful using established socio-demographic, psychological, and educational metrics. It turns out that it's very hard, and simple regression models outperform the fanciest machine learning ideas for this problem.
The way this study was done is also interesting and paves the way for new kinds of collaborative scientific projects that take on big questions. It draws on communities like Kaggle, but applies the approach to scientific questions, not just pure prediction problems.
Fellow HNers seem to have liked a lot of ML papers, and this one doesn't break the trend. This is a great meta paper questioning the goal of the field itself, and proposing ways to formally evaluate intelligence in a computational sense. Chollet is even ambitious enough to propose a proof-of-concept benchmark! [2] I also like some out-of-the-box methods people tried to get closer to a solution, like this one combining cellular automata and ML [3]
[1] https://arxiv.org/abs/1911.01547 [2] https://github.com/fchollet/ARC [3] https://www.kaggle.com/arsenynerinovsky/cellular-automata-as...
A good incremental improvement in service level indicator measurements for large-scale cloud services.
Obligatory The Morning Paper post: https://blog.acolyer.org/2020/02/26/meaningful-availability/
In computing theory, when do you actually need coordination to get consistency? They partition the space into two kinds of algorithms, and show that only one kind needs coordination.
CACM, 9/2020. https://cacm.acm.org/magazines/2020/9/246941-keeping-calm/fu...
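The intuition can be sketched with the canonical monotone example, set union (my own illustration, not code from the paper): a grow-only set reaches the same state under every delivery order, so replicas agree without coordinating. A non-monotone question like "is x absent?" can flip its answer as more data arrives, and that is where coordination becomes necessary.

```python
import itertools

updates = [{"a"}, {"b"}, {"a", "c"}]

results = set()
for perm in itertools.permutations(updates):
    state = set()
    for u in perm:              # apply updates in this delivery order
        state |= u              # monotone: the set only ever grows
    results.add(frozenset(state))

print(results)                  # one outcome, regardless of delivery order
```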
Some pretty mind-blowing insights. For example: if you replace one layer's weights in a trained classification network with that layer's initialisation weights (or some intermediate checkpoint), many networks show relatively unaffected performance for certain layers, which is read as a form of generalisation since it amounts to parameter reduction. However, if you replace them with fresh random weights (even though the initialisation state is itself just another set of random weights), the loss is high! Some layers are more sensitive to this than others across different network architectures.
I recently summarised this to a friend who asked "what's the most important insight in deep learning?" - to which I said - "in a sufficiently high dimensional parameter space, there is always a direction in which you can move to reduce loss". I'm eager to hear other answers to that question here.
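The experimental protocol is easy to sketch on a toy network (my own illustration, not the paper's code; whether a net this small reproduces the paper's layer-robustness finding is not guaranteed): train, then re-evaluate the loss with the first layer rewound to its checkpointed initialisation versus replaced by fresh random weights.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = np.tanh(X @ rng.normal(size=(10, 1))) + 0.1 * rng.normal(size=(200, 1))

def loss(W1, W2):
    return float(np.mean((np.tanh(X @ W1) @ W2 - y) ** 2))

W1 = rng.normal(size=(10, 16)) * 0.1
W2 = rng.normal(size=(16, 1)) * 0.1
W1_init = W1.copy()                        # checkpoint the initialisation
loss_at_init = loss(W1, W2)

for _ in range(1000):                      # plain gradient descent
    H = np.tanh(X @ W1)
    err = (H @ W2 - y) / len(X)
    gW2 = H.T @ err
    gH = (err @ W2.T) * (1 - H ** 2)
    gW1 = X.T @ gH
    W1 -= 0.05 * gW1
    W2 -= 0.05 * gW2

print("trained:        ", loss(W1, W2))
print("rewound to init:", loss(W1_init, W2))   # layer 1 reset to its init
print("fresh random:   ", loss(rng.normal(size=(10, 16)) * 0.1, W2))
```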
2) Snowflake and its tiered storage, among other things http://pages.cs.wisc.edu/~yxy/cs839-s20/papers/snowflake.pdf
Gradualizing the Calculus of Inductive Constructions (https://hal.archives-ouvertes.fr/hal-02896776/)
I'm not sure if this is precisely the direction things should go in order to improve the utilisation of specification within software development, but it's a very important contribution. As yet my favourite development style has been with F-star, but F-star also leaves me a bit in the lurch when the automatic system isn't able to find the answer: too much hinting in the case of hard proofs.
Eventually there will be a system that lets you turn the crank up on specification late in the game, allows lots of the assertions to be discharged automatically, and then finally saddles you with the remaining proof obligations in a powerful proof assistant.
Chen, Y.W.; Yiu, C.B.; Wong, K.Y. Prediction of the SARS-CoV-2 (2019-nCoV) 3C-like protease (3CL (pro)) structure: Virtual screening reveals velpatasvir, ledipasvir, and other drug repurposing candidates. F1000Research 2020, 9, 129.
This paper (based on a machine learning-driven open source drug docking tool from Scripps Institute) from Feb/Mar formed the basis for the agriceutical venture I started for supporting pandemic management in Africa. We’re in late stage trialing talks with research institutes here in East Africa.
Thought the Naiad project is really cool!
https://cs.stanford.edu/~matei/courses/2015/6.S897/readings/...
An interesting (no pun intended) paper on what makes papers (or anything in general) interesting.
It’s fairly accessible to anyone who vaguely remembers their CS theory, and quite fun!
Automerge [2] implements a variant of this.
Which is actually great because it gives me something to read on subjects I'm not familiar with.
Direct Feedback Alignment Scales to Modern Deep Learning Tasks and Architectures [2] applies DFA, an approach to training neural nets without backprop, to modern architectures like the Transformer. It does surprisingly well and is a step in the right direction for biologically plausible neural nets as well as potentially significant efficiency gains.
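To make the DFA idea concrete, here's a toy NumPy sketch (my own illustration, not the paper's code): the output error reaches the hidden layer through a fixed random matrix B instead of through the transpose of the output weights, as backprop would require.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 20))
y = np.sin(X[:, :1])                      # toy regression target

W1 = rng.normal(size=(20, 32)) * 0.1      # hidden layer
W2 = rng.normal(size=(32, 1)) * 0.1       # output layer
B = rng.normal(size=(1, 32))              # fixed random feedback, never trained

mse0 = float(np.mean((np.tanh(X @ W1) @ W2 - y) ** 2))

lr = 0.02
for _ in range(500):
    H = np.tanh(X @ W1)                   # forward pass
    e = (H @ W2 - y) / len(X)             # output error
    W2 -= lr * (H.T @ e)                  # output layer: ordinary delta rule
    dH = (e @ B) * (1 - H ** 2)           # DFA: error via B, not via W2.T
    W1 -= lr * (X.T @ dH)

mse = float(np.mean((np.tanh(X @ W1) @ W2 - y) ** 2))
print(f"mse: {mse0:.3f} -> {mse:.3f}")
```

The biological-plausibility angle is that B requires no weight transport: the backward pathway never needs to mirror the forward weights.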
Hopfield Networks is All You Need [3] analyzes the Transformer architecture as the classical Hopfield Network. This one got a lot of buzz on HN so I won't talk about it too much, but it's part of a slew of other analyses of the Transformer that basically show how generalizable the attention mechanism is. It also sorta confirms many researchers' inkling that Transformers are likely just memorizing patterns in their training corpus.
Edit: Adding a few interesting older NLP papers that I came across this year.
StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding [4]
Do Syntax Trees Help Pre-trained Transformers Extract Information? [5]
Learning to Compose Neural Networks for Question Answering [6]
Parsing with Compositional Vector Grammars [7]
[1] https://arxiv.org/abs/2006.11287
[2] https://arxiv.org/abs/2006.12878
[3] https://arxiv.org/abs/2008.02217
[4] https://arxiv.org/abs/1908.04577
[5] https://arxiv.org/abs/2008.09084
https://deepmind.com/research/publications/Mastering-Atari-G...
It felt like a baby step towards general intelligence.
https://www.nationalgeographic.com/science/2020/04/first-spi...
Note that it has been published on arXiv just yesterday; I helped review an earlier draft.
Erik Hoel in this paper offers an audacious hypothesis: our brain, during its evolution, developed dreams as a way to combat overfitting.
Since we're learning from limited samples of data in the real world, the chance of overfitting (I call it judgement) goes up. In ML we inject randomness and noise to avoid overfitting; Hoel's theory can explain why our dreams are so sparse & hallucinatory
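A minimal sketch of that noise-injection idea, using inverted dropout (my own illustration, not anything from the paper): during training, each unit is randomly zeroed, which is roughly the kind of corrupted experience the paper analogises to dreaming.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p=0.5, training=True):
    """Inverted dropout: zero each unit with prob. p, rescale survivors."""
    if not training:
        return h
    mask = rng.random(h.shape) >= p
    return h * mask / (1 - p)

h = np.ones((4, 8))
d = dropout(h, p=0.5)
print(d)            # roughly half the entries zeroed, the rest scaled to 2.0
```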
https://www.jprasurg.com/article/0007-1226(75)90127-7/pdf
Great read. Note, if you're not going to read it: you yourself should not eat 35 eggs per day; these patients had calorie requirements of a little under 7000.
"Equality of Opportunity in Supervised Learning" (https://arxiv.org/abs/1610.02413)
It explains the basic concepts of fairness in ML, with a very practical example from my domain that shows the trade-off between the fairness of an algorithm and overall performance (money). It really makes you see what may go wrong with bias in ML. It shows, in my opinion, why we will have to regulate ML, as corporations aren't really incentivized to deal with fairness. It also shows that there are different notions of fairness, so there will always be something that feels unfair, and doing something can always be interpreted as positive discrimination.
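The paper's central notion can be checked mechanically: "equal opportunity" asks that the true-positive rate be the same across protected groups. A minimal sketch with made-up data:

```python
import numpy as np

def tpr(y_true, y_pred, group, g):
    """True-positive rate for group g: P(pred=1 | true=1, group=g)."""
    mask = (group == g) & (y_true == 1)
    return float(np.mean(y_pred[mask]))

# Toy labels and predictions for two groups (entirely made up).
y_true = np.array([1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0])
y_pred = np.array([1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0])
group  = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

for g in (0, 1):
    print(f"group {g}: TPR = {tpr(y_true, y_pred, group, g):.2f}")
# The gap (0.75 vs 0.50 here) is what the paper's post-processing step
# would close, e.g. by choosing per-group decision thresholds.
```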
Deep brain optogenetics without intracranial surgery
"Achieving temporally precise, noninvasive control over specific neural cell types in the deep brain would advance the study of nervous system function. Here we use the potent channelrhodopsin ChRmine to achieve transcranial photoactivation of defined neural circuits, including midbrain and brainstem structures, at unprecedented depths of up to 7 mm with millisecond precision. Using systemic viral delivery of ChRmine, we demonstrate behavioral modulation without surgery, enabling implant-free deep brain optogenetics."
https://groups.csail.mit.edu/genesis/papers/radul%202009.pdf
Even if a bit impractical in some regards, I think an operating system/cloud that you interact with like a database is something we should aspirationally strive for. We're spending too much time gluing things together and not enough time being productive. Databases are great at tracking and describing resources (much better than YAML), and stored procedures that are like Lambdas would be neat.
A killer paper presenting an algorithm capable of inductive learning. ("DreamCoder solves both classic inductive programming tasks and creative tasks such as drawing pictures and building scenes. It rediscovers the basics of modern functional programming, vector algebra and classical physics, including Newton's and Coulomb's laws.")
http://www.heathershrewsbury.com/dreu2010/wp-content/uploads...
I think it's going to be years before we understand this properly, but in 2020 we are beginning to see practical uses.
At the moment I think it's a toss-up: either it's going to be a curiosity that people read, have their minds exploded by, but can't do anything with, or else it has a good chance of being the most influential deep learning paper of the decade.
In this paper, they did that with genes. And the 2D space that was left wasn't meaningless at all: it accurately recreated a map of Europe.
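The reduction itself is just PCA, which can be sketched in a few lines via the SVD (random data here standing in for the individuals-by-markers genotype matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 500))          # 100 individuals, 500 "markers"

Xc = X - X.mean(axis=0)                  # center each marker column
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
coords = Xc @ Vt[:2].T                   # each individual as a 2-D point

print(coords.shape)                      # (100, 2)
```

In the paper, those two coordinates (after a rotation) line up strikingly well with each individual's geographic origin.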
No one knows what attention is (https://link.springer.com/article/10.3758/s13414-019-01846-w)
Molecular repertoire of Deinococcus radiodurans after 1 year of exposure outside the International Space Station within the Tanpopo mission: https://microbiomejournal.biomedcentral.com/articles/10.1186...
I've been reading up on the object capability security model a lot recently, and was pointed to this paper... I was hooked. A really compelling security model almost from first principles.
https://www.jneurosci.org/content/39/2/307
Abstract run through a text optimizer:
We administered 100 mg MDMA or placebo to 20 male participants in a double-blind, placebo-controlled, crossover study.
Cooperation with trustworthy, but not untrustworthy, opponents was enhanced following MDMA but not placebo.
Specifically, MDMA enhanced recovery from, but not the impact of, breaches in cooperation.
During trial outcome, MDMA increased activation of four clusters incorporating the precentral and supramarginal gyri, superior temporal cortex, central operculum/posterior insula, and supplementary motor area.
MDMA increased cooperative behavior when playing trustworthy opponents.
Our findings highlight the context-specific nature of MDMA's effect on social decision-making.
While breaches of trustworthy behavior have a similar impact following administration of MDMA compared with placebo, MDMA facilitates a greater recovery from these breaches of trust.
https://pdfs.semanticscholar.org/c26b/4d3156b0c526d16c891ce7...
>"three of the four most cited papers in the journals deal with hypoxia [...] yet its routine clinical use is very limited."
'What if we treated AI as equals, like other human beings, not as tools or, worse, slaves to their creators?' That's the premise of this paper, which is a wonderful provocation. It's a really important consideration too, when you consider how many of our decisions we're asking machine sentience to make for us. If algorithmic bias were a human judge, they'd be thrown out of court (you'd hope).
https://www.researchgate.net/publication/342317256_A_systema...
-- because it was relatively straightforward to understand and convert to code. So it helped me understand backprop.
https://www.usenix.org/legacy/event/osdi10/tech/full_papers/...
Towards General and Autonomous Learning of Core Skills: A Case Study in Locomotion: https://arxiv.org/abs/2008.12228
aka SIREN: https://vsitzmann.github.io/siren/
"Erotic Modesty: (Ad)dressing Female Sexuality and Propriety in Open and Closed Drawers, USA, 1800–1930" https://onlinelibrary.wiley.com/doi/abs/10.1111/1468-0424.00...
When James C. Scott wrote about infrapolitics in his 1990 work "Domination and the Arts of Resistance: Hidden Transcripts" (https://www.jstor.org/stable/j.ctt1np6zz) and described it as a sort of political resistance that never declares itself and remains beneath what the dominant group can properly perceive until the power shift actually starts to happen, he probably didn't think of a case where the undeclared politics is so literally something not meant to be seen. The theme is very much a progression of how slowly women were able to establish even which parts of their bodies could or could not be sexualized, culminating in a sudden burst, or power shift, in the 1910s-1930s after centuries of aggregating individual choices and entirely unseen acts. This particular revolution managed to happen almost entirely outside of organization and public view, and while it's by no means over, the progress made in the last twenty years the paper covers really shows how much change the aggregation of individual acts of resistance, done without any open plans, can bring about. It also shows the limits of such movements, particularly when the dominant group has an active interest in preserving that status quo.
"How Qualified Immunity Fails" https://scholarship.law.nd.edu/cgi/viewcontent.cgi?article=4...
and "The Case Against Qualified Immunity" https://scholarship.law.nd.edu/cgi/viewcontent.cgi?article=4...
These two were both written by UCLA Law professor Joanna Schwartz over the course of about a year and a half from 2017-2018, and really got a lot of attention this year when a lot of people asked for the first time, "why does it seem impossible to actually hold abusive police to some degree of personal responsibility?" Having worked at a public defender's office and then on federal CJA cases (essentially federal defense work when there is more than one codefendant and the federal defenders would have a conflict of interest defending both), the abusive nature of policing was very much something I saw constantly for years, but it's difficult to quantify just how little potential consequence a police officer may actually face, because nobody had done the shoeleather work to collect the data, and police departments tend to have opacity written into their contracts. The data Schwartz collected demonstrates how many layers of shielding are negotiated into police contracts, and just how much indemnification (which is actually illegal in many jurisdictions but universally ignored) pushes any potential liability onto taxpayers directly, creating a situation where victims' taxes are just getting looped back into the settlements they receive. There are a lot of problems in the criminal justice system, and really any carceral system this country runs, and most of them are poorly documented on a systemic level and difficult to quantify. It's nice to see that someone put in the work to make the picture a little clearer, as practitioners tend to be too focused on their clients to do research like this, and this is a particularly unglamorous field of research.
MMR vaccine could protect against COVID-19
https://mbio.asm.org/content/11/6/e02628-20?_ga=2.139230451....
Constantinescu, Alexandra O., Jill X. O’Reilly, and Timothy EJ Behrens. "Organizing conceptual knowledge in humans with a gridlike code." Science 352.6292 (2016): 1464-1468.
Kriegeskorte, Nikolaus, and Katherine R. Storrs. "Grid cells for conceptual spaces?." Neuron 92.2 (2016): 280-284.
Klukas, Mirko, Marcus Lewis, and Ila Fiete. "Efficient and flexible representation of higher-dimensional cognitive variables with grid cells." PLOS Computational Biology 16.4 (2020): e1007796.
Moser, May-Britt, David C. Rowland, and Edvard I. Moser. "Place cells, grid cells, and memory." Cold Spring Harbor perspectives in biology 7.2 (2015): a021808.
Quiroga, Rodrigo Quian. "Concept cells: the building blocks of declarative memory functions." Nature Reviews Neuroscience 13.8 (2012): 587-597.
Stachenfeld, Kimberly L., Matthew M. Botvinick, and Samuel J. Gershman. "The hippocampus as a predictive map." Nature neuroscience 20.11 (2017): 1643.
Buzsáki, György, and David Tingley. "Space and time: The hippocampus as a sequence generator." Trends in cognitive sciences 22.10 (2018): 853-869.
Umbach, Gray, et al. "Time cells in the human hippocampus and entorhinal cortex support episodic memory." bioRxiv (2020).
Eichenbaum, Howard. "On the integration of space, time, and memory." Neuron 95.5 (2017): 1007-1018.
Schiller, Daniela, et al. "Memory and space: towards an understanding of the cognitive map." Journal of Neuroscience 35.41 (2015): 13904-13911.
Rolls, Edmund T., and Alessandro Treves. "The neuronal encoding of information in the brain." Progress in neurobiology 95.3 (2011): 448-490.
Fischer, Lukas F., et al. "Representation of visual landmarks in retrosplenial cortex." Elife 9 (2020): e51458.
Hebart, Martin, et al. "Revealing the multidimensional mental representations of natural objects underlying human similarity judgments." (2020).
Ezzyat, Youssef, and Lila Davachi. "Similarity breeds proximity: pattern similarity within and across contexts is related to later mnemonic judgments of temporal proximity." Neuron 81.5 (2014): 1179-1189.
Seger, Carol A., and Earl K. Miller. "Category learning in the brain." Annual review of neuroscience 33 (2010): 203-219.
Neurolinguistics:
Marcus, Gary F. "Evolution, memory, and the nature of syntactic representation." Birdsong, speech, and language: Exploring the evolution of mind and brain 27 (2013).
Dehaene, Stanislas, et al. "The neural representation of sequences: from transition probabilities to algebraic patterns and linguistic trees." Neuron 88.1 (2015): 2-19.
Fujita, Koji. "On the parallel evolution of syntax and lexicon: A Merge-only view." Journal of Neurolinguistics 43 (2017): 178-192.