HACKER Q&A
📣 lucas-piske

What piece of code that you've written are you most proud of?


  👤 codegeek Accepted Answer ✓
I was working for an investment bank in 2010 as a Tech Business Analyst. My job was not necessarily to write code. However, we had a major problem on one of the projects: we had to process hundreds of thousands of rows in an Excel spreadsheet for testing, and the version of Excel was 2003, which limited a worksheet to about 65,000 rows. The testing used to take 3-4 hours (I don't remember the details anymore) due to manual copy-paste across multiple worksheets. One time my boss mentioned in a meeting that perhaps someone could do it in MS Access instead. I decided to do it without telling him, put together a proof of concept in MS Access in 2-3 days, and then handed it over to a developer to finalize. I did not know much about VBA and googled the heck out of it to put that application together. We reduced the testing time to less than 30 minutes, and I was so proud of that. Not the best code, but the best problem I had solved up to then.

👤 jjjbokma
I am quite proud of the tumblelog [0] code I wrote. There are both Python and Perl versions, and it was quite a challenge to keep them in sync and idiomatic (I hope I succeeded with the Python version). I am also happy with the output [1].

[0] https://github.com/john-bokma/tumblelog

[1] http://plurrrr.com/


👤 closeparen
A concise, self-contained, unassuming little function that represents some out-of-the-box thinking to vastly simplify a previously hard problem. Those 25 lines are backed by several months and 10+ pages of analysis establishing that their simple approach is at least as good as the obvious approach, which would have meant integrating several very complex dependencies. Never been prouder of a diff so small.

Close second: a system I designed and built 2.5 years ago, which faded into the background and chugged along. Business interest has renewed, and everything they're now asking for fits neatly into the architecture. Really satisfying to see something hold up.


👤 throwaway43635
The ones I wrote and upstreamed to fix bugs in the Linux kernel.

👤 watersb
The code I am most proud of is the code I did not need.

👤 atomashpolskiy
I'm not particularly proud of any code that I've written (looking back, all of it sucks in one way or another), so I'd rather share the piece of code that I've written most recently (and maybe receive some constructive criticism, hehe).

The piece of code I'm speaking of is a variation of the Bip Buffer, which is itself a special kind of ring buffer. Ring buffers, also called circular queues, are commonly used in producer-consumer scenarios. The main difference between a typical ring buffer and a Bip Buffer is that the former operates on distinct elements, whereas the latter operates on contiguous data (e.g. binary streams).

--

In short, a Bip Buffer is backed by an array, which it splits into two regions: A and B. At first there's only a single region, A, and all incoming data is appended to it. Simultaneously, data is consumed (stripped off) from the beginning of the array, possibly by some other process. The region's boundaries are updated as data comes in and goes out (in FIFO fashion), so the region floats from "left" to "right" like a sliding window, shrinking and growing as it goes. Once the array gets more than half full (region A's right boundary crosses the halfway point of the array), it's time to start thinking about wrapping around. From then on, on each change to region A's boundaries, the Bip Buffer compares the free space on both sides of the region. As soon as there's more space to the left than to the right, a second region B is created there (on the left, i.e. at the beginning of the array).

Now the newly arriving data is appended to B, but the old data is still consumed from A. This continues until one of two things happens:

1) B's right boundary hits A's left boundary. In this case no more data can be added to the buffer, so the producer has to wait until the consumer catches up.

2) All of A's data is consumed. Now there's only a single non-empty region left, namely B, so we may as well start calling it A and start over.
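The two-region mechanics above can be sketched roughly like this. This is a toy Java version with hypothetical names, not the linked code; it checks the free space on every write rather than only past the halfway point, and handles only whole-array copies:

```java
// Toy Bip Buffer: one backing array, two regions.
// Region A is [aStart, aEnd); region B, when active, is [0, bEnd).
// Writes go to B once there is more free space on the left than on the
// right; reads always drain A, and B becomes the new A when A empties.
class BipBuffer {
    private final byte[] buf;
    private int aStart, aEnd;  // region A boundaries
    private int bEnd;          // region B end (B starts at index 0)
    private boolean bActive;

    BipBuffer(int capacity) { buf = new byte[capacity]; }

    // Append as much of src[0..len) as fits; returns bytes written.
    int write(byte[] src, int len) {
        if (!bActive && aStart > buf.length - aEnd) {
            bActive = true;  // more free space on the left: open region B
        }
        int n;
        if (bActive) {
            n = Math.min(len, aStart - bEnd);  // B must not overrun A
            System.arraycopy(src, 0, buf, bEnd, n);
            bEnd += n;
        } else {
            n = Math.min(len, buf.length - aEnd);
            System.arraycopy(src, 0, buf, aEnd, n);
            aEnd += n;
        }
        return n;
    }

    // Consume up to len bytes from the front of region A; returns bytes read.
    int read(byte[] dst, int len) {
        int n = Math.min(len, aEnd - aStart);
        System.arraycopy(buf, aStart, dst, 0, n);
        aStart += n;
        if (aStart == aEnd) {  // A drained: promote B to be the new A
            aStart = 0;
            aEnd = bActive ? bEnd : 0;
            bEnd = 0;
            bActive = false;
        }
        return n;
    }
}
```

When write() returns less than len, the producer has hit case 1 and must wait for the consumer; the promotion inside read() is case 2.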

--

That aside, my use case is a bit more complicated than just reading a continuous stream of data. I have a messaging system in which nodes exchange large quantities of data with each other. All communication goes through a single TCP connection, and this includes both service messages (like requests for blocks of data and announcements of having finished downloading certain blocks) and the actual data.

What I, as a well-behaved peer, want to be able to do is:

(a) Process the actual data asynchronously, as there are multiple incoming streams, disk I/O, checksum verification, etc., and all of these contend with each other and generally take some time.

(b) Process service messages as soon as possible (e.g. in case a peer that happens to be sending me data also sends me, in the midst of it, a request for some other data). In technical terms, I want to minimize the latency of my peers' requests.

A simple way to accomplish this would be to copy blocks of data into memory and enqueue them for delayed, asynchronous processing. This works pretty well for small amounts of data (tens of MB per second) but has some limitations:

- Some runtimes incur overhead for each allocated object. E.g. on the JVM, each object is prepended with several bytes of metadata; references to the object are tracked for garbage collection; and if asynchronous processing of the object takes too long, it may be promoted (copied) to a different, long-lived part of memory (and, depending on the load, "long" may actually mean 50 ms more than usual).

- Copying takes time. Obviously, one copy from main memory to disk is absolutely necessary. The NIC buffer can be mapped into main memory via DMA if the OS supports it, or else that's one more copy. Why add yet another copy into the application process's memory?

--

So my idea is to tweak the Bip Buffer so that it can hold on to data that's going to be processed asynchronously while continuing to process the next messages in the stream. Basically, it remembers which areas of the buffer are "reserved" for delayed processing (and thus must not be overwritten), then goes on operating as usual, reading subsequent data, until it wraps around and hits the reserved data, and only then does it stop. And if the data arrival rate is not too high, chances are the reserved data will already have been processed and disposed of by the time the buffer wraps around.
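The reservation bookkeeping can be illustrated with a stripped-down sketch (hypothetical API, not the linked code). Here three cursors chase each other around a plain ring: the producer's write cursor, the parser's read cursor, and a release cursor that trails behind them; take() hands out bytes without freeing them, and the producer stalls once it catches up with unreleased data. Note that the real Bip Buffer variant additionally keeps each reserved block contiguous in memory, which a plain byte-wise ring like this does not:

```java
// Reservation sketch over a plain ring buffer. Cursors are monotonically
// increasing longs; buffer positions are cursor % capacity.
// Invariant: freeStart <= readPos <= writeEnd <= freeStart + capacity.
class ReservingRing {
    private final byte[] buf;
    private long freeStart;  // data before this has been released
    private long readPos;    // [freeStart, readPos) is reserved, unreleased
    private long writeEnd;   // [readPos, writeEnd) is written, unparsed

    ReservingRing(int capacity) { buf = new byte[capacity]; }

    // Append up to len bytes; never overwrites reserved or unparsed data.
    int write(byte[] src, int len) {
        int n = (int) Math.min(len, buf.length - (writeEnd - freeStart));
        for (int i = 0; i < n; i++) {
            buf[(int) ((writeEnd + i) % buf.length)] = src[i];
        }
        writeEnd += n;
        return n;
    }

    // Reserve the next len bytes for async processing without freeing them.
    // Copies them out only for demonstration; a zero-copy version would hand
    // out an offset into the buffer, which is why contiguity matters.
    boolean take(int len, byte[] view) {
        if (writeEnd - readPos < len) return false;
        for (int i = 0; i < len; i++) {
            view[i] = buf[(int) ((readPos + i) % buf.length)];
        }
        readPos += len;
        return true;
    }

    // Called by the async consumer (in FIFO order) when it is done with
    // the oldest len reserved bytes, letting the producer reuse that space.
    void release(int len) {
        freeStart += len;
    }
}
```

If processing keeps up, release() runs ahead of the wrap-around and write() never stalls; if it doesn't, write() returns 0 and the producer waits, which is exactly the back-pressure described above.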

Link to code: https://github.com/atomashpolskiy/bt/blob/d0f10b7a306c7595ef...

Please keep in mind that it's a POC and was programmed in a single day, so there might be (must be?) some edge cases that I did not think about, and the code style is a bit rushed (I was too busy drawing pictures of this craziness and debugging :)