HACKER Q&A
📣 optimalsolver

Largest speedup you ever achieved by only changing a few lines of code?


👤 chrisutz
Started a new job on a company's API team. The API had out-of-memory issues, with processes crashing all the time.

The code was PHP. All API calls ended like this:

echo json_encode($data) . "\n";

Changed just one character, the period to a comma (echo json_encode($data), "\n";), so echo received two separate arguments instead of first building a concatenated copy of the whole string in memory. Problem solved. Felt like a hero.


👤 muzani
Gallery code that queried all photos and showed them to the user. It had a 13-second delay with 10k photos and above.

I thought it was a Big O problem at first, because the code was hacky and used more arrays than it needed. But it was because it was fetching all the images and then sorting them by time in application code.

I sped it up to milliseconds by having the query itself sort by time.
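
A minimal Python/sqlite3 sketch of that change (the photos table, columns, and index are hypothetical): fetch-then-sort in application code versus letting an indexed query return rows already ordered.

import sqlite3

conn = sqlite3.connect("gallery.db")
conn.execute("CREATE TABLE IF NOT EXISTS photos (id INTEGER, taken_at INTEGER)")

# Slow: pull every row, then sort 10k+ photos in application code.
photos = conn.execute("SELECT id, taken_at FROM photos").fetchall()
photos.sort(key=lambda row: row[1])

# Fast: index the sort column and let the database return rows in order.
conn.execute("CREATE INDEX IF NOT EXISTS idx_taken_at ON photos (taken_at)")
photos = conn.execute("SELECT id, taken_at FROM photos ORDER BY taken_at").fetchall()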


👤 Arcten
Iterating by reference instead of by copy in C++ is a great way to get a speedup with a one-character change.

I.e. changing for (const MyType v : collection) to for (const MyType& v : collection). The first form copies every element on each iteration; the second doesn't.


👤 yen223
If you're working on Django codebases, `select_related` and `prefetch_related` are going to be your friends. Maybe 70-80% of Django slowness I've encountered (back when I did Django) was because of something issuing hundreds of db calls unnecessarily.
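
A minimal sketch of the difference, assuming a configured Django project with hypothetical Author and Book models (`prefetch_related` is the analogue for many-to-many and reverse relations):

from django.db import models

class Author(models.Model):
    name = models.CharField(max_length=100)

class Book(models.Model):
    title = models.CharField(max_length=200)
    author = models.ForeignKey(Author, on_delete=models.CASCADE)

# N+1: one query for the books, plus one query per book for its author.
for book in Book.objects.all():
    print(book.title, book.author.name)

# One query with a JOIN: the authors come back with the books.
for book in Book.objects.select_related("author"):
    print(book.title, book.author.name)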

👤 yongjik
Almost twenty years ago, I found a debug logging function using O_SYNC. Took it out. The test then ran so fast that a teammate said they thought it hadn't run.

Now, of course, there's zero reason to write every debug output with O_SYNC. Classic case of cargo-cult programming. I'd like to say I've never seen something like that again, but then I'd be lying.
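
A rough sketch of what that costs, in Python on Linux (file name hypothetical). O_SYNC forces every write() to wait until the data has actually reached the disk:

import os

# Synchronous: each debug line pays a full disk flush.
fd = os.open("debug.log", os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o644)
for i in range(10_000):
    os.write(fd, f"debug line {i}\n".encode())
os.close(fd)

# Buffered: the kernel batches the writes; typically orders of magnitude faster.
with open("debug.log", "w") as f:
    for i in range(10_000):
        f.write(f"debug line {i}\n")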


👤 max_hammer
Moved data aggregation from Oracle to `awk`

The process loaded 40 GB files into the database, and the aggregation took more than 5 hours.

Wrote a simple awk one-liner with an associative array and the process completed in minutes.
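
The same one-pass, associative-array aggregation sketched in Python rather than awk, assuming a hypothetical "key value" layout per line:

import sys
from collections import defaultdict

totals = defaultdict(float)
for line in sys.stdin:         # stream the file; nothing is loaded up front
    key, value = line.split()  # hypothetical two-column layout
    totals[key] += float(value)

for key, total in totals.items():
    print(key, total)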


👤 maxrev17
Dropped a request by 40 seconds by preventing Entity Framework from round-tripping the DB 3000+ times for a property that was already held in memory! Pretty much a one-liner change. Took some time to find, with the help of MiniProfiler (thx Stack Overflow!).

👤 pknerd
The company I used to work for was a B2B portal. Their Chinese customers were having issues submitting a form due to the firewall; it was taking 22 seconds. What I did was send a close header from the server, which made the client browser shut down the request and release the connection. Behind the scenes it still took the same time, but at least customers weren't left waiting. It was kind of like Ajax, but done with Apache flags.

👤 bradknowles
My most recent example is some SQL commands that another team executed nightly against our database. Their code sometimes took more than eight hours to run, which could cause timing problems: it ran so long that processes further down the pipeline failed.

I took their exact SQL commands and wrapped them in my own scripting, and the simplified single-threaded version was executing in about fifteen minutes.

I went back through and added some explicit parallelization combined with wait commands to ensure that everything in that stage was complete before going to the next stage. That improved version now executes in around 600 seconds.
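
That stage-by-stage pattern (run a batch of independent commands in parallel, then wait for all of them before starting the next batch) translates roughly to this Python sketch; run_sql and the statements are stand-ins:

from concurrent.futures import ThreadPoolExecutor

def run_sql(statement):
    print("running:", statement)  # stand-in for a real database call

stages = [
    ["UPDATE table_a SET ...", "UPDATE table_b SET ..."],  # independent, can overlap
    ["INSERT INTO summary SELECT ..."],                    # needs stage 1 finished
]

with ThreadPoolExecutor() as pool:
    for stage in stages:
        futures = [pool.submit(run_sql, s) for s in stage]
        for f in futures:
            f.result()  # the "wait": block until the whole stage completes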


👤 lakkal
Not really lines of code as such, but in Visual FoxPro: opening a DBF (database table file) on a file server, from an application running on that same file server, by its local path rather than a mapped-network-drive path resulted in a big speedup. (The application typically runs on RDP servers, but can also be run directly on the file server, which we do for some heavy-duty processes.)

👤 speedgoose
Wrapping multiple SQL mutations into a single transaction (sketched below).

Adding the right indexes in a relational database.

Converting from CSV to Parquet before querying large datasets on Apache Spark.
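
A minimal sketch of the first point, using Python's sqlite3 (the same principle applies to any relational database):

import sqlite3

conn = sqlite3.connect("example.db")
conn.execute("CREATE TABLE IF NOT EXISTS t (id INTEGER, val TEXT)")

# Slow: each INSERT commits on its own, paying a journal flush every time.
for i in range(1000):
    with conn:
        conn.execute("INSERT INTO t VALUES (?, ?)", (i, "x"))

# Fast: one transaction around all the mutations, one commit at the end.
with conn:
    for i in range(1000):
        conn.execute("INSERT INT" "O t VALUES (?, ?)", (i, "x"))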


👤 dsgrillo
Sending a campaign flow would create tens of thousands of records in the DB. After each record insertion there was a sleep(0.1), put in place to work around master/slave replication problems in other flows. Just conditionally disabling that sleep was enough to take the procedure from ~5 min to ~30 sec.
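
A minimal sketch of the shape of the fix (all names hypothetical); the sleep becomes opt-in instead of unconditional:

import time

def db_insert(record):
    pass  # stand-in for the real insert

def insert_records(records, throttle_replication=False):
    for record in records:
        db_insert(record)
        if throttle_replication:  # only flows that read a lagging replica need this
            time.sleep(0.1)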

👤 LarryMade2
Revised a complex SQL query so it didn't do everything in one fell swoop, which had resulted in 1000x more records to process than necessary. Using subselects/UNIONs took the query from half a minute down to a couple of seconds.

👤 iujjkfjdkkdkf
Some basic word counting on text files in bash; I was tokenizing to one token per line and then counting:

tr -cs '[:alnum:]' '[\n*]' | sort | uniq -c

The sort takes a long time (probably just n log n I guess) on a big text. Swapping for

awk '{k[$0]++} END {for (token in k) print token, k[token];}'

and then sorting on the numbers does the same thing faster.


👤 sharmi
import gc
gc.disable()

in Python. The script previously took 4 hours to run. There were lots of small functions, and these were called in a loop. I made sure there were no circular references within any of the functions, then disabled the GC. (Reference cycles are the only thing Python's garbage collector has to look for; objects without them are freed automatically by reference counting when they go out of scope.)

The script ran in 20 mins.

The hue and cry people raised over gc.disable, though, convinced me never to do any unconventional optimizations again.
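
A minimal sketch of the pattern, with a guard so the collector always comes back on (the original simply disabled it at the top of the script); process is a hypothetical stand-in:

import gc

def process(data):
    return sum(data["k"])  # stand-in for the real per-record work

def hot_loop():
    total = 0
    for _ in range(1_000_000):
        data = {"k": list(range(10))}  # short-lived, cycle-free objects
        total += process(data)
    return total

gc.disable()  # refcounting still frees cycle-free objects immediately
try:
    hot_loop()
finally:
    gc.enable()  # always restore the collector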


👤 amir734jj
The MongoDB driver in C# was failing to convert a LINQ expression to a Mongo query and was silently filtering in memory instead. I noticed it and added a unit test to make sure it would never silently do an in-memory filter again. Night and day difference.

👤 Black101
Modified an SQL query... went from a few minutes to a few seconds.

👤 surds
Took a massive data migration on MongoDB from many hours (over a day) down to a few minutes (less than 10) with the right indices.
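
A minimal sketch of adding an index with pymongo (database, collection, and field names are hypothetical):

from pymongo import ASCENDING, MongoClient

client = MongoClient("mongodb://localhost:27017")
users = client["appdb"]["users"]

# Without an index, this lookup scans the whole collection.
users.find_one({"email": "a@example.com"})

# With an index, the same lookup becomes a B-tree walk.
users.create_index([("email", ASCENDING)], unique=True)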

👤 Leparamour
Does somebody have any stories for Python codebases?

👤 billconan
Changed from single-threaded to multi-threaded.

Changed std::map to std::unordered_map.