Every day there are around 200 users, and it earns me some ramen profit.
Users report that my web app is sometimes slow.
I know that there are many plausible causes. But before I jump into the rabbit hole, I want to ask for your kind advice.
What is your mental framework to deal with such a problem?
FYI, my app:
- is built with NodeJS
- runs on a cloud server with a 2-core CPU, 2 GB of RAM, and 1M bandwidth
- is a CRUD app
I have learned about building apps from the web, mostly from Udemy and YouTube. Someone told me not to worry about performance: `Make it work, then make it better`.
Now that I have learned how to make it work, how can I learn how to make it better? Do you have any links or materials to share?
Thanks a lot.
1. You can find that from basic logs - just logging the duration of each request should be enough - either at the app or the web server level. Figure out which endpoint takes longer than expected and in what situation (with specific parameters, a specific user, etc.).
2. You should be able to use a tracing library or an APM to figure out what's slow. It could be a call to the database, processing some data, or something else. If the suspect seems obvious, you can also time it with a trivial `start=current_time(); the_suspect_thing(); log(current_time()-start);`. For APM, Datadog has a free trial: https://www.datadoghq.com/product/apm/ but there are other options too.
3. Figure out how you can improve that. For a database, indexes often help. For data processing, you can try to do less of it (filter before processing, cache results, etc.). Whatever is slow, you may need to research that specific area further. No resource will cover everything, but Stack Overflow is not a bad place to start asking.
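For step 1, a minimal sketch of per-request duration logging in Node, assuming a plain `(req, res)` handler as used by the built-in `http` module. The name `withTiming` is made up for illustration; most frameworks have their own middleware for this.

```javascript
// Wrap a request handler so every request logs method, URL, status and duration.
// `log` defaults to console.log; pass your own function to write elsewhere.
function withTiming(handler, log = console.log) {
  return (req, res) => {
    const start = process.hrtime.bigint(); // monotonic clock, in nanoseconds
    // 'finish' fires once the response has been handed off to the OS.
    res.on('finish', () => {
      const ms = Number(process.hrtime.bigint() - start) / 1e6;
      log(`${req.method} ${req.url} ${res.statusCode} ${ms.toFixed(1)}ms`);
    });
    return handler(req, res);
  };
}
```

With the built-in server this would be used as `http.createServer(withTiming(myHandler))`, where `myHandler` stands for whatever handler the app already has.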
One wise move is to validate that your app really is slow. Do you experience the same slowdowns? If not, it could be network latency, which depends entirely on where your end users are working.
I work with many rural end users, and we have noticed that specific areas of the country tend to be where we get complaints of slow apps. We've worked with ISPs in that area to identify and correct old hardware on their network, which improved performance of our app.
If you can reproduce the slow behavior, by all means, follow everyone else's advice. But just like you would reproduce a bug before fixing it, reproduce the slowness.
You should create a log file (text file is fine), where you write as much as possible about the program flow: user logged in, user navigates ..., request started, query submitted with parameters, query responded with 1100 records, result processed, result formatted, result delivered, finished request.
The log should be appropriately detailed.
Correlate the log entries with the user ID, session ID, request ID, thread ID, etc., whichever is appropriate.
Timestamp the log entries with millisecond precision (2021-12-27 14:21:33.512).
Watch and observe the request flows over a period of days.
Perhaps you will see where the problems are.
1) Too many records fetched from DB.
2) Too slow DB fetch (missing indexes)
3) Cold DB caches
4) Some exponential loops
Instead of a text file, you can use a structured log, but a simple text file will quickly give you results.
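A minimal sketch of such a text log in Node, assuming one logger per request. The file name `app.log`, the field names `req=` and `user=`, and the function names are placeholders, not a standard format.

```javascript
const fs = require('fs');
const crypto = require('crypto');

// Format a Date as "2021-12-27 14:21:33.512" (UTC, millisecond precision).
function timestamp(date = new Date()) {
  return date.toISOString().replace('T', ' ').replace('Z', '');
}

// One log line: timestamp, correlation IDs, then the event text.
function formatLine(ctx, message, date = new Date()) {
  return `${timestamp(date)} req=${ctx.requestId} user=${ctx.userId} ${message}`;
}

// Logger bound to one request, so every line it writes shares a request ID.
// fs.appendFile is asynchronous, so writing does not block the event loop.
function createRequestLogger(userId, file = 'app.log') {
  const ctx = { requestId: crypto.randomUUID(), userId };
  return (message) => {
    fs.appendFile(file, formatLine(ctx, message) + '\n', (err) => {
      if (err) console.error('log write failed:', err.message);
    });
  };
}
```

Typical use would be `const log = createRequestLogger(userId)` at the start of a request, then `log('query submitted')`, `log('query responded with 1100 records')` and so on, so the gaps between timestamps show where the time goes.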
You don't have a lot of resources (RAM/CPU) to work with, so it is possible that's the constraint, but you have a real learning opportunity to understand how your tech stack works and to learn some performance tuning. Your cloud provider should have some stats showing whether those resources are hitting their limits; even some simple charts to glance at would give you an idea.
You mention it's a CRUD app, so the first thing I would look at and spend time on is your SQL queries. Most of the time in web requests for CRUD apps is spent in the DB, because it may be reading from a file on disk (where the DB stores its data), which is typically slower than receiving bytes on a network socket. Learn what `explain plan` means for your database flavor. I am going to venture a guess that you may not have database indexes on some tables. An explain plan for your queries should tell you which indexes, if any, are used.
For your front-end you could look into caching services for any JS framework libs, images, etc. assuming there is a way to do that on your network in China. That would reduce the bandwidth usage and if those services work properly, your end users would be downloading those assets from a faster server than you provide.
vCPUs don't prioritize or guarantee performance.
2 GB of physical RAM may not be a lot.
1Mb bandwidth (~100KBytes/s) may not be a lot if there are multiple simultaneous users with bulk payloads.
Storage may not be performant.
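To put rough numbers on the bandwidth point, here is a back-of-the-envelope sketch. It assumes the ~100 KBytes/s figure above and that simultaneous downloads split the link evenly, which is a simplification. The payload sizes are made up for illustration.

```javascript
// Rough seconds until `users` simultaneous downloads of `payloadKB` each finish,
// when they share a link of `linkKBps` kilobytes per second equally.
function transferSeconds(payloadKB, users = 1, linkKBps = 100) {
  return (payloadKB * users) / linkKBps;
}

console.log(transferSeconds(200));    // one user, 200 KB page: 2 seconds
console.log(transferSeconds(500, 5)); // five users, 500 KB each: 25 seconds
```

If the real pages are anywhere near those sizes, the link alone could explain "sometimes slow" without the app doing anything wrong.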
Personally, as a first step I would just migrate it to a better, dedicated server. Consider the cost of your time to research it. In the West a midrange older dedicated server starts at ~US$30/mo. All kinds of potential badness from being in a VM on a shared resource are eliminated.
If not, or if that doesn't buy you a solution, you must characterize which of the vserver itself, CPU, storage, or bandwidth is the bottleneck.
Assuming it's Linux, I would ssh in and set up Prometheus or similar monitoring so you can watch it over a week.
If no problem shows up, hit it with concurrent transactions until there is one. Compare what it takes to trigger a problem on your laptop vs. the remote server; then you can judge your vserver.
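A minimal sketch of "hit it with concurrent transactions", assuming the transaction is wrapped in an async function you supply. `runLoad` and `percentile` are illustrative names; dedicated load-testing tools do this better, but this is enough to compare a laptop against the server.

```javascript
// Run `total` calls of the async function `task`, at most `concurrency` at a time,
// and return each call's latency in milliseconds.
async function runLoad(task, total, concurrency) {
  const latencies = [];
  let next = 0;
  async function worker() {
    // The check and increment run without an await in between, so no call is counted twice.
    while (next < total) {
      next += 1;
      const start = process.hrtime.bigint();
      await task();
      latencies.push(Number(process.hrtime.bigint() - start) / 1e6);
    }
  }
  await Promise.all(Array.from({ length: concurrency }, worker));
  return latencies;
}

// Nearest-rank percentile (p between 0 and 100) of a list of numbers.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}
```

For example, on Node 18+ (which has a global `fetch`), `runLoad(() => fetch('http://localhost:3000/items').then((r) => r.text()), 200, 20)` would send 200 requests, 20 at a time, to a hypothetical endpoint. Raise the concurrency until `percentile(latencies, 95)` degrades, on both machines.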
For CPU, watch with top how it behaves when you give it transactions. Which process eats CPU?
For memory, top + vmstat.
For storage, iostat.
For network, netstat -s.
2) Set your caching on assets and images to preserve bandwidth.
3) Test if your DB is the bottleneck.
E.g. some applications put their performance log in a response header (db: 20 ms, logic: 10 ms, total: 50 ms).
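One standard way to carry such timings is the `Server-Timing` response header, which browser dev tools can display. A minimal sketch of building its value, assuming you have already measured the durations yourself (the metric names are just the ones from the example above):

```javascript
// Turn { db: 20, logic: 10, total: 50 } (milliseconds) into a Server-Timing value.
function serverTimingValue(timings) {
  return Object.entries(timings)
    .map(([name, ms]) => `${name};dur=${ms}`)
    .join(', ');
}

console.log(serverTimingValue({ db: 20, logic: 10, total: 50 }));
// prints: db;dur=20, logic;dur=10, total;dur=50
```

In a handler this would be set with `res.setHeader('Server-Timing', serverTimingValue(timings))` before the response is sent.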
Definitely start profiling, and try to up your bandwidth depending on your current online users.
- Use the app, see what is slow from a subjective UX perspective
- Profile your code, find hotspots
- Use system monitoring to determine if there are any hardware/network bottlenecks
One thing I would add to these, perhaps a bit more specific: You mention you are using NodeJS, and it is a CRUD app. What is your database? Are you using an ORM? Sometimes the choice of an ORM in particular can cause performance issues behind its nice opaque interface. I have no specific reason to believe this is causing the performance issues for you, but I've had issues myself in the past, so I thought it might be a helpful thing to look into (for example, N+1 query issues).
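To make the N+1 pattern concrete, here is a sketch against a fake in-memory "database" that only counts queries. Everything in it (`createFakeDb`, the posts and authors data, the method names) is hypothetical and stands in for whatever your ORM does under the hood.

```javascript
// Hypothetical in-memory "database" that counts how many queries it receives.
function createFakeDb() {
  const authors = new Map([[1, 'Ann'], [2, 'Bob']]);
  const posts = [
    { id: 10, authorId: 1 }, { id: 11, authorId: 2 }, { id: 12, authorId: 1 },
  ];
  const db = {
    queries: 0,
    async listPosts() { db.queries += 1; return posts; },
    async authorById(id) { db.queries += 1; return { id, name: authors.get(id) }; },
    async authorsByIds(ids) {
      db.queries += 1;
      return ids.map((id) => ({ id, name: authors.get(id) }));
    },
  };
  return db;
}

// N+1: one query for the list, then one more query per row.
async function postsWithAuthorsNPlusOne(db) {
  const rows = await db.listPosts();
  const out = [];
  for (const post of rows) {
    const author = await db.authorById(post.authorId);
    out.push({ ...post, author: author.name });
  }
  return out;
}

// Batched: one query for the list, one for all distinct authors.
async function postsWithAuthorsBatched(db) {
  const rows = await db.listPosts();
  const ids = [...new Set(rows.map((p) => p.authorId))];
  const authors = await db.authorsByIds(ids);
  const nameById = new Map(authors.map((a) => [a.id, a.name]));
  return rows.map((p) => ({ ...p, author: nameById.get(p.authorId) }));
}
```

With three posts the first version issues 4 queries and the second 2. With 1100 rows the gap becomes 1101 vs 2, and every query pays a round trip to the database. Turning on your ORM's query logging is usually the quickest way to spot this.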
If you are using NodeJS, it is optimized for I/O and not computation. Since it's single threaded and async at its core, any blocking computation will impact the entire service.
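A small sketch that shows the effect: a timer due immediately cannot fire until synchronous work finishes, and the same holds for every other request waiting on the event loop. The busy-wait stands in for any heavy computation; the function names are made up.

```javascript
// Busy-wait for `ms` milliseconds, standing in for heavy synchronous computation.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* spin */ }
}

// Schedule a 0 ms timer, then block; resolves with how late the timer fired (ms).
function timerDelayWhileBlocking(blockMs) {
  return new Promise((resolve) => {
    const scheduled = Date.now();
    setTimeout(() => resolve(Date.now() - scheduled), 0);
    blockFor(blockMs); // the timer callback cannot run until this returns
  });
}

timerDelayWhileBlocking(200).then((delay) => {
  console.log(`0 ms timer fired after ${delay} ms`); // at least 200
});
```

If a profile shows something like this in a request path, the usual options are doing less work, splitting it into chunks, or moving it off the main thread (e.g. `worker_threads`).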
One approach could be to profile your app using one of the many Node.js profiling tools (https://nodejs.org/en/docs/guides/simple-profiling/).
If you want a solution, try deploying your JS code as AWS Lambda or GCP Cloud Functions. They may address the problem at lower cost while you fix the underlying issue.
I used Datadog recently to find the exact line of code that would bring CPU to 100%. Since it's quite pricey, I only used it for the couple of weeks I needed it.
I don't know what the standard tool is for Node, but there must be one.