HACKER Q&A
📣 Gyanangshu

How do you handle wide-to-long without Pandas/R?


I’ve been building a no-code data reshaping tool and ran into an interesting design problem around wide-to-long transformations, like pandas melt or R’s pivot_longer.

The idea itself is simple once you get it. You’re basically turning multiple columns into two: one for the variable and one for the value. But the terminology trips people up. In user testing, even folks with strong analytical backgrounds got confused by terms like “ID columns” and “value columns” if they didn’t come from a coding background.
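To make the concept concrete, here's a minimal plain-Python sketch of wide-to-long (no pandas). The names `id_cols`, `value_cols`, and the sample data are illustrative, not from the post:

```python
# Minimal wide-to-long ("melt") in plain Python: each wide row fans out into
# one long row per value column, repeating the ID columns and adding a
# variable/value pair for the column being unpivoted.
def melt(rows, id_cols, value_cols, var_name="variable", value_name="value"):
    long_rows = []
    for row in rows:
        for col in value_cols:
            out = {k: row[k] for k in id_cols}  # columns that stay as they are
            out[var_name] = col                 # the former column name
            out[value_name] = row[col]          # its value
            long_rows.append(out)
    return long_rows

wide = [{"store": "A", "jan": 10, "feb": 12},
        {"store": "B", "jan": 7,  "feb": 9}]
long_rows = melt(wide, id_cols=["store"], value_cols=["jan", "feb"])
# First record: {"store": "A", "variable": "jan", "value": 10}
```

Two wide rows with two value columns each become four long rows, which is exactly the "multiple columns into two" framing above.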

The UI that worked best for us was pretty straightforward. Two lists side by side. On the left, columns that stay as they are. On the right, columns that will be turned into rows. Users can just drag and drop between the two. We also show a live preview of the first few rows before and after, which updates as they make changes. That alone reduced configuration errors quite a bit.

The harder part has been defaults. If the dataset has clear patterns, like columns named month_1 to month_26, it’s easy to guess what should be treated as values. But when naming is inconsistent, the guesses are often off.
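One cheap heuristic for the clear-pattern case is to group columns by a shared `stem_number` naming scheme and treat any sufficiently large group as candidate value columns. A sketch (the function name, regex, and `min_group` threshold are all assumptions, not anything from the post):

```python
import re
from collections import defaultdict

# Heuristic: columns matching "<stem>_<number>" (e.g. month_1 .. month_26)
# are grouped by stem; groups with at least `min_group` members are guessed
# to be value columns, and everything else defaults to ID columns.
def guess_value_columns(columns, min_group=3):
    groups = defaultdict(list)
    for col in columns:
        m = re.fullmatch(r"(.+?)_(\d+)", col)
        if m:
            groups[m.group(1)].append(col)
    value_cols = [c for cols in groups.values() if len(cols) >= min_group
                  for c in cols]
    id_cols = [c for c in columns if c not in value_cols]
    return id_cols, value_cols

cols = ["store", "region", "month_1", "month_2", "month_3"]
# guess_value_columns(cols) → (["store", "region"],
#                              ["month_1", "month_2", "month_3"])
```

When naming is inconsistent this will guess wrong, which is presumably where the UI hand-off has to take over.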

Has anyone else built something similar? Curious how you approach auto-detection in these cases.


  👤 smcin Accepted Answer ✓
That's because “ID columns” and “value columns” (or "pivot") are dry statistician jargon, as opposed to terms like "wide-to-long", and most of your users won't have an MS in statistics or economics. So, as you do, just visually separate the columns into "columns used to identify the data", e.g. Year-WkOfYear-Store-Department, and "value columns". (Posting a few screenshots for illustration would be really helpful.)

> The harder part has been defaults. If the dataset has clear patterns, like columns named month_1 to month_26, it’s easy to guess what should be treated as values. But when naming is inconsistent, the guesses are often off.

Well, you'll probably need to iterate with user assistance via your UI, but you can often eliminate a lot of candidates by filtering on the inferred data type (float/integer/date/categorical/string/etc.), the permitted range of values (e.g. 'Sales' is probably a positive float or integer), units, formatting, and the assumption that value columns tend to be (fairly) contiguous.
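A rough sketch of the type-based filter in that list, assuming raw string cells as input (the function names and the coarse three-way typing are illustrative, not a spec):

```python
# Infer a coarse type per column from its raw string values, then keep only
# numeric columns as candidates for "value" columns. Real data would also
# need date/categorical detection, null handling, range checks, etc.
def infer_type(values):
    def parses(v, cast):
        try:
            cast(v)
            return True
        except (TypeError, ValueError):
            return False
    if all(parses(v, int) for v in values):
        return "integer"
    if all(parses(v, float) for v in values):
        return "float"
    return "string"

def numeric_candidates(table):
    # table: dict mapping column name -> list of raw cell values
    return [col for col, vals in table.items()
            if infer_type(vals) in ("integer", "float")]

table = {"store": ["A", "B"], "wk_1": ["10", "12.5"], "wk_2": ["7", "9"]}
# numeric_candidates(table) → ["wk_1", "wk_2"]
```

Combined with the contiguity assumption (numeric columns that sit next to each other in the header), this prunes most obviously-ID columns before the user ever sees the drag-and-drop lists.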

Post us a corner case or two. It helps if you tell us what domains your data typically comes from.