HACKER Q&A
📣 esflow

What kind of scoring algorithms exist?


Hi HN, in my project I need to sort apartments, and there is a lot of data about each of them. I give each parameter a score, e.g. the size of the apartment gets a certain score, and so does the location, but it doesn't work well because I don't understand how to weight each parameter. Question: is there some sort of guideline on scoring/sorting things, or some algorithms that might help? Or do you have suggestions on where to look for information about such things? Thanks!


  👤 PaulHoule Accepted Answer ✓
It's tricky. Read up on Pareto Optimization.

You can't really trade off square footage vs commute length linearly because there is no objective criterion.

What you can do is prove that Apartment A has fewer square feet than Apartment B and a longer commute, so A is dominated by B.

Out of your complete set of apartments you can show that there is a small set that dominates all the others. Once you are down to that set, you can make your personal choice from it.
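
A minimal sketch of that domination filter in Python, assuming each apartment is a dict with "sqft" (bigger is better) and "commute_minutes" (smaller is better); the field names and numbers are invented:

    def dominates(a, b):
        # True if apartment a is at least as good as b on every
        # objective and strictly better on at least one.
        at_least_as_good = (a["sqft"] >= b["sqft"]
                            and a["commute_minutes"] <= b["commute_minutes"])
        strictly_better = (a["sqft"] > b["sqft"]
                           or a["commute_minutes"] < b["commute_minutes"])
        return at_least_as_good and strictly_better

    def pareto_front(apartments):
        # Keep only apartments that no other apartment dominates.
        return [a for a in apartments
                if not any(dominates(b, a) for b in apartments if b is not a)]

    apartments = [
        {"sqft": 800, "commute_minutes": 30},
        {"sqft": 650, "commute_minutes": 45},  # dominated by the first one
        {"sqft": 950, "commute_minutes": 50},
    ]
    print(pareto_front(apartments))  # the 650 sqft / 45 min one drops out

The pairwise check is O(n^2), which is fine for an apartment-sized list.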


👤 teh_g
For real estate, a quick and dirty solution that comes to mind is asking price per square foot, assuming that the market itself has taken into account all the common relevant factors. Unless price is something you intend to test against, of course.
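
For what it's worth, a quick sketch of that ranking in Python (field names and numbers are invented):

    apartments = [
        {"id": "A", "price": 240_000, "sqft": 800},
        {"id": "B", "price": 210_000, "sqft": 650},
    ]
    # Sort cheapest-per-square-foot first.
    for apt in sorted(apartments, key=lambda a: a["price"] / a["sqft"]):
        print(apt["id"], round(apt["price"] / apt["sqft"], 2))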

👤 curo
Hopefully I understand your problem correctly...

You can use window functions to compute things like dense rank, rank, percentiles, etc. on each parameter in order to normalize the data. E.g., this one is at the 63rd percentile in size, the 20th in distance, and so on. This doesn't work so well if you have lots of 0s in your data.

Or you can find the max of each parameter and divide by it: this one is 42% of max, etc.

In each case you're trying to normalize different parameters so that they represent something comparable (x/max, percentile, etc.) and can be combined. You can also do intermediate operations like taking logs or z-scores if you're trying to muffle the effect of outliers.
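
A sketch of those normalizations on a single parameter, using numpy and scipy (the sizes are made-up sample data):

    import numpy as np
    from scipy import stats

    sizes = np.array([500.0, 650.0, 800.0, 950.0, 1200.0])  # sqft

    # Percentile rank: where each value sits within its column.
    percentiles = stats.rankdata(sizes) / len(sizes) * 100

    # Fraction of max: x / max, so the largest apartment scores 1.0.
    frac_of_max = sizes / sizes.max()

    # Z-score: subtract the mean, divide by the standard deviation;
    # combined with a log transform first, this muffles outliers.
    z = (sizes - sizes.mean()) / sizes.std()

    print(percentiles, frac_of_max, z, sep="\n")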


👤 xaedes
The problem you describe is known as "multi-objective optimization".

Normalizing the input data to similar ranges usually helps, but there is no single golden rule for how to weight; it depends on what you want to accomplish.

But regardless of any weighting, in multi-objective optimization problems there is a subset of all items (apartments in your case) that is better than all the items not in this set. This set is called the "Pareto front". There are methods to compute it.

You can't decide which item of the Pareto front is better than another; it is a rock-paper-scissors situation. But the Pareto front lets you exclude a lot of items that you then don't need to consider: each of them is worse in every aspect (optimization objective) than some item on the Pareto front.

As computer science students we often used population-based optimization methods for dealing with multi-objective optimization, for example ant colony optimization or evolutionary algorithms.
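
To make the weighting point concrete, here is a sketch of the simplest combination, a weighted sum after min-max normalization; the columns and weights are invented, and notice how different weight vectors pick different winners, which is exactly why there is no golden rule:

    import numpy as np

    # Columns: sqft (bigger is better), commute minutes (smaller is better).
    data = np.array([
        [800.0, 30.0],
        [950.0, 50.0],
        [700.0, 15.0],
    ])

    # Scale each column to [0, 1], flipping commute so 1.0 means best.
    norm = (data - data.min(axis=0)) / (data.max(axis=0) - data.min(axis=0))
    norm[:, 1] = 1.0 - norm[:, 1]

    for weights in ([0.8, 0.2], [0.2, 0.8]):
        scores = norm @ np.array(weights)
        print(weights, "-> best apartment index:", int(scores.argmax()))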


👤 jventura
In these kinds of things I usually start by multiplying (or dividing) the parameters to get a score (weight = 1). Then I sort them, see if the results look "good", and add weights as needed. My thinking is that the first result should match my preferences and the last one should not. I know it's a kind of confirmation bias, but..

When I bought my car years ago, I had some parameters, but comparing prices was harder because of devaluation. I remember assuming a 15% yearly devaluation so that I could compare prices. For instance, a car from 2000 valued at €1000 was roughly equivalent in price to a 1999 car valued at €850..

I make sure to always include a "preference" (aka bias) parameter and give it more or less weight depending on how much it distorts the results.
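
A sketch of that devaluation trick in Python, using the 15% rate the comment assumes and the numbers from its example:

    RATE = 0.15  # assumed yearly devaluation

    def price_as_of(price, model_year, reference_year):
        # Devalue the price by RATE for each year the car is newer
        # than the reference model year.
        return price * (1 - RATE) ** (model_year - reference_year)

    # A 1000 EUR car from 2000 compared at the 1999 price level:
    print(price_as_of(1000, 2000, 1999))  # 850.0, matching the 1999 car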


👤 mmb_
If I were you I would look into PCA and cluster analysis. PCA can help find the key (reduced) set of drivers/dimensions that explain the majority of your variability. Cluster analysis could help you group your data into a meaningful reduced set of groups; you can then measure dimensions within and between clusters to help you come up with a strong scoring system. This again assumes you have a relatively high-dimensional problem with a lot of data.
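
A sketch of that pipeline with scikit-learn, on random stand-in data (the shapes, cluster count, and variance threshold are all arbitrary choices):

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 8))  # 100 apartments, 8 numeric features

    X_std = StandardScaler().fit_transform(X)  # PCA needs comparable scales

    pca = PCA(n_components=0.9)  # keep enough components for 90% of variance
    X_reduced = pca.fit_transform(X_std)
    print("components kept:", pca.n_components_)

    # Cluster in the reduced space to find groups of similar apartments.
    labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_reduced)
    print("cluster sizes:", np.bincount(labels))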

👤 inertiatic
Well, rank based on what? You need a metric to optimise for.

Maybe that's price, or maybe that's user clicks or bookings if you're Airbnb.


👤 brudgers
Cosine similarity is a standard method for searching multi-dimensional data. https://en.wikipedia.org/wiki/Cosine_similarity
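
A minimal sketch, assuming you encode an "ideal apartment" as a vector and pre-normalize every feature to a comparable scale; the features and numbers here are invented:

    import numpy as np

    def cosine_similarity(a, b):
        # Cosine of the angle between two vectors: 1.0 = same direction.
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    ideal = np.array([1.0, 1.0, 0.2])   # e.g. size, location, price scores
    candidates = np.array([
        [0.9, 0.8, 0.3],
        [0.4, 1.0, 0.9],
    ])
    for i, c in enumerate(candidates):  # higher means closer to the ideal
        print(i, round(cosine_similarity(ideal, c), 3))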

👤 BOOSTERHIDROGEN
The 37% rule, from the book Algorithms to Live By.
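
That's the optimal stopping result: pass on the first n/e ≈ 37% of options, then take the first one that beats everything seen so far. A sketch, with random scores standing in for apartments viewed in sequence:

    import math
    import random

    def pick_37(scores):
        n = len(scores)
        cutoff = int(n / math.e)  # look at the first ~37% without committing
        best_seen = max(scores[:cutoff], default=float("-inf"))
        for s in scores[cutoff:]:
            if s > best_seen:     # first candidate to beat the benchmark
                return s
        return scores[-1]         # never beaten: settle for the last one

    random.seed(1)
    scores = [random.random() for _ in range(100)]
    print(pick_37(scores), max(scores))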