Would this work?
I recently released a simple voice recognition app under the MIT license[1], and now I'm working on expanding it to all kinds of open source home automation. It is getting big enough that I am thinking of dual-licensing it: under a Creative Commons non-commercial license, and under a commercial license for users who want to pay for some extra functionality. However, I want users to remain in control.
As part of this, I would like the software to check whether a user is among my paying customers via a simple API call. But I want to preserve people's privacy: they should not have to log in or authenticate in any way, and the check should not transmit their personal information or IP address.
For the license check, I was thinking of binning their IP address with CRC-16, which has only 16 bits of output (65,536 possible bins). They would never tell me their actual IP address, just which bin they belong to, and a bin doesn't identify anyone specifically.
I would know each customer's name and bin, since they would give me their bin when they pay me. If there is an overlap (two different customers happen to share a bin), I would know about it, and it only adds to their privacy because neither of them is unique in that bin any more. For 30 customers the chance of an overlap is about 0.7%, and it rises quickly from there, to roughly 50% at around 300 customers.
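The overlap figures follow from the standard birthday-problem formula. A minimal sketch, assuming each customer's bin is effectively a uniform random pick among the 65,536 values; `collision_probability` is just an illustrative name:

```python
BINS = 2 ** 16  # 65,536 possible CRC-16 values


def collision_probability(customers: int, bins: int = BINS) -> float:
    """Probability that at least two customers share a bin (birthday problem)."""
    # P(all bins distinct) = product of (1 - k / bins) for k = 0 .. customers - 1
    p_distinct = 1.0
    for k in range(customers):
        p_distinct *= 1 - k / bins
    return 1 - p_distinct


for n in (30, 100, 300, 1000):
    print(n, f"{collision_probability(n):.2%}")
```

For 30 customers this gives about 0.66%, and at 300 customers it is already close to 50%, so shared bins become the normal case well before the 65,536 limit is reached.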
When a user's IP address changes, they would request the change from me in an authenticated way (for example, by logging in with a username and password and setting their new bin), and I would update their assigned bin.
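The vendor-side bookkeeping described above (knowing each customer's bin, spotting shared bins, and moving a customer when their IP address changes) could live in a small structure like the one below. This is a sketch only: `BinRegistry` and its methods are hypothetical names, and the authentication for a change request is assumed to happen elsewhere:

```python
from collections import defaultdict


class BinRegistry:
    """Hypothetical vendor-side record of which CRC-16 bin each paying customer uses."""

    def __init__(self) -> None:
        self.bin_of: dict[str, int] = {}  # customer name -> current bin

    def assign(self, customer: str, new_bin: int) -> None:
        # Called at purchase time, and again after an authenticated bin-change request.
        if not 0 <= new_bin < 2 ** 16:
            raise ValueError("bin must be a 16-bit value")
        self.bin_of[customer] = new_bin

    def published_bins(self) -> set[int]:
        # The only thing that goes into the public table: which bins are paid for.
        return set(self.bin_of.values())

    def shared_bins(self) -> dict[int, list[str]]:
        # Bins currently claimed by more than one customer.
        customers_by_bin = defaultdict(list)
        for customer, b in self.bin_of.items():
            customers_by_bin[b].append(customer)
        return {b: names for b, names in customers_by_bin.items() if len(names) > 1}


registry = BinRegistry()
registry.assign("alice", 12345)
registry.assign("bob", 12345)
registry.assign("carol", 999)
print(registry.shared_bins())  # {12345: ['alice', 'bob']}
registry.assign("alice", 4242)  # alice's IP address changed
print(sorted(registry.published_bins()))  # [999, 4242, 12345]
print(registry.shared_bins())  # {}
```

Only the output of `published_bins()` would ever leave the vendor's side; the name-to-bin mapping stays private.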
On the user's machine, the open source code does the license check by simply looking the bin up in a neutral third-party database that doesn't track IP addresses. For starters, a public Google Sheet seems fine: it doesn't collect users' IP addresses, and knowing which of the 65,536 bins are paying doesn't really divulge anything, since anyone could look that information up whether or not they're a customer. (Google Sheets supports 40,000 rows per sheet, so for bins past the first 40,000 I would look on sheet 2 after subtracting 40,000.) Either way, I would not put any customer details there, only a mark that a given cell belongs to a paying customer.
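The client-side lookup, including the split across two sheets at 40,000 rows, could look roughly like this. It is a sketch under assumptions that are not in the post: the table has no header row, bin `b` sits in column A of row `b + 1` on its sheet (after the 40,000 offset), and a cell containing `1` marks a paying bin. `bin_to_sheet_row` and `is_paying` are hypothetical names, and downloading the sheet's contents is left out:

```python
import csv
import io

ROWS_PER_SHEET = 40_000  # the per-sheet row count assumed in the post


def bin_to_sheet_row(crc_bin: int, rows_per_sheet: int = ROWS_PER_SHEET) -> tuple[int, int]:
    """Return (sheet number, row number), both 1-based, for a 16-bit bin."""
    sheet_index, offset = divmod(crc_bin, rows_per_sheet)
    return sheet_index + 1, offset + 1


def is_paying(sheet_csv: str, row: int) -> bool:
    """Check the assumed paid flag ("1") in column A of the given 1-based row."""
    rows = list(csv.reader(io.StringIO(sheet_csv)))
    if row < 1 or row > len(rows) or not rows[row - 1]:
        return False
    return rows[row - 1][0].strip() == "1"


print(bin_to_sheet_row(12345))  # (1, 12346)
print(bin_to_sheet_row(50000))  # (2, 10001)
```

If the client downloads the whole sheet and checks its own row locally, whoever hosts the sheet only sees that the table was fetched, not which row the client was interested in.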
The code for the CRC-16 binning could look basically like this:
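import binascii
data = "example_string"  # in practice, the user's IP address as a string
# 16-bit CRC-CCITT (polynomial 0x1021) with initial value 0xffff, giving a bin from 0 to 65535
crc = binascii.crc_hqx(data.encode(), 0xffff)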
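Applied to IP addresses, it may help to normalize the address first so that different spellings of the same address (common with IPv6) land in the same bin. A minimal sketch: `ip_to_bin` is a hypothetical helper, and the normalization through Python's standard `ipaddress` module is an addition not in the original. For reference, `crc_hqx` with an initial value of `0xffff` computes the CRC-16 variant usually called CRC-16/CCITT-FALSE:

```python
import binascii
import ipaddress


def ip_to_bin(ip: str) -> int:
    """Map an IP address string to one of 65,536 CRC-16 bins (0..65535)."""
    # Canonical form, e.g. "2001:0db8:0000::0001" becomes "2001:db8::1".
    canonical = str(ipaddress.ip_address(ip.strip()))
    return binascii.crc_hqx(canonical.encode(), 0xFFFF)


print(ip_to_bin("203.0.113.7"))
print(ip_to_bin("2001:0db8:0000::0001") == ip_to_bin("2001:db8::1"))  # True

# On average, each bin covers 2**32 / 2**16 of the possible IPv4 addresses.
print(2 ** 32 // 2 ** 16)  # 65536
```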
You can empirically verify that there are no shenanigans, and that this really spreads inputs across all 65,536 bins, by hashing about a million random numbers as strings:
import binascii
import random

print(random.random())
# 0.7466695076975173, for example

values = {}  # maps each CRC bin to how many inputs landed in it
# add the CRC of a million random values to the dictionary (keeping the counts lets you check that they're binned evenly)
for i in range(1000000):
    datastring = str(random.random())
    crc = binascii.crc_hqx(datastring.encode(), 0xffff)
    values[crc] = values.get(crc, 0) + 1
# print the number of distinct bins that were hit:
print(len(values))  # prints 65536
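Following up on checking that the values are binned evenly, the counts can be summarized with a Pearson chi-square statistic against a uniform spread. A minimal sketch; the function names and the fixed seed are illustrative choices, not part of the original:

```python
import binascii
import random
from collections import Counter

BINS = 2 ** 16


def crc_bin_counts(samples: int, seed: int = 0) -> Counter:
    """Count how many random-number strings land in each CRC-16 bin."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    return Counter(
        binascii.crc_hqx(str(rng.random()).encode(), 0xFFFF) for _ in range(samples)
    )


def chi_square_uniform(counts: Counter, samples: int, bins: int = BINS) -> float:
    """Pearson chi-square statistic versus an even spread over all bins."""
    expected = samples / bins
    # Bins that never appear in `counts` contribute (0 - expected)**2 / expected each.
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(bins))


SAMPLES = 1_000_000
counts = crc_bin_counts(SAMPLES)
smallest = min(counts.get(b, 0) for b in range(BINS))  # includes empty bins
print(len(counts), smallest, max(counts.values()))
print(chi_square_uniform(counts, SAMPLES))
```

If the bins were filled uniformly at random, the statistic would typically land within a few hundred of 65,535 (the degrees of freedom); a much larger value would point to uneven binning.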
What do you think about this approach as a way of preserving privacy without tying users to their licenses in a reversible way? The main quality I'm looking for is letting users retain control over their home automation setups and privacy, while still letting the few who want added functionality pay for it.
[1] https://github.com/robss2020/computerplayverysexymusic
> I don't want them to tell me their IP address in detail, since this would just compromise their privacy and increase their attack surface area.
I don't see the point. Your server knows their IP address anyway, and I bet you log it too.
It seems overcomplicated and unnecessary.
https://docs.google.com/spreadsheets/d/e/2PACX-1vSCaZIQqtWBr...
Currently it supports up to a few hundred thousand paying users.