- customer_id - customer_first_name - customer_last_name - customer_loan_number - customer_phone_number - customer_email - customer_address - customer_date_of_birth - customer_zip_code
What I ultimately want is customer_id. But what I have is any combination of the subsequent fields. They might be all lowercase, they might not. There might be type-os. The phone number might have parentheses and dashes, it might not.
The address and zip are also wonky - for example customer_zip_code may look like XXXXX-YYYY, or it may just be XXXXX.
And example query would be: given customer_first_name and customer_last_name and customer_address, find the customer_id. The customer_last_name might be lowercase in the query and uppercase in the main database. The customer_address in the query might be "123 Main St" but in the database it might be "123 MAIN STREET".
And the data is in the 100m-1000m rows size, so this has to run reasonably quickly with that in mind. Can't just do a "select * where [x = y] and [p = q] and [t = s]". Would like to pre-load the big table in RAM, and then somehow look up the matches.
Because this almost definitely a common problem in our industry, I was wondering, is there some kind of good database already pre-optimized for this kind of fuzzy lookup?
Deep dive on how it works: https://podcasts.apple.com/us/podcast/world-of-daas/id157064...