Similarity search men <> women for matchmaking

I am looking to index two files, men.csv and women.csv, separately; together they hold about 3000 men and 5000 women for our matchmaking company. I want to load up a man's vector and use a similarity search to rank the top 10 women who match him. I would like Age and City to carry the highest weight.
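One way to make Age and City dominate the ranking is to retrieve candidates by similarity first and then re-rank them with an explicit weighted score. The sketch below assumes each candidate already has a text-similarity score in [0, 1] from whatever vector search is used; the `Candidate` fields, the weights, and the linear age falloff are hypothetical choices for illustration, not a prescribed method.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Candidate:
    person_id: str
    age: int
    city: str
    text_similarity: float  # assumed to be in [0, 1], taken from the vector search


def age_score(man_age: int, woman_age: int, tolerance: int = 10) -> float:
    """1.0 for identical ages, falling linearly to 0.0 at `tolerance` years apart."""
    return max(0.0, 1.0 - abs(man_age - woman_age) / tolerance)


def city_score(man_city: str, woman_city: str) -> float:
    """1.0 for the same city (case- and whitespace-insensitive), else 0.0."""
    return 1.0 if man_city.strip().lower() == woman_city.strip().lower() else 0.0


def top_matches(man_age, man_city, candidates, k=10, w_age=0.4, w_city=0.4, w_text=0.2):
    """Return the k candidates with the highest weighted score (hypothetical weights)."""
    def score(c: Candidate) -> float:
        return (w_age * age_score(man_age, c.age)
                + w_city * city_score(man_city, c.city)
                + w_text * c.text_similarity)
    return heapq.nlargest(k, candidates, key=score)


if __name__ == "__main__":
    women = [
        Candidate("w1", 45, "Honolulu", 0.62),
        Candidate("w2", 30, "Honolulu", 0.90),
        Candidate("w3", 44, "Denver", 0.95),
    ]
    for c in top_matches(46, "Honolulu", women, k=2):
        print(c.person_id)
```

With these weights, w3 has the highest text similarity but drops out of the top 2 because her city does not match, which is the kind of behavior "weight Age and City as highest" asks for.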

I assume each row would be treated as a “document” to be returned for the input vector. I understand that I would have to convert the CSV to JSON to provide it as input data. My questions are:

  1. Should I concatenate each row into a single line per “document”, using the column title as lead-in text? Ex. Name: Scott - City: Honolulu - I speak these languages: English - etc.
  2. Is this even the right platform for this, or would Pinecone, Weaviate, or another vector database be a better way to go?
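A minimal sketch of question 1, assuming the CSV's header row holds the column titles: each row becomes one JSON document whose text is the “Column: value” pairs joined with " - ". The `id`/`text` output shape and the helper names are hypothetical; check the target platform's documentation for the exact JSON schema it ingests.

```python
import csv
import io
import json


def row_to_text(row: dict, skip=("Person_id",)) -> str:
    """Join non-empty columns as 'Column: value' pairs separated by ' - '."""
    parts = []
    for column, value in row.items():
        # DictReader stores surplus values under the key None; ignore them.
        if column is None or column in skip or value is None or not value.strip():
            continue
        label = column.strip().rstrip(":")
        parts.append(f"{label}: {value.strip()}")
    return " - ".join(parts)


def csv_to_documents(csv_file, id_column="Person_id") -> list:
    """Turn each CSV row into an {'id', 'text'} document (hypothetical layout)."""
    reader = csv.DictReader(csv_file)
    return [{"id": row[id_column], "text": row_to_text(row, skip=(id_column,))}
            for row in reader]


if __name__ == "__main__":
    sample = io.StringIO(
        "Person_id,FirstName,City,I speak these languages:\n"
        "101,Scott,Honolulu,English\n"
    )
    print(json.dumps(csv_to_documents(sample), indent=2))
    # For the real files:
    # with open("women.csv", newline="", encoding="utf-8") as f:
    #     docs = csv_to_documents(f)
```

Skipping empty columns keeps blank fields from adding noise such as "My weight is: " to every document.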

In short, my corpus consists of rows of data, each to be returned separately by the similarity search. Thanks for any direction or insight on my particular use case.

I might as well paste the column titles here as an example of what each “document” would contain for the similarity search.
Person_id KI Type FirstName Age City State Postal Marital Status My annual income is: My eyes are: My hair is: My weight is: My ethnicity is: I would date these ethnicities: My Religion is: Is my matches religion important?: I speak these languages: Traveling frequency and where I go: Sports and fitness?: My hobbies are: Do I have or enjoy Pets?: My health is: Do I Smoke?: Do I Drink?: Am I open to possibly relocating for the right person?: Am I willing to travel for a match?: This is where I stand on Politics: My top 3 Priorities for a match are: Who am I looking to meet?: I have been married this many times: What was missing in my last Relationship? Why did it end?: If asked to describe myself: My background is: The 2-3 accomplishments am I most proud of are: Here are my top Personal Goals: Publications or podcasts I read or engage with regularly: How I like to spend a Sunday: My Relationship Goals: My ideal match: My desired age range is: I consider this Maximum Age range: Do you have children? How many children?: Kids ages: Do you want children?: Are His/Her Children OK?: These are my dealbreakers: Others notes about me: Days since First Contact - Lower is better:

Hi Scott,
You can use Vectara custom dimensions to add numerical values to the embedding vector that Vectara generates from the text.
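Conceptually, appending numeric dimensions to an embedding means the dot product used for ranking becomes the text similarity plus a weighted numeric term. The sketch below illustrates only that arithmetic with plain Python lists; it is not Vectara's implementation or API. The “wants children” feature encoded as +1/-1 and the 0.5 query weight are hypothetical. With that encoding the product is +1 when two people agree and -1 when they disagree.

```python
def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def with_custom_dims(text_vec, custom_vals, weights=None):
    """Append custom numeric values to a text embedding.

    For a query, pass `weights` so each custom value is scaled by its weight;
    for a document, leave `weights` as None (unscaled values).
    """
    if weights is None:
        weights = [1.0] * len(custom_vals)
    return list(text_vec) + [w * v for w, v in zip(weights, custom_vals)]


if __name__ == "__main__":
    # Hypothetical 2-d text embeddings; real ones have hundreds of dimensions.
    query = with_custom_dims([1.0, 0.0], [1.0], weights=[0.5])   # he wants children
    doc_a = with_custom_dims([0.6, 0.8], [1.0])                  # she wants children
    doc_b = with_custom_dims([0.6, 0.8], [-1.0])                 # she does not
    print(dot(query, doc_a), dot(query, doc_b))
```

Both documents have the same text similarity (0.6), so the custom dimension alone moves one up by 0.5 and the other down by 0.5. Raising the query weight makes that variable count for more.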

  • Do you have a basic free-text segment for each person in addition to the variables mentioned above?
  • My approach would be to decide which variables to add to the “row sentence” and which to implement as custom dimensions. For example, “languages” could be a good textual variable to simply add to the text portion, whereas income might be better suited to a custom dimension.
  • Note that when you include variables in the text segment, the matching on them is “semantic”: similar values score higher. Are there cases where opposite preferences or views result in a better match?
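The split suggested above can be sketched as a function that renders some columns as text and turns others into scaled numbers for custom dimensions. The column choices, the (min, max) bounds, and the assumption that income is stored as a plain number are hypothetical; if the CSV stores income as a range such as "$100k-$150k", it needs its own parsing first. Min-max scaling to [0, 1] keeps a large-valued field like income from swamping the others.

```python
def split_row(row, text_columns, numeric_columns, ranges):
    """Split a CSV row (dict) into a text segment and scaled numeric values.

    text_columns: columns rendered as 'Column: value' text (matched semantically).
    numeric_columns: columns converted to float and min-max scaled to [0, 1].
    ranges: hypothetical {column: (min, max)} bounds used for scaling.
    """
    text = " - ".join(
        f"{col.rstrip(':')}: {row[col].strip()}"
        for col in text_columns
        if (row.get(col) or "").strip()
    )
    numeric = {}
    for col in numeric_columns:
        lo, hi = ranges[col]
        value = float(row[col])
        # Clamp to [0, 1] so out-of-range values do not dominate the dimension.
        numeric[col] = min(1.0, max(0.0, (value - lo) / (hi - lo)))
    return text, numeric


if __name__ == "__main__":
    row = {
        "FirstName": "Scott",
        "I speak these languages:": "English, Japanese",
        "My annual income is:": "150000",
        "Age": "46",
    }
    text, nums = split_row(
        row,
        text_columns=["FirstName", "I speak these languages:"],
        numeric_columns=["My annual income is:", "Age"],
        ranges={"My annual income is:": (0, 300000), "Age": (18, 88)},
    )
    print(text)
    print(nums)
```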