Similarity search men <> women for matchmaking

Scott_Soderstrom · May 2, 2023, 10:44pm

I am looking to index two files, men.csv and women.csv, separately, with about 3000 men and 5000 women, for our matchmaking company. I want to load a man's vector and get a ranking of the top 10 women who match via similarity search. I would like to weight Age and City highest.

I assume each row would be treated as a “document” to return based on the input vector. I do note that I would have to convert the CSV to JSON to provide as the input data. My questions are:

Should I concatenate each row into a single line per “document” and use the column title as lead-in text? Ex. Name: Scott - City: Honolulu - I speak these languages: English - etc.
Is this even the correct platform to pursue this, or would Pinecone, Weaviate, or another vector database be the better way to go?
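
For illustration, the "column title as lead-in" idea from the first question could look roughly like the sketch below. It only uses the Python standard library. The output shape (an `id` field plus a `text` field) and the helper name `row_to_document` are assumptions made for this example, not a schema required by any particular platform, and the sample CSV is a made-up three-column subset of the real header.

```python
import csv
import io
import json


def row_to_document(row, id_column="Person_id"):
    """Turn one CSV row (a dict from csv.DictReader) into a JSON-ready document.

    Hypothetical helper: the {"id", "text"} shape is an assumption for this sketch.
    """
    parts = []
    for column, value in row.items():
        value = (value or "").strip()
        # Skip the identifier and any empty answers so they add no noise to the text.
        if column == id_column or not value:
            continue
        # Headers such as "I speak these languages:" already end in a colon.
        label = column.strip().rstrip(":")
        parts.append(f"{label}: {value}")
    return {"id": row[id_column], "text": " - ".join(parts)}


# Made-up sample standing in for men.csv; the real file has many more columns.
sample_csv = io.StringIO(
    "Person_id,FirstName,City,I speak these languages:\n"
    "1,Scott,Honolulu,English\n"
)
documents = [row_to_document(row) for row in csv.DictReader(sample_csv)]
print(json.dumps(documents, indent=2))
```

Each row becomes one document, so a similarity search returns people rather than fragments of a larger file.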

In short, my corpus is rows of data, each to be returned separately by the similarity search. Thanks for any direction or insight on my particular use case.
Scott

I might as well paste in the column titles here as an example of what each “document” would have for the similarity search.
Person_id KI Type FirstName Age City State Postal Marital Status My annual income is: My eyes are: My hair is: My weight is: My ethnicity is: I would date these ethnicities: My Religion is: Is my matches religion important?: I speak these languages: Traveling frequency and where I go: Sports and fitness?: My hobbies are: Do I have or enjoy Pets?: My health is: Do I Smoke?: Do I Drink?: Am I open to possibly relocating for the right person?: Am I willing to travel for a match?: This is where I stand on Politics: My top 3 Priorities for a match are: Who am I looking to meet?: I have been married this many times: What was missing in my last Relationship? Why did it end?: If asked to describe myself: My background is: The 2-3 accomplishments am I most proud of are: Here are my top Personal Goals: Publications or podcasts I read or engage with regularly: How I like to spend a Sunday: My Relationship Goals: My ideal match: My desired age range is: I consider this Maximum Age range: Do you have children? How many children?: Kids ages: Do you want children?: Are His/Her Children OK?: These are my dealbreakers: Others notes about me: Days since First Contact - Lower is better:

ofermend · May 4, 2023, 4:57am

Hi Scott,
You can use Vectara custom dimensions to essentially add numerical values to the embedding vector that Vectara generates from the text.

Do you have a basic text segment for each person in addition to the above-mentioned variables?
My approach would be to consider which variables to add to the “row sentence” and which to implement as custom dimensions. For example, “languages” could be a good textual variable to add to the text portion, whereas income might be better suited to a custom dimension.
Note that when you integrate variables in the text segment, then the matching that occurs is “semantic”; are there cases where having opposite preferences or views results in a better match?
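
To make the idea of adding numerical values to the embedding vector concrete, here is a minimal sketch in plain Python. It is not Vectara's API: the helper names, the two-element toy embeddings, and the normalized `income` values are all invented for illustration. The assumption is simply that extra numeric dimensions are appended to both the document and query vectors and that the score is a dot product, so each extra dimension contributes (query weight × document value) on top of the text similarity.

```python
def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def extend(text_vector, custom_values, dimension_names):
    """Append numeric custom values to a text embedding (missing values count as 0)."""
    return list(text_vector) + [custom_values.get(name, 0.0) for name in dimension_names]


def rank(query, candidates, dimension_names, k=10):
    """Return the ids of the top-k candidates by dot product on the extended vectors."""
    query_vector = extend(query["embedding"], query["custom"], dimension_names)
    scored = [
        (dot(query_vector, extend(c["embedding"], c["custom"], dimension_names)), c["id"])
        for c in candidates
    ]
    return [person_id for _, person_id in sorted(scored, reverse=True)[:k]]


DIMENSIONS = ["income"]  # hypothetical custom dimension, normalized to 0..1

# Toy data: real embeddings would come from the text of each "row sentence".
women = [
    {"id": "w1", "embedding": [0.6, 0.8], "custom": {"income": 0.2}},
    {"id": "w2", "embedding": [0.8, 0.6], "custom": {"income": 0.9}},
]
man = {"embedding": [0.6, 0.8], "custom": {"income": 0.5}}  # 0.5 = how much income matters

print(rank(man, women, DIMENSIONS))
```

With the weight at 0.5, the second candidate overtakes the first even though her text is slightly less similar; setting the query-side weight to 0 falls back to ranking on text alone. Note that a plain product rewards larger values, so a preference such as "close in age" would need a different encoding than the raw number.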
