Semantic search: replace A.json with C.json

fourthquark · September 27, 2024, 7:30am

I have a web app that uses Vectara’s Semantic Search Query API to get answers from my corpus. The corpus consists of a single JSON file, A.json, that I uploaded using the web uploader.
Things are working quite smoothly. But now I need to increase the amount of data in my corpus. I have collected the new data in a file called B.json.
I have two options:

1. Replace A.json with C.json, which I created by appending the array elements in the “section” portion of B.json to the array elements of A.json.
2. Upload B.json to the corpus, leaving A.json unchanged.

Which one of these is the recommended method?

ofermend · September 27, 2024, 5:05pm

Hey @fourthquark
Great question. It really depends on the data.
A single file uploaded in this way is meant to represent a single actual document, where sections are parts of that document. This means that at query time, Vectara retrieves the matching parts of the document and returns them along with the metadata of the matching document.
In your case, if B is just an improved version of A with more data (but the same logical grouping as a single document), then I would replace A with the updated A (or C in your terminology). If B is a completely different set of data, then uploading it separately also works.
For example, if A is a 10-K filing about Apple and B is the earnings call transcripts about Apple for that same filing, then I would combine them into C. But if B is about Microsoft, then it’s a different document and uploading it separately makes more sense. I hope this example makes sense.
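For the combine-into-C case, the merge could look roughly like the sketch below. It assumes both files are JSON objects whose “section” key holds an array, as described in the question, and that the top-level fields of A should carry over to C unchanged. The function names `merge_sections` and `build_c` and the `documentId` field in the usage note are illustrative only, not part of any Vectara API.

```python
import json
from pathlib import Path


def merge_sections(base: dict, extra: dict, key: str = "section") -> dict:
    """Return a copy of `base` whose `key` array is followed by `extra`'s array.

    Assumption: both documents keep their parts in a top-level array under `key`.
    All other top-level fields are taken from `base`.
    """
    merged = dict(base)  # shallow copy so `base` itself is not modified
    merged[key] = list(base.get(key, [])) + list(extra.get(key, []))
    return merged


def build_c(a_path: str, b_path: str, c_path: str) -> None:
    """Read A and B from disk, merge their sections, and write the result as C."""
    a = json.loads(Path(a_path).read_text(encoding="utf-8"))
    b = json.loads(Path(b_path).read_text(encoding="utf-8"))
    c = merge_sections(a, b)
    Path(c_path).write_text(
        json.dumps(c, ensure_ascii=False, indent=2), encoding="utf-8"
    )
```

Calling `build_c("A.json", "B.json", "C.json")` would then produce the file to upload in place of A.json. If the two files carry different top-level values (for example a hypothetical `documentId`), this sketch keeps the ones from A.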

fourthquark · September 29, 2024, 3:21am

Thanks for the reply, @ofermend. It was very helpful.
