I have a web app that uses Vectara’s Semantic Search Query API to get answers from my corpus. The corpus is in the form of a json file — A.json — that I uploaded using the web uploader.
Things are working quite smoothly. But now I need to increase the amount of data in my corpus. I have collected the new data in a file called B.json.
I have two options:
- Replace A.json with C.json, which I created by appending the elements of the “section” array in B.json to the “section” array in A.json.
- Upload B.json to the corpus, leaving A.json unchanged.
Which one of these is the recommended method?
Hey @fourthquark
Great question. It really depends on the data.
A single file uploaded in this way is meant to represent a single actual document, where sections are parts of that document. This means that at query time, Vectara would grab the matching parts of the document and return those along with the metadata of the matching document.
In your case, if B is just an improved version of A with more data (but the same logical grouping as a single document), then I would replace A with the updated A (or C in your terminology). If B is a completely different set of data, then uploading it separately also works.
For example, if A is a 10-K document about Apple and B is the transcript of the related Apple earnings call, then I would combine them into C. But if B is about Microsoft, then it is a different document and I would upload it separately. I hope this example makes sense.
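If you go with the combined file, here is a rough sketch of the merge in Python. It assumes both files are JSON objects with a top-level "section" array, as described in the question; everything else in A.json is carried over unchanged. The helper names `merge_sections` and `merge_files` are made up for this example and are not part of any Vectara tooling.

```python
import copy
import json
from pathlib import Path


def merge_sections(doc_a: dict, doc_b: dict, key: str = "section") -> dict:
    """Return a copy of doc_a with doc_b's entries under `key` appended.

    Assumes `key` holds a list in both documents (a missing key is
    treated as an empty list). Neither input is modified.
    """
    merged = copy.deepcopy(doc_a)
    merged.setdefault(key, []).extend(copy.deepcopy(doc_b.get(key, [])))
    return merged


def merge_files(path_a: str, path_b: str, path_out: str) -> None:
    """Read two JSON documents, merge their sections, and write the result."""
    doc_a = json.loads(Path(path_a).read_text(encoding="utf-8"))
    doc_b = json.loads(Path(path_b).read_text(encoding="utf-8"))
    merged = merge_sections(doc_a, doc_b)
    Path(path_out).write_text(
        json.dumps(merged, ensure_ascii=False, indent=2), encoding="utf-8"
    )
```

Calling `merge_files("A.json", "B.json", "C.json")` would write the combined document, which you can then upload in place of A.json. Only the "section" array is taken from B; any other top-level fields in B.json are ignored in this sketch.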
Thanks for the reply, @ofermend. It was very helpful.