I am working on integrating Vectara with an existing database: once a file is uploaded to the database, I want it to be uploaded to a corpus in Vectara, indexed, and ready for querying. From going over the Vectara documentation, I understand that uploading a file and indexing a file on Vectara are two different tasks. What would be the best way for me to upload a file to the corpus I want it in while also indexing it right away?
The file upload API performs not only the file extraction but also an indexing operation. Did you run into a problem where an uploaded file was not searchable?
Thanks for your response!
No, I did not face any issues. So far I have tried only on console.vectara.com.
If the upload API extracts a file and indexes the data in it, what is the difference between the upload API and the index API? Can you briefly describe the particular use cases for the two?
The upload API takes “raw files” (PDFs, Word documents, HTML files, etc.) and then extracts text and metadata from those documents.
The index API is for sending semi-structured data programmatically, where you control the text and metadata extraction. For example, if you had a database with some fields containing text and others containing metadata, I’d generally recommend the standard indexing API: it gives you the greatest control over how your documents are structured, and it also supports gRPC (which is lower latency) rather than only REST. The File Upload API does allow you to send custom formatted JSON documents, and there aren’t too many downsides to it, but I’d think the standard indexing API would generally make more sense for a database sync unless your database held document blobs containing PDFs and similar.
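To make the distinction concrete, here is a minimal Python sketch of how a database sync might decide between the two routes. Everything in it is an assumption for illustration: the `Record` columns, the list of raw file extensions, and the key names in the document dictionary are hypothetical and are not a confirmed Vectara schema, so check the request format in the API reference before sending anything. The sketch makes no network calls; it only shows the routing decision and the shape of a semi-structured document.

```python
import json
from dataclasses import dataclass
from typing import Optional

# File types treated as "raw files" needing text extraction (illustrative list).
RAW_FILE_EXTENSIONS = {".pdf", ".doc", ".docx", ".html"}


@dataclass
class Record:
    """Hypothetical database row; the column names are assumptions."""
    record_id: str
    title: str
    body_text: Optional[str] = None      # text already held in the database
    blob_filename: Optional[str] = None  # set when the row stores a document blob
    author: Optional[str] = None


def choose_api(record: Record) -> str:
    """Return "upload" for raw document blobs, "index" for everything else."""
    if record.blob_filename:
        name = record.blob_filename.lower()
        if any(name.endswith(ext) for ext in RAW_FILE_EXTENSIONS):
            return "upload"
    return "index"


def build_index_document(record: Record) -> dict:
    """Build a semi-structured document for the index route.

    The key names here are illustrative, not a confirmed Vectara schema.
    """
    metadata = {"author": record.author} if record.author else {}
    return {
        "document_id": record.record_id,
        "title": record.title,
        "metadata_json": json.dumps(metadata),
        "sections": [{"text": record.body_text or ""}],
    }
```

A row that holds plain text and metadata would go through `build_index_document` and on to the index API, while a row whose `blob_filename` ends in `.pdf` would be sent as a file to the upload API, which handles extraction and indexing itself.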