How do you see the list of documents in a corpus?

I’m uploading documents one by one using the web UI, but I don’t know what’s already uploaded. It’d be useful to get a list of the documents that are already in there, so I know whether I’m just uploading duplicates.

It would also help to preview a document in the corpus to see if it is out of date and needs replacing.

Hello and welcome!

Our indexing API is idempotent, so submitting the same document multiple times won’t cause it to be indexed multiple times, or consume additional quota.
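To make that behavior concrete, here is a toy, in-memory model of it. It is a hypothetical sketch, not Vectara’s implementation or API. It assumes a document is identified by its ID and detects changes by hashing the content. Returning 200 for an identical resubmission is also an assumption, because the reply only says the repeat is not indexed again and costs no extra quota. The 409 for changed content follows the CONFLICT behavior described in the same reply.

```python
import hashlib


class ToyIndex:
    """Hypothetical in-memory model of the indexing behavior described above.

    Assumptions (not Vectara's actual API): documents are keyed by ID, and
    "same document" means same ID with byte-identical content.
    """

    def __init__(self) -> None:
        self._hashes: dict[str, str] = {}  # doc_id -> SHA-256 of indexed content
        self.quota_used = 0                # number of documents actually indexed

    def index(self, doc_id: str, content: str) -> int:
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        previous = self._hashes.get(doc_id)
        if previous is None:
            # First submission: index it and charge quota once.
            self._hashes[doc_id] = digest
            self.quota_used += 1
            return 200
        if previous == digest:
            # Identical resubmission: no re-indexing and no extra quota.
            # (Returning 200 here is an assumption.)
            return 200
        # Same ID, different content: CONFLICT.
        return 409


idx = ToyIndex()
print(idx.index("faq-1", "How do I reset my password?"))   # 200
print(idx.index("faq-1", "How do I reset my password?"))   # 200, no extra quota
print(idx.quota_used)                                       # 1
print(idx.index("faq-1", "How do I change my password?"))  # 409
```

The point of the model is that resubmitting duplicates is harmless: the quota counter only moves the first time a given ID is indexed.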

That said, I agree that an API for listing existing documents would be useful. Unfortunately, we don’t have one at the moment, but it’s something we’re considering adding in the future.

If you submit a changed version of a document that already exists in the index, you will receive a CONFLICT (409) status code from the system. You’ll need to delete the document first and then index it again. We are planning to add an update API in the future.
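Until an update API exists, the delete-then-reindex workaround can be wrapped in a small helper. This is a sketch under stated assumptions: the `Index` interface, its `index`/`delete` method names, and the `replace_document` helper are all hypothetical and only stand in for whatever client you use to call the real indexing and delete endpoints. `FakeIndex` is an in-memory stand-in that returns 409 when an existing ID’s content changes, as described above.

```python
from typing import Protocol


class Index(Protocol):
    """Hypothetical client interface; method names are illustrative, not Vectara's."""

    def index(self, doc_id: str, content: str) -> int: ...

    def delete(self, doc_id: str) -> None: ...


def replace_document(index: Index, doc_id: str, content: str) -> int:
    """Index content under doc_id, deleting and re-indexing on a 409 CONFLICT."""
    status = index.index(doc_id, content)
    if status == 409:
        # A different version already exists: remove it, then index the new one.
        index.delete(doc_id)
        status = index.index(doc_id, content)
    return status


class FakeIndex:
    """Minimal in-memory stand-in that returns 409 when an ID's content changes."""

    def __init__(self) -> None:
        self.docs: dict[str, str] = {}

    def index(self, doc_id: str, content: str) -> int:
        if doc_id in self.docs and self.docs[doc_id] != content:
            return 409
        self.docs[doc_id] = content
        return 200

    def delete(self, doc_id: str) -> None:
        self.docs.pop(doc_id, None)


store = FakeIndex()
replace_document(store, "faq-1", "v1")
print(replace_document(store, "faq-1", "v2"))  # 200
print(store.docs["faq-1"])                      # v2
```

One consequence of this approach is that the document is missing from the index between the delete and the re-index, which is a gap an update API would avoid.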

I hope this helps.


Did you ever implement listing all docs in the index?

Hi Christian,

This is high on our priority list, as others have requested it as well. Is it blocking you at the moment? Would you be willing to shed some light on the use case you’re implementing that depends on it?

thanks,
Amin

Hi Amin, yes, I would say so. I’m building a search engine and would like to give my users the ability to peruse the documents they are searching, as well as see how many there are. It’s something you can easily do with Elastic. Thanks!

Hey Christian, when you say “your users”, do you mean your end users who are using the application you’re building with Vectara? Or do you mean internal coworkers who are using Console to help you administer your data? Thanks in advance for clarifying!

CJ

Hi CJ, I mean external users, not coworkers.

Thanks for the additional context. Listing docs, optionally specifying a filter expression to restrict the selection, is high on our priority list.

Yep — that would be super.

Christian