Currently I am testing indexing and retrieval with a single Wikipedia page (about 50 KB according to index stats).
Retrieval alone takes about 700 ms for me (querying for a single word). Requests that also need a token refresh add another 500 ms on average.
Do you have any tips for how I might go about speeding up retrieval time?
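For reference, the numbers above come from wall-clock timing around a single query call, along the lines of the sketch below. The URL, headers, and body are placeholders, not the exact request I am making.

```python
import time

import requests

# Rough timing sketch: measure wall-clock latency of one query request.
# The url, headers, and body arguments are placeholders for the real request.
def timed_query_ms(url: str, headers: dict, body: dict) -> float:
    start = time.perf_counter()
    response = requests.post(url, headers=headers, json=body, timeout=30)
    response.raise_for_status()
    return (time.perf_counter() - start) * 1000.0  # elapsed milliseconds
```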
Thanks for reaching out. Our query latencies are much shorter than what you are observing. Can you please share your customer (account) ID and corpus ID? That’ll help us diagnose further. You can share this information offline with me via email: tallat (at) vectara.com.
So after a very helpful exchange with Tallat, there are 3 things I’ve learnt:
1. Reusing the TCP connection goes a long way toward speeding things up. I am using Python, so requests.Session() did it for me (again, thanks Tallat for this tip).
2. I stopped using OAuth for requests and instead generated an API key. OAuth requests (even when not refreshing the token) were adding a large overhead to request initialisation. The sketch after this list shows both this change and the previous one.
3. Vectara's servers are currently on the US west coast, and I am in the UK, so that may be one reason I'm not seeing the latencies I would expect.
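Here is a minimal sketch of points 1 and 2 together. It assumes the v1 REST query endpoint, the header names, and the request body shown here (check the Vectara docs for your account); the customer ID, corpus ID, and API key are placeholders. The two key changes are a single long-lived requests.Session() so the TCP/TLS connection is reused across queries, and a static API key header in place of an OAuth bearer token.

```python
import requests

# Placeholders -- replace with your own values.
CUSTOMER_ID = "1234567890"          # customer (account) ID
CORPUS_ID = 1                       # corpus ID
API_KEY = "YOUR_QUERY_API_KEY"      # query API key

# One long-lived session reused for every request: the TCP/TLS connection is
# pooled, so only the first query pays the connection/handshake cost.
session = requests.Session()
session.headers.update({
    "customer-id": CUSTOMER_ID,
    "x-api-key": API_KEY,           # static API key instead of an OAuth Authorization header
    "Content-Type": "application/json",
})

def query(text: str, num_results: int = 10) -> dict:
    # Assumed shape of the v1 query request body.
    body = {
        "query": [{
            "query": text,
            "numResults": num_results,
            "corpusKey": [{"customerId": CUSTOMER_ID, "corpusId": CORPUS_ID}],
        }]
    }
    resp = session.post("https://api.vectara.io/v1/query", json=body, timeout=30)
    resp.raise_for_status()
    return resp.json()

# Subsequent calls reuse the pooled connection:
# query("first search"); query("second search")
```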
With the above changes I was able to get my average latency down to about 200 ms. In an ideal world it would be closer to 50 ms, but I will wait until Vectara supports more regions for that.