Response Body: {
  messages: [
    'FAILED_PRECONDITION: Factual consistency computation skipped due to summarizer error.'
  ],
  request_id: '6c4de54d112f5d175a3c19a58457d454'
}
The strange thing is that I am calling the API around 15 times in quick succession (the v2 API no longer supports batch calls the way v1 did). Some of those calls get 412 errors, while others return an OK response. The only difference between the calls is the query text itself; all configuration is unchanged.
The issue is assigned to one of the engineers and he’s actively looking at it. I’ve nudged the engineer, and he should respond here soon with his progress.
Hey Vivek, it seems that you are using a chain reranker and the first reranker cuts off all results. This can happen as different queries will have results with different scores, so whatever cutoff you are using seems to, in some cases, remove all results. Could you share your reranking config for the request?
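To make the failure mode concrete, here is a minimal sketch of how one fixed cutoff can keep results for one query and drop everything for another. The scores, the 0.5 threshold, the inclusive comparison, and the `applyCutoff` helper are all made up for illustration; none of this is actual Vectara output or API code.

```typescript
// Hypothetical illustration: the scores and the helper below are invented.
interface ScoredResult {
  text: string;
  score: number;
}

// Keep only results whose score is at or above the cutoff (assumed inclusive).
function applyCutoff(results: ScoredResult[], cutoff: number): ScoredResult[] {
  return results.filter((r) => r.score >= cutoff);
}

// A query that matches the corpus well: some scores clear the cutoff.
const strongQuery: ScoredResult[] = [
  { text: "doc A", score: 0.91 },
  { text: "doc B", score: 0.62 },
  { text: "doc C", score: 0.18 },
];

// A query that matches poorly (e.g. dummy data): no score clears the cutoff.
const weakQuery: ScoredResult[] = [
  { text: "doc A", score: 0.31 },
  { text: "doc B", score: 0.12 },
];

const keptStrong = applyCutoff(strongQuery, 0.5);
const keptWeak = applyCutoff(weakQuery, 0.5);

console.log(keptStrong.length); // 2
console.log(keptWeak.length); // 0, so nothing is left for the summarizer
```

With the same configuration, only the query text changes the outcome, which matches the mix of OK and 412 responses described above.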
Ah I see, thanks @husseinhassans. We currently only have dummy data in the corpus during development, so I see why the cutoff would be too strong. It does seem that the 412 status is a bit misleading; it would be clearer to get “No responses found” back from the summarizer.
Our team is new to the concept of reranking. We were following what appeared to be best practice from the docs and blog posts. Would you recommend a different approach?
Thanks for the feedback Vivek. You’re absolutely right, posting a fix for the error log soon.
A quick note about reranking: Rerank_Multilingual_v1 essentially ensures your results are in order of relevance. So for the most part, this is what you want to use. The knee reranking function is used to remove results it determines are not useful from your result set. It operates better on a larger result set. Therefore, you don’t want to use any limits or cutoffs prior to applying the knee reranker.
If you want optimal performance, don’t use limits or cutoffs in Rerank_Multilingual_v1 when it is used in tandem with the knee reranker. Apply your desired limits or cutoffs in the final stage. Our recommendation is to use just a 0.5 cutoff, but if you would also like a limit of 10 for your purposes, that should go in the final stage:
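const VECTARA_RERANKING = {
  // First stage: order results by relevance, with no cutoff or limit.
  SLINGSHOT: {
    type: 'customer_reranker' as const,
    reranker_name: 'Rerank_Multilingual_v1'
  },
  // Final stage: knee reranker, where any cutoff or limit belongs.
  KNEE: {
    type: 'userfn' as const,
    user_function: 'knee()',
    cutoff: 0.5,
    limit: 10 // THIS IS OPTIONAL
  }
} as const;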
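As a rough sketch of how the two stages above might be combined, the helper below wraps them in a chain and rejects any cutoff or limit that appears before the final stage. The `type: "chain"` wrapper with a `rerankers` array is my assumption about the request shape, and `buildChainReranker` is a hypothetical helper, not part of any SDK; please check the field names against the API reference before relying on them.

```typescript
// Sketch only: the "chain" wrapper shape and buildChainReranker are assumptions.
type RerankerStage =
  | { type: "customer_reranker"; reranker_name: string; cutoff?: number; limit?: number }
  | { type: "userfn"; user_function: string; cutoff?: number; limit?: number };

interface ChainReranker {
  type: "chain";
  rerankers: RerankerStage[];
}

function buildChainReranker(stages: RerankerStage[]): ChainReranker {
  if (stages.length === 0) {
    throw new Error("A chain needs at least one reranker stage");
  }
  // Every stage except the last must leave the size of the result set untouched.
  stages.slice(0, -1).forEach((stage, index) => {
    if (stage.cutoff !== undefined || stage.limit !== undefined) {
      throw new Error(`Stage ${index} sets a cutoff or limit before the final stage`);
    }
  });
  return { type: "chain", rerankers: stages };
}

// Relevance ordering first, then knee() with the cutoff and optional limit.
const reranker = buildChainReranker([
  { type: "customer_reranker", reranker_name: "Rerank_Multilingual_v1" },
  { type: "userfn", user_function: "knee()", cutoff: 0.5, limit: 10 },
]);

console.log(JSON.stringify(reranker, null, 2));
```

The guard is a client-side convenience: it catches an early cutoff at build time instead of surfacing later as an empty result set.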
Could you point me to which docs recommended this type of config? We might need to fix that.
On reread, I think the docs are clear and the blog post laid out the recommended approach well. I likely got mixed up between the different examples for combining cutoffs and limits.