Summary Is Answering Questions That Weren't Asked

Hello!

I’m testing Vectara’s search and summary functions on one of our knowledge base docs. It seems to answer questions well, but it also answers questions that were not asked in the prompt.

For example:
Question: Can I take modified citrus pectin with the rest of my supplements?

Answer: Yes, you can take your modified citrus pectin (MCP) with the rest of your supplements, but you should wait 30 minutes after taking the MCP before taking the other supplements. This is because the MCP absorbs other chemicals that are around the pectin fiber, and you don’t want it to do that to your anti-cancer supplements. Coffee with a small amount of unsweetened soy milk can be taken soon after taking the other supplements, though you should wait 30 minutes to an hour after if you are also taking beta glucan.

This is the part of the answer that is irrelevant:

Coffee with a small amount of unsweetened soy milk can be taken soon after taking the other supplements, though you should wait 30 minutes to an hour after if you are also taking beta glucan.

Please let me know your thoughts.

Thanks!
Gene

Hi Gene, and welcome to the forums!

A lot of different things could be going on, and I think we need some more details to figure out the best path forward. A few possibilities:

  • It’s possible that you have a bug in your code, especially if you’re using the gRPC futureId feature and you’re getting a futureId for a different request back. I don’t suspect that’s the case, but knowing whether you’re using gRPC + futures would help eliminate this possibility.
  • It’s possible that the underlying retrieval step that runs before summarization is retrieving results that aren’t as relevant as they’d need to be to generate good summaries. (We do know of a few instances like that, and we have a plan to address it by more aggressively and automatically cutting off irrelevant results.) A manual workaround in the interim is to turn maxSummarizedResults down to a lower number, especially if you’ve turned it up; see the sketch after this list.
  • It’s possible we have an issue with our generative system, or you may need a higher-quality summarization than what we make available to our Growth (free/self-service CC) users. We do offer a higher-quality summarization to our Scale plan customers than what we offer on the Growth plan.
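
For reference, here’s a minimal sketch of where that maxSummarizedResults knob lives if you end up calling the REST API directly. The customer ID, API key, and corpus ID are placeholders, and the exact field names are worth double-checking against our docs:

```python
import requests

# Placeholders: substitute your own customer ID, API key, and corpus ID.
CUSTOMER_ID = "1234567890"
API_KEY = "zqt_..."
CORPUS_ID = 1

body = {
    "query": [{
        "query": "Can I take modified citrus pectin with the rest of my supplements?",
        "numResults": 10,
        "corpusKey": [{"corpusId": CORPUS_ID}],
        "summary": [{
            "responseLang": "en",
            # Summarize fewer results so an off-topic match is less
            # likely to leak into the generated answer.
            "maxSummarizedResults": 3,
        }],
    }]
}

resp = requests.post(
    "https://api.vectara.io/v1/query",
    json=body,
    headers={"customer-id": CUSTOMER_ID, "x-api-key": API_KEY},
)
resp.raise_for_status()
print(resp.json()["responseSet"][0]["summary"][0]["text"])
```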

What would probably be most helpful, first and foremost, is to understand what references/sources Vectara has cited. Vectara issues these citations by putting them in [number] format in the summary text, where the number starts at 1 and increases by 1 for each result in the responseList. For example, the summary/answer should look something like:

Yes, you can take your modified citrus pectin (MCP) with the rest of your supplements [1], but you should wait 30 minutes after taking the MCP before taking the other supplements [2]. This is because the MCP absorbs other chemicals that are around the pectin fiber, and you don’t want it to do that to your anti-cancer supplements [4]. Coffee with a small amount of unsweetened soy milk can be taken soon after taking the other supplements, though you should wait 30 minutes to an hour after if you are also taking beta glucan [7].

If you’re not getting these [1]-style citations, then something else entirely is going wrong that we’d need to look into more deeply. But if you are, knowing what the references are and what they cite would be one of the next steps to investigate.
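
Once you have the citations, mapping them back to their sources is mechanical. A quick sketch, assuming the v1 REST response shape (a responseSet entry holding a response list and a summary list; verify the field names against the docs):

```python
import re

def cited_sources(response_set: dict) -> dict[int, str]:
    """Map each [n] citation in the summary to the result text it cites."""
    summary_text = response_set["summary"][0]["text"]
    results = response_set["response"]
    sources = {}
    for n in sorted({int(m) for m in re.findall(r"\[(\d+)\]", summary_text)}):
        sources[n] = results[n - 1]["text"]  # citations are 1-indexed
    return sources
```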

You can post these here if they’re not sensitive. If they are sensitive, I’d be happy to connect you with a technical resource on our side to help take a look, and/or set you up with a trial of a Scale account to see if the higher-quality generative summarization helps. You can reach me at shane@vectara.com or DM me here.

Hi Shane!

Yes, the answer does include citations. I’m not sure why it didn’t paste in.

  1. About possibly having a bug in my code… I don’t think this would be an issue, since I only uploaded the corpus and ran the search right from the internal search function in Vectara. I haven’t done any kind of API integration yet.

  2. Does the issue with the “underlying retrieval model” apply here as well, since I’m just using the internal search function? (Actually, I just saw the “advanced options” where I can adjust “lambda” and “number of search results to summarize”… Is this what you’re referring to that I should adjust?)

  3. Do you have any documentation on how to format the Corpus for best results? I realize that ours might be a bit confusing, since questions are grouped together by the emails they came from…

So the info in our corpus is formatted like this:

Title: which summarizes the main questions
Tags: related keywords
Questions: all the questions that came in a single email, which may cover a few topics.
Answer: all the answers to the questions in the email which cover all those topics.

  • So I realize that the summary mentioned the coffee because that question was also included in that same email (though I didn’t ask about it in my search).

Please let me know your thoughts. Thanks!

Yeah. To give a sense of the underpinnings of how Vectara works: we employ several different optimized LLMs/embedding models at different stages of the index & query pipelines. In particular, for generative summarization, your query passes through at least one model during “search” before summarization. Vectara uses LLMs to find the most relevant semantic results. It then has the ability to pass a large set of those results through an even more accurate model (the “reranker,” as it’s called, which is only available to Scale users). Regardless of whether the results are reranked, once it has the result set, it passes it to the generative summarization engine to produce the summary.

There are different flavors of summarization: a “normal” model that we think is usually pretty good for our Growth users, and a more advanced model that generally produces better results for our Scale customers. (If you’re on a Growth account, you’re not using the reranker and you’re not using the more advanced generative model. Normally we’ve seen these still produce pretty good results, but we have seen some issues like this on occasion.)
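
In rough pseudocode (not our actual implementation; every function name here is illustrative), the query path looks like this:

```python
def answer_query(query: str, scale_plan: bool = False) -> str:
    # Stage 1: LLM-powered semantic retrieval over the corpus.
    results = semantic_search(query)
    # Stage 2 (Scale only): rescore candidates with a more accurate
    # reranking model.
    if scale_plan:
        results = rerank(query, results)
    # Stage 3: hand the top results to the generative summarizer
    # (Growth gets the "normal" model, Scale the more advanced one).
    return summarize(query, results[:max_summarized_results])
```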

It’s possible that the underlying, unreranked search function is producing some results that are less relevant to the query. We’ve tried to tune the generative summarization to ignore irrelevant results at that stage in case of such mismatches, but there can be some statistical failures, though we work to minimize those through a variety of tactics.

When we generate summaries, we also try to provide the summarization engine with some context around each result: the sentences before and after the relevant part, in case those help with the summary (they often do). You can actually control that behavior via the ContextConfig, but we haven’t yet added controls for it in the Console UI.
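
For when you do move to the API, here’s a sketch of where that sits in a query. I’m writing the v1 REST field names from memory, so double-check them against the docs:

```python
query_with_context = {
    "query": "Can I take modified citrus pectin with my supplements?",
    "numResults": 10,
    "corpusKey": [{"corpusId": 1}],  # placeholder corpus ID
    "contextConfig": {
        # Fewer surrounding sentences means less unrelated email text
        # reaches the summarizer.
        "sentencesBefore": 1,
        "sentencesAfter": 1,
        "startTag": "<b>",  # marks the matched snippet in each result
        "endTag": "</b>",
    },
    "summary": [{"responseLang": "en", "maxSummarizedResults": 3}],
}
```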

I wish there were a generic answer to this, but it’s very use-case dependent and likely requires looking at the data a bit to understand the specifics. It sounds like you generally have a support use case: your users may ask several different questions in a single thread, a support agent responds to each of those in turn, and you want to expedite their answers to your users?

We have some fairly high-level documentation on question-answer matching at https://docs.vectara.com/docs/common-use-cases/question-answer/question-answer-overview, but it sounds like your data is a bit messier than a well-formatted “here’s a question and here’s an answer.” If at all possible, reformatting your data into question + answer pairs, in a slightly more structured form than “a bunch of questions + a bunch of answers,” will typically work better; see the sketch below. In the future, we plan to introduce a capability in the product to automatically extract the questions and associated answers from this type of “messy data” (through another LLM), but we haven’t implemented that yet.
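
To make that concrete, here’s an illustrative sketch of reshaping one “many questions, one long answer” email into per-question sections before indexing, so each question sits next to its own answer. The document shape follows our standard indexing format as I recall it; splitting the email into (question, answer) pairs is a helper you’d write for your own data:

```python
def email_to_document(email_id: str, title: str,
                      qa_pairs: list[tuple[str, str]]) -> dict:
    """Build one indexable document with a section per Q/A pair,
    instead of one blob of questions followed by one blob of answers."""
    return {
        "documentId": email_id,
        "title": title,
        "section": [
            {"title": question, "text": answer}
            for question, answer in qa_pairs
        ],
    }
```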

Hi Shane!

OK, I will book a call to find out about the cost of the Scale plan. At this point, we just need to test whether it works well enough for our use case.

Also, with the API, would we be able to do these things below?

  1. Somehow keep the Vectara citation out of the answer to the user, while still being able to reference the citation internally as an administrator.

  2. Somehow separate the individual results within the summary, so that our administrator can select some parts/citation-based results of the summary to send to the user, while ignoring parts of the summary that are not useful in a given context?

Because at this point for our use case, we’d need to have an administrator checking all the autogenerated answers, and adjusting them before sending to the user.

  3. Could we somehow have the system respond with an “I don’t know” statement if the question asked is not answered by our knowledge base?

  4. Lastly, are there any training features that we can use as Vectara users to train the model better for our use case?

I’m hiring a developer to help me with the implementation details, but I first wanted to get an idea of what’s possible with Vectara overall.

Thanks!
Gene

Somehow keep the Vectara citation out of the answer to the user, while still being able to reference the citation internally as an administrator.

You should be able to do that today by looking for a regular expression like \[\d\] in the summary and substituting it with an empty string for the user; see the sketch below. Based on what I understand, you could do that after your administrators review the answers, for example. Today the citation format of the prompt is not controllable for our Growth users, but it is for Scale customers.
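
In Python, for example (a sketch; note \d+ rather than \d so multi-digit citations like [10] are caught too):

```python
import re

cited = ("Yes, you can take MCP with your supplements [1], but wait "
         "30 minutes before taking the others [2].")
for_user = re.sub(r"\s*\[\d+\]", "", cited)  # drop the [n] markers
print(for_user)
# Yes, you can take MCP with your supplements, but wait 30 minutes
# before taking the others.
```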

Could we somehow have the system respond with an “I don’t know” statement if the question asked is not answered by our knowledge base?

This is the case today. The specific response should be: The returned results didn't have sufficient information to be coherently summarized into a useful answer for your query, please try restating your query differently. We have not made this response configurable for our Growth plan users, but it is configurable for our Scale plan customers.
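
If you want your admin workflow to flag these automatically rather than eyeballing the prose, a simple check like the following works; keeping the canned string in one place is wise in case we ever change it:

```python
NO_ANSWER = (
    "The returned results didn't have sufficient information to be "
    "coherently summarized into a useful answer for your query, "
    "please try restating your query differently."
)

def is_no_answer(summary_text: str) -> bool:
    """True if the summarizer declined to answer the query."""
    return NO_ANSWER in summary_text
```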

Since I’ve said that a few times and I’m probably sounding a bit like I’m trying to sell you on a Scale plan, I’d like to explain the logic of what we’ve made configurable so far.

Doing this type of generative response is extremely expensive to operate in general (not just for Vectara, but for anyone in the space). We think there’s enormous power in bringing generative responses to search, which is why we offer such a generous free tier (15k free queries + generative responses per month on our Growth plans), but we also want to make sure we’re building a sustainable business model, so we’re not a vendor that just disappears in a year after running out of funding. We also want to make sure users don’t shoot themselves in the foot by overriding defaults they don’t understand.

That’s meant keeping the generative models we offer on our free/Growth plans pretty cookie-cutter, so that we can optimize them. A lot of users are fine with that, and we’ll continue to improve it and make “additional cookie-cutters” available over time based on user need. But we do expect that some users will have special business needs to control the generative output, and we know those will be more expensive to operate and have more potential things to go wrong, so we want to make sure we have a conversation first.

Hi Shane!

I booked a call for Friday with Aamir.

Can you take a look at this, though? I put in a random question that would not be answered by our corpus, and it gave us the answer below (please see the screenshot).

Maybe Vectara pulled it from the internet? But I’m not sure how it got this answer from citation 1?

Please let me know your thoughts. Thanks!

Vectara doesn’t try to pull any information/data from the Internet, but the generative model we use does have general knowledge of much of the Internet. We try to avoid situations like the one you’re seeing here by preventing the model from considering its existing knowledge. We’ve noticed some issues where very “well known” information that was once posted on the Internet, and that the generative model was trained on, still has some tendency to come out in questions like this. We have some plans for how to address that for all users, but the model we’re using for Scale users is significantly better at preventing this type of behavior.