Hey there @Ed_Moore – thanks for the question! I think you’ve run into a bug on our side that we’ll work to get fixed, but I think I can get you unblocked in the interim. At the end of this post I’ll explain what I think the bug is and what we’re working on.
To get you unblocked, I think the fastest thing is to answer one question: do you want the title to be a “hard filter” (something you can do an exact match against), something you can do semantic matching against, or both? The answer determines how you should model your data.
Exact Match Case / Both
Create a new corpus. When you’re adding metadata filters, add documentTitle instead of Title on the Document level. Likewise, in your documents, change Title in metadataJson on the top level of the document to documentTitle. You’ll then be able to do exact matches against this field by filtering to doc.documentTitle = 'foo'.
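As a rough sketch of the change above, the snippet below builds a document whose top-level metadataJson carries documentTitle instead of Title, plus the matching filter string. The payload layout (documentId, metadataJson as a JSON-encoded string, section) and the helper names build_document and exact_title_filter are assumptions for illustration, so check them against the indexing API reference before relying on them.

```python
import json


def build_document(document_id: str, title: str, text: str) -> dict:
    """Build a document dict with documentTitle in the top-level metadata.

    The field layout is an assumption for illustration, not a confirmed
    request schema.
    """
    return {
        "documentId": document_id,
        # Document-level metadata: use documentTitle rather than Title so it
        # cannot collide with the internal title field.
        "metadataJson": json.dumps({"documentTitle": title}),
        "section": [{"text": text}],
    }


def exact_title_filter(title: str) -> str:
    """Return a filter expression for an exact document-level title match."""
    # No quote escaping is done here; titles containing a single quote
    # would need handling that this sketch does not attempt.
    return f"doc.documentTitle = '{title}'"


doc = build_document("doc-1", "foo", "Body text of the document.")
print(doc["metadataJson"])
print(exact_title_filter("foo"))
```

The filter targets doc. rather than part. because the field lives in the document-level metadata.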
Semantic Search on Title Field
You don’t really need to do anything special in the metadata to set up semantic searches on the title. You can/should remove the Title field from metadataJson at the Document (top) level. For semantic title matching, perform a search and set part.is_title = true in your filter metadata to restrict the search to just the title. That’s described here
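For illustration, a semantic title search could be assembled like this. Only the part.is_title = true filter comes from the explanation above; the request layout (query, numResults, corpusKey, metadataFilter), the build_title_query helper, and the corpus ID are placeholders assumed for this sketch.

```python
def build_title_query(query_text: str, corpus_id: int, num_results: int = 10) -> dict:
    """Build a query body that restricts semantic matching to title parts.

    The request layout is an assumption for illustration; nothing is sent
    over the network here.
    """
    return {
        "query": [
            {
                "query": query_text,
                "numResults": num_results,
                "corpusKey": [
                    {
                        "corpusId": corpus_id,
                        # Part-level filter: only match the title text.
                        "metadataFilter": "part.is_title = true",
                    }
                ],
            }
        ]
    }


request_body = build_title_query("quarterly revenue report", corpus_id=1)
print(request_body["query"][0]["corpusKey"][0]["metadataFilter"])
```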
What Happened
Unfortunately, a few different things. The first is on the Vectara side: metadata filters are case insensitive, and Vectara already has an internal field called title that’s used for the semantic title. So when you created a field named Title, it looks like Vectara couldn’t match the field you created, because it was getting redirected to the internal title field, which isn’t set up as metadata. There were already several protections meant to prevent this type of conflict: corpus creation won’t let you create filters on two metadata fields whose names differ only in casing, and the file upload API also generally has some protections against field ambiguity. But I don’t think these took Title vs title into account correctly, so we’ll look to make a change to make that safer.
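To make the collision concrete, here is a small sketch of the kind of check that would catch it. This is not Vectara’s actual validation code: RESERVED_FIELDS and find_conflicts are hypothetical names, and the reserved set contains only the title field mentioned above.

```python
# Hypothetical reserved names; only "title" is taken from the explanation above.
RESERVED_FIELDS = {"title"}


def find_conflicts(field_names: list[str]) -> list[str]:
    """Return names that clash, ignoring case, with a reserved field
    or with a field that appeared earlier in the list."""
    seen: set[str] = set()
    conflicts: list[str] = []
    for name in field_names:
        key = name.lower()  # compare case-insensitively, as the filters do
        if key in RESERVED_FIELDS or key in seen:
            conflicts.append(name)
        else:
            seen.add(key)
    return conflicts


print(find_conflicts(["Title", "author", "Author", "documentTitle"]))
# prints ['Title', 'Author']
```

Note that documentTitle passes, since it does not lowercase to title.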
The second thing is that you have part.Title, but that should be doc.Title. Once you create a new corpus, you’ll want doc.documentTitle if you still want strict filtering.
In the meantime, unfortunately, I think you will need to create a new corpus and put new documents with these changed metadata fields in it.
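If you still have the original documents on hand, the rename can be scripted before re-uploading. A minimal sketch, assuming each document is a dict whose metadataJson is a JSON-encoded string; migrate_document is a hypothetical helper and does not call any API.

```python
import json


def migrate_document(doc: dict) -> dict:
    """Return a copy of doc with the Title metadata key renamed to documentTitle."""
    metadata = json.loads(doc.get("metadataJson") or "{}")
    if "Title" in metadata:
        # Move the value under the new, non-conflicting key.
        metadata["documentTitle"] = metadata.pop("Title")
    migrated = dict(doc)  # shallow copy so the input dict is left untouched
    migrated["metadataJson"] = json.dumps(metadata)
    return migrated


old_doc = {"documentId": "doc-1", "metadataJson": '{"Title": "foo", "author": "Ed"}'}
new_doc = migrate_document(old_doc)
print(new_doc["metadataJson"])
```

Documents without a Title key pass through with their metadata unchanged, so the helper can be run over a whole batch.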
I should also mention that we’re working on an API to allow you to change which metadata is filterable after corpus creation. That’s currently pre-GA, but it’s available if you’d like to test it out and give us feedback. Feel free to DM me if so, and we can get a quick chat set up.