Search accuracy for product data

Hi,

I’ve been experimenting with Vectara using product data for fashion and clothing items. I used the console to upload some JSON files (one file per product) and test the search results.

Here’s an example of a JSON file:

{
  "documentId": "fashion-item-3",
  "title": "Elegant Affair Jumpsuit",
  "metadataJson": "{\"available-date\":\"05 May 2022\",\"price\":500,\"vendor\":\"Glamour Studio\",\"stars\":4.9}",
  "section": [
    {
      "title": "Description",
      "text": "An elegant jumpsuit with a deep V-neckline, a fitted waist, and wide-leg pants, perfect for a formal event or a night out."
    },
    {
      "title": "Clothing Category",
      "text": "Jumpsuit"
    },
    {
      "title": "Color",
      "text": "Black, Navy Blue"
    },
    {
      "title": "Pattern",
      "text": "Solid"
    },
    {
      "title": "Neckline",
      "text": "Deep V-Neck"
    },
    {
      "title": "Sleeve Style",
      "text": "Sleeveless"
    },
    {
      "title": "Length",
      "text": "Full Length"
    },
    {
      "title": "Occasion",
      "text": "Formal, Night Out"
    }
  ]
}

If I search for “jeans for a date night”, Vectara returns the item above because it matches the word “jeans” with the word “clothing”, even though the item is a jumpsuit.

I’d like to understand whether there’s a better way to structure or reword the data to improve search results, or whether this use case is simply not well suited to Vectara.

Thank you,
Mikhaeel

Hello Mikhaeel,

When sections are indexed, both the title and the text are searchable. In this case, that is probably not what you want, because each title describes the type of information rather than the product itself. I would suggest attaching it as section metadata instead, as follows:

{
  "documentId": "fashion-item-3",
  "title": "Elegant Affair Jumpsuit",
  "metadataJson": "{\"available-date\":\"05 May 2022\",\"price\":500,\"vendor\":\"Glamour Studio\",\"stars\":4.9}",
  "section": [
    {
      "metadataJson": "{\"facet\": \"Description\"}",
      "text": "An elegant jumpsuit with a deep V-neckline, a fitted waist, and wide-leg pants, perfect for a formal event or a night out."
    },
    {
      "metadataJson": "{\"facet\": \"Clothing Category\"}",
      "text": "Jumpsuit"
    },
    {
      "metadataJson": "{\"facet\": \"Color\"}",
      "text": "Black, Navy Blue"
    },
    {
      "metadataJson": "{\"facet\": \"Pattern\"}",
      "text": "Solid"
    },
    {
      "metadataJson": "{\"facet\": \"Neckline\"}",
      "text": "Deep V-Neck"
    },
    {
      "metadataJson": "{\"facet\": \"Sleeve Style\"}",
      "text": "Sleeveless"
    },
    {
      "metadataJson": "{\"facet\": \"Length\"}",
      "text": "Full Length"
    },
    {
      "metadataJson": "{\"facet\": \"Occasion\"}",
      "text": "Formal, Night Out"
    }
  ]
}
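
Since you have one JSON file per product, the conversion can be scripted rather than edited by hand. Here is a minimal Python sketch, assuming your files follow the structure above; the function name sections_to_facets is my own and not part of any Vectara tooling:

```python
import json


def sections_to_facets(doc: dict) -> dict:
    """Return a copy of doc where each section's title becomes a 'facet' metadata entry."""
    converted = dict(doc)
    new_sections = []
    for section in doc.get("section", []):
        # Keep everything except the title, which would otherwise be searchable text.
        new_section = {k: v for k, v in section.items() if k != "title"}
        title = section.get("title")
        if title is not None:
            # Merge with any section metadata that is already present.
            meta = json.loads(section.get("metadataJson", "{}"))
            meta["facet"] = title
            new_section["metadataJson"] = json.dumps(meta)
        new_sections.append(new_section)
    converted["section"] = new_sections
    return converted


# Shortened version of the product above, for illustration only.
product = {
    "documentId": "fashion-item-3",
    "title": "Elegant Affair Jumpsuit",
    "section": [
        {"title": "Clothing Category", "text": "Jumpsuit"},
        {"title": "Color", "text": "Black, Navy Blue"},
    ],
}

converted = sections_to_facets(product)
print(json.dumps(converted, indent=2))
```

The document-level title is left alone, so the product name stays searchable; only the section titles are moved into metadata.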

Make the facet a filterable metadata attribute. This lets you use it in queries for maximum control over the matching process. Note: you must define filterable metadata at corpus creation time.

When you run queries against your product catalog, you can search all the facets or only specific ones, e.g. where part.facet = 'Color'. The results also come back with the facet metadata attached, so you can tell which part of the item description each match came from.
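
As a rough illustration of a facet-restricted query, the Python sketch below only builds the request body and does not send it. The field names (query, numResults, corpusKey, customerId, corpusId, metadataFilter) reflect my understanding of the v1 REST query API and should be checked against the current API reference; the IDs are placeholders, and the helper names are hypothetical.

```python
import json
from typing import Optional


def facet_filter(facet: str) -> str:
    """Build a filter expression such as: part.facet = 'Color'."""
    if "'" in facet:
        # Quote escaping rules are not covered here, so reject rather than guess.
        raise ValueError("facet names containing single quotes are not supported")
    return f"part.facet = '{facet}'"


def build_query(text: str, customer_id: int, corpus_id: int,
                facet: Optional[str] = None, num_results: int = 10) -> dict:
    """Assemble a query body (assumed v1 shape), optionally restricted to one facet."""
    corpus_key = {"customerId": customer_id, "corpusId": corpus_id}
    if facet is not None:
        corpus_key["metadataFilter"] = facet_filter(facet)
    return {
        "query": [
            {"query": text, "numResults": num_results, "corpusKey": [corpus_key]}
        ]
    }


# Placeholder IDs; match "jeans" only against the Clothing Category sections.
payload = build_query("jeans", customer_id=123, corpus_id=1, facet="Clothing Category")
print(json.dumps(payload, indent=2))
```

For a query like “jeans for a date night”, one option is to match the garment type against the Clothing Category facet only, so that a jumpsuit no longer surfaces just because a section title contains “Clothing”.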

Hello @aahmad. Thanks for your reply! I will try this and test to see how search performs with these changes.

Kind regards,
Mikhaeel