How to feed JSON data to Vectara corpora to retrieve talent recommendations

Hi there,
I am currently using Algolia search, and what it lacks is an understanding of intent: it sometimes misses or ignores the intent of a query and does not respond correctly.
So I checked out Vectara and was happy to see that it uses an LLM with vector search and can return recommendations.

I know the result I want, but I am stuck on data preparation: what kind of data needs to be uploaded to the corpus, and in what JSON format.

Here is my original JSON:

{
    "first_name": "Sri",
    "last_name": "Siva",
    "tagline": "Award Winning Learning Consultant",
    "bio": "I have over 20 years progressive experience with large fortune 500 companies. Experience includes both US and offshore teams with robust records of success in achieving complex objectives and timelines. I am highly experienced in managing and implementing end-to-end, instructional design process-based methodology (analysis, design, development, and delivery) to create outstanding curriculum and training programs (ILT and CBT/WBT). I was awarded 2nd prize in the OxTalent competition for my work in converting an ILT to CBT at the University of Oxford. The prototype was so successful that Oxford University has proposed to take this project further to supporting the wider community.",
    "location": "",
    "roles": [
        "Course Developer",
        "eLearning Developer",
        "Instructional Designer",
        "Learning Technologist",
        "Learning Technologist - Other"
    ],
    "languages": [
        "English"
    ],
    "skills": [
        "Articulate",
        "Camtasia",
        "Captivate",
        "Dreamweaver",
        "Office 365",
        "Photoshop",
        "Snagit"
    ],
    "industries": [],
    "experiences": [],
    "companies": [
        "Amazon",
        "Google",
        "Ernst & Young",
        "Abbott",
        "American Society of Plastic Surgery",
        "Home Depot",
        "Walgreens",
        "World Health Organization",
        "Blue Cross Blue Shield",
        "Twilio",
        "Infor",
        "Coca Cola",
        "Motorola",
        "BMO Harris Bank",
        "US Cellular",
        "Vyaire",
        "Crowe Horwath",
        "Aflac",
        "Accenture",
        "Boy Scouts of America",
        "Cox Communication",
        "Cox Automotive",
        "Mars",
        "Franklin Templeton",
        "Facebook",
        "Fannie Mae",
        "Meta",
        "AT&T",
        "Uber",
        "Kaiser Permanente",
        "Analog Digital",
        "Columbus McKinnon",
        "Morton Buildings",
        "Clear Connect"
    ]
}

I have converted the above JSON into the JSON format that Vectara understands,
but I am not sure whether I have formatted it correctly.

Here is the JSON, ready to upload to Vectara:

{
  "documentId": "talent-002",
  "title": "Award Winning Learning Consultant",
  "description": "I have over 20 years progressive experience with large fortune 500 companies. Experience includes both US and offshore teams with robust records of success in achieving complex objectives and timelines. I am highly experienced in managing and implementing end-to-end, instructional design process-based methodology (analysis, design, development, and delivery) to create outstanding curriculum and training programs (ILT and CBT/WBT). I was awarded 2nd prize in the OxTalent competition for my work in converting an ILT to CBT at the University of Oxford. The prototype was so successful that Oxford University has proposed to take this project further to supporting the wider community.",
  "metadataJson": "{\"talent\":\"sri siva\"}",
  "section": [
	{
		"title": "Roles",
		"text": "Course Developer, eLearning Developer, Instructional Designer, Learning Technologist, Learning Technologist - Other",
		"metadataJson": "{\"section\":\"roles\"}"
	},
	{
		"title": "Languages",
		"text": "English",
		"metadataJson": "{\"section\":\"languages\"}"
	},
	{
		"title": "Skills",
		"text": "Articulate, Camtasia, Captivate, Dreamweaver, Office 365, Photoshop, Snagit",
		"metadataJson": "{\"section\":\"skills\"}"
	},
	{
		"title": "Companies",
		"text": "Amazon, Google, Ernst & Young, Abbott, American Society of Plastic Surgery, Home Depot, Walgreens, World Health Organization, Blue Cross Blue Shield, Twilio, Infor, Coca Cola, Motorola, BMO Harris Bank, US Cellular, Vyaire, Crowe Horwath, Aflac, Accenture, Boy Scouts of America, Cox Communication, Cox Automotive, Mars, Franklin Templeton, Facebook, Fannie Mae, Meta, AT&T, Uber, Kaiser Permanente, Analog Digital, Columbus McKinnon, Morton Buildings, Clear Connect",
		"metadataJson": "{\"section\":\"companies\"}"
	}
  ]
}
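One way to avoid hand-escaping the nested metadataJson strings is to generate this document from the original profile array in PHP and let json_encode() do the escaping. A minimal sketch, assuming the profile has been decoded into an associative array with the keys shown in the first JSON; makeSection() and profileToDocument() are hypothetical names, not part of any Vectara SDK:

```php
<?php
// Sketch: build the "section"-style document from the original profile array.
// metadataJson must be a JSON *string*, so the inner object is encoded with
// json_encode() and then encoded again as part of the whole document.

function makeSection(string $title, string $text, array $metadata): array
{
    return [
        'title' => $title,
        'text' => $text,
        'metadataJson' => json_encode($metadata),
    ];
}

function profileToDocument(array $profile, string $documentId): array
{
    $sections = [];
    // One comma-separated section per list field; empty lists are skipped.
    foreach (['roles', 'languages', 'skills', 'industries', 'companies'] as $field) {
        if (!empty($profile[$field])) {
            $sections[] = makeSection(
                ucfirst($field),
                implode(', ', $profile[$field]),
                ['section' => $field]
            );
        }
    }

    return [
        'documentId' => $documentId,
        'title' => $profile['tagline'],
        'description' => $profile['bio'],
        'metadataJson' => json_encode([
            'talent' => strtolower($profile['first_name'] . ' ' . $profile['last_name']),
        ]),
        'section' => $sections,
    ];
}

// Usage with a shortened version of the profile above.
$profile = [
    'first_name' => 'Sri',
    'last_name' => 'Siva',
    'tagline' => 'Award Winning Learning Consultant',
    'bio' => 'I have over 20 years progressive experience with large fortune 500 companies.',
    'roles' => ['Course Developer', 'Instructional Designer'],
    'languages' => ['English'],
    'skills' => ['Articulate', 'Camtasia'],
    'industries' => [],
    'companies' => ['Amazon', 'Ernst & Young'],
];

$doc = profileToDocument($profile, 'talent-002');
echo json_encode($doc, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES), PHP_EOL;
```

Because metadataJson is itself a JSON string inside the document, encoding it with json_encode() rather than by hand avoids unescaped quotes that would make the whole upload invalid.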

This JSON uploaded to the corpus successfully.
But when I query this data, for example by asking Vectara
“I am looking for instructional designer”

it gives me this result.

I am using PHP to send POST requests to the Vectara API.
Here is my PHP code:

$query_data = [
    'query' => $query,
    'start' => 0,
    'numResults' => 3,
    'corpusKey' => [
        [
            'customerId' => $customer_id,
            'corpusId' => $corpus_id,
            // 'metadataFilter' => 'part.is_title = true',
            // 'DEFAULT'/'QUERY' treat the text as a question;
            // 'RESPONSE' treats it as if it were document text.
            'semantics' => 'RESPONSE',
            // lambda = 0 means purely semantic ranking (no keyword matching).
            'lexicalInterpolationConfig' => [
                'lambda' => 0,
            ],
        ],
    ],
    'summary' => [
        [
            // 'summarizerPromptName' => 'vectara-summary-ext-v1.2.0',
            'responseLang' => 'en',
            'maxSummarizedResults' => 2,
        ],
    ],
];
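For reference, a sketch of how $query_data might be sent, assuming Vectara's v1 REST query endpoint (https://api.vectara.io/v1/query), where the body wraps a list of query objects under a top-level "query" key and authentication uses customer-id and x-api-key headers. Please verify these names against the current API reference. buildQueryBody() and sendQuery() are hypothetical helpers, and the semantics and summary settings from the snippet above are omitted for brevity:

```php
<?php
// Sketch assuming Vectara's v1 REST query API: the body is a batch whose
// top-level "query" key holds a list of query objects like $query_data.

function buildQueryBody(
    string $query,
    int $customerId,
    int $corpusId,
    int $numResults = 3,
    ?string $metadataFilter = null
): string {
    $corpusKey = [
        'customerId' => $customerId,
        'corpusId' => $corpusId,
        'lexicalInterpolationConfig' => ['lambda' => 0],
    ];
    if ($metadataFilter !== null) {
        $corpusKey['metadataFilter'] = $metadataFilter;
    }

    $queryData = [
        'query' => $query,
        'start' => 0,
        'numResults' => $numResults,
        'corpusKey' => [$corpusKey],
    ];

    return json_encode(['query' => [$queryData]]);
}

// Sends the body with cURL (requires the curl extension); not called here.
function sendQuery(string $body, int $customerId, string $apiKey): array
{
    $ch = curl_init('https://api.vectara.io/v1/query');
    curl_setopt_array($ch, [
        CURLOPT_POST => true,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER => [
            'Content-Type: application/json',
            'customer-id: ' . $customerId,
            'x-api-key: ' . $apiKey,
        ],
        CURLOPT_POSTFIELDS => $body,
    ]);
    $raw = curl_exec($ch);
    if ($raw === false) {
        throw new RuntimeException('Vectara request failed: ' . curl_error($ch));
    }
    curl_close($ch);

    $decoded = json_decode($raw, true);
    if (!is_array($decoded)) {
        throw new RuntimeException('Unexpected (non-JSON) response from Vectara');
    }
    return $decoded;
}

$body = buildQueryBody('I am looking for videographer', 12345, 1);
echo $body, PHP_EOL;
```

Keeping the optional metadataFilter as a parameter makes it easy to add filter conditions later without changing the rest of the request.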

What I want to achieve: when I ask Vectara something like "I am looking for an instructional designer who speaks Spanish", it should return the recommended talent profiles.

I then want to display the results as cards, each representing one talent profile and showing that talent's name, roles, and tagline.
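Rendering the cards is independent of Vectara once you know which profiles matched. A minimal sketch, assuming you have the original profile array (first_name, last_name, tagline, roles) for each match; renderTalentCard() and the HTML class names are placeholders:

```php
<?php
// Sketch: render one card per talent from the original profile array.
// htmlspecialchars() escapes the data so names like O'Neill are safe in HTML.

function renderTalentCard(array $profile): string
{
    $e = fn (string $s): string => htmlspecialchars($s, ENT_QUOTES, 'UTF-8');

    $name = trim(($profile['first_name'] ?? '') . ' ' . ($profile['last_name'] ?? ''));
    $roles = implode(', ', $profile['roles'] ?? []);

    return '<div class="talent-card">'
        . '<h3>' . $e($name) . '</h3>'
        . '<p class="tagline">' . $e($profile['tagline'] ?? '') . '</p>'
        . '<p class="roles">' . $e($roles) . '</p>'
        . '</div>';
}

$card = renderTalentCard([
    'first_name' => 'Edward',
    'last_name' => "O'Neill",
    'tagline' => 'Course Design, Staff Development, Facilitation, Screenwriting',
    'roles' => ['Content Developer', 'Photographer', 'Videographer', 'Writer'],
]);
echo $card, PHP_EOL;
```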

FYI: I pulled the talent data from our Algolia index, and I want to index that data in Vectara to fetch recommendations.

Right now I am stuck on the data format setup and am looking for help with it.
Looking forward to a quick response.

Thanks

Hi @Bill_C
Thanks for sharing the flow. A few questions:

  • Are you looking for search functionality only, or also trying to get a summary based on the user query?
  • Generally, I think your data preparation makes sense. From what I understand, you create one JSON document per person. When you submit a query to Vectara, it responds with the top-matching parts of one or more of these JSON documents (you can see what the responseSet looks like here), and for each match you can pull the document_id, which uniquely identifies the overall source document. If that's unclear, let me know and I can give more specific instructions on how to do this.
  • If you already store the full profiles in the form you want to present to users in your application, could you just look them up by document_id and show the complete original JSON in the required form?
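A sketch of pulling the document IDs out of a query response and ranking each profile by its best-scoring part, assuming the v1 response layout where responseSet[].response[] entries carry a score and a documentIndex into responseSet[].document[], whose id is the documentId used at upload. rankedDocumentIds() is a hypothetical helper and the sample response is made up:

```php
<?php
// Sketch assuming the v1 query response layout: responseSet[i]['response']
// holds matching text parts (each with 'score' and 'documentIndex'), and
// 'documentIndex' points into responseSet[i]['document'], whose 'id' is the
// documentId given at upload time.

function rankedDocumentIds(array $response): array
{
    $best = []; // documentId => best score among its matching parts
    foreach ($response['responseSet'] ?? [] as $set) {
        $documents = $set['document'] ?? [];
        foreach ($set['response'] ?? [] as $part) {
            // Treat a missing index as 0 in case default values are omitted.
            $index = $part['documentIndex'] ?? 0;
            if (!isset($documents[$index]['id'])) {
                continue;
            }
            $id = $documents[$index]['id'];
            $score = (float) ($part['score'] ?? 0.0);
            if (!isset($best[$id]) || $score > $best[$id]) {
                $best[$id] = $score;
            }
        }
    }
    arsort($best); // highest score first
    // array_keys() turns numeric-string IDs into ints, so cast them back.
    return array_map('strval', array_keys($best));
}

// Made-up response: two parts from Edward's profile, one from Danielle's.
$response = [
    'responseSet' => [[
        'response' => [
            ['text' => '...', 'score' => 0.81, 'documentIndex' => 0],
            ['text' => '...', 'score' => 0.64, 'documentIndex' => 1],
            ['text' => '...', 'score' => 0.52, 'documentIndex' => 0],
        ],
        'document' => [
            ['id' => 'talent-profile-data-001'],
            ['id' => 'talent-profile-data-002'],
        ],
    ]],
];

print_r(rankedDocumentIds($response));
```

The resulting IDs can then be used to look up the full profiles and show them in the order Vectara ranked them.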

Please let me know if I am misunderstanding your need here. Happy to help further as needed.

Hi @ofermend
Thanks for your response.

I will try to answer and explain my scenario.
1 - A summary is not necessary in our scenario; if we get one, that's a bonus. I am not sure why it's not returning a summary; it might be because we have fed so little data to the corpus (only two talent profiles). But again, the summary is not our priority.

2 - What we want, and what Algolia does not give us, is recommendations from Vectara.
First, let's look at a real example of our data.

{
  "documentId": "talent-profile-data-001",
  "title": "Edward O'Neill",
  "section": [
    {
      "title": "first_name",
      "text": "Edward"
    },
	{
      "title": "last_name",
      "text": "O'Neill"
    },
	{
      "title": "tagline",
      "text": "Course Design, Staff Development, Facilitation, Screenwriting"
    },
	{
      "title": "bio",
      "text": "Over a decade of experience with everything from R1 institutions to start-ups and international corporations.\r\n\r\nI'm a design thinker and expert facilitator, supporter of problem-solving powers. \r\n\r\nHaving a Ph.D. and teaching experience, faculty feel comfortable with me. \r\n\r\nHaving a film school degree, I'm comfortable with visual communication.\r\n\r\nBeing a published author, I can write clearly for multiple audiences."
    },
	{
      "title": "location",
      "text": ""
    },
	{
      "title": "roles",
      "text": "Content Developer, Photographer, Videographer, Writer"
    },
	{
      "title": "languages",
      "text": "French"
    },
	{
      "title": "skills",
      "text": "Adobe Audition, Adobe Photoshop, Adobe Premiere Pro, Audition, BlackBoard, Blackboard Learn, Blended Learning, Canvas LMS, Higher Education, Instructional Design, Learning Technology, Micro-Learning, Photography (location and studio), Photoshop, Premiere, Sakai LMS, Script Writing, Scriptwriting for E-learning/Video, Video Production, Virtual Training, Writing"
    },
	{
      "title": "Industries",
      "text": "Education"
    },
	{
      "title": "Experiences",
      "section": [
			{
				"title": "Consultant",
				"metadataJson": "{\"detail\": \"\",\"end_date\": \"null\",\"start_date\": \"2016\/04\/01 00:00:00\",\"company_name\": \"Creative Learning Arts LLC\"}"
			},
			{
				"title": "senior instructional designer",
				"metadataJson": "{\"detail\": \"\",\"end_date\": \"2016\/03\/30 00:00:00\",\"start_date\": \"2014\/01\/07 00:00:00\",\"company_name\": \"Yale University\"}"
			},
			{
				"title": "senior consultant",
				"metadataJson": "{\"detail\": \"\",\"end_date\": \"2013\/12\/13 00:00:00\",\"start_date\": \"2010\/01\/04 00:00:00\",\"company_name\": \"University of Southern California\"}"
			},
			{
				"title": "academic technology specialist",
				"metadataJson": "{\"detail\": \"\",\"end_date\": \"2010\/12\/19 00:00:00\",\"start_date\": \"2007\/08\/14 00:00:00\",\"company_name\": \"Stanford University\"}"
			}
		]
    },
	{
      "title": "companies",
      "text": "Bryant University, Union College, NERCOMP, General Assembly, Pearson"
    }
  ]
}

And the second profile is:

{
  "documentId": "talent-profile-data-002",
  "title": "Danielle Anderson",
  "section": [
    {
      "title": "first_name",
      "text": "Danielle"
    },
	{
      "title": "last_name",
      "text": "Anderson"
    },
	{
      "title": "tagline",
      "text": "Instructional Designer"
    },
	{
      "title": "bio",
      "text": "Highly experienced designing eLearning and instructional-led curriculum. Experienced in both Adobe Captivate and Articulate Storyline, years of experience leading projects, and working multiple projects at once. Have written for a multitude of topics including sales, leadership, compliance, and system training. Solutions- oriented, innovative, highly flexible, and creative."
    },
	{
      "title": "location",
      "text": "Gila County, Arizona"
    },
	{
      "title": "roles",
      "text": "Consultant,Content Developer,Course Developer,Educational Consultant,Educator,eLearning Developer,Facilitator,Instructional Designer,Learning Architect,Learning Technologist - Other,Technical Writer,Training Coordinator"
    },
	{
      "title": "languages",
      "text": "English"
    },
	{
      "title": "skills",
      "text": "ADDIE,ADDIE Model,Adobe Audition,Adobe Captivate,Adobe Dreamweaver,Adobe Illustrator,Adobe Indesign,Adobe Photoshop,Agile,Articulate,Audition,Blended Learning,Captivate,Content Development,Content Management,Course Development,Curriculum Development,Documentation,Dreamweaver,eLearning,Employee Training,HTML,Illustrator,InDesign,Instructional Design,Instructor Led Training,Kirkpatrick Model,Leadership Training,Learning and People Leadership,Learning Management Systems (LMS),Learning Theory,Micro-Learning,Microsoft Sharepoint,Microsoft Visio,multiple Learning Management Systems (LMS) LMS,Online Training,People Leadership,Photoshop,Premiere,Program Management,SAP Litmos,SharePoint,Simulations,Strategic Planning,Technical Writing,trainer,Training Development,Virtual Classroom,Virtual Training,Visio,Vyond"
    },
	{
      "title": "Industries",
      "text": "Education, Finance, Healthcare & Wellness, Retail, Tech"
    },
	{
      "title": "Experiences",
      "section": [
			{
				"title": "Technical Instructional Designer",
				"metadataJson": "{\"detail\": \"\\u2022 Work with Curriculum Managers and SMEs to convert PowerPoints to eLearning courses using Articulate Storyline.\\r\\n\\u2022 Review current eLearning content and identify opportunities to make more interactive.\",\n            \"end_date\": \"null\",\n\"location\": \"Remote\",\n\"start_date\": \"2021-08-01\",\n\"company_name\": \"Microsoft (Contractor)\"}"
			},
			{
				"title": "Instructional Designer",
				"metadataJson": "{\"detail\": \"\\u2022 Create instructor-led and webinar training for leadership (all levels), New Hire Orientation,\\r\\nregulatory, soft-skills, and all corporate training needs.\\r\\n\\u2022 Consult with Key Stakeholders to identify learning needs, meet with Subject Matter Experts (SMEs) to put design and develop learning content for the learner (use ADDIE or Agile depending on project).\\r\\n\\u2022 Create various eLearning modules for all corporate team members using Captivate and Vyond.\\r\\n\\u2022 Create recordings in Audition using my own voice.\\r\\n\\u2022 Built a highly successful and engaging 3-day New Hire Orientation program from the ground-up.\\r\\n\\u2022 Assisted with vetting and implementing a new Learning Management System for the company (Litmos).\",\n            \"end_date\": \"August 2021\",\n            \"location\": \"\",\n            \"start_date\": \"2017-12-01\",\n            \"company_name\": \"Plexus Worldwide\"}"
			},
			{
				"title": "eLearning Technical Specialist (Instructional Design)",
				"metadataJson": "{\"detail\": \"\\u2022 Served as the lead designer and created content for three major software\/computer conversions.\\r\\n\\u2022 Responsible for creating and updating instructor-led training courses (leader guides, participant\\r\\nworkbooks, and job aids).\\r\\n\\u2022 Responsible for creating and updating eLearning tutorials for company-wide initiatives, financial\\r\\ncompliance training and new hire training programs.\\r\\n\\u2022 Manage all compliance training to include eLearning. Work with the Compliance Department on\\r\\na monthly basis to ensure tutorials are up to date and instructionally sound.\\r\\n\\u2022 Consult with key stakeholders and SMEs to identity training needs and the best course of learning\\r\\nfor various company projects, as well as manage these projects from beginning to end (analysis\\r\\nthrough implementation).\\r\\n\\u2022 Created a new hire sales training program that helped to increase sales by 20%.\\r\\n\\u2022 Designed a Business Accounts course which improved quality scores on business account\\r\\nsubmissions by 30%.\",\n            \"end_date\": \"2017\/12\/31 00:00:00\",\n            \"start_date\": \"2008\/01\/01 00:00:00\",\n            \"company_name\": \"Desert Schools Federal Credit Union\"}"
			},
			{
				"title": "Technical Writing & Instructional Design",
				"metadataJson": "{\"detail\": \"\\u2022 Created eLearning courses for several new company initiatives.\\r\\n\\u2022 Created and maintained all operation and call center procedures (print and web-based format).\\r\\n\\u2022 Developed leader guides, participant workbooks, and reference\/job aids for the new hire training\\r\\nprogram.\\r\\n\\u2022 Updated system documentation and release notes for the IT Department\\u2019s web-based loan\\r\\nsystems.\\r\\n\\u2022 Responsible for creating and maintaining an 800-page SharePoint Intranet Site that housed\\r\\ndepartment procedures, resource material, department news items and computer-based training.\\r\\n\\u2022 Designed and implemented several web-based forms and automated workflows in SharePoint.\",\n            \"end_date\": \"2008\/12\/31 00:00:00\",\n            \"start_date\": \"2003\/01\/01 00:00:00\",\n            \"company_name\": \"Nelnet\"}"
			},
			{
				"title": "Instructional Designer",
				"metadataJson": "{\"detail\": \"\\u2022 Responsible for developing all training curriculum (leader guides, participant workbooks, and job\\r\\naids) for all 400 nation-wide store employees.\\r\\n\\u2022 Created sales and soft skills training courses for employees and leaders.\\r\\n\\u2022 Assisted with developing training curriculum and activities for the annual leadership conferences.\\r\\n\\u2022 Created installation and user guides for a new point of sale\/cash register system.\",\n            \"end_date\": \"2002\/12\/31 00:00:00\",\n            \"start_date\": \"2000\/01\/01 00:00:00\",\n            \"company_name\": \"Leslie's Poolmart, Inc.\"}"
			},
			{
				"title": "Trainer, Instructional Designer, Technical Writer",
				"metadataJson": "{\"detail\": \"\\u2022 Created and maintained training material as well as on-line procedures for a 300-person virtual\\r\\ncall center.\\r\\n\\u2022 Participated in an implementation project that created an online help system for the call center\\u2019s\\r\\nnew GUI System.\\r\\n\\u2022 Led a virtual project team that streamlined the call center\\u2019s quality guidelines to reduce telephone\\r\\nhandle time.\\r\\n\\u2022 Assisted with developing training materials and procedures for two multi-million dollar system\\r\\nconversions.\\r\\n\\u2022 Responsible for training and developing course content for five student loan processing\\r\\ndepartments.\",\n            \"end_date\": \"2000\/12\/31 00:00:00\",\n            \"start_date\": \"1994\/01\/01 00:00:00\",\n            \"company_name\": \"Sallie Mae\"}"
			}
		]
    },
	{
      "title": "companies",
      "text": "Microsoft, Plexus Worldwide, Desert Schools Federal Credit Union, Nelnet Student Loans, Leslies Pool Supplies, USA Group / Sallie Mae"
    }
  ]
}

Sorry for pasting such long JSON here; I couldn't find an attach option in this editor.
OK, if you examine the two talent profiles above, you will see that one speaks French and the other speaks English.

I have only fed these two profiles to my corpus. First, please also let me know whether this is the right JSON format to feed Vectara to achieve the desired results.

Query: I am looking for instructional designer who speaks French.
When the response comes back, it shows Danielle's profile first and Edward's profile second.
But it should rank Edward's profile on top, since he speaks French.

Query: I am looking for videographer
This should return only Edward's profile, because "videographer" exists only in Edward's profile.
But it returns both profiles.

The problem is that we are not getting the desired results from the data we have fed to Vectara.

Our Use cases include:

  1. Search using different filters
  2. Recommendations

Can you test the two JSON documents above with Vectara on your end and see what it returns?
Or let me know if there is a way to share my corpus with you.

I am looking forward to a quick reply, as I have to tell my client whether or not we should go with Vectara.
Thanks

Hey Bill,
Thanks for the additional details. Here are some thoughts:

  1. The main reason to use Vectara for this is its very strong semantic search. For example, if your query is “I am looking for videographer”, you would also get back profiles that contain typos like “videografer”, or maybe “film-maker” or something else semantically similar. This is where Vectara’s system is quite strong.
  2. To enable this, I would suggest a few changes to the way you currently structure the JSON. Syntactically it’s okay, but you may want to design it around what you are trying to achieve.

A few thoughts here:

  • You use metadata (the “metadataJson” field) quite extensively. Note that metadata is NOT used as part of the text search; rather, it is designed to help with filtering. For example, say we create a metadata field for each document (profile) called “language”. You could then run a query with Vectara and add the filter condition doc.language = 'English', which would return only profiles that match that condition. In your current example, valuable text information sits in the metadata (e.g. “detail” in the experiences), which prevents Vectara from using that information in the search.
  • Instead, I would recommend a simpler structure to start with, as follows (high level):
    A) Create a single document per profile, as you do now.
    B) Decide which fields you might want to filter by and define those as metadata fields. For example, these could be languages, skills, industries, etc. Remember, though, that these won’t be part of the actual text query; rather, your application would add them as filter conditions (see Metadata Filters | Vectara Docs).
    C) The rest of the document will consist of more sections, each focused on the “text” piece (as opposed to title/metadata). For example, you can construct a section around skills along these lines:
    {
    "title": "Skills",
    "text": "Edward is skilled in Adobe Photoshop, Adobe Premiere Pro and Instructional Design…"
    }
    That way you describe in natural language what the data conveys.
    For skills this is quite similar to the JSON you provided, but the experiences, for example, have no “text” field at all, so I would suggest making the change there too, for example:
    {
    "title": "Technical Instructional Designer",
    "text": "Danielle worked with Curriculum Managers and SMEs to convert PowerPoints to eLearning courses using Articulate Storyline.\r\n\u2022 Review current eLearning content and identify opportunities to make more interactive."
    }
    You can also add dates, location, and/or other metadata specific to this work experience.
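The experience change described in C) can be automated. A minimal sketch, assuming each experience entry looks like the ones in the profiles above (a title plus a metadataJson string containing detail, dates and company_name); experienceToSection() and its sentence template are hypothetical:

```php
<?php
// Sketch: move the searchable "detail" text out of metadataJson into "text",
// keeping dates, location and company as (filterable) metadata.
// The sentence template is illustrative only.

function experienceToSection(string $person, array $experience): array
{
    $meta = json_decode($experience['metadataJson'] ?? '{}', true) ?: [];
    $detail = trim($meta['detail'] ?? '');
    unset($meta['detail']);

    $text = sprintf(
        '%s worked as %s at %s',
        $person,
        $experience['title'],
        $meta['company_name'] ?? 'an unnamed company'
    );
    if ($detail !== '') {
        $text .= ': ' . $detail;
    }

    return [
        'title' => $experience['title'],
        'text' => $text,
        // Cast to object so an empty array still encodes as {} rather than [].
        'metadataJson' => json_encode((object) $meta),
    ];
}

$experience = [
    'title' => 'Technical Instructional Designer',
    'metadataJson' => json_encode([
        'detail' => '• Work with Curriculum Managers and SMEs to convert PowerPoints to eLearning courses using Articulate Storyline.',
        'end_date' => 'null',
        'location' => 'Remote',
        'start_date' => '2021-08-01',
        'company_name' => 'Microsoft (Contractor)',
    ]),
];

print_r(experienceToSection('Danielle Anderson', $experience));
```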

For the example query you suggest, “I am looking for instructional designer who speaks french”, this can now be accomplished with the query “I am looking for instructional designer” plus the filter doc.language = 'French'.
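Filter strings like this can be assembled in the application. A sketch, assuming "language" (and, for the combined example, "location") were defined as filterable document-level metadata attributes when the corpus was created, and that conditions are combined with "and". The quoting rules for values that contain single quotes are not covered here, so the sketch simply rejects such values; equalsFilter() and andFilters() are hypothetical names:

```php
<?php
// Sketch: build metadata filter expressions such as doc.language = 'French'.
// Assumes the attributes were declared as filterable document-level metadata
// when the corpus was created. Values containing a single quote are rejected
// because this sketch does not handle quoting them.

function equalsFilter(string $field, string $value): string
{
    if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $field)) {
        throw new InvalidArgumentException("Unexpected field name: $field");
    }
    if (strpos($value, "'") !== false) {
        throw new InvalidArgumentException("Quotes in filter values are not handled: $value");
    }
    return sprintf("doc.%s = '%s'", $field, $value);
}

// Combine several conditions so that all of them must hold.
function andFilters(array $conditions): string
{
    return implode(' and ', array_map(fn (string $c): string => "($c)", $conditions));
}

echo andFilters([equalsFilter('language', 'French'), equalsFilter('location', 'Germany')]), PHP_EOL;
```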

For the second query, “I am looking for a videographer”: Vectara always returns the number of results you request (if it has enough data), but note that it provides a relevance score for each result. I would expect the second result in this case to have a much lower relevance score, so you can filter it out using that score.
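A sketch of the score-based cut-off, assuming each result has already been reduced to an id and a score. It keeps results within a fraction of the top score rather than using a fixed threshold, since absolute scores can vary from query to query; filterByRelativeScore() is a hypothetical helper and the 0.5 default is a starting value to tune on real queries:

```php
<?php
// Sketch: drop results whose score is far below the best result's score.
// The 0.5 ratio is a hypothetical default, to be tuned on real queries.

function filterByRelativeScore(array $results, float $minRatio = 0.5): array
{
    if ($results === []) {
        return [];
    }
    $top = max(array_column($results, 'score'));
    return array_values(array_filter(
        $results,
        fn (array $r): bool => $r['score'] >= $top * $minRatio
    ));
}

// Made-up scores for the "videographer" query.
$results = [
    ['id' => 'talent-profile-data-001', 'score' => 0.78],
    ['id' => 'talent-profile-data-002', 'score' => 0.21],
];
print_r(filterByRelativeScore($results));
```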

Does that make sense?

Thanks @ofermend for your quick response.

Based on your suggestion, I am not sure that applying filters would be a good option for this query: "I am looking for instructional designer who speaks French".
That is only one case I have mentioned; users can also search like this:
I am looking for an OSHA trainer who lives in Germany
who speaks French
who worked at ABC company

So we cannot predict what the user intends.
Also, can you share some example JSON based on my JSON data, so I can understand the point you made in your last response?

Thanks
Looking forward to your reply
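One possible way to handle free-form queries without guessing intent by hand is a lightweight pre-processing step: match the query against facet values that actually occur in your profile data, turn only those into filters, and still send the full query for semantic search. This is a plain keyword heuristic, not a Vectara feature; extractFilters() and the facet lists are hypothetical:

```php
<?php
// Sketch: find known facet values (collected from your own profile data) in a
// free-text query and return them as field => value pairs for filtering.
// Plain keyword heuristic; not part of any Vectara API.

function extractFilters(string $query, array $facets): array
{
    $filters = [];
    foreach ($facets as $field => $values) {
        foreach ($values as $value) {
            // Whole-word, case-insensitive match of the facet value.
            $pattern = '/\b' . preg_quote($value, '/') . '\b/iu';
            if (preg_match($pattern, $query)) {
                $filters[$field] = $value;
                break; // one value per field keeps the sketch simple
            }
        }
    }
    return $filters;
}

$facets = [
    'language' => ['English', 'French', 'Spanish'],
    'location' => ['Germany', 'Arizona'],
];

print_r(extractFilters('I am looking for an OSHA trainer who lives in Germany who speaks French', $facets));
```

A word such as "French" inside an unrelated phrase would also be picked up, so treat this as a starting point; when no facet value is found, the query simply runs without a filter.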

Bill, I’m thinking of something like this:

{
  "documentId": "talent-profile-data-002",
  "title": "Danielle Anderson",
  "section": [
    {
      "text": "Danielle Anderson is an instructional designer living in Gila County, Arizona. She speaks English. Her bio is: Highly experienced designing eLearning and instructional-led curriculum. Experienced in both Adobe Captivate and Articulate Storyline, years of experience leading projects, and working multiple projects at once. Have written for a multitude of topics including sales, leadership, compliance, and system training. Solutions- oriented, innovative, highly flexible, and creative."
    },
    {
      "text": "Danielle Anderson held the following roles: Consultant, Content Developer, Course Developer, Educational Consultant,Educator,eLearning Developer,Facilitator,Instructional Designer,Learning Architect,Learning Technologist - Other,Technical Writer,Training Coordinator"
    },
    {
      "text": "Danielle Anderson has the following skills: ADDIE,ADDIE Model,Adobe Audition,Adobe Captivate,Adobe Dreamweaver,Adobe Illustrator,Adobe Indesign,Adobe Photoshop,Agile,Articulate,Audition,Blended Learning,Captivate,Content Development,Content Management,Course Development,Curriculum Development,Documentation,Dreamweaver,eLearning,Employee Training,HTML,Illustrator,InDesign,Instructional Design,Instructor Led Training,Kirkpatrick Model,Leadership Training,Learning and People Leadership,Learning Management Systems (LMS),Learning Theory,Micro-Learning,Microsoft Sharepoint,Microsoft Visio,multiple Learning Management Systems (LMS) LMS,Online Training,People Leadership,Photoshop,Premiere,Program Management,SAP Litmos,SharePoint,Simulations,Strategic Planning,Technical Writing,trainer,Training Development,Virtual Classroom,Virtual Training,Visio,Vyond"
    },
    {
      "text": "Danielle Anderson worked in the following industries: Education, Finance, Healthcare & Wellness, Retail, Tech"
    },
    {
      "title": "Experiences",
      "section": [
        {
          "text": "Danielle Anderson has worked as a Technical Instructional Designer since 2021-08-01 at Microsoft (Contractor), described as: Work with Curriculum Managers and SMEs to convert PowerPoints to eLearning courses using Articulate Storyline.\r\n\u2022 Review current eLearning content and identify opportunities to make more interactive",
          "metadataJson": "{\"end_date\": \"null\", \"location\": \"Remote\", \"start_date\": \"2021-08-01\", \"company_name\": \"Microsoft (Contractor)\"}"
        },
        … [MORE like these for each experience]
        {
          "text": "Danielle Anderson worked as a Trainer, Instructional Designer, Technical Writer between 1994/01/01 and 2000/12/31 at Sallie Mae, described as: Created and maintained training material as well as on-line procedures for a 300-person virtual\r\ncall center.\r\n\u2022 Participated in an implementation project that created an online help system for the call center\u2019s\r\nnew GUI System.\r\n\u2022 Led a virtual project team that streamlined the call center\u2019s quality guidelines to reduce telephone\r\nhandle time.\r\n\u2022 Assisted with developing training materials and procedures for two multi-million dollar system\r\nconversions.\r\n\u2022 Responsible for training and developing course content for five student loan processing\r\ndepartments.",
          "metadataJson": "{\"end_date\": \"2000/12/31 00:00:00\", \"start_date\": \"1994/01/01 00:00:00\", \"company_name\": \"Sallie Mae\"}"
        }
      ]
    },
    {
      "text": "Danielle Anderson worked at these companies: Microsoft, Plexus Worldwide, Desert Schools Federal Credit Union, Nelnet Student Loans, Leslies Pool Supplies, USA Group / Sallie Mae"
    }
  ]
}

(I made it shorter than the original, just to demonstrate the idea; I didn’t check the full validity of the JSON.)
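Sections like these can be generated from the original profile JSON rather than written by hand. A minimal sketch, assuming the profile is decoded into an associative array with the keys from the first post (first_name, last_name, tagline, bio, location, languages, roles, skills, industries, companies); naturalLanguageDocument() and the sentence wording are hypothetical, and experience sections could be appended the same way:

```php
<?php
// Sketch: turn the original profile array into natural-language sections.
// Sentence wording is illustrative; empty fields are skipped rather than
// producing empty or awkward text.

function naturalLanguageDocument(array $p, string $documentId): array
{
    $name = trim(($p['first_name'] ?? '') . ' ' . ($p['last_name'] ?? ''));
    $join = fn (string $key): string => implode(', ', $p[$key] ?? []);

    // Intro section: tagline, location, languages and bio in one text block.
    $intro = [];
    if (($p['tagline'] ?? '') !== '') {
        $intro[] = "$name's tagline is: {$p['tagline']}.";
    }
    if (($p['location'] ?? '') !== '') {
        $intro[] = "$name is based in {$p['location']}.";
    }
    if ($join('languages') !== '') {
        $intro[] = "$name speaks " . $join('languages') . '.';
    }
    if (($p['bio'] ?? '') !== '') {
        $intro[] = "Bio: {$p['bio']}";
    }

    $sections = [];
    if ($intro !== []) {
        $sections[] = ['text' => implode(' ', $intro)];
    }

    // One sentence-style section per list field.
    $templates = [
        'roles' => '%s has held the following roles: %s.',
        'skills' => '%s has the following skills: %s.',
        'industries' => '%s has worked in the following industries: %s.',
        'companies' => '%s has worked at these companies: %s.',
    ];
    foreach ($templates as $key => $template) {
        if ($join($key) !== '') {
            $sections[] = ['text' => sprintf($template, $name, $join($key))];
        }
    }

    return ['documentId' => $documentId, 'title' => $name, 'section' => $sections];
}

$doc = naturalLanguageDocument([
    'first_name' => 'Danielle',
    'last_name' => 'Anderson',
    'tagline' => 'Instructional Designer',
    'location' => 'Gila County, Arizona',
    'languages' => ['English'],
    'roles' => ['Consultant', 'Instructional Designer', 'Technical Writer'],
    'skills' => ['ADDIE', 'Adobe Captivate'],
    'industries' => [],
    'companies' => ['Microsoft', 'Plexus Worldwide'],
], 'talent-profile-data-002');

echo json_encode($doc, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE), PHP_EOL;
```

Because the language appears in the text ("Danielle Anderson speaks English."), it also becomes part of what semantic search can match, independently of any metadata filter.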