At this point, that web crawler doesn’t provide any authentication mechanism. I’m sure it’d be possible to add one, but the code isn’t really set up to handle that right now. You’d need to modify crawler.py in the vectara/web-crawler repository on GitHub (web-crawler/crawler.py at main) so that it either imports an existing session or logs in itself, and then persists that session for the lifetime of the crawler instance. The crawler uses pyhtml2pdf for PDF generation, and that library may need some modifications as well, depending on how the authentication is set up.
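To give a rough idea of what "logging in and then persisting the session" could look like, here is a minimal sketch using only the Python standard library. It is not taken from crawler.py: the function name `build_authenticated_opener`, the form-based login, and the example URLs and field names are all assumptions for illustration. A site that uses tokens, SSO, or JavaScript-driven login would need something different, and PDF generation through pyhtml2pdf would still need its own handling.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_authenticated_opener(login_url=None, credentials=None, cookie_jar=None):
    """Return (opener, jar); the opener reuses the same cookies on every request.

    Hypothetical helper: assumes a plain HTML form login that sets a session cookie.
    """
    # An empty CookieJar is falsy, so compare against None explicitly.
    jar = cookie_jar if cookie_jar is not None else http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

    if login_url and credentials:
        # POST the login form; any Set-Cookie headers in the reply land in the jar.
        data = urllib.parse.urlencode(credentials).encode("utf-8")
        with opener.open(login_url, data=data, timeout=30) as response:
            response.read()

    return opener, jar
```

The crawler would then fetch every page through that one opener instead of making fresh, unauthenticated requests, for example (placeholder URL and field names):

```python
opener, jar = build_authenticated_opener(
    "https://example.com/login",
    {"username": "me", "password": "secret"},
)
html = opener.open("https://example.com/protected/page", timeout=30).read()
```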
In general, I suggest that folks use web crawling as a last resort. If you happen to have the source data in some other format (a JSON file, etc.), it’s almost always better, faster, and more robust to ingest that directly.
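As an illustration of why a structured source is easier to work with, here is a small sketch that reads a JSON export and turns it into (id, text) pairs ready to hand to whatever ingestion step you use. The file layout (a top-level array of objects) and the field names `id` and `body` are assumptions; adjust them to match your own data.

```python
import json


def load_documents(path, id_field="id", text_field="body"):
    """Read a JSON array of records and return a list of (doc_id, text) pairs."""
    with open(path, encoding="utf-8") as handle:
        records = json.load(handle)

    documents = []
    for index, record in enumerate(records):
        text = (record.get(text_field) or "").strip()
        if not text:
            continue  # skip records with no usable text
        # Fall back to the record's position when it has no explicit id.
        doc_id = str(record.get(id_field, index))
        documents.append((doc_id, text))
    return documents
```

There is no HTML parsing, login handling, or PDF rendering involved, which is most of what makes crawling fragile.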