I wanted to ask what the recommended way of weighting by a timestamp (e.g. creation timestamp) is.
It seems like it may be possible to do this using custom dimensions, but as far as I can tell this means the weight of all results will be infinitely increasing.
Is it possible to weight timestamps in any other way that I’ve missed in the docs?
If you want to use recency as a ranking factor in the retrieval stage, then custom dimensions are the way to go.
“as far as I can tell this means the weight of all results will be infinitely increasing.”
Yes, this is a drawback. You can work around it by applying a sigmoidal function to the time dimension.
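To make the sigmoid idea concrete, here is a minimal sketch in plain Python. It assumes you compute the value yourself before indexing and store the result as the custom dimension. The names recency_weight, midpoint_epoch and scale_seconds are made up for illustration and are not part of any platform API. The timestamp is squashed into the range 0 to 1, so newer documents still score higher but the value can never grow without bound.

```python
import math


def recency_weight(doc_epoch: float, midpoint_epoch: float, scale_seconds: float) -> float:
    """Squash an epoch timestamp into [0, 1] with a logistic (sigmoid) curve.

    All three parameters are illustrative assumptions:
    - doc_epoch: the document's timestamp in epoch seconds
    - midpoint_epoch: the timestamp that should map to exactly 0.5
    - scale_seconds: how quickly the curve moves from 0 towards 1
    """
    if scale_seconds <= 0:
        raise ValueError("scale_seconds must be positive")
    x = (doc_epoch - midpoint_epoch) / scale_seconds
    # Use the form that keeps the exponent non-positive, so math.exp never overflows.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


# Example: midpoint at the timestamp quoted in this thread, 30-day scale.
MIDPOINT = 1673845542
THIRTY_DAYS = 30 * 24 * 3600

older = recency_weight(MIDPOINT - 90 * 24 * 3600, MIDPOINT, THIRTY_DAYS)
newer = recency_weight(MIDPOINT + 90 * 24 * 3600, MIDPOINT, THIRTY_DAYS)
```

A document at the midpoint gets 0.5, much older documents approach 0, and much newer ones approach 1. Picking the midpoint and scale is a tuning decision: a fixed midpoint means every new document eventually saturates near 1, so you may need to recompute the values from time to time.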
If you instead want to apply a hard time-based filter (e.g. only show results from the past week, the past month, or January 2022), then you should store epoch seconds in metadata and apply a filter expression (doc.pubdate > 1673845542).
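As a rough sketch of how the cutoff in such a filter could be computed, the Python below builds the expression string for "newer than N days ago". The helper name cutoff_filter is hypothetical, and doc.pubdate is simply the field name from the example above. How the string is passed along with a query is not shown here.

```python
from datetime import datetime, timedelta, timezone


def cutoff_filter(now: datetime, window: timedelta, field: str = "doc.pubdate") -> str:
    """Build a 'field > epoch_seconds' expression for documents newer than now - window.

    Hypothetical helper: the field name and expression shape follow the
    example in this thread, nothing else about the platform is assumed.
    """
    if now.tzinfo is None:
        # A naive datetime would be interpreted in local time, which shifts the cutoff.
        raise ValueError("now must be timezone-aware")
    cutoff = int((now - window).timestamp())
    return f"{field} > {cutoff}"


# Only results from the past week, relative to the current time.
past_week = cutoff_filter(datetime.now(timezone.utc), timedelta(days=7))

# Fixed reference time, so the result is reproducible.
reference = datetime(2023, 1, 23, 5, 5, 42, tzinfo=timezone.utc)
fixed = cutoff_filter(reference, timedelta(days=7))
```

With the fixed reference time above, the result is the same expression as in the example: doc.pubdate > 1673845542.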
Thanks for the ask @rbhalla ! Would love to know if it’s for wanting to apply “recency bias” or if there’s a different use case you had in mind. (We’ve heard a few such requests and are looking at the potential usage and considering prioritization.)
Do you mind expanding on how I would apply a sigmoidal function to the time dimension? This would help!
@shane, that is exactly right, I would like to weight documents that were published more recently. But my definition of “published” may not always align with indexing date. That’s why I was talking about a more generic timestamp field.