PATH (Prompts as Auto-optimized Training Hyperparameters) trains small-scale neural information retrieval (IR) models from only a handful of gold relevance labels. A language model (LM) generates synthetic queries for documents in the target corpus, and the prompt used for generation is optimized automatically to raise the quality of the synthetic training data. Models trained this way outperform far larger ones.
Concretely, the LM writes a synthetic query for each passage, yielding (query, passage) pairs that serve as training data. Rather than hand-crafting the generation prompt, PATH treats it as a training hyperparameter: candidate prompts are optimized automatically, with the small set of gold labels used to judge how good the resulting training data is. The generation step might look like the sketch below.
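This is a minimal sketch of the query-generation step, assuming a DSPy-style setup (PATH uses DSPy for prompt optimization). The signature, field names, and model choice here are illustrative assumptions, not taken from the paper's code.

```python
import dspy

# Illustrative LM choice; substitute whatever model and provider you use.
dspy.settings.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class GenerateQuery(dspy.Signature):
    """Write a search query that the given passage would answer."""
    passage: str = dspy.InputField(desc="a passage from the target corpus")
    query: str = dspy.OutputField(desc="a plausible search query for this passage")

generate_query = dspy.Predict(GenerateQuery)

def make_training_pairs(passages: list[str]) -> list[tuple[str, str]]:
    """Turn each passage into one synthetic (query, passage) positive pair."""
    return [(generate_query(passage=p).query, p) for p in passages]
```

Because the prompt lives inside a DSPy module rather than a hard-coded string, it can be rewritten by an optimizer just like any other tunable component.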
The payoff is substantial: with as few as ten gold relevance labels, small IR models trained with PATH outperform RankZephyr and are competitive with RankLLaMA, even though both of those rerankers have billions of parameters and were trained on datasets with more than 100,000 labels. The gains come entirely from automatically optimizing the prompt the LM uses to generate synthetic queries, which improves the training data the small model sees. A hedged sketch of that outer selection loop follows.
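The sketch below shows the prompt-as-hyperparameter idea in its simplest form: each candidate prompt is scored by training a model on the data it generates and validating on the few gold labels. All callables are hypothetical stand-ins for the reader's own training and evaluation code, not functions from the paper.

```python
from typing import Callable

def select_prompt(
    candidate_prompts: list[str],
    passages: list[str],
    gold_labels: list[tuple[str, str]],  # the ~10 gold (query, passage) pairs
    generate_pairs: Callable[[str, list[str]], list[tuple[str, str]]],
    train_reranker: Callable[[list[tuple[str, str]]], object],
    evaluate: Callable[[object, list[tuple[str, str]]], float],
) -> str:
    """Pick the prompt whose synthetic data trains the best-scoring model."""
    best_prompt, best_score = candidate_prompts[0], float("-inf")
    for prompt in candidate_prompts:
        pairs = generate_pairs(prompt, passages)  # synthetic (query, passage) data
        model = train_reranker(pairs)             # small-scale neural IR model
        score = evaluate(model, gold_labels)      # validate on the few gold labels
        if score > best_score:
            best_prompt, best_score = prompt, score
    return best_prompt
```

In practice PATH delegates this search to a DSPy prompt optimizer rather than a plain grid over hand-written candidates, but the validation signal is the same: the handful of gold labels decides which prompt wins.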