Information Extraction (IE) is a Natural Language Processing (NLP) technology that automatically extracts structured information from unstructured text [1]. It identifies specific entities, relationships, and events in the text and records them in a structured form suitable for analysis and downstream applications [4].
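To make the idea of turning free text into structured records concrete, here is a minimal rule-based sketch. It is only an illustration: the `Relation` record, the `works_at` predicate, and the regular expression are hypothetical choices made for this example, and practical IE systems typically rely on trained models rather than a single hand-written pattern.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Relation:
    """One structured record extracted from text (hypothetical schema)."""
    subject: str
    predicate: str
    obj: str


# Assumption: entity names are runs of capitalized words, e.g. "Alice Smith".
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"
# Assumption: the only relation we look for is phrased "<name> works at <name>".
WORKS_AT = re.compile(rf"({NAME}) works at ({NAME})")


def extract_relations(text: str) -> List[Relation]:
    """Return one works_at relation per match of the pattern in the text."""
    return [
        Relation(subject=m.group(1), predicate="works_at", obj=m.group(2))
        for m in WORKS_AT.finditer(text)
    ]


text = "Alice Smith works at Acme Corp. Later, Bob works at Globex."
relations = extract_relations(text)
for r in relations:
    print(r.subject, r.predicate, r.obj)
```

Each printed line is one (subject, predicate, object) triple, which is the kind of structured output that can be loaded into a table or knowledge graph for analysis.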
Low-resource languages are challenging for Large Language Models (LLMs) because they lack both the unlabeled text needed for pre-training and the labeled data needed for fine-tuning. As a result, LLMs tend to be less accurate and to generalize poorly in these languages.
Instruction tuning further trains a pre-trained LLM on a dataset of (INSTRUCTION, OUTPUT) pairs, improving its ability to follow instructions and generalize to new tasks. The technique bridges the gap between the next-word prediction objective of pre-training and the specific objectives of downstream tasks, which improves both performance and controllability.
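As a rough sketch of what one (INSTRUCTION, OUTPUT) training example might look like, the snippet below joins a pair into a single token sequence and marks which positions belong to the output. The prompt template, the whitespace "tokenizer", and the choice to compute the loss only on output tokens are all assumptions made for illustration; they are common conventions, not something the text above prescribes.

```python
from typing import List, Tuple

# Hypothetical prompt template; real datasets use many different formats.
TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n"


def build_example(instruction: str, output: str) -> Tuple[List[str], List[bool]]:
    """Turn one (INSTRUCTION, OUTPUT) pair into tokens plus a loss mask.

    Whitespace splitting stands in for a real tokenizer (assumption).
    The mask is True only for output tokens, so a next-word prediction
    loss restricted to those positions trains the model to produce the
    output given the instruction.
    """
    prompt_tokens = TEMPLATE.format(instruction=instruction).split()
    output_tokens = output.split()
    tokens = prompt_tokens + output_tokens
    loss_mask = [False] * len(prompt_tokens) + [True] * len(output_tokens)
    return tokens, loss_mask


pairs = [
    ("Translate to French: cat", "chat"),
    ("List two primary colors.", "red and blue"),
]

examples = [build_example(instruction, output) for instruction, output in pairs]
for tokens, loss_mask in examples:
    supervised = [t for t, keep in zip(tokens, loss_mask) if keep]
    print(len(tokens), supervised)
```

The training objective is still next-word prediction; what changes is the data, which now pairs an explicit instruction with the desired response.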