Fayoumi, Kenan (2022) Developing tools to support the search needs of news readers and news writers. [Thesis]
PDF
10439521_Fayoumi_Kenan.pdf
Download (6MB)
Abstract
The ongoing digitization of news has changed and democratized the industry of news writing. The huge increase in the number of news sources has called for research on automated methods that link relevant news articles or entities, provide background information, and enhance the reader's experience. In this work, we tackle three different tasks in the context of news articles: Wikification, Entity Ranking, and Background Linking. The work done on these tasks is aligned with the tasks in the News Track of the Text REtrieval Conference (TREC). In Wikification, we detect a list of mentioned entities in articles, link them to their corresponding Wikipedia entries, and rank the list of entities in terms of relevance to the article. The standalone Entity Ranking task is concerned only with the final ranking step of Wikification, where the list of mentioned entities is given. As for Background Linking, the task is to retrieve and rank a list of relevant articles given a query news article. Our proposed solutions for these tasks are oriented towards deep modelling and using vector representations to estimate similarity and relevance. For Entity Ranking, we encode news articles and entities using Doc2Vec, then use the proximity between each article-entity pair to rank entities. As for Wikification, we use transformer-based architectures for detecting entity mentions and for encoding mentions and entities into vector representations. These vectors are used for candidate retrieval and ranking as part of the entity linking pipeline. In Background Linking, we again use a transformer-based language model to encode news articles and fine-tune it for relevance ranking between articles. For evaluation, we compare our approaches to classic information retrieval systems to analyze the gain in quality or performance brought by using deep, complex architectures. Using Doc2Vec and cosine similarity to measure relevance in a setting of perfect entity linking yields high performance.
In Wikification, encoding mentions and performing dense vector search for candidate retrieval performs on par with the baseline. However, using contextual encoding for candidate entity ranking significantly improves the Wikification performance. The transformer-based re-ranker used in Background Linking does not improve over the full-text search baseline but shows promising improvements in results when provided with more data for fine-tuning.
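The Entity Ranking step described in the abstract (encode article and entities as vectors, then rank entities by proximity) can be illustrated with a minimal sketch. This is not the thesis's implementation: the three-dimensional vectors and the entity names below are made-up placeholders standing in for Doc2Vec embeddings, and the helper names `cosine_similarity` and `rank_entities` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_u * norm_v)


def rank_entities(article_vec, entity_vecs):
    """Return (entity, score) pairs sorted by similarity to the article, best first."""
    scored = [
        (name, cosine_similarity(article_vec, vec))
        for name, vec in entity_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder vectors (assumption): real ones would come from a trained
# document-embedding model such as Doc2Vec.
article = [0.9, 0.1, 0.3]
entities = {
    "Entity A": [0.8, 0.2, 0.4],
    "Entity B": [-0.5, 0.9, 0.0],
    "Entity C": [0.1, 0.1, 0.9],
}

ranking = rank_entities(article, entities)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these placeholder vectors, Entity A points in nearly the same direction as the article and is ranked first, while Entity B points away from it and is ranked last.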
Item Type: | Thesis |
---|---|
Uncontrolled Keywords: | Wikification. -- Entity Linking. -- Entity Ranking. -- Background Linking. -- Information Retrieval. -- vikifikasyon. -- varlık sıralaması. -- geçmiş bağlantısı. -- doğal dil işleme. -- bilgi getirimi ve çıkarımı. |
Subjects: | T Technology > TK Electrical engineering. Electronics. Nuclear engineering > TK7800-8360 Electronics > TK7885-7895 Computer engineering. Computer hardware |
Divisions: | Faculty of Engineering and Natural Sciences > Academic programs > Computer Science & Eng.; Faculty of Engineering and Natural Sciences |
Depositing User: | Dila Günay |
Date Deposited: | 21 Jun 2022 10:03 |
Last Modified: | 21 Jun 2022 10:03 |
URI: | https://research.sabanciuniv.edu/id/eprint/42953 |