A benchmark study of clustering based record linkage methods

Uğurlu, Kerem (2009) A benchmark study of clustering based record linkage methods. [Thesis]

Abstract

Record linkage (or record matching) aims to identify the records in datasets that represent the same entity. These entities could be people or any other entity of interest. In this study, a benchmark of clustering algorithms used in record linkage was conducted. The motivation is that, with the rise of machine learning, record linkage has come to be treated as a classification problem with two classes: matched and unmatched pairs. The pairs to be compared are pairs of entries in the dataset, possibly with a reduction in the number of comparisons to avoid quadratic complexity. A clustering benchmark is needed because such experiments are usually carried out under the assumption that the experimenter has substantial training data for the classification procedure and can therefore proceed in a supervised fashion. However, this is usually not the case in real-life scenarios. For that reason, in this benchmarking study, the three main clustering algorithms are applied to three datasets that were deliberately selected to have different characteristics.
Item Type: Thesis
Uncontrolled Keywords: Record. -- Record linkage. -- Clustering. -- Kayıt. -- Kayıt eşleştirme. -- Öbekleştirme.
Subjects: T Technology > TK Electrical engineering. Electronics. Nuclear engineering > TK7800-8360 Electronics > TK7885-7895 Computer engineering. Computer hardware
Divisions: Faculty of Engineering and Natural Sciences > Academic programs > Computer Science & Eng.
Faculty of Engineering and Natural Sciences
Depositing User: IC-Cataloging
Date Deposited: 05 Jul 2011 16:38
Last Modified: 26 Apr 2022 09:54
URI: https://research.sabanciuniv.edu/id/eprint/16595
