Identification of anonymous users in Twitter

Arın, İnanç (2012) Identification of anonymous users in Twitter. [Thesis]

PDF: InancArin_440013.pdf (453kB)

Abstract

Users may maintain multiple profiles when writing comments, blogs, and tweets on the web. Some of these profiles reveal the user's true identity, while others are created under pseudonyms. Pseudonymity is especially important in countries with oppressive governments, where activists write pseudonymous tweets or Facebook messages. In these countries, the consequences can be serious if government officials discover that a person is among the activists: the activist may be imprisoned, or his or her life may even be put in jeopardy. Pseudonyms may provide a sense of anonymity; however, the writing patterns of an author can provide clues that link the pseudonymous account to the public account. More specifically, one can extract features from text whose author is known and use them to build a model that predicts whether a given (supposedly) anonymous text belongs to that author. In this work, we first demonstrate that a person can be identified as being part of a group from his or her tweets. We use Twitter because it is a popular platform, but the problem is not specific to Twitter. We show that an adversary can build a classifier from the public tweets of known users and match those users with pseudonymous Twitter accounts. We use a simple vector-space model with tf-idf weights to represent documents and a Naive Bayes classifier with a cosine similarity measure. Through experiments with real data, we show that the problem of matching public and pseudonymous accounts exists on Twitter. We also provide a formalism to describe the problem and, based on it, a solution that protects the privacy of individuals who would like to stay anonymous when writing tweets.
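The vector-space matching described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: it covers only tf-idf weighting and cosine similarity (the Naive Bayes classifier is omitted), the smoothed idf formula is one common variant chosen here as an assumption, and the function names, author names, and toy tweets are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def build_idf(documents):
    """Smoothed inverse document frequency for every term in the corpus.

    Assumption: idf = ln((1 + N) / (1 + df)) + 1, a common smoothed variant.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log((1 + n_docs) / (1 + df)) + 1.0
            for term, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Sparse tf-idf vector as a dict; terms unseen in the corpus are dropped."""
    counts = Counter(tokens)
    return {term: count * idf[term]
            for term, count in counts.items() if term in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_authors(known_tweets, anonymous_text):
    """Rank known authors by similarity to a pseudonymous text.

    known_tweets maps an author name to a list of that author's public tweets.
    Each author's tweets are merged into one profile document.
    """
    profiles = {author: tokenize(" ".join(tweets))
                for author, tweets in known_tweets.items()}
    idf = build_idf(list(profiles.values()))
    query = tfidf_vector(tokenize(anonymous_text), idf)
    scores = {author: cosine(tfidf_vector(tokens, idf), query)
              for author, tokens in profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data, for illustration only.
known = {
    "alice": ["cant wait for the match tonight lol", "lol that goal was unreal"],
    "bob": ["new paper on database indexing is out",
            "reviewing query optimizer patches today"],
}
ranking = rank_authors(known, "lol what a match that was")
```

Here `ranking` lists the known authors from most to least similar to the pseudonymous text. Merging each author's tweets into a single profile keeps the sketch short; treating every tweet as its own document would be an equally plausible design.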
Item Type: Thesis
Uncontrolled Keywords: Confidentiality. -- Multi classification. -- Privacy. -- Multiple users systems. -- User identification. -- Gizlilik. -- Çoklu sınıflandırma. -- Privacy. -- Çok kullanıcılı sistemler. -- Kullanıcı belirleme.
Subjects: Q Science > QA Mathematics > QA076 Computer software
Divisions: Faculty of Engineering and Natural Sciences > Academic programs > Computer Science & Eng.
Faculty of Engineering and Natural Sciences
Depositing User: IC-Cataloging
Date Deposited: 29 Sep 2014 00:03
Last Modified: 26 Apr 2022 10:02
URI: https://research.sabanciuniv.edu/id/eprint/24604
