Arın, İnanç (2012) Identification of anonymous users in Twitter. [Thesis]
PDF
InancArin_440013.pdf
Download (453kB)
Official URL: http://192.168.1.20/record=b1493231 (Table of Contents)
Abstract
Users may maintain multiple profiles when writing comments, blogs, and tweets on the web. While some of these profiles reveal the user's true identity, others are created under pseudonyms. Pseudonymity is especially important in countries with oppressive governments, where activists write pseudonymous tweets or Facebook messages. In these countries, government officials discovering that a person is among the activists may have serious consequences: the activist may be imprisoned, or his or her life may even be put in jeopardy. Pseudonyms may provide a sense of anonymity; however, the writing patterns of an author can provide clues that can be used to link the pseudonymous account to the public account. More specifically, one can extract features from texts whose author is known and use these features to build a model that predicts whether a given (supposedly) anonymous text belongs to that author. In this work, we first demonstrate that a person can be identified as being part of a group by using his/her tweets. We use Twitter since it is a popular platform, but the problem is not specific to Twitter. We show that an adversary can build a classifier from the public tweets of known users to match them with pseudonymous Twitter accounts. We use a simple vector-space model with tf-idf weights to represent documents and a Naive Bayes classifier with a cosine similarity measure. Through experiments with real data, we show that the problem of matching public and pseudonymous accounts exists in Twitter. We also provide a formalism to describe the problem and, based on this formalism, a solution to protect the privacy of individuals who would like to stay anonymous when writing tweets.
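The matching idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it covers only the tf-idf vector-space representation and a cosine-similarity ranking, and leaves out the Naive Bayes step entirely. The function names, the whitespace tokenizer, the smoothed idf formula, and the toy tweets for the hypothetical users `alice` and `bob` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the thesis may use other features.
    return text.lower().split()


def build_idf(docs):
    # docs: list of token lists, one per known author.
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    # Smoothed idf (an assumed variant), so terms seen in every document keep weight 1.
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(tokens, idf):
    # Raw term frequency times idf; terms unseen in the known corpus are dropped.
    tf = Counter(tokens)
    return {t: tf[t] * idf[t] for t in tf if t in idf}


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_authors(known, anonymous_text):
    # known: dict mapping author name -> list of public tweets.
    docs = {author: tokenize(" ".join(tweets)) for author, tweets in known.items()}
    idf = build_idf(list(docs.values()))
    query = tfidf(tokenize(anonymous_text), idf)
    scores = {a: cosine(tfidf(toks, idf), query) for a, toks in docs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data, not taken from the thesis.
known = {
    "alice": ["the march starts at noon tomorrow", "bring water to the march"],
    "bob": ["great football match tonight", "what a goal in the match"],
}
ranking = rank_authors(known, "see you at the march at noon")
best_author = ranking[0][0]
```

Here `ranking` lists the known authors from most to least similar to the pseudonymous text, and `best_author` is the top candidate. With realistic data, each author would be represented by many more tweets, and a similarity threshold would be needed to decide whether any known author matches at all.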
Item Type: | Thesis |
---|---|
Uncontrolled Keywords: | Confidentiality. -- Multi classification. -- Privacy. -- Multiple users systems. -- User identification. |
Subjects: | Q Science > QA Mathematics > QA076 Computer software |
Divisions: | Faculty of Engineering and Natural Sciences > Academic programs > Computer Science & Eng.; Faculty of Engineering and Natural Sciences |
Depositing User: | IC-Cataloging |
Date Deposited: | 29 Sep 2014 00:03 |
Last Modified: | 26 Apr 2022 10:02 |
URI: | https://research.sabanciuniv.edu/id/eprint/24604 |