Identification of anonymous users in Twitter

Arın, İnanç (2012) Identification of anonymous users in Twitter. [Thesis]


Official URL: http://192.168.1.20/record=b1493231 (Table of Contents)

Abstract

Users may have multiple profiles when writing comments, blogs, and tweets on the web. While some of these profiles reveal the user's true identity, others are created under pseudonyms. Pseudonymity is especially important in countries with oppressive governments, where activists write pseudonymous tweets or Facebook messages. In these countries, government officials discovering that a person is among the activists may have serious consequences: the activist may be imprisoned, or his or her life may even be put in jeopardy. Pseudonyms may provide a sense of anonymity; however, the writing patterns of an author can provide clues that can be used to link the pseudonymous account to the public account. More specifically, one can look at features within texts whose author is known, and build a model from these features to predict whether a given (supposedly) anonymous text belongs to that author or not. In this work, we first demonstrate that a person can be identified as being part of a group by using his or her tweets. We used Twitter since it is a popular platform, but the problem is not specific to Twitter. We show that an adversary can build a classifier from the public tweets of known users to match them with pseudonymous Twitter accounts. We use a simple vector-space model with tf-idf weights to represent documents and a Naive Bayes classifier with a cosine similarity measure. We show through experiments with real data that the problem of matching public and pseudonymous accounts exists in Twitter. We also provide a formalism to describe the problem and, based on this formalism, a solution to protect the privacy of individuals who would like to stay anonymous when writing tweets.
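
The matching idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: it covers only the tf-idf vector-space representation and the cosine similarity measure, and it leaves out the Naive Bayes classifier. The function names, the whitespace tokenizer, the smoothed idf formula, and the toy tweets in `KNOWN` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def build_idf(documents):
    """Smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1.0
            for term, df in doc_freq.items()}


def tfidf_vector(text, idf):
    """Sparse tf-idf vector (a dict); terms absent from the corpus are ignored."""
    counts = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(known_tweets, anonymous_text):
    """Rank known authors by similarity of their public tweets to a text."""
    # One profile document per known author: all public tweets concatenated.
    profiles = {a: " ".join(tweets) for a, tweets in known_tweets.items()}
    idf = build_idf(list(profiles.values()))
    target = tfidf_vector(anonymous_text, idf)
    scores = {a: cosine(tfidf_vector(p, idf), target)
              for a, p in profiles.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data, invented purely for this example.
KNOWN = {
    "alice": ["honestly the traffic downtown is unbearable today",
              "honestly cannot believe the traffic again"],
    "bob": ["new espresso blend tastes amazing",
            "brewing espresso before the morning meeting"],
}

if __name__ == "__main__":
    for author, score in rank_candidates(KNOWN, "honestly this traffic is unbearable"):
        print(author, round(score, 3))
```

Calling `rank_candidates` with a pseudonymous text returns the known authors ordered from most to least similar. On realistic data the ranking would depend heavily on tokenization, feature choice, and the amount of text available per author.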

Item Type: Thesis
Uncontrolled Keywords: Confidentiality. -- Multi classification. -- Privacy. -- Multiple users systems. -- User identification.
Subjects: Q Science > QA Mathematics > QA076 Computer software
ID Code: 24604
Deposited By: IC-Cataloging
Deposited On: 29 Sep 2014 00:03
Last Modified: 29 Sep 2014 00:03
