Application of automatic mutation-gene pair extraction to diseases

Erdoğmuş, Müge (2007) Application of automatic mutation-gene pair extraction to diseases. [Thesis]


Official URL: http://risc01.sabanciuniv.edu/record=b1221498 (Table of Contents)

Abstract

It is now known that several inherited genetic diseases, such as sickle cell anemia, are caused by mutations in genes. To find ways to prevent or, better still, circumvent the occurrence of these diseases, knowledge of the mutations and of the genes in which they occur is of crucial importance. Information on disease-related mutations and genes can be accessed through publicly available databases or biomedical literature sources. However, acquiring relevant information from such resources can be problematic for two reasons. First, manually created databases are usually incomplete and out of date. Second, reading through the vast amount of publicly available biomedical documents is very time-consuming. Therefore, there is a need for systems that can extract relevant information from publicly available resources in an automated fashion. This thesis presents the design and implementation of a system, MuGeX, that automatically extracts mutation-gene pairs from MEDLINE abstracts for a given disease. MuGeX performs three main tasks. The first task is the identification of mutations, which applies pattern matching in conjunction with a machine learning algorithm. The second task is the identification of gene names using a dictionary-based method. The final task is building relations between genes and mutations based on proximity measures. Experimental results indicate that MuGeX identifies 85.9% of the mutations in the experiment corpus at 95.9% precision. For mutation-gene pair extraction, we focused on Alzheimer’s disease. We observed that 88.9% of the mutation-gene pairs retrieved by MuGeX for Alzheimer’s disease are correct.
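
The three tasks named in the abstract can be illustrated with a minimal Python sketch. This is not the MuGeX implementation: the regular expression, the four-entry gene dictionary, the character-distance proximity measure, the function names, and the example sentence are all assumptions made for illustration, and the machine learning step that the thesis combines with pattern matching is omitted entirely.

```python
import re

# Assumption: a simple pattern for protein-level point mutations written as
# <one-letter amino acid><position><one-letter amino acid>, e.g. "V717I".
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MUTATION_RE = re.compile(rf"\b[{AMINO_ACIDS}]\d+[{AMINO_ACIDS}]\b")

# Assumption: a toy gene dictionary; a real one would be far larger.
GENE_DICTIONARY = {"APP", "PSEN1", "PSEN2", "APOE"}


def find_mutations(text):
    """Task 1 (pattern matching only): return (mention, offset) tuples."""
    return [(m.group(0), m.start()) for m in MUTATION_RE.finditer(text)]


def find_genes(text):
    """Task 2: return (gene, offset) for every token found in the dictionary."""
    return [
        (m.group(0), m.start())
        for m in re.finditer(r"\b\w+\b", text)
        if m.group(0) in GENE_DICTIONARY
    ]


def pair_by_proximity(text):
    """Task 3: link each mutation to the gene mention nearest in characters."""
    genes = find_genes(text)
    if not genes:
        return []
    pairs = []
    for mutation, position in find_mutations(text):
        nearest_gene, _ = min(genes, key=lambda g: abs(g[1] - position))
        pairs.append((mutation, nearest_gene))
    return pairs


# Illustrative sentence written for this sketch, not taken from the corpus.
example = "APP carries the V717I mutation, whereas PSEN1 carries M146L."
print(pair_by_proximity(example))
```

Character distance is only one possible proximity measure; token distance or sentence boundaries would be equally plausible choices, and the abstract does not say which measures MuGeX uses.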

Item Type:Thesis
Uncontrolled Keywords:Disease. -- Mutation. -- Gene. -- Information extraction. -- Hastalık. -- Mutasyon. -- Gen. -- Bilgi çıkarımı
Subjects:T Technology > TK Electrical engineering. Electronics Nuclear engineering
ID Code:8519
Deposited By:IC-Cataloging
Deposited On:20 May 2008 16:27
Last Modified:26 Dec 2008 09:15
