Application of automatic mutation-gene pair extraction to diseases
Erdoğmuş, Müge (2007) Application of automatic mutation-gene pair extraction to diseases. [Thesis]
It is now known that several inherited genetic diseases, such as sickle cell anemia, are caused by mutations in genes. To find ways to prevent, or better still to circumvent, the occurrence of these diseases, knowledge of the mutations and of the genes on which they occur is of crucial importance. Information on disease-related mutations and genes can be accessed through publicly available databases or biomedical literature sources. However, acquiring relevant information from such resources can be problematic for two reasons. First, manually created databases are usually incomplete and out of date. Second, reading through the vast number of publicly available biomedical documents is very time consuming. Therefore, there is a need for systems capable of extracting relevant information from publicly available resources in an automated fashion. This thesis presents the design and implementation of a system, MuGeX, that automatically extracts mutation-gene pairs from MEDLINE abstracts for a given disease. MuGeX performs three main tasks. The first task is the identification of mutations, applying pattern matching in conjunction with a machine learning algorithm. The second task is the identification of gene names using a dictionary-based method. The final task is building relations between genes and mutations based on proximity measures. Experimental results indicate that MuGeX identifies 85.9% of the mutations in the experiment corpus at 95.9% precision. For mutation-gene pair extraction, we focused on Alzheimer’s disease. We observed that 88.9% of the mutation-gene pairs retrieved by MuGeX for Alzheimer’s disease are correct.
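The three tasks described in the abstract can be illustrated with a minimal Python sketch. This is not MuGeX's implementation: the regular expression, the four-entry gene dictionary, the function names, and the "nearest gene in the same sentence" rule are all assumptions made for illustration, and the machine learning step that the thesis combines with pattern matching is omitted entirely.

```python
import re

# ASSUMPTION: a simple pattern for point mutations written as
# <amino acid><position><amino acid>, e.g. "V717I" or "Ala692Gly".
# The patterns actually used by MuGeX are not given in the abstract.
AA1 = "[ACDEFGHIKLMNPQRSTVWY]"
AA3 = ("Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
       "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val")
MUTATION_RE = re.compile(rf"\b(?:{AA1}|{AA3})\d+(?:{AA1}|{AA3})\b")

# ASSUMPTION: a tiny hypothetical gene dictionary; a real one would be
# built from a curated gene-name resource.
GENE_DICTIONARY = {"APP", "PSEN1", "PSEN2", "APOE"}


def find_mutations(text):
    """Task 1 (pattern matching only): return (mention, offset) tuples."""
    return [(m.group(), m.start()) for m in MUTATION_RE.finditer(text)]


def find_genes(text, dictionary=GENE_DICTIONARY):
    """Task 2: dictionary lookup over alphanumeric tokens."""
    return [(m.group(), m.start())
            for m in re.finditer(r"\b[A-Za-z0-9]+\b", text)
            if m.group() in dictionary]


def pair_by_proximity(mutations, genes):
    """Task 3: link each mutation to the gene with the closest offset."""
    pairs = []
    for mutation, mut_pos in mutations:
        if genes:
            gene, _ = min(genes, key=lambda g: abs(g[1] - mut_pos))
            pairs.append((mutation, gene))
    return pairs


def extract_pairs(text):
    """Run all three tasks sentence by sentence (naive sentence split)."""
    pairs = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        pairs.extend(pair_by_proximity(find_mutations(sentence),
                                       find_genes(sentence)))
    return pairs


if __name__ == "__main__":
    example = ("The V717I mutation in APP causes early onset. "
               "PSEN1 carries the M146L substitution.")
    print(extract_pairs(example))
```

Restricting the search to a single sentence is one simple way to read "proximity"; character distance alone can link a mutation to a gene named in a neighbouring clause, which is one reason a real system needs more careful proximity measures.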