GSTF Journal on Computing (JoC), Volume 4, Issue 2, pp 1-6

Open Access: This content is freely available online to anyone, anywhere, at any time.

An Efficient DNA Molecule Clustering using GCC Algorithm

  • Faisal Kholood, affiliated with George Washington University
  • Alsaby Alnowaiser, affiliated with George Washington University


Biotechnology researchers have made major advances over the past century. They can now measure expression levels for thousands of genes under different conditions and over varying periods of time. Analyzing these measurements is essential for understanding gene patterns and for extracting information about gene functions and biological roles. This paper describes a novel approach for clustering large-scale next-generation sequencing (NGS) data. The approach also facilitates predicting patterns and the likelihood of mutations, based on a semi-supervised clustering technique. It builds on the previously developed FuzzyFind Dictionary construction, which uses the Golay code for error correction. The method has linear time complexity and requires only a single pass through the input file.
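
The idea of using the Golay code to group similar items in a single pass can be illustrated with a minimal sketch. This is not the authors' GCC algorithm or the FuzzyFind Dictionary, whose details the abstract does not give. It is a simplified nearest-codeword bucketing, assuming each sequence has already been reduced to a 23-bit feature vector by some encoding that is not specified here. It relies on the fact that the binary (23, 12) Golay code is perfect: every 23-bit word lies within Hamming distance 3 of exactly one codeword. The names `syndrome`, `nearest_codeword`, and `cluster_stream` are hypothetical, as is the choice of generator polynomial (one of the two standard ones).

```python
from collections import defaultdict
from itertools import combinations

N_BITS = 23
# One standard generator polynomial of the (23, 12) Golay code:
# x^11 + x^10 + x^6 + x^5 + x^4 + x^2 + 1
GEN_POLY = 0b110001110101


def syndrome(word):
    """Return the 11-bit remainder of a 23-bit word divided by GEN_POLY over GF(2)."""
    for shift in range(N_BITS - 1, 10, -1):
        if word & (1 << shift):
            word ^= GEN_POLY << (shift - 11)
    return word


def build_syndrome_table():
    """Map each syndrome to the unique error pattern of weight <= 3 that produces it."""
    table = {}
    for weight in range(4):
        for positions in combinations(range(N_BITS), weight):
            error = sum(1 << p for p in positions)
            table[syndrome(error)] = error
    return table


SYNDROME_TABLE = build_syndrome_table()  # 2048 entries, one per possible syndrome


def nearest_codeword(word):
    """Return the unique Golay codeword within Hamming distance 3 of the word."""
    return word ^ SYNDROME_TABLE[syndrome(word)]


def cluster_stream(vectors):
    """Group 23-bit vectors by nearest codeword in one pass (assumed simplification)."""
    clusters = defaultdict(list)
    for index, vector in enumerate(vectors):
        clusters[nearest_codeword(vector)].append(index)
    return dict(clusters)
```

Each vector costs one syndrome computation and one table lookup, so the total work grows linearly with the number of vectors and the input is read only once. Vectors that differ from the same codeword in at most three bit positions land in the same bucket.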


DNA, RNA, Gene Clustering, Pattern Recognition, Golay Code, Big data