
Journal of Engineering Research, 3:33


Open Access: This content is freely available online to anyone, anywhere, at any time.

Strategic enhancement of the collaborative framework for novelty in retrieval from digital textual data corpus by deploying DPSC and RBWM algorithms for forensic analysis

  • Gowri Shanmugam, Department of Information Technology, Sathyabama University
  • Anandha Mala Ganapathy Sankar, Department of Computer Science and Engineering, Easwari Engineering College

Abstract

This paper proposes two algorithms embedded in an integrated system: a Dynamic Path Selection Clustering (DPSC) algorithm for document clustering and a Rearward Binary Window Match (RBWM) algorithm for the user's search engine. The DPSC algorithm is derived from the concept of Google's crawler technique and runs as an offline process; the RBWM search algorithm is derived from techniques used in other search algorithms. The proposed system imposes an appropriate data structure on the content of the input dataset. The input is the Enron dataset, which is large and unstructured. The system integrates individual, independent units under one framework: data preprocessing, document clustering, mapping of clusters, and the search engine. A refined, integrated framework of this kind is likely to produce better evidence, because a retrieval system defined only by a simple search engine tends to return more irrelevant information, which weakens its value as evidence. Many existing systems in forensic departments consist of a simple search engine with no further processing, and irrelevant retrieval is widespread in them. This paper therefore proposes an integrated, automated system built from the well-defined units above. The approach makes adequate use of digital textual evidence and helps identify crimes more quickly. The outcomes of the proposed system are analyzed by computing precision and recall and comparing them with the results of metasearch engines such as Dogpile and Metacrawler to test retrieval efficacy.
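The evaluation above rests on precision and recall. As a minimal sketch, assuming each query yields a set of retrieved document IDs and a manually judged set of relevant IDs, the standard set-based definitions are: precision = |retrieved ∩ relevant| / |retrieved| and recall = |retrieved ∩ relevant| / |relevant|. The function name `precision_recall` and the document IDs below are hypothetical; the abstract does not state the exact protocol (for example, result cutoff depth or averaging across queries), so this is not presented as the paper's evaluation code.

```python
from typing import Iterable, Tuple


def precision_recall(retrieved: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Return (precision, recall) for a single query.

    Hypothetical helper: documents are compared as sets of IDs, so
    ranking order and duplicate results are ignored.
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    # Documents that were both returned by the engine and judged relevant.
    hits = len(retrieved_set & relevant_set)
    # Guard against empty sets so the function never divides by zero.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up IDs for illustration only; not data from the paper.
    p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d5"])
    print(f"precision={p:.3f} recall={r:.3f}")
```

For the made-up query above, two of the four retrieved documents are relevant, giving precision 0.5, and two of the three relevant documents are retrieved, giving recall 2/3. Comparing engines such as the proposed system, Dogpile, and Metacrawler would amount to calling the same function on each engine's result list for the same query and relevance judgments.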

Keywords:

Data management, document clustering, Google’s Crawler, preprocessing, semantic