Data Block Saving Policy in the Hadoop Architecture

Authors

  • Elisabeta Zagan
  • Mirela Danubianu, Stefan cel Mare University of Suceava

Keywords:

Data Lake; On-Premises Data Lake; Hadoop framework; Name Node; Data Node; Secondary Name Node; Task Tracker; Job Tracker

Abstract

The most common technology for implementing an On-Premises Data Lake architecture is the
open-source Hadoop framework offered by Apache, which is addressed in Chapter 2 of this research
paper. Some Cloud providers have also turned to the Hadoop framework to offer their Data Lake storage
service. This article presents the Hadoop storage environment for implementing an On-Premises Data Lake
architecture using the Hadoop V1 framework. The Hadoop V1 architecture consists of two levels: HDFS and
MapReduce. The HDFS level contains the following components: Name Node, Data Nodes and Secondary Name
Node. The MapReduce level, which is also based on a master/slave architecture, incorporates the Job
Tracker and the Task Trackers. The Hadoop storage system is analysed in order to highlight its advantages and
disadvantages, as well as to examine in more depth some technical aspects of this technology for storing and
analysing large volumes of data in their raw format.
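
The master/slave split of the HDFS level described above can be illustrated with a minimal Python sketch. It is not taken from the paper and is not Hadoop code: the class names DataNode and NameNode, the method save_file and the round-robin placement are assumptions made only for illustration, and real HDFS uses a rack-aware placement policy. The block size (64 MB) and replication factor (3) are the Hadoop 1.x defaults.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, default HDFS block size in Hadoop 1.x
REPLICATION = 3                # default HDFS replication factor


@dataclass
class DataNode:
    """Slave of the HDFS level (hypothetical model): holds block replicas."""
    name: str
    blocks: list = field(default_factory=list)


@dataclass
class NameNode:
    """Master of the HDFS level (hypothetical model): keeps only metadata."""
    data_nodes: list
    block_map: dict = field(default_factory=dict)  # block id -> Data Node names

    def save_file(self, filename, size_bytes):
        """Split a file into fixed-size blocks and record each replica's location."""
        n_blocks = -(-size_bytes // BLOCK_SIZE)  # ceiling division
        replicas = min(REPLICATION, len(self.data_nodes))
        for i in range(n_blocks):
            block_id = f"{filename}#blk_{i}"
            # Assumption: simple round-robin placement, one replica per node.
            targets = [
                self.data_nodes[(i + r) % len(self.data_nodes)]
                for r in range(replicas)
            ]
            for node in targets:
                node.blocks.append(block_id)
            self.block_map[block_id] = [node.name for node in targets]
        return n_blocks
```

For example, with four Data Nodes, calling save_file("raw.log", 200 * 1024 * 1024) on the Name Node splits the file into four blocks and records three replica locations for each of them, while the block contents themselves would reside only on the Data Nodes.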

Published

2021-06-30

Section

Contemporary Scientific and Technological Aspects towards an Entrepreneurial App