Data Block Saving Policy in the Hadoop Architecture
Keywords:
Data Lake; On-Premises Data Lake; Hadoop framework; Name Node; Data Node; Secondary Name Node; Task Tracker; Job Tracker
Abstract
The most common technology for implementing an On-Premises Data Lake architecture is offered
by Apache through the open-source Hadoop framework, which is addressed in Chapter 2 of this research
paper. Some Cloud providers have also turned to the Hadoop framework to offer a Data Lake storage
service. This article presents the Hadoop storage environment for implementing an On-Premises Data Lake
architecture using the Hadoop V1 framework. The Hadoop V1 architecture consists of two levels: HDFS and
MapReduce. The HDFS level contains the following components: Name Node, Data Nodes and Secondary Name
Node. The MapReduce level, also based on a master/slave architecture, incorporates the Job Tracker and
the Task Trackers. The Hadoop storage system is analysed in order to highlight its advantages and
disadvantages and to examine in more depth some technical aspects of this technology for the storage and
analysis of large volumes of data in their raw format.
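The master/slave split at the HDFS level can be illustrated with a minimal sketch. It is a toy model, not Hadoop code: the class names, the round-robin placement, the 64 MB block size and the replication factor of 3 are all assumptions made for illustration (the last two are the usual Hadoop V1 defaults, but the abstract does not state them). Real HDFS placement is rack-aware, and the Secondary Name Node and the MapReduce level are not modelled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List

BLOCK_SIZE = 64 * 1024 * 1024  # assumed: 64 MB, the usual Hadoop V1 default
REPLICATION = 3                # assumed: the usual HDFS default


@dataclass
class DataNode:
    """Toy slave node: only records which block ids it holds."""
    name: str
    blocks: List[str] = field(default_factory=list)


class NameNode:
    """Toy master: keeps metadata only (block id -> Data Node names)."""

    def __init__(self, data_nodes: List[DataNode]) -> None:
        self.data_nodes = data_nodes
        self.block_map: Dict[str, List[str]] = {}
        self._next = 0  # round-robin cursor (a simplification, not HDFS policy)

    def add_file(self, path: str, size: int) -> List[str]:
        """Split a file of `size` bytes into blocks and place the replicas."""
        n_nodes = len(self.data_nodes)
        n_blocks = max(1, -(-size // BLOCK_SIZE))  # ceiling division
        replicas = min(REPLICATION, n_nodes)       # never more copies than nodes
        block_ids = []
        for i in range(n_blocks):
            block_id = f"{path}#blk{i}"
            targets = []
            for r in range(replicas):
                # consecutive nodes, so the replicas of one block never collide
                node = self.data_nodes[(self._next + r) % n_nodes]
                node.blocks.append(block_id)
                targets.append(node.name)
            self._next = (self._next + 1) % n_nodes
            self.block_map[block_id] = targets
            block_ids.append(block_id)
        return block_ids


if __name__ == "__main__":
    nodes = [DataNode(f"dn{i}") for i in range(4)]
    name_node = NameNode(nodes)
    # hypothetical path and size: a 150 MB raw file needs three 64 MB blocks
    for block in name_node.add_file("/lake/raw/events.log", 150 * 1024 * 1024):
        print(block, "->", name_node.block_map[block])
```

The point of the sketch is the division of roles: the Name Node stores only the mapping from blocks to nodes, while the Data Nodes hold the blocks themselves.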
License
Copyright (c) 2021 EIRP Proceedings
This work is licensed under a Creative Commons Attribution 4.0 International License.
You are free to:
- Share: copy and redistribute the material in any medium or format
- Adapt: remix, transform, and build upon the material
for any purpose, even commercially. The licensor cannot revoke these freedoms as long as you follow the license terms.
Under the following terms:
- Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- No additional restrictions: You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.