Big Data Analysis Using Apache Spark MLlib and Hadoop HDFS with Scala and Java

Big data has become a defining phenomenon of the decade, driven by the technology revolution, and it has a significant impact on trends in applied science. Exploring big data tools thoroughly is therefore a pressing need. Hadoop is a good technology for analyzing big data, but it is slow because the Job...

Bibliographic Details
Main Authors:	Hoger Khayrolla Omar, Alaa Khalil Jumaa
Format:	Article
Language:	English
Published:	Sulaimani Polytechnic University 2019-05-01
Series:	Kurdistan Journal of Applied Research
Subjects:	Keywords: Big data, Data analysis, Apache Spark, Hadoop HDFS, Machine learning, Spark MLlib, Resilient Distributed Datasets (RDD).
Online Access:	https://kjar.spu.edu.iq/index.php/kjar/article/view/265

_version_	1823861323525521408
author	Hoger Khayrolla Omar Alaa Khalil Jumaa
author_facet	Hoger Khayrolla Omar Alaa Khalil Jumaa
author_sort	Hoger Khayrolla Omar
collection	DOAJ
description	Big data has become a defining phenomenon of the decade, driven by the technology revolution, and it has a significant impact on trends in applied science. Exploring big data tools thoroughly is therefore a pressing need. Hadoop is a good technology for analyzing big data, but it is slow because the job result of each phase must be stored before the following phase starts, and because of replication delays. Apache Spark is another tool, developed to be a practical model for analyzing big data, with an innovative in-memory processing framework and high-level programming libraries for machine learning, efficient data handling, and more. This paper compares the time performance of Scala and Java in Apache Spark MLlib. Many tests were carried out on supervised and unsupervised machine learning methods using big datasets. The datasets were loaded both from Hadoop HDFS and from the local disk to identify the pros and cons of each approach and to find the best way of reading or loading a dataset for execution. The results showed that Scala performs about 10% to 20% better than Java, depending on the algorithm type. The aim of the study is to analyze big data with more suitable programming languages and, as a consequence, to gain better performance.
format	Article
id	doaj-art-366ed084da514b2f849e09a8004eed94
institution	Kabale University
issn	2411-7684 2411-7706
language	English
publishDate	2019-05-01
publisher	Sulaimani Polytechnic University
record_format	Article
series	Kurdistan Journal of Applied Research
spelling	doaj-art-366ed084da514b2f849e09a8004eed94; 2025-02-09T21:00:39Z; eng; Sulaimani Polytechnic University; Kurdistan Journal of Applied Research; 2411-7684; 2411-7706; 2019-05-01; 4110.24017/science.2019.1.2265; Big Data Analysis Using Apache Spark MLlib and Hadoop HDFS with Scala and Java; Hoger Khayrolla Omar (0), Alaa Khalil Jumaa (1); (0) Technical College of Informatics, Sulaimani Polytechnic University, Sulaimani | Kirkuk University, Kirkuk, Iraq; (1) Database Technology Department, Technical College of Informatics, Sulaimani Polytechnic University, Sulaimani, Iraq
title	Big Data Analysis Using Apache Spark MLlib and Hadoop HDFS with Scala and Java
topic	Keywords: Big data, Data analysis, Apache Spark, Hadoop HDFS, Machine learning, Spark MLlib, Resilient Distributed Datasets(RDD).
url	https://kjar.spu.edu.iq/index.php/kjar/article/view/265

Similar Items