This repository contains various big data technology projects and exercise assignments that I practice on my own. My main goal is to develop an aptitude for big data. Learning data science, specifically deep learning on different frameworks, gave me an intuition for its core value: in the end, everything comes down to data. And if neural networks work on huge amounts of data, then I need to learn the technologies that handle it. Hive takes the place of a traditional SQL database, offering SQL-like queries over large datasets. Spark is a strong tool for real-time data modeling, and its MLlib library adds the power of prediction and classification. There are also rumours that Spark will add a deep learning framework, but as of now (08/07/2020) it has not arrived in the Spark 3 release.
- Hive - I worked with Hive on a VM. Here you can find script files for data operations. In addition, the Hive cheatsheet gives an idea of the basic operations required for Hive.
- Spark with Scala - I chose Scala for its speed and stability when operating on RDDs. This part contains more than 20 mini projects on big data, built from open datasets. This is my first time using Scala, so please cut me some slack. I am still working on these topics. Thank you.