Hadoop Ecosystem
As you may be aware, the Hadoop ecosystem consists of many open source tools. A lot of research is going on in this area, and almost every day you see a new version of an existing framework, or an entirely new framework, gaining popularity at the expense of existing ones. Hence, if you are a Hadoop developer, you need to keep up with the technological advancements happening around you.
As a first step toward understanding the frameworks around Hadoop, I sketched a diagram summarizing some of the key open source frameworks and how they relate to one another in use. I will keep evolving this diagram as I learn more, and I will share the updates with you all as well.
1. Feeding RDBMS data to HDFS via Sqoop
2. Cleansing imported data via Pig
4. Hive Data Warehouse schemas are stored separately, in a dedicated Hive Data Warehouse RDBMS schema
6. Batch queries can be executed directly via Hive
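To make the flow in the steps above a little more concrete, here is a minimal Python sketch that only builds the command lines for each stage (Sqoop import, Pig cleansing, Hive batch query) and prints them; it does not run anything. The JDBC URL, user name, table name, HDFS paths, Pig script name, Pig parameter names, and the Hive query are all hypothetical placeholders I made up for illustration, so treat this as a rough outline rather than a working pipeline for any real cluster.

```python
import shlex

# All values below are hypothetical placeholders, not taken from a real setup.
JDBC_URL = "jdbc:mysql://dbhost/sales"
DB_USER = "etl_user"
TABLE = "customers"
RAW_DIR = "/data/raw/customers"
CLEAN_DIR = "/data/clean/customers"
PIG_SCRIPT = "clean_customers.pig"


def sqoop_import_cmd(jdbc_url, username, table, target_dir):
    """Step 1: copy one RDBMS table into an HDFS directory via Sqoop."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "-P",  # prompt for the password instead of putting it on the command line
        "--table", table,
        "--target-dir", target_dir,
    ]


def pig_clean_cmd(script_path, params):
    """Step 2: run a Pig script that cleanses the imported data."""
    cmd = ["pig"]
    for key, value in sorted(params.items()):
        cmd += ["-param", f"{key}={value}"]
    cmd += ["-f", script_path]
    return cmd


def hive_query_cmd(query):
    """Step 6: run a batch query directly through the Hive command line."""
    return ["hive", "-e", query]


def build_pipeline():
    """Return the three command lines in the order they would be run."""
    return [
        sqoop_import_cmd(JDBC_URL, DB_USER, TABLE, RAW_DIR),
        pig_clean_cmd(PIG_SCRIPT, {"INPUT": RAW_DIR, "OUTPUT": CLEAN_DIR}),
        hive_query_cmd("SELECT COUNT(*) FROM customers_clean"),
    ]


if __name__ == "__main__":
    for command in build_pipeline():
        # shlex.join (Python 3.8+) quotes arguments so the line is shell-safe
        print(shlex.join(command))
```

Running the script prints one shell-ready line per stage. Keeping each command as a list of arguments makes it easy to hand the same lists to `subprocess.run` later, on a machine where Sqoop, Pig, and Hive are actually installed.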