Data Interview Question | What is a Data Hub?

Hello friends, last month I received an email from one of my readers asking why I am not writing about data-related questions. I replied that I would start soon, and today is a good day to begin. I shall also cover Big Data and data warehousing questions. I am filing these posts under the category "data interview questions". So today let's start with the data hub.

What is a data hub?-

A data hub is basically a collection of data from multiple sources (consider this as Big Data management). Enterprise data hubs commonly use a Hadoop platform as the central data repository. The idea of a data hub is to provide an organization with a centralized data source that can quickly provide users with the information they need.

The data distribution normally happens in the form of a hub and spoke architecture. The spoke-hub distribution paradigm is a form of transport topology optimization in which traffic routes are organized as a series of ‘spokes’ that connect outlying points to a central ‘hub.’
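To see why the hub-and-spoke layout is attractive, here is a small sketch in Python. It compares how many connections you need when every system talks directly to every other system against routing everything through one hub. The function names and the system names ("crm", "billing") are my own, made up only for illustration.

```python
def point_to_point_links(n):
    """Connections needed if each of n systems links directly to every other."""
    return n * (n - 1) // 2


def hub_and_spoke_links(n):
    """Connections needed if each of n systems links only to a central hub."""
    return n


def route(source, target, hub="hub"):
    """Path data takes between two spokes: it always passes through the hub."""
    if source == target:
        return [source]
    if hub in (source, target):
        return [source, target]
    return [source, hub, target]


for n in (5, 10, 50):
    print(n, "systems:", point_to_point_links(n), "direct links vs",
          hub_and_spoke_links(n), "spokes")

print(route("crm", "billing"))
```

With direct links the number of connections grows roughly with the square of the number of systems, while with a hub it grows one spoke at a time.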

Enterprise data hubs are designed to address the challenge of data that is ballooning in volume, variety and velocity (3Vs). This term is largely associated with Cloudera and MapR.

Data Hub notes-
1. Data from all the sources is moved to one place.
2. Data is (at least partially) harmonized as it is moved.
3. Data is indexed in the harmonized form for efficient access and analysis.
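The three notes above can be sketched in a few lines of Python. This is only a toy illustration, not how any real data hub product works: the two sources, their field names, and the helper functions `harmonize` and `build_hub` are all assumptions I made up for the example.

```python
from collections import defaultdict

# Hypothetical source data; the field names differ per source on purpose.
crm_rows = [{"CustomerName": "Ada Lovelace", "Country": "uk"}]
billing_rows = [{"name": "ada lovelace", "country_code": "UK", "amount": 120.0}]


def harmonize(record, source):
    """Step 2: map source-specific fields onto one shared schema."""
    name = record.get("CustomerName") or record.get("name")
    country = record.get("Country") or record.get("country_code")
    return {
        "source": source,
        "name": name.strip().title(),
        "country": country.strip().upper(),
        "amount": record.get("amount"),  # None when the source has no amount
    }


def build_hub(sources):
    """Steps 1 and 3: move all records to one place and index them by name."""
    hub = []                   # the single central store
    index = defaultdict(list)  # harmonized name -> positions in the hub
    for source, rows in sources.items():
        for row in rows:
            rec = harmonize(row, source)
            hub.append(rec)
            index[rec["name"]].append(len(hub) - 1)
    return hub, index


hub, index = build_hub({"crm": crm_rows, "billing": billing_rows})
matches = [hub[i] for i in index["Ada Lovelace"]]
print(matches)
```

Because both records are harmonized to the same name format before indexing, a single lookup finds the customer in both the CRM and the billing data.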

Data hub products available are-

1. Apache Hadoop
2. Avoiding Mass Extinctions Engine
3. DataHub
4. DataMarket
5. Dataverse
6. DSpace
7. MarkLogic
8. OpenDataSoft
9. Quandl
10. QuickCode
11. Socrata


