Saturday, September 21, 2019

DWH / BI / Data Science

Business Intelligence (BI) 

BI is a technology driven process to gathering, storing and analyzing the data in such a way to make a better business decision.

BI Solution help a company in making business decision faster, accurate and market oriented.

Data Warehouse / Data Mart (DW / DM)

To implement the BI solution, we need to design the data model in such a way that should be perfect for reporting ans analysis purpose. This type of databases are called DW/DM.

Data Mart is a subset DW that are design for a company's  specific department like (Finance, HR, Marketing...) whereas DW are design  for complete company point of view (to combine the data for all department of company) to get company level Reporting and Analysis.

There are two approach to store the data into DW/DM

  1. Dimensional Approach (For Analysis Purpose)
  2. Normalized Approach (For Reporting Purpose)
Dimensional Approach:

In this approach, the data is divided and stored into two form Dimension and Facts in the database.
The type of data modelling for Dimensional Approach is called Dimensional Modeling.
  • Facts : this is data the we need to analyze mathematically (numerical data that can be aggregated)
  • Dimension : this is the data that for which we need to analyze the fact data. 
Ex. If we need to analyze the product sales on the basis of Date & Product Name so here Date & product name are Dim and Product Sales is the measure come from fact data. 

Dimensional Modeling can be done in two form:
  1. Star Schema
  2. Snowflake Schema 
Star Schema: In the start schema design,  a single object (Fact Table) site in the middle and is rapidly connected to the other surrounding objects (dimension lookup table) like a star.

Snowflake Schema: The snowflake schema is a extension of star schema where each point of the star explodes into more points. Can say sometime we need to normalize the dimension to optimize the star schema by breaking the one dimension into multiple dimension that called snowflake schema. 

Advantage : Retrieving the data from DW is very quickly.
Disadvantage :  Adding the New data and updating the old data in DW is big challenge.  

Normalized Approach:

In this approach, the data is stored in DW in 3NF to easily insert, update and delete but data accessing is slow because of multiples joins can be require to perform a query.

* Data Vault (DV) Approach: This is hybrid data model approach (Star Schema + 3NF)

There are some limitation in Dim & 3NF data model approach (3NF down the query performance and Dim model  is not scalable). And nowadays companies focus on the EDW (Enterprise Data warehouse) where they can store the historical data with scalable data model that can
  • adapt the new data sources
  • absorb changes to existing source
  • adapt new business rules
So on that context DV approach is most valuable.

In this approach, the data is divided and stored into three type of table form:
  1. Hub : That have the Business Key
  2. Satellite : That contain the descriptive attribute for the Business Key
  3. Link : That have the relationship between Business Key & descriptive attribute Dimension data
Business Intelligence VS Data Science: Data science is an evolution of BI. 
  • BI can help to interpret the past data for reporting and Descriptive Analytics that means BI can help to tell what is happening and why is happening and cant be exactly tell what will happen in future. 
  • Data Science use the Machine Learning Technic to analyze the past data (trends or patterns) to make future predictions more accurately. Data Science is more used for Predictive Analytics or Prescriptive Analytics. 

No comments:

Post a Comment