March 3, 2012

Approach of parallelism used in Data warehouse and OLAP

(Download Reference Research Paper)

Processor grouping is used to divide and allocate queries so that they run in parallel, and the data is divided and assigned to different processors as well.

In fact, parallelism is implemented in two phases:

i) A processor group partitioning (d+1 groups)
ii) A physical data placement phase (allocate data fragments to individual processors)

These two phases work with the help of the following:

1) Data Warehouse Physical Design (w.r.t. DI)
2) DataIndexing
3) Processor Partitioning (d+1)
4) Table Partitioning into fact table & dimension tables
5) Appropriate use of BDI & JDI
6) Data Placement (Star Schema)

The research paper explains that the size of the database is not the only issue facing the DBA, and that it cannot be resolved only by increasing I/O capacity. A quick look at the points listed above gives an idea of how the parallelism works.

Data Warehouse Physical Design (w.r.t. DI) & DataIndexing:

DataIndexing does not merely index the columns or rows; it stores the data as well, so the desired record can be reached in less time. To get optimum results, it is suggested to design the data warehouse according to dataindexing, structuring the tables so that the following characteristics are taken care of:

i) Primary Keys Uniqueness (Capitalization if required)
ii) Foreign Key Selection
iii) Dimension Tables and Sub-Tables
iv) Sub-Tables (the smaller the better, so that they fit in memory)
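To make the idea of an index that also stores the data more concrete, here is a minimal sketch. It assumes that a BDI keeps one column as a positional list (the position acts as the row id) and that a JDI replaces each foreign key in the fact table by the position of the matching row in the dimension table. The table contents and the function names build_bdi and build_jdi are hypothetical, chosen only for illustration; they are not taken from the paper.

```python
# Hypothetical toy tables; names and values are illustrative only.
dimension_rows = [
    {"product_id": "P1", "name": "pen"},
    {"product_id": "P2", "name": "ink"},
]
fact_rows = [
    {"product_id": "P2", "sales": 10},
    {"product_id": "P1", "sales": 7},
    {"product_id": "P2", "sales": 3},
]


def build_bdi(rows, column):
    """Assumed BDI: keep one column as a list, position = row id."""
    return [row[column] for row in rows]


def build_jdi(fact, dimension, key):
    """Assumed JDI: for each fact row, the position of its dimension row.

    Relies on the key being unique in the dimension table
    (the primary key uniqueness mentioned in the list above).
    """
    position = {row[key]: i for i, row in enumerate(dimension)}
    return [position[row[key]] for row in fact]


sales_bdi = build_bdi(fact_rows, "sales")
product_jdi = build_jdi(fact_rows, dimension_rows, "product_id")

# Fetch the product name for fact row 0 without scanning the dimension table.
name_of_row_0 = dimension_rows[product_jdi[0]]["name"]
```

With this layout, reaching the dimension record for a fact row is a direct position lookup, which is why small sub-tables that fit in memory matter.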

Processor Partitioning (d+1)

For processor partitioning, the following things are taken care of:

The smallest number of processors such that the dimension table fits in the aggregate memory of the processor group is chosen with the formula given in the paper.

Next, a number proportional to the data volume stored is chosen.

If more processors are available, the load is divided among them over the groups
i = 1 to d+1
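The steps above can be sketched in code. The paper's exact formulas are not reproduced here, so this is only an assumed reading: each dimension group first gets the fewest processors whose combined memory holds its table, group d+1 gets at least one processor, and any spare processors are shared out in proportion to data volume. The function name partition_processors and all of its parameters are hypothetical.

```python
import math


def partition_processors(dim_table_sizes, fact_volume,
                         memory_per_processor, total_processors):
    """Return processor counts for groups 1..d+1 (assumed scheme, not the paper's)."""
    # Step 1: smallest count whose aggregate memory holds each dimension table.
    minimums = [max(1, math.ceil(size / memory_per_processor))
                for size in dim_table_sizes]
    minimums.append(1)  # group d+1 needs at least one processor
    spare = total_processors - sum(minimums)
    if spare < 0:
        raise ValueError("not enough processors for the minimum allocation")

    # Step 2: share spare processors in proportion to data volume,
    # using largest-remainder rounding so the counts add up exactly.
    volumes = list(dim_table_sizes) + [fact_volume]
    total_volume = sum(volumes)
    exact = [spare * v / total_volume for v in volumes]
    extra = [int(x) for x in exact]
    leftovers = spare - sum(extra)
    by_fraction = sorted(range(len(volumes)),
                         key=lambda i: exact[i] - extra[i], reverse=True)
    for i in by_fraction[:leftovers]:
        extra[i] += 1
    return [m + e for m, e in zip(minimums, extra)]


# Two dimension tables (sizes 4 and 8), fact data of volume 100,
# 4 units of memory per processor, 10 processors in total.
groups = partition_processors([4, 8], 100, 4, 10)
```

Most of the spare processors end up in group d+1 in this example because it holds by far the largest data volume.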

Data Placement (Star Schema)

As the tables are designed and partitioned according to dataindexing, the i-th dimension table and its associated JDI are allocated to processor group i.
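As a small illustration of this placement rule, the sketch below builds a map from group number to the data held by that group. The function name place_data and the dimension names are hypothetical; the only rules it encodes are the ones stated in this post (dimension table i and its JDI go to group i, and the BDI goes to group d+1).

```python
def place_data(dimension_names):
    """Map each processor group number (1..d+1) to the data it holds."""
    d = len(dimension_names)
    placement = {
        i: [f"{name} dimension table", f"{name} JDI"]
        for i, name in enumerate(dimension_names, start=1)
    }
    placement[d + 1] = ["BDI"]  # group d+1 holds the basic dataindexing
    return placement


placement = place_data(["product", "store"])
```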

How & when are tables partitioned?

If N is greater
If the smallest number becomes too large and the table no longer fits in the aggregate memory of the processor group, then partition the i-th dimension table w.r.t. the JDI, which helps to overcome this situation.

If >1
Then partition the metric data horizontally.

But the basic dataindexing (BDI) is allocated to group d+1.
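Horizontal partitioning can be sketched as follows, assuming that "horizontally" means splitting by rows and that each processor in a group receives one contiguous range of near-equal size. The paper may use a different scheme (round-robin, for example); the function name partition_horizontally is hypothetical.

```python
def partition_horizontally(rows, processors):
    """Split rows into `processors` contiguous chunks of near-equal size."""
    if processors < 1:
        raise ValueError("a group needs at least one processor")
    base, remainder = divmod(len(rows), processors)
    chunks, start = [], 0
    for p in range(processors):
        # The first `remainder` processors take one extra row each.
        size = base + (1 if p < remainder else 0)
        chunks.append(rows[start:start + size])
        start += size
    return chunks


# Seven rows of metric data spread over a group of three processors.
chunks = partition_horizontally(list(range(7)), 3)
```

Contiguous ranges keep row positions easy to compute, so a positional lookup can tell directly which processor holds a given row.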

Last updated: March 19, 2014