March 3, 2012

Improvements suggested for solution proposed in research paper “A Case for Parallelism in Data Warehousing and OLAP”



The solution in the research paper is designed to fit the situation faced by data warehouses, but since it is only a proposal, a few suggestions can be added to it:

1) Be careful during the data warehouse physical design process,
e.g. ensuring primary key uniqueness and selecting the right foreign key columns/attributes.

2) Declustering of task partitions can be improved by estimating their actual or expected I/O requirements.

3) OLAP will support decision support systems (DSS) much better if inserted data is corrected/validated and consistent instead of inconsistent,
e.g. different spellings of the same word or different formats for the same date.
This can be controlled in the design phase, or in the transformation phase of ELT.

4) Of the two forms studied, inter-query and intra-query parallelism, I suggest inter-query parallelism. If one processor takes 1 minute per query, then 4 queries run one after another take 4 minutes. With 4 processors, each query still takes 1 minute, but the 4 queries run in parallel and all finish in about one minute.

5) Keep dimension tables as small as possible, because when sub-tasks take longer than expected, overall efficiency suffers.
e.g. if there are 10 queries and each takes 1 minute, running them one after another takes 10 minutes in total. If we run them in parallel (one processor per query), the batch should finish in 1 minute; but if a couple of queries unexpectedly take 2 minutes each, the whole batch takes 2 minutes, because it completes only when its slowest query does. The overall estimate is thrown off simply because those sub-tasks were underestimated, and oversized dimension tables can be one reason for this.
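
The paper's own declustering scheme is not reproduced here. As a minimal sketch of suggestion 2, assuming each partition's I/O cost can be estimated up front (the partition names and I/O figures below are hypothetical), a greedy longest-processing-time heuristic places the heaviest partitions first, each on the disk with the least estimated load so far:

```python
import heapq


def decluster_by_io(partition_io, num_disks):
    """Assign partitions to disks so that estimated I/O load is balanced.

    partition_io: dict of partition name -> estimated I/O cost (hypothetical units).
    Greedy LPT heuristic: largest estimated I/O first, onto the least-loaded disk.
    """
    # Min-heap of (current estimated load, disk index).
    heap = [(0, disk) for disk in range(num_disks)]
    heapq.heapify(heap)
    assignment = {disk: [] for disk in range(num_disks)}

    for name, io in sorted(partition_io.items(), key=lambda kv: kv[1], reverse=True):
        load, disk = heapq.heappop(heap)       # least-loaded disk so far
        assignment[disk].append(name)
        heapq.heappush(heap, (load + io, disk))

    loads = {disk: load for load, disk in heap}
    return assignment, loads


if __name__ == "__main__":
    estimates = {"p1": 40, "p2": 30, "p3": 20, "p4": 10}
    assignment, loads = decluster_by_io(estimates, num_disks=2)
    print(assignment)  # {0: ['p1', 'p4'], 1: ['p2', 'p3']}
    print(loads)       # {0: 50, 1: 50}
```

For comparison, a plain round-robin split of the same partitions (p1, p3 on one disk; p2, p4 on the other) would give loads of 60 and 40, so the slower disk would hold up the whole task.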

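Suggestion 3 can be illustrated with a small transformation step. This is a minimal sketch, not a specific ETL/ELT tool's API: the `colour` and `sale_date` columns, the spelling map, and the accepted date formats are all hypothetical, and slash-separated dates are assumed to be day-first. Values that cannot be mapped are rejected instead of being loaded inconsistently.

```python
from datetime import datetime

# Hypothetical map of known spelling variants to one canonical value.
CANONICAL_COLOUR = {
    "gray": "Gray",
    "grey": "Gray",
    "color red": "Red",
    "colour red": "Red",
    "red": "Red",
}

# Date formats assumed to appear in the source feeds (slash dates are day-first).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def clean_colour(raw):
    """Map a spelling variant to its canonical value, or reject it."""
    key = raw.strip().lower()
    if key not in CANONICAL_COLOUR:
        raise ValueError(f"unknown spelling: {raw!r}")
    return CANONICAL_COLOUR[key]


def clean_date(raw):
    """Parse any accepted format and store a single ISO 8601 format."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next accepted format
    raise ValueError(f"unrecognized date format: {raw!r}")


def transform(row):
    """Validate and normalize one source row before it is loaded."""
    return {"colour": clean_colour(row["colour"]), "sale_date": clean_date(row["sale_date"])}


if __name__ == "__main__":
    print(transform({"colour": " Grey ", "sale_date": "03/03/2012"}))
    # {'colour': 'Gray', 'sale_date': '2012-03-03'}
```

Rejecting unknown values (rather than passing them through) keeps bad spellings out of the warehouse, at the cost of maintaining the mapping table.
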

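The arithmetic behind suggestions 4 and 5 can be checked with a small simulation. This is a simplified model and not taken from the paper: it assumes the queries are independent, each query runs on a single processor, and there is no contention for disks or memory. Queries are handed out with greedy list scheduling (each goes to the processor that becomes free first), and elapsed time is when the last processor finishes.

```python
def sequential_time(durations):
    """One processor runs the queries one after another."""
    return sum(durations)


def parallel_time(durations, processors):
    """Greedy list scheduling: each query goes to the processor that frees up first.

    Returns the elapsed time, i.e. when the last processor finishes.
    """
    finish = [0] * processors
    for d in durations:
        i = finish.index(min(finish))  # earliest-available processor
        finish[i] += d
    return max(finish)


if __name__ == "__main__":
    # Suggestion 4: four 1-minute queries.
    print(sequential_time([1] * 4), parallel_time([1] * 4, 4))  # 4 1

    # Suggestion 5: ten queries, ten processors.
    print(parallel_time([1] * 10, 10))           # 1
    print(parallel_time([1] * 8 + [2, 2], 10))   # 2
```

With ten processors, two slow queries out of ten double the elapsed time from 1 to 2 minutes, because a parallel batch finishes only when its slowest member does.
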
Last updated: March 19, 2014