Latest Activity In Study Groups

Join Your Study Groups

VU Past Papers, MCQs and More

We non-commercial site working hard since 2009 to facilitate learning Read More. We can't keep up without your support. Donate.


CS614 Short Notes - CS614 Short Questions Answers

Online Study Material , Short Notes, Handouts, Study Resource Links and much more

CS614 - Data Warehousing Notes

Why both aggregation and summarization are required? - Data ware housing

Briefly describe snowflake schema - Data Ware housing

Difference between Low granular and high granular - Data Ware housing

CDC time stamping triggers and Portion, which is the best? tell reason

Aggregate or hardware which is best to enhance the DWH.

Factors behind poor data Quality - Data Ware Housing

Differentiate between MOLAP and ROLAP implementation - Data Ware Housing

How Cube is created in ROLAP? (CS614 - Data Warehousing)

How does aggregates awareness helps the users? (CS614 - Data Warehousing)

Timestamp - (CS614 - Data Ware housing )

Classification Process and Accuracy Measurement - ( CS614 - Data Ware Housing)

Data parallelism - Data Ware housing

Purposes of Data Profiling - Data Ware housing

Real life Examples of Clustering - Data Warehousing

Explain the Additive and non-additive (Data Warehousing)

Clustering and Association Rules

Reason to summarization during data transformation

One-to-One Transformation and One-to-many Transformation

Views: 660

Replies to This Discussion

Why both aggregation and summarization are required? - Data ware housing

Although summarization and aggregation are sometimes used interchangeably

Summarization and aggregation are typically used for the following reasons:
They are required when the lowest level of detail stored in the data ware at a higher level than the detail arriving from the source. This situation of when data warehouse queries do not require the lowest level of detail or sometimes when sufficient disk space is not available to store all the time frame required by the data warehouse.
• They can be used to populate data marts from the data warehouse where mart does not require the same level of detail as is stored in the warehouse
• They can be used to roll up detail values when the detail is removed from warehouse

Briefly describe snowflake schema - Data Ware housing

Snowflake Schema:
Sometimes a pure star schema might suffer performance problems.
This can occur when a de-normalized dimension table becomes very large and penalizes
the star join operation.
Conversely, sometimes a small outer-level dimension table does
not incur a significant join cost because it can be permanently stored in a memory buffer.
Furthermore, because a star structure exists at the center of a snowflake, an efficient star
join can be used to satisfy part of a query. Finally, some queries will not access data from
outer-level dimension tables.
These queries effectively execute against a star schema that
contains smaller dimension tables. Therefore, under some circumstances, a snowflake
schema is more efficient than a star schema.

Source: CS614 - Data Warehousing - Handouts

Difference between Low granular and high granular - Data Ware housing

High Granular 

higher the level of aggregation of the fact table, fewer will be the number of dimensions can attach to the fact records.
The converse of this is surprising.

Low Granular
more granular the data, the more dimensions make sense.
data in any organization is the most dimensional.

Source: CS614 - Data Ware Housing - Handouts

CDC time stamping triggers and Portion, which is the best? tell reason

Q. CDC time stamping triggers, and portion which is the best? also tell the reason.

Some source systems might use range partitioning, such that the source tables are
partitioned along a date key, which allows for easy identification of new data.

Reason :-
if you are extracting from an orders table, and the orders table is partitioned by

week, then it is easy to identify the current week's data.

Source : CS614 - Data Ware housing - Handouts of virtual university of pakistan

Aggregate or hardware which is best to enhance the DWH.

Q. Aggregate or hardware which is best to enhance the DWH.

An aggregate is information stored in a data warehouse in summarized form. Aggregations are used to improve the performance of business intelligence tools - When queries run faster, they take up less processing time and the users get their information back more quickly.

• To exclude one or more dimensions when summarizing a fact table.
• To have one or more dimensions replaced by rolled up versions of themselves
• The goal of an aggregate program in a large data warehouse must be more than just improving performance.

Source: www.dama-nj.org/presentations/SyncSort%20DAMA_NJ_Apr24-rpv6.pdf

Factors behind poor data Quality - Data Ware Housing

Poor quality data the pandemic problem that needs addressing. The factor of poor quality data inability to match payroll records to the official employment records can cost millions in payroll overpayments to retirees, personnel on leave without pay status, and “ghost” employees. Inability to correlate purchase orders to invoices may be a major problem in unmatched disbursements. Resultant costs, such as payroll overpayments and unmatched disbursements, may be significant enough to warrant extensive changes in processes, systems, policy and procedure, and information system data designs.

Source : CS614 - Data Ware Housing - Handouts

Differentiate between MOLAP and ROLAP implementation - Data Ware Housing

MOLAP
• implemented with a multi-dimensional data structure.
• It is physically builds “cubes” for direct access
• ANSI SQL is not supported.
 proprietary file format of a multi-dimensional database (MDD

ROLAP
• Relational OLAP provides access to information
• ANSI SQL is supported.
• data is stored in a relational database (for example a star schema)
• The fact table is a way of visualizing as an “un-rolled” cube.

Source : CS614 -  Data Ware Houses - Handouts

How Cube is created in ROLAP? (CS614 - Data Warehousing)

Cube is a logical entity containing values of a certain fact at a certain aggregation level at an intersection of a combination of dimensions.

Source: CS614  - Data Ware housing - Virtual University of pakistan handouts

How does aggregates awareness helps the users? (CS614 - Data Warehousing)

The existence of aggregate tables does not help user queries. Users must access the summary tables, ideally automatically with awareness built into the BI tool or the RDBMS to force the query to be processed in the most efficient way. In recent years, BI vendors have debated if aggregate awareness belongs in the BI tool or in the RDBMS. Business Objects is the only BI tool that has built-in functionality to take advantage of RDBMS aggregate tables in a way that is seamless to users.

Source : http://hasustorm.com/books/English/Mcgraw.Hill.Osborne.Business.Obj...

Timestamp - (CS614 - Data Ware housing )

Q. What is timestamps?

Answer:
Timestamp specifies the time and date that a given row was last modified. If the tables in an operational system have columns containing timestamps, then the latest data can easily be identified using the timestamp columns.

Source : CS614 - Data Ware housing - Virtual university of Pakistan Handouts

Classification Process and Accuracy Measurement - ( CS614 - Data Ware Housing)

Q. What is Meant by classification process and How to measure accuracy of classification?

Answer:
Classification process actually set is divided into two parts, one is called test set and the other is called the training set. We pick the training set and a model is constructed based on known facts, historical data and class properties as we already know the number of classes. After building the classification model, every record of the test set is posed to the classification model which decides the class of the input record.

Source - CS614 - Virtual University Handouts

Data parallelism - Data Ware housing

Q. What is Data parallelism? explain with example.

Answer:
Data parallelism is simplest form of parallelization. The idea is that we have parallel execution of single data operation across multiple partitions of data.

Examples
we have a question that we want to select the number of accounts where balance is greater that 5,000$ and the open data is after first of June 2000. This account table and what we end up doing is say ok send the query to each query server and each query server then runs the query against a particular partition

Reference: Data Ware Housing - Handouts

RSS

© 2021   Created by + M.Tariq Malik.   Powered by

Promote Us  |  Report an Issue  |  Privacy Policy  |  Terms of Service