CS614 Short Notes - CS614 Short Questions Answers
Why are both aggregation and summarization required? - Data Warehousing
Briefly describe the snowflake schema - Data Warehousing
Difference between low granularity and high granularity - Data Warehousing
CDC: time stamping, triggers, and partitioning. Which is best, and why?
Aggregates or hardware: which is best to enhance the DWH?
Factors behind poor data quality - Data Warehousing
Differentiate between MOLAP and ROLAP implementation - Data Warehousing
How is a cube created in ROLAP? (CS614 - Data Warehousing)
How does aggregate awareness help users? (CS614 - Data Warehousing)
Timestamp (CS614 - Data Warehousing)
Classification process and accuracy measurement (CS614 - Data Warehousing)
Data parallelism - Data Warehousing
Purposes of data profiling - Data Warehousing
Real-life examples of clustering - Data Warehousing
Explain additive and non-additive facts (Data Warehousing)
Clustering and association rules
Reason for summarization during data transformation
One-to-one transformation and one-to-many transformation
Q. Describe the purposes of Data Profiling.
Data profiling is a powerful way to get an idea of the quality of the data. While profiling data, we run queries to identify:
• Inconsistencies in date formats
• Missing values of dates
• Violations in business rules
Reference: CS614 - Data Warehousing - Handouts Page No. 477
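The three checks listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method from the handouts: the `order_date` and `quantity` fields, the two date patterns, and the business rule "quantity must be positive" are all invented for the example.

```python
import re
from collections import Counter

# Two date layouts we assume may appear in the source data (hypothetical).
ISO = re.compile(r"^\d{4}-\d{2}-\d{2}$")      # e.g. 2024-01-31
SLASH = re.compile(r"^\d{2}/\d{2}/\d{4}$")    # e.g. 31/01/2024

def date_format(value):
    """Classify a raw date string by layout, or report it as missing."""
    if value is None or value.strip() == "":
        return "missing"
    if ISO.match(value):
        return "iso"
    if SLASH.match(value):
        return "slash"
    return "other"

def profile(rows):
    """Count date layouts and collect rows that break the assumed business rule."""
    formats = Counter(date_format(r["order_date"]) for r in rows)
    # Assumed business rule: an order quantity must be positive.
    violations = [r for r in rows if r["quantity"] <= 0]
    return formats, violations

rows = [
    {"order_date": "2024-01-31", "quantity": 3},
    {"order_date": "31/01/2024", "quantity": 1},
    {"order_date": "", "quantity": 2},
    {"order_date": "2024-02-01", "quantity": -5},
]
formats, violations = profile(rows)
print(dict(formats))   # more than one layout means inconsistent date formats
print(violations)      # rows that violate the assumed business rule
```

In a real warehouse the same counts would normally come from queries against the source tables; the Python version only shows what each check looks for.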
Q. Give real-life examples of clustering.
Below are real-life examples of clustering:
• Discovering distinct groups in customer databases, such as customers who make a lot of long-distance calls and don’t have a job. Who are they? Students. Marketers use this knowledge to develop targeted marketing programs.
• Identifying groups of crop insurance policy holders with a high average claim rate. Farmers crash crops when it is “profitable”.
• Identifying areas of similar land use in a GIS database.
• Identifying probable areas for oil/gas exploration based on seismic data.
Reference: CS614 - Data Warehousing - Handouts Page No. 264
Q. Explain additive and non-additive facts.
Additive facts are facts that give the correct result when they are added together.
Examples of such facts are the number of items sold and the sales amount.
Non-additive facts can also be added, but the addition gives an incorrect result.
Examples of non-additive facts are averages, discounts, and ratios.
Ref: Handouts Page No. 104
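A small worked example may make the difference concrete. The store names and figures below are invented for illustration. Units sold and sales amount can be summed across stores, but the per-store average price cannot: the correct overall average has to be recomputed from the additive facts.

```python
# Hypothetical fact rows for two stores.
stores = [
    {"store": "A", "units_sold": 10, "sales_amount": 100.0},
    {"store": "B", "units_sold": 30, "sales_amount": 600.0},
]

# Additive facts: summing across stores gives a meaningful total.
total_units = sum(s["units_sold"] for s in stores)
total_amount = sum(s["sales_amount"] for s in stores)

# Non-additive fact: average price per unit in each store.
averages = [s["sales_amount"] / s["units_sold"] for s in stores]
sum_of_averages = sum(averages)   # can be computed, but means nothing

# Correct overall average: derive it from the additive totals.
true_average = total_amount / total_units

print(total_units, total_amount)      # 40 700.0
print(sum_of_averages, true_average)  # 30.0 17.5
```

Note that even the mean of the two store averages (15.0) differs from the true average (17.5), because store B sells three times as many units as store A.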
Q. Describe clustering and association rules as methods for identifying outliers.
Identify outlier records using clustering based on Euclidean (or other) distance. Existing clustering algorithms provide little support for identifying outliers. However, in some cases clustering the entire record space can reveal outliers that are not identified by field-level inspection. The main drawback of this method is computational time: clustering algorithms have high computational complexity, and for large record spaces and a large number of records their run time is prohibitive.
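As a rough sketch of the idea, the Python below groups records whose Euclidean distance is within a threshold and treats records left in very small clusters as outliers. The threshold grouping, the `eps` and `min_size` parameters, and the sample points are assumptions chosen for illustration, not an algorithm named in the handouts. The nested loops compare every pair of records, which also shows why the run time grows quickly with the number of records.

```python
import math

def threshold_clusters(records, eps):
    """Label records so that two records share a cluster when a chain of
    records, each within Euclidean distance `eps` of the next, links them."""
    n = len(records)
    labels = [-1] * n          # -1 means "not yet assigned"
    cluster = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = cluster
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):  # pairwise comparison: O(n^2) overall
                if labels[j] == -1 and math.dist(records[i], records[j]) <= eps:
                    labels[j] = cluster
                    stack.append(j)
        cluster += 1
    return labels

def find_outliers(records, eps, min_size=2):
    """Return records that fall in clusters smaller than `min_size`."""
    labels = threshold_clusters(records, eps)
    sizes = {}
    for label in labels:
        sizes[label] = sizes.get(label, 0) + 1
    return [r for r, label in zip(records, labels) if sizes[label] < min_size]

# Hypothetical two-field records: two tight groups and one isolated record.
records = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (8.0, 8.0), (8.1, 7.9), (4.5, 20.0)]
print(find_outliers(records, eps=1.0))   # [(4.5, 20.0)]
```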
Association rules with high confidence and support define a different kind of pattern. As before, records that do not follow these rules are considered outliers. The power of association rules is that they can deal with data of different types. However, Boolean association rules do not provide enough quantitative and qualitative information.
Ref: Handouts Page No. 146
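The association-rule approach can be sketched in the same spirit. In the Python below, each record is a set of `attribute=value` items, and the rule `city=Lahore -> country=Pakistan` is a made-up example. Support and confidence are computed from simple counts, and records that match the left-hand side of the rule but not the right-hand side are reported as candidate outliers.

```python
def count_containing(records, items):
    """Number of records that contain every item in `items`."""
    items = set(items)
    return sum(1 for r in records if items <= r)

def support(records, antecedent, consequent):
    """Fraction of all records that satisfy both sides of the rule."""
    both = set(antecedent) | set(consequent)
    return count_containing(records, both) / len(records)

def confidence(records, antecedent, consequent):
    """Of the records matching the antecedent, the fraction that also match the consequent."""
    both = set(antecedent) | set(consequent)
    return count_containing(records, both) / count_containing(records, antecedent)

def rule_violators(records, antecedent, consequent):
    """Records that match the antecedent but break the consequent."""
    a, c = set(antecedent), set(consequent)
    return [r for r in records if a <= r and not c <= r]

# Hypothetical records and rule.
records = [
    {"city=Lahore", "country=Pakistan"},
    {"city=Lahore", "country=Pakistan"},
    {"city=Lahore", "country=Pakistan"},
    {"city=Lahore", "country=Pakistan"},
    {"city=Lahore", "country=India"},
    {"city=Karachi", "country=Pakistan"},
]
rule = (["city=Lahore"], ["country=Pakistan"])

print(confidence(records, *rule))       # 0.8
print(rule_violators(records, *rule))   # the single Lahore/India record
```

Because the items are plain strings, the same functions work for fields of any type, which matches the point above that association rules can deal with data of different types.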
Q. What is the reason for summarization during data transformation?
Data at the lowest level of detail is not always needed in the data warehouse, so it can be summarized during transformation. For a grocery chain, for example, sales data at the lowest level of detail for every transaction at the checkout may not be needed. Storing sales by product by store by day in the data warehouse may be quite adequate. So, in this case, the data transformation function includes summarization of daily sales by product and by store.
Reference: CS614 Handouts Page No. 136
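The grocery-chain case can be illustrated with a short Python sketch. The transaction fields (`date`, `store`, `product`, `amount`) and the sample rows are assumptions for the example. Individual checkout transactions go in, and one total per day, store, and product comes out.

```python
from collections import defaultdict

def summarize_daily(transactions):
    """Roll checkout-level transactions up to sales by day, store, and product."""
    totals = defaultdict(float)
    for t in transactions:
        key = (t["date"], t["store"], t["product"])
        totals[key] += t["amount"]
    return dict(totals)

# Hypothetical checkout transactions.
transactions = [
    {"date": "2024-01-01", "store": "S1", "product": "milk", "amount": 2.5},
    {"date": "2024-01-01", "store": "S1", "product": "milk", "amount": 1.5},
    {"date": "2024-01-01", "store": "S1", "product": "bread", "amount": 3.0},
    {"date": "2024-01-01", "store": "S2", "product": "milk", "amount": 2.0},
]

daily = summarize_daily(transactions)
for key, amount in sorted(daily.items()):
    print(key, amount)
```

Four transaction rows become three summary rows here; with real checkout volumes the reduction is far larger.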
Q. Differentiate between one-to-one transformation and one-to-many transformation.
Definitions by Umair Saulat:
• A simple scalar transformation is a one-to-one mapping from one set of values to another set of values.
• It is sufficient to ensure that the transformation is one-to-one.
• It provides a design environment for creating data transformation applications.
• The transformation functions are polynomials.
• A one-to-many transformation is more complex than a scalar transformation.
• One data element from the source system results in several columns in the DW.
• Code generation can also create transformations in easy-to-maintain computer languages such as Java or XSLT.
• A data transformation converts data from a source data format into a destination data format.
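The two kinds of transformation can be contrasted with a minimal Python sketch. The gender-code lookup, the comma-separated address layout, and the column names `house_no`, `street`, and `city` are hypothetical choices made for illustration.

```python
# One-to-one (scalar) transformation: each source value maps to exactly
# one target value. The lookup table is a hypothetical example.
GENDER_CODES = {"M": "Male", "F": "Female"}

def one_to_one(code):
    """Map a single source code to a single warehouse value."""
    return GENDER_CODES[code]

# One-to-many transformation: one source element becomes several columns.
# We assume the address is stored as "house, street, city".
def one_to_many(address):
    """Split one source address field into three warehouse columns."""
    house_no, street, city = [part.strip() for part in address.split(",")]
    return {"house_no": house_no, "street": street, "city": city}

print(one_to_one("F"))
print(one_to_many("House 12, Main Boulevard, Lahore"))
```

The one-to-many case is harder in practice because real source fields rarely follow one layout; an address with a missing or extra comma would make this sketch raise an error and would need its own parsing rules.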