Data warehousing (CS614)
Assignment # 04 (GRADED)
Total marks = 20
Deadline Date = 10-August-2015
Please carefully read the following instructions before attempting the assignment.
It should be clear that your assignment will not get any credit if:
The assignment is submitted after the due date.
The submitted assignment does not open or the file is corrupt.
The assignment is copied. Note that strict action will be taken if the submitted assignment is copied from any other student. Both students will be punished severely.
1) You should consult recommended books to clarify your concepts as handouts are not sufficient.
2) You are supposed to submit your assignment in .doc or .docx format. Other formats such as scanned images, PDF, ZIP, RAR, BMP, etc. will not be accepted.
3) You are advised to upload your assignment at least two days before the due date.
The assignment carries 20 marks. Note that no assignment will be accepted via email after the due date in any case (whether because of load shedding, an emergency power failure, an internet malfunction, etc.). Hence, refrain from uploading the assignment in the last hour before the deadline, and try to upload your solution at least two days before the deadline to avoid inconvenience later on.
For any query please contact: CS614@vu.edu.pk
Question no. 1:
Consider the following table:
Player_ID  Player_Name   Team_ID  Pool_ID
WI-06      Richordson    WI       A
PK-05      Misbah        PK       A
SA-07      AB Devillier  SA       B
AU-01      Steve Waugh   AU       B
PK-01      Hafiz         PK       A
AU-04      Maxwell       AU       B
WI-01      Ambrose       WI       A
SA-09      Dal Styn      SA       B
Consider the following query:
SELECT Count(Player_Team.Pool_ID) AS CountOfPool_ID
FROM Player_Team
GROUP BY Player_Team.Pool_ID;
Answer the following questions:
1. Identify the number of clusters in this data.
2. Identify whether the given clustering is one-way or two-way clustering. Your answer should be supported by valid reasons.
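To see what the query returns on the sample data, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the table is named Player_Team (taken from the column qualifier in the query) and selects Pool_ID alongside the count so that each group is labeled; the column types are also an assumption.

```python
import sqlite3

# Sample rows copied from the table in Question 1.
players = [
    ("WI-06", "Richordson", "WI", "A"),
    ("PK-05", "Misbah", "PK", "A"),
    ("SA-07", "AB Devillier", "SA", "B"),
    ("AU-01", "Steve Waugh", "AU", "B"),
    ("PK-01", "Hafiz", "PK", "A"),
    ("AU-04", "Maxwell", "AU", "B"),
    ("WI-01", "Ambrose", "WI", "A"),
    ("SA-09", "Dal Styn", "SA", "B"),
]

# In-memory database; the table name Player_Team is assumed from the query.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Player_Team "
    "(Player_ID TEXT, Player_Name TEXT, Team_ID TEXT, Pool_ID TEXT)"
)
conn.executemany("INSERT INTO Player_Team VALUES (?, ?, ?, ?)", players)

# One output row per distinct Pool_ID value, with the number of rows in it.
pool_counts = conn.execute(
    "SELECT Pool_ID, COUNT(Pool_ID) AS CountOfPool_ID "
    "FROM Player_Team GROUP BY Pool_ID ORDER BY Pool_ID"
).fetchall()

print(pool_counts)  # prints [('A', 4), ('B', 4)]
conn.close()
```

Each output row corresponds to one group of rows sharing a Pool_ID value, so the number of output rows is the number of groups the query forms on that single column.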
Question no. 2:
Consider the following tables:
Player table:
Player_ID  Player_Name   Team
PK-01      Wasim         Pakistan
PK-02      Misbah        Pakistan
SA-03      AB Devillier  South Africa

Award table:
Award_ID  Match_ID  Player_ID
01        01        PK-01
01        02        PK-01
02        03        PK-02
01        04        SA-03
Consider the following query:
SELECT * FROM Player P, Award A WHERE P.Team = 'Pakistan' AND A.Award_ID = '01' AND P.Player_ID = A.Player_ID
Suppose this query is executed using a naive nested-loop join (i.e. no index exists on either the Player or the Award table). State which table should be the outer table to minimize I/O, manually calculating the cost in both cases, i.e. when Player is the outer table and when Award is the outer table.
Note: You need to show the calculations in your solution where required.
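The two costs can be estimated with a small simulation. The sketch below is only an illustration under a simplified cost model that is an assumption here: one I/O per row read, the outer table scanned once, and the whole inner table scanned once for every qualifying outer row. The course handouts may count blocks rather than rows, so check the numbers against the formula from the lecture. The function and variable names are hypothetical.

```python
# Sample rows copied from the tables in Question 2.
player = [
    ("PK-01", "Wasim", "Pakistan"),
    ("PK-02", "Misbah", "Pakistan"),
    ("SA-03", "AB Devillier", "South Africa"),
]
award = [
    ("01", "01", "PK-01"),
    ("01", "02", "PK-01"),
    ("02", "03", "PK-02"),
    ("01", "04", "SA-03"),
]


def naive_nested_loop(outer, inner, outer_ok, inner_ok, match):
    """Return (joined rows, row reads) for a naive nested-loop join."""
    reads = 0
    result = []
    for o in outer:
        reads += 1  # every outer row is read once
        if not outer_ok(o):
            continue  # a non-qualifying outer row triggers no inner scan
        for i in inner:
            reads += 1  # full inner scan for each qualifying outer row
            if inner_ok(i) and match(o, i):
                result.append((o, i))
    return result, reads


def is_pakistan(p):
    return p[2] == "Pakistan"


def is_award_01(a):
    return a[0] == "01"


# Case 1: Player is the outer table.
rows_player_outer, cost_player_outer = naive_nested_loop(
    player, award, is_pakistan, is_award_01, lambda p, a: p[0] == a[2]
)
# Case 2: Award is the outer table.
rows_award_outer, cost_award_outer = naive_nested_loop(
    award, player, is_award_01, is_pakistan, lambda a, p: p[0] == a[2]
)

print(cost_player_outer, cost_award_outer)  # prints 11 13
```

Under these assumptions, Player as the outer table costs 3 + 2 × 4 = 11 row reads and Award as the outer table costs 4 + 3 × 3 = 13, and both orders return the same two joined rows.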
Which lecture number is this from?
Can anybody help with question 1?
Does anyone have the solution, please?
Cluster index is covered in lecture 27.
It's one-way clustering, as it is performed on the data records (rows) only, whereas two-way clustering is performed on both columns and rows.
I think there is another cluster, which is on Team_ID.
Q1: It is one-way clustering, and there are two clustered indexes in this data.
Should we take the complete table as the outer table, or only the qualifying rows? Kindly reply.
Only the qualifying rows are to be taken.