Share your final term papers here in reply to this discussion.
May Allah solve our problems and give us success.
You can download papers with a simple click on the file. All the uploaded files are in ZIP or PDF format, so install Adobe Reader and WinRAR to open these files.
Note: If you download files with Internet Download Manager (IDM), you may face the problem of damaged files. All files are correct and undamaged. If you face this problem, open IDM > Options > File Types, remove PDF and ZIP from the file types, and save. Then download the files again; they will work properly.
You can download solved final term papers, short notes, lecture-wise question-answer files, solved MCQs, solved quizzes, and solved final term subjective and objective papers from this discussion to prepare for the Spring 2014 final term paper.
Q:1 Briefly explain any two types of precedence constraints that we can use in DTS.
Q:2 The time complexity of the K-means algorithm is O(tkn). What do t, k, and n represent here?
Q:3 What problems will you face if low priority is given to cube construction?
Q:4 List down any two parallel software architectures.
Q:5 What is unsupervised learning in data mining?
Q:6 Which scripting languages are used to perform complex transformations in DTS packages?
Q:7 "A dense index consists of a number of bit vectors." Justify this statement.
Q:8 It is essential to have a subject-matter expert as part of the data modeling team. What will be the implication if such an expert is not present in the organization?
CS614 paper:
The objective part was almost all from the Virtualians file, with a few new but easy questions...
Suppose there is a large enterprise which uses the same server for the development and production environments. What problems can arise if it uses a single server for both purposes? 5m
Write down any two drawbacks if "Date" is stored in text format rather than using a proper date format like "dd-MMM-yy" etc. 5m
In context of Web data warehousing, consider the “web page” dimension, list at least five possible attributes of this dimension. 5m
There are different data mining techniques, e.g. "clustering", "description" etc. Each of the following statements corresponds to some data mining technique. For each statement, name the technique it corresponds to. 5m
a) Assigning customers to predefined customer segments (i.e. good vs. bad)
b) Assigning credit applicants to predefined classes (i.e. low, medium, or high risk)
c) Guessing how much customers will spend during next 6 months
d) Building a model and assigning a value from 0 to 1 to each member of the set. Then classifying the members into categories based on a threshold value.
e) Guessing how many students will score more than 65% marks in the midterm.
Specify at least one implication, if you don't provide proper documentation as part of data warehouse development. 3m
In context of nested loop join, mention two guidelines for selecting a table as inner table. 3m
We can identify the session on the World Wide Web by using "Time-contiguous Log Entries"; however, there are some limitations of this technique. Briefly explain any two limitations. 3m
Identify the given statement as correct or incorrect and justify your answer in either case.
"The problem of Referential Integrity always occurs in traditional OLTP system as well as in DWH". 3m
There are two primary techniques for gathering requirements i.e. interviews or facilitated sessions. Which technique is preferred by Ralph Kimball? 2m
List down any two Parallel Software Architectures. 2m
List down any four Static Attributes recorded by the scouts in Agriculture Data Warehouse Case Study. 2m
List down any four issues of Clickstream Data. 2m
Remember me in your prayers, and best of luck...
Answer: If we apply run-length encoding to the input "11001100", the output will be 12#02#12#02# (1 occurs 2 times, 0 occurs 2 times, 1 occurs 2 times, 0 occurs 2 times).
SELECT * FROM R WHERE A = 5 (page 228)
We had to tell which technique among dense index, sparse index, B-tree index and bitmap index would be used for this query, and explain why.
"Bayesian modeling is an example of unsupervised learning" (page 270)
Answer: Incorrect. Bayesian modeling is an example of supervised learning.
IF (Time/Items) >= 6
Then
Gender = female
Else
Gender = male
There was a question like this; I don't remember the rest, but the paper was from past papers and from the starting lectures.
Forward Proxy (2)
Answer: Ch#40 Page no: 369
The type of proxy we are referring to in this discussion is called a forward proxy. It is outside of our control because it belongs to a networking company or an ISP. When people talk about a proxy server (often simply known as a "proxy"), more often than not they are referring to a forward proxy. When a client on an internal network makes a connection attempt to a file transfer server on the Internet, its requests have to pass through the forward proxy first. A forward proxy is typically used in tandem with a firewall to enhance the internal network's security by controlling traffic originating from clients in the internal network that is directed at hosts on the Internet.
Drawbacks of waterfall model for DWH (3)
First and foremost, the project is likely to occur over an extended period of time, during which the users may not have had an opportunity to review what will be delivered.
Second, in today's demanding competitive environment there is a need to produce results in a much shorter timeframe.
In which scenario can we use the waterfall model? (2)
The model is a linear sequence of activities like requirements definition, system design, detailed design, integration and testing, and finally operations and maintenance. The model is used when the system requirements and objectives are known and clearly specified.
My paper on 25 August:
How is the gender guide used?
If gender is missing for a very large number of records, it becomes impossible to manually check each individual's name and identify the gender. In such cases we can formulate a mechanism to correct gender. We can either use a standard gender guide or create a new table Gender_guide. Gender_guide contains only two columns, name and gender. We populate the Gender_guide table with a query selecting all distinct first names from the student table, and then manually place their gender.
This table can serve as a guide by telling us what the gender can be for a particular name. For example, if we have a hundred students in our database with first name 'Muhammad', then in our Gender_guide table we will have just one entry 'Muhammad', and we will manually set the gender as 'Male' against it. Now, to fill the missing genders in the exception table, we just do an inner join on the Error table and the Gender_guide table.
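To make the Gender_guide idea concrete, here is a minimal Python sketch of the same logic, with in-memory dicts standing in for the database tables; the table and column names follow the handout's description, and the sample names are made up.

```python
# A minimal sketch of the Gender_guide idea from the handout.
# In-memory dicts stand in for the student and Gender_guide tables.

students = [
    {"first_name": "Muhammad", "gender": None},
    {"first_name": "Ayesha",   "gender": "Female"},
    {"first_name": "Muhammad", "gender": None},
]

# Build the guide from distinct first names, then set gender manually,
# as the handout describes.
gender_guide = {name: None for name in {s["first_name"] for s in students}}
gender_guide["Muhammad"] = "Male"    # the manual step
gender_guide["Ayesha"] = "Female"

# The "inner join" on first name: fill only the missing genders.
for s in students:
    if s["gender"] is None and gender_guide.get(s["first_name"]):
        s["gender"] = gender_guide[s["first_name"]]

print(students)
```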
Two inputs for run-length encoding were given and we had to write the output (a small encoder sketch follows these examples).
Run-length encoding is used in bitmap indexing.
Output 1 may be
15#02#18# (meaning 1 occurs 5 times, 0 occurs 2 times, and 1 occurs 8 times; input 111110011111111)
Output 2 may be
11#01#11#
Output 3 may be
112#012#
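For reference, here is a small Python sketch of the run-length encoding format used in these answers: each segment is the bit value followed by its run length, with segments separated by '#'. The format is inferred from the examples above.

```python
from itertools import groupby

def rle_encode(bits: str) -> str:
    """Encode a bit string as value+count segments separated by '#',
    matching the format used in the answers above (e.g. '15#02#18#')."""
    return "".join(f"{value}{len(list(run))}#" for value, run in groupby(bits))

print(rle_encode("111110011111111"))   # 15#02#18#
print(rle_encode("11001100"))          # 12#02#12#02#
print(rle_encode("1111111110000111"))  # 19#04#13#
```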
Steps of the Kimball approach for the data lifecycle.
Kimball Process: a four-step approach (business process --> grain --> dimensions --> facts). He defines a business process as a major operational process in the organization that is supported by some kind of legacy system (or systems). (Read "Business Dimensional Lifecycle", page 290)
Drawbacks of traditional web search. (Ch. 39, page 351)
1. Limited to keyword based matching.
2. Cannot distinguish between the contexts in which a link is used.
3. Coupling of files has to be done manually.
Two ways of identifying a session on the World Wide Web were asked.
Identifying the Session
What problems arise when a single server is used for both purposes? Two problems were required.
MCQs
Execution will be terminated abnormally... (Quiz 4 file, 2 MCQs)
Kimball's approach is ...... driven (Quiz 4 file, 5 MCQs)
Pipeline ... increase in throughput ... (Quiz 4 file, 1 MCQ)
Selectivity of query in OLAP... (queries must be executed in a small number of seconds)
Star schema simplifies ...
Majority of data ... fail if (majority of projects fail due to the complexity of the development process)
ER is ....... design (constituted to optimize OLTP performance)
Survival of the fittest is ..... algorithm (Genetic Algorithms: these are based on the principle of survival of the fittest. In these techniques, a model is formed to solve problems having multiple options and many values. Briefly, these techniques are used to select the optimal solution out of a number of possible solutions. However, they are not very robust, as they cannot perform well in the presence of noise.)
Shipyards in Kobe developed ....... (In 1972 the Mitsubishi Shipyards in Kobe developed a technique in which customer wants were linked to product specifications via a matrix format. The technique is known today as The House of Quality and is one of many techniques of Quality Function Deployment, which can briefly be defined as "a system for translating customer requirements into appropriate company requirements". The purpose of the technique is to reduce two types of risk. First, the risk that the product specification does not comply with the wants of the predetermined target group of customers. Secondly, the risk that the final product does not comply with the product specification.)
Q:1 Briefly explain any two types of precedence constraints that we can use in DTS.
Answer: page 395
Precedence constraints sequentially link tasks in a package. In DTS, you can use three types of precedence constraints, which can be accessed either through DTS Designer or programmatically:
Unconditional: If you want Task 2 to wait until Task 1 completes, regardless of the outcome, link Task 1 to Task 2 with an unconditional precedence constraint.
On Success: If you want Task 2 to wait until Task 1 has successfully completed, link Task 1 to Task 2 with an On Success precedence constraint.
On Failure: If you want Task 2 to begin execution only if Task 1 fails to execute successfully, link Task 1 to Task 2 with an On Failure precedence constraint. If you want to run an alternative branch of the workflow when an error is encountered, use this constraint.
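The following plain-Python sketch only illustrates the three precedence semantics described above; it is not the DTS object model, and the task and constraint names are illustrative.

```python
# Conceptual sketch of DTS-style precedence constraints in plain Python.
# This is NOT the DTS object model; it only shows the three semantics
# (Unconditional, On Success, On Failure) described above.

def run_workflow(task1, task2, constraint):
    try:
        task1()
        outcome = "success"
    except Exception:
        outcome = "failure"
    # Decide whether Task 2 runs, based on the constraint type.
    if (constraint == "unconditional"
            or (constraint == "on_success" and outcome == "success")
            or (constraint == "on_failure" and outcome == "failure")):
        task2()

run_workflow(lambda: print("Task 1"), lambda: print("Task 2"), "on_success")
```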
Q:2 The time complexity of the K-means algorithm is O(tkn). What do t, k, and n represent here?
Page 281
Answer: Relatively efficient: O(tkn), where n is # objects, k is # clusters, and t is # iterations.
Normally, k, t << n.
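A minimal 1-D k-means sketch (with illustrative data, not from the handout) makes the O(tkn) cost visible: t iterations, each scanning n points against k centroids.

```python
# Minimal 1-D k-means sketch: the nested loops make the O(tkn) cost
# visible -- t iterations, each scanning n points against k centroids.

def kmeans(points, k, t):
    centroids = points[:k]                      # naive initialization
    for _ in range(t):                          # t iterations
        clusters = [[] for _ in range(k)]
        for p in points:                        # n points ...
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))  # ... x k distances
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

print(kmeans([1.0, 1.2, 0.8, 8.0, 8.2, 7.9], k=2, t=10))
```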
Q:3 What problems will you face if low priority is given to cube construction?
Answer: page 313
Low priority for OLAP cube construction: Make sure your OLAP cube-building or pre-calculation process is optimized and given the right priority. It is common for the data warehouse to be at the bottom of the nightly batch loads, and after loading the DWH, usually there isn't much time left for the OLAP cube to be refreshed. As a result, it is worthwhile to experiment with the OLAP cube generation paths to ensure optimal performance.
Q:4 List down any two parallel software architectures.
Answer: Shared Memory, Shared Disk and Shared Nothing (any two).
Q:5 What is unsupervised learning in data mining?
Answer: page 27
Unsupervised learning is where you don't know the number of clusters and obviously have no idea about their attributes either. In other words, you are not guiding the DM process in any way: no guidance and no input. Unsupervised learning is closer to the exploratory spirit of data mining as stressed in the definitions given above. In unsupervised learning situations all variables are treated in the same way; there is no distinction between explanatory and dependent variables. However, in contrast to the name undirected data mining, there is still some target to achieve. This target might be as general as data reduction or more specific like clustering. For unsupervised learning, typically either the target variable is unknown or has only been recorded for too small a number of cases.
Q:6 Which scripting languages are used to perform complex transformations in DTS packages?
Answer: Microsoft SQL Server provides graphical tools to build DTS packages. These tools provide good support for transformations. Complex transformations are achieved through VBScript or JScript loaded into the DTS package. A package can also be programmed using the DTS object model instead of the graphical tools, but DTS programming is rather complicated.
Q :7 "Dense index consist of a number of bit vector" justify it .
Answer: Dense index: every key in the data file is represented in the index file. A bitmap index record is (Value, Bit Vector): the bit vector has one bit for every record in the file, and the i-th bit is set iff record i has Value in the given column. Bit vectors are typically compressed, and are converted to sets of rids during query evaluation.
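As an illustration, here is a small Python sketch of a bitmap index on column A of a toy table R, used to evaluate the SELECT * FROM R WHERE A = 5 query mentioned earlier; the rows are made up.

```python
# Sketch: a bitmap index on column A of table R, then evaluating
# SELECT * FROM R WHERE A = 5 by scanning the value's bit vector.

R = [{"A": 5, "B": "x"}, {"A": 3, "B": "y"}, {"A": 5, "B": "z"}]

# One bit vector per distinct value; the i-th bit is set iff record i
# has that value in column A (every key is represented).
bitmap = {}
for i, row in enumerate(R):
    bitmap.setdefault(row["A"], [0] * len(R))[i] = 1

print(bitmap)  # {5: [1, 0, 1], 3: [0, 1, 0]}

# WHERE A = 5: convert the bit vector to record ids, then fetch rows.
rids = [i for i, bit in enumerate(bitmap[5]) if bit]
print([R[i] for i in rids])
```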
Q:8 It is essential to have a subject-matter expert as part of the data modeling team. What will be the implication if such an expert is not present in the organization?
Answer: It is essential to have a subject-matter expert as part of the data modeling team. This person can be an outside consultant or someone in-house with extensive industry experience. Without this person, it becomes difficult to get a definitive answer on many questions, and the entire project gets dragged out, as the end users may not always be available.
==================================================================
Suppose there is a large enterprise which uses the same server for the development and production environments. What problems can arise if it uses a single server for both purposes? 5m
To save capital, often data warehousing teams will decide to use only a single database and a single server for the different environments i.e. development and production. Environment separation is achieved by either a directory structure or setting up distinct instances of the database.
This is awkward for the following reasons:
• Sometimes the server needs to be rebooted for the development environment. Having a separate development environment prevents the production environment from being affected by this.
• There may be interference between different database environments on a single server. For example, multiple long queries running on the development environment could affect performance on the production side, since both share the same server.
Write down any two drawbacks if "Date" is stored in text format rather than using a proper date format like "dd-MMM-yy" etc. 5m
Answer: (i) The database cannot validate text dates, so invalid values can be stored. (ii) Text dates do not sort or compare chronologically, so date arithmetic and range queries give wrong results.
In context of Web data warehousing, consider the “web page” dimension, list at least five possible attributes of this dimension. 5m
Page key
Page source
Page function
Page template
Item type
Graphic type
Animation type
Sound type
Page file name
There are different data mining techniques, e.g. "clustering", "description" etc. Each of the following statements corresponds to some data mining technique. For each statement, name the technique it corresponds to. 5m
a) Assigning customers to predefined customer segments (i.e. good vs. bad): Classification
b) Assigning credit applicants to predefined classes (i.e. low, medium, or high risk): Classification
c) Guessing how much customers will spend during the next 6 months: Prediction
d) Building a model and assigning a value from 0 to 1 to each member of the set, then classifying the members into categories based on a threshold value: Estimation
e) Guessing how many students will score more than 65% marks in the midterm: Prediction
Specify at least one implication, if you don't provide proper documentation as part of data warehouse development. 3m
Usually by this time most, if not all, of the developers will have left the project, so it is essential that proper documentation is left for those who are handling production maintenance. There is nothing more frustrating than staring at something another person did, yet unable to figure it out due to the lack of proper documentation.
Another pitfall is that the maintenance phase is usually boring. So, if there is another phase of the data warehouse planned, start on that as soon as possible.
In context of nested loop join, mention two guidelines for selecting a table as inner table. 3m
For a nested-loop join, the inner and outer tables are determined as follows (page 242; a minimal join sketch follows this list):
The outer table is usually the one that has:
• The smallest number of qualifying rows, and/or
• The largest number of I/Os required to locate the rows.
The inner table usually has:
• The largest number of qualifying rows, and/or
• The smallest number of reads required to locate rows.
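Here is a minimal naive nested-loop join in Python (with illustrative tables); it shows why the table with fewer qualifying rows belongs outside: the inner table is re-scanned once per outer row.

```python
# Naive nested-loop join sketch: the inner table is re-scanned once per
# outer row, which is why the outer table should contribute the smaller
# number of qualifying rows.

def nested_loop_join(outer, inner, key):
    result = []
    for o in outer:                 # one pass over the outer table
        for i in inner:             # full scan of inner per outer row
            if o[key] == i[key]:
                result.append({**o, **i})
    return result

orders    = [{"cust_id": 1, "amount": 50}, {"cust_id": 2, "amount": 75}]
customers = [{"cust_id": 1, "name": "Ali"}, {"cust_id": 2, "name": "Sana"}]
print(nested_loop_join(orders, customers, "cust_id"))
```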
We can identify the session on the World Wide Web by using "Time-contiguous Log Entries"; however, there are some limitations of this technique. Briefly explain any two limitations. 3m
Answer: In many cases, the individual hits comprising a session can be consolidated by collating time-contiguous log entries from the same host (Internet Protocol, or IP, address). If the log contains a number of entries with the same host ID in a short period of time (for example, one hour), one can reasonably assume that the entries are for the same session.
Limitations: • This method breaks down for visitors from large ISPs because different visitors may reuse dynamically assigned IP addresses over a brief time period.
• Different IP addresses may be used within the same session for the same visitor.
• This approach also presents problems when dealing with browsers that are behind some firewalls.
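A small Python sketch of this time-contiguous sessionization, assuming illustrative log fields (ip, and ts in seconds) and the one-hour window mentioned above:

```python
# Sketch of time-contiguous sessionization: log entries from the same
# IP within a timeout (one hour, as above) are treated as one session.
# Field names (ip, ts) are illustrative.

TIMEOUT = 3600  # seconds

def sessionize(entries):
    sessions, last_seen = {}, {}
    for e in sorted(entries, key=lambda e: e["ts"]):
        ip = e["ip"]
        # Start a new session if this IP is unseen or idle past the timeout.
        if ip not in last_seen or e["ts"] - last_seen[ip] > TIMEOUT:
            sessions.setdefault(ip, []).append([])
        sessions[ip][-1].append(e)
        last_seen[ip] = e["ts"]
    return sessions

log = [{"ip": "10.0.0.1", "ts": 0}, {"ip": "10.0.0.1", "ts": 120},
       {"ip": "10.0.0.1", "ts": 9000}]   # 9000s later: a new session
print(sessionize(log))
```

Note that the sketch also shows the first limitation above: two different visitors sharing "10.0.0.1" within the window would be merged into one session.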
Identify the given statement as correct or incorrect and justify your answer in either case.
"The problem of Referential Integrity always occurs in traditional OLTP system as well as in DWH". 3m
Answer: Incorrect. While doing total quality measurement, you measure RI every week (or month), and hopefully the number of orphan records will go down as you fine-tune the processes to get rid of the RI problems. Remember, the RI problem is peculiar to a DWH; it will not happen in a traditional OLTP system.
There are two primary techniques for gathering requirements i.e. interviews or facilitated sessions. Which technique is preferred by Ralph Kimball? 2m
Both have their advantages and disadvantages. Interviews encourage lots of individual participation. They are also easier to schedule. Facilitated sessions may reduce the elapsed time to gather requirements, although they require more time commitment from each participant. Kimball prefers using a hybrid approach with interviews to gather the gory details and then facilitation to bring the group to consensus.
List down any two Parallel Software Architectures. 2m
Answer: Shared Memory, Shared Disk, and Shared Nothing (any two).
List down any four Static Attributes recorded by the scouts in Agriculture Data Warehouse Case Study. 2m
Static attributes: Farmer name, Farmer address, Field acreage, Variety sown, Sowing date, Sowing method
Dynamic attributes: Date of visit, Pest population, CLCV, Predator population, Pesticide spray dates, Pesticides used
List down any four issues of Clickstream Data. 2m
Issues of Clickstream Data: (Page#341)
Clickstream data has many issues:
Identifying the Visitor Origin
Identifying the Session
Identifying the Visitor
Proxy Servers
Browser Caches
===========================================================
The objective part was very easy and almost all from Moaaz's file.
Subjective:
1. What is Web Data Warehouse? (2 marks)
Answer: Page no: 350 Chapter: 39
Web warehousing can be used to mine the huge web content for searching information of interest; it's like searching for the golden needle in the haystack. The second reason for web warehousing is to analyze the huge web traffic. This can be of interest to web site owners, for e-commerce, for e-advertisement and so on. Last but not least, a reason for web warehousing is to archive the huge web content because of its dynamic nature.
2. What are the four issues of Clickstream Data? (2 marks) repeat
3. Write the first two phases of Kimball's approach of the business dimensional lifecycle. (2 marks)
Answer: Kimball proposes a four-step approach where he starts by choosing a business process, takes the grain of the process, and then chooses the dimensions and facts. He defines a business process as a major operational process in the organization that is supported by some kind of legacy system (or systems).
4. There are four categories of data quality improvement. Write any two. (2 marks)
Ans. The four categories of Data Quality Improvement
• Process
• System
• Policy & Procedure
• Data Design
5. What is data profiling? (3 marks)
Answer: Data profiling is a process which involves gathering information about columns through the execution of certain queries, with the intention of identifying erroneous records.
We run different SQL queries to get the answers to the profiling questions. During this process we can identify the erroneous records. Whenever we come across an erroneous record, we copy it into an error or exception table and set the dirty bit of the record in the actual student table. Then we correct the exception table. After this profiling process we transform the records and load them into a new table
Student_Info
Ref: Handout Page No. 354
6. What are the drawbacks of Traditional Web Searchers? (3 marks) repeat
7. Apply Run length encoding on the given code and write output. (3 marks)
Case-I: 1111111110000111
Answer: 19#04#13
Case-II: 00001111000000
Answer: 04#14#06
8. Identify the given statement as correct or incorrect and justify your answer in either case. (3 marks)
"One-way clustering is used to get local view and Two-way clustering is used to get global view."
Answer: Incorrect. One-way clustering gives a global view, and bi-clustering gives a local view.
9. A pilot project strategy is highly recommended in data warehouse. What are the reasons for its recommendation? (5 marks)
Answer: A pilot project strategy is highly recommended in data warehouse construction, as a full-blown data warehouse construction requires significant capital investment, effort and resources. Therefore, it must be attempted only after a thorough analysis and a valid proof of concept. A small-scale project in this regard serves many purposes, such as (i) showing users the value of DSS information, (ii) establishing blueprint processes for a later full-blown project, (iii) identifying problem areas, and (iv) revealing true data demographics. Hence, doing a pilot project on a small scale seems to be the best strategy.
10. Data acquisition and cleansing. (5 marks)
• The pest scouting sheets are larger than A4 size (8.5” x 11”), hence the right end was cropped when scanned on a flat-bed A4 size scanner.
• The right part of the scouting sheet is also the most troublesome, because of pesticide names for a single record typed on multiple lines i.e. for multiple farmers.
• As a first step, OCR (Optical Character Reader) based image to text transformation of the pest scouting sheets was attempted. But it did not work even for relatively clean sheets with very high scanning resolutions.
• Subsequently, DEOs (Data Entry Operators) were employed to digitize the scouting sheets by typing.
Data cleansing and standardization is probably the largest part in an ETL exercise. For Agri-DWH major issues of data cleansing had arisen due to data processing and handling at four levels by different groups of people i.e.
(i) Hand recordings by the scouts at the field level
(ii) typing hand recordings into data sheets at the DPWQCP office
(iii) photocopying of the scouting sheets by DPWQCP personnel, and finally
(iv) data entry or digitization by hired data entry operators.
11. A table was given and asked to use bitmap indexes technique and make index tables for "TicketType" and "FlightNo". (5 marks)
12. A table was given containing Name, Items, Time and Gender, and along with it the following statement was given. (5 marks)
IF
Time/Items >= 6
Then
Gender= ‘F’
else
Gender = ‘M’
a) Find the accuracy % of given data.
b) If Name: Ali, Items: 2, time: 14, then find the gender of Ali.
Answer: page 278
The model in our case is a rule: if the per-item minutes for any customer are greater than or equal to 6, then the customer is female, else male.
The above rule is based on the common notion that females spend more time during shopping than male customers. Exceptions can be there and are treated as outliers.
Since for the first record the ratio is greater than 6, our model will assign it to the female class, but that may be an exception or noise. The second and the third records are as per the rule. Thus, the accuracy of our model is 2/3, i.e. 66%. In other words, we can say the confidence level of our classification model is 66%. The accuracy may change as we add more data. Now unseen data is brought into the picture. Suppose there is a record with name Firdous, 15 minutes time spent and 1 item purchased. We predict the gender by using our classification model, and as per our model the customer is assigned 'F' (15/1 = 15, which is greater than 6).
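A short Python sketch of this example: the page-278 rule (minutes per item >= 6 implies female), accuracy measured on three illustrative labeled rows (chosen to reproduce the 2/3 figure), then applied to the unseen records from the paper (Ali: 14 minutes, 2 items; Firdous: 15 minutes, 1 item).

```python
# Sketch of the page-278 classification example. The labeled rows are
# illustrative (picked to reproduce the 2/3 accuracy); the rule and the
# unseen records (Ali, Firdous) come from the text above.

def predict(time_spent, items):
    # Rule: minutes per item >= 6 implies female ('F'), else male ('M').
    return "F" if time_spent / items >= 6 else "M"

labeled = [
    {"time": 21, "items": 3, "gender": "M"},  # 7 min/item -> rule says F: exception
    {"time": 20, "items": 2, "gender": "F"},  # 10 min/item -> F: correct
    {"time": 4,  "items": 2, "gender": "M"},  # 2 min/item -> M: correct
]

correct = sum(predict(r["time"], r["items"]) == r["gender"] for r in labeled)
print(f"accuracy = {correct}/{len(labeled)}")  # 2/3, about 66%

print(predict(14, 2))   # Ali: 14/2 = 7 >= 6 -> 'F'
print(predict(15, 1))   # Firdous: 15/1 = 15 >= 6 -> 'F'
```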
Best file to prepare current subjective papers.
Best files to prepare MCQs other than Moaaz's files.
Today's CS614 paper with solution:
Syntactically Dirty data: lexical errors, irregularities
Semantically dirty data: integrity constraint violation, business rule contradiction, duplication
Coverage anomalies: missing attributes, missing records
Some MCQs were from my midterm paper. Two underlined MCQs were also included in my final paper.
Subjective:
Answer:
Shared nothing RDBMS architecture requires a static partitioning of each table in the database.
How do you perform the partitioning?
Answer:
Nested-Loop Join: Variants
1. Naive nested-loop join
2. Index nested-loop join
3. Temporary index nested-loop join
Answer: page 480
There are no fixed strategies to standardize the columns.
Answer:
Answer: Limitations
Answer:
Answer:
Requirements preplanning: This phase consists of activities like choosing the forum, identifying and preparing the requirements team and finally selecting, scheduling and preparing the business representatives.
Answer:
(Repeat: same as the data mining techniques answer given above.)
Answer:
(Repeat: same as the data acquisition and cleansing answer given above.)
Thanks for sharing, friends.
Ans???
I think
I think it should be "much less than", because this is what appears in the notes:
Relatively efficient: O(tkn), where n is # objects, k is # clusters, and t is # iterations.
Normally, k, t << n.
hmmmm