Data Warehousing (CS614) 

Graded Discussion Board (GDB) 

Dear Students,

The GDB for the subject Data Warehousing (CS614) will open in a couple of days. Before submitting the GDB, please read all instructions thoroughly.

The GDB will remain open for two days (48 hours).

You may submit your GDB from February 17, 2021 to February 18, 2021, 11:59 PM.

GDB Topic: 

There are a number of data mining techniques, and the selection of a particular technique is highly application-dependent, although other factors affect the selection process too. These techniques are classification, estimation, prediction, and clustering.

Now suppose you want to build a recommendation system for restaurants based on facilities such as parking, reservations, acceptance of credit cards, free Wi-Fi, TV, etc., along with users' (customers') reviews (positive and negative). The proposed recommendation system can be built using any data mining technique. Which data mining technique is most suitable for the above scenario? Justify your answer with strong arguments.

Regards,

Instructor CS614


Replies to This Discussion

Share the GDB question & discuss here.

Stay in touch with this discussion; a solution idea will be uploaded in the replies here as soon as possible, before the due date.

CS614 GDB Solution:
For the given scenario, the most suitable data mining technique is clustering. Clustering is one of the oldest techniques used in data mining; cluster analysis is the process of identifying data objects that are similar to each other. Clustering helps us understand the differences and similarities in the data, and it also helps us understand what is going on within the database. For example, a company can group its customers based on their income, the nature of their policy, and the type of their claims. For these reasons, clustering suits this scenario better than the other techniques.
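
As a rough illustration of how clustering could drive such a recommendation system, here is a minimal Python sketch. It assumes scikit-learn is installed, and the facility features, restaurant names, and sample data are invented for illustration only, not taken from the GDB.

# Minimal sketch: cluster restaurants by their facilities, then recommend
# restaurants from the same cluster as one a customer reviewed positively.
# scikit-learn is assumed; the data below is invented for illustration.
from sklearn.cluster import KMeans
import numpy as np

# Each row: [parking, reservation, credit_cards, free_wifi, tv]  (1 = yes, 0 = no)
restaurants = np.array([
    [1, 1, 1, 1, 0],
    [1, 0, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [1, 1, 1, 0, 1],
])
names = ["A", "B", "C", "D", "E"]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(restaurants)

# Suppose a customer gave a positive review to restaurant "A":
liked = names.index("A")
liked_cluster = kmeans.labels_[liked]

# Recommend the other restaurants that fall in the same cluster.
recommendations = [n for n, label in zip(names, kmeans.labels_)
                   if label == liked_cluster and n != "A"]
print("Recommend:", recommendations)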

CS614 GDB Fall 2020 solution idea:

 

Solution:

  • Increased resource availability: If one Intelligence Server in a cluster fails, the other Intelligence Servers in the cluster can pick up the workload. This prevents the loss of valuable time and information if a server fails.
  • Strategic resource usage: You can distribute projects across nodes in whatever configuration you prefer. This reduces overhead, because not all machines need to run all projects, and it allows you to use your resources flexibly.
  • Increased performance: Multiple machines provide greater processing power.
  • Greater scalability: As your user base grows and report complexity increases, your resources can grow.
  • Simplified management: Clustering simplifies the management of large or rapidly growing systems.
  • A cluster is a collection of data objects; those objects are similar to one another within the same cluster and dissimilar to objects in other clusters.
  • Clustering methods can be used to implement collaborative filtering.
  • In simple words, the aim is to segregate groups with similar traits and assign them into clusters.
  • Clustering is a basic machine learning technique that simply groups data points based on their similarity to each other.
  • Clustering is purely unsupervised: it requires no predefined class labels.
  • The main advantage of a clustered solution is automatic recovery from failure, that is, recovery without user intervention.

CS614 GDB solution Due Date February 18, 2021

CS614 GDB Solution Fall 2020 - 2021 ||Data Warehousing||

What are some examples of data mining techniques?

  • Regression.
  • Association Rule Discovery.
  • Classification.
  • Clustering.

Data Mining is an important analytic process designed to explore data. Much like the real-life process of mining diamonds or gold from the earth, the most important task in data mining is to extract non-trivial nuggets from large amounts of data.

The data mining process: collecting and distilling data

Extracting important knowledge from a mass of data can be crucial, sometimes essential, for the next phase in the analysis: the modeling. Many assumptions and hypotheses will be drawn from your models, so it’s incredibly important to spend appropriate time “massaging” the data, extracting important information before moving forward with the modeling.

Although the definition of data mining seems to be clear and straightforward, you may be surprised to discover that many people mistakenly relate to data mining tasks such as generating histograms, issuing SQL queries to a database, and visualizing and generating multidimensional shapes of a relational table.

For example: data mining is not about extracting a group of people from a specific city in our database; the task of data mining in this case will be to find groups of people with similar preferences or taste in our data. Similarly, data mining is not about creating a graph of, say, the number of people that have cancer against power voltage—data mining’s task in this case could be something like: is the chance of getting cancer higher if you live near a power-line?

The tasks of data mining are twofold: create predictive power (using features to predict unknown or future values of the same or another feature) and create descriptive power (find interesting, human-interpretable patterns that describe the data). In this post, we'll cover four data mining techniques:

  • Regression (predictive)
  • Association Rule Discovery (descriptive)
  • Classification (predictive)
  • Clustering (descriptive)

Regression

Regression is the most straightforward and simplest version of what we call "predictive power." When we use regression analysis, we want to predict the value of a given (continuous) feature based on the values of other features in the data, assuming a linear or nonlinear model of dependency.

Here are some examples:

  • Predicting revenue of a new product based on complementary products.
  • Predicting cancer based on the number of cigarettes consumed, food consumed, age, etc.
  • Time series prediction of stock market and indexes.

Regression techniques are very useful in data science, and the term "logistic regression" appears in almost every aspect of the field. This is especially the case because of the usefulness and strength of neural networks, which use regression-based techniques to create complex functions that imitate the functionality of our brain.
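
To make the idea concrete, here is a minimal regression sketch in Python. scikit-learn is assumed to be available, and the feature names and numbers are invented purely for illustration.

# Minimal sketch of linear regression: predict a continuous value (e.g. revenue)
# from other numeric features. The data is invented for illustration.
from sklearn.linear_model import LinearRegression
import numpy as np

# Features: [ad_spend, price]; target: revenue
X = np.array([[10, 5.0], [20, 4.5], [30, 4.0], [40, 3.5]])
y = np.array([100, 180, 260, 330])

model = LinearRegression().fit(X, y)
print(model.predict([[25, 4.2]]))   # predicted revenue for a new product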

Association Rule Discovery

Association rule discovery is an important descriptive method in data mining. It’s a very simple method, but you’d be surprised how much intelligence and insight it can provide—the kind of information many businesses use on a daily basis to improve efficiency and generate revenue.

Given a set of transactions, each of which is a set of items, our goal is to find all rules (X —> Y) that satisfy user-specified minimum support and confidence constraints. In other words, given a set of records, each containing some number of items from a given collection, we want to find dependency rules that predict the occurrence of an item based on the occurrences of other items.

For example: assume you have a dataset of all your past purchases from your favorite grocery store, and we found a dependency rule (satisfying the support and confidence constraints) between these items: {Diapers} —> {Beer}.

This “links” or creates dependencies based on the specified minimum support and confidence, which are defined as follows:

Support(X —> Y) = (number of transactions containing both X and Y) / (total number of transactions)
Confidence(X —> Y) = Support(X —> Y) / Support(X)

The applications of association rules are vast and can add lots of value to different industries and verticals within a business. Here are some examples: cross-selling and up-selling of products, network analysis, physical organization of items, management, and marketing. This was an industry staple for decades in market basket analysis, but in recent years, recommendation engines have largely come to dominate these traditional methods.
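
A tiny, self-contained sketch of computing support and confidence for one candidate rule, written in plain Python over invented market-basket transactions:

# Compute support and confidence for the rule {diapers} -> {beer}
# over a small, invented list of transactions.
transactions = [
    {"milk", "diapers", "beer"},
    {"diapers", "beer"},
    {"milk", "bread"},
    {"diapers", "bread"},
]

X, Y = {"diapers"}, {"beer"}
n = len(transactions)
count_X = sum(1 for t in transactions if X <= t)
count_XY = sum(1 for t in transactions if (X | Y) <= t)

support = count_XY / n            # fraction of all transactions containing both X and Y
confidence = count_XY / count_X   # of the transactions containing X, how many also contain Y
print(support, confidence)        # 0.5 and 0.666...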

Classification

Classification is another important task you should handle before digging into the hardcore modeling phase of your analysis. Assume you have a set of records, where each record contains a set of attributes and one of the attributes is our class (think of letter grades). Our goal is to find a model for the class that can accurately predict the class of unseen or unknown records (from similar external data sources), as if the class label were known, given the values of the other attributes.

In order to train such a model, we usually divide the data set into two subsets: a training set and a test set. The training set is used to build the model, while the test set is used to validate it. The accuracy and performance of the model are determined on the test set.
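
The split-and-validate workflow described above might look like the following Python sketch. scikit-learn is assumed, and the built-in iris dataset simply stands in for real labelled customer records.

# Sketch of the train/test workflow for a classifier.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)   # build the model on the training set
print(accuracy_score(y_test, model.predict(X_test)))     # validate it on the held-out test set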

Classification has many applications in the industry, such as direct marketing campaigns and churn analysis:

Direct marketing campaigns are intended to reduce the cost of spreading marketing content (advertising, news, etc.) by targeting a set of consumers that are likely to be interested in the specific content (product, discount, etc.) based on their revealed past data and behavior.

The method is simply to collect data for a similar product (for simplicity) introduced in the recent past and to classify the profiles of customers based on whether or not they bought it. This target feature will become the class attribute. Now we need to enhance the data with additional demographic, lifestyle, and other relevant features in order to use this information as input attributes to train a classifier model.

Churn is the measure of individuals losing interest in your offering (service, information, product, etc.). In business it’s incredibly important to monitor churn and attempt to identify why subscribers (clients, etc.) decided to stop paying for the subscription. In other words, churn analysis tries to predict whether a customer is likely to be lost to a competitor.

To analyze churn, we need to collect a detailed record of transactions with each of our past and current customers, to find attributes that can explain or add value to the question at hand. Some of these attributes can be related to how engaged the subscriber was with the services and features the company offers. Then we simply need to label the customers as churn or not churn and find a model that best fits the data, so we can predict how likely each of our current subscribers is to churn.

Clustering

Clustering is an important technique that aims to determine object groupings (think about different groups of consumers) such that objects within the same cluster are similar to each other, while objects in different groups are not. The Clustering problem in this sense is reduced to the following:

Given a set of data points, each having a set of attributes, and a similarity measure, find clusters such that:

  1. Data points in one cluster are more similar to one another.
  2. Data points in separate clusters are less similar to one another.

In order to measure how close or far data points (and hence clusters) are from one another, you can use the Euclidean distance (if the attributes are continuous) or any other similarity measure that is relevant to the specific problem.
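
For instance, a minimal Euclidean distance helper in plain Python, over two invented points, that could serve as such a similarity measure:

# Euclidean distance between two points with continuous attributes.
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean([1.0, 2.0], [4.0, 6.0]))   # 5.0 : a smaller distance means more similar points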

A useful application of clustering is marketing segmentation, which aims to subdivide a market into distinct subsets of customers where each subset can be targeted with a distinct marketing strategy.

This is done by collecting different attributes of customers based on their geographical- and lifestyle-related information in order to find clusters of similar customers. Then we can measure the clustering quality by observing the buying patterns of customers in the same cluster vs. those from different clusters.

More Data Mining Techniques

Data mining can also be performed using the following techniques:

1. Smoothing (Prepare the Data)

This data mining technique falls under the category of preparing the data. Its main intent is to remove noise from the data. Algorithms such as simple exponential smoothing and the moving average are used to remove the noise. During exploratory analysis, this technique is very handy for visualizing trends and sentiments.
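
A short sketch of moving-average smoothing in Python. pandas is assumed, and the noisy daily-sales series is invented for illustration.

# Smooth a noisy series with a simple moving average (window of 3).
import pandas as pd

sales = pd.Series([10, 14, 9, 20, 16, 25, 22])   # invented noisy daily sales
smoothed = sales.rolling(window=3).mean()         # the moving average removes short-term noise
print(smoothed)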

2. Aggregation (Prepare the Data)

As the term suggests, a group of data is aggregated to obtain more information. This technique is employed to give an overview of business objectives and can be performed manually or using specialized software. It is generally employed on big data, since big data does not provide the required information as a whole.
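
For example, a minimal aggregation in Python with pandas; the column names and values are invented for illustration.

# Aggregate transaction-level rows into per-region totals and averages.
import pandas as pd

df = pd.DataFrame({
    "region": ["north", "north", "south", "south"],
    "sales":  [100, 150, 80, 120],
})
summary = df.groupby("region")["sales"].agg(["sum", "mean"])   # aggregated overview per region
print(summary)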

There are different types of clustering methods. They are as follows.

  • Partitioning Methods
  • Hierarchical Agglomerative methods
  • Density-Based Methods
  • Grid-Based Methods
  • Model-Based Methods

The most popular clustering-related algorithm is Nearest Neighbour. The nearest neighbour technique is very similar to clustering: it is a prediction technique in which, to estimate a value in one record, you look for records with similar values in a historical database and use the prediction value from the record nearest to the unclassified record. The technique assumes that objects which are closer to each other have similar prediction values, so you can predict the value of the nearest items very quickly. Nearest Neighbour is one of the easiest techniques to use because it works the way people think. It also works very well in terms of automation and can perform complex ROI calculations with ease. The level of accuracy of this technique is as good as that of the other data mining techniques.
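
A minimal nearest-neighbour prediction sketch in Python. scikit-learn is assumed, and the small "historical database" below is invented.

# Predict a value for a new record from the values of its nearest
# neighbours in a small, invented historical database.
from sklearn.neighbors import KNeighborsRegressor
import numpy as np

X_hist = np.array([[1.0], [2.0], [3.0], [10.0]])   # one numeric attribute per historical record
y_hist = np.array([1.1, 1.9, 3.2, 10.5])           # known values for those records

knn = KNeighborsRegressor(n_neighbors=2).fit(X_hist, y_hist)
print(knn.predict([[2.5]]))   # averages the two closest historical records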

In business, the nearest neighbour technique is most often used in text retrieval: it is used to find documents that share important characteristics with a main document that has been marked as interesting.

3. Visualization

Visualization is a very useful technique for discovering data patterns, and it is used at the beginning of the data mining process. A lot of research is going on these days to produce interesting projections of databases, known as Projection Pursuit. Many data mining techniques can find useful patterns in good data, but visualization is a technique that converts poor data into useful data, allowing different kinds of data mining methods to be used to discover hidden patterns.

4. Induction Decision Tree Technique

A decision tree is a predictive model, and as the name implies, it looks like a tree. In this technique, each branch of the tree is viewed as a classification question, and the leaves of the tree are considered partitions of the dataset related to that particular classification. This technique can be used for exploratory analysis, data pre-processing, and prediction work.

The decision tree can be considered a segmentation of the original dataset, where the segmentation is done for a particular reason. The data points that fall under a segment share some similarities in the information being predicted. Decision trees provide results that the user can easily understand.

Statisticians mostly use the decision tree technique to find out which variables are most related to the business's problem. The decision tree technique can be used for prediction and data pre-processing.

The first and foremost step in this technique is growing the tree. Growing the tree comes down to finding the best possible question to ask at each branch of the tree. The decision tree stops growing under any one of the circumstances below:

  • The segment contains only one record.
  • All the records contain identical features.
  • The growth is not sufficient to make any further split.

CART, which stands for Classification and Regression Trees, is a data exploration and prediction algorithm that picks its questions in a more sophisticated way: it tries them all and then selects the single best question, which is used to split the data into two or more segments. After making the split, it again asks questions of each of the new segments individually.

Another popular decision tree technology is CHAID (Chi-Square Automatic Interaction Detector). It is similar to CART, but it differs in one way. CART helps in choosing the best questions, whereas CHAID helps in choosing the splits.
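
As a rough sketch, here is how a CART-style decision tree can be grown and inspected in Python. scikit-learn's DecisionTreeClassifier (which implements a CART-style algorithm) is assumed, and the records and feature names are invented.

# Grow a small CART-style decision tree and print the questions it learned.
from sklearn.tree import DecisionTreeClassifier, export_text

# Invented records: [income, age] -> bought (1) or did not buy (0)
X = [[30, 25], [45, 40], [60, 35], [20, 50], [80, 45], [25, 30]]
y = [0, 1, 1, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["income", "age"]))   # the split questions per branch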

5. Neural Network

The neural network is another important technique used these days. It is most often used in the early stages of the data mining process. Artificial neural networks came out of the artificial intelligence community.

Neural networks are straightforward to use, as they are automated to a certain extent; because of this, the user is not expected to have much knowledge about the work or the database. However, to make a neural network work efficiently, you need to know:

  • How are the nodes connected?
  • How many processing units to be used?
  • When should the training process be stopped?

There are two main parts to this technique: the node and the link.

  • The node, which loosely corresponds to a neuron in the human brain.
  • The link, which loosely corresponds to the connections between neurons in the human brain.

A neural network is a collection of interconnected neurons forming a single layer or multiple layers. The arrangement of the neurons and their interconnections is called the architecture of the network. There are many neural network models, and each model has its own advantages and disadvantages. Every neural network model has a different architecture, and these architectures use different learning procedures.
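
A minimal neural-network sketch in Python, showing the kind of choices mentioned above (how many hidden units, when to stop training). scikit-learn's MLPClassifier is assumed, and the tiny dataset is invented.

# A tiny multi-layer neural network: choose the number of hidden units
# and a stopping criterion (max_iter), then train and predict.
from sklearn.neural_network import MLPClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1]]   # invented inputs
y = [0, 1, 1, 0]                        # XOR-style labels

net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0).fit(X, y)
print(net.predict([[1, 0]]))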

Neural networks are a powerful predictive modelling technique, but they are not easy to understand, even for experts: they create very complex models that are nearly impossible to understand fully. Thus, to make the neural network technique more approachable, companies have been looking for new solutions, and two solutions have already been suggested.

  • The first solution is to package the neural network into a complete solution that can be used for a single application.
  • The second solution is to bundle it with expert consulting services.

Neural networks have been used in various kinds of applications; in business, they have been used to detect fraud.

6. Association Rule Technique

This technique helps to find associations between two or more items and to understand the relationships between different variables in a database. It discovers hidden patterns in the data set by identifying variables that frequently occur together.

An association rule offers two primary pieces of information:

  • Support – how often is the rule applicable?
  • Confidence – how often is the rule correct?

This technique follows a two-step process:

  • Find all the frequently occurring item sets.
  • Generate strong association rules from the frequent item sets.

There are three types of association rules:

  • Multilevel Association Rule
  • Multidimensional Association Rule
  • Quantitative Association Rule

This technique is most often used in the retail industry to find patterns in sales. This helps increase the conversion rate and thus increases profit.

7. Classification

Classification is the most commonly used data mining technique; it employs a set of pre-classified samples to create a model that can classify a large group of data. This technique helps in deriving important information about data and metadata (data about data). It is closely related to the cluster analysis technique, and it uses decision trees or neural networks. There are two main processes involved in this technique:

  • Learning – In this process, the data are analyzed by the classification algorithm.
  • Classification – In this process, the data are used to measure the precision of the classification rules.

There are different types of classification models. They are as follows

  • Classification by decision tree induction
  • Bayesian Classification
  • Neural Networks
  • Support Vector Machines (SVM)
  • Classification Based on Associations

One good example of the classification technique is an email provider classifying incoming messages as spam or not spam.

The 7 Most Important Data Mining Techniques

Data mining is the process of looking at large banks of information to generate new information. Intuitively, you might think that data “mining” refers to the extraction of new data, but this isn’t the case; instead, data mining is about extrapolating patterns and new knowledge from the data you’ve already collected.

Relying on techniques and technologies from the intersection of database management, statistics, and machine learning, specialists in data mining have dedicated their careers to better understanding how to process and draw conclusions from vast amounts of information. But what are the techniques they use to make this happen?

Data Mining Techniques

Data mining is highly effective, so long as it draws upon one or more of these techniques:

1. Tracking patterns. One of the most basic techniques in data mining is learning to recognize patterns in your data sets. This is usually a recognition of some aberration in your data happening at regular intervals, or an ebb and flow of a certain variable over time. For example, you might see that your sales of a certain product seem to spike just before the holidays, or notice that warmer weather drives more people to your website.

2. Classification. Classification is a more complex data mining technique that forces you to collect various attributes together into discernable categories, which you can then use to draw further conclusions, or serve some function. For example, if you’re evaluating data on individual customers’ financial backgrounds and purchase histories, you might be able to classify them as “low,” “medium,” or “high” credit risks. You could then use these classifications to learn even more about those customers.

3. Association. Association is related to tracking patterns, but is more specific to dependently linked variables. In this case, you’ll look for specific events or attributes that are highly correlated with another event or attribute; for example, you might notice that when your customers buy a specific item, they also often buy a second, related item. This is usually what’s used to populate “people also bought” sections of online stores.

4. Outlier detection. In many cases, simply recognizing the overarching pattern can’t give you a clear understanding of your data set. You also need to be able to identify anomalies, or outliers in your data. For example, if your purchasers are almost exclusively male, but during one strange week in July, there’s a huge spike in female purchasers, you’ll want to investigate the spike and see what drove it, so you can either replicate it or better understand your audience in the process.
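
A quick sketch of a simple way to flag such anomalies in Python, using z-scores over invented weekly purchase counts; the two-standard-deviation threshold is an arbitrary choice for illustration.

# Flag weeks whose purchase counts are more than 2 standard deviations from the mean.
import statistics

weekly_purchases = [52, 49, 55, 51, 48, 120, 50]   # invented counts; week 6 is the spike
mean = statistics.mean(weekly_purchases)
stdev = statistics.stdev(weekly_purchases)

outliers = [(i + 1, x) for i, x in enumerate(weekly_purchases)
            if abs(x - mean) > 2 * stdev]
print(outliers)   # the anomalous week(s) worth investigating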

5. Clustering. Clustering is very similar to classification, but involves grouping chunks of data together based on their similarities. For example, you might choose to cluster different demographics of your audience into different packets based on how much disposable income they have, or how often they tend to shop at your store.

6. Regression. Regression, used primarily as a form of planning and modeling, is used to identify the likelihood of a certain variable, given the presence of other variables. For example, you could use it to project a certain price, based on other factors like availability, consumer demand, and competition. More specifically, regression’s main focus is to help you uncover the exact relationship between two (or more) variables in a given data set.

7. Prediction. Prediction is one of the most valuable data mining techniques, since it’s used to project the types of data you’ll see in the future. In many cases, just recognizing and understanding historical trends is enough to chart a somewhat accurate prediction of what will happen in the future. For example, you might review consumers’ credit histories and past purchases to predict whether they’ll be a credit risk in the future.

Data Mining Tools

So do you need the latest and greatest machine learning technology to be able to apply these techniques? Not necessarily. In fact, you can probably accomplish some cutting-edge data mining with relatively modest database systems and simple tools that almost any company will have. And if you don't have the right tools for the job, you can always create your own.

However you approach it, data mining is the best collection of techniques you have for making the most out of the data you’ve already gathered. As long as you apply the correct logic, and ask the right questions, you can walk away with conclusions that have the potential to revolutionize your enterprise.

Hope this will definitely help you,

and all of you should upload the GDB on time.
