# Data Mining Exam Questions Help

Multiple Choice Questions: There may be more than one correct answer; choose all that apply.

Describe Questions: Please describe briefly, in no more than 500 words.

• Data Mining is:
1. Most applicable in large datasets
2. Discovering patterns and hidden trends in the data
3. Retrospective analyses of data
4. For providing accurate models and correct predictions

ALL OF THE ABOVE

• (T/F) Data Mining requires a good understanding of statistics and computer science

TRUE

• Data Mining relies on:
1. Cleaned and Curated data
2. Unstructured data
3. Computational efficiency of the algorithm
4. Training data
5. Non-experimental (Observational) data
• The model selection process depends on several criteria including:
1. Hypothesis to be proved or disproved
2. Type of data available
3. Underlying methods such as association, etc.
4. All of the above
• (T/F) Association mining typically requires you to identify strong rules that satisfy minimum support and confidence thresholds.
• Interestingness of patterns in a dataset can be determined by these methods:
1. Correlation
2. Association Rules
3. Classification
4. Lift & Chi Square Test
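
For reference, the support, confidence, and lift measures that come up in the association-rule questions above can be sketched as follows. This is a minimal illustration, not a full rule-mining algorithm; the transactions and item names are hypothetical and are not taken from the exam.

```python
# Hypothetical toy transactions; the item names are illustrative only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence divided by the baseline support of the consequent.

    A value near 1 suggests the two sides occur together about as often
    as independence would predict.
    """
    return confidence(antecedent, consequent, transactions) / support(
        consequent, transactions
    )


rule = ({"bread"}, {"milk"})
print("support   :", support(rule[0] | rule[1], transactions))
print("confidence:", confidence(*rule, transactions))
print("lift      :", lift(*rule, transactions))
```

A rule is usually called "strong" when its support and confidence both clear user-chosen minimums; lift then indicates whether the rule does better than chance co-occurrence.
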
• (T/F) R² is a measure of the explanatory power of the independent variables
• (T/F) Model fit refers to how well the variables correlate with one another in a model
• Sensitivity and Specificity are two values useful in:
1. Sigmoid curve
2. Logit curve
3. Sinusoidal curve
4. None of the above
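
As a reminder of what the two values in the question above measure, here is a minimal sketch that computes sensitivity (true positive rate) and specificity (true negative rate) from binary labels. The label vectors are made up for illustration, and the function name is an assumption, not part of any library.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    sensitivity = tp / (tp + fn)  # share of actual positives that were caught
    specificity = tn / (tn + fp)  # share of actual negatives correctly rejected
    return sensitivity, specificity


# Hypothetical ground truth and classifier output.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```
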
• (T/F) It's best to compare and contrast models using information criteria (AIC/BIC) for individual and hybrid models.
• Statistical inference refers to:
1. Predicting the outcome of a model run
2. Probability of an event occurrence
3. Measuring dependent variable and any error terms to arrive at a solution
4. None of the above
• (T/F) Sample and Population in statistics refer to how clean the dataset is before data modeling
• The following technique is useful for a single descriptive measure of income by age:
1. Variance
2. Central Tendency
3. Outliers
4. All of the above
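
To make the idea of a single descriptive measure per group concrete, the sketch below summarizes income within age bands using the mean and the median from Python's standard `statistics` module. The age bands and income figures are invented for illustration only.

```python
from statistics import mean, median

# Hypothetical incomes grouped by age band (illustrative numbers only).
incomes_by_age = {
    "20-29": [32000, 35000, 41000],
    "30-39": [48000, 52000, 250000],  # one very large value skews the mean
}

# One summary number per age band, computed two different ways.
summary = {
    band: {"mean": mean(values), "median": median(values)}
    for band, values in incomes_by_age.items()
}

for band, stats in summary.items():
    print(band, f"mean={stats['mean']:.0f}", f"median={stats['median']:.0f}")
```

In the second band the mean is pulled far above the median by a single large income, which is why the choice of summary measure matters when the data contain extreme values.
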
• (T/F) Probability theory is useful in statistics for improving upon a ‘random guess’ about events occurring
• Probability of joint occurrence refers to:
1. Two independent events
2. Co-occurring events
3. Conditionally independent events
4. Multiplying the probabilities of individual events
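
As a small worked example of joint probability, the sketch below enumerates a coin flip combined with a die roll and checks that, for these two independent events, the joint probability equals the product of the individual probabilities. The experiment is a hypothetical illustration; for dependent events the product rule would instead use a conditional probability, P(A and B) = P(A) · P(B | A).

```python
from fractions import Fraction
from itertools import product

# Sample space: one coin flip combined with one die roll (12 equally likely outcomes).
outcomes = list(product(["H", "T"], range(1, 7)))


def prob(event):
    """Exact probability of `event`, a predicate over (coin, die) outcomes."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))


def is_heads(o):
    return o[0] == "H"


def is_six(o):
    return o[1] == 6


p_heads = prob(is_heads)
p_six = prob(is_six)
p_joint = prob(lambda o: is_heads(o) and is_six(o))

print("P(heads)         =", p_heads)
print("P(six)           =", p_six)
print("P(heads and six) =", p_joint)
print("product rule holds:", p_joint == p_heads * p_six)
```
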
• In the article: Advanced Scout – Data Mining and Knowledge Discovery in NBA Data

Describe the purpose of creating the data mining software (application), i.e., what value does it add?

• In the article: Advanced Scout – Data Mining and Knowledge Discovery in NBA Data

Describe the 4 general steps the application uses as part of data mining, including a possible data structure from which the application reads the data.

• A few applications of Text Mining & NLP (Natural Language Processing) are:
1. Web reviews and ratings
2. Medical Records