FOUNDATIONS OF DATA SCIENCE : CHAPTER 2 : EXAM PREPARATION : SPPU PYQ

- Define standard deviation.

Standard deviation is a measure of the amount of variation (spread) in a set of values. It indicates how far individual data points typically lie from the mean (average) of the dataset, and it is the square root of the variance.
- Define statistical data analysis.
Statistical Data Analysis involves using statistical methods to explore, summarize, and draw inferences from data. It includes descriptive statistics, hypothesis testing, regression analysis, and other techniques to understand patterns and relationships in the data.
- What is a data cube?
A data cube is a multidimensional representation of data, where values are organized along multiple dimensions. It allows for the analysis of data by enabling users to slice, dice, and drill down into the information.
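
As a rough illustration of slicing, dicing, and drilling down, the sketch below models a tiny cube as a plain Python dictionary keyed by (region, product, quarter). The sales figures, dimension names, and helper functions are all hypothetical and chosen only for this example; real OLAP tools handle this far more efficiently.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, units_sold)
facts = [
    ("North", "Laptop", "Q1", 120),
    ("North", "Phone", "Q1", 200),
    ("South", "Laptop", "Q1", 80),
    ("South", "Phone", "Q2", 150),
    ("North", "Laptop", "Q2", 100),
]

# Build the cube: one cell per combination of dimension values.
cube = defaultdict(int)
for region, product, quarter, units in facts:
    cube[(region, product, quarter)] += units


def slice_cube(cube, quarter):
    """Slice: fix a single value on one dimension (here, the quarter)."""
    return {(r, p): v for (r, p, q), v in cube.items() if q == quarter}


def dice_cube(cube, regions, quarters):
    """Dice: keep a sub-cube by restricting two or more dimensions."""
    return {
        key: v
        for key, v in cube.items()
        if key[0] in regions and key[2] in quarters
    }


def aggregate(cube, keep):
    """Sum the cells, keeping only the dimensions at the index positions in `keep`."""
    totals = defaultdict(int)
    for key, v in cube.items():
        totals[tuple(key[i] for i in keep)] += v
    return dict(totals)


q1_slice = slice_cube(cube, "Q1")
north_q1_dice = dice_cube(cube, regions={"North"}, quarters={"Q1"})
by_region = aggregate(cube, keep=(0,))              # coarse view: totals per region
by_region_product = aggregate(cube, keep=(0, 1))    # drill down: region and product
```

Moving from `by_region` to `by_region_product` is the drill-down step: the same totals are broken out along one more dimension.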

- What are the measures of central tendency? Explain any two of them in brief.

Measures of central tendency describe the center or typical value of a data set. The three main measures are the mean, median, and mode. Two of them are:

Mean (Average): It is calculated by summing up all values and dividing by the total number of values.

Median: It is the middle value when data is arranged in ascending order. If there's an even number of values, the median is the average of the two middle values.
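
The two definitions above can be sketched directly in Python. The function names are illustrative, and the six-value list is made up simply to show the even-count case; the standard library's `statistics` module provides equivalent functions.

```python
def manual_mean(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)


def manual_median(values):
    """Middle value of the sorted data; average of the two middle values for an even count."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_data = [14, 9, 13, 16, 25, 7, 12]   # 7 values
even_data = [14, 9, 13, 16, 7, 12]      # 6 values (illustrative)

mean_odd = manual_mean(odd_data)
median_odd = manual_median(odd_data)     # sorted: 7 9 12 13 14 16 25
median_even = manual_median(even_data)   # sorted: 7 9 12 13 14 16
```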

- What are the various types of data available? Give an example of each.

Nominal Data: Categorical data with no inherent order (e.g., colors, types of fruit).

Ordinal Data: Categorical data with a meaningful order (e.g., ranking in a race, customer satisfaction levels).

Interval Data: Numeric data with equal intervals but no true zero point (e.g., temperature in Celsius).

Ratio Data: Numeric data with equal intervals and a true zero point (e.g., height, weight).

- What is an outlier? State the types of outliers.

An outlier is an observation that lies an abnormal distance from other values in a random sample from a population. Types of outliers include:

Univariate Outliers: Unusual values in a single variable.

Multivariate Outliers: Unusual combinations of values across multiple variables.
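
One common way to flag univariate outliers is the z-score rule, sketched below. The readings are hypothetical, and the threshold of 2 standard deviations is an assumption made for this small sample (3 is also widely used for larger datasets); it is one convention among several, not a fixed definition.

```python
import statistics


def zscore_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [x for x in values if abs(x - mu) / sigma > threshold]


readings = [10, 12, 11, 13, 12, 11, 40]  # hypothetical data with one extreme value
outliers = zscore_outliers(readings)
```

Because an extreme value also inflates the mean and standard deviation it is measured against, this rule can miss outliers in very small samples.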

- What is a quartile?
Quartiles divide a sorted dataset into four equal parts. The three quartiles (Q1, Q2, and Q3) are the cut points that separate the data into quarters: Q1 is the 25th percentile, Q2 is the median (50th percentile), and Q3 is the 75th percentile.
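
A minimal sketch of computing quartiles with Python's standard library (Python 3.8 or later). Textbooks and software use several slightly different conventions for locating Q1 and Q3; the `"inclusive"` method used here is one assumption among them, so hand-calculated answers may differ a little.

```python
import statistics

data = [14, 9, 13, 16, 25, 7, 12]

# quantiles() sorts the data internally; n=4 asks for the three quartile cut points.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# The interquartile range (IQR) is the spread of the middle half of the data.
iqr = q3 - q1
```

For this data the inclusive method gives Q1 = 10.5, Q2 = 13, and Q3 = 15, so the IQR is 4.5.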

- State the methods of feature selection.

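Filter Methods: Select features based on statistical characteristics (e.g., correlation with the target variable), independently of any model.

Wrapper Methods: Evaluate feature subsets using a specific machine learning model and keep the subset that performs best.

Embedded Methods: Perform feature selection as part of model training (e.g., Lasso regularization).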
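
As an illustration of a filter method, the sketch below ranks features by the absolute value of their Pearson correlation with the target and keeps the top k. The feature names, the data, and the helper functions are hypothetical; correlation is only one of several statistics a filter method might use.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)


def select_top_k(features, target, k):
    """Filter method: score each feature on its own, then keep the k best."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


# Hypothetical dataset: three candidate features and an exam-score target.
features = {
    "hours_studied": [1, 2, 3, 4, 5],
    "shoe_size": [7, 9, 8, 7, 9],
    "constant": [1, 1, 1, 1, 1],
}
target = [52, 58, 61, 70, 74]

best_one = select_top_k(features, target, k=1)
best_two = select_top_k(features, target, k=2)
```

No model is trained at any point, which is what distinguishes a filter method from a wrapper method.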

- List any two libraries used in Python for data analysis.

Pandas: For data manipulation and analysis.

NumPy: For numerical operations and array processing.
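
A short sketch of both libraries on a small array, assuming NumPy and pandas are installed. One detail worth remembering for exams and for practice: NumPy's `std()` divides by n by default (population formula), while pandas' `std()` divides by n - 1 by default (sample formula), so the two give different answers on the same data.

```python
import numpy as np
import pandas as pd

x = np.array([14, 9, 13, 16, 25, 7, 12])

np_mean = x.mean()
np_std = x.std()        # NumPy default ddof=0: divides by n (population)

s = pd.Series(x, name="X")
pd_median = s.median()
pd_std = s.std()        # pandas default ddof=1: divides by n - 1 (sample)

# Passing ddof explicitly makes the two libraries agree.
pd_std_population = s.std(ddof=0)
```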

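- Explain the role of statistics in data science.
Statistics helps in making sense of data by providing methods for summarizing, analyzing, and interpreting information. Descriptive statistics (such as the mean, median, and standard deviation) summarize a dataset, while inferential statistics (such as hypothesis testing and regression analysis) support conclusions about a population based on a sample.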

- Calculate the variance and standard deviation for the following data.

 X : 14 9 13 16 25 7 12

Mean (X̄) = (14 + 9 + 13 + 16 + 25 + 7 + 12) / 7 = 96 / 7 ≈ 13.71

Variance (σ²) = Σ(Xᵢ - X̄)² / n = (0.08 + 22.22 + 0.51 + 5.22 + 127.37 + 45.08 + 2.94) / 7 ≈ 203.43 / 7 ≈ 29.06

Standard Deviation (σ) = √Variance ≈ √29.06 ≈ 5.39
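
The calculation can be checked step by step in Python. This sketch follows the population formula (dividing by n) used above and also shows the sample formula (dividing by n - 1), in case the question expects that convention; the variable names are illustrative.

```python
import math

x = [14, 9, 13, 16, 25, 7, 12]
n = len(x)

mean = sum(x) / n                                 # 96 / 7
squared_devs = [(xi - mean) ** 2 for xi in x]     # (Xi - mean)^2 for each value
sum_sq = sum(squared_devs)

pop_variance = sum_sq / n                         # population variance: divide by n
pop_std = math.sqrt(pop_variance)

sample_variance = sum_sq / (n - 1)                # sample variance: divide by n - 1
sample_std = math.sqrt(sample_variance)

print(f"mean = {mean:.2f}")
print(f"population: variance = {pop_variance:.2f}, std = {pop_std:.2f}")
print(f"sample:     variance = {sample_variance:.2f}, std = {sample_std:.2f}")
```

With the population formula the variance is about 29.06 and the standard deviation about 5.39; with the sample formula they are about 33.90 and 5.82.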

- Write a short note on hypothesis testing.
Hypothesis testing is a statistical method to make inferences about a population based on a sample. It involves stating a null hypothesis (H₀) and an alternative hypothesis (H₁), choosing a significance level, collecting data and computing a test statistic from it, and then rejecting or failing to reject H₀.
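
As one concrete illustration, the sketch below runs a two-sided one-sample z-test using only the standard library. The test scores, the hypothesized mean of 50, the assumed known population standard deviation of 4, and the 0.05 significance level are all hypothetical; a z-test is just one of many tests, and a t-test is the usual choice when the population standard deviation is unknown.

```python
import math
from statistics import NormalDist, mean


def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: population mean == mu0, with known sigma."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / math.sqrt(n))   # test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))        # two-sided p-value
    reject_h0 = p_value < alpha
    return z, p_value, reject_h0


# Hypothetical sample of 9 exam scores.
scores = [52, 55, 49, 58, 54, 53, 57, 51, 56]

# H0: the population mean score is 50.  H1: it is not 50.
z, p_value, reject_h0 = one_sample_z_test(scores, mu0=50, sigma=4)
print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

Here the sample mean is far enough from 50, relative to its standard error, that H₀ is rejected at the 5% level.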