FOUNDATIONS OF DATA SCIENCE : CHAPTER 3 : EXAM PREPARATION : SPPU PYQ

- What are missing values?

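Missing values refer to the absence of data in a specific field or variable where information is expected. They arise from causes such as data-entry errors, survey non-response, equipment failure, or data that was never recorded.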
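A minimal sketch of detecting missing values, assuming the pandas library and a small hypothetical table of student records. pandas marks missing entries as NaN (or None in text columns), and `isna()` returns a Boolean mask that can be counted per column or used to pick out incomplete rows.

```python
import numpy as np
import pandas as pd

# Hypothetical student records; NaN and None both mark missing entries
df = pd.DataFrame({
    "name": ["Asha", "Ravi", None, "Meera"],
    "age": [21, np.nan, 22, 20],
    "marks": [78.5, 65.0, np.nan, np.nan],
})

# isna() gives True wherever a value is missing
mask = df.isna()

# Number of missing values in each column
missing_per_column = mask.sum()
print(missing_per_column)

# Rows that contain at least one missing value
rows_with_missing = df[mask.any(axis=1)]
print(rows_with_missing)
```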
- Define data cleaning.
Data cleaning, or data cleansing, is the process of identifying and correcting errors or inconsistencies in datasets. It involves handling missing values, removing duplicates, correcting inaccuracies, and ensuring data quality for accurate analysis. 
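A small cleaning pass, sketched with pandas on a hypothetical weather table. It assumes that -999 is an error code standing for "no reading"; the steps fix inconsistent text, turn that code into a proper missing value, and then remove duplicate rows.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with typical quality problems
raw = pd.DataFrame({
    "city": ["Pune", "pune ", "Mumbai", "Mumbai", "PUNE"],
    "temp_c": [31.0, 31.0, 29.5, 29.5, -999.0],  # -999 assumed to be an error code
})

clean = raw.copy()

# 1. Fix inconsistent text: trim spaces and use one capitalisation
clean["city"] = clean["city"].str.strip().str.title()

# 2. Treat the error code as a missing value, not a real temperature
clean["temp_c"] = clean["temp_c"].replace(-999.0, np.nan)

# 3. Remove exact duplicate rows revealed by the standardisation
clean = clean.drop_duplicates().reset_index(drop=True)
print(clean)
```

The order matters: duplicates such as "Pune" and "pune " only become identical after the text is standardised.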
- What is the purpose of data preprocessing?
Data preprocessing prepares raw data for analysis. Its purposes include cleaning the data (handling missing values, noise, and duplicates), integrating data from multiple sources, transforming it into a suitable format or scale, and reducing its size where needed, so that it is ready for analysis and machine learning algorithms.

- What is a Venn diagram? How is it created? Explain with an example.

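A Venn diagram is a visual representation of the relationships between sets. To create one, draw a rectangle for the universal set and one circle for each set inside it, overlapping the circles where the sets share elements. Elements that belong to both sets are placed in the overlapping region, and elements that belong to only one set go in the non-overlapping part of that set's circle.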

Example: If set A represents mammals and set B represents four-legged animals, the overlapping region contains animals that are both, such as dogs and cats. A whale falls only in A, and a lizard falls only in B.
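The regions of a two-set Venn diagram can be computed with Python's built-in set operations. This sketch uses hypothetical animal names for the mammals / four-legged example: difference gives each circle's non-overlapping part and intersection gives the overlap.

```python
# Hypothetical sets for the mammals / four-legged animals example
mammals = {"dog", "cat", "whale", "dolphin", "human"}
four_legged = {"dog", "cat", "lizard", "frog"}

only_mammals = mammals - four_legged        # left circle only
both = mammals & four_legged                # overlap of the two circles
only_four_legged = four_legged - mammals    # right circle only

print("Only A:", sorted(only_mammals))
print("A and B:", sorted(both))
print("Only B:", sorted(only_four_legged))
```

To draw the circles themselves, the third-party matplotlib-venn package (if installed) provides a `venn2` function that accepts these same two sets.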

- What is data quality? Which factors affect data quality?

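Data quality is the degree to which data is fit for its intended use, judged by its accuracy, completeness, consistency, and reliability. The main factors that determine data quality are:

Accuracy: Values correctly describe the real-world facts they represent.

Completeness: All required values are present, with no important fields missing.

Consistency: The same data agrees across records and sources (for example, one spelling for a city name).

Timeliness: Data is up to date and available when it is needed.

Relevance: Data is appropriate for the question or task at hand.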
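Some of these factors can be measured directly from the data. The sketch below, assuming pandas and a hypothetical customer table, computes completeness per column, counts repeated IDs, flags implausible ages as an accuracy check, and detects inconsistent spellings. Timeliness and relevance need outside context (timestamps, the analysis goal), so they are not checked here.

```python
import numpy as np
import pandas as pd

# Hypothetical customer table with deliberate quality problems
df = pd.DataFrame({
    "customer_id": [101, 102, 102, 104],
    "age": [34, np.nan, 28, -5],          # -5 is not a plausible age
    "country": ["India", "India", "india", None],
})

# Completeness: fraction of non-missing values in each column
completeness = df.notna().mean()

# Uniqueness: repeated IDs in a column that should be a key
duplicate_ids = int(df["customer_id"].duplicated().sum())

# Accuracy / validity: ages outside an assumed plausible range of 0 to 120
invalid_ages = int(((df["age"] < 0) | (df["age"] > 120)).sum())

# Consistency: more distinct spellings than distinct lower-cased values
countries = df["country"].dropna()
inconsistent_spelling = countries.nunique() > countries.str.lower().nunique()

print(completeness, duplicate_ids, invalid_ages, inconsistent_spelling)
```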

- State and explain any three data transformation techniques.

Normalization: Scaling values to a standard range, often between 0 and 1.

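Log Transformation: Applying a logarithm to each value to compress large values and reduce right (positive) skew. It requires positive values, so log(1 + x) is often used when the data contains zeros.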

Standardization: Transforming data to have a mean of 0 and a standard deviation of 1.
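The three techniques can be sketched with NumPy on a single hypothetical right-skewed feature (for example, incomes in thousands). Here `x.std()` is the population standard deviation (NumPy's default `ddof=0`); scikit-learn's `MinMaxScaler` and `StandardScaler` offer the same scalings for whole feature tables.

```python
import numpy as np

# Hypothetical right-skewed feature with one very large value
x = np.array([20.0, 25.0, 30.0, 40.0, 400.0])

# Normalization (min-max): rescale to the range [0, 1]
x_norm = (x - x.min()) / (x.max() - x.min())

# Log transformation: log(1 + x) compresses large values and tolerates zeros
x_log = np.log1p(x)

# Standardization (z-score): mean 0 and standard deviation 1
x_std = (x - x.mean()) / x.std()

print(x_norm, x_log, x_std, sep="\n")
```

Note how the single large value squeezes the other normalized values close to 0, while the log transform keeps them more evenly spread.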

- Define Data Discretization.
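Data discretization converts continuous numerical data into a finite set of intervals (bins) or categories, for example grouping ages into "child", "teen", "adult", and "senior". It simplifies the data, can smooth out small measurement noise, and is needed by algorithms that work only with categorical attributes. Common methods are equal-width binning (intervals of the same size) and equal-frequency binning (intervals holding roughly the same number of values).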
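A sketch of three binning choices using pandas `cut` and `qcut` on hypothetical ages. The bin edges and labels are illustrative assumptions, not standard values.

```python
import pandas as pd

ages = pd.Series([5, 17, 25, 42, 67, 80])

# Custom bins: right-closed intervals (0,12], (12,19], (19,59], (59,120]
age_group = pd.cut(ages, bins=[0, 12, 19, 59, 120],
                   labels=["child", "teen", "adult", "senior"])

# Equal-width binning: 3 intervals of the same width across the data range
equal_width = pd.cut(ages, bins=3, labels=["low", "mid", "high"])

# Equal-frequency binning: 3 intervals holding about the same number of values
equal_freq = pd.qcut(ages, q=3, labels=["low", "mid", "high"])

print(pd.DataFrame({"age": ages, "group": age_group,
                    "width": equal_width, "freq": equal_freq}))
```

Equal-width bins can end up unevenly filled (here three ages fall in "low"), while equal-frequency bins balance the counts at the cost of uneven widths.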

- List different types of attributes.

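Nominal Attributes: Categorical with no inherent order (e.g., blood group, colour).

Ordinal Attributes: Categorical with a meaningful order, although the gaps between values are not measured (e.g., rating: low, medium, high).

Interval Attributes: Numeric with equal intervals but no true zero, so ratios are not meaningful (e.g., temperature in °C).

Ratio Attributes: Numeric with equal intervals and a true zero point, so ratios are meaningful (e.g., weight, income).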
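The practical difference between the attribute types is which operations make sense. This sketch uses hypothetical values: pandas `Categorical` with `ordered=True` permits order comparisons for ordinal data, while a plain categorical (nominal) does not; the temperature example shows why ratios are misleading on an interval scale but meaningful on a ratio scale such as Kelvin.

```python
import pandas as pd

# Nominal: categories only, no order
blood_group = pd.Categorical(["A", "B", "O", "A"])

# Ordinal: categories with an explicit order low < medium < high
rating = pd.Categorical(["low", "high", "medium", "low"],
                        categories=["low", "medium", "high"], ordered=True)
below_medium = rating < "medium"     # order comparisons are allowed

# Interval: 20 °C / 10 °C = 2, but that ratio depends on where 0 °C sits
temps_c = [10.0, 20.0]
ratio_c = temps_c[1] / temps_c[0]

# Ratio scale: in Kelvin (true zero) the same temperatures differ by about 3.5 %
temps_k = [t + 273.15 for t in temps_c]
ratio_k = temps_k[1] / temps_k[0]

print(blood_group.ordered, rating.max(), list(below_medium), ratio_c, ratio_k)
```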

- Define Data object.
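In data science, a data object represents a single entity, such as a customer, a patient, or a product, and is described by a set of attributes. In a tabular dataset each row is a data object; data objects are also called samples, instances, records, or data points.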
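In a pandas DataFrame, each row is one data object and each column is an attribute. The sketch below assumes a small hypothetical patient table.

```python
import pandas as pd

# Each row is one data object (a hypothetical patient);
# each column is an attribute describing it
patients = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "age": [45, 32, 60],
    "blood_group": ["A", "O", "B"],
})

# One data object, returned as a Series of its attribute values
first_patient = patients.iloc[0]

# shape gives (number of data objects, number of attributes)
n_objects, n_attributes = patients.shape
print(first_patient, n_objects, n_attributes, sep="\n")
```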

- What is Data Transformation?
Data transformation involves converting data from one format or structure into another to make it more suitable for analysis or modeling.
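Transformation covers changes of structure as well as changes of format. This pandas sketch, using a hypothetical monthly sales table, reshapes a wide table into a long one with `melt` (structure) and converts month strings into datetime values with `to_datetime` (format).

```python
import pandas as pd

# Hypothetical wide table: one column per month
sales_wide = pd.DataFrame({
    "store": ["S1", "S2"],
    "2024-01": [100, 80],
    "2024-02": [120, 90],
})

# Structural transformation: wide -> long, one row per store and month
sales_long = sales_wide.melt(id_vars="store", var_name="month", value_name="sales")

# Format transformation: month text -> datetime values
sales_long["month"] = pd.to_datetime(sales_long["month"], format="%Y-%m")
print(sales_long)
```

The long layout is usually easier to group, filter, and plot, and the datetime type allows date arithmetic that plain strings do not.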

- Explain two methods of data cleaning for missing values.

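Imputation: Replacing missing values with estimated values, such as the mean or median of the column for numeric attributes, or the mode (most frequent value) for categorical attributes.

Deletion: Removing rows or columns that contain missing values. It is simple but discards information, so it suits cases where only a small fraction of the data is missing.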
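Both methods are sketched below with pandas on a hypothetical table: `dropna` performs row-wise deletion, and `fillna` imputes the numeric column with its mean and the text column with its mode.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing age and one missing city
df = pd.DataFrame({
    "age": [25, np.nan, 35, 40],
    "city": ["Pune", "Nashik", None, "Pune"],
})

# Deletion: drop every row that has at least one missing value
dropped = df.dropna()

# Imputation: mean for the numeric column, mode for the categorical column
imputed = df.copy()
imputed["age"] = imputed["age"].fillna(imputed["age"].mean())
imputed["city"] = imputed["city"].fillna(imputed["city"].mode()[0])

print(dropped, imputed, sep="\n\n")
```

Deletion keeps only 2 of the 4 rows here, which shows how quickly it can shrink a small dataset; the median is a safer choice than the mean when the column has outliers.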

- Explain any one technique of data transformation.

Normalization:

Normalization is a data transformation technique that rescales numerical features to a common range, typically 0 to 1. This prevents features with large numeric ranges (for example, income in rupees compared with age in years) from dominating distance-based and gradient-based machine learning models. The most common form is Min-Max normalization, x' = (x - min) / (max - min), where min and max are the smallest and largest values of the feature. Because it depends on the minimum and maximum, it is sensitive to outliers.
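The Min-Max formula and a worked example on the hypothetical values 10, 20, 30, 50 (so the minimum is 10 and the maximum is 50):

```latex
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}

x = 30: \quad x' = \frac{30 - 10}{50 - 10} = \frac{20}{40} = 0.5

x = 10 \mapsto 0, \qquad x = 20 \mapsto 0.25, \qquad x = 50 \mapsto 1
```

The smallest value always maps to 0 and the largest to 1. To rescale to a general range [a, b] instead, the standard extension is x' = a + (x - x_min)(b - a) / (x_max - x_min).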