FOUNDATIONS OF DATA SCIENCE : CHAPTER 1 : EXAM PREPARATION : SPPU PYQ

- What is data science?

Data science is a field that involves using scientific methods, processes, algorithms, and systems to extract insights and knowledge from structured and unstructured data.

- Define data source.
A data source is any location, platform, or system from which data originates or is collected.

- List applications of data science.

Climate Modeling: Studying climate patterns and making predictions for climate change mitigation.

Recommendation Systems: Providing personalized recommendations for products, movies, music, etc.

Social Network Analysis: Studying relationships and patterns within social networks.

Predictive Analytics: Forecasting future trends and outcomes based on historical data.

Fraud Detection: Identifying and preventing fraudulent activities by analyzing patterns.

Customer Segmentation: Grouping customers based on common characteristics for targeted marketing.

- What is data transformation?
Data transformation refers to the process of converting raw data into a more suitable format for analysis. 
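
As an illustration, the sketch below applies three common transformations to a few records: trimming and standardizing text, parsing date strings, and min-max scaling a numeric column. The records, field names, and the `transform` function are hypothetical and chosen only for this example; real projects often use a library such as Pandas for the same steps.

```python
from datetime import datetime

# Hypothetical raw records, with every field stored as an untidy string.
raw_rows = [
    {"name": " Asha ", "joined": "2023-01-15", "salary": "52000"},
    {"name": "ravi", "joined": "2022-11-03", "salary": "48000"},
    {"name": "MEERA", "joined": "2024-06-30", "salary": "61000"},
]


def transform(rows):
    """Convert raw string fields into typed, consistently formatted values."""
    salaries = [float(r["salary"]) for r in rows]
    lo, hi = min(salaries), max(salaries)
    cleaned = []
    for r, salary in zip(rows, salaries):
        cleaned.append({
            # Trim whitespace and standardize capitalization.
            "name": r["name"].strip().title(),
            # Parse the date string into a date object.
            "joined": datetime.strptime(r["joined"], "%Y-%m-%d").date(),
            "salary": salary,
            # Min-max scaling maps the salary range onto [0, 1].
            "salary_scaled": (salary - lo) / (hi - lo) if hi > lo else 0.0,
        })
    return cleaned


transformed = transform(raw_rows)
for row in transformed:
    print(row)
```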

- What is the use of a bubble plot?
A bubble plot is a variation of a scatter plot in which a third variable is shown through the size of the markers (bubbles). It is useful for visualizing three variables in a two-dimensional space: two variables set the position of each bubble, and the third sets its size.
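
A minimal sketch of the idea, assuming Matplotlib is installed for the drawing step. The data values and the helper names `bubble_areas` and `draw_bubble_plot` are made up for illustration. The third variable is scaled into marker areas and passed to `scatter` through its `s` parameter.

```python
def bubble_areas(values, min_area=50.0, max_area=1000.0):
    """Scale a third variable linearly into marker areas (points squared)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values equal: give every bubble the same mid-range area.
        return [(min_area + max_area) / 2 for _ in values]
    return [min_area + (v - lo) / (hi - lo) * (max_area - min_area)
            for v in values]


# Hypothetical data: two variables for position, one for bubble size.
ad_spend = [10, 20, 30, 40]        # x-axis
revenue = [25, 40, 70, 65]         # y-axis
customers = [100, 400, 250, 900]   # third variable, shown as size

areas = bubble_areas(customers)
print(areas)


def draw_bubble_plot():
    """Draw the bubble plot; requires Matplotlib to be installed."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(ad_spend, revenue, s=areas, alpha=0.5)
    ax.set_xlabel("Ad spend")
    ax.set_ylabel("Revenue")
    ax.set_title("Bubble size = number of customers")
    plt.show()
```

Calling `draw_bubble_plot()` opens the figure. The scaling is kept in a separate function so it can be checked without a display.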

- List the tools for a data scientist.

Python (with libraries like NumPy, Pandas, Scikit-learn)

Jupyter Notebooks

Tableau

Excel (for basic analysis)

- Explain different data formats in brief.

CSV (Comma-Separated Values): Text-based format where values are separated by commas.

JSON (JavaScript Object Notation): Lightweight, text-based data interchange format built from key-value pairs and arrays.

Excel Spreadsheets: Tabular format with rows and columns.
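
The difference between CSV and JSON can be seen by loading similar records from each, as in the sketch below. It uses only Python's standard `csv` and `json` modules; the sample records are invented for illustration. Excel files are left out because reading them typically needs a third-party library.

```python
import csv
import io
import json

# Hypothetical sample data held in strings so the example is self-contained.
csv_text = "name,age,city\nAsha,29,Pune\nRavi,34,Mumbai\n"
json_text = (
    '[{"name": "Asha", "age": 29, "skills": ["SQL", "Python"]},'
    ' {"name": "Ravi", "age": 34, "skills": []}]'
)

# CSV: flat rows; every value is read as a string.
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# JSON: numbers keep their type, and values can be nested lists or objects.
json_rows = json.loads(json_text)

print(type(csv_rows[0]["age"]).__name__)
print(type(json_rows[0]["age"]).__name__)
print(json_rows[0]["skills"])
```

In practice this means CSV columns usually need an explicit type conversion after loading, while JSON needs flattening before it fits a table.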

- Define volume characteristic of data in reference to data science.
Volume refers to the sheer size of data. In data science, dealing with large volumes of data is common, and technologies like big data tools and distributed computing are employed to handle and analyze massive datasets.
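
One basic technique behind handling large volumes is to process data as a stream instead of loading it all into memory. The sketch below shows this on a single machine with a hypothetical `running_mean` function and an in-memory stand-in for a large file. Distributed tools go further by splitting the work across many machines, which this example does not attempt.

```python
import io


def running_mean(lines):
    """Compute the mean of one number per line without storing the values."""
    count = 0
    total = 0.0
    for line in lines:  # a file object yields one line at a time
        line = line.strip()
        if not line:
            continue
        total += float(line)
        count += 1
    return total / count if count else None


# Stand-in for a large file on disk; a real file object works the same way.
fake_file = io.StringIO("\n".join(str(n) for n in range(1, 100_001)))
mean_value = running_mean(fake_file)
print(mean_value)
```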

- Give examples of semistructured data.

XML (eXtensible Markup Language)

JSON (JavaScript Object Notation)
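
What makes these formats semistructured is that records carry their own field names, but not every record has to contain the same fields. The sketch below, using Python's standard `xml.etree.ElementTree` and `json` modules on invented sample data, reads an optional `email` field that is missing from one record.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical records: the second student has no email in either format.
xml_text = """
<students>
  <student id="1"><name>Asha</name><email>asha@example.com</email></student>
  <student id="2"><name>Ravi</name></student>
</students>
""".strip()

json_text = (
    '[{"id": 1, "name": "Asha", "email": "asha@example.com"},'
    ' {"id": 2, "name": "Ravi"}]'
)

# XML: tags label each value; a missing tag falls back to the default.
root = ET.fromstring(xml_text)
xml_emails = [s.findtext("email", default="(missing)")
              for s in root.findall("student")]

# JSON: keys label each value; a missing key falls back to the default.
json_emails = [rec.get("email", "(missing)") for rec in json.loads(json_text)]

print(xml_emails)
print(json_emails)
```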

- Explain any two ways in which data is stored in files.

CSV (Comma-Separated Values): Text-based format with values separated by commas.

JSON (JavaScript Object Notation): Lightweight, text-based data interchange format built from key-value pairs and arrays.

- Explain any two tools in data scientist tool box.

Jupyter Notebooks: For interactive and collaborative coding.

Git: Version control system for tracking changes in code.

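- Explain the data science life cycle with a suitable diagram.
The data science life cycle typically involves the stages of problem definition, data collection, data cleaning, exploration, modeling, evaluation, and deployment. It is a cyclical process: insights gained at later stages drive further iterations.

Problem definition -> Data collection -> Data cleaning -> Exploration -> Modeling -> Evaluation -> Deployment -> (back to Problem definition)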
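
The stages can be sketched as a toy pipeline in plain Python. Everything here is an assumption made for illustration: the problem (predicting an exam score from hours studied), the data, and the function names. The model is a simple least-squares line, and for brevity it is evaluated on the same data it was fitted to, whereas a real project would use held-out data.

```python
def collect():
    """Data collection: raw (hours_studied, score) pairs, some incomplete."""
    return [(1, 35), (2, 47), (None, 50), (3, 53), (4, 66), (5, None), (6, 84)]


def clean(rows):
    """Data cleaning: drop records with missing values."""
    return [(h, s) for h, s in rows if h is not None and s is not None]


def explore(rows):
    """Exploration: basic summary statistics."""
    scores = [s for _, s in rows]
    return {"n": len(rows), "min": min(scores), "max": max(scores)}


def fit_line(rows):
    """Modeling: least squares fit of score = slope * hours + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def evaluate(rows, model):
    """Evaluation: mean absolute error of the fitted line."""
    slope, intercept = model
    return sum(abs(y - (slope * x + intercept)) for x, y in rows) / len(rows)


def predict(model, hours):
    """Deployment: the fitted model answers new queries."""
    slope, intercept = model
    return slope * hours + intercept


cleaned = clean(collect())
summary = explore(cleaned)
model = fit_line(cleaned)
error = evaluate(cleaned, model)
print(summary, model, error, predict(model, 5))
```

If the error is too high, the cycle repeats: more data is collected or a different model is tried.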

- Differentiate between structured data and unstructured data.

Structured Data: Well-organized data with a clear format, often stored in databases.

Unstructured Data: Data lacking a predefined data model or structure, such as text, images, or videos.
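
The practical difference shows up when querying the data. In the sketch below (the records and the review text are invented), structured records can be filtered directly by field name, while the same information inside free text must first be extracted, here with a regular expression that only works for this particular wording.

```python
import re

# Structured: every record has the same named fields.
orders = [
    {"order_id": 101, "customer": "Asha", "amount": 1200.0},
    {"order_id": 102, "customer": "Ravi", "amount": 450.0},
]
large_orders = [o["order_id"] for o in orders if o["amount"] > 1000]

# Unstructured: free text with no predefined fields.
review = ("Ordered on Monday, order 101 arrived late. "
          "Paid Rs. 1200 but the box was damaged.")
amounts_in_text = [int(m) for m in re.findall(r"Rs\.\s*(\d+)", review)]

print(large_orders)
print(amounts_in_text)
```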

- Define data science.

Data science is the in-depth study of large amounts of data, which involves extracting meaningful insights from raw, structured, and unstructured data.

- Write any two applications of data science.

1) Healthcare Predictive Analytics:

Application: Predicting disease outcomes, optimizing patient care, and personalized medicine.

2) E-commerce Recommendation Systems:

Application: Enhancing user experience and driving sales through personalized product recommendations.