GATE 2025: Data Science & AI - Machine Learning Practice + PYQs (Part 1)

Introduction

Welcome, students! We are excited to kick off our live session today where we'll dive into a variety of machine learning questions. To start, please let me know in the chat section whether I’m properly audible and visible. A quick thumbs up would be appreciated so we can proceed with the session.

Session Overview

In today's session, we will discuss a wide range of questions related to machine learning. These questions have been curated from various resources and are tailored to help you learn how to solve GATE-level problems. We have a total of three machine learning sessions lined up, on Wednesday, Thursday, and Friday. Together, these sessions are designed to cover the essential machine learning algorithms and basic concepts, including cross-validation, the bias-variance tradeoff, and more.

Remember, if at any point you have any doubts, feel free to ask in the chat section. Let’s dive into our first question!

Understanding Classification and Regression

The distinction between classification and regression tasks is fundamental in machine learning. If the output variable is numerical (a continuous value), the task is a regression problem. If the output is a categorical value or class label, the task is a classification problem.

Question 1

A company wants to launch a new product and seeks to know whether it will succeed or fail based on data from the last 100 products launched. Given that the output variable is categorical (success or failure), this qualifies as a classification problem.

Question 2

Suppose we analyze several Bay Area tech companies to determine which features influence average employee salary. The target (average salary) is numerical rather than categorical, so this is a regression task: we model a continuous outcome and then examine which features matter most.

Question 3

Consider a dataset of 100 individuals with genomic information, used to predict whether each individual has a particular disease. Again, this is a classification problem, as the output is categorical (disease present or absent).

Question 4

Classifying movie reviews as positive, negative, or neutral exemplifies supervised learning since we are working with labeled data.

Reinforcement Learning Explanation

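Imagine a newborn learning to walk: this scenario best describes reinforcement learning. No labeled dataset tells the child the correct movement. Instead, the child learns through trial and error, adjusting their strategy based on feedback from successes (staying upright) and failures (falling).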
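To make trial-and-error learning concrete, here is a minimal sketch that is not from the session: an epsilon-greedy agent facing a hypothetical three-armed bandit. Each arm succeeds with a fixed probability that the agent never sees, so it must discover the best arm from reward feedback alone. The arm probabilities, `epsilon`, step count and seed are illustrative assumptions, and a bandit is a simplified setting (no states), not full reinforcement learning.

```python
import random

def epsilon_greedy_bandit(probs, steps=5000, epsilon=0.1, seed=0):
    """Learn which arm pays best using only reward feedback (trial and error).

    probs: hypothetical success probability of each arm; hidden from the agent.
    """
    rng = random.Random(seed)
    n_arms = len(probs)
    counts = [0] * n_arms    # number of times each arm was tried
    values = [0.0] * n_arms  # running average reward observed for each arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: try a random arm
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit: best estimate so far
        reward = 1.0 if rng.random() < probs[arm] else 0.0  # success (1) or failure (0)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update
    return values, counts

values, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
best = max(range(len(values)), key=lambda a: values[a])
print("estimated success rates:", [round(v, 2) for v in values])
print("learned best arm:", best)
```

The `epsilon` parameter balances exploring untried options against exploiting what has worked so far, much like the child occasionally trying a new way of stepping.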

Question 5

Predicting an election outcome (which candidate wins) is a classification task. By contrast, predicting the weight of a giraffe from its height is a regression task, since weight is a continuous variable.

Question 6

Predicting the emotion expressed in a sentence is again a classification task, as the output is one of a fixed set of categories.

Question 7

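Regarding linear regression, gradient descent is used to minimize the cost function. The linear regression model $Y = WX + B$ seeks the values of $W$ and $B$ that minimize the prediction error, typically measured by the mean squared error $J(W, B) = \frac{1}{n}\sum_{i=1}^{n}\big(y_i - (Wx_i + B)\big)^2$. Gradient descent starts from an initial guess and repeatedly updates $W \leftarrow W - \alpha\,\partial J/\partial W$ and $B \leftarrow B - \alpha\,\partial J/\partial B$, where $\alpha$ is the learning rate, until the error stops decreasing.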
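A minimal sketch of batch gradient descent for $Y = WX + B$ with the mean squared error cost. The dataset (generated from y = 2x + 1 with no noise), the learning rate and the number of epochs are hypothetical choices for illustration, not values from the session.

```python
def gradient_descent(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w*x + b by minimizing mean squared error with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residuals (prediction - target) for the current parameters
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of J = (1/n) * sum(errors^2) with respect to w and b
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Step opposite to the gradient, scaled by the learning rate
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]  # noise-free data from y = 2x + 1
w, b = gradient_descent(xs, ys)
print(f"W = {w:.3f}, B = {b:.3f}")  # approaches W = 2, B = 1
```

If the learning rate is too large the updates overshoot and diverge; if it is too small, convergence becomes very slow.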

Outlier Sensitivity in Linear Regression

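Linear regression is sensitive to outliers. Because ordinary least squares minimizes the sum of squared residuals, a single point far from the trend contributes a very large squared error and can pull the fitted line toward itself, skewing the results. It is therefore essential to detect and account for such anomalies in a dataset.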
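The effect is easy to demonstrate with the closed-form least-squares fit for a single feature. In this sketch the data points and the single outlier are hypothetical values chosen for illustration.

```python
def ols_fit(xs, ys):
    """Closed-form ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]  # perfectly linear: y = 2x
clean = ols_fit(xs, ys)
print(clean)  # (2.0, 0.0)

# Add one hypothetical outlier far below the trend
slope_out, intercept_out = ols_fit(xs + [6], ys + [-20])
print(round(slope_out, 3), round(intercept_out, 3))  # slope flips sign to about -2.571
```

A single bad point out of six is enough to reverse the direction of the fitted line, which is why robust alternatives or outlier checks are often used before fitting.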

Limitations and Use of Lasso and Ridge Regression

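Lasso (L1 penalty, $\lambda\sum_j |w_j|$) and Ridge (L2 penalty, $\lambda\sum_j w_j^2$) regression both add a penalty on coefficient size to the least-squares cost, which reduces variance and helps when features are multicollinear. Lasso can eliminate irrelevant features by shrinking some coefficient estimates exactly to zero, whereas Ridge shrinks coefficients toward zero but retains all features. As a result, Lasso often achieves lower generalization error on datasets with many irrelevant features, while Ridge tends to perform better when most features carry some signal.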
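The difference between the two penalties is easiest to see in the textbook special case of an orthonormal design (uncorrelated features scaled to unit norm), where both estimators have closed forms in terms of the ordinary least-squares coefficients. This sketch assumes that special case and the objectives ½‖y − Xw‖² + λ‖w‖₁ for Lasso and ½‖y − Xw‖² + (λ/2)‖w‖² for Ridge; the ½ factors are a convention that simplifies the formulas. The OLS coefficients and λ below are hypothetical.

```python
def lasso_orthonormal(ols_coefs, lam):
    """Lasso under an orthonormal design: soft-threshold each OLS coefficient by lam."""
    out = []
    for b in ols_coefs:
        shrunk = max(abs(b) - lam, 0.0)
        if shrunk == 0.0:
            out.append(0.0)  # coefficient eliminated entirely (feature dropped)
        else:
            out.append(shrunk if b > 0 else -shrunk)  # keep the original sign
    return out

def ridge_orthonormal(ols_coefs, lam):
    """Ridge under an orthonormal design: scale every OLS coefficient by 1 / (1 + lam)."""
    return [b / (1.0 + lam) for b in ols_coefs]

ols = [3.0, -0.4, 0.2, 1.5]  # hypothetical OLS coefficients
lam = 0.5
lasso_w = lasso_orthonormal(ols, lam)
ridge_w = ridge_orthonormal(ols, lam)
print("Lasso:", lasso_w)                          # [2.5, 0.0, 0.0, 1.0]
print("Ridge:", [round(w, 3) for w in ridge_w])   # [2.0, -0.267, 0.133, 1.0]
```

Lasso sets the two small coefficients exactly to zero, performing feature selection, while Ridge shrinks every coefficient proportionally and keeps all four features.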

Conclusion

Understanding bias and variance is crucial when working with models that underfit or overfit the data. Increasing model complexity typically lowers bias and improves the fit to the training data, but it also raises variance and may lead to poor performance on unseen data.
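For squared-error loss, the tradeoff can be stated precisely. Assuming the data follow $y = f(x) + \varepsilon$ with noise of mean zero and variance $\sigma^2$, the expected test error of a fitted model $\hat f$ at a point $x$ (averaged over training sets and noise) decomposes as:

```latex
\mathbb{E}\big[(y - \hat f(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\big[\big(\hat f(x) - \mathbb{E}[\hat f(x)]\big)^2\big]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Raising model complexity usually shrinks the bias term and grows the variance term, while no model can reduce the $\sigma^2$ term.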

In summary, this session addresses critical machine learning concepts that are frequently encountered on the GATE exam. It emphasizes the importance of understanding different types of learning tasks and how to apply these concepts in practical contexts.

Keywords

Machine Learning
Classification
Regression
Supervised Learning
Reinforcement Learning
Outliers
Lasso Regression
Ridge Regression
Bias-Variance Tradeoff
GATE 2025

FAQ

Q1: What is the difference between classification and regression?
A1: Classification deals with predicting categorical outputs, while regression involves predicting continuous numerical outputs.

Q2: How is linear regression sensitive to outliers?
A2: Outliers can skew the best-fit line determined by linear regression, leading to inaccurate predictions.

Q3: Why would one use Lasso over Ridge regression?
A3: Lasso is useful for feature selection as it can reduce some coefficients to zero, thereby simplifying the model.

Q4: What does high bias or high variance imply in models?
A4: High bias indicates underfitting (poor performance on both training and unseen data), while high variance indicates overfitting (good performance on training data but poor generalization to unseen data).

Q5: Can we use KNN for regression?
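A5: Yes. KNN regression predicts the average of the target values of the k nearest neighbors instead of taking a majority vote of their class labels. It is, however, generally used less often for regression than for classification.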
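A minimal sketch showing that the same nearest-neighbor search serves both tasks, with only the final aggregation step changing. The one-dimensional data, labels and query points are hypothetical, and `knn_predict` is an illustrative helper, not a library function.

```python
from collections import Counter

def knn_predict(train_x, train_y, query, k, task):
    """Predict with k-nearest neighbours on 1-D inputs.

    task="classification" -> majority vote of the neighbours' labels
    task="regression"     -> mean of the neighbours' target values
    """
    # Sort training points by distance to the query and keep the k closest
    neighbours = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - query))[:k]
    targets = [y for _, y in neighbours]
    if task == "classification":
        return Counter(targets).most_common(1)[0][0]
    if task == "regression":
        return sum(targets) / k
    raise ValueError(f"unknown task: {task}")

xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["small", "small", "small", "large", "large", "large"]
values = [1.1, 1.9, 3.2, 9.8, 11.1, 12.0]

cls_pred = knn_predict(xs, labels, 2.5, k=3, task="classification")
reg_pred = knn_predict(xs, values, 11.5, k=3, task="regression")
print(cls_pred)            # small
print(round(reg_pred, 2))  # mean of 11.1, 12.0 and 9.8, about 10.97
```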

GATE 2025: Data Science & AI - Machine Learning Practice + PYQs (Part 1) | GfG GATE