# Hypothesis Testing in Data Science

• Also known as A/B testing, confirmatory analysis, and significance testing.
• Generally, population parameters (standard deviation, maximum, minimum, and so on) are unknown in practice.
• However, we do have hypotheses about what the true values are.
• Hypothesis testing is a set of methods for evaluating a hypothesis about a population parameter based on the available sample statistics.
• A hypothesis test involves two statements: the null hypothesis and the alternative hypothesis.
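To make the idea concrete, here is a minimal sketch of one possible test: an exact permutation test for a difference in means between two groups. The function name `permutation_test`, the sample values, and the 0.05 threshold are illustrative assumptions, not something taken from the article.

```python
from itertools import combinations


def permutation_test(group_a, group_b):
    """Exact two-sided permutation test for a difference in means.

    Null hypothesis: the group labels do not matter (no difference in means).
    Alternative hypothesis: the group means differ.
    Returns the p-value: the share of all relabelings whose mean difference
    is at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    extreme = 0
    count = 0
    # Enumerate every way of assigning n_a of the pooled values to group A.
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-9:  # small tolerance for float rounding
            extreme += 1
        count += 1
    return extreme / count


# Hypothetical A/B measurements (made up for illustration).
variant_a = [12.1, 11.8, 12.5, 12.0, 11.9]
variant_b = [12.9, 13.1, 12.7, 13.4, 12.8]

p_value = permutation_test(variant_a, variant_b)
alpha = 0.05  # assumed significance level
reject_null = p_value < alpha
print(f"p-value = {p_value:.4f}, reject null: {reject_null}")
```

Because every value in `variant_a` is smaller than every value in `variant_b`, only 2 of the 252 possible relabelings are as extreme as the observed split, so the null hypothesis is rejected at the assumed 0.05 level. Enumerating all relabelings only works for small samples; larger ones would need random sampling of relabelings or a parametric test.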

# Null Hypothesis (H0):

• A general statement about the population parameters that is assumed to be true unless there is strong evidence for the opposite statement.
• The default statement is that there is no difference between the measured…

# Everything About t-SNE

t-SNE stands for t-distributed Stochastic Neighbor Embedding.

## Dimensionality reduction

• Data in 1, 2, or 3 dimensions can be visualized directly, but data science datasets rarely have three or fewer dimensions, so we often end up working with higher-dimensional data. A data science professional still needs to visualize the working data and get insights from it to do a better job. Dimensionality reduction techniques were developed to address this.
• Another popular use case of dimensionality reduction is to reduce the computational complexity while training…
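As a rough illustration of dimensionality reduction in general, the sketch below projects 3D points onto a single axis. Note that it uses PCA (via power iteration) rather than t-SNE, purely because PCA fits in a few lines of standard-library Python; the helper names and the synthetic data are assumptions made for this example.

```python
import random


def first_principal_component(rows, iterations=200):
    """Return (column means, unit direction of greatest variance)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]

    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

    # Power iteration: repeatedly multiply by the covariance matrix and
    # normalize, which converges to the dominant eigenvector.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return means, v


def project_1d(rows):
    """Reduce each row to one coordinate along the first principal component."""
    means, v = first_principal_component(rows)
    return [sum((r[j] - means[j]) * v[j] for j in range(len(v))) for r in rows]


random.seed(0)
# Synthetic 3D data that varies mostly along the direction (1, 2, 3).
data = [
    [t + random.gauss(0, 0.05), 2 * t + random.gauss(0, 0.05), 3 * t + random.gauss(0, 0.05)]
    for t in (i / 10 for i in range(50))
]

means, direction = first_principal_component(data)
coords = project_1d(data)  # 50 points, now one number each instead of three
print([round(x, 3) for x in direction])
```

PCA is a linear method, whereas t-SNE is non-linear and aims to preserve local neighborhoods, so the two can give very different pictures of the same data.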

# Inferential Statistics in Data Science field

Experiment → an uncertain situation that could have multiple outcomes. A coin toss is an experiment.

Outcome → the result of a single trial. So, if the coin lands on heads, the outcome of the coin toss experiment is “Heads”.

Event → a set of one or more outcomes from an experiment. “Tails” is one of the possible events for this experiment.

## Basic Probability

The chance of something happening, or in academic terms, the “likelihood of an event or sequence of events occurring”. For example:

• Tossing a coin
• Rolling a die
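For the two examples above, a probability can be sketched as the number of favourable outcomes divided by the total number of outcomes, assuming a fair coin and a fair six-sided die (all outcomes equally likely). The `probability` helper is a hypothetical name used only for this illustration.

```python
from fractions import Fraction


def probability(event, sample_space):
    """P(event) for equally likely outcomes: favourable / total."""
    favourable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favourable), len(sample_space))


coin = ["Heads", "Tails"]   # sample space of a coin toss
die = [1, 2, 3, 4, 5, 6]    # sample space of a die roll

p_tails = probability({"Tails"}, coin)        # event: the coin shows tails
p_even = probability({2, 4, 6}, die)          # event: the roll is even
p_at_least_five = probability({5, 6}, die)    # event: the roll is 5 or 6

print(p_tails, p_even, p_at_least_five)
```

Using `Fraction` keeps the results exact (1/2, 1/2, and 1/3) instead of rounding them to floats.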

## Conditional Probability

The probability of an event occurring given that another event has already occurred. For example:

• Picking 3 blue balls from a box has…
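The definition can be written as P(A | B) = P(A and B) / P(B). The sketch below checks it on a different, hypothetical setup (two fair dice rather than balls in a box), since enumerating dice outcomes keeps the code short. The helper names `prob` and `conditional` are assumptions for this example.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
two_dice = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(event), where event is a predicate over an outcome (d1, d2)."""
    favourable = sum(1 for outcome in two_dice if event(outcome))
    return Fraction(favourable, len(two_dice))


def conditional(event_a, event_b):
    """P(A | B) = P(A and B) / P(B)."""
    p_b = prob(event_b)
    if p_b == 0:
        raise ValueError("P(B) is zero, so P(A | B) is undefined")
    return prob(lambda o: event_a(o) and event_b(o)) / p_b


def sum_at_least_ten(outcome):
    return sum(outcome) >= 10


def first_is_six(outcome):
    return outcome[0] == 6


p_a = prob(sum_at_least_ten)                              # 6 of 36 outcomes
p_a_given_b = conditional(sum_at_least_ten, first_is_six) # 3 of the 6 with a leading six
print(p_a, p_a_given_b)
```

Knowing that the first die shows a six raises the probability of a total of at least ten from 1/6 to 1/2.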

# Descriptive Statistics in the Data Science field

• Measure of Central Tendency
• Measure of Spread
• Dependence

## Measure of Central Tendency:

Mean → Average of a set of data points.

Median → Middle element of the data points once they are sorted in ascending order.

Mode → The data point that appears the maximum number of times in a set of data points.
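These three measures can be computed with Python's standard `statistics` module. The data values below are made up for illustration.

```python
from statistics import mean, median, mode

# Hypothetical data points, deliberately unsorted.
data = [7, 3, 12, 2, 5, 3, 10]

data_mean = mean(data)      # sum of the values divided by their count
data_median = median(data)  # middle value of the sorted data: [2, 3, 3, 5, 7, 10, 12]
data_mode = mode(data)      # most frequent value

print(data_mean, data_median, data_mode)
```

Here the mean is 6, the median is 5, and the mode is 3. With an even number of data points there is no single middle element, and `median` returns the average of the two middle values.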

## Measure of Spread:

Standard Deviation (SD) → Typical distance between the data points and the mean, computed as the square root of the variance.

Variance → Average of the squared distances between each value in the data set and the mean (the square of the SD).

Range → Maximum value minus Minimum value from a set of data points.

Percentile → Representation of position of a value in… 
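The variance, standard deviation, and range defined above can be sketched with the standard `statistics` module. The data values are made up, and the population versions (`pvariance`, `pstdev`) are used here as an assumption, because they divide by the number of data points.

```python
from statistics import pstdev, pvariance

# Hypothetical data points with a mean of 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

variance = pvariance(data)           # average squared distance from the mean
std_dev = pstdev(data)               # square root of the variance
value_range = max(data) - min(data)  # maximum minus minimum

print(variance, std_dev, value_range)
```

For this data the variance is 4, the standard deviation is 2, and the range is 7. When the data is a sample drawn from a larger population, `statistics.variance` and `statistics.stdev` divide by n - 1 instead and give slightly larger values.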