- Other names: A/B testing, confirmatory analysis, and significance testing.
- Generally, population parameters (standard deviation, maximum, minimum, and so on) are unknown in practice.
- However, we do have hypotheses about what the true values are.
- Hypothesis testing is a set of methods for evaluating a hypothesis about a population parameter based on the available sample statistics.
- A hypothesis test involves two statements: the **null hypothesis** and the **alternative hypothesis**.

- A general statement about the population parameters that is assumed to be true unless there is strong evidence for the opposite statement.
- The default statement is that there is no difference between the measured…
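
As a rough illustration of testing a null hypothesis of "no difference" between two groups, here is a minimal sketch using a permutation test. The permutation test is swapped in here for simplicity and is not a method named in the text above; the function name `permutation_test`, the sample data, and the 0.05 threshold are all assumptions made for this example.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Null hypothesis: both groups come from the same population, so the
    group labels are interchangeable. Returns an approximate p-value.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the group labels
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations


# Hypothetical measurements for two variants (illustrative data only).
variant_a = [1, 2, 3, 4, 5, 6, 7, 8]
variant_b = [101, 102, 103, 104, 105, 106, 107, 108]

p_value = permutation_test(variant_a, variant_b)
# Assumed significance level of 0.05; reject the null hypothesis below it.
print("reject null" if p_value < 0.05 else "fail to reject null")
```

A small p-value means that a difference as large as the observed one would rarely arise if the null hypothesis were true.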

t-SNE stands for t-distributed Stochastic Neighbor Embedding.

- Data in 1, 2, or 3 dimensions can be visualized directly, but in data science it’s not always possible to work with a dataset of three or fewer dimensions; we often end up working with higher-dimensional data. A data science professional needs to **visualize and get insights** about the data at hand to do a better job. Dimensionality reduction techniques evolved to address this.
- Another popular use case of dimensionality reduction is to **reduce the computational complexity** while training…

Experiment → an uncertain situation that could have multiple outcomes. A coin toss is an experiment.

Outcome → the result of a single trial. So, if the coin lands heads, the outcome of the coin-toss experiment is “Heads”.

Event → one or more outcomes from an experiment. “Tails” is one of the possible events for this experiment.
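
The three terms above can be sketched in a few lines of Python. This is only an illustration: the names `sample_space`, `event_tails`, and `toss` are made up for this example, and the coin is assumed to be fair.

```python
import random

# All possible outcomes of the coin-toss experiment.
sample_space = {"Heads", "Tails"}

# An event is a set of one or more outcomes.
event_tails = {"Tails"}


def toss(rng):
    """Run the experiment once (a single trial) and return its outcome."""
    return rng.choice(sorted(sample_space))


rng = random.Random(42)  # seeded so repeated runs give the same result
outcome = toss(rng)

# The event "Tails" occurred if the outcome belongs to the event's set.
print(outcome, outcome in event_tails)
```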

The chance of something happening or, in academic terms, the “likelihood of an event or sequence of events occurring”. For example:

- Tossing a coin
- Rolling a die
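
For both examples, a probability can be computed as the number of favorable outcomes divided by the number of possible outcomes. The sketch below assumes a fair coin and a fair six-sided die, so every outcome is equally likely; the helper name `probability` is hypothetical.

```python
from fractions import Fraction


def probability(event, sample_space):
    """P(event) for equally likely outcomes: favorable / possible."""
    favorable = event & sample_space  # ignore outcomes that cannot occur
    return Fraction(len(favorable), len(sample_space))


coin = {"Heads", "Tails"}
die = {1, 2, 3, 4, 5, 6}

p_heads = probability({"Heads"}, coin)  # one favorable outcome out of two
p_even = probability({2, 4, 6}, die)    # three favorable outcomes out of six
print(p_heads, p_even)
```

Using `Fraction` keeps the results exact rather than rounding them to floating-point values.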

The probability of an event occurring given that another event has already occurred. For example:

- Picking 3 blue balls from a box has…
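
The definition above corresponds to the formula P(A | B) = P(A and B) / P(B). The following sketch uses a separate, hypothetical die example (not the ball example, which is cut off above) and assumes equally likely outcomes; the helper name `conditional_probability` is made up for illustration.

```python
from fractions import Fraction


def conditional_probability(event_a, event_b, sample_space):
    """P(A | B) = P(A and B) / P(B), assuming equally likely outcomes."""
    b = event_b & sample_space
    if not b:
        raise ValueError("P(B) is zero, so P(A | B) is undefined")
    # With equally likely outcomes the common denominator cancels out,
    # leaving |A and B| / |B|.
    return Fraction(len(event_a & b), len(b))


die = {1, 2, 3, 4, 5, 6}
is_six = {6}
is_even = {2, 4, 6}

# Probability that a roll is a six, given that it is known to be even.
p_six_given_even = conditional_probability(is_six, is_even, die)
print(p_six_given_even)
```

Knowing that the roll is even shrinks the set of possible outcomes from six to three, which is why the result differs from the unconditional probability of rolling a six.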

- Measure of Central Tendency

- Measure of Spread
- Dependence

**Mean** → Average of a set of data points.

**Median** → Middle element of the data points when they are sorted in ascending order.

**Mode** → The data point that appears the most times in a set of data points.

**Standard Deviation (SD)** → Typical distance between the mean and each data point; formally, the square root of the variance.

**Variance** → Average of the squared distances between each value in the data set and the mean (the square of the SD).

**Range** → Maximum value minus minimum value in a set of data points.

**Percentile** → Representation of position of a value in…
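
Several of the measures defined above can be computed with Python's built-in `statistics` module. This is a small sketch on made-up data; it uses the population formulas (`pvariance`, `pstdev`), which divide by the number of data points, and that choice is an assumption, since the definitions above do not say whether the population or the sample version is meant.

```python
import statistics

# Illustrative data set (not from the original text).
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)            # average of the data points
median = statistics.median(data)        # middle of the sorted data
mode = statistics.mode(data)            # most frequent data point
variance = statistics.pvariance(data)   # population variance
sd = statistics.pstdev(data)            # population SD = sqrt(variance)
value_range = max(data) - min(data)     # maximum minus minimum

print(mean, median, mode, variance, sd, value_range)
```

For sample data, `statistics.variance` and `statistics.stdev` divide by one less than the number of data points and would give slightly larger values.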