Theory
T5
Illustrate the concept of conditional, joint, marginal (relative) frequency using a simple bivariate distribution
To answer this question we will use the following table, a bivariate distribution of income and gender for a sample of 100 people.
Gender | Income <= $20,000 | Income > $20,000 | Total |
---|---|---|---|
Male | 30 | 15 | 45 |
Female | 45 | 10 | 55 |
Total | 75 | 25 | 100 |
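Joint frequency: the number of times a given combination of values of the two variables occurs. In this table every inner cell is a joint frequency. For example, 15 subjects are male and have an income above $20,000. Dividing by the sample size gives the joint relative frequency, 15/100 = 0.15.
Marginal frequency: the total of a single row or column. It counts all the occurrences of one value of one variable (gender or income) regardless of the value of the other. For example, 45 subjects are male and 25 subjects have an income above $20,000. The corresponding marginal relative frequencies are 45/100 = 0.45 and 25/100 = 0.25.
Conditional frequency: the ratio between a joint frequency and the marginal frequency of the conditioning value, that is, the frequency of one variable restricted to the subjects that have a given value of the other. In our example, 15/45 = 1/3 of the male subjects have an income higher than $20,000.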
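As a minimal sketch, the three kinds of frequency can be computed directly from the table in Python. The dictionary layout and the function names (`row_totals`, `column_totals`, `joint_relative`, `conditional`) are illustrative choices and are not part of the original exercise.

```python
# Contingency table from the text: rows = gender, columns = income class.
table = {
    "Male":   {"<= $20,000": 30, "> $20,000": 15},
    "Female": {"<= $20,000": 45, "> $20,000": 10},
}

def row_totals(t):
    """Marginal frequencies of the row variable (gender)."""
    return {row: sum(cols.values()) for row, cols in t.items()}

def column_totals(t):
    """Marginal frequencies of the column variable (income)."""
    totals = {}
    for cols in t.values():
        for col, count in cols.items():
            totals[col] = totals.get(col, 0) + count
    return totals

def grand_total(t):
    """Sample size n."""
    return sum(row_totals(t).values())

def joint_relative(t, row, col):
    """Joint relative frequency: n(row, col) / n."""
    return t[row][col] / grand_total(t)

def conditional(t, col, given_row):
    """Conditional relative frequency of `col` given `given_row`: n(row, col) / n(row)."""
    return t[given_row][col] / row_totals(t)[given_row]

if __name__ == "__main__":
    print(row_totals(table))                           # {'Male': 45, 'Female': 55}
    print(column_totals(table))                        # {'<= $20,000': 75, '> $20,000': 25}
    print(joint_relative(table, "Male", "> $20,000"))  # 0.15
    print(conditional(table, "> $20,000", "Male"))     # 0.3333333333333333
```

Note that the conditional frequency divides by a row total rather than by n: it answers a question about the males only.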
T6
Illustrate the concept of statistical independence and the resulting mathematical relationships between the above frequencies
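Two variables (or events) are statistically independent if the occurrence of one does not affect in any way the occurrence of the other. Mathematically, they are independent if and only if every joint relative frequency equals the product of the corresponding marginal relative frequencies: f(x, y) = f(x) · f(y). Equivalently, every conditional relative frequency equals the corresponding marginal relative frequency, f(y | x) = f(y), and in absolute terms every joint frequency equals (row total × column total) / n. In the table above, gender and income are not independent: the joint relative frequency of "male" and "> $20,000" is 0.15, whereas 0.45 × 0.25 = 0.1125. Likewise, 1/3 of the males earn more than $20,000, but only 25% of the whole sample does.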
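The independence condition can be checked cell by cell. The sketch below uses hypothetical helper names (`marginals`, `expected_counts`, `is_independent`) chosen for illustration. It compares each joint relative frequency with the product of the marginals and uses a small tolerance because the comparison is done in floating point.

```python
# Same contingency table as in T5: rows = gender, columns = income class.
table = {
    "Male":   {"<= $20,000": 30, "> $20,000": 15},
    "Female": {"<= $20,000": 45, "> $20,000": 10},
}

def marginals(t):
    """Return (row totals, column totals, grand total) of a contingency table."""
    rows = {r: sum(cols.values()) for r, cols in t.items()}
    cols_tot = {}
    for cols in t.values():
        for c, k in cols.items():
            cols_tot[c] = cols_tot.get(c, 0) + k
    return rows, cols_tot, sum(rows.values())

def expected_counts(t):
    """Joint frequencies expected under independence: row total * column total / n."""
    rows, cols, n = marginals(t)
    return {r: {c: rows[r] * cols[c] / n for c in t[r]} for r in t}

def is_independent(t, tol=1e-9):
    """True if f(x, y) == f(x) * f(y) (relative frequencies) holds for every cell."""
    rows, cols, n = marginals(t)
    return all(
        abs(k / n - (rows[r] / n) * (cols[c] / n)) <= tol
        for r, row in t.items()
        for c, k in row.items()
    )

print(is_independent(table))  # False
print(expected_counts(table))
```

For this table the expected counts under independence would be 33.75, 11.25, 41.25 and 13.75. The observed counts (30, 15, 45, 10) differ, which agrees with the 0.15 versus 0.1125 comparison.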
Applications
A5
Create a distribution from the data captured by the Wireshark sniffer, reading either the exported CSV file or the real-time data generated by the program
[optional: create a bivariate distribution] The source code is available at this GitHub link
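Below is a minimal sketch of the CSV route. It assumes the capture was exported from Wireshark as CSV with the default columns (`No.`, `Time`, `Source`, `Destination`, `Protocol`, `Length`, `Info`). If the columns were customised, the column names used below must be adjusted. The function names and the sample rows are hypothetical and only illustrate the idea. It does not reproduce the code at the GitHub link.

```python
import csv
import io
from collections import Counter

def protocol_distribution(lines):
    """Univariate distribution: absolute frequency of each value of the 'Protocol' column."""
    return Counter(row["Protocol"] for row in csv.DictReader(lines))

def source_protocol_distribution(lines):
    """Optional bivariate distribution: joint frequency of (Source, Protocol) pairs."""
    return Counter((row["Source"], row["Protocol"]) for row in csv.DictReader(lines))

def relative(dist):
    """Convert absolute frequencies into relative frequencies."""
    n = sum(dist.values())
    return {value: count / n for value, count in dist.items()}

# Hypothetical rows in the layout of a Wireshark CSV export (not real capture data).
SAMPLE = """\
"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000000","192.168.1.2","192.168.1.1","DNS","74","Standard query"
"2","0.010000","192.168.1.1","192.168.1.2","DNS","90","Standard query response"
"3","0.020000","192.168.1.2","192.168.1.1","TCP","66","SYN"
"""

print(protocol_distribution(io.StringIO(SAMPLE)))  # Counter({'DNS': 2, 'TCP': 1})
print(relative(protocol_distribution(io.StringIO(SAMPLE))))
print(source_protocol_distribution(io.StringIO(SAMPLE)))
```

For a real export the file would be opened with `open("capture.csv", newline="")`, where the file name is a placeholder, as the `csv` module recommends. Because `csv.DictReader` accepts any iterable of text lines, the same functions could in principle be fed CSV lines arriving on standard input (`sys.stdin`) from a live capture, but that set-up is not shown here.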
Research
TA3
A survey on ONLINE algorithms (mean, variance, median, etc.)
An online algorithm processes data as it arrives, one item at a time, without needing the entire dataset from the beginning. Offline algorithms, on the contrary, need the entire dataset before processing can begin. Two well-known online algorithms in statistics are Knuth's recursion and Welford's algorithm: the first computes the running mean of a dataset, the second the running variance (updating the mean along the way).
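Here is a minimal Python sketch of Welford's update, for illustration only. The class name `RunningStats` is hypothetical. Each new value updates the count, the mean and `m2`, the running sum of squared deviations from the mean, so the variance is available at any moment without storing the data.

```python
import math

class RunningStats:
    """Welford's online algorithm for the mean and variance."""

    def __init__(self):
        self.n = 0       # number of values seen so far
        self.mean = 0.0  # running mean
        self.m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean               # deviation from the old mean
        self.mean += delta / self.n         # same update as Knuth's recursion for the mean
        self.m2 += delta * (x - self.mean)  # old deviation times new deviation

    def variance(self, sample=False):
        """Population variance by default; sample variance (divide by n - 1) if sample=True."""
        d = self.n - 1 if sample else self.n
        return self.m2 / d if d > 0 else math.nan


stats = RunningStats()
for x in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(x)  # values are processed one at a time, as they arrive
print(stats.mean, stats.variance(), stats.variance(sample=True))
```

For this data the mean is 5, the population variance 4 and the sample variance 32/7, up to floating-point rounding.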
TA4
Illustrate in particular Knuth's recursion for the computation of the arithmetic mean (average), discussing why it is preferable to the "naive" algorithm
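The naive algorithm first accumulates the sum of all values and then divides it by their count. Knuth's recursion instead updates the mean directly after each new value: M_k = M_{k-1} + (x_k - M_{k-1}) / k, with M_1 = x_1. It is preferable to the naive approach because the running value stays on the same scale as the data instead of growing with the number of values. This avoids overflow of the running sum and can reduce the loss of precision that accumulates in floating-point arithmetic when small values are added to a very large total. It also works online: the mean of the values seen so far is available at every step.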
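The difference between the two approaches can be sketched in Python. The function names are illustrative. The second example uses two values close to the largest representable double to show the overflow case. The naive running sum leaves the floating-point range and becomes infinity, while Knuth's recursion never leaves the range of the data.

```python
def naive_mean(xs):
    """Naive algorithm: accumulate the full sum, then divide by the count."""
    total = 0.0
    n = 0
    for x in xs:
        total += x
        n += 1
    if n == 0:
        raise ValueError("mean of empty data")
    return total / n

def knuth_mean(xs):
    """Knuth's recursion: M_k = M_{k-1} + (x_k - M_{k-1}) / k."""
    mean = 0.0
    k = 0
    for x in xs:
        k += 1
        mean += (x - mean) / k  # the running mean stays within the range of the data
    if k == 0:
        raise ValueError("mean of empty data")
    return mean


print(naive_mean([1, 2, 3, 4]), knuth_mean([1, 2, 3, 4]))  # 2.5 2.5
big = [1e308, 1e308]
print(naive_mean(big))  # inf (1e308 + 1e308 overflows)
print(knuth_mean(big))  # 1e+308
```

Both functions read the data only once and in order, so either could be turned into an online version. Only the recursion, however, keeps an intermediate value that is itself meaningful (the current mean) and bounded by the data.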