Chapter 2 Notation
Notation refers to the system of written symbols used to represent data or mathematical operations. Within this chapter, you’ll find a summary of all notation used throughout Taking Stats by the Helm.
2.1 Table of Notation
Label | Description |
---|---|
\(N\) | Number of individuals within the sample |
\(n_i\) | The number of observations for individual \(i\) (relevant for repeated measures) |
\(B_0\), \(B_1\), …, \(B_p\) | These are the \(p+1\) regression coefficients |
\(X_{k}\) | The \(k^{th}\) variable |
\(X_{k,i}\) | The \(k^{th}\) variable measured on individual \(i\) |
\(X_{k,i,t}\) | The \(k^{th}\) variable measured on individual \(i\) at time \(t\) |
2.2 Examples With Cross-Sectional Data
Consider the following data set. There is an identification variable (labeled ‘ID’), and there are three measured variables, labeled ‘Gender’, ‘Age’, and ‘Score’. Within this data set, each individual has one observation per variable (i.e., there are no repeated measures).
ID | Gender | Age | Score |
---|---|---|---|
1 | Male | 37 | 12.4 |
2 | Female | 41 | 19.8 |
3 | Male | 50 | 17.2 |
\(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
100 | Female | 62 | 24.7 |
Within this example:
- \(N\) equals 100 (there are 100 individuals within the data set)
- \(n_i\) equals 1 for each individual (i.e., we only have one observation for each individual)
- \(k\) ranges from 1 to 3 because we have three observed variables within our data set (Gender, Age, and Score)
- For \(X_k\), we can define \(X_1\) = Gender, \(X_2\) = Age, and \(X_3\) = Score
  - Please note that this labeling is arbitrary
  - We could have used \(X_1\) = Age, \(X_2\) = Score, and \(X_3\) = Gender
  - In this case, we prefer \(X_1\) = Gender, \(X_2\) = Age, and \(X_3\) = Score because it matches the order of the columns within the data set
- \(X_{k,i}\) refers to a specific observation on variable \(k\) for individual \(i\). Using the labeling system from the above bullet point (i.e., \(X_1\) = Gender, \(X_2\) = Age, and \(X_3\) = Score),
  - \(X_{1,1}\) = Male
  - \(X_{1,100}\) = Female
- In this case, we do not have any meaningful information for \(X_{k,i,t}\)
  - We only have one measurement per individual. Therefore, we do not refer to any of the observations in cross-sectional data using the \(X_{k,i,t}\) nomenclature.
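The indexing above can be sketched in a few lines of Python. This is only an illustration, not part of the notation itself: the dictionary `rows` and the helper `x` are hypothetical names, and only the rows visible in the table are included because individuals 4 through 99 are elided.

```python
# Rows visible in the cross-sectional table, keyed by ID.
# Individuals 4-99 are elided in the table, so they are not reproduced here.
rows = {
    1: ("Male", 37, 12.4),
    2: ("Female", 41, 19.8),
    3: ("Male", 50, 17.2),
    100: ("Female", 62, 24.7),
}


def x(k, i):
    """Return X_{k,i}: variable k (1 = Gender, 2 = Age, 3 = Score) for individual i."""
    # k is 1-based in the notation, while Python tuples are 0-based.
    return rows[i][k - 1]


print(x(1, 1))    # X_{1,1}: Male
print(x(1, 100))  # X_{1,100}: Female
print(x(3, 2))    # X_{3,2}: 19.8
```

Because the labeling of the variables is arbitrary, reordering the values in each tuple would change which \(k\) refers to which column, but not the data.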
2.3 Examples With Repeated Measures Data
Consider the following data set. There is an identification variable (labeled ‘ID’), a time variable (labeled ‘Time’), and there are three measured variables, labeled ‘Gender’, ‘Age’, and ‘Score’. Within this data set, each individual has multiple observations per variable.
ID | Time | Gender | Age | Score |
---|---|---|---|---|
1 | 1 | Male | 37 | 12.4 |
1 | 2 | Male | 38 | 13.8 |
1 | 3 | Male | 39 | 13.9 |
2 | 1 | Female | 41 | 19.8 |
2 | 2 | Female | 42 | 22.0 |
3 | 1 | Male | 50 | 17.2 |
3 | 2 | Male | 51 | 18.4 |
3 | 3 | Male | 52 | 19.6 |
3 | 4 | Male | 53 | 19.8 |
\(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
50 | 4 | Male | 39 | 22.8 |
Within this example:
- \(N\) equals 50 (there are 50 individuals within the data set)
- \(n_i\) equals
  - 3 for the \(1^{st}\) individual
  - 2 for the \(2^{nd}\) individual
  - 4 for the \(3^{rd}\) individual
  - 4 for the \(50^{th}\) individual
- \(k\) ranges from 1 to 3 because we have three observed variables within our data set (Gender, Age, and Score)
- For \(X_k\), we can define \(X_1\) = Gender, \(X_2\) = Age, and \(X_3\) = Score
  - Please note that this labeling is arbitrary
  - We could have used \(X_1\) = Age, \(X_2\) = Score, and \(X_3\) = Gender
  - In this case, we prefer \(X_1\) = Gender, \(X_2\) = Age, and \(X_3\) = Score because it matches the order of the columns within the data set
- \(X_{k,i}\) on its own is ambiguous here: each individual has several observations per variable, so we must also specify a time point to identify a single observation
- \(X_{k,i,t}\) refers to a specific observation on variable \(k\) for individual \(i\) at time \(t\). Using the labeling system from the above bullet point (i.e., \(X_1\) = Gender, \(X_2\) = Age, and \(X_3\) = Score),
  - \(X_{1,2,2}\) = Female
  - \(X_{2,1,3}\) = 39
  - \(X_{3,50,4}\) = 22.8
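The same lookups can be sketched in Python for the repeated measures data. As before, this is a minimal illustration under stated assumptions: `obs`, `x`, and `n_obs` are hypothetical names, and only the rows visible in the table are included. For individual 50 only the final row is shown, so \(n_{50}\) cannot be recovered from this excerpt.

```python
# Rows visible in the repeated measures table, keyed by (ID, Time).
# Rows hidden behind the vertical dots are not reproduced here.
obs = {
    (1, 1): ("Male", 37, 12.4),
    (1, 2): ("Male", 38, 13.8),
    (1, 3): ("Male", 39, 13.9),
    (2, 1): ("Female", 41, 19.8),
    (2, 2): ("Female", 42, 22.0),
    (3, 1): ("Male", 50, 17.2),
    (3, 2): ("Male", 51, 18.4),
    (3, 3): ("Male", 52, 19.6),
    (3, 4): ("Male", 53, 19.8),
    (50, 4): ("Male", 39, 22.8),
}


def x(k, i, t):
    """Return X_{k,i,t}: variable k (1 = Gender, 2 = Age, 3 = Score)
    for individual i at time t."""
    # k is 1-based in the notation, while Python tuples are 0-based.
    return obs[(i, t)][k - 1]


def n_obs(i):
    """Return n_i, counted from the visible rows only."""
    return sum(1 for (ident, _time) in obs if ident == i)


print(x(1, 2, 2))   # X_{1,2,2}: Female
print(x(2, 1, 3))   # X_{2,1,3}: 39
print(x(3, 50, 4))  # X_{3,50,4}: 22.8

# n_i for the three individuals whose rows are fully shown.
print([n_obs(i) for i in (1, 2, 3)])  # [3, 2, 4]
```

Keying the observations by the pair (ID, Time) mirrors why \(X_{k,i}\) alone is not enough here: without \(t\), the lookup would match several rows.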