Chapter 2 Notation

Notation refers to the system of written symbols used to represent data or mathematical operations. Within this chapter, you’ll find a summary of all notation used throughout Taking Stats by the Helm.

2.1 Table of Notation

Label	Description
\(N\)	Number of individuals within the sample
\(n_i\)	The number of observations for individual \(i\) (this applies to repeated measures)
\(B_0\), \(B_1\), …, \(B_p\)	These are the \(p+1\) regression coefficients
\(X_{k}\)	The \(k^{th}\) variable
\(X_{k,i}\)	The \(k^{th}\) variable measured on individual \(i\)
\(X_{k,i,t}\)	The \(k^{th}\) variable measured on individual \(i\) at time \(t\)
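
As an illustration of how these symbols fit together, a regression for cross-sectional data with \(p\) predictors could be written as shown here. The outcome \(Y_i\) and the error term \(e_i\) are assumptions introduced only for this sketch; they are not part of the table above.

\[
Y_i = B_0 + B_1 X_{1,i} + B_2 X_{2,i} + \dots + B_p X_{p,i} + e_i
\]

The intercept \(B_0\) plus one slope for each of the \(p\) variables gives the \(p+1\) regression coefficients listed in the table.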

2.2 Examples With Cross-Sectional Data

Consider the following data set. There is an identification variable (labelled ‘ID’), and there are three measured variables, labeled ‘Gender’, ‘Age’, and ‘Score’. Within this data set, each individual has one observation per variable (i.e., there are no repeated measures).

ID	Gender	Age	Score
1	Male	37	12.4
2	Female	41	19.8
3	Male	50	17.2
\(\vdots\)	\(\vdots\)	\(\vdots\)	\(\vdots\)
100	Female	62	24.7

Within this example:

\(N\) equals 100 (there are 100 individuals within the data set)
\(n_i\) equals 1 for each individual (i.e., we only have one observation for each individual)
\(k\) ranges from 1 to 3 because we have three observed variables within our data set (Gender, Age, and Score)
For \(X_k\), we can define \(X_1\) = Gender, \(X_2\) = Age, and \(X_3\) = Score
- Please note that this labeling is arbitrary
- We could have used \(X_1\) = Age, \(X_2\) = Score, and \(X_3\) = Gender
- In this case, we prefer \(X_1\) = Gender, \(X_2\) = Age, and \(X_3\) = Score because it matches the order of the columns within the data set
\(X_{k,i}\) refers to a specific observation on variable \(k\) for individual \(i\). Using the labeling system from the above bullet point (i.e., \(X_1\) = Gender, \(X_2\) = Age, and \(X_3\) = Score):
- \(X_{1,1}\) = Male
- \(X_{1,100}\) = Female
In this case, we do not have any meaningful information for \(X_{k,i,t}\)
- We only have one measurement per individual. Therefore, we do not refer to any observation in cross-sectional data using the \(X_{k,i,t}\) nomenclature.
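
The \(X_{k,i}\) indexing above can be sketched in a few lines of Python. This is only an illustration: the names `rows`, `VARIABLES`, `by_id`, and `x` are hypothetical, and only the four rows printed in the table are included (the elided rows are left out).

```python
# Rows visible in the cross-sectional table; elided rows are not reproduced.
rows = [
    {"ID": 1, "Gender": "Male", "Age": 37, "Score": 12.4},
    {"ID": 2, "Gender": "Female", "Age": 41, "Score": 19.8},
    {"ID": 3, "Gender": "Male", "Age": 50, "Score": 17.2},
    {"ID": 100, "Gender": "Female", "Age": 62, "Score": 24.7},
]

# Column order fixes the variable index k: X_1 = Gender, X_2 = Age, X_3 = Score.
VARIABLES = ["Gender", "Age", "Score"]

# Index the rows by individual so that i can be looked up directly.
by_id = {row["ID"]: row for row in rows}


def x(k, i):
    """Return X_{k,i}: the value of variable k (1-based) for individual i."""
    return by_id[i][VARIABLES[k - 1]]


print(x(1, 1))    # X_{1,1}: Gender of individual 1
print(x(1, 100))  # X_{1,100}: Gender of individual 100
print(x(3, 2))    # X_{3,2}: Score of individual 2
```

Because each individual has exactly one row, two indices are enough to pick out a single value.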

2.3 Examples With Repeated Measures Data

Consider the following data set. There is an identification variable (labelled ‘ID’), a time variable (labelled ‘Time’), and there are three measured variables, labeled ‘Gender’, ‘Age’, and ‘Score’. Within this data set, each individual has multiple observations per variable.

ID	Time	Gender	Age	Score
1	1	Male	37	12.4
1	2	Male	38	13.8
1	3	Male	39	13.9
2	1	Female	41	19.8
2	2	Female	42	22.0
3	1	Male	50	17.2
3	2	Male	51	18.4
3	3	Male	52	19.6
3	4	Male	53	19.8
\(\vdots\)	\(\vdots\)	\(\vdots\)	\(\vdots\)	\(\vdots\)
50	4	Male	39	22.8

Within this example:

\(N\) equals 50 (there are 50 individuals within the data set)
\(n_i\) equals
- 3 for the \(1^{st}\) individual
- 2 for the \(2^{nd}\) individual
- 4 for the \(3^{rd}\) individual
- 4 for the \(50^{th}\) individual
\(k\) ranges from 1 to 3 because we have three observed variables within our data set (Gender, Age, and Score)
For \(X_k\), we can define \(X_1\) = Gender, \(X_2\) = Age, and \(X_3\) = Score
- Please note that this labeling is arbitrary
- We could have used \(X_1\) = Age, \(X_2\) = Score, and \(X_3\) = Gender
- In this case, we prefer \(X_1\) = Gender, \(X_2\) = Age, and \(X_3\) = Score because it matches the order of the columns within the data set
\(X_{k,i}\) doesn’t have a specific meaning because we must refer to a specific time point for each observation
\(X_{k,i,t}\) refers to a specific observation on variable \(k\) for individual \(i\) at time \(t\). Using the labeling system from the above bullet point (i.e., \(X_1\) = Gender, \(X_2\) = Age, and \(X_3\) = Score):
- \(X_{1,2,2}\) = Female
- \(X_{2,1,3}\) = 39
- \(X_{3,50,4}\) = 22.8
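
The same idea extends to repeated measures by adding the time index. The sketch below is illustrative only: `rows`, `VARIABLES`, `lookup`, `x`, and `n` are hypothetical names, and only the rows printed in the table are included. Because the earlier rows for individual 50 are elided, \(n_i\) is computed here only for individuals 1 to 3.

```python
from collections import Counter

# Column order fixes the variable index k: X_1 = Gender, X_2 = Age, X_3 = Score.
VARIABLES = ["Gender", "Age", "Score"]

# (ID, Time, Gender, Age, Score) for the rows visible in the table.
rows = [
    (1, 1, "Male", 37, 12.4),
    (1, 2, "Male", 38, 13.8),
    (1, 3, "Male", 39, 13.9),
    (2, 1, "Female", 41, 19.8),
    (2, 2, "Female", 42, 22.0),
    (3, 1, "Male", 50, 17.2),
    (3, 2, "Male", 51, 18.4),
    (3, 3, "Male", 52, 19.6),
    (3, 4, "Male", 53, 19.8),
    (50, 4, "Male", 39, 22.8),
]

# Key each observation by (individual, time).
lookup = {(r[0], r[1]): dict(zip(VARIABLES, r[2:])) for r in rows}


def x(k, i, t):
    """Return X_{k,i,t}: variable k (1-based) for individual i at time t."""
    return lookup[(i, t)][VARIABLES[k - 1]]


# n_i: number of observations per individual among the rows listed above.
n = Counter(i for (i, _t) in lookup)

print(x(1, 2, 2))        # X_{1,2,2}: Gender of individual 2 at time 2
print(x(2, 1, 3))        # X_{2,1,3}: Age of individual 1 at time 3
print(x(3, 50, 4))       # X_{3,50,4}: Score of individual 50 at time 4
print(n[1], n[2], n[3])  # n_i for the three fully listed individuals
```

Keying on the pair (individual, time) reflects why \(X_{k,i}\) alone is ambiguous here: without \(t\), several rows match the same individual.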