Chapter 2 Notation

Notation refers to the system of written symbols used to represent data or mathematical operations. Within this chapter, you’ll find a summary of all notation used throughout Taking Stats by the Helm.

2.1 Table of Notation

Label | Description
--- | ---
\(N\) | Number of individuals in the sample
\(n_i\) | Number of observations for individual \(i\) (used for repeated measures)
\(B_0\), \(B_1\), …, \(B_p\) | The \(p+1\) regression coefficients
\(X_{k}\) | The \(k^{th}\) variable
\(X_{k,i}\) | The \(k^{th}\) variable measured on individual \(i\)
\(X_{k,i,t}\) | The \(k^{th}\) variable measured on individual \(i\) at time \(t\)
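
The count of \(p+1\) coefficients is easiest to see once the coefficients are attached to the variables. This assumes the coefficients belong to a standard linear regression with an intercept, which this chapter does not specify. Under that assumption there is one coefficient for each variable \(X_1, \dots, X_p\), plus the intercept \(B_0\):

\[
B_0 + B_1 X_{1,i} + B_2 X_{2,i} + \cdots + B_p X_{p,i}
\]

For example, with \(p = 3\) variables this expression has \(3 + 1 = 4\) coefficients: \(B_0\), \(B_1\), \(B_2\), and \(B_3\).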

2.2 Examples With Cross-Sectional Data

Consider the following cross-sectional data set. It contains an identification variable (labeled ‘ID’) and three measured variables (labeled ‘Gender’, ‘Age’, and ‘Score’). Each individual has one observation per variable (i.e., there are no repeated measures).

ID | Gender | Age | Score
--- | --- | --- | ---
1 | Male | 37 | 12.4
2 | Female | 41 | 19.8
3 | Male | 50 | 17.2
\(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\)
100 | Female | 62 | 24.7

Within this example:

  • \(N\) equals 100 (there are 100 individuals within the data set)
  • \(n_i\) equals 1 for each individual (i.e., we only have one observation for each individual)
  • \(k\) indexes the three observed variables in our data set (Gender, Age, and Score), so \(k\) takes the values 1, 2, and 3
  • For \(X_k\), we can define \(X_1\) = Gender, \(X_2\) = Age, and \(X_3\) = Score
    • Please note that this labeling is arbitrary
    • We could have used \(X_1\) = Age, \(X_2\) = Score, and \(X_3\) = Gender
    • In this case, we prefer \(X_1\) = Gender, \(X_2\) = Age, and \(X_3\) = Score because it matches the order of the columns in the data set
  • \(X_{k,i}\) refers to a specific observation on variable \(k\) for individual \(i\). Using the labeling system from the above bullet point (i.e., \(X_1\) = Gender, \(X_2\) = Age, and \(X_3\) = Score),
    • \(X_{1,1}\) = Male
    • \(X_{1,100}\) = Female
  • In this case, \(X_{k,i,t}\) carries no meaningful information
    • We have only one measurement per individual, so we do not refer to any observations in cross-sectional data using the \(X_{k,i,t}\) notation.
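
To make the indexing concrete, the cross-sectional table can be stored with one row per individual, and \(X_{k,i}\) can then be read off by column \(k\) and row \(i\). The sketch below is a minimal illustration in Python. The names `data`, `variables`, and `x` are hypothetical. Only the rows shown in the table are included, because individuals 4 to 99 are elided.

```python
# Rows shown in the cross-sectional table, keyed by ID (the index i).
# Individuals 4-99 are elided in the text, so they are not included here.
data = {
    1: ("Male", 37, 12.4),
    2: ("Female", 41, 19.8),
    3: ("Male", 50, 17.2),
    100: ("Female", 62, 24.7),
}

# Variable labels in column order: X_1 = Gender, X_2 = Age, X_3 = Score.
variables = ["Gender", "Age", "Score"]


def x(k, i):
    """Return X_{k,i}: variable k (1-based) measured on individual i.

    Hypothetical helper for illustration only.
    """
    return data[i][k - 1]  # shift k to Python's 0-based tuple index


print(x(1, 1))    # X_{1,1}   -> Male
print(x(1, 100))  # X_{1,100} -> Female
print(x(2, 3))    # X_{2,3}   -> 50
```

Keeping \(k\) 1-based in the helper and converting inside it means the calls read exactly like the subscripts in the text.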

2.3 Examples With Repeated Measures Data

Consider the following repeated measures data set. It contains an identification variable (labeled ‘ID’), a time variable (labeled ‘Time’), and three measured variables (labeled ‘Gender’, ‘Age’, and ‘Score’). Each individual has multiple observations per variable, one per time point, and the number of observations can differ between individuals.

ID | Time | Gender | Age | Score
--- | --- | --- | --- | ---
1 | 1 | Male | 37 | 12.4
1 | 2 | Male | 38 | 13.8
1 | 3 | Male | 39 | 13.9
2 | 1 | Female | 41 | 19.8
2 | 2 | Female | 42 | 22.0
3 | 1 | Male | 50 | 17.2
3 | 2 | Male | 51 | 18.4
3 | 3 | Male | 52 | 19.6
3 | 4 | Male | 53 | 19.8
\(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\)
50 | 4 | Male | 39 | 22.8

Within this example:

  • \(N\) equals 50 (there are 50 individuals within the data set)
  • \(n_i\) equals
    • 3 for the \(1^{st}\) individual
    • 2 for the \(2^{nd}\) individual
    • 4 for the \(3^{rd}\) individual
    • 4 for the \(50^{th}\) individual
  • \(k\) indexes the three observed variables in our data set (Gender, Age, and Score), so \(k\) takes the values 1, 2, and 3
  • For \(X_k\), we can define \(X_1\) = Gender, \(X_2\) = Age, and \(X_3\) = Score
    • Please note that this labeling is arbitrary
    • We could have used \(X_1\) = Age, \(X_2\) = Score, and \(X_3\) = Gender
    • In this case, we prefer \(X_1\) = Gender, \(X_2\) = Age, and \(X_3\) = Score because it matches the order of the columns in the data set
  • \(X_{k,i}\) does not identify a single observation, because individual \(i\) can have several observations on variable \(k\); we must also specify the time point \(t\)
  • \(X_{k,i,t}\) refers to a specific observation on variable \(k\) for individual \(i\) at time \(t\). Using the labeling system from the above bullet point (i.e., \(X_1\) = Gender, \(X_2\) = Age, and \(X_3\) = Score),
    • \(X_{1,2,2}\) = Female
    • \(X_{2,1,3}\) = 39
    • \(X_{3,50,4}\) = 22.8
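
With repeated measures, a natural storage layout is long format, with one row per individual per time point. In this layout \(X_{k,i,t}\) needs both \(i\) and \(t\) to locate a row, and \(n_i\) is simply the number of rows for individual \(i\). The sketch below is a minimal Python illustration. The names `rows`, `by_id_time`, `n`, and `x` are hypothetical. It uses only the rows shown in the table, so it reproduces \(n_i\) for individuals 1 to 3 but not for individual 50, whose earlier rows are elided.

```python
from collections import Counter

# Rows shown in the repeated measures table, in long format:
# (ID, Time, Gender, Age, Score). Rows for individuals 4-49 and the earlier
# rows for individual 50 are elided in the text, so they are omitted here.
rows = [
    (1, 1, "Male", 37, 12.4),
    (1, 2, "Male", 38, 13.8),
    (1, 3, "Male", 39, 13.9),
    (2, 1, "Female", 41, 19.8),
    (2, 2, "Female", 42, 22.0),
    (3, 1, "Male", 50, 17.2),
    (3, 2, "Male", 51, 18.4),
    (3, 3, "Male", 52, 19.6),
    (3, 4, "Male", 53, 19.8),
    (50, 4, "Male", 39, 22.8),
]

# Index each observation by (i, t) so X_{k,i,t} is a single lookup.
by_id_time = {(row[0], row[1]): row[2:] for row in rows}

# n_i: number of observations (rows) for individual i.
n = Counter(row[0] for row in rows)


def x(k, i, t):
    """Return X_{k,i,t}: variable k (1-based) for individual i at time t.

    Hypothetical helper for illustration only.
    """
    return by_id_time[(i, t)][k - 1]  # shift k to a 0-based index


print(n[1], n[2], n[3])  # n_1, n_2, n_3 -> 3 2 4
print(x(1, 2, 2))        # X_{1,2,2}     -> Female
print(x(2, 1, 3))        # X_{2,1,3}     -> 39
print(x(3, 50, 4))       # X_{3,50,4}    -> 22.8
```

Because the lookup key is the pair \((i, t)\), a call that supplies only \(k\) and \(i\) cannot select a row, which mirrors why \(X_{k,i}\) has no specific meaning here.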