Slide 20
Let Σ be an alphabet, and let Σ∗ be the set of finite strings over Σ.
Let Dom be a finite set of domains {dom_1, dom_2, ...}.
Let each dom_i ∈ Dom have a mapping p_i: Σ∗ → dom_i.
A dataframe is a tuple (A_{mn}, R_m, C_n, D_n), where A_{mn} is an arrangement of entries in m rows
and n columns from the domain Σ∗, R_m is a vector of m row labels from Σ∗, C_n is a vector of n column
labels from Σ∗, and D_n is a vector of n domains from the finite set of domains Dom, one per column,
each of which can also be left unspecified. We call D_n the schema of the dataframe. If any of the n
entries of D_n is left unspecified, then that domain can be induced by applying a schema induction
function S(·) to the corresponding column of A_{mn}. The schema induction function S: (Σ∗)^m → Dom
assigns an arrangement of m strings to a domain in Dom.
Dataframe formal definition
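
The definition above can be made concrete with a small Python sketch. This is an illustration under stated assumptions, not an implementation from the slides: the domain names, the `DOM` table, the order in which domains are tried, and the names `induce_schema` and `Dataframe` are all hypothetical. Each entry of `DOM` plays the role of a mapping p_i: Σ∗ → dom_i, and `induce_schema` plays the role of S by picking the first domain whose mapping accepts every string in a column.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


def _parse_bool(s: str) -> bool:
    # Hypothetical parser for a boolean domain: accepts only "True" / "False".
    if s in ("True", "False"):
        return s == "True"
    raise ValueError(f"not a bool: {s!r}")


# Assumed set Dom: each name maps to its parsing function p_i: Σ∗ → dom_i.
# Insertion order is the (assumed) order of preference used during induction.
DOM: Dict[str, Callable[[str], object]] = {
    "int": int,
    "float": float,
    "bool": _parse_bool,
    "str": str,
}


def induce_schema(column: List[str]) -> str:
    """Sketch of S: (Σ∗)^m → Dom. Return the first domain that parses all m entries."""
    for name, parse in DOM.items():
        try:
            for entry in column:
                parse(entry)
            return name
        except ValueError:
            continue  # this domain rejects some entry; try the next one
    return "str"  # fallback; not reached while "str" is in DOM


@dataclass
class Dataframe:
    A: List[List[str]]      # A_{mn}: m rows by n columns of strings from Σ∗
    R: List[str]            # R_m: m row labels
    C: List[str]            # C_n: n column labels
    D: List[Optional[str]]  # D_n: n domains, None meaning "left unspecified"

    def schema(self) -> List[str]:
        """Fill unspecified entries of D_n by applying S to the matching column of A_{mn}."""
        return [
            d if d is not None else induce_schema([row[j] for row in self.A])
            for j, d in enumerate(self.D)
        ]


df = Dataframe(
    A=[["1", "2.5", "x"], ["2", "3", "y"]],
    R=["r0", "r1"],
    C=["a", "b", "c"],
    D=[None, None, "str"],  # first two domains are left unspecified
)
print(df.schema())  # ['int', 'float', 'str']
```

Note that the entries of `A` stay as strings throughout; the schema only records which mapping p_i would be applied to each column. The first-match rule used here is one possible choice for S, and the definition itself does not prescribe how S selects a domain.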