salient substrings in values that can be used for classification
• Start with an embedding layer for dimensionality reduction
• Use convolutional layers to capture substring patterns
• End with fully connected layers
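The three-stage architecture above can be sketched as a forward pass in plain Python. This is only an illustration under stated assumptions: the layer sizes, the ReLU plus max-over-time pooling step, the single fully connected layer, and the names `init_model` and `forward` are all hypothetical, and the weights are random because training is out of scope.

```python
import math
import random

def init_model(n_chars, embed_dim, n_filters, kernel_size, n_types, seed=0):
    """Randomly initialised weights (hypothetical sizes; no training here)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "embed": mat(n_chars, embed_dim),   # one vector per character code
        "conv": [mat(kernel_size, embed_dim) for _ in range(n_filters)],
        "fc": mat(n_types, n_filters),      # fully connected output layer
        "kernel_size": kernel_size,
    }

def forward(model, value):
    """Return a probability distribution over types for one string value."""
    k = model["kernel_size"]
    n_chars = len(model["embed"])
    # Embedding layer: map each character to a low-dimensional vector.
    padded = value.ljust(k)  # pad short values so at least one window exists
    x = [model["embed"][ord(c) % n_chars] for c in padded]
    dim = len(x[0])
    # Convolutional layer: each filter scores every length-k substring window.
    # ReLU and max-over-time pooling are assumptions of this sketch.
    pooled = []
    for filt in model["conv"]:
        scores = [
            sum(filt[j][d] * x[i + j][d] for j in range(k) for d in range(dim))
            for i in range(len(x) - k + 1)
        ]
        pooled.append(max(0.0, max(scores)))
    # Fully connected layer followed by a softmax over the types.
    logits = [sum(w * p for w, p in zip(row, pooled)) for row in model["fc"]]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

model = init_model(n_chars=128, embed_dim=8, n_filters=4, kernel_size=3, n_types=5)
probs = forward(model, "Lake Tahoe")
```

Each convolutional filter looks at windows of `kernel_size` consecutive characters, which is how the network can respond to substring patterns regardless of where they occur in the value.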
similar.
ChemicalSubstance, ChemicalCompound, Biomolecule, Protein, NaturalPlace, BodyOfWater
Semantic types can be used to detect and measure the semantic similarities between database types.
main source of confusion of the neural network. Thus, we can infer the degree of semantic similarity between two database types t and t’ based on their mutual confusion.
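One way to turn mutual confusion into a number is sketched below. The slides do not give the exact formula, so the symmetric average of the two row-normalised confusion rates is an assumption, as are the function name and the illustrative counts.

```python
def mutual_confusion_similarity(confusion, labels):
    """confusion[i][j] = number of values of type i predicted as type j."""
    n = len(labels)
    # Row-normalise: fraction of each type's values sent to each predicted type.
    rates = []
    for row in confusion:
        total = sum(row)
        rates.append([c / total if total else 0.0 for c in row])
    sim = {}
    for i in range(n):
        for j in range(i + 1, n):
            # Assumed symmetric score: average of the two confusion rates.
            sim[(labels[i], labels[j])] = (rates[i][j] + rates[j][i]) / 2
    return sim

# Illustrative counts only, not results from the slides.
labels = ["ChemicalSubstance", "ChemicalCompound", "BodyOfWater"]
confusion = [
    [70, 28, 2],
    [30, 68, 2],
    [1, 1, 98],
]
similarity = mutual_confusion_similarity(confusion, labels)
```

With these made-up counts, the two chemical types receive a much higher score than either does with BodyOfWater, matching the intuition that types the network confuses are semantically close.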
to organize all the database types into a similarity graph.
• A logistic function is used to rescale the similarity measure so we can control the spatial layout of the graph with the parameters a and b.
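The rescaling step might look like the sketch below. The slides name the parameters a and b but not their roles; treating a as the steepness and b as the midpoint of a standard logistic curve is an assumption, and `logistic_rescale`, `build_similarity_graph`, and `min_weight` are hypothetical names.

```python
import math

def logistic_rescale(s, a, b):
    """Map a raw similarity s into (0, 1); a sets steepness, b the midpoint."""
    return 1.0 / (1.0 + math.exp(-a * (s - b)))

def build_similarity_graph(sim, a, b, min_weight=0.0):
    """Return weighted edges {(t, t'): weight}, keeping weights above min_weight."""
    edges = {}
    for pair, s in sim.items():
        w = logistic_rescale(s, a, b)
        if w > min_weight:
            edges[pair] = w
    return edges

# Illustrative raw similarities only.
raw = {("Protein", "Biomolecule"): 0.29, ("Protein", "BodyOfWater"): 0.015}
graph = build_similarity_graph(raw, a=20.0, b=0.1, min_weight=0.5)
```

Raising a sharpens the contrast between similar and dissimilar pairs, while b moves the point at which a pair counts as "half similar"; both change how tightly related types are drawn together in a layout.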
analyze graphs.
• We apply spectral clustering to organize the database types into clusters which represent semantic topics in the database.
• Spectral clustering can be applied recursively so that a hierarchical organization can be obtained.
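The recursive idea can be illustrated with a simplified variant: spectral bipartitioning by the sign of the Fiedler vector, applied recursively. This is a swapped-in simplification rather than the clustering used in the slides (general spectral clustering typically runs k-means on several eigenvectors), and the power-iteration solver, function names, and example weights are assumptions of this sketch.

```python
import math

def fiedler_vector(W, iters=500):
    """Approximate the eigenvector of the second-smallest eigenvalue of L = D - W."""
    n = len(W)
    deg = [sum(row) for row in W]
    shift = 2.0 * max(deg) + 1e-9        # upper bound on the eigenvalues of L
    v = [float(i) for i in range(n)]     # deterministic start vector
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]        # stay orthogonal to the constant eigenvector
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            break
        v = [x / norm for x in v]
        # Multiply by (shift * I - L); its dominant direction is the Fiedler vector.
        v = [(shift - deg[i]) * v[i] + sum(W[i][j] * v[j] for j in range(n))
             for i in range(n)]
    return v

def spectral_hierarchy(W, nodes, min_size=2, max_depth=3):
    """Recursively bipartition `nodes` by the sign of the Fiedler vector."""
    if len(nodes) <= min_size or max_depth == 0:
        return sorted(nodes)
    sub = [[W[i][j] for j in nodes] for i in nodes]  # induced subgraph
    v = fiedler_vector(sub)
    left = [n for n, x in zip(nodes, v) if x < 0]
    right = [n for n, x in zip(nodes, v) if x >= 0]
    if not left or not right:                        # no meaningful split found
        return sorted(nodes)
    return [spectral_hierarchy(W, left, min_size, max_depth - 1),
            spectral_hierarchy(W, right, min_size, max_depth - 1)]

# Illustrative graph: two tight groups of three types, weakly linked to each other.
hi, lo = 0.9, 0.05
W = [[0.0 if i == j else (hi if (i < 3) == (j < 3) else lo) for j in range(6)]
     for i in range(6)]
hierarchy = spectral_hierarchy(W, list(range(6)), min_size=3)
```

The result is a nested list whose nesting mirrors the hierarchy: each level of recursion splits one cluster into two sub-clusters until a cluster is small enough or the depth limit is reached.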
data where data values are assigned incorrect types. Example: “Beethoven’s 9th Symphony” being classified as Artist.
• We can utilize the neural network to detect likely candidates of such dirty data.
• Using substring reasoning, we can also generate visual explanations of the abnormalities of the identified data values.
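A minimal sketch of the detection step follows, assuming a classifier that returns a probability per type. The rule used here (flag a value when the network confidently prefers a type other than the assigned one) is one plausible reading of the slides, and `find_suspect_values`, the 0.9 threshold, the stub predictor, and the MusicalWork type name are all hypothetical.

```python
def find_suspect_values(rows, predict, threshold=0.9):
    """rows: iterable of (value, assigned_type); predict: value -> {type: probability}."""
    suspects = []
    for value, assigned in rows:
        probs = predict(value)
        best_type = max(probs, key=probs.get)
        # Flag only confident disagreements between network and assigned type.
        if best_type != assigned and probs[best_type] >= threshold:
            suspects.append((value, assigned, best_type, probs[best_type]))
    # Most confident disagreements first.
    return sorted(suspects, key=lambda s: -s[3])

# Stub standing in for the trained network; the probabilities are made up.
def stub_predict(value):
    table = {
        "Beethoven's 9th Symphony": {"Artist": 0.04, "MusicalWork": 0.96},
        "Ludwig van Beethoven": {"Artist": 0.97, "MusicalWork": 0.03},
    }
    return table[value]

rows = [("Beethoven's 9th Symphony", "Artist"), ("Ludwig van Beethoven", "Artist")]
suspects = find_suspect_values(rows, stub_predict)
```

The threshold trades recall for precision: a lower value surfaces more candidates for review, at the cost of flagging values the network is merely unsure about.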
lakes
• Similarity information derived from neural networks can be used to construct a hierarchy of semantic types
• Semantic types are useful for analysis and data cleaning