Trust as a Proxy Measure for the Quality of VGI in the Case of OSM

Carsten Keßler a,b and René de Groot a
a Institute for Geoinformatics, University of Münster | b soon: Hunter College, CUNY
http://carsten.io | @carstenkessler

The Idea
‣ Develop a measure to assess the degree to which a data consumer can trust the quality of a feature
‣ Trust measure is based on a feature’s editing history
‣ Benefits
  ‣ Works at feature level
  ‣ Filter features by quality
  ‣ Spot problematic features

Does this work?
Can we reliably assess the quality of a feature in OpenStreetMap based on its editing history?

v1: amenity = university | name = Institute for Geoinformatics
v2: amenity = university | building = yes | name = Institute for Geoinformatics
v3: addr:city = Münster | addr:country = DE | addr:housenumber = 253 | addr:street = Weseler Straße | building = yes | wheelchair = limited
…

OSM Heatmap Kudos: Johannes Trame

OSM Provenance Ontology
http://carsten.io/osm/osm-provenance.rdf
[Figure: ontology diagram. Labels shown: prv:Tag, includesEdit, Changeset, prv:CreationGuideline, Edit, prv:createdBy, prv:precededBy, prv:usedData, NodeState, WayState, prv:DataCreation, User, prv:performedBy, changesGeometry, addsTag, removesTag, changesValueOfKey, rdfs:Literal, prv:DataItem, prv:HumanActor, subClassOf, hasTag, FeatureState]
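The ontology distinguishes edits that add a tag, remove a tag, or change the value of a key. As a rough illustration of how two consecutive versions of a feature could be compared along those lines, here is a minimal Python sketch. The function name `classify_edit` and the dictionary layout are assumptions made for this example; only the property names addsTag, removesTag and changesValueOfKey and the v1/v2 tags come from the slides.

```python
def classify_edit(old_tags, new_tags):
    """Compare the tags of two consecutive versions of a feature.

    Returns a dict keyed by the ontology's property names. The structure
    of the return value is an assumption made for this sketch.
    """
    added = {k: v for k, v in new_tags.items() if k not in old_tags}
    removed = {k: v for k, v in old_tags.items() if k not in new_tags}
    changed = {
        k: (old_tags[k], new_tags[k])
        for k in old_tags
        if k in new_tags and old_tags[k] != new_tags[k]
    }
    return {"addsTag": added, "removesTag": removed, "changesValueOfKey": changed}


# Versions v1 and v2 of the example feature shown earlier in the deck.
v1 = {"amenity": "university", "name": "Institute for Geoinformatics"}
v2 = {
    "amenity": "university",
    "building": "yes",
    "name": "Institute for Geoinformatics",
}

edit = classify_edit(v1, v2)
print(edit["addsTag"])  # {'building': 'yes'}
```

Geometry changes (changesGeometry) are left out here because they would need the coordinates of each version.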

Does this work?
‣ Get a first idea whether this is a viable approach
‣ Compare results of
  ‣ a simple trust measure and
  ‣ observed feature quality
‣ Is there a correlation between the two?

Study area: Münster’s old town

Feature Selection
‣ Re-mapping the whole district was not feasible
‣ Up to 100 features were manageable
‣ Selection based on minimum number of versions
‣ 74 features with 6+ versions

74 features selected
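The selection rule amounts to a threshold on the number of versions per feature. A minimal sketch, assuming the version counts are already available as a dictionary; the feature ids and counts below are hypothetical, and only the threshold of 6 versions is taken from the slides.

```python
MIN_VERSIONS = 6  # threshold from the slides ("6+ versions")


def select_features(version_counts, min_versions=MIN_VERSIONS):
    """Return the ids of all features with at least `min_versions` versions."""
    return sorted(fid for fid, n in version_counts.items() if n >= min_versions)


# Hypothetical version counts, for illustration only.
version_counts = {"node/1": 3, "node/2": 6, "way/3": 11, "way/4": 5}
selected = select_features(version_counts)
print(selected)  # ['node/2', 'way/3']
```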

Trust measure
‣ Positive factors:
  ‣ Versions
  ‣ Users
  ‣ Indirect confirmations = edits in the direct vicinity (50m)
‣ Negative factors:
  ‣ Tag corrections
  ‣ Rollbacks

Trust measure (contd.)
‣ Classification for each factor: 5 equal classes
‣ Combined into one classification
‣ Equal weights
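One way to read this recipe is sketched below. Several points are assumptions not stated on the slides: "5 equal classes" is interpreted as equal-interval classes between the observed minimum and maximum of each factor, negative factors are inverted so that class 5 always means "most trustworthy", and the combined class is the rounded mean of the five factor classes. The factor names and the sample values are hypothetical.

```python
POSITIVE = ("versions", "users", "confirmations")
NEGATIVE = ("tag_corrections", "rollbacks")
N_CLASSES = 5


def equal_interval_class(value, lo, hi, n_classes=N_CLASSES):
    """Map value to a class 1..n_classes using equal-width intervals on [lo, hi]."""
    if hi == lo:
        return 1  # no spread in the data: everything lands in the lowest class
    width = (hi - lo) / n_classes
    return min(int((value - lo) / width) + 1, n_classes)


def trust_classes(features):
    """Combine the per-factor classes of each feature with equal weights."""
    factors = POSITIVE + NEGATIVE
    ranges = {
        f: (min(x[f] for x in features.values()), max(x[f] for x in features.values()))
        for f in factors
    }
    result = {}
    for fid, x in features.items():
        classes = []
        for f in factors:
            c = equal_interval_class(x[f], *ranges[f])
            if f in NEGATIVE:
                c = N_CLASSES + 1 - c  # assumption: many corrections -> low class
            classes.append(c)
        mean = sum(classes) / len(classes)  # equal weights
        result[fid] = int(mean + 0.5)  # round half up to a class 1..5
    return result


# Hypothetical factor values for three features.
features = {
    "a": {"versions": 6, "users": 1, "confirmations": 0, "tag_corrections": 4, "rollbacks": 2},
    "b": {"versions": 16, "users": 5, "confirmations": 10, "tag_corrections": 0, "rollbacks": 0},
    "c": {"versions": 11, "users": 3, "confirmations": 5, "tag_corrections": 2, "rollbacks": 1},
}
trust = trust_classes(features)
print(trust)  # {'a': 1, 'b': 5, 'c': 3}
```

Other readings, such as quantile classes or a different way of folding in the negative factors, would fit the slide text equally well.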

Trust measure

Field Survey
‣ Thematic accuracy, 4 classes, with results:
  1. Main tag wrong: 6 features (~8%)
  2. Other tags wrong: 2 features (~3%)
  3. Thematic ambiguities: 9 features (~12%)
  4. Thematically correct: 57 features (~77%)

Field Survey (contd.)
‣ Topological consistency
  ‣ Is the feature correctly positioned relative to the surrounding features?
  ‣ Results: 73 out of 74 features (~99%)
‣ Information completeness
  ‣ TF-IDF measure to identify relevant tags per main tag
  ‣ ~37% tags missing (avg.)
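The slides do not spell out how TF-IDF was applied, so the following is only a sketch under stated assumptions: each main tag is treated as one "document", the other tag keys found on features with that main tag are its "terms", and a key counts as relevant when its TF-IDF score is above zero. The sample features, the threshold and the helper names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tfidf_by_main_tag(features):
    """features: iterable of (main_tag, set of other tag keys).

    Returns {main_tag: {key: tf-idf score}}, with one document per main tag.
    """
    counts = defaultdict(Counter)
    for main, keys in features:
        counts[main].update(keys)

    n_docs = len(counts)
    doc_freq = Counter()  # in how many main-tag groups does each key occur?
    for key_counts in counts.values():
        doc_freq.update(key_counts.keys())

    scores = {}
    for main, key_counts in counts.items():
        total = sum(key_counts.values())
        scores[main] = {
            key: (n / total) * math.log(n_docs / doc_freq[key])
            for key, n in key_counts.items()
        }
    return scores


def missing_share(keys, relevant):
    """Fraction of the relevant keys that a feature does not carry."""
    if not relevant:
        return 0.0
    return len(relevant - set(keys)) / len(relevant)


# Hypothetical sample data.
features = [
    ("amenity=restaurant", {"name", "cuisine", "opening_hours"}),
    ("amenity=restaurant", {"name", "cuisine"}),
    ("amenity=bank", {"name", "atm"}),
]
scores = tfidf_by_main_tag(features)

# Assumption: a key is "relevant" for a main tag if its score is above 0.
relevant = {k for k, s in scores["amenity=restaurant"].items() if s > 0}
share = missing_share({"name", "cuisine"}, relevant)
print(sorted(relevant), share)  # ['cuisine', 'opening_hours'] 0.5
```

Note that a key used with every main tag, such as `name` in this sample, gets a score of zero under this scheme, which is the usual TF-IDF behavior for terms that do not discriminate between documents.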

Observed quality: combined results

Trust measure

Mean quality class: ~4.2
Mean trust class: ~2.8

Do we get the trend right?
‣ Removed outliers
‣ Kendall’s τ: 0.52
‣ Moderate, but significant positive correlation
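For reference, a rank correlation of this kind can be computed by counting concordant and discordant pairs. The sketch below implements the tie-corrected variant (τ-b) in plain Python; the slides do not say which variant was used, so that choice is an assumption, and the class values at the bottom are hypothetical and do not reproduce the study data.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equally long sequences of ordinal values."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: contributes to neither term
        elif dx == 0:
            ties_x += 1  # tied in x only
        elif dy == 0:
            ties_y += 1  # tied in y only
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt(
        (concordant + discordant + ties_x) * (concordant + discordant + ties_y)
    )
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical trust and quality classes for a handful of features.
trust_class = [1, 2, 2, 3, 4, 5]
quality_class = [3, 4, 5, 4, 5, 5]
tau = kendall_tau_b(trust_class, quality_class)
print(round(tau, 2))
```

A significance test is not included here; libraries such as SciPy provide one alongside the coefficient.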

Conclusions
‣ Initial study
‣ A feature’s history can determine its trustworthiness
‣ Trust values correlate with observed quality
‣ Even with a very simple model
‣ Outliers cannot be explained yet

Tons of Future Work
‣ Extend and refine the trust model: Classification, weighting, positive vs negative aspects, …
‣ Social aspects: Who has edited a feature?
‣ Repeat study without spatial focus
‣ How to scale the data collection?
‣ Learn the trust model from the data

Thank you!
All data used in this research © OpenStreetMap contributors.
http://carsten.io | @carstenkessler
Carsten Keßler | René de Groot
