datawookie
September 12, 2013

# Clustering Lightning into Storms

A short talk that I gave at the LIGHTS 2013 Conference (Johannesburg, 12 September 2013). The slides are a little short on text because I like the audience to hear the content rather than read it. The central message is that clustering lightning discharges into storms is not a trivial task, but it is a worthwhile challenge because it can lead to some very interesting science!

## Transcript

1. Clustering Lightning
Andrew B. Collier
http://www.exegetic.biz/

2. Clustering & Complexity
k-means

Time: O(nk)

Space: O(n+k)
Hierarchical

Time: O(n² log n)

Space: O(n²)
where n = number of points
k = number of clusters
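
To make the O(n²) space cost concrete, here is a rough back-of-the-envelope sketch. It assumes each pairwise distance is stored once as an 8-byte float; the function name `condensed_distance_bytes` and the example values of n are illustrative and do not come from the talk.

```python
def condensed_distance_bytes(n, bytes_per_value=8):
    """Bytes needed to store the n*(n-1)/2 unique pairwise distances.

    Assumes one 8-byte float per pair (an assumption, not a figure from the talk).
    """
    return n * (n - 1) // 2 * bytes_per_value


# Illustrative sizes only: storage grows quadratically with the number of discharges.
for n in (1_000, 100_000, 1_000_000):
    print(f"n = {n:>9,}: {condensed_distance_bytes(n) / 1e9:,.3f} GB")
```

The quadratic growth is why hierarchical clustering becomes impractical for large lightning data sets unless the data are first split into smaller chunks.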

3. Hierarchical Clustering
A method of cluster analysis which tries
to build a hierarchy of clusters.
Agglomerative: each observation starts in its own cluster, and
pairs of clusters are merged.
Divisive: all observations start in one cluster, and splits are
performed recursively.
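
The agglomerative variant described above can be sketched in a few lines of plain Python. This is a deliberately naive illustration using single linkage (the distance between two clusters is the smallest distance between any pair of their members). The linkage choice and the name `single_linkage` are assumptions; the talk does not say which linkage or which implementation was used.

```python
import math


def single_linkage(points):
    """Naive agglomerative clustering with single linkage.

    Returns the merges in order, each as (members_a, members_b, distance),
    where members are indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]  # each observation starts alone
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the closest pair of clusters.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return merges


# Two tight pairs, far apart: the final merge happens at a much larger distance.
for left, right, dist in single_linkage([(0, 0), (0, 1), (10, 0), (10, 1)]):
    print(left, right, round(dist, 3))
```

This brute-force loop is slower than the complexity quoted on the previous slide; it is meant only to show the merge sequence that a dendrogram records.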

4. Distance Matrix
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0.00000 0.90660 0.98676 2.59730 1.21076 0.64162 1.37218 1.83724 1.33919 1.21728
[2,] 0.90660 0.00000 1.54973 1.81014 0.51097 1.22292 0.76644 1.29365 0.46367 1.37455
[3,] 0.98676 1.54973 0.00000 2.69411 2.01335 1.56353 1.51223 1.73953 1.80822 0.61954
[4,] 2.59730 1.81014 2.69411 0.00000 1.97469 3.03009 1.24766 0.97643 1.35618 2.12959
[5,] 1.21076 0.51097 2.01335 1.97469 0.00000 1.27080 1.18563 1.68454 0.69746 1.88420
[6,] 0.64162 1.22292 1.56353 3.03009 1.27080 0.00000 1.88041 2.38623 1.68528 1.85847
[7,] 1.37218 0.76644 1.51223 1.24766 1.18563 1.88041 0.00000 0.52860 0.53994 1.04872
[8,] 1.83724 1.29365 1.73953 0.97643 1.68454 2.38623 0.52860 0.00000 0.99706 1.15659
[9,] 1.33919 0.46367 1.80822 1.35618 0.69746 1.68528 0.53994 0.99706 0.00000 1.47596
[10,] 1.21728 1.37455 0.61954 2.12959 1.88420 1.85847 1.04872 1.15659 1.47596 0.00000
[11,] 11.70194 11.19894 11.09504 9.49063 11.45953 12.30475 10.45600 9.93351 10.79570 10.57239
[12,] 11.01346 10.37439 10.57979 8.58636 10.55138 11.55993 9.67963 9.18187 9.93618 9.99942
[13,] 10.15958 9.63720 9.59680 7.92908 9.89656 10.75400 8.89718 8.37645 9.23239 9.05385
[14,] 8.65864 7.98640 8.31356 6.19131 8.15378 9.18362 7.30701 6.82147 7.54379 7.71149
[15,] 11.16103 10.79475 10.43696 9.21898 11.12921 11.79252 10.03042 9.50193 10.43790 9.97271
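
A matrix like the one above holds the distance between every pair of points: it is symmetric, with zeros on the diagonal. A minimal sketch of how such a matrix could be built, assuming plain Euclidean distance and a hypothetical helper named `distance_matrix`:

```python
import math


def distance_matrix(points):
    """Symmetric matrix of pairwise Euclidean distances (zeros on the diagonal)."""
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Compute each pair once and mirror it across the diagonal.
            matrix[i][j] = matrix[j][i] = math.dist(points[i], points[j])
    return matrix


for row in distance_matrix([(0, 0), (3, 4), (6, 8)]):
    print(row)
```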

5. Distance Measures
Euclidean Distance (Pythagoras' Theorem)

Geographical Distance (“great circle”)
– Cosine
– Haversine
– Vincenty Sphere
– Vincenty Ellipsoid
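
Of the great-circle measures listed, the haversine formula is the simplest to write down. The sketch below assumes a spherical Earth with a mean radius of 6371 km; the constant and the function name `haversine_km` are choices made for this illustration, not taken from the talk.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    # Clamp to 1.0 to guard against rounding slightly above the valid asin range.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# A quarter of the way around the globe, from the equator to the pole.
print(haversine_km(0.0, 0.0, 90.0, 0.0))
```

The Vincenty ellipsoid method accounts for the flattening of the Earth and is more accurate, at the cost of an iterative calculation.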

6. Big steps (SIGNIFICANT)
Small steps (INSIGNIFICANT)
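
One possible reading of this slide is that the merge distances in a dendrogram mostly grow in small steps, and that an unusually big step marks a natural place to cut the tree. The sketch below follows that interpretation, which is an assumption rather than the method presented in the talk. The name `cut_at_largest_jump` is hypothetical.

```python
def cut_at_largest_jump(merge_heights):
    """Pick a cut threshold at the largest gap between successive merge distances.

    `merge_heights` holds the n-1 merge distances from an agglomerative run
    on n points (at least two values are required).
    Returns (threshold, number_of_clusters).
    """
    heights = sorted(merge_heights)
    jumps = [upper - lower for lower, upper in zip(heights, heights[1:])]
    i = max(range(len(jumps)), key=jumps.__getitem__)  # index of the biggest step
    threshold = (heights[i] + heights[i + 1]) / 2
    # Merges 0..i fall below the cut, so n - (i + 1) clusters remain,
    # and n = len(heights) + 1.
    n_clusters = len(heights) - i
    return threshold, n_clusters


# Merge distances for two tight pairs that are far apart.
print(cut_at_largest_jump([1.0, 1.0, 10.0]))
```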

7. Minimum between-cluster distance

Maximum within-cluster distance
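
Both quantities on this slide can be computed directly for any proposed assignment of points to clusters. The sketch below assumes Euclidean distance, and the helper name `separation_and_diameter` is hypothetical. The ratio of the two values is known as the Dunn index, which is mentioned here only as context and is not claimed to be what the talk used.

```python
import math
from itertools import combinations


def separation_and_diameter(points, labels):
    """Return (minimum between-cluster distance, maximum within-cluster distance)."""
    between = math.inf  # stays infinite if there is only one cluster
    within = 0.0
    for (p, label_p), (q, label_q) in combinations(zip(points, labels), 2):
        d = math.dist(p, q)
        if label_p == label_q:
            within = max(within, d)
        else:
            between = min(between, d)
    return between, within


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(separation_and_diameter(points, [0, 0, 1, 1]))
```

A good partition has a large between-cluster minimum relative to the within-cluster maximum.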

8. We could just take a statistical approach...
… but why ignore domain-specific knowledge?

9. Conclusion

Isolated storms easily identified

Clustering not as easy as it looks

Need to use other information

10. Strauss, C., Rosa, M. B., & Stephany, S. (2013). Spatio-temporal clustering and density
estimation of lightning data for the tracking of convective events. Atmospheric Research,
134, 87–99. doi:10.1016/j.atmosres.2013.07.008.

11. Kernel Density
Kernel Density & Spatio-Temporal Clustering

12. Why is this important?
To gain a better understanding of spatial and temporal distribution of lightning within a storm we need to actually isolate individual storms.
http://www.wallconvert.com/
Tracking convective events in