
Detecting Communities in Science Blogs

cpikas
December 10, 2008

Presentation given to 2008 IEEE E-Science Conference


Transcript

  1. Problem Area

    • eScience includes using electronic tools both for conducting science and for communicating about science
    • There is an abundance of tools to help scientists communicate both online and offline
    • Lots of scientists and members of the interested public maintain blogs (~2500?)
    • Ultimate Questions:
      – Why?
      – With whom are scientists communicating?
      – What are scientists communicating about?
      – What is the value to the scientists and to science?
  2. Specific Problem Addressed

    • What is the nature of the science blogosphere?
      – What is its shape?
      – Who are the central participants?
      – What is the connectivity?
      – Where are the potential information flows?
  3. Outline

    • Background
    • Methods
      – Data gathering
      – Analysis
    • Results
    • Discussion
  4. Background: Blogs

    • Defined by format
      – Individual posts, with permanent URLs
      – Comments
    • Links
      – In content
      – In blogroll
      – In comments and trackbacks
    • Community develops around single blogs and among blogs through commenting
  5. Annotated blog screenshot (http://dorigo.wordpress.com/)

    • Links to static pages
    • Posts
    • Links and automatically generated content
  6. Access to posts by search and older posts using the calendar.

    A list of most recent posts is automatically generated.
  7. A list of categories the blogger used to describe his posts.

    Clicking will list all of the posts in that category. The blogroll is a list of blogs the author reads or endorses to some extent. Access to the older posts by month.
  8. Comments, which may be signed with the commenter's URL.

    And a form to leave your own comment. Typically your e-mail will not appear on the site.
  9. Background: Social Network Analysis

    • Uses connections between actors to understand potential flows of information and influence
    • Uses graph theoretic methods to find
      – Central or prestigious actors
      – Cohesive subgroups including communities
  10. Methods: Sample Selection

    Operational Definition of Science Blog
    • Blogs maintained by scientists that deal with any aspect of being a scientist
    • Blogs about scientific topics by non-scientists

    Omitted
    • Primarily political speech
    • Ones maintained by corporations
    • Non-English language
  11. Methods: Data Gathering

    • Two Networks: Links and Commenters
    • Link Data (Blogroll)
      – Used seed list developed in previous study using directories and searches
      – Snowball sampled using links from blogrolls
      – Visited and copied links
    • Commenter Data
      – Selected most central blogs from blogroll data
      – Used Perl scripts to pull the commenter URLs from each of the last 10 posts
  12. Methods: Analysis

    • Used social network analysis and graphing software
    • Examined graph and calculated basic descriptive statistics
    • Found centrality and prestige measures
      – Degree: the links in and out
      – Betweenness: the number of shortest paths that flow through that node
      – Closeness: short paths to other nodes
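The three measures named on the analysis slide can be sketched on a toy directed graph. The study used social network analysis software that the slides do not name; the pure-Python version below is only an illustration, and its conventions are assumptions: closeness is computed over outgoing paths to reachable nodes only, and betweenness is the unnormalized count from Brandes' algorithm. All function names and the toy arcs are hypothetical.

```python
from collections import deque


def build_adjacency(arcs):
    """Map each node to the set of nodes it links to."""
    adj = {}
    for u, v in arcs:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set())
    return adj


def in_out_degree(adj):
    """Degree: links in and links out for every node."""
    out_deg = {n: len(nbrs) for n, nbrs in adj.items()}
    in_deg = {n: 0 for n in adj}
    for nbrs in adj.values():
        for v in nbrs:
            in_deg[v] += 1
    return in_deg, out_deg


def closeness(adj, node):
    """Closeness: (reachable nodes) / (sum of shortest-path lengths to them)."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def betweenness(adj):
    """Betweenness: shortest paths passing through each node (Brandes)."""
    score = {n: 0.0 for n in adj}
    for s in adj:
        sigma = {n: 0 for n in adj}  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        preds = {n: [] for n in adj}
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {n: 0.0 for n in adj}
        for w in reversed(order):  # accumulate dependencies back toward s
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


# Toy graph: a -> b -> c, and d -> b.
adj = build_adjacency([("a", "b"), ("b", "c"), ("d", "b")])
in_deg, out_deg = in_out_degree(adj)
between = betweenness(adj)
print(in_deg, out_deg, between, closeness(adj, "a"))
```

Here node `b` sits on both shortest paths that pass through an intermediate node (`a` to `c` and `d` to `c`), so it is the only node with non-zero betweenness.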
  13. Methods: Analysis

    Located cohesive subgroups
    • Link methods
      – Components
      – LS Sets
    • Clustering methods
    • Community detection techniques
      – Newman-Girvan
      – Spin Glass
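Of the subgroup methods listed, components are the simplest to show in a few lines. The sketch below finds weakly connected components, treating each arc as undirected, which is an assumption since the slides do not say whether weak or strong components were used. It does not implement LS sets, Newman-Girvan, or spin glass detection, and the function name and toy data are hypothetical.

```python
from collections import deque


def weak_components(nodes, arcs):
    """Group nodes into weakly connected components, largest first."""
    neighbors = {n: set() for n in nodes}
    for u, v in arcs:
        # Ignore arc direction: either end can reach the other.
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    seen = set()
    components = []
    for start in neighbors:
        if start in seen:
            continue
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if v not in seen:
                    seen.add(v)
                    component.add(v)
                    queue.append(v)
        components.append(component)
    return sorted(components, key=len, reverse=True)


toy_nodes = ["a", "b", "c", "d", "e", "f"]
toy_arcs = [("a", "b"), ("b", "c"), ("d", "e")]
components = weak_components(toy_nodes, toy_arcs)
print([sorted(c) for c in components])
```

Sorting by size puts the largest component first, which matches how the results slides report one large component followed by much smaller ones.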
  14. Results: Link Analysis (Blogroll)

    • One large component
    • There were 1091 nodes, 6621 arcs
    • Diameter is 9
    • In-degree ranges from 1 to 292, with the median in-degree of 3, and mean 6
      – 10 of the top 20 blogs by in-degree are authored or co-authored by women
      – 4 of the top 5 blogs by closeness are authored or co-authored by women
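The reported mean in-degree can be checked against the node and arc counts: every arc adds exactly one to some node's in-degree, so the mean in-degree equals arcs divided by nodes. The sketch below does that arithmetic with the two counts from the slide and, separately, summarizes a made-up in-degree list to show how a few heavily linked blogs pull the mean above the median. The toy list and the function name are assumptions, not data from the study.

```python
from statistics import mean, median

# Counts reported on the slide.
REPORTED_NODES = 1091
REPORTED_ARCS = 6621

# Each arc contributes one unit of in-degree, so mean in-degree = arcs / nodes.
reported_mean = REPORTED_ARCS / REPORTED_NODES
print(round(reported_mean, 2))


def in_degree_summary(in_degrees):
    """Basic descriptive statistics for a list of in-degrees."""
    return {
        "min": min(in_degrees),
        "max": max(in_degrees),
        "median": median(in_degrees),
        "mean": mean(in_degrees),
    }


# Hypothetical skewed in-degree list: one blog attracts most of the links.
toy_in_degrees = [1, 1, 2, 3, 3, 4, 20]
summary = in_degree_summary(toy_in_degrees)
print(summary)
```

The ratio comes out slightly above 6, consistent with the rounded mean on the slide.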
  15. Results: Commenter

    • 5 components, the largest with 911, others with 11 or fewer nodes
    • 938 nodes (starting with the 46), 1152 arcs
    • The largest component has a diameter of 5
  16. Discussion: Links (Blogroll)

    • Most of the blogs were connected in one dense component
      – A result of the diffusion of blogs?
    • There were a few very central blogs, and then many less central
      – Typical skewed distribution
    • The community of women scientists merits further study
  17. Discussion: Commenters

    • Analysis easily located a notorious commenter who leaves incendiary comments on physics and chemistry blogs
      – High out-degree, no links in
    • Traffic on the women scientist blogs is more uniform, with frequent comments that are widely distributed among the blogs
      – Indicates a different use
  18. Take Home Messages

    • The science blogosphere is densely connected with many opportunities for influence and information diffusion
    • Communities tend to form within disciplinary boundaries
    • An exception is the community of women scientist bloggers who are from many different disciplines
  19. Acknowledgements

    • Thanks to Dr. Jen Golbeck for supervising this work as part of an independent study
    • Thanks also to
      – Dr. Alan Neustadtl for SNA advice
      – Dr. Dagobert Soergel for research advice
  20. Christina K. Pikas

    Doctoral Student
    University of Maryland College of Information Studies
    [email protected]
    http://terpconnect.umd.edu/~cpikas/ScienceBlogging