BoT2013: Network Analysis in the Era of Massive Data (海量資料時代的網路分析)

BoT2013, the 5th Taiwan Botnet Detection and Prevention Technology Workshop
翁浩正 (Allen Own)

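Who Am I
翁浩正 (Allen Own)
CEO, DEVCORE (戴夫寇爾)
Member, CHROOT
Deputy general coordinator, HITCON (Taiwan hacker conference)
Founder, NISRA information security team
Champion, Golden Shield Award information security skills competition

Penetration-testing projects for government agencies and enterprises
Lecturer and consultant for enterprises, academia, and government agencies
Information security systems research and development
Network security equipment deployment
Hacker attack techniques and hacker tracking
Development and research of attack programs and backdoors
EC-Council Certified Ethical Hacker
Computer Hacking Forensic Investigator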

2013/07/19 (Fri) ~ 20 (Sat), Humanities and Social Sciences Building, Academia Sinica

What is Big Data?

Big Data
Big data[1][2] is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage,[3] search, sharing, transfer, analysis,[4] and visualization. The trend to larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data, allowing correlations to be found to "spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions."[5][6][7]

How Big Data fights back against APTs and Malware?
http://www.seculert.com/blog/2013/05/how-big-data-fights-back-against-apts-and-malware.html
http://info.umbrella.com/infographic-using-big-data-for-malware-protection.html

What is Big Data?

What is Big Data? BIG

How about... 9TB?

Internet Census 2012
http://internetcensus2012.bitbucket.org/
While playing around with the Nmap Scripting Engine (NSE) we discovered an amazing number of open embedded devices on the Internet. Many of them are based on Linux and allow login to standard BusyBox with empty or default credentials. We used these devices to build a distributed port scanner to scan all IPv4 addresses. These scans include service probes for the most common ports, ICMP ping, reverse DNS and SYN scans. We analyzed some of the data to get an estimation of the IP address usage.

Internet Census 2012
http://internetcensus2012.bitbucket.org/download/internet_census_2012.torrent
Decompressing all data results in 9TB of raw logfiles, but this code can also be used to recompress the data into gzip files. The gzipped dataset should be ~1.5TB.

Hilbert Browser
http://internetcensus2012.bitbucket.org/hilbert/index.html
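As the name suggests, the Hilbert Browser draws the IPv4 address space along a Hilbert curve, so that numerically adjacent address blocks land next to each other on the map. The sketch below uses the standard distance-to-coordinate conversion for a Hilbert curve. The function names (`d2xy`, `ip_to_cell`), the 16x16 grid of /8 blocks, and the orientation of the curve are assumptions made for illustration, not the browser's actual layout.

```python
import ipaddress


def d2xy(n, d):
    """Map distance d along a Hilbert curve to (x, y) on an n x n grid.

    n must be a power of two and 0 <= d < n * n.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the quadrant so the sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def ip_to_cell(ip, order=4):
    """Place an IPv4 address on a 2**order x 2**order Hilbert grid.

    With order=4 (an assumed default) each cell is one /8 block.
    """
    bits = 2 * order
    d = int(ipaddress.IPv4Address(ip)) >> (32 - bits)
    return d2xy(1 << order, d)


if __name__ == "__main__":
    for addr in ("10.0.0.1", "11.0.0.1", "192.0.2.1"):
        print(addr, ip_to_cell(addr))
```

Consecutive /8 blocks such as 10.0.0.0/8 and 11.0.0.0/8 always end up in neighbouring cells, which is what makes clusters of related networks visible on such a map.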

We prepared 26TB. Special thanks to GD.

We spent 2 months decompressing part of the zpaq files.

How to use?

How to use? ANS: Use your force.

grep -e "Apache/2.2.3" *

grep -e "Apache/2.2.3" *
Plaintext Rules

But... it took 15 minutes... (one single file)
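For comparison, the same brute-force scan can be sketched in Python. The helper name `grep_banner` is hypothetical, and unlike `grep -e`, which treats `.` as a regex wildcard, this version matches the string literally. It reads gzip-compressed files directly, so a recompressed dataset would not need to be unpacked first, but it is still a linear scan and would not be expected to run any faster than grep.

```python
import gzip


def grep_banner(path, needle):
    """Yield every line of a (possibly gzip-compressed) log file containing needle.

    The file is streamed line by line, so memory use stays flat even for
    very large logs. Matching is a plain substring test, not a regex.
    """
    needle_bytes = needle.encode("utf-8")
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rb") as handle:
        for raw in handle:
            if needle_bytes in raw:
                yield raw.rstrip(b"\r\n").decode("utf-8", "replace")


if __name__ == "__main__":
    import sys

    # Usage: python grep_banner.py "Apache/2.2.3" file1.gz file2.log ...
    pattern, *files = sys.argv[1:]
    for name in files:
        for hit in grep_banner(name, pattern):
            print(f"{name}:{hit}")
```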

Search Faster!

elasticsearch
http://www.elasticsearch.org/
Flexible and powerful open source, distributed real-time search and analytics engine for the cloud.
Case studies: Fog Creek, Stack Overflow, SoundCloud, StumbleUpon, GitHub, foursquare, WordPress, Salesforce
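Why an index beats a 15-minute grep: search engines of this kind build an inverted index, a map from each term to the documents containing it, so a query becomes a few dictionary lookups instead of a scan over every line. The toy class below illustrates only that idea. `InvertedIndex`, its tokenizer, and the sample banners are made up for this sketch and say nothing about how elasticsearch itself is implemented or configured.

```python
import re
from collections import defaultdict

# Assumed tokenizer: runs of letters, digits, dots and underscores.
TOKEN = re.compile(r"[A-Za-z0-9_.]+")


def tokenize(text):
    return [token.lower() for token in TOKEN.findall(text)]


class InvertedIndex:
    """Toy in-memory inverted index with AND semantics."""

    def __init__(self):
        self.docs = []
        self.postings = defaultdict(set)  # term -> set of document ids

    def add(self, text):
        doc_id = len(self.docs)
        self.docs.append(text)
        for token in tokenize(text):
            self.postings[token].add(doc_id)
        return doc_id

    def search(self, query):
        tokens = tokenize(query)
        if not tokens:
            return []
        # Intersect the posting sets: only documents containing every term remain.
        ids = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        return [self.docs[i] for i in sorted(ids)]


if __name__ == "__main__":
    index = InvertedIndex()
    index.add("Server: Apache/2.2.3 (CentOS)")
    index.add("Server: Apache/2.2.22 (Ubuntu)")
    index.add("Server: nginx/1.2.1")
    print(index.search("Apache/2.2.3"))
```

The cost moves to indexing time: every document is tokenized once up front, after which each query touches only the posting sets for its own terms.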

You Know, for Search.

I'm too lazy to code.

logstash
http://logstash.net
logstash is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use (like, for searching). Speaking of searching, logstash comes with a web interface for searching and drilling into all of your logs. It is fully free and fully open source.
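The "parse" step is what turns a raw log line into a structured event that can be searched by field. A rough Python equivalent is sketched below. The tab-separated layout (IP, Unix timestamp, port, banner) is a hypothetical format chosen for illustration, not the actual Internet Census file format, and `parse_line` and `SERVER_RE` are not part of logstash.

```python
import json
import re
from datetime import datetime, timezone

# Pulls "product/version" out of an HTTP Server header, if one is present.
SERVER_RE = re.compile(r"Server:\s*(?P<product>[^/\s]+)(?:/(?P<version>[\w.\-]+))?")


def parse_line(line):
    """Turn one raw record into a dict ready to be stored as a JSON event.

    Assumed input layout: ip <TAB> unix_timestamp <TAB> port <TAB> banner
    """
    ip, timestamp, port, banner = line.rstrip("\n").split("\t", 3)
    event = {
        "ip": ip,
        "@timestamp": datetime.fromtimestamp(int(timestamp), tz=timezone.utc).isoformat(),
        "port": int(port),
        "banner": banner,
    }
    match = SERVER_RE.search(banner)
    if match:
        event["product"] = match.group("product")
        event["version"] = match.group("version")  # None when no version is given
    return event


if __name__ == "__main__":
    sample = "192.0.2.10\t1341100800\t80\tHTTP/1.1 200 OK Server: Apache/2.2.3 (CentOS)"
    print(json.dumps(parse_line(sample), indent=2))
```

Once the version is its own field, a question like "which hosts run Apache 2.2.3" becomes a field query rather than a text match.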

It works!

I need stronger UI/UX!

Kibana
http://kibana.org
Kibana is an open source (MIT License), browser-based interface to Logstash and ElasticSearch. Once you have those in place, Kibana is a breeze to install and configure (really, I swear). And as you'll see below, none too hard to operate. Check out the screenshots for an idea of what Kibana is all about.

LIVE DEMO

Analysis of Internet Census
Search known vulnerable servers and applications
Analyze IP activities
Detect botnet activities
Find specific domain hosts

Thanks
翁浩正 (Allen Own)

Q&A