Mining Interest Topics from Plurk by Using Python

Ken Lee

May 24, 2013

Transcript

  1. About Me
     * Ken Lee
     * @echain
     * Developer at Synology Inc.
     * Worked for NCTU CSCC
     * Just graduated from NCTU
  2. >>> import Java
     Traceback (most recent call last):
       File "<stdin>", line 1, in <module>
     ImportError: No module named Java
  3. >>> import Java
     Traceback (most recent call last):
       File "<stdin>", line 1, in <module>
     ImportError: No module named Java
     >>>
     >>> import PHP
     Traceback (most recent call last):
       File "<stdin>", line 1, in <module>
     ImportError: No module named PHP
  4. >>> from multiprocessing import Process
     >>> import multiprocessing.connection
     >>> import MySQLdb
     >>> import re
     >>> import urllib2
     >>>
     >>> import ecspy
     >>> from matplotlib import pyplot
  5. # pacman -S python2-pip
     # cd /usr/bin
     # ln -s pip2 pip
     # pip install plurk-oauth
     $ python2
     >>> from PlurkAPI import PlurkAPI
  6. # pip install gevent
     $ python2
     >>> from gevent import monkey
     >>> from gevent.pool import Pool
     >>> from gevent.queue import Queue
  7. MariaDB [plurk]> SHOW TABLE STATUS;
     +---------+-----------+------------+
     | Name    | Data      | Index      |
     +---------+-----------+------------+
     | fans    | 233720343 |  901548032 |
     | friends | 421496919 | 1625860096 |
     | users   | 189420940 |   79509504 |
     +---------+-----------+------------+
  8. # pacman -S redis
     # pip install redis
     $ python2
     >>> import redis
     >>> from retry import retry
  9. # pacman -S pypy
     # pypy get-pip.py
     # pip install networkx
     $ pypy
     >>> import networkx
     >>> import community
  10. # pacman -S zeromq
      # pip install pyzmq
      # pip install msgpack-python
      $ python2
      >>> import zmq
      >>> import msgpack
  11. $ free -h
                   total  used
      Mem:           15G   15G
      -/+ buffers/cache:  8.9G
      Swap:         251M  3.8M
      $ cat /proc/cpuinfo
      ...
      model name: Dual Core AMD Opteron(tm) Processor 270
      ...
  12. # pacman -S openssl
      # pip install ujson
      $ python2
      >>> import hmac_sha1
      >>> import ujson
  13. # pip install regex
      # pip install marisa-trie
      # pip install sqlalchemy
      $ python2
      >>> import regex
      >>> import marisa_trie
      >>> import sqlalchemy
  14. # pacman -S mod_wsgi2
      # pacman -S memcached
      # pip install flask
      $ python2
      >>> from gevent.wsgi import WSGIServer
      >>> import flask
  15. $ whois snsd.tw
      Domain Name: snsd.tw
      Contact: Ken Lee [email protected]
      Record expires on 2013-10-11 (YYYY-MM-DD)
      Record created on 2012-10-10 (YYYY-MM-DD)
      Registration Service Provider: APT
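  16. * Try NoSQL when it fits the problem
      * Not fast enough? Write a C extension
      * Not comfortable with C? Hand it over to PyPy!
      * PyPI has plenty of packages; look around and compare
      * There are too many people to thank, so let's just thank Girls' Generation!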
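
The byte counts in the `SHOW TABLE STATUS` output on slide 7 are easier to judge once they are summed and converted. A minimal sketch, assuming the abbreviated `Data` and `Index` columns correspond to MariaDB's `Data_length` and `Index_length` (both reported in bytes); the dictionary name and the helper are illustrative only:

```python
# Sizes copied from slide 7; assumed to be Data_length / Index_length in bytes.
TABLE_SIZES = {
    "fans":    {"data": 233720343, "index": 901548032},
    "friends": {"data": 421496919, "index": 1625860096},
    "users":   {"data": 189420940, "index": 79509504},
}


def to_gib(num_bytes):
    """Convert a byte count to gibibytes (2**30 bytes)."""
    return num_bytes / float(2 ** 30)


total_data = sum(t["data"] for t in TABLE_SIZES.values())
total_index = sum(t["index"] for t in TABLE_SIZES.values())
total_bytes = total_data + total_index

print("data:  %.2f GiB" % to_gib(total_data))
print("index: %.2f GiB" % to_gib(total_index))
print("total: %.2f GiB" % to_gib(total_bytes))
```

Under that assumption the three tables occupy roughly 3.2 GiB, and the indexes are about three times larger than the row data.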
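
Slide 8 imports a `retry` decorator next to the Redis client, but the deck does not show its source. The following is a minimal sketch of what such a decorator might look like, written against the standard library only. The parameter names (`tries`, `delay`, `exceptions`) and the `flaky_fetch` function are hypothetical and not taken from the talk:

```python
import functools
import time


def retry(tries=3, delay=0.0, exceptions=(Exception,)):
    """Re-run the wrapped function until it succeeds or `tries` attempts are used."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == tries:
                        raise  # out of attempts: propagate the last error
                    time.sleep(delay)
        return wrapper
    return decorator


# Hypothetical stand-in for a network call that fails twice, then succeeds.
calls = {"count": 0}


@retry(tries=3, delay=0.0, exceptions=(IOError,))
def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise IOError("temporary failure")
    return "ok"


result = flaky_fetch()
print(result, calls["count"])
```

Wrapping crawler or database calls this way keeps transient failures from aborting a long-running job, at the cost of hiding errors until the attempts run out.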
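
Slide 12 imports `hmac_sha1` and `ujson` without further explanation. Since the Plurk API is accessed through OAuth (slide 5 installs `plurk-oauth`), one plausible reading is that these modules speed up request signing and JSON handling. The sketch below uses the standard library's `hmac`, `hashlib`, and `json` as stand-ins; the function name `sign` and the sample key and message are assumptions, and this is not a full OAuth implementation:

```python
import base64
import hashlib
import hmac
import json


def sign(key, message):
    """Return the base64-encoded HMAC-SHA1 of `message` under `key` (both bytes)."""
    digest = hmac.new(key, message, hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


# Key and message from RFC 2202 test case 2, used here only as sample input.
signature = sign(b"Jefe", b"what do ya want for nothing?")

# Round-trip a small payload through JSON; the stdlib json module stands in
# for ujson here.
payload = json.loads(json.dumps({"signature": signature}))
print(payload["signature"])
```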