
distcloud: distributed virtualization platform


Hiroki (REO) Kashiwazaki

February 14, 2017

Transcript

  1. Eurasian Plate, North American Plate, Pacific Ocean Plate, Philippine Sea Plate;
     epicenter of the Nankai (South Sea) Trough [NEXT]
  2. Cybermedia Center, Osaka University; Kitami Institute of Technology; University of
     the Ryukyus. XenServer + CloudStack, XenServer + CloudStack.
  3. iozone surface plot: Kbytes/sec versus file size (2^n KBytes) and record size
     (2^n KBytes). High random R/W performance.
  4. Global VM migration is also available because the VM host machines share the same
     "storage space": real-time availability makes it possible, and the actual data copy
     follows. (The VM operator needs a virtually common Ethernet segment and a fat pipe
     for the memory copy.) Figure: TOYAMA, OSAKA and TOKYO sites, before and after
     migration, each copying to the DR sites; live migration of a VM between distributed
     areas. The real-time, active-active features make the system look like just a simple
     "shared storage", and live migration is also possible between DR sites (it requires
     a common subnet and a fat pipe for memory copy, of course).
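    The testbed in this deck runs XenServer and CloudStack (slide 2), but the step
    described here, moving only the memory of a running VM because its disk already sits
    on the shared distributed storage, can be sketched with any hypervisor API. Below is
    a minimal, hypothetical illustration using libvirt/KVM instead; the URIs and the VM
    name are assumptions, not part of the deck.

      # Hypothetical illustration (libvirt/KVM, not the deck's XenServer/CloudStack setup):
      # because the VM disk already lives on the shared distributed storage, a live
      # migration only has to copy memory pages over the inter-site "fat pipe".
      import libvirt

      SRC_URI = "qemu+ssh://osaka-host/system"   # assumed source host
      DST_URI = "qemu+ssh://toyama-host/system"  # assumed destination host

      src = libvirt.open(SRC_URI)
      dst = libvirt.open(DST_URI)

      dom = src.lookupByName("guest-vm")         # assumed VM name

      # VIR_MIGRATE_LIVE keeps the guest running while memory is copied;
      # no storage-copy flag is needed because the storage is already shared.
      flags = libvirt.VIR_MIGRATE_LIVE | libvirt.VIR_MIGRATE_PERSIST_DEST
      dom.migrate(dst, flags, None, None, 0)

      src.close()
      dst.close()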
  5. Front-end servers aggregate client requests (READ / WRITE) so that many back-end
     servers can handle user data in a parallel, distributed manner. Both performance and
     storage space scale with the number of servers. Figure: clients, front-end (access
     servers) acting as an Access Gateway (via NFS, CIFS or similar), back-end (core
     servers); WRITE requests write blocks, READ requests read blocks; scalable
     performance and scalable storage size through parallel and distributed processing
     technology.
  6. A file is divided into many blocks; metadata and a consistent hash map the blocks
     onto the back-end core servers.
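    As a rough illustration of slides 5 and 6, the sketch below splits a file into
    fixed-size blocks and places each block on a back-end core server with a consistent
    hash. The block size, server names and hash function are assumptions, not the actual
    implementation behind the deck's storage.

      # Minimal sketch of block placement by consistent hashing (assumed parameters).
      import bisect
      import hashlib

      BLOCK_SIZE = 64 * 1024  # assumed 64 KB blocks

      def _hash(key: str) -> int:
          return int(hashlib.md5(key.encode()).hexdigest(), 16)

      class ConsistentHashRing:
          def __init__(self, servers, vnodes=100):
              # virtual nodes smooth the distribution across the core servers
              self._ring = sorted((_hash(f"{s}#{i}"), s)
                                  for s in servers for i in range(vnodes))
              self._keys = [h for h, _ in self._ring]

          def locate(self, block_id: str) -> str:
              # first ring position clockwise from the block's hash
              idx = bisect.bisect(self._keys, _hash(block_id)) % len(self._ring)
              return self._ring[idx][1]

      def split_into_blocks(data: bytes):
          return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

      ring = ConsistentHashRing(["core-01", "core-02", "core-03", "core-04"])

      # "meta data" here is just the mapping from (file, block index) to a core server
      metadata = {("demo.bin", n): ring.locate(f"demo.bin/{n}")
                  for n, _ in enumerate(split_into_blocks(b"x" * (5 * BLOCK_SIZE)))}
      print(metadata)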
  7. Write path: 1. assign a new unique ID to any updated block (to ensure consistency).
     2. make a replica in the local site (for a quick ACK) and update the metadata.
     3. make replicas in the globally distributed environment (the actual data copies).
     Figure: the back-end spans multiple sites; a file consists of many blocks;
     multiplicity across locations makes each piece of user data redundant locally at
     first and ends with 3 distributed copies. (1)(1') assign a new unique ID for any
     updated block so that the ID ensures consistency (most important: the key to
     "distributed replication"). (2) create 2 local copies of each piece of user data,
     write the META data, and return an ACK. (3-a) make a copy in a different location
     right after the ACK. (3-b) remove one of the 2 local blocks in the future.
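    A toy sketch of this write path, under assumed names: a fresh ID per updated block,
    two local replicas before the ACK, copies to other sites right after the ACK, and one
    local replica dropped later so that three distributed copies remain.

      # Toy write path mirroring steps (1)-(3-b) on the slide; class and site names
      # are assumptions for illustration only.
      import uuid

      class CoreServer:
          """Stand-in for one back-end core server."""
          def __init__(self, name):
              self.name = name
              self.blocks = {}

          def put(self, block_id, data):
              self.blocks[block_id] = data

          def drop(self, block_id):
              self.blocks.pop(block_id, None)

      def write_block(data, local_servers, remote_servers, metadata):
          # (1) assign a new unique ID for the updated block; the ID ensures consistency
          block_id = uuid.uuid4().hex

          # (2) two replicas inside the local site, update META data, return ACK quickly
          for srv in local_servers[:2]:
              srv.put(block_id, data)
          metadata[block_id] = [srv.name for srv in local_servers[:2]]
          ack = True  # the client gets its ACK at this point

          # (3-a) copy to different locations right after the ACK
          for srv in remote_servers:
              srv.put(block_id, data)
              metadata[block_id].append(srv.name)

          # (3-b) remove one of the two local replicas later; three distributed copies remain
          local_servers[1].drop(block_id)
          metadata[block_id].remove(local_servers[1].name)
          return ack, block_id

      meta = {}
      osaka = [CoreServer("osaka-1"), CoreServer("osaka-2")]
      remote = [CoreServer("tokyo-1"), CoreServer("toyama-1")]
      print(write_block(b"payload", osaka, remote, meta), meta)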
  8. redundancy / ACK: r = 2, e = 0; r = 1, e = 0; r = 0, e = 1; r = -1, e = 2; external
  9. Hiroshima Univ., Kanazawa Univ., NII. VMM: virtual machine monitor; CS: core
     servers; HS: hint servers; AS: access servers. Figure: AS, VMM, CS and HS nodes at
     each site, interconnected by L2VPN links over the EXAGE LAN, admin LAN and
     MIGRATION LAN segments.
  10. iozone -aceI; a: full automatic mode; c: include close() in the timing
      calculations; e: include flush (fsync, fflush) in the timing calculations;
      I: use DIRECT_IO if possible for all file operations.
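    For reference, a small sketch (assuming iozone is installed and on PATH) of how the
    flags listed on this slide could be driven from a script and the resulting table
    captured.

      # Run iozone with the flags from the slide and capture its throughput table.
      import subprocess

      # -a: full automatic mode, -c: time close(), -e: time flush(), -I: use DIRECT_IO
      result = subprocess.run(["iozone", "-a", "-c", "-e", "-I"],
                              capture_output=True, text=True, check=True)
      print(result.stdout)  # KB/s per file size and record size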
  11. write: iozone surface plots of Kbytes/sec versus file size (2^n KBytes) and record
      size (2^n KBytes).
  12. iozone result panels: write, read, re-read, re-write, random read, backwards read,
      record rewrite, strided read, random write, fwrite, frewrite; axes: file size
      (64 KB to 1 GB), record size [KB], MB/sec.
  13. We have been developing a widely distributed cluster storage system

    and evaluating the storage along with various applications. The main advantage of our storage is its very fast random I/O performance, even though it provides a POSIX-compatible file system interface on top of the distributed cluster storage.
  14. Required time to migration (type of line / load condition / required time [sec]):
        domestic, no load; international, no load; read load; write load.
      I/O performance (type of access pattern / load condition / average throughput
      [MB/s] of dd):
        domestic read; domestic write; international read; international write.
  15. Layer | method | outline | features:
      L3 routing | update routing table on each migration | ○ routing per region;
        × cannot route per VM; × routing operation cost
      L2 extension | VPLS, IEEE 802.1ad (PB, Q-in-Q), IEEE 802.1ah (Mac-in-Mac) |
        ○ stability, operation cost; × poor scalability
      L2 over L3 | VXLAN, OTV, NVGRE | ○ stability; × overhead of tunneling;
        × IP multicast
      SDN | OpenFlow | ○ programmable operation; × cost of equipment
      ID/locator separation | LISP | ○ scalability, routing per VM; × cost, immediacy
      IP mobility | MAT, NEMO, MIP, Kagemusha | ○ scalability; × load on router
      L4 | mSCTP (SCTP multipath) | ○ independent from L2/L3; × limited to SCTP
      L7 | DNS (reverse NAT, Dynamic DNS) | ○ independent from L2/L3;
        × altering IP address; × closing connections