Example running FALCON-Unzip on AWS

Jason Chin
June 01, 2016

This is an example of running the FALCON/FALCON-Unzip diploid assembler on AWS. The codebase used is not 100% up to date, but you can see a complete run from start to finish.


  1. Running FALCON-Unzip Example on AWS Jun 1, 2016 Jason Chin

  2. Prerequisites • An AWS EC2 account • An SSH key pair
    (http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html#having-ec2-create-your-key-pair)
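
If you prefer the command line to the console, the key pair can also be created with the AWS CLI. This is a minimal sketch, assuming the AWS CLI is installed and configured with your credentials; `KEY_NAME` (default `falcon-demo`) is a hypothetical name, and by default the script only prints the command it would run (set `DRY_RUN=0` to actually run it).

```shell
#!/bin/sh
# Sketch: create an EC2 key pair from the command line and save the private key.
# Assumes the AWS CLI is installed and configured with your credentials.
set -eu

KEY_NAME="${KEY_NAME:-falcon-demo}"   # hypothetical key pair name
REGION="${REGION:-us-east-1}"         # region of the us-east-1a zone used here
KEY_FILE="$KEY_NAME.pem"

# Safety switch: DRY_RUN=1 (the default) only prints the command.
if [ "${DRY_RUN:-1}" = 1 ]; then
  echo "aws ec2 create-key-pair --region $REGION --key-name $KEY_NAME" \
    "--query KeyMaterial --output text > $KEY_FILE"
else
  aws ec2 create-key-pair --region "$REGION" --key-name "$KEY_NAME" \
    --query KeyMaterial --output text > "$KEY_FILE"
  chmod 400 "$KEY_FILE"   # ssh refuses private keys that others can read
fi
```

The key pair name you choose here is the one you pick when AWS asks for a key pair at launch time.
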
  3. Go to the “instance tab” and click “Launch Instance”. The data, code,
    and an example of the results are in snapshot snap-d555eba7 (set up for an r3.8xlarge instance).
  4. Choose Ubuntu 14.04 LTS AMI

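  5. Choose r3.8xlarge for its 244 GB of RAM. Assuming 4 cores per daligner job, its 32 vCPUs
    let us run 32 / 4 = 8 daligner jobs at the same time; each can take up to 24 GB of RAM,
    so 8 jobs need at most 8 × 24 = 192 GB.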
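
The sizing in step 5 is simple arithmetic, sketched below so it can be redone for another instance type. The vCPU count, RAM and per-job memory come from the slide; the 4 threads per job is the assumption behind the "/ 4".

```shell
#!/bin/sh
# Back-of-the-envelope check of daligner concurrency on r3.8xlarge.
# Figures from the slide: 32 vCPUs, 244 GB RAM, up to 24 GB per daligner job.
# Assumption: each daligner job uses 4 threads (the "/ 4" on the slide).
set -eu

VCPUS=32
RAM_GB=244
THREADS_PER_JOB=4
MEM_PER_JOB_GB=24

JOBS=$((VCPUS / THREADS_PER_JOB))        # concurrent jobs, limited by CPU
PEAK_MEM_GB=$((JOBS * MEM_PER_JOB_GB))   # worst case: every job peaks at once

echo "concurrent daligner jobs: $JOBS"
echo "worst-case memory: ${PEAK_MEM_GB} GB of ${RAM_GB} GB"
if [ "$PEAK_MEM_GB" -le "$RAM_GB" ]; then
  echo "fits in RAM"
else
  echo "too many jobs for this instance's RAM"
fi
```

For r3.8xlarge this gives 8 jobs and a worst case of 192 GB, which leaves headroom within 244 GB.
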
  6. Use the “us-east-1a” availability zone

  7. Add a new EBS volume from snapshot snap-d555eba7, attach it to device /dev/sdd,
    and set its size to 250 GB
  8. You can ignore the warning and use the default here.

  9. Review everything

  10. Click the “Launch” button at the bottom of the launch review page. AWS will ask
    which SSH key pair to use. I had an existing one called “starcluster”; yours will be
    different, so pick one of your own.
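
For reference, the console choices in steps 3 to 10 roughly correspond to a single `aws ec2 run-instances` call. This is a sketch, not the author's procedure: `AMI_ID` is a hypothetical placeholder for a current Ubuntu 14.04 LTS AMI ID in us-east-1, the `/dev/sdb` instance-store mapping is an assumption so that the SSD used in step 13 is attached, and the script only prints the command unless `DRY_RUN=0`.

```shell
#!/bin/sh
# Sketch: the console choices of steps 3-10 as one AWS CLI call.
# AMI_ID is a hypothetical placeholder; KEY_NAME is the key pair from step 10.
set -eu

AMI_ID="${AMI_ID:-ami-xxxxxxxx}"      # hypothetical: an Ubuntu 14.04 LTS AMI in us-east-1
KEY_NAME="${KEY_NAME:-starcluster}"   # replace with your own key pair name
# /dev/sdb: first instance-store SSD (assumed mapping; seen as /dev/xvdb later)
# /dev/sdd: 250 GB EBS volume created from the data snapshot (step 7)
BDM='[{"DeviceName":"/dev/sdb","VirtualName":"ephemeral0"},{"DeviceName":"/dev/sdd","Ebs":{"SnapshotId":"snap-d555eba7","VolumeSize":250}}]'

# The console wizard also sets up a security group that allows SSH;
# with the CLI you may need to add --security-group-ids for such a group.
set -- aws ec2 run-instances \
  --image-id "$AMI_ID" \
  --instance-type r3.8xlarge \
  --key-name "$KEY_NAME" \
  --placement AvailabilityZone=us-east-1a \
  --block-device-mappings "$BDM"

# DRY_RUN=1 (the default) prints the command; DRY_RUN=0 launches (and bills) it.
if [ "${DRY_RUN:-1}" = 1 ]; then
  echo "$@"
else
  "$@"
fi
```

Building the command with `set --` keeps the JSON mapping as one quoted argument, so the same argument list is both printed and executed.
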
  11. Go back to the instance tab. If the instance starts up successfully, you will see
    the “running” status next to it. Right-click the instance to get a pop-up menu, then click
    “Connect” to get the hostname for the connection.
  12. On a local terminal, ssh into the instance following AWS’s instructions. After you
    log in, the vanilla Ubuntu box needs some quick configuration. We need to:
    (1) mount a new EBS volume, created from the snapshot that has the data, at /EBS;
    (2) format the instance’s SSD, which is the best place to process the data;
    (3) copy the configuration and run the assembly.
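
Steps 11 and 12 can also be done without the console. A minimal sketch, assuming the AWS CLI is configured: `INSTANCE_ID` and `KEY_FILE` are hypothetical placeholders, the dry-run branch uses a placeholder hostname, and `ubuntu` is the default login user on Ubuntu AMIs.

```shell
#!/bin/sh
# Sketch: wait for the instance, look up its public hostname, and log in.
# INSTANCE_ID and KEY_FILE are hypothetical placeholders for your own values.
set -eu

INSTANCE_ID="${INSTANCE_ID:-i-xxxxxxxx}"   # shown in the instance tab
KEY_FILE="${KEY_FILE:-starcluster.pem}"     # private key of the step-10 key pair

if [ "${DRY_RUN:-1}" = 1 ]; then
  HOST=ec2-xx-xx-xx-xx.compute-1.amazonaws.com   # placeholder hostname
  echo "ssh -i $KEY_FILE ubuntu@$HOST"
else
  # Blocks until the instance reports the "running" state.
  aws ec2 wait instance-running --instance-ids "$INSTANCE_ID"
  HOST=$(aws ec2 describe-instances --instance-ids "$INSTANCE_ID" \
    --query 'Reservations[0].Instances[0].PublicDnsName' --output text)
  # Ubuntu AMIs use "ubuntu" as the default login user.
  ssh -i "$KEY_FILE" "ubuntu@$HOST"
fi
```
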
  13. (1) Mount the new EBS volume created from the snapshot that has the data at /EBS
    with the following commands (as root):
        mkdir /EBS
        mount /dev/xvdd1 /EBS
    (2) It is best to use the instance’s SSD for processing the data, so we need to
    partition and format it. Run `fdisk /dev/xvdb` interactively and enter the
    highlighted inputs from the example:
        Input “n” (new partition)
        Input “p” (primary)
        Use the default, or input “1” (partition number)
        Use the default (first sector)
        Use the default (last sector)
        Input “w” (write the partition table and quit)
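
Because fdisk reads its answers from standard input, the same keystrokes can be piped in instead of typed. This is a sketch assuming root and the slide's device names (which may differ on other instance types or AMIs; check with `lsblk` first). If the SSD is already mounted (Ubuntu images often mount the first instance-store disk at /mnt), unmount it before partitioning. By default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch: steps (1) and (2) of slide 13 without the interactive fdisk session.
# Run as root. Device names follow the slide; verify them with `lsblk` first.
set -eu

EBS_PART=/dev/xvdd1   # partition on the volume created from snap-d555eba7
SSD_DEV=/dev/xvdb     # instance SSD that will get one primary partition

# Same answers as the slide: n (new), p (primary), 1 (partition number),
# two empty lines to accept the default first and last sectors, w (write).
fdisk_keys() {
  printf 'n\np\n1\n\n\nw\n'
}

if [ "${DRY_RUN:-1}" = 1 ]; then
  echo "mkdir -p /EBS && mount $EBS_PART /EBS"
  echo "fdisk $SSD_DEV (answers: $(fdisk_keys | sed 's/^$/<Enter>/' | tr '\n' ' '))"
else
  mkdir -p /EBS
  mount "$EBS_PART" /EBS
  fdisk_keys | fdisk "$SSD_DEV"
fi
```
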
  14. We still need to format and mount the new partition. Here are the commands:
        mkfs.ext4 /dev/xvdb1
        mkdir /test_run/
        mount /dev/xvdb1 /test_run
    Now you can change to /test_run to use the SSD and do a test assembly run, for example:
        cd /test_run/
        cp /EBS/FALCON_asm_example_template/* .
        bash run_example.sh &
    The run takes about 7 to 8 hours, which might cost about $25 for the CPU hours used.
    The created EBS volume also costs something; you can delete it once the run is finished.
    The assembly results on the SSD are destroyed once the instance is terminated. You
    should terminate the instance once the assembly is done; if you want to keep the
    assembly results, copy them off the instance first.
    I/O from a volume created from a snapshot is slow at first, because each block is fetched
    from the snapshot the first time it is read
    (see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-initialize.html).
    A previous example run is in `/EBS/FALCON_asm_example/`. The final Quiver primary contigs
    and associated haplotigs are in `/EBS/FALCON_asm_example/4-quiver/cns_output/`.
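
The commands on this slide can be collected into one script. This is a sketch assuming root and the slide's paths; running the job under `nohup` with output in `run.log` (a hypothetical file name) is an addition, not from the slide, so the 7 to 8 hour run survives a dropped SSH session. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: format and mount the SSD partition, then start the example run.
# Paths come from the slide; run as root. run.log is a hypothetical log name.
set -eu

SSD_PART=/dev/xvdb1
RUN_DIR=/test_run
TEMPLATE_DIR=/EBS/FALCON_asm_example_template

# Optional (not on the slide): read the snapshot-based volume once so later
# reads are fast, as described on the AWS ebs-initialize page:
#   dd if=/dev/xvdd of=/dev/null bs=1M

if [ "${DRY_RUN:-1}" = 1 ]; then
  echo "mkfs.ext4 $SSD_PART"
  echo "mkdir -p $RUN_DIR && mount $SSD_PART $RUN_DIR"
  echo "cd $RUN_DIR && cp $TEMPLATE_DIR/* . && nohup bash run_example.sh > run.log 2>&1 &"
else
  mkfs.ext4 "$SSD_PART"
  mkdir -p "$RUN_DIR"
  mount "$SSD_PART" "$RUN_DIR"
  cd "$RUN_DIR"
  cp "$TEMPLATE_DIR"/* .
  # nohup keeps the job alive if the SSH session disconnects.
  nohup bash run_example.sh > run.log 2>&1 &
  echo "started, PID $!; follow progress with: tail -f $RUN_DIR/run.log"
fi
```
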
  15. CPU Load Time Course

  16. All System Monitoring Time Course
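
The slides do not say which tool produced these plots. One simple way to collect similar data is sketched below: each sample records the 1-, 5- and 15-minute load averages from `/proc/loadavg` (Linux only). On this 32-vCPU instance, a load near 32 means all vCPUs are busy. The log file names are hypothetical.

```shell
#!/bin/sh
# Sketch: collect data for CPU-load and system-monitoring time courses.
# The slides do not say which tool produced their plots; this is one option.
set -eu

# One sample: "<unix time> <1-min> <5-min> <15-min load average>" (Linux only).
sample_load() {
  printf '%s %s\n' "$(date +%s)" "$(cut -d ' ' -f 1-3 /proc/loadavg)"
}

sample_load
# To record the whole run in the background (log file names are hypothetical):
#   while sleep 60; do sample_load >> load.log; done &
#   vmstat 60 >> vmstat.log &    # CPU, memory, swap and I/O once per minute
```
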

  17. General Disk Usage (including all intermediate data files) and the
    results in 4-quiver/cns_output/
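
A similar disk-usage summary can be printed from the shell. This sketch assumes the run directory is `/test_run` (the SSD mount point used earlier; override with `RUN_DIR`) and that the final results land in `4-quiver/cns_output/` as in the example run.

```shell
#!/bin/sh
# Sketch: summarize disk usage of a FALCON run directory, as on this slide.
# RUN_DIR defaults to the SSD mount point used on the earlier slides.
set -eu

RUN_DIR="${RUN_DIR:-/test_run}"

report_usage() {
  df -h "$1"                                              # used/free space on the filesystem
  du -sh "$1"/* 2>/dev/null || true                       # size of each top-level entry
  ls -lh "$1/4-quiver/cns_output/" 2>/dev/null || true    # final results, if present
}

if [ -d "$RUN_DIR" ]; then
  report_usage "$RUN_DIR"
fi
```
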
  18. Terminate the instance and delete the associated EBS volume
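
The clean-up can also be scripted with the AWS CLI. A minimal sketch: `INSTANCE_ID` and `VOLUME_ID` are hypothetical placeholders (look them up in the EC2 console), and by default the script only prints the commands. Copy any results you want to keep off the instance first, since the SSD contents are lost on termination.

```shell
#!/bin/sh
# Sketch: stop paying for the instance and the snapshot-based EBS volume.
# INSTANCE_ID and VOLUME_ID are hypothetical placeholders for your own IDs.
# Copy results off first, e.g. from your local machine:
#   scp -r -i KEY_FILE ubuntu@HOST:/test_run/4-quiver/cns_output .
set -eu

INSTANCE_ID="${INSTANCE_ID:-i-xxxxxxxx}"
VOLUME_ID="${VOLUME_ID:-vol-xxxxxxxx}"   # the volume created from snap-d555eba7

if [ "${DRY_RUN:-1}" = 1 ]; then
  echo "aws ec2 terminate-instances --instance-ids $INSTANCE_ID"
  echo "aws ec2 wait instance-terminated --instance-ids $INSTANCE_ID"
  echo "aws ec2 delete-volume --volume-id $VOLUME_ID"
else
  aws ec2 terminate-instances --instance-ids "$INSTANCE_ID"
  # A volume must be detached before deletion, so wait for termination.
  aws ec2 wait instance-terminated --instance-ids "$INSTANCE_ID"
  # Skip this if the volume was marked "Delete on Termination".
  aws ec2 delete-volume --volume-id "$VOLUME_ID"
fi
```
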