Example running FALCON-Unzip on AWS

Jason Chin
June 01, 2016

This is for running the FALCON/FALCON-Unzip diploid assembler on AWS. The codebase used is not 100% up to date, but you can see the whole run from start to finish.

Transcript

  1. Prerequisites:
    • An AWS EC2 account
    • An SSH key pair (http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html#having-ec2-create-your-key-pair)
  2. Go to the “Instances” tab and click “Launch Instance”. The data, the code, and an example of the results are in snapshot snap-d555eba7 (set up for an r3.8xlarge instance).
  3. Choose r3.8xlarge for 244G of RAM. With its 32 vCPUs we can run 32 / 4 = 8 daligner jobs at the same time, and each job can take up to 24G of RAM.
  4. Click the “Launch” button at the bottom of the launch review page. AWS will ask which SSH key pair to use. I had an existing one called “starcluster”; yours will be different, so pick one of your own.
  5. Go back to the “Instances” tab. If the instance starts up successfully, you will see a “running” status associated with the instance. Right-click it to get a pop-up menu, then click “Connect” to get the hostname for the connection.
  6. On a local terminal, SSH into the instance following AWS’s instructions. After you log in, the vanilla Ubuntu box needs some quick configuration. We need to:
    (1) Mount a new EBS volume, created from the snapshot that has the data, at /EBS.
    (2) Prepare the instance’s SSD, since it is best to use the SSD for processing the data.
    (3) Copy the configuration and run the assembly.
  7. (1) Mount the new EBS volume, created from the snapshot that has the data, at /EBS with the following commands:
        mkdir /EBS
        mount /dev/xvdd1 /EBS
    (2) It is best to use the instance’s SSD for processing the data. We first need to partition the storage by running `fdisk /dev/xvdb` interactively. At the prompts, enter:
        “n” (new partition)
        “p” (primary partition)
        “1”, or accept the default (partition number)
        the default (first sector)
        the default (last sector)
        “w” (write the partition table and exit)
  8. We still need to format and mount the new partition. Here are the commands:
        mkfs.ext4 /dev/xvdb1
        mkdir /test_run/
        mount /dev/xvdb1 /test_run
    Now you can change directory to /test_run to use the SSD and do a test assembly run. Here is an example:
        cd /test_run/
        cp /EBS/FALCON_asm_example_template/* .
        bash run_example.sh &
    The run takes about 7 to 8 hours, which might cost about $25 for the CPU hours used. The created EBS volume also costs something; you can delete it once the run is finished. The assembly results on the SSD will be destroyed once the instance is terminated. You should terminate the instance once the assembly is done, so if you want to keep the assembly results, you need to copy them out first.
    The I/O from the snapshot is slow initially, because a volume restored from a snapshot fetches each block the first time it is read (see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-initialize.html).
    A previous run example is inside `/EBS/FALCON_asm_example/`. The final Quiver-polished primary contigs and associated haplotigs are in `/EBS/FALCON_asm_example/4-quiver/cns_output/`.
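The daligner job count in step 3 is the smaller of two limits: how many jobs the cores allow and how many the RAM allows. The shell sketch below makes that arithmetic explicit. It assumes, as the slide's 32 / 4 implies, that each daligner job uses 4 cores; the function name `max_concurrent_jobs` is hypothetical and not part of FALCON.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of FALCON): how many daligner jobs fit at once.
# Assumes each job needs a fixed number of cores and has a fixed RAM ceiling.
max_concurrent_jobs() {
  local cpus=$1 mem_gb=$2 cores_per_job=$3 mem_per_job_gb=$4
  local by_cpu=$(( cpus / cores_per_job ))      # limit imposed by cores
  local by_mem=$(( mem_gb / mem_per_job_gb ))   # limit imposed by RAM
  # The tighter of the two limits decides the concurrency.
  if (( by_cpu < by_mem )); then
    echo "$by_cpu"
  else
    echo "$by_mem"
  fi
}

# r3.8xlarge: 32 vCPUs and 244G RAM; 4 cores and up to 24G per job.
max_concurrent_jobs 32 244 4 24   # prints 8
```

On r3.8xlarge the cores are the binding limit: RAM alone would allow 244 / 24 = 10 jobs, but the cores allow only 8, and 8 × 24G = 192G leaves headroom below 244G.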
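The interactive fdisk session in step 7 and the format/mount commands in step 8 can also be scripted. The sketch below is a convenience under stated assumptions, not part of the original deck: it pipes the same answers (n, p, 1, default, default, w) into fdisk, which only works if your fdisk version asks exactly those prompts in that order. If the SSD is already mounted (Ubuntu images may mount it at /mnt), unmount it first. The function names `fdisk_answers` and `setup_ssd` are hypothetical, and `setup_ssd` erases the target disk, so walk through the interactive session once before automating it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from the deck) for the SSD steps 7(2) and 8.

# Answers for fdisk, one per prompt; an empty line accepts the default.
#   n -> new partition, p -> primary, 1 -> partition number,
#   "" -> default first sector, "" -> default last sector, w -> write and exit
fdisk_answers() {
  printf 'n\np\n1\n\n\nw\n'
}

# WARNING: erases everything on the target disk. Must be run as root.
setup_ssd() {
  local disk=${1:-/dev/xvdb} mountpoint=${2:-/test_run}
  fdisk_answers | fdisk "$disk" || return 1   # partition the disk (step 7)
  mkfs.ext4 "${disk}1"          || return 1   # format the partition (step 8)
  mkdir -p "$mountpoint"        || return 1
  mount "${disk}1" "$mountpoint"              # mount it for the run (step 8)
}
```

For example, in a root shell where both functions are defined, `setup_ssd /dev/xvdb /test_run` would reproduce the manual steps; each command stops the function on failure rather than continuing with a half-prepared disk.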
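Before terminating the instance, it can help to sanity-check the Quiver output in `cns_output` and then copy it off the SSD. The awk-based sketch below reports the number of sequences, the total bases, and the longest sequence in a FASTA file. The function name `fasta_summary` is hypothetical, and so are the details in the usage note: the file names `cns_p_ctg.fasta` (primary contigs) and `cns_h_ctg.fasta` (haplotigs) and the test run's output location under /test_run are assumptions, so check what your `cns_output` directory actually contains.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of FALCON): one-line summary of a FASTA file.
# Prints the number of sequences, total bases, and longest sequence length.
fasta_summary() {
  awk '
    /^>/ {                      # header: close out the previous record
      if (n) { total += len; if (len > max) max = len }
      n++; len = 0; next
    }
    { len += length($0) }       # sequence line: accumulate its length
    END {                       # close out the last record, then report
      if (n) { total += len; if (len > max) max = len }
      printf "sequences=%d total_bp=%d longest=%d\n", n, total, max
    }
  ' "$1"
}
```

Usage, assuming the test run writes to the same layout as the example run: `fasta_summary /test_run/4-quiver/cns_output/cns_p_ctg.fasta` and `fasta_summary /test_run/4-quiver/cns_output/cns_h_ctg.fasta`. Once the numbers look reasonable, copy the directory to your local machine (for example with `scp -r`) before terminating the instance.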