How to build your own S3- and EBS-backed instances 

As of 8/9/2011, this page was the only place on the internet with instructions on how to build Amazon instances with CentOS/Redhat installation discs and the stock kernels on those discs. As far as I know (03/2013), this page is still the only place to find that information. All of the other instructions you find online use a yum repository and one of the kernels provided by Amazon or the community AMIs. Also, most of the instructions you find elsewhere are not very good about explaining how to build EBS-bootable images. I do a much better job of this here.

Here I explain how to build both S3- and EBS-backed AMIs from scratch, using the CentOS5.6 64-bit installation discs and using the kernel that comes with CentOS (not the Amazon or public kernels). This method gives you the most control over what you're running and most closely imitates a bare-metal CentOS installation.

With this method, you don't use any of the community AMIs. You don't use yum to install the operating system. There's no need to boot anything other than what comes with your install disc. Use your CentOS or Redhat CD to install to a local drive and do all of your configuration from there, then bundle and upload the AMI from your local system.

There are several reasons why you may not want to use the community images available through Amazon or yum: you don't trust the images, you don't trust the repositories, you want to know exactly what has been bundled into your instance, you want full control over your server configuration, you want to keep configuration consistent between your data-center servers and your cloud instances, or you need to use a custom kernel. I'll show you how to do this with CentOS 5.6. The same procedure works with any other xen-compatible distro (Redhat, etc.).

Before beginning, keep in mind a few key points regarding kernels:

  • Less configuration is required to mount the root filesystem under Amazon's kernels (aki) and ramdisks (ari). These kernels are more flexible and forgiving.
  • Running your own kernel means first launching Amazon's "grub" kernel, which in turn locates the menu.lst file on your filesystem and runs the commands it finds there.
  • A kernel most likely to work out-of-the-box and with minimal modification is a xen-compatible kernel.
  • /dev, /proc and /sys do not get bundled into the AMI. These are dynamic directories that get populated by linux during the boot process.
  • Registering an AMI is really just specifying which kernel to launch and associating it with either a manifest (S3) or a block-device mapping. The root filesystem plays a minimal role.

Also keep in mind a few key points regarding volumes, snapshots, and ephemeral storage:

  • Ephemeral storage refers to the hard drives located on the server hardware your instances run on. This storage is much faster than EBS volumes (see the benchmark below).
  • Ephemeral storage is automatically provided to both S3- and EBS-backed instances, but to make use of it on an EBS-backed instance, you must designate the storage when you register or launch the instance.
  • When you create an AMI from an EBS snapshot, think of this snapshot as the "base snapshot."
  • When you launch an instance, Amazon first copies the "base snapshot" to a brand-new volume and attaches this volume to the instance. Any changes you make on that volume are not written to the base snapshot; the base snapshot always stays the same. Changes also do not propagate back to the original volume that the base snapshot was made from, so once the base snapshot exists you can effectively forget about that volume.
  • Multiple running instances of this AMI each get their own independent volume copy of the base snapshot.

Disk I/O Benchmark

Procedure: mount a 12GB EBS volume and one of the ephemeral disks on your running instance, then run the following command to write a 10GB file onto each mountpoint:

  dd if=/dev/zero of=/mountpoint/zero bs=1024 count=10000000

While the dd is running, use the following methods to obtain a tally of disk i/o stats:

  • vmstat 1
  • kill -SIGUSR1 `pidof dd`

When the file copy has completed, dd displays the total throughput. Run the following command to measure how long it takes to flush the changes to disk:

  • time sync
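The steps above can be collected into one script. This is a sketch of the procedure, not the author's original script: the mountpoint and file size are parameters (the benchmark above writes a 10GB file, i.e. SIZE_MB=10000; the default here is deliberately small so the script is safe to try anywhere).

```shell
#!/bin/sh
# Sketch of the disk i/o benchmark. First argument is the mountpoint
# (e.g. the EBS or ephemeral mount), second is the file size in MB.
MOUNTPOINT=${1:-/tmp}
SIZE_MB=${2:-64}    # the article's benchmark uses 10000 (a 10GB file)

# write the test file; dd prints its throughput summary on stderr
dd if=/dev/zero of="$MOUNTPOINT/zero" bs=1024 count=$((SIZE_MB * 1024)) \
    2> "$MOUNTPOINT/dd.log" &
DD_PID=$!

# while dd runs, sample i/o stats and ask dd for a progress report
command -v vmstat >/dev/null && vmstat 1 5 > "$MOUNTPOINT/vmstat.log" &
sleep 1
kill -USR1 "$DD_PID" 2>/dev/null || true

wait "$DD_PID"

# measure how long flushing outstanding writes to disk takes
START=$(date +%s)
sync
echo "sync took $(( $(date +%s) - START ))s"
cat "$MOUNTPOINT/dd.log"
```

Run it once against the EBS mountpoint and once against the ephemeral mountpoint and compare the dd throughput and sync times.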

On an m1.large instance, I measured the following averages:

EBS file write: 25 MB/s (total write time 400 seconds), sync time 4 minutes

Ephemeral: 67 MB/s (total write time 150 seconds), sync time 1 minute

Build your own S3-backed AMI

  1. Create a 2GB partition on your local hard drive and use the CentOS5.6 DVD to install to this partition. Select the following options during the install process: Server Installation. Virtualization. GRUB bootloader without a password. Activate eth0 on boot with DHCP and Enable IPv4.
  2. Mount this partition at /mnt/ec2-fs/ on a linux box that has your AWS keys, ami tools and api tools installed.
  3. Edit /mnt/ec2-fs/etc/sysconfig/selinux and set SELINUX=disabled
  4. Create /mnt/ec2-fs/etc/firstboot with the following contents:
  5. For a list of ephemeral storage units that get provided at boot time, see the following link:
    /dev/sda1 is automatically provided and is sized according to the filesystem image you bundled
    /dev/sdb and /dev/sdc are provided as unpartitioned disks. They can be mounted at boot time by putting them in /etc/fstab, or they can be manipulated (fdisk) and mounted later.
    This makes the issue of swap space interesting. Since /dev/sdb and /dev/sdc are each 450GB on an m1.large instance, using either one as unpartitioned swap space is not advisable. You would instead create a swap partition of an appropriate size and then issue swapon, both of which need to be scripted or performed manually. Swap space is allocated automatically at boot time from /etc/rc.d/rc.sysinit (search the file for the text "Start up swapping", line 900 in default CentOS5.6), so you need a command that adds a swap partition to the drive, an entry for that partition in /etc/fstab, and the command inserted into the rc.sysinit file.

    Add the following lines immediately prior to "Start up swapping" in /etc/rc.d/rc.sysinit (how many cylinders you designate depends on the geometry of the disk provided by Amazon):
    # Create a swap partition of size 974 cylinders on one of the ephemeral disks
    # Amazon AWS - Michael Martinez May 2011
    /bin/echo ",974,S" | /sbin/sfdisk /dev/sdb
    # need to sleep otherwise mkswap runs before partition table has time to finish
    sleep 5
    /sbin/mkswap /dev/sdb1

    Replace /mnt/ec2-fs/etc/fstab with the following contents:

    # /dev/sda1 automatically provided as the root filesystem
    /dev/sda1   /     ext3    defaults 1 1
    /dev/sdb1   swap  swap  defaults 0 0
    # you may mount this if you like
    /dev/sdc  /mnt/sdc   ext3  defaults  0 0
    none      /dev/pts devpts  gid=5,mode=620 0 0
    none      /proc proc    defaults 0 0
    none      /sys  sysfs   defaults 0 0
    # glibc needs this for message sharing
    none /dev/shm tmpfs defaults 0 0
  6. Edit /mnt/ec2-fs/etc/sysconfig/network-scripts/ifcfg-eth0 as follows:
  7. Make sure /mnt/ec2-fs/etc/sysconfig/network has the following (and only the following) contents:
  8. Since you are using DHCP, /etc/resolv.conf will get populated when you boot up
  9. In /mnt/ec2-fs/etc/inittab, comment out the mingetty entries for tty2 through tty6
  10. Create /mnt/ec2-fs/etc/ as follows:
    hwcap 0 nosegneg
  11. Edit /mnt/ec2-fs/boot/grub/menu.lst as shown below. Note that root must be hd0 (not hd0,0). Use the xen kernel and ramdisk.
    # hiddenmenu
    title CentOS-5.6-xen-bundle
            root (hd0)
    # for booting from bare metal
    #       kernel /boot/xen.gz-2.6.18-238.el5
    #       module /boot/vmlinuz-2.6.18-238.el5xen ro root=/dev/sda1 console=xvc0
    #       module /boot/initrd-2.6.18-238.el5xen.img
    # for booting within Amazon EC2
            kernel /boot/vmlinuz-2.6.18-238.9.1.el5xen ro root=/dev/sda1 console=hvc0
            initrd /boot/initrd-2.6.18-238.9.1.el5xen.img
  12. Leave the partition mounted and bundle it as follows:
    ec2-bundle-vol -rx86_64 -p centos -u <your user id number> -k /root/.ec2/pk-aws.pem -c /root/.ec2/cert-aws.pem --kernel aki-427d952b --no-inherit -v /mnt/ec2-fs

    "no-inherit" means we're not bundling from within a running instance so there's no need to inherit any metadata
    "kernel" needs to be the correct aki for an S3 image, the region you want, and either i386 or x86_64. For a list of PV-GRUB AKIs, see the "Amazon Kernel IDs" section of the following document: Amazon User Specified Kernels
    Note that /mnt/ec2-fs/dev, /mnt/ec2-fs/proc, and /mnt/ec2-fs/sys should not contain anything (and the bundling process should remove them anyway)
  13. Upload the bundle to S3 as follows
    ec2-upload-bundle -b centos5.6 -m /tmp/<manifest>.xml -a <access-key> -s <secret-key>
  14. Go onto the Management Console, register the AMI
  15. Our kernel won't boot without doing a little more work first. Specifically, we need to make an initrd (ramdisk) suitable for running under EC2.
    There are two ways this can be done. One way is to use "mkinitrd" within a running instance of our system, as described below.
    The second way is to use "mkinitrd" on your local system. This is the cleaner method if you don't want to boot any of the Amazon kernels. On your local system, run /sbin/mkinitrd within a chroot'ed environment on the /mnt/ec2-fs partition before bundling the AMI in step 12 (above). If this generates errors, don't chroot: just run your local copy of /sbin/mkinitrd, copy the resulting img file to /mnt/ec2-fs/boot/, and update the menu.lst accordingly.
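The chroot variant of step 15 can be sketched as follows. This is an outline under stated assumptions, not the author's exact commands: the kernel version string is an example (substitute whatever is installed under /mnt/ec2-fs/lib/modules/), and DRY_RUN defaults to printing each command instead of executing it, since the real run must be done as root.

```shell
#!/bin/sh
# Sketch: build the initrd inside the /mnt/ec2-fs tree before bundling.
# KVER is an example version string; use the one under
# /mnt/ec2-fs/lib/modules/. With DRY_RUN set (the default) commands are
# printed, not executed; run with DRY_RUN= (empty) as root to execute.
TARGET=${TARGET:-/mnt/ec2-fs}
KVER=${KVER:-2.6.18-238.el5xen}
DRY_RUN=${DRY_RUN-1}

run() { if [ -n "$DRY_RUN" ]; then echo "$@"; else "$@"; fi; }

# mkinitrd expects /dev, /proc and /sys to be visible inside the chroot
run mount --bind /dev  "$TARGET/dev"
run mount --bind /proc "$TARGET/proc"
run mount --bind /sys  "$TARGET/sys"

# same style of invocation as the mkinitrd command in step 19
run chroot "$TARGET" /sbin/mkinitrd -f -v --allow-missing \
    --preload xennet --preload xenblk --preload dm-mod \
    "/boot/initrd-${KVER}ramdisk.img" "$KVER"

run umount "$TARGET/sys"
run umount "$TARGET/proc"
run umount "$TARGET/dev"
```

If the chroot'ed mkinitrd fails, fall back to the non-chroot method described above and copy the image into /mnt/ec2-fs/boot/ yourself.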
  16. Launch the AMI with an appropriate, non-default Amazon-provided kernel and ramdisk. Do not use the "default", because that is the PV-GRUB kernel, which would try to boot our own kernel before we have built a suitable initrd. I selected aki-f006f399 and ari-f406f39d to match my 64-bit operating system. Choosing these circumvents the kernels on the filesystem: Amazon's kernel boots and then mounts our filesystem as the root directory. For a list of kernels and ramdisks and their descriptions, run the following command:

    ec2-describe-images -o amazon --filter "image-type=kernel" (or "image-type=ramdisk")
  17. Log into the instance.
  18. yum update kernel-xen (optional)
  19. Create an initrd (ramdisk) suitable for running under EC2:
    mkinitrd -f -v --allow-missing --builtin uhci-hcd --builtin ohci-hcd --builtin ehci-hcd --preload xennet --preload xenblk --preload dm-mod --preload linear --force-lvm-probe /boot/initrd-2.6.x-x.x.el5xenramdisk.img 2.6.x-x.x
  20. Make sure /boot/grub/menu.lst uses the new ramdisk by updating as follows:
    initrd /boot/initrd-2.6.x-x.x.x.el5xenramdisk.img
  21. If you'd like to install the API tools, do the following (optional):
    yum install java
    cd /usr/local/
    unzip the file
    create a soft link ec2-api-tools pointing to the unzipped directory
    printf 'export JAVA_HOME=/usr\nexport EC2_HOME=/opt/ec2-api-tools\nexport PATH=$PATH:$EC2_HOME/bin/\n' > /etc/profile.d/
    . /etc/profile.d/
    add the following two lines to /root/.bashrc:
    export EC2_PRIVATE_KEY=~/.ec2/pk-aws.pem
    export EC2_CERT=~/.ec2/cert-aws.pem

    source /root/.bashrc
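The profile setup in step 21 can be sketched like this. The filename ec2.sh is a hypothetical choice of mine (the original leaves the file under /etc/profile.d/ unnamed), and PROFILE_D is overridable so you can try the script without touching /etc.

```shell
#!/bin/sh
# Sketch of the step-21 environment setup. "ec2.sh" is a hypothetical
# filename; the original does not name the file. Set
# PROFILE_D=/etc/profile.d when running on the instance as root.
PROFILE_D=${PROFILE_D:-/tmp/profile.d}
mkdir -p "$PROFILE_D"

# same variables as in the printf line above
printf 'export JAVA_HOME=/usr\nexport EC2_HOME=/opt/ec2-api-tools\nexport PATH=$PATH:$EC2_HOME/bin/\n' \
    > "$PROFILE_D/ec2.sh"

# source it so the variables take effect in the current shell
. "$PROFILE_D/ec2.sh"
echo "EC2_HOME=$EC2_HOME"
```

New login shells pick the file up automatically from /etc/profile.d/; the explicit source is only needed for the shell you are already in.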
  22. If you'd like to install AMI tools (optional)
    yum install ruby
    rpm -ivh ec2-ami-tools.noarch.rpm
  23. Now bundle a new AMI from within our running instance. The bundling process needs to mount a loop device, and in order to do this we need ec2-modules installed. So temporarily install the appropriate ec2-modules in /lib/modules
    for example: wget
    unpack it and move the lib/modules/xxxxxx under /lib/modules
  24. ec2-authorize default -p 22
  25. Bundle the running image. If you want, you can shut off services first. Exclude the /lib/modules/xxxx directory that was installed. You don't need to exclude /proc, /sys, or /dev because the bundling process already excludes them. Choose the PV-GRUB kernel for S3 and x86_64:
    ec2-bundle-vol -r x86_64 -p <newname> -u ###### -k /root/.ec2/pk-aws.pem -c /root/.ec2/cert-aws.pem -e /mnt,/lib/modules/xxxxx --kernel aki-427d952b --no-inherit
  26. ec2-upload-bundle -b <newbucketname> -m /tmp/<newname>.manifest.xml -a XXXX -s XXXX
  27. Register and launch the new S3-backed image using the "Default" kernel and ramdisk, which will boot the PV-GRUB aki specified during the bundle process. This aki will in turn examine our /boot/grub/menu.lst and boot the kernel and new initrd specified in this file.
  28. Examine the console output. There should be no error messages about /lib/modules. The correct kernel should have loaded.

You now have the S3-backed instance running vanilla CentOS with the xen kernel that came with the distribution.

Build Your Own EBS-Backed AMI

Most of the work has already been done. Now it's just a matter of copying our S3-backed installation onto an EBS volume and creating an AMI from the volume. The key points to realize are as follows:

  • According to Amazon documentation, the /boot files need to be located on the first partition of the block device. The block device is the EBS volume, so it needs to be partitioned; you can't use the whole volume as a single filesystem. (Using the whole volume as one filesystem is fine if you boot Amazon-provided aki images, but not when using our own kernel.)
  • The PV-GRUB bootloader is basically Grub, so you need to specify "root (hd0,0)" like you do on your data-center hardware (not "root (hd0)" as with S3, where there is no partition requirement for the block device), since grub is loading from the first partition
  • The EBS volume can be any size you like
  • If you wanted to boot and use one of Amazon's kernels instead of your own, you wouldn't care about partitioning the EBS volume because their kernels don't seem to require it
Therefore, the menu.lst is the same as above, except use "root (hd0,0)"
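For concreteness, the EC2 stanza of menu.lst on the EBS volume then reads as follows. The version strings are illustrative; use whatever kernel and new ramdisk you actually built in steps 19-20 of the S3 section.

```
title CentOS-5.6-xen-bundle
        root (hd0,0)
        kernel /boot/vmlinuz-2.6.18-238.9.1.el5xen ro root=/dev/sda1 console=hvc0
        initrd /boot/initrd-2.6.18-238.9.1.el5xenramdisk.img
```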
  1. Create a 5GB EBS volume
  2. Attach it to the running S3-backed instance that we launched in the first section above. You may attach the volume as /dev/sdf
  3. Log into the S3-backed instance
  4. create partition /dev/sdf1
  5. mkfs.ext3 /dev/sdf1
  6. tune2fs -c 0 /dev/sdf1
  7. mkdir /mnt/target
  8. mount /dev/sdf1 /mnt/target
  9. nice -20 rsync -avHx / /mnt/target
    (Note that there is no need to copy /dev, /proc, or /sys)
  10. edit /mnt/target/boot/grub/menu.lst and change "root (hd0)" to "root (hd0,0)"
  11. sync && umount /mnt/target
  12. Now for swap space, you can either make a partition on the EBS volume, or you can use ephemeral storage as in the previous example. I'll do ephemeral storage here, so there is no need to change the /etc/fstab.
  13. detach the EBS volume
  14. create a snapshot of the volume
  15. Use the following command to register the snapshot as an AMI, exposing the block devices as follows. Here "false" sets delete-on-termination to false, so the volume is preserved when the instance terminates if you so desire, and the kernel id corresponds to the PV-GRUB kernel for 64-bit, US-EAST-1A region, EBS-backed instances:
  16. ec2-register -b "/dev/sda=<snapshot id>::false" -b "/dev/sdb=ephemeral0" -n "AMI name" -d "description" -a x86_64 --kernel aki-4e7d9527 -K /root/.ec2/pk-aws.pem -C /root/.ec2/cert-aws.pem
Note that the block device mappings can also be specified from ec2-run-instances command line. So the choice here is whether to permanently put it in the AMI via ec2-register, or specify it at runtime via ec2-run-instances.
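Steps 4 through 11 above can be condensed into one sketch, using the same dry-run convention as earlier: DRY_RUN defaults to printing each command, since the real run happens as root on the instance with the volume attached as /dev/sdf.

```shell
#!/bin/sh
# Sketch of steps 4-11: partition the EBS volume, copy the running
# system onto it, and fix menu.lst for a partitioned disk. With DRY_RUN
# set (the default) commands are printed; run with DRY_RUN= as root to
# execute.
DEV=${DEV:-/dev/sdf}
TARGET=${TARGET:-/mnt/target}
DRY_RUN=${DRY_RUN-1}

run() { if [ -n "$DRY_RUN" ]; then echo "$@"; else "$@"; fi; }

echo ";" | run sfdisk "$DEV"    # one partition spanning the volume
run mkfs.ext3 "${DEV}1"
run tune2fs -c 0 "${DEV}1"      # disable periodic fsck
run mkdir -p "$TARGET"
run mount "${DEV}1" "$TARGET"

# -x stays on one filesystem, so /dev, /proc and /sys contents are skipped
run rsync -avHx / "$TARGET"

# PV-GRUB on a partitioned volume wants (hd0,0) instead of (hd0)
run sed -i 's/root (hd0)/root (hd0,0)/' "$TARGET/boot/grub/menu.lst"

run sync
run umount "$TARGET"
```

After this, detach the volume, snapshot it, and register the snapshot with ec2-register as in step 16.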

You now have the EBS-backed instance running vanilla 64-bit Centos5.6 with the xen kernel that came on the CentOS installation media.

Backup your EC2 server to an S3 bucket

Here's a script that creates a snapshot of your running EC2 server's filesystem, compresses it, gives the compressed file a datestamp, and uploads it to an S3 bucket. This file is a point-in-time backup; it can later be downloaded to any Linux server, uncompressed, and mounted with "mount -o loop". At that point, files can be extracted from it as needed.

Note: normally the ec2-bundle-vol command is run from an EBS-backed instance and the resulting manifest file can be registered as a brand new AMI.
However, this does not work when running that command from an S3-backed instance, so in this case only the filesystem image, not the manifest file, is used.

Modify the paths in the following script to suit your environment. This script uses your server's ephemeral drive for temporary files.


# this script produces a gzip file and transfers it to an S3 bucket for
# safe storage. If you take this file, put it on any Linux box, and unzip it,
# what you have is a file that can be mounted with "mount -o loop" and it
# contains a snapshot of the filesystem. Files can be restored as needed.

# Note that using ec2-bundle-vol from an S3-backed instance creates a manifest file
# that cannot be registered as an AMI, so the only useful thing is the
# filesystem image created as part of the bundling.
NAME="server-backup-`date +%m%d%y`"

# make the new backup files
/bin/nice -n 10 /opt/aws/bin/ec2-bundle-vol -rx86_64 -p ${NAME} -u <your amazon user id> -k <your X509 private pem> -c <your X509 public pem> --kernel <your server's aki> -d /media/ephemeral0
echo Return code from bundling: $?

# compress the filesystem image
/bin/nice -n 10 /bin/gzip --best /media/ephemeral0/${NAME}

echo Return code from gzip: $?

# upload it to the bucket

/usr/bin/s3put -a <your access key> -s <your secret key> -b server-backup /media/ephemeral0/${NAME}.gz

Once you have the above script, change its permissions to make it executable and readable only by root (you don't want other users knowing your Amazon credentials). Then add it to root's crontab to do the backup every three days, as shown in the following cron example; modify it to suit your environment. Then make sure you create your S3 bucket called "server-backup." Since backup files accumulate, you should set a bucket lifecycle rule to automatically purge older backup files. If you want to permanently keep certain files (for example, monthly backups), simply create a second backup script that designates a different NAME for the permanent backups, create a separate crontab entry for this script, and exclude these backup files from the bucket lifecycle rule.

MAILTO=<your email address>
0 2 */3 * * /root/bin/