blog dds

2012.06.01

Why are AWS Command-Line Tools so Slow?

Amazon's Elastic Compute Cloud command-line tools are useful building blocks for creating more complex shell scripts. They allow you to start and stop instances, get their status, add tags, manage storage, IP addresses, and so on. They have one big disadvantage: they take a long time to run. For instance, running ec2-describe-instances for six instances takes 19 seconds on an m1.small AWS Linux instance. One answer given, is that this is caused by JVM startup overhead. I found that hard to believe, because on the same machine a Java "hello world" program executes in 120ms, and running ec2-describe-instances --help takes just 321ms. So I set out to investigate, and, using multiple tracing tools and techniques, this is what I found.

By running the ec2-describe-instances shell script using the shell's -x flag I found that ec2-describe-instances calls ec2-cmd.

$ bash -x `which ec2-describe-instances`
+ __ZIP_PREFIX__EC2_HOME=/opt/aws/apitools/ec2
+ __RPM_PREFIX__EC2_HOME=/usr/local/aes/cmdline
+ /opt/aws/apitools/ec2/bin/ec2-cmd DescribeInstances
Using the same technique I found that ec2-cmd runs Java on the class com.amazon.aes.webservices.client.cmd.DescribeInstances.
$ bash -x /opt/aws/apitools/ec2/bin/ec2-cmd DescribeInstances
[...]
+ /usr/lib/jvm/java/bin/java -classpath :/opt/aws/apitools/ec2/lib/activation-1.1.jar:[...] com.amazon.aes.webservices.client.cmd.DescribeInstances
So from then on I invoked Java directly to avoid the noise caused by the shell scripts, as their time overhead was negligible.

I then checked the interaction between the command and the AWS web services. I did this by running tcpdump and strace in parallel and looking at the logs' timestamps. The following command starts logging packets, runs the Java program logging, and then stops logging packets. On an otherwise idle machine this gives you a very clear packet trace.

(sudo tcpdump -n >tcpdump.out &) ; \
strace -f -o strace.out -T -tt \
/usr/lib/jvm/java/bin/java \
-classpath :/opt/aws/apitools/ec2/lib/activation-1.1.jar:[...] \
com.amazon.aes.webservices.client.cmd.DescribeInstances ; \
sudo killall tcpdump
Ignoring some ssh interactions, the first packets came out at 18:51:33
18:51:33.053700 IP 10.39.93.124.36632 > 172.16.0.23.domain: 10259+ A? ...
18:51:33.053985 IP 10.39.93.124.36632 > 172.16.0.23.domain: 17669+ AAAA? ...
18:51:33.175941 IP 172.16.0.23.domain > 10.39.93.124.36632: 10259 1/0/0 A (51)
18:51:33.209711 IP 172.16.0.23.domain > 10.39.93.124.36632: 17669 0/1/0 (113)
18:51:33.211305 IP 10.39.93.124.57985 > 172.16.0.23.domain: 64553+ PTR? ...
This was 33 seconds after the program had started running.
30775 18:51:00.087534 execve("/usr/lib/jvm/java/bin/java",
  ["/usr/lib/jvm/java/bin/java", "-XX:-UsePerfData", "-classpath",
  "/opt/aws/apitools/ec2/lib/activa"...,
  "com.amazon.aes.webservices.clien"...],
  [/* 31 vars */]) = 0 <0.000417>
30775 18:51:00.088415 brk(0)            = 0x1e9d000 <0.000019>
30775 18:51:00.088576 mmap(NULL, 4096, PROT_READ|PROT_WRITE,
  MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f915f52a000 <0.000021>
30775 18:51:00.088679 readlink("/proc/self/exe",
  "/media/ephemeral0/opt/jdk1.7.0_04/jre/bin/java", 4096) = 46 <0.000045>
Therefore, I had to concentrate on Java's startup performance and ignore the AWS interactions. In retrospect that should have been a warning that I should have noticed.

I then run strace with the -c flag, which produces a cumulative count of the system calls and the time spent in them. The results were interesting.

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 98.34   18.562341        4428      4192       571 futex
  1.07    0.201412           1    178548           gettimeofday
  0.27    0.051676          47      1094           mprotect
  0.12    0.023182           4      6428      5687 stat
  0.11    0.020168         160       126           madvise
  0.06    0.011662           1     11292           lseek
  0.01    0.002735           0     13208           read
Reasoning that futex was simply showing the synchronization between threads I decided to try to eliminate gettimeofday, because of a reference I found regarding its effect on virtualized machines. After some searching and reading the command's invocation code I found I could eliminate the gettimeofday calls by setting an environment variable.
export EC2_JVM_ARGS=-XX:-UsePerfData
This reduced the clutter in the strace file, but didn't improve the performance.

I then tested how long it took to open and read all the files, by writing a command sequence that got the file names from the trace log and concatenated their contents.

 fgrep ' open(' strace.out |
 grep -v ENOENT |
 sed 's/",.*//;s/.*"//' |
 egrep -v '^/(proc|dev)' |
 xargs cat |
 wc -c
The total size of the output was 25MB and the command run swiftly. So simply reading the files was not costly.

Another thing that struck me in the trace file were thousands of failed calls to open class files in the current directory.

stat("/home/dds/org/codehaus/xfire/fault/XFireFault.class", 0x7f82cb39ffd0) = -1 ENOENT (No such file or directory)
stat("/home/dds/org/codehaus/xfire/XFireException.class", 0x7f82cb39e280) = -1 ENOENT (No such file or directory)
stat("/home/dds/com/amazon/aes/webservices/client/ImageAttribute.class", 0x7f82cb39ffd0) = -1 ENOENT (No such file or directory)
stat("/home/dds/com/amazon/aes/webservices/client/KernelAttribute.class", 0x7f82cb39ffd0) = -1 ENOENT (No such file or directory)
stat("/home/dds/com/amazon/aes/webservices/client/ImageValueAttribute.class", 0x7f82cb39e280) = -1 ENOENT (No such file or directory)
stat("/home/dds/com/amazon/aes/webservices/client/RamdiskAttribute.class", 0x7f82cb39ffd0) = -1 ENOENT (No such file or directory)
stat("/home/dds/com/amazon/aes/webservices/client/BlockDeviceMappingAttribute.class", 0x7f82cb39ffd0) = -1 ENOENT (No such file or directory)
stat("/home/dds/com/amazon/aes/webservices/client/ImageMapAttribute.class", 0x7f82cb39e280) = -1 ENOENT (No such file or directory)
stat("/home/dds/com/amazon/aes/webservices/client/ProductCodesAttribute.class", 0x7f82cb39ffd0) = -1 ENOENT (No such file or directory)
stat("/home/dds/com/amazon/aes/webservices/client/ImageListAttribute.class", 0x7f82cb39e280) = -1 ENOENT (No such file or directory)
stat("/home/dds/com/amazon/aes/webservices/client/LaunchPermissionAttribute.class", 0x7f82cb39ffd0) = -1 ENOENT (No such file or directory)
I traced the cause to be the way the invocation shell script built the class path by concatenating all jar files in the library.
for jar in "${LIBDIR}"/*.jar ; do
        CP="${CP}:${jar}"
done
This sequence results in the class path, stored in the CP variable, starting with a colon, which is taken to mean to search in the current directory. I fixed that problem by adding the following line after the for loop:
CP=$(echo $CP | sed 's/://')
This seemed to improve the command's execution time, but only by about a couple of seconds.

While examining the strace files I found many long read/seek sequences, like the following.

13190 15:07:29.981602 lseek(3, 56113794, SEEK_SET) = 56113794 <0.000034>
13190 15:07:29.981713 read(3, "PK\3\4\n\0\0\0\0\0\345)p@\216a\310\346\f\3\0\0\f\3\0\0001\0\0\0", 30) = 30 <0.000034>
13190 15:07:29.981848 lseek(3, 56113873, SEEK_SET) = 56113873 <0.000034>
13190 15:07:29.981953 read(3, "\312\376\272\276\0\0\0001\0%\n\0\4\0\26\n\0\27\0\30\7\0\31\7\0\32\1\0\6"..., 780) = 780 <0.000036>
13190 15:07:29.986162 lseek(3, 56112935, SEEK_SET) = 56112935 <0.000040>
13190 15:07:29.986288 read(3, "PK\3\4\n\0\0\0\0\0\345)p@\261\311$\200\f\3\0\0\f\3\0\0001\0\0\0", 30) = 30 <0.000033>
13190 15:07:29.986410 lseek(3, 56113014, SEEK_SET) = 56113014 <0.000031>
13190 15:07:29.986507 read(3, "\312\376\272\276\0\0\0001\0%\n\0\4\0\26\n\0\27\0\30\7\0\31\7\0\32\1\0\6"..., 780) = 780 <0.000043>
13190 15:07:29.990264 lseek(3, 56112080, SEEK_SET) = 56112080 <0.000036>
13190 15:07:29.990379 read(3, "PK\3\4\n\0\0\0\0\0\345)p@\32\17|\317\10\3\0\0\10\3\0\0001\0\0\0", 30) = 30 <0.000042>
13190 15:07:29.990515 lseek(3, 56112159, SEEK_SET) = 56112159 <0.000034>
13190 15:07:29.990626 read(3, "\312\376\272\276\0\0\0001\0%\n\0\4\0\26\n\0\27\0\30\7\0\31\7\0\32\1\0\6"..., 776) = 776 <0.000040>
13190 15:07:29.994931 lseek(3, 56111220, SEEK_SET) = 56111220 <0.000047>
13190 15:07:29.995065 read(3, "PK\3\4\n\0\0\0\0\0\345)p@$\333\211\31\f\3\0\0\f\3\0\0002\0\0\0", 30) = 30 <0.000044>
13190 15:07:29.995204 lseek(3, 56111300, SEEK_SET) = 56111300 <0.000035>
13190 15:07:29.995316 read(3, "\312\376\272\276\0\0\0001\0%\n\0\4\0\26\n\0\27\0\30\7\0\31\7\0\32\1\0\6"..., 780) = 780 <0.000043>
13190 15:07:29.999541 lseek(3, 56110361, SEEK_SET) = 56110361 <0.000043>
13190 15:07:29.999667 read(3, "PK\3\4\n\0\0\0\0\0\345)p@2u\340\335\v\3\0\0\v\3\0\0002\0\0\0", 30) = 30 <0.000047>
13190 15:07:29.999809 lseek(3, 56110441, SEEK_SET) = 56110441 <0.000037>
This got me thinking about the performance implications of these, and whether the seek calls were somehow interacting with Amazon's Simple Storage Service (S3), which was used as the machine's backing storage. Searching for data on AWS's performance, I came across an excellent technical report written by Simson L. Garfinkel of Harvard's Computer Science Group. There he mentioned that S3 performance depended on the size of the objects stored. Although the operating system's file system and buffer cache should isolate the seek system calls from S3, I decided to test whether using local storage would improve the performance. I therefore modified the instance's startup command to attach some local storage to the machine
ec2-run-instances $AMI -b /dev/sdb=ephemeral0 ...
and moved most of the files Java was opening into it.
tar cf - /opt/aws | sudo tar -C /media/ephemeral0/  -xf -
sudo ln -s /media/ephemeral0/opt/ /opt
tar cf - /usr/lib/jvm | sudo tar -C /media/ephemeral0/ -xf -
sudo ln -s /media/ephemeral0/usr/lib/jvm/ /usr/lib/jvm
tar cf - /usr/share | sudo tar -C /media/ephemeral0/ -xf -
sudo ln -s /media/ephemeral0/usr/share/ share
However, I didn't see any performance benefit.

I then considered whether all those seek/read calls were an inefficiency of the OpenJDK that comes with Amazon's AWS Linux. Could Sun's (now Oracle's) Java runtime be more efficient? Based on instructions by Dmitry Leskov I downloaded and run Orcacle's Java 1.7, but the performance appeared to deteriorate a bit. Another dead-end.

In the end it turned out that all the solutions I tried produced negligible or (in the case of Oracle's JVM negative) results:

  • removing thousands of calls to gettimeofday,
  • optimizing Java's library path,
  • using local storage,
  • using Oracle's JVM.

Apparently, AWS command-line tools are indeed slow due to Java's long startup time. Dynamic class loading ensures that running small programs or even an AWS command with the -help flag isn't slow, but when the command needs to do some actual work and thus loads all the required class files, it takes many seconds to run.

So what can one do? Here are some things that do work.

  • Use a faster multi-core machine: an m1.large instance runs the tools twice as fast as an m1.small one.
  • Make the command work harder for you. For instance, pass multiple instance ids to ec2-terminate-instances, or multiple tags to ec2-create-tags, instead of calling each command multiple times.
  • Use the API from another language. As an example, you can obtain your machine's instance id in just 10ms by using the following shell command.
    curl -s http://169.254.169.254/2011-01-01/meta-data/instance-id
If you're looking for the answer on why the AWS command-line tools are slow I hope that this blog entry has saved you the expense of running down a few blind alleys. Otherwise, I hope you found out how you can use tracing tools to investigate performance problems.

Read and post comments    AddThis Social Bookmark Button


Creative Commons License Last modified: Friday, June 1, 2012 11:14 pm
Unless otherwise expressly stated, all original material on this page created by Diomidis Spinellis is licensed under a Creative Commons Attribution-Share Alike 3.0 Greece License.