Sankasaurus

Just another tech blog – ranting since 2006

JSON SerDe for Hive

I have added another open source project to my list – a JSON SerDe for Hive. You can check it out here:

http://code.google.com/p/hive-json-serde/

This SerDe (serializer/deserializer) will let you read JSON files as input for Hive tables. In the future, it will also support writing JSON data, but that is for another day.

Please let me know if you have any comments or questions about it.

Google Takes First Step

Google have taken a step in the right direction by offering people a monetary reward when they find a bug in Chromium. I am very excited by this news, not because I am likely to find a bug and cash in, but because it shows Google is starting to take some accountability.

Step 2 would be public bug trackers for ALL of their systems, particularly the online ones such as Gmail, Docs, Sites, etc.

Step 3 would be Googlers actually paying attention to them, maybe even hiring a customer support team.

Fingers crossed!

Google - Hexus One

When I first told my wife about Google releasing a phone, the very first words out of her mouth were:

I would never buy a phone from Google. If something went wrong, I’m screwed! It’s not like I can take it into a shop and get it fixed.

How right she was. According to Slashdot, Google is facing a deluge of customer complaints about the Nexus One. Are you having problems? Use the hash tag #fixgoogle if you are on Twitter. All tweets with that tag will appear on fixgoogle.com when I get the site up and running.

Backups by design on AWS EC2

My good mate* Joel Spolsky wrote a nice piece about backups (or rather, restoration), and I wanted to echo his remarks and how they relate to using AWS EC2.

If you are using EC2, you will quickly find that if an instance is terminated, any data on that instance is gone – lost forever. At first, this seems like a terrible idea, but in fact, it encourages you to get into best practices, and discover the awesome benefits of EBS.

We have many instances running of different types. We have built a “custom” Debian AMI for each of the instance types we use (web, database, management, etc). If you were to launch an instance with one of these AMIs, you would not have a fully working system. That is because these AMIs have sym-links for important and/or dynamic data. For example, on the web AMI we have created, /etc/apache2, /etc/php5/ and /var/www are all sym-links. To where? A directory that an EBS volume is mounted to. That’s right, all of the web configuration and website code lives only in an EBS volume. It is simple enough to write a little script that creates nightly Snapshots of each EBS volume.
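To give an idea of what that little script might look like, here is a minimal sketch. It assumes the present-day `aws` command line tool is installed and has credentials (not necessarily the tooling we used), and the volume IDs are made-up placeholders.

```shell
#!/bin/sh
# Sketch of a nightly EBS snapshot job. Assumes the "aws" CLI is installed and
# has credentials; the volume IDs passed in are hypothetical placeholders.

# Snapshot every volume ID given as an argument, putting today's date in each
# snapshot description. Returns non-zero if any snapshot failed.
snapshot_volumes() {
    today=$(date +%Y-%m-%d)
    failed=0
    for volume_id in "$@"; do
        if ! aws ec2 create-snapshot \
                --volume-id "$volume_id" \
                --description "nightly $today $volume_id"; then
            echo "snapshot failed for $volume_id" >&2
            failed=1
        fi
    done
    return "$failed"
}

# Example (hypothetical IDs), e.g. called from a nightly cron entry:
#   snapshot_volumes vol-0aaa1111 vol-0bbb2222
```

Keeping the loop going after a failure means one bad volume does not stop the rest from being backed up, while the non-zero return still lets cron report the problem.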

Now for the power of this setup. Every time you want to bring up another instance of the same type (say, for horizontally scaling), you are in fact doing a restoration from backup. Take a Snapshot (your backup), create an EBS volume, attach it to the new instance, and make it live! This doesn’t just work for scaling, it works for bringing up staging servers that are mirrors of production or running experiments without affecting production.
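The restore steps above (Snapshot, new EBS volume, attach) could be sketched roughly as follows. Again this assumes the `aws` CLI; the function name, the IDs and the `/dev/sdf` device default are placeholders of mine, not something from our actual setup.

```shell
#!/bin/sh
# Sketch: create a volume from a Snapshot and attach it to an instance.
# Assumes the "aws" CLI is installed and configured. All IDs are hypothetical.

# $1 = snapshot ID, $2 = instance ID, $3 = Availability Zone of the instance,
# $4 = device name (optional, defaults to /dev/sdf as an assumption).
# Prints the ID of the new volume on success.
restore_from_snapshot() {
    snapshot_id=$1
    instance_id=$2
    zone=$3
    device=${4:-/dev/sdf}

    # Create the volume in the same zone as the target instance.
    volume_id=$(aws ec2 create-volume \
        --snapshot-id "$snapshot_id" \
        --availability-zone "$zone" \
        --query VolumeId --output text) || return 1

    # Block until the new volume is ready to be attached.
    aws ec2 wait volume-available --volume-ids "$volume_id" || return 1

    aws ec2 attach-volume \
        --volume-id "$volume_id" \
        --instance-id "$instance_id" \
        --device "$device" > /dev/null || return 1

    echo "$volume_id"
}

# Example (hypothetical IDs):
#   restore_from_snapshot snap-0123abcd i-0456efgh us-east-1b
```

The volume still has to be mounted inside the instance at the directory the sym-links point to; that part depends on the AMI and is left out here.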

We can even take it a step further! Those AMIs and Snapshots are all stored in S3 – data available to the whole Region. An instance and EBS volume exist in only 1 of the Availability Zones within that Region. You can use your backups to restore into a new Availability Zone which you can use to create a high-availability solution.

Happy scaling!

* I don’t know Joel personally – we have never met – but I do follow his work, like his company and LOVE Fogbugz!

Motally Video and Contest

I am pleased to announce to all my fans (haha) that the AWS Start-up Challenge video shot at Motally is finally live! This was my first ever professional video shoot – they spent 5 hours at our office and edited it down to 2 minutes. Motally was one of the 7 finalists, out of more than 1000 entries. Congratulations to GoodData and Bizo for taking the top prizes!

I also want to call upon all of the mobile application developers out there. Motally is running a mobile analytics contest called Trackappalooza. Here you can win a pass to MWC in Barcelona, or up to $15,000 just for tracking your Android, iPhone or Blackberry app. Motally is the mobile analytics powerhouse providing tracking capabilities for mobile websites and mobile applications.

Good luck!

Innovate 09

If anyone is interested in going to the PayPal X (Innovate 09) conference, I have come across a discount: a special rate of $149 (50% off the price) if you enter promotion/coupon code PayPalHR.

Enjoy!

Dependency Nightmare for Tomcat on Debian

I would love to not have to install the real Java and Tomcat manually on Debian, but I have little choice in the matter. Take a look at this:

$ apt-get install tomcat5.5
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
ant ant-gcj ant-optional ant-optional-gcj antlr build-essential debhelper
default-jdk default-jre default-jre-headless defoma dpkg-dev ecj ecj-gcj fastjar
file fontconfig fontconfig-config g++ g++-4.3 gappletviewer-4.3 gcj-4.3
gcj-4.3-base gettext gettext-base gij-4.3 gjdoc hicolor-icon-theme html2text
intltool-debian java-common java-gcj-compat java-gcj-compat-dev
java-gcj-compat-headless jsvc libantlr-java libantlr-java-gcj libasound2
libatk1.0-0 libatk1.0-data libbcel-java libcairo2 libcommons-beanutils-java
libcommons-collections-java libcommons-collections3-java libcommons-daemon-java
libcommons-dbcp-java libcommons-digester-java libcommons-el-java
libcommons-launcher-java libcommons-logging-java libcommons-modeler-java
libcommons-pool-java libcompress-raw-zlib-perl libcompress-zlib-perl libcups2
libdatrie0 libdb4.5 libdigest-hmac-perl libdigest-sha1-perl libdirectfb-1.0-0
libecj-java libecj-java-gcj libexpat1 libfile-remove-perl libfontconfig1
libfontenc1 libfreetype6 libgcj-bc libgcj-common libgcj9-0 libgcj9-0-awt
libgcj9-dev libgcj9-jar libgcj9-src libglib2.0-0 libglib2.0-data libgtk2.0-0
libgtk2.0-bin libgtk2.0-common libice6 libio-compress-base-perl
libio-compress-zlib-perl libio-stringy-perl libjaxp1.3-java libjaxp1.3-java-gcj
libjpeg62 liblog4j1.2-java liblog4j1.2-java-gcj libmagic1 libmail-box-perl
libmail-sendmail-perl libmailtools-perl libmime-types-perl libmx4j-java
libobject-realize-later-perl libpango1.0-0 libpango1.0-common libpixman-1-0
libpng12-0 libregexp-java libservlet2.3-java libservlet2.4-java libsm6 libsqlite3-0
libstdc++6-4.3-dev libsys-hostname-long-perl libthai-data libthai0 libtiff4
libtimedate-perl libtomcat5.5-java libts-0.0-0 liburi-perl libuser-identity-perl
libxcb-render-util0 libxcb-render0 libxcomposite1 libxcursor1 libxdamage1
libxerces2-java libxerces2-java-gcj libxfixes3 libxfont1 libxft2 libxi6
libxinerama1 libxrandr2 libxrender1 libxtst6 make mime-support patch po-debconf
python python-central python-minimal python2.5 python2.5-minimal ttf-dejavu
ttf-dejavu-core ttf-dejavu-extra x-ttcidfont-conf xfonts-encodings xfonts-utils
...
0 upgraded, 146 newly installed, 0 to remove and 0 not upgraded.
Need to get 101MB of archives.
After this operation, 288MB of additional disk space will be used.
Do you want to continue [Y/n]?

WTF? I understand that Tomcat needs some kind of Java, but this is ridiculous. It is installing ant, fonts, compilers and worst of all, the most evil Java ever.

Ubuntu has the sense to make Sun Java available, but even if you do have Sun Java installed, apt-get wants to pull in the same list of packages on Ubuntu.

For shame!

I’ll stick to downloading from java.sun.com and tomcat.apache.org.

Rackspace Cloud API PHP Library

I am pleased to announce my very first open source project hosted at github. The project, called Rackspace Cloud PHP Library, is a simple, single PHP file that makes it easy to make Cloud Server API calls. Rackspace have not yet released any libraries for their API, possibly because it is still kind of in beta. If they do, I believe there will be little use for my project, but right now, it has value.

So what was my motivation for this?

Well, the Rackspace Cloud Management Console is severely lacking in features. It is missing things such as creating an image of a server (just like you can in AWS EC2) and sharing an IP address between servers (something you cannot do in EC2, where an IP address can only be attached to a single instance at a time). It is this second feature that I am most interested in, because it means I can use a virtual IP address (floating IP) to create an HA (highly available) “cluster” of Tomcat servers. I plan on using keepalived to do the IP switching, and Apache with mod_proxy_balancer and mod_proxy_ajp to talk to multiple Tomcat servers. Without reading the poorly written API documentation and learning that I could create a shared IP group, I would not have known this was possible.

This project is very much a work in progress, and what is in there now represents only about 8 hours of work. I welcome any feedback, and anyone else who wants to join.

Download Chrome OS now – it’s called cl33n

For some strange reason, Chrome OS is getting a lot of press. Is it a slow news day?

They say that it is direct competition to Microsoft, that it makes Linux less relevant… are they serious? Chrome OS is a non-announcement. There is a project that has existed for over 2 years called “cl33n”. From the creator:

Chrome OS is “Google Chrome running within a new windowing system on top of a Linux kernel.”
cl33n is “Mozilla Firefox running in a little-used windowing system on top of a Linux kernel.”

This “OS” is due to be released in mid-2010. Is that how slowly things move inside Google? Why would it take them 12 months to create nothing more than cl33n?

What I am trying to say is that Chrome OS is nothing new. Cl33n is not alone in this space either – other projects like Webconverger share my view.

While on the subject of Google’s non-announcements, did you hear that Gmail, Docs, etc. are out of beta? Big news, huh? So what is their excuse now for daily “Server error” dialogs?

Fighting with SELinux and Nagios

I can’t believe it, but I won! I have been trying to set up Nagios on a RHEL5 machine running SELinux and have been losing the fight for the last 3 days. But today, I win! This is such a win, it is worth sharing.

Now that I have won though, I believe this is not Nagios specific at all, and if I had bothered to learn about SELinux, this may have been obvious. Anyway, the error Nagios was giving me was:

Error: Could not stat() command file ‘/usr/local/nagios/var/rw/nagios.cmd’!
The external command file may be missing, Nagios may not be running, and/or Nagios may not be checking external commands.
An error occurred while attempting to commit your command for processing.
Return from whence you came

As you may have already guessed, the solution has nothing to do with the location or permissions of the file: the file was not missing, Nagios was running, and Nagios was checking external commands. The final line of the message is great though, and I can only hope we start to see more old English in error messages.

The problem of course, was that SELinux was enabled and stopping this blatant security violation. You can check to see if SELinux is on by running:

$ /usr/sbin/getenforce
Enforcing

If you got “Permissive” or “Disabled”, then this post is not for you. To see SELinux’s side of things, check out /var/log/messages:

setroubleshoot: SELinux is preventing ping (ping_t) "read write" to /usr/local/nagios/var/spool/checkresults/checkrXH96b (usr_t). For complete SELinux messages. run sealert -l 1ffc2533-42b5-4e04-b7ab-a81bb7d02040

setroubleshoot: SELinux is preventing ping (ping_t) "read write" to /usr/local/nagios/var/spool/checkresults/checkrZxsA1 (usr_t). For complete SELinux messages. run sealert -l 178ba2d4-0822-47eb-9e32-bfaa19ee3c4b

setroubleshoot: SELinux is preventing cmd.cgi (httpd_sys_script_t) "getattr" to /usr/local/nagios/var/rw/nagios.cmd (httpd_sys_content_t). For complete SELinux messages. run sealert -l 4df0946e-8816-4b90-a7d1-37e743697b9c

As you can see, SELinux is trying to give you a hint with that sealert bit, so you should take it.
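If the log has a lot of these lines, it can help to list the distinct sealert IDs first so you know how many denials you are dealing with. This is only a sketch: the function name is mine, and it assumes the setroubleshoot lines look like the ones above.

```shell
#!/bin/sh
# Print each distinct sealert ID found in setroubleshoot log lines on stdin.
# Assumes the "run sealert -l <id>" wording shown in the log excerpt above.
sealert_ids() {
    sed -n 's/.*setroubleshoot:.*sealert -l \([0-9a-f-]\{36\}\).*/\1/p' | sort -u
}

# Example: show the full report for every distinct denial in the log.
#   for id in $(sealert_ids < /var/log/messages); do
#       sealert -l "$id"
#   done
```

The same denial is often logged many times, so de-duplicating the IDs with `sort -u` keeps the list short.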

$ sealert -l 1ffc2533-42b5-4e04-b7ab-a81bb7d02040
Summary:
SELinux is preventing ping (ping_t) "read write" to
/usr/local/nagios/var/spool/checkresults/checkrXH96b (usr_t).

Detailed Description:

... (removed from post)

Raw Audit Messages

host=myhost.myisp.host type=AVC msg=audit(1241217029.141:125305): avc: denied { read write } for pid=32379 comm="ping" path="/usr/local/nagios/var/spool/checkresults/checkrXH96b" dev=sda3 ino=52894945 scontext=user_u:system_r:ping_t:s0 tcontext=user_u:object_r:usr_t:s0 tclass=file

host=myhost.myisp.host type=SYSCALL msg=audit(1241217029.141:125305): arch=c000003e syscall=59 success=yes exit=0 a0=153952a0 a1=15395330 a2=7fff75c5eb40 a3=0 items=0 ppid=32378 pid=32379 auid=503 uid=508 gid=508 euid=0 suid=0 fsuid=0 egid=508 sgid=508 fsgid=508 tty=(none) ses=1392 comm="ping" exe="/bin/ping" subj=user_u:system_r:ping_t:s0 key=(null)

That raw audit message is GOLD! There is some other information in there, but nothing about what the next step should be to create a policy and make it permanent. I have heard that using chcon is only a temporary fix. The solution is to copy that raw audit message into an empty file and run audit2allow to create a policy:

$ cat > /tmp/tmp-nagiosping
host=myhost.myisp.host type=AVC msg=audit(1241217029.141:125305): avc: denied { read write } for pid=32379 comm="ping" path="/usr/local/nagios/var/spool/checkresults/checkrXH96b" dev=sda3 ino=52894945 scontext=user_u:system_r:ping_t:s0 tcontext=user_u:object_r:usr_t:s0 tclass=file
host=myhost.myisp.host type=SYSCALL msg=audit(1241217029.141:125305): arch=c000003e syscall=59 success=yes exit=0 a0=153952a0 a1=15395330 a2=7fff75c5eb40 a3=0 items=0 ppid=32378 pid=32379 auid=503 uid=508 gid=508 euid=0 suid=0 fsuid=0 egid=508 sgid=508 fsgid=508 tty=(none) ses=1392 comm="ping" exe="/bin/ping" subj=user_u:system_r:ping_t:s0 key=(null)
* Ctrl-D *

$ audit2allow -M NagiosPing < /tmp/tmp-nagiosping

******************** IMPORTANT ***********************
To make this policy package active, execute:

semodule -i NagiosPing.pp

This creates a file called NagiosPing.pp which contains the SELinux policy needed to make these errors go away. The only thing left to do is to install this policy:

$ semodule -i NagiosPing.pp

If your setup was like mine, SELinux was actually preventing 3 different actions, needing 3 different policies. HA! That is easy now – just repeat the steps until Nagios is doing your bidding.
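Since the steps get repeated, they could be wrapped in a small function. This is a rough sketch rather than what I actually ran: the function name and file paths are made up, and it only chains the same audit2allow and semodule commands shown above.

```shell
#!/bin/sh
# Build and install one SELinux policy module from a file of raw audit lines.
# $1 = module name (e.g. NagiosPing)
# $2 = file holding the raw audit messages copied from sealert
allow_from_audit() {
    module=$1
    audit_file=$2

    # Refuse to build a policy from a missing or empty file.
    if [ ! -s "$audit_file" ]; then
        echo "no audit messages in $audit_file" >&2
        return 1
    fi

    # audit2allow -M writes the policy package as $module.pp in the current directory.
    audit2allow -M "$module" < "$audit_file" || return 1

    # Install the generated policy package.
    semodule -i "$module.pp"
}

# Example (hypothetical file names), once per denial:
#   allow_from_audit NagiosPing /tmp/tmp-nagiosping
#   allow_from_audit NagiosCmd  /tmp/tmp-nagioscmd
```

It is worth reading the generated policy before installing it, since audit2allow will happily allow whatever was denied.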