Google has taken a step in the right direction by offering people a monetary reward when they find a bug in Chromium. I am very excited by this news, not because I am likely to find a bug and cash in, but because it shows Google is starting to take some accountability.
Step 2 would be public bug trackers for ALL of their systems, particularly the online ones such as Gmail, Docs, Sites, etc.
Step 3 would be Googlers actually paying attention to them, maybe even hiring a customer support team.
If you are using EC2, you will quickly find that if an instance is terminated, any data on that instance is gone – lost forever. At first, this seems like a terrible idea, but in fact, it encourages you to get into best practices, and discover the awesome benefits of EBS.
We have many instances of different types running. We have built a “custom” Debian AMI for each of the instance types we use (web, database, management, etc.). If you were to launch an instance with one of these AMIs, you would not have a fully working system. That is because these AMIs contain sym-links for important and/or dynamic data. For example, on the web AMI we have created, /etc/apache2, /etc/php5 and /var/www are all sym-links. To where? A directory that an EBS volume is mounted on. That’s right, all of the web configuration and website code lives only on an EBS volume. It is simple enough to write a little script that creates a nightly Snapshot of each EBS volume.
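That nightly snapshot script can be sketched in a few lines of shell. This is only a sketch: the volume IDs are made-up placeholders, it assumes the EC2 API tools (ec2-create-snapshot) are installed and configured, and it just prints the commands it would run so it is safe to try as-is.

```shell
#!/bin/sh
# Nightly EBS snapshot sketch. The volume IDs below are placeholders --
# replace them with your real IDs (ec2-describe-volumes lists them).
VOLUMES="vol-11111111 vol-22222222"

snapshot_cmd() {
    # Build the ec2-create-snapshot call for one volume,
    # stamping the description with today's date.
    echo "ec2-create-snapshot $1 -d nightly-$(date +%Y-%m-%d)"
}

for vol in $VOLUMES; do
    # Print instead of executing; swap echo for a real invocation
    # once the volume IDs are real.
    snapshot_cmd "$vol"
done
```

Drop something like this into /etc/cron.daily and the Snapshots land in S3 automatically every night.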
Now for the power of this setup. Every time you want to bring up another instance of the same type (say, for horizontal scaling), you are in fact doing a restoration from backup. Take a Snapshot (your backup), create an EBS volume from it, attach it to the new instance, and make it live! This doesn’t just work for scaling; it also works for bringing up staging servers that mirror production, or for running experiments without affecting production.
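The restore-from-backup steps can be sketched the same way. Again a hedged sketch using the classic EC2 API tools: the Snapshot, instance, zone, and device names are all hypothetical placeholders, and the script only echoes each command rather than running it.

```shell
#!/bin/sh
# Restore sketch: Snapshot -> new EBS volume -> attach -> mount.
# Every ID here is a placeholder.
SNAPSHOT="snap-11111111"
INSTANCE="i-22222222"
ZONE="us-east-1a"
DEVICE="/dev/sdf"

restore_cmds() {
    # 1. Create a fresh volume from the Snapshot in the target zone.
    echo "ec2-create-volume --snapshot $SNAPSHOT -z $ZONE"
    # 2. Attach it to the new instance. (In practice you would parse
    #    the real volume ID out of the create-volume output.)
    echo "ec2-attach-volume vol-NEW -i $INSTANCE -d $DEVICE"
    # 3. Mount it where the AMI's sym-links expect it.
    echo "mount $DEVICE /ebs"
}

restore_cmds
```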
We can even take it a step further! Those AMIs and Snapshots are all stored in S3 – data available to the whole Region. An instance and an EBS volume exist in only one of the Availability Zones within that Region, but you can use your backups to restore into a different Availability Zone, which is the basis of a high-availability solution.
* I don’t know Joel personally – we have never met – but I do follow his work, like his company and LOVE FogBugz!
I am pleased to announce to all my fans (haha) that the AWS Start-up Challenge video shot at Motally is finally live! This was my first ever professional video shoot – they spent 5 hours at our office for footage that was edited down to 2 minutes. Motally was one of the 7 finalists, out of more than 1,000 entries. Congratulations to GoodData and Bizo for taking the top prizes!
I am pleased to announce my very first open source project, hosted at GitHub. The project, called Rackspace Cloud PHP Library, is a simple, single PHP file for easily making Cloud Server API calls. Rackspace have not yet released any libraries for their API, possibly because it is still kind of in beta. If they do, I believe there will be little use for my project, but right now, it has value.
So what was my motivation for this?
Well, the Rackspace Cloud Management Console is severely lacking in features – things such as creating an image of a server (just like you can in AWS EC2), and sharing an IP address between servers (something you cannot do in EC2, where an IP address can only be attached to a single instance at a time). It is this second feature that I am most interested in, because it means I can use a virtual IP address (a floating IP) to create a highly available (HA) “cluster” of Tomcat servers. I plan on using keepalived to do the IP switching, and Apache with mod_proxy_balancer and mod_proxy_ajp to talk to the multiple Tomcat servers. Had I not read the poorly written API documentation and learned that I could create a shared IP group, I would not have known this was possible.
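For the Apache side, the mod_proxy_balancer + mod_proxy_ajp setup looks roughly like this. A sketch only: the backend addresses and the /app path are hypothetical, and mod_proxy, mod_proxy_ajp and mod_proxy_balancer all need to be enabled first.

```apache
# Balance AJP traffic across two Tomcat backends (placeholder addresses).
<Proxy balancer://tomcatcluster>
    BalancerMember ajp://10.0.0.11:8009
    BalancerMember ajp://10.0.0.12:8009
</Proxy>

ProxyPass        /app balancer://tomcatcluster/app
ProxyPassReverse /app balancer://tomcatcluster/app
```

keepalived then just moves the shared IP between the two Apache boxes; the Apache config itself never changes.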
This project is very much a work in progress, and what is in there now represents only about 8 hours of work. I welcome any feedback, and anyone else who wants to join.
For some strange reason, Chrome OS is getting a lot of press. Is it a slow news day?
They say that it is direct competition to Microsoft, that it makes Linux less relevant… are they serious? Chrome OS is a non-announcement. There is a project that has existed for over 2 years called “cl33n”. From the creator:
Chrome OS is “Google Chrome running within a new windowing system on top of a Linux kernel.” cl33n is “Mozilla Firefox running in a little-used windowing system on top of a Linux kernel.”
This “OS” is due to be released mid-2010. Is that how slowly things move inside Google? Why would it take them 12 months to create nothing more than cl33n?
What I am trying to say is that Chrome OS is nothing new. cl33n is not alone in this space either – other projects like Webconverger share my view.
While on the subject of Google’s non-announcements, did you hear that Gmail, Docs, etc. are out of beta? Big news, huh? So what is their excuse now for the daily “Server error” dialogs?
I can’t believe it, but I won! I have been trying to set up Nagios on a RHEL5 machine running SELinux and have been losing the fight for the last 3 days. But today, I win! This is such a win, it is worth sharing.
Now that I have won, though, I believe this is not Nagios-specific at all, and had I bothered to learn about SELinux, this might have been obvious. Anyway, the error Nagios was giving me was:
Error: Could not stat() command file ‘/usr/local/nagios/var/rw/nagios.cmd’! The external command file may be missing, Nagios may not be running, and/or Nagios may not be checking external commands. An error occurred while attempting to commit your command for processing. Return from whence you came
As you may have already guessed, the solution has nothing to do with the location or permissions of the file: the file was not missing, Nagios was running, and Nagios was checking external commands. The final line of the message is great though, and I can only hope we start to see more old English in error messages.
The problem, of course, was that SELinux was enabled and stopping this blatant security violation. You can check whether SELinux is on by running getenforce.
If you got “Permissive” or “Disabled”, then this post is not for you. To see SELinux’s side of things, check out /var/log/messages:
setroubleshoot: SELinux is preventing ping (ping_t) "read write" to /usr/local/nagios/var/spool/checkresults/checkrXH96b (usr_t). For complete SELinux messages. run sealert -l 1ffc2533-42b5-4e04-b7ab-a81bb7d02040
setroubleshoot: SELinux is preventing ping (ping_t) "read write" to /usr/local/nagios/var/spool/checkresults/checkrZxsA1 (usr_t). For complete SELinux messages. run sealert -l 178ba2d4-0822-47eb-9e32-bfaa19ee3c4b
setroubleshoot: SELinux is preventing cmd.cgi (httpd_sys_script_t) "getattr" to /usr/local/nagios/var/rw/nagios.cmd (httpd_sys_content_t). For complete SELinux messages. run sealert -l 4df0946e-8816-4b90-a7d1-37e743697b9c
As you can see, SELinux is trying to give you a hint with that sealert bit, so you should take it.
That raw audit message is GOLD! There is some other information in the sealert output, but nothing about what the next step should be to create a policy and make the fix permanent. I have heard that using chcon is only a temporary fix. The solution is to copy that raw audit message into an empty file and run audit2allow to create a policy:
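Sketched out, the audit2allow step looks like this (the module name and input file name are my own placeholders; run as root):

```shell
# Paste the raw audit message(s) into a file, e.g. nagios-audit.txt,
# then build a local policy module from it:
audit2allow -M nagioslocal < nagios-audit.txt

# audit2allow -M writes nagioslocal.te (human-readable policy source)
# and nagioslocal.pp (compiled module); loading the module makes the
# change permanent:
semodule -i nagioslocal.pp
```

Unlike chcon, an installed policy module survives reboots and filesystem relabels.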