Sankasaurus

Just another tech blog – ranting since 2006

Creating large empty files quickly

Posted by Peter Sankauskas on October 16th, 2012

I have seen many people on the interwebs using dd and /dev/zero to create empty files. This works fine for small files, but for a 50 GB file it simply takes too long, particularly on EC2, because dd has to write every one of those zero bytes to disk. The solution? Truncate!

truncate -s 50G my-large-file

Boom – instant.
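truncate is instant because it only sets the file's length and writes no data. On filesystems that support sparse files (ext4 and XFS both do), no disk blocks are allocated until something is actually written. Here is a minimal sketch that shows this for yourself. It assumes GNU coreutils, as on a typical Linux EC2 instance (the `stat -c` flags differ on OS X), and `make_sparse` is a hypothetical helper name:

```shell
#!/bin/sh
# make_sparse (hypothetical helper): create a sparse file of the given size,
# then compare its apparent size with the disk space actually allocated.
# Assumes GNU coreutils (truncate, stat -c, du --apparent-size).
# Usage: make_sparse <file> <size>   e.g. make_sparse my-large-file 50G
make_sparse() {
  file=$1
  size=$2
  truncate -s "$size" "$file" || return 1

  # %s = apparent size in bytes, %b = number of blocks actually allocated
  echo "apparent bytes: $(stat -c %s "$file")"
  echo "allocated blocks: $(stat -c %b "$file")"

  du -h "$file"                   # real disk usage: ~0 for a fresh sparse file
  du -h --apparent-size "$file"   # the size that ls -l reports
}
```

For example, `make_sparse my-large-file 50G` should report an apparent size of 53687091200 bytes, while du shows next to nothing allocated.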

This is great for things like mounting /tmp on EC2 on the ephemeral storage, so that /tmp is not limited to 10 GB (or whatever your root image size is):

cd /mnt
truncate -s 50G big-tmp
mkfs.xfs /mnt/big-tmp
mount -o loop /mnt/big-tmp /tmp

Posted in Blog | No Comments »

Git for OSX

Posted by Peter Sankauskas on December 9th, 2011

I have just discovered a wonderful GUI for Git on Mac OS X. It is called GitX (L), which is different from the original GitX.

You can get it here: http://gitx.laullon.com/


I no longer recommend Rackspace Cloud

Posted by Peter Sankauskas on July 29th, 2011

Regular readers of this blog (currently 6 of you madmen) will notice I removed the “PAS Recommends” section linking to Rackspace Cloud. That is because I can simply no longer recommend them. It is a sad day.

They are wasting resources by creating pointless iPhone and Android applications and sending out survey after survey, all while the Management Console keeps getting worse. Right now, in fact, I get a “white screen” after logging in.

Amazon’s Web Services are kicking Rackspace Cloud’s arse by continuously improving and upgrading their services. AWS has it all, and is only getting better while Rackspace Cloud sits dormant. I cannot believe we still cannot create an image of a machine larger than 75 GB, and that backups fail silently if the machine grows past that size.

RIP Rackspace Cloud. I had much higher hopes for you.


Installing CDH3 on OS X

Posted by Peter Sankauskas on July 29th, 2011

Installing Cloudera’s version of Hadoop on an OS X MacBook Pro is not difficult, if you get the steps right.

Go to: https://ccp.cloudera.com/display/SUPPORT/CDH3+Downloadable+Tarballs
and download the Hadoop tarball.

We are going to run Hadoop in pseudo-distributed mode, where each Hadoop daemon runs as a separate process on your own machine. This is nice in a dev environment.

Open up a Terminal window, and run:

tar xvzf ~/Downloads/hadoop-0.20.2-cdh3u1.tar.gz
cd hadoop-0.20.2-cdh3u1/conf
cp ../example-confs/conf.pseudo/* .

Now we need to edit two files so that Hadoop knows where to write its data. This is where you decide where that data lives. I used:

mkdir -p ~/hadoop-data/cache/hadoop/dfs/name

Edit core-site.xml

<property>
  <name>hadoop.tmp.dir</name>
  <value>/Users/${user.name}/hadoop-data/cache/${user.name}</value>
</property>

Next, edit hdfs-site.xml

<property>
  <name>dfs.name.dir</name>
  <value>/Users/${user.name}/hadoop-data/cache/hadoop/dfs/name</value>
</property>

Finally, format HDFS, and start up the nodes:

cd ../bin
./hadoop namenode -format
./start-all.sh

If you find yourself typing in your password a lot when starting the nodes, try this (assuming you already have an SSH key pair in ~/.ssh):

cd ~/.ssh
cat id_rsa.pub >> authorized_keys
chmod 600 authorized_keys
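The password prompts come from start-all.sh using SSH to launch each daemon on localhost. Below is a slightly more careful version of the step above, as a sketch that assumes an RSA key at id_rsa.pub as in this post. It appends your public key only if it is not already there, so other authorized keys survive and re-running it is harmless. It also tightens permissions, because sshd's default StrictModes setting ignores key files that other users can write to. The helper name `authorize_own_key` is hypothetical. It takes the SSH directory as an optional argument (defaulting to ~/.ssh), so you can try it on a scratch directory first.

```shell
# authorize_own_key (hypothetical helper): let the current user SSH to
# localhost without a password by authorizing their own public key.
# Usage: authorize_own_key [ssh_dir]   (defaults to ~/.ssh)
authorize_own_key() {
  ssh_dir=${1:-$HOME/.ssh}
  pub="$ssh_dir/id_rsa.pub"
  auth="$ssh_dir/authorized_keys"

  if [ ! -f "$pub" ]; then
    echo "no public key at $pub; create one with ssh-keygen -t rsa" >&2
    return 1
  fi

  touch "$auth" || return 1
  # Append the key only if that exact line is not already present, so
  # existing keys are kept and running this twice adds no duplicate.
  if ! grep -qxF -f "$pub" "$auth"; then
    cat "$pub" >> "$auth"
  fi

  # With sshd's default StrictModes, keys in group/world-writable
  # locations are ignored.
  chmod 700 "$ssh_dir"
  chmod 600 "$auth"
}
```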

If you have upgraded to OS X Lion (10.7), you might see this every time you run a Hadoop command:

2011-07-29 21:22:05.997 java[7690:1903] Unable to load realm info from SCDynamicStore

You can ignore it. It has something to do with Kerberos authentication (I think), but I don’t yet have a way to get rid of it.


Git and SVN working together

Posted by Peter Sankauskas on January 12th, 2011

I have been using Subversion for a long time, and am relatively new to Git. This post is a little tutorial of what I have learnt getting Git and SVN to play nicely together, primarily using git-svn.

My goal is to keep maintaining the code in the original SVN repository while transitioning the team to Git. This means changes made in either repository get reflected in the other.
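The walkthrough below assumes an SVN repository at file://$HOME/svn-repo/project laid out in the standard trunk/branches/tags structure, which is what the -s (--stdlayout) flag of git svn clone expects. If you want to try the workflow without touching a real repository, here is a minimal sketch that creates a throwaway one. It assumes the svnadmin and svn command-line tools are installed, and `make_test_svn_repo` is a hypothetical helper name:

```shell
# make_test_svn_repo (hypothetical helper): create a local SVN repository
# with the standard trunk/branches/tags layout that `git svn clone -s` expects.
# Usage: make_test_svn_repo [base_dir]   (defaults to $HOME)
make_test_svn_repo() {
  base=${1:-$HOME}
  repo="$base/svn-repo/project"

  mkdir -p "$base/svn-repo" || return 1
  svnadmin create "$repo" || return 1

  # Create all three top-level directories in a single commit.
  svn mkdir -m "Create standard layout" \
    "file://$repo/trunk" "file://$repo/branches" "file://$repo/tags"
}
```

Run with no argument, it creates the repository at $HOME/svn-repo/project, which matches the paths used in the rest of this post.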

The first step is to create an empty bare Git repository to import SVN into (this is your remote Git repository):

cd $HOME/git-repo
mkdir project
cd project
git --bare init

Now clone the empty Git repository so you have a working directory, and then import the SVN repository into Git. The directory names need to match (in this case, they are both “project”):

cd $HOME
git clone file://$HOME/git-repo/project
git svn clone -s file://$HOME/svn-repo/project

If all is going well, you should have a “project” directory containing all of your files imported from SVN. This directory is also your Git clone. Now you can “push” these changes to your remote Git repository:

cd $HOME/project
git push origin master

Now make a change in your working directory:

echo "This is a change in Git" > git-change.txt
git add git-change.txt
git commit -m "Adding a new file"
git push

We now have a change in Git that is not in SVN. To copy the change over to SVN we do a “dcommit”

git svn dcommit

Let’s check out the SVN repository and make a change there:

cd $HOME
svn checkout file://$HOME/svn-repo/project/trunk svn-project
cd svn-project
echo "Here is a change in SVN" > svn-change.txt
svn add svn-change.txt
svn commit -m "Adding a new change in SVN"

To get this change into Git, we need to “rebase”. This is not as scary as it sounds:

cd $HOME/project
git pull
git svn rebase
git push

Hooray! We now know how to make changes go both ways between SVN and Git.


Android Market links

Posted by Peter Sankauskas on August 25th, 2010

For those of you who, like me, totally missed the Publishing page, here is how to create a link to your Android application in the Android Market:

market://details?id=<packagename>

OR

http://market.android.com/details?id=<packagename>

So for Remembory, I have this:

market://details?id=com.gbott.remembory

The “old” way was to link to the search page by using this:

market://search?q=pname:<packagename>

… but the details page method saves the user a click, or a tough decision when there are two applications both called Remembory.
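Both details-page links are just the package name dropped into a fixed template, so generating them is trivial. Here is a small sketch that prints both forms for a given package. The function name `market_links` is hypothetical:

```shell
# market_links (hypothetical helper): print the Android Market "details"
# links for a package, in both the market:// and http:// forms shown above.
# Usage: market_links <packagename>
market_links() {
  pkg=$1
  if [ -z "$pkg" ]; then
    echo "usage: market_links <packagename>" >&2
    return 1
  fi
  echo "market://details?id=$pkg"
  echo "http://market.android.com/details?id=$pkg"
}
```

For example, `market_links com.gbott.remembory` prints the two Remembory links, one per line.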


Rackspace Cloud images to Cloud Files

Posted by Peter Sankauskas on July 26th, 2010

For doing anything serious on Rackspace Cloud, you need machines with more than 2 GB of RAM. The problem is that machines with more than 2 GB of RAM could not be imaged (or backed up) – until now. About a month ago, Rackspace announced the ability to snapshot a machine into Cloud Files. Today I decided to take it for a spin and hit Snag One: there was no way to create an image of my new 16 GB machine. After talking to one person at Rackspace Cloud, I was none the wiser – Snag Two was the lack of training for their support staff. After I asked for the supervisor, he created a ticket to approve the terms (letting me know I would be charged for the storage in Cloud Files), and off I went. Until I hit Snag Three – the “Images” tab in the server details still showed no way of performing an image. I was told to use “My Server Images” instead and HOORAY, I could make an image of my 16 GB machine.

Finally, the limitation that kept my company from using Rackspace Cloud (we use AWS instead) has been fixed! Congratulations to the Rackspace Cloud team.

Another thing I learnt today is that there are two data centers for Rackspace Cloud machines: DFW and ORD. The first server you provision gets assigned to one of the two, and whichever one it lands in is the data center all of your other servers will be put into as well. So if the first server goes into DFW, all of the others you create will too. That is, until you delete all of your servers; then, once again, the first server can be put into either data center.

I am hoping that in the future, we will have a choice about which data center a server is provisioned in, particularly since it will help for geographical distribution.


JSON SerDe for Hive

Posted by Peter Sankauskas on February 16th, 2010

I have added another open source project to my list – a JSON SerDe for Hive. You can check it out here:

http://code.google.com/p/hive-json-serde/

This SerDe (serializer/deserializer) will let you read JSON files as input for Hive tables. In the future, it will also support writing JSON data, but that is for another day.

Please let me know if you have any comments or questions about it.


Google Takes First Step

Posted by Peter Sankauskas on February 4th, 2010

Google have taken a step in the right direction by offering people a monetary reward when they find a bug in Chromium. I am very excited by this news, not because I am likely to find a bug and cash in, but because it shows Google is starting to take some accountability.

Step 2 would be public bug trackers for ALL of their systems, particularly the online ones such as Gmail, Docs, Sites, etc.

Step 3 would be Googlers actually paying attention to them, maybe even hiring a customer support team.

Fingers crossed!


Google – Hexus One

Posted by Peter Sankauskas on January 8th, 2010

When I first told my wife about Google releasing a phone, the very first words out of her mouth were:

“I would never buy a phone from Google. If something went wrong, I’m screwed! It’s not like I can take it into a shop and get it fixed.”

How right she was. According to Slashdot, Google is facing a deluge of customer complaints about the Nexus One. Are you having problems? Use the hash tag #fixgoogle if you are on Twitter. All tweets with that tag will appear on fixgoogle.com when I get the site up and running.
