Posted by Peter Sankauskas on October 16th, 2012
I have seen many people on the interwebs using dd and /dev/zero to create empty files. This is fine if the file is small, but for a 50GB file it simply takes too long, particularly on EC2. The solution? Truncate!
truncate -s 50G my-large-file
Boom – instant.
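The reason it is instant is that truncate only sets the file's length and leaves the file sparse: no blocks are written until something actually stores data in it. Here is a minimal sketch to see that for yourself, assuming GNU coreutils (for stat -c) and a filesystem that supports sparse files. The scratch directory and the 1G size are just placeholders.

```shell
# Scratch directory so nothing important is touched (placeholder location).
workdir=$(mktemp -d)

# Set the file length to 1G without writing any data.
truncate -s 1G "$workdir/sparse-file"

# Apparent size in bytes, the same number ls -l would report.
apparent_bytes=$(stat -c %s "$workdir/sparse-file")

# Space actually allocated on disk, in KiB.
allocated_kib=$(du -k "$workdir/sparse-file" | cut -f1)

echo "apparent size: $apparent_bytes bytes"
echo "allocated:     $allocated_kib KiB"

# Clean up the scratch directory.
rm -rf "$workdir"
```

The allocated figure should be close to zero, and it only grows as data is written into the file.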
This is great for doing things like mounting /tmp on the EC2 ephemeral storage, so /tmp is not limited to 10GB (or whatever your root image size is):
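cd /mnt
truncate -s 50G big-tmp
mkfs.xfs /mnt/big-tmp
mount -o loop /mnt/big-tmp /tmp
# a freshly made filesystem is writable by root only, so restore the usual /tmp permissions
chmod 1777 /tmp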
Tags: aws, cloud, linux, tech, ubuntu
Posted in Blog | No Comments »
Posted by Peter Sankauskas on December 9th, 2011
I have just discovered a wonderful GUI for Git on Mac OS X. It is called GitX (L), which is different from the original GitX.
You can get it here: http://gitx.laullon.com/
Tags: git, osx, tech
Posted by Peter Sankauskas on July 29th, 2011
Regular readers of this blog (currently 6 of you madmen) will notice I removed the “PAS Recommends” section linking to Rackspace Cloud. That is because I can simply no longer recommend them. It is a sad day.
They are wasting resources by creating pointless iPhone and Android applications, and sending out survey after survey, all while the Management Console continues to get worse and worse. Right now in fact, I get a “white screen” after logging in.
Amazon’s Web Services are kicking Rackspace Cloud’s arse by continuously improving and upgrading their services. AWS has it all, and is only getting better while Rackspace Cloud sits dormant. I cannot believe we still cannot create an image of a machine larger than 75GB, and that backups fail silently if the size grows beyond that.
RIP Rackspace Cloud. I had much higher hopes for you.
Tags: aws, rackspace cloud, rant
Posted by Peter Sankauskas on July 29th, 2011
Installing Cloudera’s distribution of Hadoop on a MacBook Pro running OS X is not difficult if you get the steps right.
Go to: https://ccp.cloudera.com/display/SUPPORT/CDH3+Downloadable+Tarballs
and download the Hadoop tarball.
We are going to run Hadoop in pseudo-distributed mode, which is nice in a dev environment.
Open up a Terminal window, and run:
tar xvzf ~/Downloads/hadoop-0.20.2-cdh3u1.tar.gz
cd hadoop-0.20.2-cdh3u1/conf
cp ../example-confs/conf.pseudo/* .
Now we need to edit two files so that Hadoop knows where to write its data. This is when you decide where to write it. I did:
mkdir -p ~/hadoop-data/cache/hadoop/dfs/name
Edit core-site.xml:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/Users/${user.name}/hadoop-data/cache/${user.name}</value>
</property>
Next, edit hdfs-site.xml:
<property>
  <name>dfs.name.dir</name>
  <value>/Users/${user.name}/hadoop-data/cache/hadoop/dfs/name</value>
</property>
Finally, format HDFS, and start up the nodes:
cd ../bin
./hadoop namenode -format
./start-all.sh
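To check that everything came up, you can look at the output of jps (it ships with the JDK). In pseudo-distributed mode on Hadoop 0.20, start-all.sh should leave five daemons running. The helper below is only a sketch: the function name missing_daemons is made up for this example, and it simply reports which of the expected daemons do not appear in the jps output.

```shell
# Print the expected Hadoop daemons that are NOT present in the given jps output.
# $1: the text printed by `jps` (one "<pid> <name>" pair per line).
missing_daemons() {
  jps_output=$1
  missing=""
  for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    # Compare against the whole second column so that "NameNode"
    # does not match "SecondaryNameNode".
    if ! printf '%s\n' "$jps_output" | awk '{print $2}' | grep -qx "$daemon"; then
      missing="$missing $daemon"
    fi
  done
  # Strip the leading space before printing.
  printf '%s\n' "${missing# }"
}

# Only call jps if it is actually on the PATH.
if command -v jps >/dev/null 2>&1; then
  missing_daemons "$(jps)"
fi
```

An empty line means all five daemons were found. Anything it prints is a daemon worth looking up in the logs directory.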
If you are typing in your password a lot, try this (assuming you have your SSH keys set up):
cd ~/.ssh
# append rather than copy, so any keys already authorized are kept
cat id_rsa.pub >> authorized_keys
chmod 600 authorized_keys
If you have upgraded to OS X Lion (v 10.7), then you might see this every time you do something:
2011-07-29 21:22:05.997 java[7690:1903] Unable to load realm info from SCDynamicStore
You can ignore it. It has something to do with Kerberos authentication (I think), but I don’t yet have a solution to getting rid of it.
Tags: cdh3, cloudera, hadoop, mac, osx, tutorial
Posted by Peter Sankauskas on January 12th, 2011
I have been using Subversion for a long time, and am relatively new to Git. This post is a little tutorial of what I have learnt getting Git and SVN to play nicely together, primarily using git-svn.
My goal is to maintain the code in the original SVN repository while transitioning the team to Git. This means changes to either repository get reflected into the other one.
The first step is to create an empty bare Git repository to import SVN into (this is your remote Git repository):
cd $HOME/git-repo
mkdir project
cd project
git --bare init
Now clone the empty Git repository so you have a working directory, and then import the SVN repository into Git. The directory names need to match (in this case, they are both “project”):
cd $HOME
git clone file://$HOME/git-repo/project
git svn clone -s file://$HOME/svn-repo/project
If all is going well, you should have a “project” directory with all of your files imported from SVN in it. This directory is also your Git clone. Now you can “push” these changes to your remote Git repository:
cd $HOME/project
git push origin master
Now make some changes in your working directory:
echo "This is a change in Git" > git-change.txt
git add git-change.txt
git commit -m "Adding a new file"
git push
We now have a change in Git that is not in SVN. To copy the change over to SVN, we do a “dcommit”:
git svn dcommit
Let’s check out the SVN repository, and make a change in there:
cd $HOME
svn checkout file://$HOME/svn-repo/project/trunk svn-project
cd svn-project
echo "Here is a change in SVN" > svn-change.txt
svn add svn-change.txt
svn commit -m "Adding a new change in SVN"
To get this change into Git, we need to “rebase”. This is not as scary as it sounds:
cd $HOME/project
git pull
git svn rebase
git push
Hooray! We now know how to make changes go both ways between SVN and Git.
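If you end up typing these sequences a lot, they can be wrapped in a couple of shell functions. This is only a sketch that repeats the commands from this post in the same order. The function names and the DRY_RUN switch are made up for the example, and it assumes you run it from inside the Git working directory.

```shell
# Print the command instead of running it when DRY_RUN is set (hypothetical switch).
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Git -> SVN: publish local commits to the Git remote, then send them to SVN.
git_to_svn() {
  run git push &&
  run git svn dcommit
}

# SVN -> Git: pick up Git changes, replay local work on top of SVN, publish.
svn_to_git() {
  run git pull &&
  run git svn rebase &&
  run git push
}
```

Setting DRY_RUN=1 before calling either function prints the commands without touching any repository, which is handy for checking the order first. Keep in mind that git svn dcommit rewrites the commits it sends (it adds a git-svn-id line to each message), so treat this as a convenience wrapper, not a guarantee that the two histories stay identical.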
Tags: git, svn, tech, tutorial
Posted by Peter Sankauskas on August 25th, 2010
For those of you like me that have totally missed the Publishing page, here is how to create a link to your Android application in the Android Market:
market://details?id=<packagename>
OR
http://market.android.com/details?id=<packagename>
So for Remembory, I have this:
market://details?id=com.gbott.remembory
The “old” way was to link to the search page by using this:
market://search?q=pname:<package>
… but the details page method saves the user a click, or a tough decision when there are two applications both called Remembory.
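If you need to generate these links for more than one application, the two formats are easy to build from the package name. A tiny sketch follows; the function names are made up for this example.

```shell
# Build the in-device Market link for a package name.
market_link() {
  printf 'market://details?id=%s\n' "$1"
}

# Build the web link for the same package name.
market_web_link() {
  printf 'http://market.android.com/details?id=%s\n' "$1"
}

market_link com.gbott.remembory
market_web_link com.gbott.remembory
```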
Tags: advice, android, market, mobile, tech
Posted by Peter Sankauskas on July 26th, 2010
For doing anything serious on Rackspace Cloud, you need to be able to use machines with more than 2GB of RAM. Problem is, machines with more than 2GB of RAM could not be imaged (or backed up) – until now. About a month ago, they announced the ability to snapshot a machine into Cloud Files.
Today I decided to take it for a spin and hit Snag 1: there was no way to do an image on my new 16GB machine. After talking to one person at Rackspace Cloud, I was none the wiser – Snag 2 was the lack of training for their support staff. I asked for the supervisor, who created a ticket for the approval of the terms (letting me know I would be charged for the storage in Cloud Files), and off I went. Until I hit Snag 3 – the “Images” tab in the server details still showed no way of performing an image. I was told to use “My Server Images” instead, and HOORAY, I could make an image of my 16GB machine.
Finally the feature that prevented my company from using Rackspace Cloud, and instead using AWS, was fixed! Congratulations to the Rackspace Cloud team.
Another thing I learnt today is that there are two data centers for Rackspace Cloud machines – DFW and ORD. The first server you provision gets assigned to one of the two data centers, and whichever one it gets put into is the data center that all of your other servers will be put into as well. So if the first server gets put into DFW, then all of the other ones you create will be too. That is, until you delete all of your servers. Then once again, the first server can get put into either data center.
I am hoping that in the future, we will have a choice about which data center a server is provisioned in, particularly since it will help for geographical distribution.
Tags: backup, rackspace cloud, tech
Posted in Blog | 2 Comments »
Posted by Peter Sankauskas on February 16th, 2010
I have added another open source project to my list – a JSON SerDe for Hive. You can check it out here:
http://code.google.com/p/hive-json-serde/
This SerDe (serializer/deserializer) will let you read JSON files as input for Hive tables. In the future, it will also support writing JSON data, but that is for another day.
Please let me know if you have any comments or questions about it.
Tags: hadoop, hive, json, open source, serde, tech
Posted by Peter Sankauskas on February 4th, 2010
Google have taken a step in the right direction by offering people a monetary reward when they find a bug in Chromium. I am very excited by this news, not because I am likely to find a bug and cash in, but because it shows Google is starting to take some accountability.
Step 2 would be public bug trackers for ALL of their systems, particularly the online ones such as Gmail, Docs, Sites, etc.
Step 3 would be Googlers actually paying attention to them, maybe even hiring a customer support team.
Fingers crossed!
Tags: google
Posted by Peter Sankauskas on January 8th, 2010
When I first told my wife about Google releasing a phone, the very first words out of her mouth were:
“I would never buy a phone from Google. If something went wrong, I’m screwed! It’s not like I can take it into a shop and get it fixed.”
How right she was. According to Slashdot, Google is facing a deluge of customer complaints about the Nexus One. Are you having problems? Use the hash tag #fixgoogle if you are on Twitter. All tweets with that tag will appear on fixgoogle.com when I get the site up and running.
Tags: android, google, rant