Zen and the Art of Code Maintenance

Wednesday, August 28, 2013

Datomicity

I just got back from an FP-Syd (that's the Functional Programming Sydney user group) meeting where one of the speakers talked about Datomic. It's basically, in the speaker's own words, "a NoSQL database with a rewind button". Datomic shines where other NoSQL databases have failed:

1. It's fully ACID compliant, which means you can have transactions. What I'm not sure about is whether it does distributed transactions as well. But regardless, I don't know of any other NoSQL database that supports ACID transactions, be they local or distributed.

2. Unlike most NoSQL databases, it actually has a first-class query language based on Datalog, which is itself based on Prolog. Some say it's even more expressive than SQL.

3. Unlike most NoSQL databases, it lets you store data hierarchically and do joins.

4. Out of the box it uses what I believe to be HSQLDB (think SQLite for Java), which allows data to be in memory or on disk. If that doesn't cut it, you can hook it up to a relational database, Riak, Couchbase or DynamoDB.

5. Clojure is somewhat native to Datomic as it understands Clojure data structures. If you do functional programming in Clojure (or understand Lisp), Datomic is perfect for you as it minimizes if not completely eliminates impedance mismatch.

6. Data is stored as time-based facts, which means that it's append-only. The fact that data is immutable enables things like event sourcing (and with it all the wonderful business scenarios), and it also avoids a host of concurrency issues that typically plague database systems.
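To make the "rewind button" idea from the last point concrete, here is a minimal in-memory sketch of an append-only log of time-based facts. It is not Datomic's API or storage format; the `FactLog` class, its `record` and `valueAsOf` methods, and the `film-1` entity are all made up for illustration.

```javascript
// Hypothetical sketch of append-only, time-based facts. Not Datomic's API.
class FactLog {
  constructor() {
    this.facts = [];   // facts are only ever appended, never updated or deleted
    this.nextTx = 1;   // monotonically increasing transaction number
  }

  // Append the fact "entity's attribute is value" and return its transaction number.
  record(entity, attribute, value) {
    const tx = this.nextTx++;
    this.facts.push(Object.freeze({ entity, attribute, value, tx }));
    return tx;
  }

  // Read an attribute as it was at transaction `tx` (defaults to the latest state).
  valueAsOf(entity, attribute, tx = Infinity) {
    let result;
    for (const fact of this.facts) {
      if (fact.tx > tx) break; // facts are stored in transaction order
      if (fact.entity === entity && fact.attribute === attribute) {
        result = fact.value;   // a later fact supersedes an earlier one
      }
    }
    return result;
  }
}

const log = new FactLog();
const t1 = log.record("film-1", "rating", 3);
const t2 = log.record("film-1", "rating", 5);

console.log(log.valueAsOf("film-1", "rating"));     // 5 (current value)
console.log(log.valueAsOf("film-1", "rating", t1)); // 3 (rewound to t1)
```

Because nothing is overwritten, reading the past is just a matter of ignoring facts newer than the point you care about, and readers never race with an in-place update.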

The downsides:

1. It's closed source

2. It costs money if you want to hook it up to other storage backends or integrate with memcached

Saturday, June 29, 2013

First GitHub pull request

Tonight while twiddling with a new pet project, I made my first GitHub pull request ever. To add icing on the cake, it's for an Elixir project, and I'm a complete Elixir noob!

Here's to hoping the original author will accept my pull request and doesn't think that I'm way out of my depth.

Sunday, June 23, 2013

Farewell mocks and stubs, here comes JavaScript

This isn't exactly a revelation to many developers out there, but I finally got my head around unit testing JavaScript code! I knew that the dynamic nature of JavaScript would help a great deal, but it wasn't until I actually wrote the tests that I realized how easy stubbing was.

There are a couple of key things to note though:

Dependency injection (a fancy schmancy term for passing arguments) improves testability, even in JavaScript. Being able to just stub arguments negates the need to use a faking framework like Sinon.
While it's not absolutely necessary to use a faking framework, one has to be careful that a fake set up in one test doesn't change the behavior of the module under test in the tests that follow. A naive way of getting around this is to cache the actual function that is being faked in setup and assign it back in teardown. A safer, albeit more convoluted, way is to use libraries like NodeUnit and Sinon, which provide sandboxing capabilities.
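Both points can be sketched in a few lines of plain JavaScript, with no test or faking framework involved. Everything here (`fetchTitle`, `clock`, `stamp`, `setUp`, `tearDown`) is a made-up name for illustration, not code from a real project.

```javascript
// Point 1: the dependency is passed in as an argument, so a test can hand in a stub.
function fetchTitle(httpGet, url) {
  return httpGet(url).title.toUpperCase();
}

// A plain function is all the "stub" we need.
const stubGet = (url) => ({ title: "stubbed " + url });
const title = fetchTitle(stubGet, "/films/1");
console.log(title); // "STUBBED /FILMS/1"

// Point 2: when the code under test reaches for a shared object instead,
// cache the real function in setup and put it back in teardown.
const clock = { now: () => Date.now() };

function stamp(message) {
  return clock.now() + ": " + message;
}

let realNow;
function setUp() {
  realNow = clock.now;   // remember the real implementation
  clock.now = () => 42;  // swap in a deterministic fake
}
function tearDown() {
  clock.now = realNow;   // restore it so later tests see the real behavior
}

setUp();
const stamped = stamp("hello");
tearDown();
console.log(stamped); // "42: hello"
```

The save-and-restore approach works, but it falls apart if a test throws before teardown runs or forgets to restore one of several fakes, which is the gap that sandboxing libraries aim to close.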

Tuesday, June 18, 2013

Let's Kinect!

My company's upcoming hackathon presents a perfect opportunity to finally do something useful with that Kinect unit that I bought more than a year ago. We're planning to spend a couple of days building a game. The only requirement is it's not supposed to be work-related. How cool is that?

Time to dust off that Kinect. Hot or Not game with motion capture and voice recognition, anyone?

Saturday, June 15, 2013

Where art thou, expm packages?

I had a second look at how the expm package collection is going, and boy, progress has been rather slow. Popular frameworks such as Cowboy, MochiWeb and WebMachine are conspicuously missing.

Although I don't mind using rebar to 'git pull' the dependencies and compile them with a single 'make deps', I still think it'd be more handy to be able to just do an 'expm spec webmachine' or some such.

Note to self: add more packages while grokking Elixir.

Good riddance, parameterized modules!

After previous attempts at upgrading FilmoMeter to rid itself of the ungodly shackles of parameterized modules, I finally managed to pull it off today! WebMachine 1.10.1 and MochiWeb 1.5(?) finally fix this issue by removing all traces of parameterized modules. All hail functional gods!

Now I can finally upgrade to Erlang R16B and Elixir 0.9.2 in peace.

To Elixir or not to Elixir, that's the question

Elixir has been gaining a lot of attention of late. Even the venerable Joe Armstrong himself has taken it for a spin and is impressed by some of its features.

I myself have been mildly frustrated by the lack of 'proper' macros and monads in Erlang. Maybe it's time to succumb to temptation and take the plunge?

Monday, June 18, 2012

Erlang Concurrency Model FTW?

While catching up on my reading list, I came across this performance benchmark of C10k web servers.

Implementation	Connection Time (mean)	Latency (mean)	Messages	Connections	Connection Timeouts
Erlang (Cowboy)	865ms	17ms	2849294	10000	0
Haskell (Snap)	168ms	227ms	1187413	4996	108
Java (Webbit)	567ms	835ms	1028390	4637	157
Go	284ms	18503ms	2398180	9775	225
Node.js (websocket)	768ms	42580ms	1170847	5701	4299
Python (ws4py)	1561ms	34889ms	1052996	4792	5208

With the exception of Connection Time, Cowboy totally owns other web servers' asses in all other categories.

In the realm of high concurrency and availability, Erlang still reigns supreme.

There are a couple of observations that I want to make:

Despite all the hype, Node.js still fails to deliver. Don't get me wrong, I really like the idea of running JavaScript on the server. Most web developers already know JavaScript and can't be bothered to pick up a functional language just to write asynchronous and event-driven software, so the barrier to entry is significantly reduced. I just wish that all the attention and community participation would have resulted in something better in terms of scalability and performance.
Although Cowboy's average Connection Time is the second worst and roughly five times Snap's average (865ms versus 168ms), practically speaking most users won't even notice the difference, as the former still takes less than a second. When it comes to performance, perceived responsiveness matters more than raw numbers. There's a big difference between comparing 100ms to 800ms and comparing 1s to 8s.
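For anyone who wants to check the connection-time comparison against the table, here is a small sketch that ranks the servers by mean connection time and computes the Cowboy-to-Snap ratio. The figures are copied from the table above; the variable names are made up for illustration.

```javascript
// Mean connection times in milliseconds, copied from the benchmark table.
const meanConnectionMs = {
  Cowboy: 865,
  Snap: 168,
  Webbit: 567,
  Go: 284,
  "Node.js": 768,
  ws4py: 1561,
};

// Rank the servers from slowest to fastest mean connection time.
const slowestFirst = Object.entries(meanConnectionMs)
  .sort((a, b) => b[1] - a[1])
  .map(([name]) => name);

// How many times slower Cowboy is than Snap at establishing a connection.
const cowboyVsSnap = meanConnectionMs.Cowboy / meanConnectionMs.Snap;

console.log(slowestFirst.join(", ")); // ws4py, Cowboy, Node.js, Webbit, Go, Snap
console.log(cowboyVsSnap.toFixed(1)); // 5.1
```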

Saturday, June 16, 2012

AWS Automation Using PowerShell - Part Un (Starting and Stopping EC2 Instances)

This is the first post in what will hopefully be a series of posts (good luck with that) on automating Amazon Web Services (AWS) deployments using PowerShell.

Pre-requisites

Before you get started, there are a few things that you need to do:

Gain a basic understanding of AWS, especially EC2, S3, EBS, EIP and SNS.
Sign up for an AWS account.
Install PowerShell 2.0.
Install the latest version of the AWS SDK for .NET.
Create a directory in which you're going to create the PowerShell scripts.
Copy the AWSSDK.dll file from the bin directory of the SDK installation (e.g. C:\Program Files (x86)\AWS SDK for .NET\bin) to the new directory. This is to avoid having to deal with machine-specific installation paths and lets you deploy a self-contained set of automata.
Buckle up for a joy ride ahead!

AWS SDK for .NET

The SDK comes with a Visual Studio 2008/2010 plugin that lets you create .NET projects that you can then easily deploy to AWS using a wizard-based GUI. On top of that, it also includes a .NET library (duh) that wraps calls to the AWS REST APIs.

Having said that, the IDE plugin is pretty much useless for our purposes since a good engineer always attempts to automate repetitive tasks, and using a GUI-based tool does not a good automaton make. Instead, we want to be able to write a program, or better yet, a script that can be executed from the command line interface. You can choose to write the automata in C#, but bear in mind that they will ultimately be owned and maintained by a DevOps team. Most DevOps engineers I know can't be bothered to learn C#, so PowerShell is probably the preferred option.

Spinning EC2 instances up and down

While the best practice in managing an AWS infrastructure is to leverage CloudFormation and deployment platforms like Puppet or Chef, doing so requires significant investment in understanding the technology and underlying concepts. What I'm going to do here is present a simple scenario where you need to start and stop specific EC2 instances so that we can gain a basic understanding of AWS and its APIs before delving into more advanced scenarios. Think of it as getting your feet wet before diving in head first.

The following is a sample PowerShell script that starts and stops EC2 instances. Bear in mind that I'm a relative virgin when it comes to PowerShell, having only been to second base. So keep the laughter and scorn to yourself for now.

The code is actually pretty self-explanatory, but I'm going to walk you through it anyway.

The first line exposes four parameters, namely the operation (start or stop), the AWS account secret key, access key and a list of EC2 instance identifiers that you want to start or stop.
Line 3 is where you include a reference to the standalone copy of the SDK.
Lines 5-23 represent the meat of the script that starts and stops the instances. As you can probably gather, most AWS APIs follow the request-response pattern. You create a request to do something, send the request to AWS and get a response in return. It doesn't get any simpler than that.
Lines 25-29 refer to a function that creates and returns a 'proxy' to the EC2 API based on the specified secret and access keys.
Lines 31-41 contain a function that returns a generic list of EC2 instance identifiers based on the instanceIds parameter.
Lines 43-50 refer to a simple switch that executes either the start or stop function based on the operation parameter passed into the script.

You can then use the script as follows:

Go to your AWS console and lo and behold, the instances are spinning up!

There are many ways of automating this process. One option is to create builds in your favorite continuous integration platform (TeamCity, Go, TFS, etc.) that make use of this script to spin the instance up or down based on a specific schedule. Alternatively, you can configure the builds to not execute automatically so that you can manually trigger them with a single click of a button.

What's next?

The next step is to bootstrap the EC2 instances during startup so that each instance is responsible for configuring and installing its own resources dynamically. We can do fancy things like leveraging the Simple Notification Service (SNS) to notify interested parties (e.g. the build workflow) when the bootstrapping process is complete instead of polling each EC2 instance to see if it's up and running. This and more will be covered in the next post.

Till next time, adios.