When I started the weekend hacking project that would become Attachments.me four years ago, I had no clue what I was getting myself into. The user-base grew rapidly, and I had to learn a lot on my feet about scaling and architecture. It was a frequently stressful, often humbling, and ultimately wonderful learning experience.

Attachments.me was recently acquired by the email analytics company Yesware. In the aftermath, I've been taking some time to step back and reflect. In my career so far, I've had the opportunity to work at three rapidly growing technology startups: FreshBooks, Attachments.me, and Yesware. A commonality that I've noticed is that the engineering decisions made in the first few months (good and bad) have an incredibly lasting effect:

  • Taking shortcuts early on can result in exponential amounts of future work.
  • Rushing technology into production, without taking time to learn its ins and outs, can also result in months of future engineering work.
  • A good engineering culture becomes self-perpetuating.

In an industry where terms like lean, agile, and MVP are sprinkled into every third sentence, it sometimes feels like there's too much emphasis placed on shipping product, and not enough emphasis placed on building good infrastructure (infrastructure that, in turn, allows the company to grow).

But releasing rapidly and building robust, scalable infrastructure need not be at odds with each other. The software development world is currently experiencing a Cambrian explosion of cloud computing technologies, and there is a plethora of tools that allow developers to build faster than ever before:

  • Amazon's Elastic Compute Cloud (EC2) allows a small startup to provision hardware without the costs once associated with it.
  • Cloud computing platforms, like Heroku, provide a further abstraction. They allow a company to deploy applications without the operational expertise once required.
  • There are hosted solutions available for every infrastructure requirement under the sun: queues, databases, full-text search engines. This further alleviates past operational headaches.

These technologies are incredible, but they're not a magic bullet. In this article, I discuss some of the infrastructure choices available to a fledgling technology startup, reflecting on patterns I've seen that work. The conversation is centered around several hot topics of debate:

  • Is it a good decision to use Heroku to bootstrap your startup?
  • At what scale might you consider moving some infrastructure onto EC2?
  • Is it wise to use hosted solutions for critical pieces of infrastructure?

I'd like to put forward the argument that the technology choices themselves are not the most important part. You should instead strive to answer a few key questions about architectural decisions:

  • What's the strategy for scaling a piece of infrastructure as your user base grows?
  • How does a piece of infrastructure handle failure?
  • What alternative choices could have been made?
  • How difficult is it to migrate away from the technology, in the future?

To provide a foundation for some of the abstract concepts discussed in this article, the conversation is centered around building a personal diary application. I'll take this application through various stages of development: bootstrapping the app as a single Heroku instance, adding Heroku Postgres as a data-store, addressing scaling concerns, and supplementing functionality using AWS. From this thought experiment, I'll look at what patterns can be extracted.

A Personal Diary App

To frame some of the key concepts discussed in this article, I've built a personal diary application. I chose this example, because it's simple enough to build in a few hours, but has enough complexity to demonstrate key concepts, such as:

  • The steps involved in deploying an application to Heroku.
  • The decision-making process that goes into choosing external dependencies (databases, APIs, etc.).
  • How you might go about using Heroku in conjunction with AWS.

To get started, I laid out some requirements for the personal diary application that I wanted to build:

  • I should be able to email a diary entry to a special email address and have it stored in a database.
  • I should be able to log into the application and view my past diary entries.
  • I should be able to perform full-text search and retrieve old diary entries.

Before diving right in, let's take a look at some of the key philosophies that the Heroku platform is built around; these serve as a good foundation for your own architectural decisions.

Heroku is an application-hosting platform based around the Twelve-Factor App Manifesto.

Introducing Heroku and the Twelve-Factor Manifesto

For getting an application off the ground, it's hard to beat Heroku. As their website states, “Heroku is cloud computing designed and built for developers.” It allows you to concentrate on writing code, abstracting away many of the system administration burdens once associated with software development. This abstraction is made possible by the guidelines outlined in the Twelve-Factor App Manifesto.

The 12 factors described in the document outline architectural considerations that help make an application stable, scalable, and extensible. They are:

  • Codebase: There is a single codebase per application, which development and deployment centers around.
  • Dependencies: Dependencies for the codebase should be explicitly declared (a codebase contains all the information necessary to get it up and running).
  • Config: An application's configuration information should be set via environment variables, rather than being hardcoded.
  • Backing Services: Services supporting an application, e.g., databases and queues, should be referenced via environment variables, and should be independent from the application itself.
  • Build, Release, Run: Building the application (for example, compiling Java files), releasing the application, and running the application should be separate steps.
  • Processes: Running the application consists of running one or more distinct processes (Heroku refers to these processes as Dynos). Processes are stateless, and can only interoperate via backing services.
  • Port Binding: An application exposes itself to the outside world via binding to a single outward-facing port.
  • Concurrency: An application's processes should be stateless and scale linearly via executing multiple processes.
  • Disposability: An application should shut down gracefully and boot up quickly.
  • Dev/Prod Parity: The state of the development codebase and the state of the production codebase should not diverge. Continuous deployment helps ensure this.
  • Logs: Logs should be treated as a stream of events and are aggregated together for all running processes.
  • Admin Processes: Admin processes, e.g., bundle exec rails console, should be run as one-off short-lived processes.

The 12-factor guidelines make deploying applications on Heroku incredibly straightforward. As you build out the personal diary application, these rules provide a great litmus test for further technology decisions that you'll make.

Let's look at the steps involved in getting the personal diary project off the ground, and how they relate to the 12-factor app manifesto.

A Single Codebase

I created a single repository for the diary application on GitHub; it can be found here: http://github.com/bcoe/personal_diary. (Editor's note: This page is no longer available on GitHub.) The entire development process centers around this single repository. When a developer wants to make a change to the application, he simply checks out the repository and runs the codebase locally. Because dependencies are explicitly declared in a Gemfile, it's painless for the contributor to run an environment identical to production. Here's the Gemfile for the diary application:

source 'https://rubygems.org'
gem 'pg'
gem 'sinatra'
gem 'sinatra-activerecord'
group :development do
  gem "bundler", "~> 1.5"
  gem "rake"
end
gemspec

The homogeneity of environments, promoted by the declarative approach used in the codebase (using a Gemfile, database.yml, etc., rather than constants), gives me confidence that code running on staging will also run in production.

Along with the repository on GitHub, a git repository is created on Heroku for each development environment (staging, production, etc.). Deploying code to one of these environments is as easy as pushing code to one of the remote Heroku git repositories.

The typical path for a developer to get a feature from inception to production is as follows:

  • A developer creates a local branch of the GitHub repository and makes changes.
  • A pull request is created for the new feature. Other developers comment on the pull request on GitHub, and revisions are made accordingly.
  • The new feature is merged from the feature branch onto the master branch.
  • The master branch is deployed to the Heroku staging environment, via git push heroku-staging master.
  • If testing on staging is successful, the application is deployed to the production Heroku environment.

The workflow described above is one of the main reasons I advocate Heroku as a great technology choice for a fledgling startup. The codebase-centric approach to development makes it dead simple to get features into production, speeding up release cycles. At the same time, it provides a good framework for collaboration.

As the diary application matures, I may eventually migrate from Heroku toward hosting some of my own infrastructure. If you make a similar move, take heed of the benefits presented by a codebase-centric approach to development, and emulate them.

Backing Services

The personal diary app relies on two external services:

  • Heroku Postgres: a hosted Postgres instance that diary entries are stored in.
  • Mailgun: an API that allows emails sent to an email address to be redirected to an HTTP endpoint.
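
To make the Mailgun item concrete, here is a rough sketch of how the form fields posted to the HTTP endpoint might be turned into a diary entry. It assumes Mailgun-style field names (sender, subject, stripped-text, body-plain, timestamp); DiaryEntry and entry_from_inbound_email are hypothetical names for illustration and are not taken from the diary app's repository.

```ruby
# Hypothetical value object for a stored diary entry.
DiaryEntry = Struct.new(:author, :title, :body, :received_at)

# Build a DiaryEntry from the form fields of an inbound-email POST.
# Assumes Mailgun-style field names; adjust them to the provider's payload.
def entry_from_inbound_email(params)
  # Prefer the text with quoted replies and signatures stripped, if present.
  body = params['stripped-text'] || params['body-plain']
  raise ArgumentError, 'email has no text body' if body.nil? || body.strip.empty?

  DiaryEntry.new(
    params.fetch('sender'),
    params.fetch('subject', '(untitled)'),
    body.strip,
    Time.at(Integer(params.fetch('timestamp'))).utc
  )
end
```

In a Sinatra app, a function like this would typically be called from the route that receives the POST, with the request parameters passed in; the resulting entry would then be saved to Postgres.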

As advocated in the 12-factor-app manifesto, I set up the variables necessary to connect to these backing services as environment variables on Heroku. When I run the command, heroku config -a personal-diary, here's what I see:

=== personal-diary Config Vars
DATABASE_URL:    postgres://user:pass@host/database
MAILGUN_API_KEY: mailgun-api-key

The application references these environment variables rather than having any configuration information hard-coded:

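require 'yaml'
require 'erb'
require 'active_record'

# Pick the environment from RACK_ENV, defaulting to development.
ENVIRONMENT = ENV['RACK_ENV'] || "development"

# Render database.yml through ERB before parsing it as YAML.
db_conf = YAML.load(
    ERB.new(
        File.read("#{Dir.pwd}/config/database.yml")
    ).result
)

# Connect using the settings for the current environment.
ActiveRecord::Base.establish_connection(
    db_conf[ENVIRONMENT]
)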
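
The config/database.yml file that the snippet above reads is not shown in this article. As a sketch only, here is how such a file could pull its values from the environment through ERB. The template contents, the DATABASE_YML constant, and the database_settings helper are assumptions made for illustration; the url key assumes an ActiveRecord version that accepts a connection URL.

```ruby
require 'erb'
require 'yaml'

# Hypothetical stand-in for config/database.yml; the real file is not shown here.
DATABASE_YML = <<~CONFIG
  development:
    url: <%= ENV.fetch('DATABASE_URL', 'postgres://localhost/diary_development') %>
  production:
    url: <%= ENV['DATABASE_URL'] %>
CONFIG

# Render the ERB tags first, then parse the result as YAML and
# return the settings for the requested environment.
def database_settings(template, environment)
  rendered = ERB.new(template).result
  YAML.safe_load(rendered).fetch(environment)
end
```

With DATABASE_URL set in the environment, database_settings(DATABASE_YML, 'production') returns a hash whose url value is that connection string, and nothing about the database is hardcoded in the repository.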

The beautiful thing about this approach is that configuring separate deployment environments, e.g., staging versus production, is as simple as setting up different environment variables (perhaps the staging database is Heroku Postgres, and the production database runs on EC2). The above approach to describing dependencies provides a good foundation for ensuring development/production parity, but you're not quite there yet.

Development Parity

One problem that's frequently a hassle when working from a local development environment is ensuring that all appropriate environment variables are set. An approach that works quite well, and which is advocated by Heroku, is to create a local .env file, which mirrors production configuration variables. Foreman (https://github.com/ddollar/foreman) is an open-source tool for running applications locally, using a Procfile and .env file. Here's the .env file used by the diary application:

DATABASE_URL=postgres://user:pass@host/database
MAILGUN_API_KEY=mailgun-api-key
ELASTIC_SEARCH_URL=http://localhost:9200
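
For readers curious about the mechanics, the following is a minimal sketch of how a KEY=VALUE file like the one above could be read and applied to a process's environment. It is not Foreman's actual implementation; parse_env and apply_env are hypothetical helpers, and the sketch ignores quoting and other edge cases.

```ruby
# Parse KEY=VALUE lines into a hash, skipping blank lines and comments.
def parse_env(text)
  text.each_line.with_object({}) do |line, vars|
    line = line.strip
    next if line.empty? || line.start_with?('#')

    # Split on the first "=" only, so values may themselves contain "=".
    key, value = line.split('=', 2)
    next if value.nil?

    vars[key.strip] = value.strip
  end
end

# Copy parsed values into env without overriding variables already set.
def apply_env(vars, env = ENV)
  vars.each { |key, value| env[key] ||= value }
end
```

Leaving already-set variables untouched is a design choice in this sketch: it lets a developer override a single value from the shell without editing the file.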

The .env file should reference dependencies specific to the developer's local environment, e.g., a development Postgres database. This brings up another question: what's the best way for a developer to get local dependencies up and running? Declarative files, like a Gemfile, ensure that a developer gets the appropriate library dependencies installed, but what ensures that they have the appropriate service dependencies: databases, queues, etc.? This is a hard problem to solve, and as a result is often ignored. I'd argue that development/production parity is one of the most important points of the 12-factor-app manifesto; it ensures that new developers can hit the ground running.

What are some approaches for helping a developer to get the appropriate service dependencies installed?

  • A developer can create his own personal accounts on hosted services, such as Heroku Postgres. Many of these services have a free tier, for just this use-case.
  • Virtualization software, like Vagrant or VMware, can be used to create and share disk images. These images can be distributed with complex dependencies already installed.
  • Frameworks like Puppet and Chef allow complex infrastructure dependencies to be described in a declarative manner.
  • GitHub has open-sourced a tool called Boxen (https://github.com/boxen). For OS X users, this tool simplifies the process of installing and describing service dependencies.

For the diary application, I opted to create developer accounts on Mailgun and Heroku Postgres for each developer environment. As the application grows and dependencies become more complex, I plan to pull in a tool like Boxen, to ensure that service dependencies are described in a declarative manner.

Scaling the App

On Heroku, your application is divided into a set of processes. A single application might have several types of processes associated with it, e.g., a Web process and a background worker process for consuming from queues. As with other dependencies, processes are described in a declarative manner. A Procfile is included in the repository, which lists the different types of processes available:

web: bundle exec unicorn -c config/unicorn.rb
worker: env QUEUE=* bundle exec rake resque:work

Heroku processes are stateless, and run in isolation from one another (storing persistent state information is facilitated by backing services). Working within this stateless paradigm encourages design that easily scales. You can just as easily run five of the Web processes described above as one. This allows you to scale services linearly.
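
As a toy illustration of the stateless model, the sketch below keeps all persistent state in a shared store (an in-memory stand-in for a backing service such as Postgres), so any number of Web processes can serve the same user interchangeably. EntryStore and WebProcess are hypothetical classes written for this example, not part of the diary app.

```ruby
# In-memory stand-in for a backing service such as Postgres (hypothetical).
class EntryStore
  def initialize
    @entries = Hash.new { |hash, user| hash[user] = [] }
  end

  def add(user, text)
    @entries[user] << text
  end

  def entries_for(user)
    @entries[user].dup
  end
end

# A web process keeps no per-user state of its own; every read and
# write goes through the backing store it was given.
class WebProcess
  def initialize(store)
    @store = store
  end

  def post_entry(user, text)
    @store.add(user, text)
  end

  def list_entries(user)
    @store.entries_for(user)
  end
end
```

Because neither process instance remembers anything between requests, an entry posted through one instance is immediately visible through another that shares the same store, which is what makes adding more processes safe.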

As you move forward with the diary application and add further dependencies (both on and off of Heroku), continue to strive to eliminate state because linear scalability is the goal. Speaking of which, there's an elephant in the room…

The Problem with Backing Services

Heroku's process model provides a great foundation for deploying and scaling the app itself. Unfortunately, it tends to be backing services (databases, queues, APIs) that cause downtime. For any external service that the application depends on, you should be asking several important questions:

  • What's the recovery strategy if the service becomes unavailable?
  • What's the upgrade path if you outgrow the service?
  • What alternatives are available?

For any external dependency that you add to the diary application, you should be able to field these questions, with the goal of eliminating single points of failure. Let's look at the two backing services that our diary app currently relies on: Heroku Postgres and Mailgun.

Heroku Postgres

Heroku Postgres is used to store all inbound diary entries. The disaster recovery and upgrade strategy is as follows:

  • Recovery: Heroku Postgres allows you to set up a follower database. This read-only database is kept in sync with the primary database. If the primary database dies, you can fail over to the follower.
  • Upgrade Path: A follower database can be used to upgrade an existing database. Simply create a follower database larger than the primary and swap.
  • Alternatives: You could opt to move to another hosted Postgres solution. Amazon's cloud platform, Amazon Web Services (AWS), offers a hosted Postgres solution. You could also choose to run your own server (perhaps on EC2). In both cases, set up replication to facilitate upgrading and recovery.
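
The recovery idea can also be sketched at the application level. The example below retries read-only work against a follower when the primary is unreachable. It is an illustration under stated assumptions, not how Heroku itself performs failover: ReadFailover and DatabaseUnavailable are hypothetical names, and the connections are whatever objects your database client provides.

```ruby
# Hypothetical error a database client might raise when it cannot connect.
class DatabaseUnavailable < StandardError; end

# Runs read-only work against the primary, retrying once on the follower.
class ReadFailover
  def initialize(primary:, follower:)
    @primary = primary
    @follower = follower
  end

  # Yields a connection to the block; the block should only perform reads.
  def read
    yield @primary
  rescue DatabaseUnavailable
    # The follower is read-only, so only reads are safe to retry here.
    yield @follower
  end
end
```

Writes are deliberately left out: sending them to a read-only follower would fail, so a full recovery still requires promoting the follower and updating the app's configuration.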

Mailgun

Mailgun is a hosted API that allows diary entries to be emailed to the app. As with Postgres, I had recovery strategies in place for this key piece of infrastructure:

  • Recovery: If the Mailgun API becomes unavailable for an extended period of time, you can look at using an alternative such as SendGrid.
  • Upgrade Path: Mailgun is metered and you can pay for more throughput as your system scales.
  • Alternatives: You can move to a similar API, such as SendGrid. You might also consider hosting your own comparable service on AWS.

It's important that you go through a vetting process like this for any dependency that you pull into the system. For the diary app, adding a follower database imposes restrictions on the tier of database that you can use. It's good to identify these sorts of restrictions up front.

Using Heroku with Amazon's Elastic Compute Cloud (EC2)

As an advanced feature, I wanted my diary app to support full-text search. Having had some experience with it, I opted to use ElasticSearch for this. Rather than using a hosted solution, I opted to host ElasticSearch myself on EC2, for two reasons:

  • The hosted solutions available were quite pricey, compared to other external dependencies, such as Heroku Postgres.
  • There was recently a major outage with the predominant hosted ElasticSearch provider, giving me concerns about its stability.

As discussed throughout this article, making good technology choices for your company is not about Heroku versus EC2. Instead, it's about being pragmatic, thoughtful, and asking the right questions. With this in mind, let's look at some of the questions that were on my mind while deploying ElasticSearch on EC2.

Dependencies Should Be Declarative

As with the Web app, I wanted any server infrastructure I deployed to be declarative. In the past, you might deploy a server; tweak 1,000 settings (to get a service, such as ElasticSearch, running nicely); and hope to never have to do it again. When a server inevitably dies at some point in the future, the manual approach outlined above results in a lot of pain.

For deploying ElasticSearch on EC2, I opted to use http://getchef.com, a hosted Chef solution. This allowed me to configure the ElasticSearch server in a declarative manner.

Configuration

As with other backing services, the URL of the ElasticSearch service on EC2 was added to the Heroku app's configuration: heroku config:set ELASTIC_SEARCH_URL=http://ec2-url.
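
On the application side, reading that setting might look like the sketch below. The elastic_search_uri helper is hypothetical; the localhost default mirrors the value in the .env file shown earlier, and the validation step is an addition for illustration.

```ruby
require 'uri'

# Resolve the ElasticSearch endpoint from the environment, falling back
# to a local instance for development (the default is an assumption).
def elastic_search_uri(env = ENV)
  uri = URI.parse(env.fetch('ELASTIC_SEARCH_URL', 'http://localhost:9200'))

  # Fail fast on values that are not http(s) URLs, rather than at query time.
  raise ArgumentError, "invalid ELASTIC_SEARCH_URL: #{uri}" unless uri.is_a?(URI::HTTP)

  uri
end
```

Because the endpoint is resolved from configuration, pointing the app at a replacement server only requires changing the config value, not the code.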

Recovery

All of the data indexed in ElasticSearch for the diary app is also stored in Postgres. If the ElasticSearch server dies, the recovery strategy is as follows:

  • Bootstrap a new server using Chef.
  • Re-index the diary entries into ElasticSearch from Postgres.
  • Update the Heroku configuration to point to the new ElasticSearch URL.
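
The re-indexing step above can be sketched as a batched copy from the source of truth into the search index. This is a simplified illustration: InMemoryIndex stands in for a real search client, bulk_index is a hypothetical method name, and entries can be any enumerable of records read from Postgres.

```ruby
# Hypothetical stand-in for a search client; stores documents in memory.
class InMemoryIndex
  attr_reader :documents

  def initialize
    @documents = {}
  end

  # Index a batch of entries, keyed by id so re-running is idempotent.
  def bulk_index(entries)
    entries.each { |entry| @documents[entry[:id]] = entry }
  end
end

# Copy every entry into the index in batches and return how many were sent.
def reindex(entries, index, batch_size: 500)
  indexed = 0
  entries.each_slice(batch_size) do |batch|
    index.bulk_index(batch)
    indexed += batch.size
  end
  indexed
end
```

Keying documents by id means an interrupted run can simply be restarted from the beginning without creating duplicates, which keeps the recovery procedure simple.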

Upgrade Path

Chef makes it easy to spin up a new EC2 instance. If you need to upgrade the ElasticSearch server, you can simply:

  • Use Chef to start a larger EC2 instance.
  • Re-index the diary entries from Postgres onto the new ElasticSearch instance.
  • Update the ElasticSearch URL in Heroku's config.
  • Shut down the old ElasticSearch Server.

Development Parity

For staging and development environments, I opted to use http://bonsai.io, a hosted ElasticSearch solution. When possible, I tend to advocate using hosted services, and I see no problem with using them in conjunction with running your own infrastructure.

Releasing Minimal-Reasonable-Products

The diary application has a varied set of dependencies: Heroku hosts the Web app, Mailgun handles accepting inbound diary entries, Heroku Postgres stores the user information and diary entries, and ElasticSearch is deployed on an EC2 instance.

By using as many hosted solutions as possible, you can get an initial product out the door quickly. However, releasing an MVP is not an excuse for being negligent. It's important that you're mindful about every technology choice that you make, and that you ask some important questions:

  • How do you recover from failure? When a key piece of infrastructure dies, you should have a clearly defined strategy for recovery. An effort should be made to periodically run fire drills to test the approach.
  • How do you upgrade? As you add users to the app, it's inevitable that a piece of infrastructure will need to be upgraded. How can you do this with minimal disruption?
  • What alternatives are available? As a company grows, pieces of infrastructure will need to be swapped out for alternatives, and cost can often be a big driver of this. Consider alternatives from the get go.
  • Are development and production in parity? It should be painless to get a development environment up and running. If something runs in a development environment, a developer should be confident that it will run in production.

When making decisions about your company's technology stack, it's not about Heroku versus EC2, because the technology is interchangeable. What is important is that you walk the line between pragmatism and perfectionism. You need to understand the technology that you've deployed well enough to scale it and recover from failure; at the same time, you need to make technology decisions that help get infrastructure up and running quickly, e.g., opting to use hosted solutions. I've coined a term for this: ship an MRP (a minimal-reasonable-product), rather than an MVP.