Dance Computer, Dance

by Ray Grasso

Posts

Pieces I've written.

Finding Open Web Pages with Alfred

There are a handful of web pages that I use regularly throughout the day. Some are web apps that I keep pinned in Chrome while others come and go as I work.

I tend to close tabs when I’m done with them but I still end up with many open tabs. I’ve created an Alfred Workflow that opens a page I’m looking for so I don’t have to pick through my Chrome tabs by hand to find it.

The Find Page workflow takes a URL from a predefined list and runs an AppleScript that finds and activates the associated tab if the page is already open in Chrome; otherwise, it opens the page in a new tab.

Find Page Workflow definition

Find Page Workflow example

You can download the workflow and try it yourself.

I’m giving Audible a go again. I like it for listening to non-fiction books. Fiction ones, not so much. 🤷‍♂️

Update: I’m reversing this. I actually prefer listening to fiction on Audible. I like highlighting things too much in non-fiction, so Audible doesn’t work as well for me there.

Designing Data-Intensive Applications 📚

Designing Data-Intensive Applications by Martin Kleppmann

This book surveys data storage and distributed systems and is a fantastic primer for all software developers.

It starts with naive approaches to storing data, quickly builds up to how transactions work, and then moves on to the complexities of building distributed systems.

I particularly enjoyed the chapter on stream processing and event sourcing. It contrasts stream processing with batch processing, highlights many of the challenges of these approaches, and explores options for addressing them.

Using Netlify for Hosting

I recently moved the hosting of my various blogs and websites off my own server to Netlify.

I was originally going to set up an S3 bucket and CloudFront distribution for each of my sites, but Netlify provides the CDN and hosting features I need already bundled up. You can upload files directly for serving or hook your site up to run a static site generator when you push to a branch of a GitHub repository.

In short, I’m no longer paying hosting costs and they handle all of the SSL certificate renewal from Let’s Encrypt for me.

Next up I plan to clean up the tooling I use for some of my sites and tweak things on here so I have more variety in my posts.

2019 is the year of the blog, baby.

Forgetting Data in Event Sourced Systems

GDPR’s right to be forgotten means we have to be able to erase a person’s data from our systems. Event sourced systems work from an immutable log of events which makes erasure difficult. You probably want to think hard about storing data you need to delete in an immutable event log but sometimes that choice is already made and you need to make it work, so let’s dig in.

Erasing user data from current state projections

This is relatively straightforward. A RightToBeForgottenInvoked event is added to the event store for the person. All projectors that depend on personal data listen for this event and prune or scrub the appropriate data for the person from their projections.
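
As a rough sketch in Ruby of what that might look like (the projector, event, and model names here are hypothetical, and the ActiveRecord-style calls are an assumption for illustration, not any particular library’s API):

# Hypothetical projector that maintains a user_profiles read model.
# When it sees RightToBeForgottenInvoked it scrubs the person's data
# from the projection it owns.
class UserProfileProjector
  def process(event)
    case event.type
    when "UserRegistered"
      UserProfile.create(user_id: event.aggregate_id, email: event.body["email"])
    when "RightToBeForgottenInvoked"
      # Prune or scrub everything personal this projection holds for the person.
      UserProfile.where(user_id: event.aggregate_id).delete_all
    end
  end
end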

Erasing data from the event stream itself

This case is trickier. We need to rewrite history in a way that doesn’t break things. Let’s look at an option for erasing data without rebuilding the event stream. This approach also applies to projections that are themselves immutable change logs.

We can store personal data outside of the events themselves in a separate storage layer. Each event instead stores a key for retrieving the data from this layer, and any event consumers request the data when they need it. Given this data is personal, the storage layer should probably encrypt it at rest.
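
For example, a sketch of what an event body might carry under this scheme (the field names and values are made up for illustration):

# Instead of embedding the email address, the event body carries a
# reference (data_id) into the secure storage layer.
event_body = {
  email_data_id: "a-key-into-secure-storage",  # resolved via the storage layer
  marketing_opt_in: true                       # non-personal data can stay inline
}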

Once a RightToBeForgottenInvoked event is added to the event store, all data for that person can be erased from the storage layer. All subsequent requests to the secure storage layer for that person’s data will return null objects rather than the actual data. This should make life easier for consumers and save you from null checking yourself to death all over the place.

Let’s see what this secure storage layer might look like.

Sketch of a secure storage layer

Our secure storage layer stores data that is scoped to a person and has a type (so we can return null objects). The store allows all data for a specific person to be erased.

Let’s start with two main models: a Person1 and a Data model.

      Data                 Person
  ┌──────────┐        ┌───────────────┐
  │    id    │   ┌───>│      id       │
  ├──────────┤   │    ├───────────────┤
  │person_id │───┘    │encryption_key │
  ├──────────┤        ├───────────────┤
  │   type   │        │   is_erased   │
  ├──────────┤        └───────────────┘
  │ciphertext│
  └──────────┘

The interface to the secure storage layer is outlined below.

class SecureStorage
  def add(person_id, data_id, type, data)
    # Find the Person model for person_id (lazily create one if needed).
    #
    # Encrypt the data using the person's encryption_key and store the
    # ciphertext in the data table using the client supplied data_id and type.
    #
    # Clients will store this data_id in an event body and use it to retrieve
    # the data later.
  end

  def erase_data_for_person(person_id)
    # Mark the corresponding record in the person table as erased
    # and delete the encryption key.
  end

  def get(data_id)
    # Look up the row in the data table first so we know which person the
    # data belongs to and what type it is.
    data = Data.find(data_id)
    person = Person.find_non_erased(data.person_id)
    if person
      # Decrypt the ciphertext using the person's encryption key and
      # return the data.
    else
      # The person has been erased and their key deleted, so return a null
      # object for the data's type.
    end
  end
end
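
To make the flow concrete, here’s a minimal usage sketch assuming the interface above; the identifiers and the SecureRandom-generated data_id are illustrative only:

require "securerandom"

storage = SecureStorage.new
person_id = "person-123"

# Writing: store the personal data out of band and keep only the key in
# the event body.
data_id = SecureRandom.uuid
storage.add(person_id, data_id, :email, "jane@example.com")
event_body = { email_data_id: data_id }

# Reading: consumers resolve the key when they need the data.
storage.get(event_body[:email_data_id])  # => "jane@example.com"

# Once the person invokes their right to be forgotten their key is deleted,
# so the same call returns a null object for the :email type instead.
storage.erase_data_for_person(person_id)
storage.get(event_body[:email_data_id])  # => null object for :email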

Where does that leave us?

After a person has invoked their right to be forgotten, all current state projections will be updated to erase that person’s data. The secure storage layer will return null objects for any data belonging to the person, which means event processors won’t see that data as they build their projections. The event store will also contain the RightToBeForgottenInvoked event for the person, so consumers can handle it explicitly if required.

  1. This could be expanded to be more general but we’ll stick with person for the purpose of this post. 

Replacing Google Analytics with GoAccess

I removed Google Analytics from my sites1 but still wanted access to some simple request statistics on them. Turns out GoAccess gives me most of what I need by analysing my Apache access logs.

The main challenges I ran into were working out the correct flags for GoAccess, feeding in compressed and uncompressed access logs at the same time, and ignoring junk requests from internet pests.

I pulled together a bash script that handles this, the essence of which is below.

# Analyse all log files
{ cat /var/www/mysite/logs/access.log; zcat /var/www/mysite/logs/access.log.*.gz; } | \
  # Strip out junk requests
  grep -v -E '\.php|jmx-console|\.cgi|phpmyadmin|dbadmin' | \
  # Fire up goaccess using the correct log file format for consolidated apache logs
  goaccess --log-format='%h %^[%d:%t %^] \"%r\" %s %b \"%R\" \"%u\"' --date-format='%d/%b/%Y' --time-format='%H:%M:%S' --ignore-crawlers
  1. A) It does way more than I need. B) I still don’t understand how to use it properly. C) It needlessly collects and sends your information off to the Big G. 

Remote Working Strategies

I’m almost two years into working remotely full time. It affords me flexibility and focus but it also comes with its challenges. I have a few strategies that help make it work for me and maybe they’ll help you too.

  • I mostly work from a room with a closable door. At the end of my work day I walk away and close said door. I find this helps me disconnect and keep my home and work contexts separate.
  • Change up where you work. It’s good to work in different rooms and different locations. I like to go somewhere where there are people around, even if I’m not speaking to them, e.g. I’m often that rando working on his laptop in the food court.
  • I spend a lot of time on video calls. I have this headset by Jabra which has a decent microphone that doesn’t pick up much background noise. It also has a hardware mute button on the cord always within reach. As a bonus, people throw lots of “you look like you work in a call centre” gags at me.
  • Regular lunches in the city with friends help top up my face-to-face human interaction stores.
  • Your energy levels will vary; do your best to ride them out. Sometimes I am a storm of energy and rip through my work. Other times I struggle to lock in and focus. Stick with it. Hold strong.
  • Get outside regularly. The dark side of not having a commute is that you can end up barely moving all day. I regularly walk around my neighbourhood to get some steps under my belt and sunshine on my face.
  • Enjoy the flexibility.

Event Sourcing Libraries

Creating an event sourced, CQRS application is simple enough conceptually, but there is a lot of hidden detail when it comes to building one. There are a couple of event sourcing libraries I’ve used that can help.

The first, Event Sourcery, is written in Ruby and was created by my colleagues at Envato. You can use Postgres as your data store, and it gives you what you need to build aggregates, events, projectors, and process managers.

The immutability and process supervision baked into Elixir make it a compelling option for implementing these kinds of applications as well. Commanded is written in Elixir, follows a very similar approach to Event Sourcery, and works a treat.

The Convenience of _.chain Without Importing the World

I’ve been meaning to work out how to maintain the convenience of Lodash’s _.chain function whilst only including the parts of Lodash that I actually need.

Turns out you can cherry pick the fp versions of the functions you need and compose them together with _.flow.

import sortBy from 'lodash/fp/sortBy';
import flatMap from 'lodash/fp/flatMap';
import uniq from 'lodash/fp/uniq';
import reverse from 'lodash/fp/reverse';
import flow from 'lodash/fp/flow';

const exampleData = [
  {
    "happenedAt": "2017-06-15T19:00:00+08:00",
    "projects": [
      "Project One"
    ],
  },
  {
    "happenedAt": "2017-06-16T19:00:00+08:00",
    "projects": [
      "Project One",
      "Project Two"
    ],
  },
];

const listOfProjectsByTime = (entries) => {
  return flow(
    sortBy('happenedAt'),
    reverse,
    flatMap('projects'),
    uniq,
  )(entries);
}

You can read more in Lodash’s FP Guide.

Consistent Update Times for Middleman Blog Articles with Git

The default template for an Atom feed in Middleman Blog uses the last modified time of an article’s source file as the article’s last update time. This means that if I build the site on two different machines I will get different last updated times on articles in the two atom feeds. I’d rather the built site look the same regardless of where I build it.

The source code for the site lives in a Git repository which means I have a consistent source for update times that I can rely on. So, I’ve added a helper that asks Git for the last commit time of a file and falls back to its last modified time if the file isn’t currently tracked in Git.

helpers do
  def last_update_time(file)
    Time.parse `git log -1 --format=%cd #{file} 2>/dev/null`
  rescue
    File.mtime(file)
  end
end

I now use this helper in my Atom template for each article.

xml.entry do
  xml.published article.date.to_time.iso8601
  xml.updated last_update_time(article.source_file).iso8601
  xml.content article.body, "type" => "html"
end

Adding Webpack to Middleman's External Pipeline

I use Middleman to build most of my content-focused websites. With the upgrade to version 4 comes the opportunity to move the asset pipeline out to an external provider such as Webpack.

I struggled to find good examples of how to integrate Webpack 2 with Middleman 4, so I’m documenting the approach I used here. For example code, refer to middleman-webpack on GitHub.

Points of Interest

Build and development commands for webpack are in package.json.

"scripts": {
  "start": "NODE_ENV=development ./node_modules/webpack/bin/webpack.js --watch -d --color",
  "build": "NODE_ENV=production ./node_modules/webpack/bin/webpack.js --bail -p"
},

The external pipeline configuration in Middleman just calls those tasks.

activate :external_pipeline,
           name: :webpack,
           command: build? ? "yarn run build" : "yarn run start",
           source: ".tmp/dist",
           latency: 1

set :css_dir, 'assets/stylesheets'
set :js_dir, 'assets/javascript'
set :images_dir, 'images'

Assets are loaded by Webpack from the assets folder outside of the Middleman source directory1. Webpack includes any JS and CSS imported by the entry point files in webpack.config.js and generates bundle files into the asset paths Middleman uses.

module.exports = {
  entry: {
    main: './assets/javascript/main.js',
  },

  output: {
    path: __dirname + '/.tmp/dist',
    filename: 'assets/javascript/[name].bundle.js',
  },

  // ...

}

The config for Webpack itself is fairly straightforward. The ExtractText plugin extracts any included CSS into a file named after the entry point it was extracted from.

module.exports = {
  // ...

  plugins: [
    new ExtractTextPlugin("assets/stylesheets/[name].bundle.css"),
  ],

  // ...
}

This means you can include your styles from your JS entry file like normal and Webpack will extract the styles properly2.

Using the standard Middleman helpers to include the generated JS and CSS bundles allows Middleman to handle asset hashing at build time.

<head>
  <%= stylesheet_link_tag "main.bundle" %>
</head>

<body>
  <%= javascript_include_tag "main.bundle" %>
</body>

Finally

If you want to add modern JS and CSS to a bunch of statically generated pages then Middleman and Webpack work fine.

If, however, you are looking for a boilerplate for building a React SPA then something like react-boilerplate or create-react-app is likely a better fit.

  1. To avoid asset files being processed by both Webpack and Middleman. 

  2. Images are currently managed via Middleman and not Webpack. 

Structuring a Large Elm Application

I’m building an application in Elm and have been working on a strategy for breaking it down into smaller pieces.

My preferred approach is a few minor tweaks to the pattern used in this modular version of the Elm TodoMVC application1.

The Structure

The file structure is as follows.

$ tree src
src
├── Global
│   ├── Model.elm
│   ├── Msg.elm
│   └── Update.elm
├── Main.elm
├── Model.elm
├── Msg.elm
├── TransactionList
│   ├── Components
│   │   ├── FilterForm.elm
│   │   └── TransactionTable.elm
│   ├── Model.elm
│   ├── Msg.elm
│   ├── Update.elm
│   └── View.elm
├── Update.elm
└── View.elm

Global contains global state and messages, and TransactionList is a page in the application.

The top level Model, Msg, Update, and View modules stitch together the lower level components into functions that are passed into the top level Elm application (as shown below).

--
-- Main.elm
--
import Html.App as Html
import Model
import Update
import View

main : Program Never
main =
    Html.program
        { init = Model.init
        , update = Update.updateWithCmd
        , subscriptions = Update.subscriptions
        , view = View.rootView }

--
-- Model.elm
--
module Model exposing (..)

import Global.Model as Global
import TransactionList.Model as TransactionList

type alias Model =
    { global : Global.Model
    , transactionList : TransactionList.Model
    }

init : ( Model, Cmd msg )
init =
    ( initialModel, Cmd.none )

initialModel : Model
initialModel =
    { global = Global.initialModel
    , transactionList = TransactionList.initialModel
    }

--
-- Msg.elm
--
module Msg exposing (..)
import Global.Msg as Global
import TransactionList.Msg as TransactionList

type Msg
    = MsgForGlobal Global.Msg
    | MsgForTransactionList TransactionList.Msg

One of the things I like about this pattern is how readable each top level module is with import aliases.

View, Update, and Global State

The view and update functions compose similarly but I pass the top level model down to both so that they can cherry pick whatever state they need.

The lower level update functions can look at all of the state and just return the piece of the model they are responsible for. For example, the Global model can hold common entities, while state specific to the transaction list lives in the TransactionList model.

Views are similar in that they can take state from the global model as well as their own model and render as necessary.

--
-- Update.elm
--
module Update exposing (..)

import Msg exposing (Msg)
import Model exposing (Model)
import Global.Update as Global
import TransactionList.Update as TransactionList

updateWithCmd : Msg -> Model -> ( Model, Cmd Msg )
updateWithCmd msg model =
    ( update msg model, updateCmd msg )

update : Msg -> Model -> Model
update msg model =
    { model
        | global = Global.update msg model
        , transactionList = TransactionList.update msg model
    }

updateCmd : Msg -> Cmd Msg
updateCmd msg =
    Cmd.batch
       [ TransactionList.updateCmd msg
       ]

--
-- View.elm
--
module View exposing (..)

import Model exposing (Model)
import Msg exposing (Msg)
import TransactionList.View as TransactionListView
import Html exposing (..)
import Html.Attributes exposing (..)

rootView : Model -> Html Msg
rootView model =
    div [ class "container" ]
        [ TransactionListView.view model ]

This approach seems to be working pretty well so far, and adding routing shouldn’t be too difficult.

A Step in the Right Direction

I pulled the pin on working in the React/Redux space a few months ago after I became tired of the churn. Things were moving quickly and I found myself spending more time wiring together framework code than writing application code. This kind of thing sneaks up on you.

One glaring omission was a preconfigured and opinionated build chain. I moved from starter kit to starter kit chasing the latest webpack-livereload-hot-swap-reload shine. Each kit was subtly different to the one before it, and not just in its build components either. I missed having agreed-upon conventions on where to store actions, reducers, stores, and friends. It made me appreciate the curation provided by the Ember team in their toolchain.

The creation of Create React App (triggered by Emberconf no less) is a step in the right direction. Bravo.

Infrastructure Koans

Envato teams are responsible for the operation of the systems they build.

My team is trying something different to help onboard new people. We’re creating a set of infrastructure koans for them to complete. The koans are tasks that—once completed—will help folks navigate our infrastructure and systems, and thereby acquire skills that are essential for supporting our services.

When someone joins the team, a new issue is created in one of our team’s GitHub repos using the koans document as a template. Once the new team member has completed all of the koans, they are added to the on-call rota and assigned a buddy who can help if things get tricky whilst on call.

The koans are not meant to be laid out step by step unless the task is complex or requires unusual knowledge. We hope this encourages folks to explore and internalise more than they would if following a todo list.

Some Example Koans

Set yourself up on PagerDuty and read through past incidents.

View metrics for each of our systems in New Relic.

  • What is the average response time?
  • What does CPU, Memory, and I/O utilisation look like on each server?
  • What are the slowest transactions for the service? Dig into each transaction and see where the time is spent.
  • Check the error analytics tab and look for any relationships between errors and past deployments.
  • Check the availability and capacity reports.
  • Look for trends in the web transactions and database reports.

Look up each of our services in Rollbar.

  • What are the two most common errors being reported?
  • Drill into the details of a recorded error.
  • Are these errors we can live with? Should we create a task to fix them?

Open the AWS CloudWatch console.

  • Look through the available dashboards and metrics.
  • What CloudWatch alerts do we have configured for our production systems?

Open the AWS ECS console.

  • How many task definitions do we have? How many available versions exist for each of them?
  • Which systems make up each of our service clusters?
  • How many repositories do we have in ECR?

Look through our Stackmaster templates and find the results of building stacks from them in CloudFormation.

Access our ELK cluster and run some queries.

Run queries against our production database replicas.

Decrypt some database backups.

SSH into various servers in our infrastructure.

A Docker Container for River5

I’m rebuilding a VPS that I use for a bunch of my websites, side projects, and experiments. It hosts some static sites via Apache, a few Rails apps, a Node app, Postgres, and other bits and bobs. I want maintenance of configuration to be simpler in the future, so I’m giving Docker a crack.

One of the side effects of using Docker should be that I can mess about with different tools and roll back to a clean state easily.

My first effort in learning Docker has been to create a container for hosting the River5 river-of-news aggregator by Dave Winer.

It’s up on GitHub and ripe for pull requests.