Visualizing Every Elm Package

Recently, I came across Code Galaxies, a website that visualizes several popular package managers. Every published package is displayed as an orb in 3D space with lines connecting the packages to their dependencies. Arrow keys are used to fly around the packages, similar to flying a spaceship in a video game. Great idea!

Each of these orbs is a package that someone took the time write and publish. Zooming out such that all packages are within view reveals a unique shape for each package manager, reflecting the shape of the community.

Take a look at Bower, which aims to be "a package manager for the web." Having such a broad, inclusive goal, you'll see all kinds of variety in the packages, and you'll quickly find distinct communities around jQuery and AngularJS. You might also find smaller communities like Polymer and PureScript, disconnected from the broader community and hurtling off into space.

Elm is a small community with its own package manager. It aims to be "a delightful language for reliable webapps." That reliability comes from static types, which are also used by the language's tooling to enforce a great deal of homogeneity, in addition to their Design Guidelines which are "meant to promote consistency and quality across all Elm packages." We should expect something far more uniform than Bower.

In this post, I'll walk through the key steps in adding Elm's packages to Code Galaxies which include scraping the necessary package data, building the dependency graph, and positioning these packages in 3D space.

tl;dr: Explore all of Elm's packages.

Crawling Packages

Elm has an official package manager that is primarily used with commands like elm install and elm publish. Elm Packages also provides access to these packages and their documentation.

All packages are published with a JSON file describing the package and listing its dependencies. Prior to v0.19, this file was named elm-package.json, but now, every package is published with an elm.json. Here's an example of an elm.json file.

{
    "type": "package",
    "name": "brandly/elm-dot-lang",
    "summary": "Parse DOT Language files",
    "license": "BSD-3-Clause",
    "version": "1.1.3",
    "exposed-modules": [
        "DotLang"
    ],
    "elm-version": "0.19.0 <= v < 0.20.0",
    "dependencies": {
        "elm/core": "1.0.0 <= v < 2.0.0",
        "elm/parser": "1.0.0 <= v < 2.0.0",
        "hecrj/html-parser": "2.3.0 <= v < 3.0.0"
    },
    "test-dependencies": {
        "elm-explorations/test": "1.2.1 <= v < 2.0.0"
    }
}

Each dependency is listed with a range of supported versions [1]. Each package range can be resolved to a specific package version, and those packages have their own dependencies, forming a sprawling graph. Acquiring the elm.json or elm-package.json file from each and every package would provide full knowledge of this dependency graph.

The Elm Package website is open-source, and its server provides an API that responds with JSON. The /all-packages endpoint delivers details about the package registry including package names and published versions. In addition, adding ?elm-package-version=0.18 to the request retrieves packages for older versions of Elm. Finding unique package names in both lists results in a combined total of 2,123 packages.

The API responses are structured differently, but they both provide package names and a list of available version numbers for each package. To boost the size of the graph, every major version release of each package will be included in the final graph, and every node will be connected to a compatible version of each of its dependencies.

GitHub's API is used to download an elm.json file for a given package name and version. This works because every Elm package is first tagged and pushed to GitHub before getting added to the package manager.

const getFileAtVersion = async (repo, filename, version) => {
  const url = `https://api.github.com/repos/${repo}/contents/${filename}?ref=${version}&access_token=${token}`
  const res = await axios.get(url)
  return Buffer.from(res.data.content, 'base64').toString('utf-8')
}

const repo = 'brandly/elm-dot-lang'
const version = '1.1.3'
const elmJson = await getFileAtVersion(repo, 'elm.json', version)

// If it were an older package...
const elmPackageJson = await getFileAtVersion(repo, 'elm-package.json', version)

Every package, its major versions, and each of their package definitions are now within reach. The rest of the logic is the orchestration of these requests to download the files into ./packages, resulting in an organized tree.

$ tree packages 
packages
├── 0ui
│   └── elm-task-parallel@1.0.1.json
├── 1602
│   ├── elm-feather@1.0.2.json
│   ├── elm-feather@2.3.5.json
...
└── zwilias
    ├── elm-avl-dict-exploration@1.2.2.json
    ├── elm-bytes-parser@1.0.0.json
    ...
    ├── json-decode-exploration@6.0.0.json
    └── json-encode-exploration@1.0.0.json

814 directories, 3875 files

Due to how these files are laid out, those numbers at the bottom indicate that 814 unique accounts have published 3,875 major package releases.

Connecting the Dots

Code Galaxies is built on top of ngraph, a suite of graph-related tools. These tools help build a pipeline of sorts, taking the downloaded packages and transforming them into the format the app expects.

With the downloaded JSON files, each file represents a node in the graph and contains a list of dependencies, linking the node to some number of other nodes. This information needs to be plugged into ngraph. I'll highlight the interesting bits.

First, the list of files is piped into a script.

$ find ./packages -name '*.json' | node build-graph.js

find spits out a list of filenames, each on their own line. This list is consumed by the script, and the lines are chopped up into an array of file paths.

const fs = require('fs')

const filePaths = fs
  .readFileSync(0)
  .toString()
  .trim()
  .split('\n')

These files are read into memory and looped over, building up the graph data structure.

const createGraph = require('ngraph.graph')
const graph = createGraph()

// For each package
const node = `${package.name}@${package.version}`
graph.addNode(node)

// For each dependency in a package
graph.addLink(node, dependency)

At this point, graph holds the connections between all available Elm packages!

Moving In

In the app, each node sits somewhere in space. These locations are calculated by simulating a physical system, with a force pushing nodes away from one another, like reverse gravity, while the links in the graph tie dependent nodes together. The outcome is that packages with common dependencies end up near one another.

The app isn't a live simulation however. The positions are calculated upfront, and these fixed values are used in the app. Calculating these positions is once again made simple with ngraph.

const layout = require('ngraph.offline.layout')(graph)
layout.run()

const save = require('ngraph.tobinary')
save(graph, {
  outDir: './data'
})

This writes a variety of files to ./data describing every detail about the graph, but the app only cares about a few of them: labels.json, links.json, and positions.json. Feeding these files into Code Galaxies reveals the relationship between every published Elm package. (On mobile, hold two fingers to zoom out.)

What a view! Every Elm package depends on elm/core (formerly elm-lang/core) which forms the backbone of this structure. The core has been updated as the language has evolved creating a series of clustered packages like moments on a timeline. Some rare packages are tied to multiple versions of this core chaining the clusters together. Each cluster is bigger than the last as the Elm community continues to grow!

Within a cluster, the core is the dominant node alongside other sizable nodes like elm/html, elm/json, and elm/browser. These general packages dwarf the size of most other packages. Due to Elm's focus on building web apps, this is no surprise.

Elm strives for consistency in everything from API design to documentation formatting, encouraging package authors to publish general, stable solutions for well-defined problems. As a result, packages don't often "compete" with one another, and the community doesn't fracture easily. This, combined with the common core package, results in the latest cluster forming a fairly uniform ball.

Conclusion

The Elm Package API provides a complete list of packages, and the mapping between an Elm package and a GitHub repo supplies the necessary JSON package definitions. The data is large enough to show something interesting, but still small enough that it can be explored with some for loops. ngraph handles all the graph-related details.

The resulting visualization looks great and is fun to explore! The structure reflects the history of major Elm updates and the growing community around it. The other package managers likely tell their own stories as this is a great way to explore graphs that should be applied to other datasets.

Take a look for yourself, and let me know if you find anything interesting!

Special thanks to Andrei Kashcha for creating ngraph and Code Galaxies.

View the full source code and the resulting visualization featured in this post.


Notes

1. Elm follows SemVer where new functionality can be added in a minor version but any breaking change requires a new major version. As a result, these ranges tend begin at a minor version and end before the next major version. Elm's tooling can guarantee that packages follow SemVer (to some degree) since the exposed interface is statically typed.