Introduction

Usually, when we talk about refactoring, we think about small changes that we make to the codebase to improve its design. But what happens when we need to refactor a large codebase? How can we make sure that the changes we make don't break the application? And most importantly, how can we ensure that the refactoring is not blocking the development of new features? In this article, we will explore some strategies for refactoring at scale using the TypeScript Compiler API.

But before we dive into the details, let's take a step back and define what we mean by "at scale".

The problem with large codebases

By a large codebase, I mean one with hundreds of thousands to millions of lines of code, hundreds of modules, and dozens of developers working on it. Refactoring such a codebase can be daunting, especially when pull requests are merged at a high rate and the code is constantly changing. Manually refactoring cross-cutting concerns is particularly painful, because there are too many moving parts. Imagine changing a function signature in a core module and then updating every call site across the codebase. The result would be hundreds of merge conflicts, which would be a nightmare to resolve, and perhaps impossible to resolve before the next wave of pull requests is merged.

So, how can we refactor such a codebase without blocking the development of new features and without breaking the application?

Automating refactoring with Code Mods

One way to tackle this problem is by automating the refactoring process using Code Mods. Code Mods are scripts that automatically refactor your codebase based on a set of rules. They can be used to make large-scale changes to your codebase without having to manually update every single file.

Instead of refactoring some code directly, you can write a script that will refactor the code for you. The major advantage is that you can check the script into your version control system and run it anytime you want - preferably at times when the codebase is not changing much. This way, you can ensure that the refactoring process is not blocking the development of new features.

To be fair, some codebases are so large, with developers working in every time zone, that there is never a time when the codebase is not changing. Code Mods are still really useful in that case, because they can be applied in batches, one vertical slice at a time, so that the refactoring does not block the development of new features.
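To make the batching idea concrete, here is a minimal sketch. It models a codemod as a function from source text to source text and applies it to one slice at a time. The names `Codemod`, `applyToSlice`, and `renameCall`, as well as the file paths, are made up for illustration, and the string-based rename is only a stand-in for a real AST-based codemod.

```typescript
// A codemod is modeled here as a pure function from source text to source text.
type Codemod = (fileName: string, source: string) => string;

// Hypothetical helper: apply a codemod only to files under one slice (path prefix).
function applyToSlice(
  files: Record<string, string>,
  slicePrefix: string,
  codemod: Codemod
): Record<string, string> {
  const result: Record<string, string> = { ...files };
  for (const fileName of Object.keys(files)) {
    if (fileName.startsWith(slicePrefix)) {
      result[fileName] = codemod(fileName, files[fileName]);
    }
  }
  return result;
}

// Stand-in codemod for illustration only; a real one would work on the AST.
const renameCall: Codemod = (_fileName, source) =>
  source.split('oldName(').join('newName(');

// Hypothetical in-memory "repository" with two vertical slices.
const repo: Record<string, string> = {
  'billing/invoice.ts': 'oldName(1);',
  'search/index.ts': 'oldName(2);',
};

// First batch: only the billing slice is migrated; search stays untouched.
const afterFirstBatch = applyToSlice(repo, 'billing/', renameCall);
console.log(afterFirstBatch);
```

Because each batch touches only one slice, the resulting pull request stays small and conflicts are limited to the team that owns that slice.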

Code Mods can be written in any language, but in this article, we will focus on using the TypeScript Compiler API to write Code Mods for refactoring TypeScript at scale. There are other tools available for writing Code Mods, such as jscodeshift for JavaScript, but the TypeScript Compiler API is particularly well-suited for refactoring TypeScript codebases. It is important to have a tool that understands the TypeScript syntax and can help you navigate the codebase.

The TypeScript Compiler

TypeScript is a superset of JavaScript that adds static typing to the language. It is widely used in large codebases, yet it is not executed directly by the browser or the Node.js runtime. Instead, TypeScript code is transpiled to JavaScript by the TypeScript Compiler. In a tsconfig file, you configure how the compiler checks your code for errors and which kind of JavaScript it emits.
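As a small illustration of transpiling, the sketch below passes a one-line snippet and a couple of compiler options to `ts.transpileModule`. The snippet and the chosen options are my own example; in a real project the options would normally come from your tsconfig.

```typescript
import * as ts from 'typescript';

// Example input: one statement with a type annotation.
const source = 'const greeting: string = "hello";';

// transpileModule converts a single file to JavaScript without type checking it.
const { outputText } = ts.transpileModule(source, {
  compilerOptions: {
    // These are the same options you would put under "compilerOptions" in a tsconfig.
    target: ts.ScriptTarget.ES2015,
  },
});

// The type annotation is gone; what remains is plain JavaScript.
console.log(outputText);
```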

TypeScript Compiler Process

The TypeScript Compiler goes through several phases when processing your code. Here is a simplified view of the process:


  1. Lexical Analysis: The compiler reads the source code and converts it into a stream of tokens.
  2. Syntactic Analysis: The compiler parses the tokens into an Abstract Syntax Tree (AST).
  3. Semantic Analysis: The compiler checks the AST for type errors and other semantic issues.
  4. Transformation: The compiler transforms the AST into a new AST that represents the transpiled JavaScript code.
  5. Emit: The compiler generates the JavaScript code from the transformed AST.
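The phases above can be poked at individually through the public API. The following is a rough sketch, not a description of how the compiler wires things together internally: it scans a snippet into tokens, parses it, type checks it through an in-memory program, and finally transpiles it. The file name `example.ts` and the deliberately wrong snippet are assumptions for the demo.

```typescript
import * as ts from 'typescript';

const fileName = 'example.ts'; // hypothetical in-memory file
const code = 'const x: number = "one";'; // contains a type error on purpose

// 1. Lexical analysis: scan the text into tokens.
const scanner = ts.createScanner(ts.ScriptTarget.Latest, /* skipTrivia */ true);
scanner.setText(code);
const tokens: string[] = [];
while (scanner.scan() !== ts.SyntaxKind.EndOfFileToken) {
  tokens.push(scanner.getTokenText());
}

// 2. Syntactic analysis: parse the text into an AST.
const sourceFile = ts.createSourceFile(fileName, code, ts.ScriptTarget.Latest, true);

// 3. Semantic analysis: type check through a Program whose host serves
//    our snippet from memory and everything else (lib files) from disk.
const options: ts.CompilerOptions = { strict: true, noEmit: true, types: [] };
const host = ts.createCompilerHost(options);
const getSourceFileFromDisk = host.getSourceFile.bind(host);
const fileExistsOnDisk = host.fileExists.bind(host);
host.fileExists = (name) => name === fileName || fileExistsOnDisk(name);
host.getSourceFile = (name, languageVersionOrOptions, onError, shouldCreate) =>
  name === fileName
    ? sourceFile
    : getSourceFileFromDisk(name, languageVersionOrOptions, onError, shouldCreate);
const program = ts.createProgram([fileName], options, host);
const typeErrors = program
  .getSemanticDiagnostics(program.getSourceFile(fileName))
  .map((d) => ts.flattenDiagnosticMessageText(d.messageText, '\n'));

// 4 + 5. Transformation and emit: transpileModule runs both for a single file.
//        It does not type check, so it emits JavaScript despite the error above.
const { outputText } = ts.transpileModule(code, {
  compilerOptions: { target: ts.ScriptTarget.ES2015 },
});

console.log(tokens);
console.log(typeErrors);
console.log(outputText);
```

Note that the type error only shows up in step 3. Parsing and emitting work fine without it, which is why purely syntactic Code Mods can often skip the expensive type check.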

The most interesting phase for us is the Transformation phase, where the TypeScript Compiler transforms the AST into a new AST that represents the transpiled JavaScript code. This is where we can write programs that can understand and transform other programs. The TypeScript Compiler is written in TypeScript itself, and it exposes an API that allows you to interact with the compiler programmatically.

Therefore, you can write scripts that use the TypeScript Compiler API to analyze and transform TypeScript code. This might sound a bit strange at first, but it is actually quite common and many tools are built on top of the TypeScript Compiler API, such as typescript-eslint.

So, let's see how we can use the TypeScript Compiler API to write code that understands code.

The TypeScript Compiler API

The TypeScript Compiler API is a set of functions and interfaces that allow you to programmatically interact with the TypeScript Compiler.

In order to use the TypeScript Compiler API, you need to install the typescript package and import from it.

import * as ts from 'typescript';

Now, you can use the ts namespace to access the TypeScript Compiler API. Let's take a look at some of the key concepts of the TypeScript Compiler API:

1. Source Files

A ts.SourceFile represents a single TypeScript or JavaScript file. You can use the API to parse a file into a SourceFile object, which includes information about its syntax and structure.

const sourceFile = ts.createSourceFile(
  'example.ts',
  'const x: number = 1;',
  ts.ScriptTarget.Latest
);
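To get a feeling for what a SourceFile contains, here is a small sketch that pulls the first statement apart. It parses the same snippet again, this time with the optional fourth argument `setParentNodes` set to `true`, so that `getText()` can be called on nodes without passing the source file. The `summary` object is just a convenience for this example.

```typescript
import * as ts from 'typescript';

const sourceFile = ts.createSourceFile(
  'example.ts',
  'const x: number = 1;',
  ts.ScriptTarget.Latest,
  /* setParentNodes */ true
);

// A SourceFile holds its top-level statements in `statements`.
const statement = sourceFile.statements[0];

// Type guards such as ts.isVariableStatement narrow a generic node to a concrete one.
const declaration = ts.isVariableStatement(statement)
  ? statement.declarationList.declarations[0]
  : undefined;

// Each part of the declaration is itself a node with its own source text.
const summary = {
  name: declaration?.name.getText(),
  type: declaration?.type?.getText(),
  initializer: declaration?.initializer?.getText(),
};

console.log(summary); // { name: 'x', type: 'number', initializer: '1' }
```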

2. Abstract Syntax Tree (AST)

The TypeScript Compiler API represents the code as an Abstract Syntax Tree (AST), which is a tree-like data structure that represents the structure of the code. Programming languages are usually parsed into an AST before being compiled or interpreted. Each node in the AST corresponds to a construct in your code, such as a variable declaration, function, or class. You can use the TypeScript Compiler API to traverse the AST and analyze or transform the code using the visitor pattern.

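// Recursively visit every node and log each variable declaration.
function visitNode(node: ts.Node) {
  if (ts.isVariableDeclaration(node)) {
    // Pass the source file explicitly: nodes can only find it on their own
    // when the file was parsed with setParentNodes = true.
    console.log('Found variable declaration:', node.getText(sourceFile));
  }
  ts.forEachChild(node, visitNode);
}

visitNode(sourceFile);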

Pro Tip: ASTs are actually quite simple to understand once you get the hang of them. It is a good idea to print out the AST of some code snippets to get a feeling for it. For this, I recommend using the AST Explorer.

AST Explorer
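If you prefer to stay in your terminal, a few lines are enough to print an AST yourself. This is a minimal sketch; `printTree` is a helper name I made up, and it simply indents the kind of each node by its depth in the tree.

```typescript
import * as ts from 'typescript';

// Hypothetical helper: collect one indented line per AST node.
function printTree(node: ts.Node, depth = 0, lines: string[] = []): string[] {
  // ts.SyntaxKind is a numeric enum, so indexing it yields a readable name.
  // A few kinds show up under an alias (for example FirstStatement).
  lines.push('  '.repeat(depth) + ts.SyntaxKind[node.kind]);
  ts.forEachChild(node, (child) => {
    // Block body on purpose: returning a truthy value would stop the traversal.
    printTree(child, depth + 1, lines);
  });
  return lines;
}

const sourceFile = ts.createSourceFile(
  'example.ts',
  'const x: number = 1;',
  ts.ScriptTarget.Latest
);

const lines = printTree(sourceFile);
console.log(lines.join('\n'));
```

The indentation makes the nesting visible: the variable declaration sits inside a declaration list, which in turn sits inside a statement of the source file.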

3. Transformation

The TypeScript Compiler API allows you to transform the AST by creating new nodes and replacing existing nodes. Therefore, the important part is to write the correct conditions to find the nodes you want to transform. In other words, transforming an AST is usually just a bunch of if-else statements looking for certain nodes.

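// A transformer factory receives a context that provides the node factory.
const transformer: ts.TransformerFactory<ts.SourceFile> = (context) => {
  const visit = (node: ts.Node): ts.Node => {
    if (ts.isVariableDeclaration(node)) {
      // Keep the name, drop the type annotation, and set the initializer to 42.
      return context.factory.updateVariableDeclaration(
        node,
        node.name,
        undefined, // exclamation token
        undefined, // type annotation
        context.factory.createNumericLiteral(42)
      );
    }
    // Not a match: keep the node and continue with its children.
    return ts.visitEachChild(node, visit, context);
  };
  return (file) => ts.visitNode(file, visit) as ts.SourceFile;
};

// ts.transform runs the transformer and returns the transformed source files.
const result = ts.transform(sourceFile, [transformer]);
const transformedSourceFile = result.transformed[0];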

This transformation would replace the initializer of every variable declaration with the value 42, such that the code

const x = 1;

would be transformed into

const x = 42;
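A transformation only gives you a new AST. To get source text back, you can hand the result to a printer. The following self-contained sketch repeats the transformation using the `ts.factory` style API of recent TypeScript versions and then prints the result with `ts.createPrinter`. The name `replaceInitializers` is my own.

```typescript
import * as ts from 'typescript';

const sourceFile = ts.createSourceFile(
  'example.ts',
  'const x = 1;',
  ts.ScriptTarget.Latest,
  /* setParentNodes */ true
);

// Replace the initializer of every variable declaration with 42.
const replaceInitializers: ts.TransformerFactory<ts.SourceFile> = (context) => {
  const visit = (node: ts.Node): ts.Node => {
    if (ts.isVariableDeclaration(node)) {
      return context.factory.updateVariableDeclaration(
        node,
        node.name,
        undefined, // exclamation token
        undefined, // type annotation
        context.factory.createNumericLiteral(42)
      );
    }
    return ts.visitEachChild(node, visit, context);
  };
  return (file) => ts.visitNode(file, visit) as ts.SourceFile;
};

const result = ts.transform(sourceFile, [replaceInitializers]);

// The printer turns an AST back into source text.
const printer = ts.createPrinter({ newLine: ts.NewLineKind.LineFeed });
const output = printer.printFile(result.transformed[0]);
result.dispose();

console.log(output); // const x = 42;
```

One caveat: the printer regenerates the text from the tree, so the original formatting of the file is not guaranteed to survive. For a Code Mod you will usually want to run your formatter afterwards.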

Okay, okay, I know, that is not a very useful transformation, but you get the idea. You can write more complex transformations by combining different conditions and creating new nodes. A more useful transformation would be to automatically refactor all var declarations to let declarations. Or to refactor all function declarations to const arrow functions. Or to refactor all Promise chains to async/await syntax. Or to prefer const over let where possible.
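As a sketch of the first idea, the transformer below rewrites var declarations to let. It is deliberately naive: var and let differ in hoisting and redeclaration rules, so a production Code Mod would need extra checks before making this change. The name `varToLet` and the input snippet are made up for the example.

```typescript
import * as ts from 'typescript';

const input = 'var count = 1;\nconst label = "a";';
const sourceFile = ts.createSourceFile('example.ts', input, ts.ScriptTarget.Latest, true);

const varToLet: ts.TransformerFactory<ts.SourceFile> = (context) => {
  const visit = (node: ts.Node): ts.Node => {
    // Visit children first so that nested declarations are handled as well.
    const visited = ts.visitEachChild(node, visit, context);
    // A declaration list without any block-scoped flag (let, const, ...) is a `var` list.
    if (
      ts.isVariableDeclarationList(visited) &&
      (visited.flags & ts.NodeFlags.BlockScoped) === 0
    ) {
      return context.factory.createVariableDeclarationList(
        visited.declarations,
        ts.NodeFlags.Let
      );
    }
    return visited;
  };
  return (file) => ts.visitNode(file, visit) as ts.SourceFile;
};

const result = ts.transform(sourceFile, [varToLet]);
const printer = ts.createPrinter({ newLine: ts.NewLineKind.LineFeed });
const output = printer.printFile(result.transformed[0]);
result.dispose();

console.log(output);
```

The condition is the whole trick here: once the right nodes are found, the replacement itself is a single factory call.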

Conclusion

In this article, we have explored some strategies for refactoring at scale using the TypeScript Compiler API. We have seen how you can use the TypeScript Compiler API to programmatically analyze and transform TypeScript code. We have also seen how you can write Code Mods to automate the refactoring process and ensure that it is not blocking the development of new features.

Refactoring a large codebase can be a daunting task, but with the right tools and techniques, you can make the process more manageable. By using the TypeScript Compiler API, you can write scripts that understand your code and help you make large-scale changes without having to manually update every single file.

I hope this article has given you some insights into how you can refactor TypeScript at scale and make your codebase more maintainable and robust. If you have any questions or feedback, feel free to reach out to me on Twitter.

"Opinions are my own and not the views of my employer."