
To AST and back: magically crawl and transform your code with TypeScript API

Posted on 27.08.2021
Last updated on 04.12.2024

Some time ago I was solving a challenge: I had to write an application that crawls code and reports certain patterns. Naturally, I wanted to do this properly, without resorting to insane regular expressions. So I evaluated different tools that convert source code into an AST (Abstract Syntax Tree), which I could then traverse to find everything I wanted.

At first I evaluated a really good piece of software called Espree. While the project is amazing, I sadly ran into some problems parsing JSX. And then I asked myself: «I am using TypeScript anyway, why don't I check out what it has to offer?»

# Step 0: Installation

If you already have TypeScript in your project, you don't have to install anything: it is the same package. You will probably have to move it from dev dependencies to production dependencies, though.

$ yarn add typescript
The code is licensed under the MIT license

# Step 1: Overview

First things first: the TypeScript API works with an instance of a program, and the createProgram factory is there to create one.

import { createProgram, JsxEmit } from 'typescript';

const indexFile = '/home/sergei/projects/ui-library/src/index.ts';
const program = createProgram([indexFile], {
    allowJs: true,
    jsx: JsxEmit.React,
});

As you can see, the createProgram function accepts a list of source file names. The second parameter holds the compiler options, which match the corresponding ones from a regular tsconfig.json.
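One detail worth a sketch: tsconfig.json spells some options as strings (for example "jsx": "react"), while the API expects enum values such as JsxEmit.React. The snippet below is a minimal illustration, assuming the options are available as a plain object. It uses convertCompilerOptionsFromJson for the conversion, and the '.' base path is an arbitrary choice of mine.

```typescript
import * as ts from 'typescript';

// Options written the way they would appear under "compilerOptions" in tsconfig.json
const jsonOptions = { allowJs: true, jsx: 'react' };

// The second argument is the base path for resolving relative paths in options;
// '.' is an arbitrary value picked for this sketch
const { options, errors } = ts.convertCompilerOptionsFromJson(jsonOptions, '.');

if (errors.length > 0) {
    throw new Error(ts.flattenDiagnosticMessageText(errors[0].messageText, '\n'));
}

// The string 'react' has been turned into the enum value expected by createProgram
console.log(options.allowJs, options.jsx === ts.JsxEmit.React); // true true
```

The resulting options object can be passed to createProgram as the second argument.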

# Nodes of AST

To get access to the actual content of a file, ask the program for that source file by its name:

const nodes = program.getSourceFile(indexFile)?.statements ?? [];

If I print the contents of the nodes variable, I may get something like this:

[
    NodeObject {
        pos: 0,
        end: 65,
        flags: 0,
        modifierFlagsCache: 0,
        transformFlags: 0,
        parent: undefined,
        kind: 262,
        decorators: undefined,
        modifiers: undefined,
        symbol: undefined,
        localSymbol: undefined,
        locals: undefined,
        nextContainer: undefined,
        importClause: NodeObject {
            pos: 6,
            end: 51,
            flags: 0,
            modifierFlagsCache: 0,
            transformFlags: 0,
            parent: undefined,
            kind: 263,
            isTypeOnly: false,
            name: undefined,
            namedBindings: [NodeObject]
        },
        moduleSpecifier: TokenObject {
            pos: 56,
            end: 64,
            flags: 0,
            modifierFlagsCache: 0,
            transformFlags: 0,
            parent: undefined,
            kind: 10,
            text: 'react',
            singleQuote: undefined,
            hasExtendedUnicodeEscape: false
        }
    },
    ...
]

The tree consists of nodes, and each node has the type Node or one of its descendants. A node has at least the following properties that can be interesting:

  • kind - indicates what kind of node it is
  • pos - where the code chunk represented by the node starts in the file
  • end - where it ends
  • parent - parent node, if any

Plus, there are almost always type-specific properties containing sub-nodes.
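To get a feeling for these properties without setting up a whole program, here is a minimal sketch that parses a string in memory with createSourceFile. The sample code and the file name example.tsx are made up for illustration. Note that pos includes the whitespace and comments in front of a node, whereas getStart() points to the first real token.

```typescript
import * as ts from 'typescript';

// Hypothetical sample code, parsed in memory without touching the disk
const code = "import React from 'react';\nfunction greet() {}\n";
const sourceFile = ts.createSourceFile(
    'example.tsx',
    code,
    ts.ScriptTarget.Latest,
    true, // setParentNodes: fill in the parent property of every node
);

const summary = sourceFile.statements.map((node) => ({
    kind: ts.SyntaxKind[node.kind], // human-readable name of the numeric kind
    pos: node.pos, // start, including leading whitespace and comments
    start: node.getStart(sourceFile), // start of the first real token
    end: node.end,
    text: node.getText(sourceFile),
}));

console.log(summary);
```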

# Checkers

The kind field holds a numeric value. Basically, for each symbol, type or syntax construct there is a corresponding kind value. However, I don't have to use the field itself to find out what type of node I got. For that purpose TypeScript offers a bunch of checkers.

For instance, to check if a node represents a variable statement:

import { Node, VariableStatement, isVariableStatement } from 'typescript';

const doSomething = (variableNode: VariableStatement) => { /* do something cool */ };

// node can be any Node taken from the tree, e.g. one of the file statements
const processNode = (node: Node) => {
    if (isVariableStatement(node)) {
        doSomething(node); // typescript already knows here that node is of type VariableStatement
    }
};
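The checkers combine nicely with forEachChild, which visits the direct children of a node. The following is a small sketch of a recursive walk that collects the names of all declared variables. The sample code and the names visit and variableNames are mine, not part of the API.

```typescript
import * as ts from 'typescript';

// Hypothetical sample code
const code = 'const a = 1;\nlet b = 2, c = 3;\nfunction f() { var d = 4; }\n';
const sourceFile = ts.createSourceFile('example.ts', code, ts.ScriptTarget.Latest, true);

const variableNames: string[] = [];

const visit = (node: ts.Node): void => {
    // Both checkers are type guards, so node.name.text is safe to read here
    if (ts.isVariableDeclaration(node) && ts.isIdentifier(node.name)) {
        variableNames.push(node.name.text);
    }
    ts.forEachChild(node, visit); // go one level deeper
};

visit(sourceFile);
console.log(variableNames); // [ 'a', 'b', 'c', 'd' ]
```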

# Source files

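The source file, in turn, is also a Node, just of a different kind, so it carries its own useful properties. Keep in mind that the two below are internal: they are missing from the public type declarations and may change between TypeScript versions.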

  • path - holds an absolute path to that file.

  • resolvedModules - a list of all imports made in a file, with all paths resolved to absolute.

    That is very useful, since I won't have to do any manual work here:

    Map(7) {
        'react' => {
            resolvedFileName: '/home/sergei/projects/ui-library/node_modules/@types/react/index.d.ts',
            originalPath: undefined,
            extension: '.d.ts',
            isExternalLibraryImport: true,
            packageId: {
                name: '@types/react',
                subModuleName: 'index.d.ts',
                version: '16.9.46'
            }
        },
        '../type' => {
            resolvedFileName: '/home/sergei/projects/ui-library/src/components/type.ts',
            originalPath: undefined,
            extension: '.ts',
            isExternalLibraryImport: false,
            packageId: undefined
        },
        ...
    }

    This information can be used later on to get other source files and parse them too.
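If relying on internal properties feels too fragile, the list of imports can also be gathered with the public API alone. Below is a minimal sketch that collects the module specifiers of a file. The helper name collectImports and the sample code are hypothetical.

```typescript
import * as ts from 'typescript';

// Hypothetical sample code
const code = "import React from 'react';\nimport { Props } from '../type';\nexport const x = 1;\n";
const sourceFile = ts.createSourceFile('Button.tsx', code, ts.ScriptTarget.Latest, true);

// Returns the raw module specifiers exactly as written in the import statements
const collectImports = (file: ts.SourceFile): string[] =>
    file.statements
        .filter(ts.isImportDeclaration)
        .map((statement) => statement.moduleSpecifier)
        .filter(ts.isStringLiteral)
        .map((specifier) => specifier.text);

const imports = collectImports(sourceFile);
console.log(imports); // [ 'react', '../type' ]
```

The specifiers are not resolved here. To turn them into absolute paths, the public ts.resolveModuleName function could probably be used, with ts.sys as the host, but I have not included that part since it needs real files on disk.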

# Modifying a tree

The content of the source file is immutable by nature. One cannot simply make an assignment (well, with @ts-ignore it is technically possible, but it would totally defeat the concept).

Instead, there is a set of special methods available. Each method allows creating/modifying a node of a specific type. For example, the following code makes the first property of a type literal optional:

import { factory, isTypeLiteralNode, isPropertySignature, SyntaxKind } from 'typescript';

if (isTypeLiteralNode(node)) {
    let firstMember = node.members[0];
    if (isPropertySignature(firstMember)) {
        // returns an updated copy, the original member stays untouched
        firstMember = factory.updatePropertySignature(
            firstMember,
            firstMember.modifiers,
            firstMember.name,
            factory.createToken(SyntaxKind.QuestionToken),
            firstMember.type
        );
    }
    const newNode = factory.createTypeLiteralNode([firstMember, ...node.members.slice(1)]);
}

Note that the update method does not make changes in place, it rather returns a new modified instance.
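To see the immutability in action, here is a self-contained sketch of the same idea, run against a made-up type declaration. It checks that the original member is left untouched and prints the updated type literal. The sample code and variable names are assumptions of mine.

```typescript
import * as ts from 'typescript';

// Hypothetical sample code
const code = 'type Props = { a: string; b: number };';
const sourceFile = ts.createSourceFile('example.ts', code, ts.ScriptTarget.Latest, true);

const alias = sourceFile.statements[0];
if (!ts.isTypeAliasDeclaration(alias) || !ts.isTypeLiteralNode(alias.type)) {
    throw new Error('Expected a type alias holding a type literal');
}
const literal = alias.type;
const first = literal.members[0];
if (!ts.isPropertySignature(first)) {
    throw new Error('Expected a property signature');
}

// A new node is returned, the original one is not modified
const updatedFirst = ts.factory.updatePropertySignature(
    first,
    first.modifiers,
    first.name,
    ts.factory.createToken(ts.SyntaxKind.QuestionToken),
    first.type,
);
const updatedLiteral = ts.factory.updateTypeLiteralNode(
    literal,
    ts.factory.createNodeArray([updatedFirst, ...literal.members.slice(1)]),
);

const printed = ts.createPrinter().printNode(ts.EmitHint.Unspecified, updatedLiteral, sourceFile);
console.log(first.questionToken === undefined); // true, the original is intact
console.log(printed);
```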

# Printer

And now the coolest part. The tree can be converted back to the actual code! It is extremely powerful, because I can make changes in the tree (for example, turn all const into let) and then get the updated source code, which I can save for later use.

Here is how it is done.

import {
    createPrinter,
    EmitHint,
    NewLineKind,
    Node,
} from 'typescript';

const printer = createPrinter({
    newLine: NewLineKind.LineFeed,
    removeComments: false,
});

export const print = (node: Node) =>
    printer.printNode(
        EmitHint.Unspecified,
        node,
        // @ts-ignore
        '',
    );

console.log(print(myAst));

Yes, you may notice a small @ts-ignore. This is because the third argument of printNode is expected to be a SourceFile object, the file the node belongs to, rather than a string. In my case it worked with an empty string passed instead.
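A way to avoid the @ts-ignore is to hand the printer an empty source file created on the fly. The following is a small sketch under that assumption, printing a node built from scratch with the factory. The file name placeholder.ts is arbitrary.

```typescript
import * as ts from 'typescript';

// A node created from scratch: string | number
const unionType = ts.factory.createUnionTypeNode([
    ts.factory.createKeywordTypeNode(ts.SyntaxKind.StringKeyword),
    ts.factory.createKeywordTypeNode(ts.SyntaxKind.NumberKeyword),
]);

// An empty file that only serves as the required third argument
const placeholderFile = ts.createSourceFile('placeholder.ts', '', ts.ScriptTarget.Latest);

const printer = ts.createPrinter({ newLine: ts.NewLineKind.LineFeed });
const printedType = printer.printNode(ts.EmitHint.Unspecified, unionType, placeholderFile);

console.log(printedType); // string | number
```

For nodes that come from a parsed file, it is safer to pass that very file, since the printer may need it to read the original text.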

I am able to print different kinds of nodes: a function declaration or a type declaration. It does not make any difference.
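As an illustration of the const-to-let idea mentioned above, here is a minimal sketch built on ts.transform and visitEachChild, which take care of rebuilding the parents of a changed node. The transformer name constToLet and the sample code are hypothetical, and this is only one of several ways to do it.

```typescript
import * as ts from 'typescript';

// Hypothetical sample code
const code = 'const x = 1;\nconst y = x + 1;\n';
const sourceFile = ts.createSourceFile('example.ts', code, ts.ScriptTarget.Latest, true);

const constToLet: ts.TransformerFactory<ts.SourceFile> = (context) => {
    const visit = (node: ts.Node): ts.Node => {
        // Process the children first, so nested declarations are covered too
        const visited = ts.visitEachChild(node, visit, context);
        if (
            ts.isVariableDeclarationList(visited) &&
            (visited.flags & ts.NodeFlags.BlockScoped) === ts.NodeFlags.Const
        ) {
            // Same declarations, but flagged as let instead of const
            return ts.factory.createVariableDeclarationList(
                visited.declarations,
                ts.NodeFlags.Let,
            );
        }
        return visited;
    };
    return (file) => ts.visitEachChild(file, visit, context);
};

const result = ts.transform(sourceFile, [constToLet]);
const printer = ts.createPrinter({ newLine: ts.NewLineKind.LineFeed });
const output = printer.printFile(result.transformed[0]);
result.dispose();

console.log(output);
```

Since printFile receives a whole source file, no extra file argument is needed here.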

# Step 2: Show time

Okay, with all that said, let me show you a really simple script that traverses a tree depth-first and is able to propagate possible changes back up.

First of all, I have created a couple of abstractions on top of the program and source files. For me it was reasonable, because I had additional logic to implement. That logic is omitted here, since it is too business-specific.

👉 📃  src/parser/SourceFile.ts
import {
    NodeArray,
    SourceFile as TSSourceFile,
    TypeNode,
} from 'typescript';

export class SourceFile {
    constructor(private sourceFile: TSSourceFile) {}

    public get nodes() {
        // yeh, something is still not well-settled, I know
        return (this.sourceFile.statements as unknown) as NodeArray<TypeNode>;
    }

    // ... some additional logic could be here
}
👉 📃  src/parser/Project.ts
import { createProgram, JsxEmit, Program } from 'typescript';
import { join } from 'path';
import { SourceFile } from './SourceFile';

export class Project {
    private program: Program;
    private knownFiles: Record<string, SourceFile> = {};

    constructor(private projectFolder: string) {
        const rootFile = join(
            this.projectFolder,
            'src/components/index.ts',
        );
        this.program = createProgram([rootFile], {
            // these options are the same as the ones in tsconfig.json
            allowJs: true,
            jsx: JsxEmit.React,
        });
    }

    public getSourceFile(fileName: string) {
        if (!(fileName in this.knownFiles)) {
            const file = this.program.getSourceFile(join(
                this.projectFolder,
                fileName,
            ));
            if (file) {
                this.knownFiles[fileName] = new SourceFile(file);
            }
        }
        return this.knownFiles[fileName] ?? null;
    }

    // ... some additional logic could be here
}

Now the main class:

👉 📃  src/parser/Crawler.ts
import {
    isArrayTypeNode,
    isTypeAliasDeclaration,
    isTypeLiteralNode,
    isIntersectionTypeNode,
    isUnionTypeNode,
    isPropertySignature,
    ArrayTypeNode,
    factory,
    TypeNode,
    TypeReferenceNode,
    TypeAliasDeclaration,
    TypeLiteralNode,
    TypeElement,
    IntersectionTypeNode,
    UnionTypeNode,
} from 'typescript';
import debug from 'debug';
import { Project } from './Project';
import { SourceFile } from './SourceFile';

const MAX_TRAVERSE_DEPTH = 10;

type ContextType = {
    depthLevel: number;
    file: SourceFile;
};

const d = debug('Crawler');

export class Crawler {
    constructor(private project: Project) {}
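    public crawl(fileName: string) {
        const sourceFile = this.project.getSourceFile(fileName);
        if (sourceFile) {
            // traverse() expects a node, so walk over the top-level nodes of the file
            sourceFile.nodes.forEach((node) => {
                this.traverse(node, {
                    depthLevel: 0,
                    file: sourceFile,
                });
            });
        }
    }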
    private traverse(node: TypeNode, ctx: ContextType): TypeNode {
        const { depthLevel } = ctx;
        if (depthLevel > MAX_TRAVERSE_DEPTH) {
            return node;
        }
        if (isArrayTypeNode(node)) {
            return this.traverseArrayType(node, ctx);
        }
        if (isTypeAliasDeclaration(node)) {
            return this.traverseTypeAlias(node, ctx);
        }
        if (isTypeLiteralNode(node)) {
            return this.traverseTypeLiteral(node, ctx);
        }
        if (isIntersectionTypeNode(node)) {
            return this.traverseIntersectionType(node, ctx);
        }
        if (isUnionTypeNode(node)) {
            return this.traverseUnionType(node, ctx);
        }
        // some other cases to process
        return node;
    }
    private traverseTypeLiteral(node: TypeLiteralNode, ctx: ContextType) {
        const { members } = node;
        const result: TypeElement[] = [];
        for (let i = 0; i < members.length; i += 1) {
            const member = members[i];
            if (isPropertySignature(member) && member.type) {
                const updatedMember = factory.updatePropertySignature(
                    member,
                    member.modifiers,
                    member.name,
                    member.questionToken,
                    this.traverse(
                        member.type,
                        this.dive(ctx),
                    ),
                );
                // jsDoc falls out after being processed through factory.updatePropertySignature(). Putting it back again
                // @ts-ignore
                updatedMember.jsDoc = member.jsDoc;
                result.push(updatedMember);
            } else {
                // everything else, including property signatures without a type, is kept as is
                result.push(member);
            }
        }
        return factory.createTypeLiteralNode(result);
    }
    private traverseArrayType(node: ArrayTypeNode, ctx: ContextType) {
        const { elementType } = node;
        const processedElementType = this.traverse(elementType, ctx);
        const unionOrIntersection =
            isUnionTypeNode(processedElementType) ||
            isIntersectionTypeNode(processedElementType);
        return factory.createArrayTypeNode(
            unionOrIntersection
                ? factory.createParenthesizedType(processedElementType)
                : processedElementType,
        );
    }
    private traverseIntersectionType(
        node: IntersectionTypeNode,
        ctx: ContextType,
    ) {
        const members = node.types;
        const result: TypeNode[] = [];
        for (let i = 0; i < members.length; i += 1) {
            result.push(this.traverse(members[i], this.dive(ctx)));
        }
        return factory.updateIntersectionTypeNode(
            node,
            factory.createNodeArray(result),
        );
    }
    private traverseUnionType(node: UnionTypeNode, ctx: ContextType) {
        const members = node.types;
        const result: TypeNode[] = [];
        for (let i = 0; i < members.length; i += 1) {
            result.push(this.traverse(members[i], this.dive(ctx)));
        }
        return factory.updateUnionTypeNode(
            node,
            factory.createNodeArray(result),
        );
    }
    private traverseTypeAlias(node: TypeAliasDeclaration, ctx: ContextType) {
        const { type } = node;
        return this.traverse(type, this.dive(ctx));
    }
    private dive(
        ctx: ContextType,
        file?: SourceFile,
    ): ContextType {
        let result = {
            ...ctx,
            depthLevel: ctx.depthLevel + 1,
        };
        if (file) {
            result = {
                ...result,
                file,
            };
        }
        return result;
    }
}
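The depth limit is the part of the crawler that is easiest to try in isolation. Below is a stripped-down sketch of the same idea: a recursive walk over nested type literals that stops once a maximum depth is exceeded. The limit of 2, the sample type, and the names walk and visited are made up for the demonstration, and the walk only reads the tree, it does not rebuild it.

```typescript
import * as ts from 'typescript';

const MAX_DEPTH = 2; // hypothetical limit, kept small for the demo

// Hypothetical sample code with three levels of nesting
const code = 'type Props = { a: string; b: { c: { d: number } } };';
const sourceFile = ts.createSourceFile('example.ts', code, ts.ScriptTarget.Latest, true);

const visited: { depth: number; text: string }[] = [];

const walk = (node: ts.TypeNode, depth: number): void => {
    if (depth > MAX_DEPTH) {
        return; // too deep, stop here just like the crawler does
    }
    visited.push({ depth, text: node.getText(sourceFile) });
    if (ts.isTypeLiteralNode(node)) {
        for (const member of node.members) {
            if (ts.isPropertySignature(member) && member.type) {
                walk(member.type, depth + 1); // the equivalent of dive()
            }
        }
    }
};

const alias = sourceFile.statements[0];
if (ts.isTypeAliasDeclaration(alias)) {
    walk(alias.type, 0);
}

console.log(visited);
```

The innermost number type sits at depth 3, so it is never visited.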

And finally how to run the thing:

import { Project } from './Project';
import { Crawler } from './Crawler';
const project = new Project('/home/sergei/projects/ui-library/');
const crawler = new Crawler(project);
crawler.crawl('src/components/Button/Button.tsx');

All right, that was a brief intro to the TypeScript API. I barely scratched the surface here, yet I hope this information gives a real boost to your future project!



Sergei Gannochenko

Business-oriented fullstack engineer, in ❤️ with Tech.
Golang, React, TypeScript, Docker, AWS, Jamstack.
20+ years in dev.