To AST and back: magically crawl and transform your code with TypeScript API
Some time ago I was solving a challenge. I had to write an application that crawls code and reports certain patterns. Obviously, I wanted this done like a pro, avoiding writing some insane regular expressions or something like that. That is why I was evaluating different software that can convert source code into AST (Abstract Syntax Tree), so later on I could traverse the node graph to find everything I wanted.
At the beginning I was evaluating a really good piece of software called Espree. While the project is amazing, sadly I faced some problems parsing JSX. And then I asked myself: «I am using TypeScript anyway, why don't I check out what it has to offer?»
If you already have TypeScript in your project, you don't have to install anything new. It is the same package, although you will probably have to move it from dev dependencies to production dependencies.
yarn add typescript
First things first: the TypeScript API needs an instance of a program. For that, there is the createProgram factory.
import { createProgram, JsxEmit } from 'typescript';

const indexFile = '/home/sergei/projects/ui-library/src/index.ts';
const program = createProgram([indexFile], {
  allowJs: true,
  jsx: JsxEmit.React,
});
As you can see, the createProgram function accepts a list of source file names. There is also a second parameter that contains a list of options. The options match the corresponding ones from a regular tsconfig.json.
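To illustrate how the options relate to tsconfig.json, here is a minimal sketch that converts the JSON form of the two options used above into the object createProgram expects. It relies on convertCompilerOptionsFromJson from the same package; the '.' base path is just an assumption made for this example.

```typescript
import { convertCompilerOptionsFromJson, JsxEmit } from 'typescript';

// The same keys one would put under "compilerOptions" in tsconfig.json.
const jsonOptions = { allowJs: true, jsx: 'react' };

// The second argument is a base path used to resolve path-like options;
// '.' is an arbitrary choice for this sketch.
const { options, errors } = convertCompilerOptionsFromJson(jsonOptions, '.');

// The string 'react' from the JSON form becomes the JsxEmit.React enum member.
console.log(errors.length, options.allowJs, options.jsx === JsxEmit.React);
```

The resulting options object can then be passed as the second argument of createProgram.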
To get access to actual content, do the following:
const nodes = program.getSourceFile(indexFile)?.statements ?? [];
If I print the contents of the nodes variable, I may get something like this:
[
  NodeObject {
    pos: 0,
    end: 65,
    flags: 0,
    modifierFlagsCache: 0,
    transformFlags: 0,
    parent: undefined,
    kind: 262,
    decorators: undefined,
    modifiers: undefined,
    symbol: undefined,
    localSymbol: undefined,
    locals: undefined,
    nextContainer: undefined,
    importClause: NodeObject {
      pos: 6,
      end: 51,
      flags: 0,
      modifierFlagsCache: 0,
      transformFlags: 0,
      parent: undefined,
      kind: 263,
      isTypeOnly: false,
      name: undefined,
      namedBindings: [NodeObject]
    },
    moduleSpecifier: TokenObject {
      pos: 56,
      end: 64,
      flags: 0,
      modifierFlagsCache: 0,
      transformFlags: 0,
      parent: undefined,
      kind: 10,
      text: 'react',
      singleQuote: undefined,
      hasExtendedUnicodeEscape: false
    }
  },
  ...
]
The tree consists of nodes; each node is of type Node or one of its descendants. A node has at least the following properties that can be of interest:
- kind - indicates what kind of node it is
- pos - where the code chunk represented by the node starts in the file
- end - where it ends
- parent - parent node, if any
Plus, there are almost always type-specific properties containing sub-nodes.
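The four properties listed above can be inspected with a small sketch like the one below. It parses an in-memory string with createSourceFile instead of a real project, so the file name 'example.ts' and the sample code are made up for illustration.

```typescript
import { createSourceFile, ScriptTarget, SyntaxKind } from 'typescript';

// Hypothetical sample code, parsed without touching the file system.
const code = "import React from 'react';\nfunction hello() {}";

// Passing true as the last argument asks the parser to fill in `parent`.
const sourceFile = createSourceFile('example.ts', code, ScriptTarget.Latest, true);

const summary = sourceFile.statements.map((statement) => ({
  // SyntaxKind is a regular enum, so a number can be looked up by index.
  kind: SyntaxKind[statement.kind],
  // pos may include leading whitespace and comments, getStart() skips them.
  rawText: code.slice(statement.pos, statement.end),
  text: code.slice(statement.getStart(sourceFile), statement.end),
  hasParent: statement.parent === sourceFile,
}));

console.log(summary);
```

The numeric kind values are not stable between TypeScript versions, so comparing against SyntaxKind members is safer than hard-coding numbers such as 262. As far as I can tell, a few kinds also share their number with marker entries like FirstStatement, in which case the lookup by index may return the marker name.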
The kind field holds a numeric value. Basically, for each token, type or syntax construct there is a corresponding kind value. However, I don't have to use the field itself to find out what type of node I got. For that purpose TypeScript offers a bunch of checker functions (type guards).
For instance, to check if a node represents a variable statement:
import { Node, VariableStatement, isVariableStatement } from 'typescript';

const doSomething = (variableNode: VariableStatement) => { /* do something cool */ };

const inspect = (node: Node) => {
  if (isVariableStatement(node)) {
    doSomething(node); // typescript already knows here that node is of type VariableStatement
  }
};
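The same checkers combine nicely with forEachChild, which visits the direct children of a node. Below is a minimal sketch of a recursive walk that collects the names of all declared variables; collectVariableNames and the sample code are hypothetical names made up for this example.

```typescript
import {
  createSourceFile,
  forEachChild,
  isIdentifier,
  isVariableDeclaration,
  Node,
  ScriptTarget,
} from 'typescript';

// Walks the tree depth-first and gathers the names of declared variables.
const collectVariableNames = (root: Node): string[] => {
  const names: string[] = [];
  const visit = (node: Node): void => {
    // Destructuring patterns are skipped: only plain identifiers are collected.
    if (isVariableDeclaration(node) && isIdentifier(node.name)) {
      names.push(node.name.text);
    }
    forEachChild(node, visit);
  };
  visit(root);
  return names;
};

const sample = createSourceFile(
  'example.ts',
  'const a = 1; function f() { let b = a; }',
  ScriptTarget.Latest,
);
const variableNames = collectVariableNames(sample);
console.log(variableNames);
```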
The source file, in its turn, is also a Node, just of a different kind. Thus, it carries other useful properties of its own (these two are internal and not part of the public typings, so they may change between TypeScript versions):
- path - holds an absolute path to that file
- resolvedModules - a list of all imports made in the file, with all paths resolved to absolute
That is very useful, since I won't have to do any manual work here:
Map(7) {
  'react' => {
    resolvedFileName: '/home/sergei/projects/ui-library/node_modules/@types/react/index.d.ts',
    originalPath: undefined,
    extension: '.d.ts',
    isExternalLibraryImport: true,
    packageId: {
      name: '@types/react',
      subModuleName: 'index.d.ts',
      version: '16.9.46'
    }
  },
  '../type' => {
    resolvedFileName: '/home/sergei/projects/ui-library/src/components/type.ts',
    originalPath: undefined,
    extension: '.ts',
    isExternalLibraryImport: false,
    packageId: undefined
  },
  ...
}
This information can be used later on to get other source files and parse them too.
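If only the raw import specifiers are needed, without the resolved paths, they can also be read from the import declarations themselves. The sketch below is one possible way to do that; listImportSpecifiers and the sample code are hypothetical and not part of my original tool.

```typescript
import {
  createSourceFile,
  isImportDeclaration,
  isStringLiteral,
  ScriptTarget,
  SourceFile,
} from 'typescript';

// Returns the module specifiers exactly as they are written in the file.
const listImportSpecifiers = (file: SourceFile): string[] => {
  const specifiers: string[] = [];
  file.statements.forEach((statement) => {
    if (isImportDeclaration(statement) && isStringLiteral(statement.moduleSpecifier)) {
      specifiers.push(statement.moduleSpecifier.text);
    }
  });
  return specifiers;
};

const fileWithImports = createSourceFile(
  'example.ts',
  "import React from 'react';\nimport { Size } from '../type';\nconst x = 1;",
  ScriptTarget.Latest,
);
const importSpecifiers = listImportSpecifiers(fileWithImports);
console.log(importSpecifiers);
```

Unlike resolvedModules, this gives unresolved specifiers such as '../type', so turning them into absolute paths would still be a separate step.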
The content of the source file is immutable by nature. One cannot simply make an assignment (well, with @ts-ignore it is technically possible, but that would totally defeat the concept).
Instead, there is a set of special methods available. Each method allows creating/modifying a node of a specific type. For example the following code makes the first property optional:
import { factory, isTypeLiteralNode, isPropertySignature, SyntaxKind } from 'typescript';

// node is assumed to be any AST node obtained earlier
if (isTypeLiteralNode(node)) {
  let firstMember = node.members[0];
  if (isPropertySignature(firstMember)) {
    firstMember = factory.updatePropertySignature(
      firstMember,
      firstMember.modifiers,
      firstMember.name,
      factory.createToken(SyntaxKind.QuestionToken),
      firstMember.type
    );
  }
  // a new type literal: the updated first member followed by the untouched rest
  const newNode = factory.createTypeLiteralNode([firstMember, ...node.members.slice(1)]);
}
Note that the update method does not make changes in place; rather, it returns a new modified instance.
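To see that nothing is changed in place, here is a small self-contained sketch. It parses a made-up type alias from a string, updates its only property and then compares the original node with the returned one.

```typescript
import {
  createSourceFile,
  factory,
  isPropertySignature,
  isTypeAliasDeclaration,
  isTypeLiteralNode,
  ScriptTarget,
  SyntaxKind,
} from 'typescript';

// Hypothetical sample: a type literal with a single required property.
const file = createSourceFile('example.ts', 'type Props = { title: string };', ScriptTarget.Latest);
const alias = file.statements[0];

let sameInstance = true;
let originalIsOptional = true;
let updatedIsOptional = false;

if (isTypeAliasDeclaration(alias) && isTypeLiteralNode(alias.type)) {
  const original = alias.type.members[0];
  if (isPropertySignature(original)) {
    const updated = factory.updatePropertySignature(
      original,
      original.modifiers,
      original.name,
      factory.createToken(SyntaxKind.QuestionToken),
      original.type,
    );
    sameInstance = updated === original;
    originalIsOptional = original.questionToken !== undefined;
    updatedIsOptional = updated.questionToken !== undefined;
  }
}

console.log({ sameInstance, originalIsOptional, updatedIsOptional });
```

The original member stays required, while the returned node carries the question token, which is why the result has to be placed into a new parent node.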
And now the coolest part. The tree can be converted back to the actual code! It is extremely powerful, because I can make changes to the tree (for example turn all const into let) and then get the updated source code, which I can save for later usage.
Here is how it is done.
import {
  createPrinter,
  EmitHint,
  NewLineKind,
  Node,
} from 'typescript';

const printer = createPrinter({
  newLine: NewLineKind.LineFeed,
  removeComments: false,
});

export const print = (node: Node) =>
  printer.printNode(
    EmitHint.Unspecified,
    node,
    // @ts-ignore
    '',
  );

console.log(print(myAst));
Yeah, you may notice a small @ts-ignore. This is because the third argument of printNode is supposed to be the source file the node belongs to. In my case it worked with an empty string passed instead.
I am able to print different kinds of nodes: a function declaration or a type declaration. It does not make any difference.
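For the "turn all const into let" idea mentioned earlier, a full round trip could look roughly like the sketch below. It uses the transform and visitEachChild helpers from the same package, which I did not use in my own tool, so treat it as one possible approach under those assumptions; constToLet and the sample code are made-up names.

```typescript
import {
  createPrinter,
  createSourceFile,
  factory,
  isVariableDeclarationList,
  Node,
  NodeFlags,
  ScriptTarget,
  SourceFile,
  transform,
  TransformerFactory,
  visitEachChild,
} from 'typescript';

// Replaces every `const` declaration list it reaches with a `let` one.
const constToLet: TransformerFactory<SourceFile> = (context) => (file) => {
  const visit = (node: Node): Node => {
    if (
      isVariableDeclarationList(node) &&
      (node.flags & NodeFlags.BlockScoped) === NodeFlags.Const
    ) {
      // The declarations are reused as-is, so anything nested inside
      // their initializers is not visited in this simplified version.
      return factory.createVariableDeclarationList(node.declarations, NodeFlags.Let);
    }
    return visitEachChild(node, visit, context);
  };
  return visitEachChild(file, visit, context);
};

const source = createSourceFile(
  'example.ts',
  'const a = 1;\nconst b = a + 1;',
  ScriptTarget.Latest,
  true,
);

const result = transform(source, [constToLet]);
// printFile takes a whole source file, so no @ts-ignore is needed here.
const output = createPrinter().printFile(result.transformed[0]);
result.dispose();

console.log(output);
```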
Okay, with all that being said, let me show you a really simple script that just traverses a tree in depth with an ability to propagate the possible changes back.
First of all, I have created abstractions on top of the program and source files. For me it was reasonable, because I had additional logic to implement. That logic is omitted here, as it is too business-specific.
import {
  NodeArray,
  SourceFile as TSSourceFile,
  TypeNode,
} from 'typescript';

export class SourceFile {
  constructor(private sourceFile: TSSourceFile) {}

  public get nodes() {
    // yeh, something is still not well-settled, I know
    return (this.sourceFile.statements as unknown) as NodeArray<TypeNode>;
  }

  // ... some additional logic could be here
}
import { createProgram, JsxEmit, Program } from 'typescript';
import { join } from 'path';
import { SourceFile } from './SourceFile';

export class Project {
  private program: Program;
  private knownFiles: Record<string, SourceFile> = {};

  constructor(private projectFolder: string) {
    const rootFile = join(
      this.projectFolder,
      'src/components/index.ts',
    );
    this.program = createProgram([rootFile], {
      // these options are the same as the ones in tsconfig.json
      allowJs: true,
      jsx: JsxEmit.React,
    });
  }

  public getSourceFile(fileName: string) {
    if (!(fileName in this.knownFiles)) {
      const file = this.program.getSourceFile(
        join(
          this.projectFolder,
          fileName,
        ),
      );
      if (file) {
        this.knownFiles[fileName] = new SourceFile(file);
      }
    }
    return this.knownFiles[fileName] ?? null;
  }

  // ... some additional logic could be here
}
Now the main class:
import {
  isArrayTypeNode,
  isTypeAliasDeclaration,
  isTypeLiteralNode,
  isIntersectionTypeNode,
  isUnionTypeNode,
  isPropertySignature,
  ArrayTypeNode,
  factory,
  TypeNode,
  TypeReferenceNode,
  TypeAliasDeclaration,
  TypeLiteralNode,
  TypeElement,
  IntersectionTypeNode,
  UnionTypeNode,
} from 'typescript';
import debug from 'debug';
import { Project } from './Project';
import { SourceFile } from './SourceFile';

const MAX_TRAVERSE_DEPTH = 10;

type ContextType = {
  depthLevel: number;
  file: SourceFile;
};

const d = debug('Crawler');

export class Crawler {
  constructor(private project: Project) {}

  public crawl(fileName: string) {
    const sourceFile = this.project.getSourceFile(fileName);
    if (sourceFile) {
      d('Crawling %s', fileName);
      // traverse every top-level node of the file
      sourceFile.nodes.forEach((node) => {
        this.traverse(node, {
          depthLevel: 0,
          file: sourceFile,
        });
      });
    }
  }

  private traverse(node: TypeNode, ctx: ContextType): TypeNode {
    const { depthLevel } = ctx;
    if (depthLevel > MAX_TRAVERSE_DEPTH) {
      return node;
    }

    if (isArrayTypeNode(node)) {
      return this.traverseArrayType(node, ctx);
    }
    if (isTypeAliasDeclaration(node)) {
      return this.traverseTypeAlias(node, ctx);
    }
    if (isTypeLiteralNode(node)) {
      return this.traverseTypeLiteral(node, ctx);
    }
    if (isIntersectionTypeNode(node)) {
      return this.traverseIntersectionType(node, ctx);
    }
    if (isUnionTypeNode(node)) {
      return this.traverseUnionType(node, ctx);
    }
    // some other cases to process

    return node;
  }

  private traverseTypeLiteral(node: TypeLiteralNode, ctx: ContextType) {
    const { members } = node;
    const result: TypeElement[] = [];
    for (let i = 0; i < members.length; i += 1) {
      const member = members[i];
      if (isPropertySignature(member) && member.type) {
        const updatedMember = factory.updatePropertySignature(
          member,
          member.modifiers,
          member.name,
          member.questionToken,
          this.traverse(
            member.type,
            this.dive(ctx),
          ),
        );
        // jsDoc falls out after being processed through factory.updatePropertySignature(). Putting it back again
        // @ts-ignore
        updatedMember.jsDoc = member.jsDoc;
        result.push(updatedMember);
      } else {
        // members without a type annotation are kept as they are
        result.push(member);
      }
    }

    return factory.createTypeLiteralNode(result);
  }

  private traverseArrayType(node: ArrayTypeNode, ctx: ContextType) {
    const { elementType } = node;
    const processedElementType = this.traverse(elementType, ctx);
    const unionOrIntersection =
      isUnionTypeNode(processedElementType) ||
      isIntersectionTypeNode(processedElementType);

    return factory.createArrayTypeNode(
      unionOrIntersection
        ? factory.createParenthesizedType(processedElementType)
        : processedElementType,
    );
  }

  private traverseIntersectionType(
    node: IntersectionTypeNode,
    ctx: ContextType,
  ) {
    const members = node.types;
    const result: TypeNode[] = [];
    for (let i = 0; i < members.length; i += 1) {
      result.push(this.traverse(members[i], this.dive(ctx)));
    }

    return factory.updateIntersectionTypeNode(
      node,
      factory.createNodeArray(result),
    );
  }

  private traverseUnionType(node: UnionTypeNode, ctx: ContextType) {
    const members = node.types;
    const result: TypeNode[] = [];
    for (let i = 0; i < members.length; i += 1) {
      result.push(this.traverse(members[i], this.dive(ctx)));
    }

    return factory.updateUnionTypeNode(
      node,
      factory.createNodeArray(result),
    );
  }

  private traverseTypeAlias(node: TypeAliasDeclaration, ctx: ContextType) {
    const { type } = node;
    return this.traverse(type, this.dive(ctx));
  }

  private dive(
    ctx: ContextType,
    file?: SourceFile,
  ): ContextType {
    let result = {
      ...ctx,
      depthLevel: ctx.depthLevel + 1,
    };
    if (file) {
      result = {
        ...result,
        file,
      };
    }

    return result;
  }
}
And finally how to run the thing:
import { Project } from './Project';
import { Crawler } from './Crawler';

const project = new Project('/home/sergei/projects/ui-library/');
const crawler = new Crawler(project);
crawler.crawl('src/components/Button/Button.tsx');
All right, that was a brief intro into the TypeScript API. I barely scratched the surface here, yet I hope the information could be a real boost for your future project!
Sergei Gannochenko
Golang, React, TypeScript, Docker, AWS, Jamstack.
20+ years in dev.