Intro
Bundling is an indispensable part of building a modern JavaScript app. Webpack, Rollup, and Parcel-bundler are some of the big-name bundlers. For the most part, bundling has been a magical process: just give the bundler the entry, the output, add some other config, and POOF! Suddenly your bundle.js is ready.
In this post, I will explain what a bundler is and why it is a good thing to use one - we will do it by creating one from scratch.
What a bundler is and why we need it
A bundler is a tool that puts your entry code along with all its dependencies together in one JS file.
Why would we want to use it? Can't we just upload all the files and directories of our project and skip the extra step?
Here are two reasons:
- JavaScript initially had no standard, built-in module system. The import and export syntax is a recent convention introduced in ES6, and not all browsers support it yet.
- It is better to put everything together in one bundled file. Imagine a project with 5 different JS files. The client has to make 5 requests to your server (or CDN, or both; by the way, it is even better to bundle them and put the bundle on a CDN). That is 4 extra requests the client could have avoided if our project were bundled into one JS file. More requests = more overhead.
I hope these are enough reasons to want to use a bundler. Let's move on to understanding how a bundler works.
The best way to understand how something works is to build it and tinker with it.
Building bundler
Before we start, let's go through the basics of what our project will look like.
Introducing Bandler. The tiniest, cutest, awesomest bundler you have ever seen (ok, you can name it whatever. That's just what I named my bundler).
Bandler will have a structure like this:
entry.js
-> module1.js
-> module2.js
The entry will be called entry.js. It will have one dependency, module1.js, which has a dependency, module2.js.
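To have something concrete to bundle, here is a minimal sketch that writes three such files to disk. The file contents are assumptions: they follow the entry.js -> module1.js -> module2.js chain described above and borrow the log messages from the sample output shown later in the post. The helper name writeExampleProject is hypothetical.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Assumed contents for the chain entry.js -> module1.js -> module2.js
const exampleFiles = {
  "entry.js": `import module1 from "./module1.js";

module1();
`,
  "module1.js": `import module2 from "./module2.js";

const module1 = () => {
  module2();
  console.log("hello from module1!");
};

export default module1;
`,
  "module2.js": `const module2 = () => {
  console.log("Hello from module2!");
};

export default module2;
`
};

// Write the three files into targetDir and return their full paths
function writeExampleProject(targetDir) {
  fs.mkdirSync(targetDir, { recursive: true });
  return Object.entries(exampleFiles).map(([name, source]) => {
    const filePath = path.join(targetDir, name);
    fs.writeFileSync(filePath, source);
    return filePath;
  });
}

// Demo: write into a throwaway temp directory instead of the real project
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "bandler-example-"));
const written = writeExampleProject(demoDir);
console.log(written);
```

Calling writeExampleProject(".") from the project root would place the files in the project directory instead.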
Our project will use ES6 module syntax (import/export). Our task is to extend module support to older browsers. We have to transpile the ES6 syntax into something all (or most) browsers can understand.
Here are the 8 steps we will follow:
1. Read the content of entry.js
2. Parse that content and make a list of all import declarations
3. Transpile the content from step 1 from ES6 to ES5
4. Assign each dependency file a unique ID to be referenced later (for example, if we use import module1 from './module1.js' in entry, ./module1.js is a dependency and we will map it to a unique ID)
5. Put all of the info from steps 2-4 in one object
6. Create a 'dependency graph' (by iterating through all dependencies, all dependencies of each dependency, and so on; repeat steps 1-5)
7. Pack everything in step 6 together
8. Celebrate because our bundler is done! 🎊🙌
If it looks complicated, don't worry, because it is not.
Starting Project
In this section we'll do the setup: start a new directory for our project, cd into it, and install some libraries.
mkdir bundler-playground && cd $_
Start an npm project.
npm init -y
Install some additional libraries:
- @babel/parser to parse our code and return an AST object
- @babel/traverse to traverse (walk through) our AST object; this will help us look for all import declarations
- @babel/core to transpile ES6 -> ES5
- @babel/preset-env, the preset we pass to @babel/core later (it is a separate package, so it has to be installed too)
- resolve to get the full path of each dependency (ex: turn ./module1.js into something like /User/iggy/project/bundler-playground/module1.js)
npm install --save @babel/parser @babel/traverse @babel/core @babel/preset-env resolve
Create a new index.js in the root, and import these:
const fs = require("fs");
const path = require("path");
const parser = require("@babel/parser");
const traverse = require("@babel/traverse").default;
const babel = require("@babel/core");
const resolve = require("resolve").sync;
Get module info
In this section, we will:
- Assign a particular
filePath
with unique ID (to be referenced later) - Get all dependencies used by this file (list all
import
s used) - Transpile ES code
Here is the code for this section.
let ID = 0;
function createModuleInfo(filePath) {
const content = fs.readFileSync(filePath, "utf-8");
const ast = parser.parse(content, {
sourceType: "module"
});
const deps = [];
traverse(ast, {
ImportDeclaration: ({ node }) => {
deps.push(node.source.value);
}
});
const id = ID++;
const { code } = babel.transformFromAstSync(ast, null, {
presets: ["@babel/preset-env"]
});
return {
id,
filePath,
deps,
code
};
}
We got the file content using readFileSync(). Then we parsed the content to get the AST. Once we had the AST, we traversed it and collected every import statement using an ImportDeclaration visitor. Lastly, we transpiled our code from ES6 using Babel core's transformFromAstSync.
For the ID, we used a simple incrementing number (a random GUID would be better, but since this is a demo, ID++ will do).
With this, we have ourselves a nifty module info object consisting of a unique ID, the file path, a list of all dependencies (all imports), and the transpiled code of that module. Next, we repeat the process for all relevant modules to create a dependency graph.
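To make the traversal step less abstract, here is a minimal sketch that runs without Babel. The AST below is hand-written and heavily trimmed (a real @babel/parser result carries many more fields, such as source locations), and walk is a hypothetical mini-walker that only loosely imitates what @babel/traverse does.

```javascript
// Hand-written, trimmed AST for:
//   import module1 from "./module1.js";
//   module1();
const ast = {
  type: "File",
  program: {
    type: "Program",
    body: [
      {
        type: "ImportDeclaration",
        specifiers: [
          {
            type: "ImportDefaultSpecifier",
            local: { type: "Identifier", name: "module1" }
          }
        ],
        source: { type: "StringLiteral", value: "./module1.js" }
      },
      {
        type: "ExpressionStatement",
        expression: {
          type: "CallExpression",
          callee: { type: "Identifier", name: "module1" },
          arguments: []
        }
      }
    ]
  }
};

// Hypothetical mini-walker: visits every nested object, and when an object
// has a `type` with a matching visitor, calls that visitor with { node }.
function walk(node, visitors) {
  if (Array.isArray(node)) {
    node.forEach(child => walk(child, visitors));
    return;
  }
  if (node === null || typeof node !== "object") return;
  if (typeof node.type === "string" && visitors[node.type]) {
    visitors[node.type]({ node });
  }
  Object.values(node).forEach(child => walk(child, visitors));
}

// Same visitor shape as in createModuleInfo
const deps = [];
walk(ast, {
  ImportDeclaration: ({ node }) => {
    deps.push(node.source.value);
  }
});

console.log(deps); // [ './module1.js' ]
```

The point is only that an import statement becomes a node whose source.value holds the path string, which is exactly what createModuleInfo collects into deps.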
Creating Dependency Graph
A dependency graph is a collection of interrelated modules used in our app, starting from the entry point.
Here is the code for this section.
function createDependencyGraph(entry) {
const entryInfo = createModuleInfo(entry);
const graphArr = [];
graphArr.push(entryInfo);
for (const module of graphArr) {
module.map = {};
module.deps.forEach(depPath => {
const baseDir = path.dirname(module.filePath);
const moduleDepPath = resolve(depPath, { basedir: baseDir }); // resolve's option is spelled "basedir"
const moduleInfo = createModuleInfo(moduleDepPath);
graphArr.push(moduleInfo);
module.map[depPath] = moduleInfo.id;
});
}
return graphArr;
}
We will be using an array type for our dependency graph. We start by pushing our entry info first.
Then we iterate through dependency graph elements (starting with entry).
const baseDir = path.dirname(module.filePath);
const moduleDepPath = resolve(depPath, { basedir: baseDir });
const moduleInfo = createModuleInfo(moduleDepPath);
graphArr.push(moduleInfo);
Here we use path.dirname and resolve to get the full path of each module, get the info using the full path, and push that info into our dependency graph array.
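As a rough illustration of what "getting the full path" means, the sketch below uses Node's built-in path module as a stand-in for the resolve package. That is an assumption made to keep the example dependency-free: it only covers relative paths like ./module1.js, whereas resolve also handles things such as node_modules lookups.

```javascript
const path = require("path");

// path.posix keeps the result identical on every OS for this demo
const importerPath = "/User/iggy/project/bundler-playground/entry.js";
const depPath = "./module1.js";

// Directory of the file that contains the import
const baseDir = path.posix.dirname(importerPath);

// Relative import path resolved against that directory
const fullPath = path.posix.resolve(baseDir, depPath);

console.log(baseDir);  // /User/iggy/project/bundler-playground
console.log(fullPath); // /User/iggy/project/bundler-playground/module1.js
```

Resolving against the importing file's directory, rather than the directory the bundler runs from, is what keeps imports working once modules live in subfolders.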
Note these lines:
module.map = {};
...
module.map[depPath] = moduleInfo.id;
Here we add an additional attribute, map, to our moduleInfo object. This attribute will be used in the next step as a lookup from each import path to the unique ID of the module it points to. For example:
| module     | ID |
|------------|----|
| entry.js   | 0  |
| module1.js | 1  |
| module2.js | 2  |
| etc        | n  |
In the end, we end up with an array of module infos for every dependency used in the entire project.
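Two details of this loop are easy to miss: for...of over an array also visits items pushed during the loop, and a module imported from two places is added twice (the sample output later in this post lists module2.js under IDs 2 and 3). The sketch below shows both without touching the file system. fakeProject, buildGraph, and the dedupe option are hypothetical names, and the de-duplication is a possible extension, not part of the original bundler.

```javascript
// Hypothetical stand-in for reading and parsing files: path -> list of imports
const fakeProject = {
  "entry.js": ["./module1.js", "./module2.js"],
  "module1.js": ["./module2.js"],
  "module2.js": []
};

function buildGraph(entry, { dedupe }) {
  let nextId = 0;
  const seen = new Map(); // filePath -> module info (only consulted when dedupe is true)

  const makeInfo = filePath => {
    const info = { id: nextId++, filePath, deps: fakeProject[filePath], map: {} };
    seen.set(filePath, info);
    return info;
  };

  const graph = [makeInfo(entry)];

  // for...of also reaches elements pushed while the loop is running,
  // so the array doubles as a work queue
  for (const mod of graph) {
    for (const dep of mod.deps) {
      const depPath = dep.replace("./", ""); // toy resolution, flat directory only
      let info = dedupe ? seen.get(depPath) : undefined;
      if (!info) {
        info = makeInfo(depPath);
        graph.push(info);
      }
      mod.map[dep] = info.id;
    }
  }
  return graph;
}

const naive = buildGraph("entry.js", { dedupe: false });
const deduped = buildGraph("entry.js", { dedupe: true });

console.log(naive.map(m => `${m.id}:${m.filePath}`));   // module2.js appears twice
console.log(deduped.map(m => `${m.id}:${m.filePath}`)); // each file appears once
```

Without the lookup, two modules that import each other would also keep the loop running forever, so some form of "have I seen this path before" check is worth considering beyond a demo.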
Packing them all together
Now that we have a dependency graph, the last step is to pack the modules together.
function pack(graph) {
const moduleArgArr = graph.map(module => {
return `${module.id}: {
factory: (exports, require) => {
${module.code}
},
map: ${JSON.stringify(module.map)}
}`;
});
const iifeBundler = `(function(modules){
const require = id => {
const {factory, map} = modules[id];
const localRequire = requireDeclarationName => require(map[requireDeclarationName]);
const module = {exports: {}};
factory(module.exports, localRequire);
return module.exports;
}
require(0);
})({${moduleArgArr.join()}})
`;
return iifeBundler;
}
First, we wrap the code of each module in a factory function. The factory receives two arguments, exports and require. Keep these two arguments in mind. We also keep the map from the previous step.
return `${module.id}: {
factory: (exports, require) => {
${module.code}
},
map: ${JSON.stringify(module.map)}
}`;
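Since pack() only builds text, it can help to see that text for a single module and confirm it is valid JavaScript. This is a small sketch under assumptions: moduleInfo is a made-up module, wrapModule is a hypothetical helper that repeats the template above, and new Function is used purely to evaluate the generated text for the demo.

```javascript
// Hypothetical module info, shaped like the objects in the dependency graph
const moduleInfo = {
  id: 2,
  code: 'exports.default = function () { return "Hello from module2!"; };',
  map: {}
};

// Same template as in pack(): the module's code becomes the factory body
const wrapModule = mod => `${mod.id}: {
  factory: (exports, require) => {
    ${mod.code}
  },
  map: ${JSON.stringify(mod.map)}
}`;

// Surround the entry with braces to get a complete object literal
const objectText = `{${wrapModule(moduleInfo)}}`;
console.log(objectText);

// Evaluate the text to confirm it parses (demo only)
const modules = new Function(`return ${objectText};`)();

// Run the factory by hand: it fills in the exports object we pass in
const exportsObj = {};
modules[2].factory(exportsObj, () => {});
console.log(exportsObj.default()); // Hello from module2!
```

Because the transpiled code is pasted inside the function body, any exports or require it mentions refers to the factory's two parameters.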
Second, we created an IIFE to run the entire dependency graph. The next part might be confusing. I struggled to understand it initially, but with patience, it will make sense!
const iifeBundler = `(function(modules){
const require = id => {
const {factory, map} = modules[id];
const localRequire = requireDeclarationName => require(map[requireDeclarationName]);
const module = {exports: {}};
factory(module.exports, localRequire);
return module.exports;
}
require(0);
})({${moduleArgArr.join()}})
`;
- We are using the IIFE pattern to scope the variables so they do not affect global variables
- The dependency graph we created in the earlier section is passed as the argument (${moduleArgArr.join()})
- That dependency graph is received inside the IIFE as modules
- We created a require(id) function. This function has two effects:
  - It recursively calls itself with the ID of other dependencies via require(map[requireDeclarationName]). Recalling the map from earlier, a call like require('./module1.js') inside a module's code turns into something like require(1)
  - It executes the actual code from step 1 (createModuleInfo) when it runs factory(module.exports, localRequire)
- This function returns module.exports. Although it is initially empty ({exports: {}}), after running factory(), the value of module.exports is whatever the code inside factory assigned to exports, because both names point to the same object
Code Repo
The final code for this post can be found here for comparison.
The full code will look something like this:
const fs = require("fs");
const path = require("path");
const parser = require("@babel/parser"); // parses and returns AST
const traverse = require("@babel/traverse").default; // AST walker
const babel = require("@babel/core"); // main babel functionality
const resolve = require("resolve").sync; // get full path to dependencies
let ID = 0;
/*
* Given filePath, return module information
* Module information includes:
* module ID
* module filePath
* all dependencies used in the module (in array form)
* code inside the module
*/
function createModuleInfo(filePath) {
const content = fs.readFileSync(filePath, "utf-8");
const ast = parser.parse(content, {
sourceType: "module"
});
const deps = [];
traverse(ast, {
ImportDeclaration: ({ node }) => {
deps.push(node.source.value);
}
});
const id = ID++;
const { code } = babel.transformFromAstSync(ast, null, {
presets: ["@babel/preset-env"]
});
return {
id,
filePath,
deps,
code
};
}
/*
* Given entry path,
* returns an array containing information from each module
*/
function createDependencyGraph(entry) {
const entryInfo = createModuleInfo(entry);
const graphArr = [];
graphArr.push(entryInfo);
for (const module of graphArr) {
module.map = {};
module.deps.forEach(depPath => {
const baseDir = path.dirname(module.filePath);
const moduleDepPath = resolve(depPath, { basedir: baseDir }); // resolve's option is spelled "basedir"
const moduleInfo = createModuleInfo(moduleDepPath);
graphArr.push(moduleInfo);
module.map[depPath] = moduleInfo.id;
});
}
return graphArr;
}
/*
* Given an array containing information from each module
* return a bundled code to run the modules
*/
function pack(graph) {
const moduleArgArr = graph.map(module => {
return `${module.id}: {
factory: (exports, require) => {
${module.code}
},
map: ${JSON.stringify(module.map)}
}`;
});
const iifeBundler = `(function(modules){
const require = id => {
const {factory, map} = modules[id];
const localRequire = requireDeclarationName => require(map[requireDeclarationName]);
const module = {exports: {}};
factory(module.exports, localRequire);
return module.exports;
}
require(0);
})({${moduleArgArr.join()}})
`;
return iifeBundler;
}
console.log("***** Copy code below and paste into browser *****");
/* create dependency graph */
const graph = createDependencyGraph("./entry.js"); // wherever your entry is
/* create bundle based on dependency graph */
const bundle = pack(graph);
console.log(bundle);
console.log("***** Copy code above and paste into browser *****");
If we run node ./index.js, we'll get something like
(function(modules){
const require = id => {
const {factory, map} = modules[id];
const localRequire = requireDeclarationName => require(map[requireDeclarationName]);
const module = {exports: {}};
factory(module.exports, localRequire);
return module.exports;
}
require(0);
})({0: {
factory: (exports, require) => {
"use strict";
var _module = _interopRequireDefault(require("./module1.js"));
var _module2 = _interopRequireDefault(require("./module2.js"));
function _interopRequireDefault(obj) { return obj && obj.__esModule ? obj : { "default": obj }; }
(0, _module["default"])();
(0, _module2["default"])();
},
map: {"./module1.js":1,"./module2.js":2}
},1: {
factory: (exports, require) => {
"use strict";
Object.defineProperty(exports, "__esModule", {
value: true
});
exports["default"] = void 0;
var _module = _interopRequireDefault(require("./module2.js"));
function _interopRequireDefault(obj) { return obj && obj.__esModule ? obj : { "default": obj }; }
var module1 = function module1() {
(0, _module["default"])();
console.log("hello from module1!");
};
var _default = module1;
exports["default"] = _default;
},
map: {"./module2.js":3}
},2: {
factory: (exports, require) => {
"use strict";
Object.defineProperty(exports, "__esModule", {
value: true
});
exports["default"] = void 0;
var module2 = function module2() {
console.log("Hello from module2!");
};
var _default = module2;
exports["default"] = _default;
},
map: {}
},3: {
factory: (exports, require) => {
"use strict";
Object.defineProperty(exports, "__esModule", {
value: true
});
exports["default"] = void 0;
var module2 = function module2() {
console.log("Hello from module2!");
};
var _default = module2;
exports["default"] = _default;
},
map: {}
}})
Copy and paste that into the browser console and you'll see
Hello from module2!
hello from module1!
Hello from module2!
Congratulations! We have just built an entire bundler... from scratch!!
Bonus
In addition to creating an ES6 bundler, I attempted to create a bundler that handles both CJS and ES6: Bandler (NPM).
I won't go too deep here, but in addition to using Babel parser and Babel traverse, I used the detective library, which specifically searches for and lists all CJS require instances (ex: require('./your/lib.js')) in a project. I saw that Babel does not have a CJS syntax declaration here.
Can you think of some other ways to make CJS and ES6 bundler?
Resources, links, etc
Popular bundlers
Inspirations for this post
Readings on bundlers
- bundler overview
- create your own bundler - creator of wbpck-bundler mentioned above
- small list of popular js bundlers
- (Yet another) list of build tools