Learn Node.js, Unit 3: A tour of Node.js
Node is often described as “JavaScript on the server”, but that doesn’t quite do it justice. In fact, any description of Node.js I can offer will be unfairly reductionist, so let me start with the one provided by the Node team:
“Node.js is a JavaScript runtime built on Chrome’s V8 JavaScript engine.”
Node.js Learning Path
This tutorial is part of the Node.js Learning Path. The units build on each other, so check out Learn Node.js, Unit 1: Overview of Node.js Learning Path to start at the beginning.
That’s a fine description, but it kinda needs a picture, doesn’t it? If you look on the Node.js website, you’ll notice there are no high-level diagrams of the Node.js architecture. Yet, if you search for “Node.js architecture diagram” there are approximately 178 billion different diagrams that attempt to paint an overall picture of Node (I’ll refer to Node.js as Node from now on). After looking at a few of them, I just didn’t see one that fit with the way I’ve structured the material in this course, so I came up with this:
Figure 1. The Node.js architecture stack
In the following sections, I use Figure 1 as the basis for discussion. Let’s look at each of these, starting with the JavaScript runtime.
But first, a little housekeeping.
How to read this document
In short: iteratively, and as many times as necessary before it makes sense. You owe it to yourself to understand the concepts presented here because I build on them in units 4 and 5. A good understanding of what makes Node tick will make you a better Node developer.
Node with its asynchronous architecture is challenging to describe because in order to accurately describe some parts, it’s necessary to talk about other parts that haven’t yet been described. For example, to describe callbacks (we’ll get to that), I need to talk about what they’re for, and how the Event Loop (we’ll get to that, too) invokes them when it’s finished doing behind the scenes whatever you’ve asked it to do through the Node API.
Event-driven, asynchronous architectures are inherently circular in nature (hence the Event Loop), as are their descriptions (the chicken and the egg come to mind). Just be patient, and most importantly: learn iteratively, and the concepts will become clear.
Please allow me to make the following suggestion: to be the best Node developer you can be, work through this unit, then Units 4 and 5. Then come back to this unit and work through it again to make sure you have a strong understanding of the concepts.
A word about ECMAScript
Ecma International (formerly the European Computer Manufacturers Association) is the standards body responsible for many IT standards, including ECMAScript, of which JavaScript is the most popular implementation.
The ECMAScript standard (also known as ECMA-262) is currently ES2017 (short for ECMAScript 2017). Node supports ES2015, which is also called ES6, since it was the sixth edition of the standard to be released. To learn more about ECMAScript, check out the Wikipedia page.
Chrome’s V8 JavaScript engine supports the current ECMA-262 (ES2017, aka ES8). At the time of this writing, Node supports ECMAScript through ES6 (ES2015), and it is always moving towards the latest version of ECMAScript supported by V8. The node.green page tracks the Node team’s progress in supporting the latest versions of ECMAScript.
While JavaScript is the language we use for writing Node applications, I want you to be aware that the ECMAScript specification governs the evolution of JavaScript as a programming language.
JavaScript in Node.js – maybe not the JavaScript you’re used to
One thing you’ll notice as you begin writing JavaScript applications in Node: on rare occasions, the JavaScript you’re used to writing just doesn’t work as you’d expect.
For example: global variables. I won’t get into the philosophical debate of “should I use global variables?” or not. You can find plenty of information on the web about that and make up your own mind.
Many JavaScript programmers are used to using global variables, and find that when they make the switch to Node, those variables don’t seem to work. The reason for this is that every JavaScript file in your Node application is its own scoped entity called a module. (Check out the “module wrapper” section in the Modules API documentation if you want to learn more about how Node does this.)
Here’s a situation you might encounter: you have two JavaScript files, A.js and B.js, where you declare a variable within A.js that you expect to be global, and try to reference it within B.js:
A.js
var someNumber = 238;
B.js
function sayIt() {
alert('The value of the global variable "someNumber" is: ' + someNumber);
}
someNumber.html
<script src="./A.js"></script>
<script src="./B.js"></script>
<script>
// Invoke the sayIt() function from B.js:
sayIt();
</script>
When I run this in Chrome (Version 67.0.3396.87), I see the alert, just as I would expect:
However, this doesn’t work in Node. The someNumber variable is scoped within A.js and is not visible in B.js. I’ll show you why it doesn’t work later in the course.
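The scoping behavior described above can be reproduced in a single script. The following is a minimal sketch, not the course's own code: it writes hypothetical stand-ins for A.js and B.js into a temporary directory, loads both with require(), and reports what each one can see. The temporary directory, the module.exports lines, and the use of typeof in place of alert() are all assumptions added for illustration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Create a scratch directory so the example does not touch your project files.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'module-scope-'));
const fileA = path.join(dir, 'A.js');
const fileB = path.join(dir, 'B.js');

// A.js declares the variable with var, then (illustrative addition) exports it.
fs.writeFileSync(fileA, 'var someNumber = 238;\nmodule.exports = { someNumber };\n');

// B.js reports whether it can see someNumber; typeof avoids a ReferenceError.
fs.writeFileSync(fileB, 'module.exports = function sayIt() { return typeof someNumber; };\n');

const a = require(fileA);
const sayIt = require(fileB);

const visibleInB = sayIt();       // what B.js sees
const exportedByA = a.someNumber; // what A.js chose to share

console.log('typeof someNumber inside B.js:', visibleInB);
console.log('someNumber exported by A.js:', exportedByA);

// Remove the scratch files.
fs.unlinkSync(fileA);
fs.unlinkSync(fileB);
fs.rmdirSync(dir);
```

Inside B.js the variable is reported as undefined, while the value is still reachable through whatever A.js explicitly exports.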
There are a handful of these subtle differences in the way you write JavaScript code for the browser, versus how you write JavaScript for Node. When we run across those in the course, I’ll make sure and point them out.
Now let’s get back to Figure 1.
The Node runtime
Node’s runtime is another term for the set of executable programs that actually run your Node applications. It is a combination of the following (middle of the stack, see Figure 1):
- Node API: JavaScript utilities like file and network I/O, and a whole host of others, like cryptography and compression
- The Node core: a set of JavaScript modules that implement the Node API. (Apparently some of the modules depend on libuv and other C++ code but that’s an implementation detail).
- JavaScript engine: Chrome’s V8 Engine: A fast JavaScript-to-machine code compiler to load, optimize, and run your JavaScript code
- The event loop: implemented using an event-driven, non-blocking I/O library called libuv to make it lightweight and efficient (and scalable)
Node API
The Node API is a set of built-in modules provided by Node.js out of the box for you to build your applications. Many of these modules, like the File System (fs) API, sit atop lower-level programs (the Node Core) that communicate with the underlying OS. When you use a Node API built-in module, you include it in your project by require()-ing it in your code, and then invoke its functions.
Some built-in modules you use in this course include:
The Node API sits atop the Node Core, which we’ll talk about next.
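As a small illustration of the require-then-invoke pattern, the sketch below loads two built-in modules, path and os, and calls a few of their functions. The choice of modules and the file path are only examples; they are not necessarily the modules used later in the course.

```javascript
// Built-in modules ship with Node, so no installation step is needed.
const path = require('path');
const os = require('os');

// Build a file path in a platform-independent way, then take it apart again.
const filePath = path.join('data', '50Words.txt');
const fileName = path.basename(filePath);
const extension = path.extname(filePath);

console.log('File name:', fileName);
console.log('Extension:', extension);
console.log('Platform reported by the os module:', os.platform());
```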
Node Core
The Node Core is the Node API plus a C++ program, built on multiple libraries, that binds with libuv (which we’ll talk about shortly) and the JavaScript engine (Chrome V8, which we’ll talk about in the next section).
Infrastructure
The Node runtime’s infrastructure is comprised of two major components:
- JavaScript engine
- A non-blocking I/O library
JavaScript engine
The JavaScript engine used by Node is Chrome’s V8 engine, which runs all of the JavaScript code (yours, the Node API’s, and any JavaScript in packages you get from the npm registry). When you start Node, it runs a single instance of the V8 engine. This may seem like a severe limitation, but Node makes it work (very well, actually), as you’ll see.
The V8 engine can be embedded (or bound) into any C++ program like Node, or a web browser like Chrome. This means that in addition to the pure JavaScript library, V8 can be extended to create brand new functions (or function templates) by binding them with V8. When a new function is registered, a pointer to a C++ method is passed, and when V8 runs across one of these new, custom JavaScript functions, it invokes the corresponding Template method. This is how many of the I/O functions of the Node API are implemented by the Node Core (just in case you were wondering).
Just V8 as the JavaScript Engine?
In theory, Node can be modified to use any JavaScript engine, but V8 is the one you will use by default (and to be perfectly honest, it’s pretty tightly bound to V8). That said, others have explored alternatives to V8, and if you’re interested in Node on other leading JavaScript engines, check out the node-chakracore project, and the spidernode project. I expect someday that the JavaScript engine will be a pluggable component of Node (though that day may be a long way off).
Event loop
A CPU can run your program’s instructions much faster than data can be retrieved from I/O devices like disk or the network. But without the data these I/O devices provide, your program can’t do its job. Since V8 runs in a single thread, everything (your program and anything else running in that instance, or context, of the V8 engine) is blocked until the I/O operation is complete and the data is available.
Node uses libuv as the event loop implementation. To use a Node asynchronous API, you pass a callback function as an argument to that API function, and during the event loop your callback is executed.
The event loop consists of various phases where callbacks are invoked:
- Timers phase: setInterval() and setTimeout() expired timer callbacks are run
- Poll phase: The OS is polled to see if any I/O operations are complete, and, if so, those callbacks are run
- Check phase: setImmediate() callbacks are run
The JavaScript code you write executes in one of two “lines” of execution:
- The mainline – this is the JavaScript that runs when Node first runs your program. It will run from start to finish and when it is finished, gives up control to the event loop
- The event loop – this is where all of your callbacks are run
A common misconception is that V8 and event loop callbacks run on different threads. That is not the case. All JavaScript code is run by V8 in the same thread.
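The single-thread point can be checked with a short experiment. In this illustrative sketch (the 50-millisecond figure is arbitrary), a timer is set to expire immediately, but the mainline then keeps the thread busy. The callback cannot run until the mainline finishes, because both share the one V8 thread.

```javascript
let callbackRan = false;
const start = Date.now();

// Ask for the callback as soon as possible (0 ms).
setTimeout(function () {
  callbackRan = true;
  console.log('Callback ran after ' + (Date.now() - start) + ' ms');
}, 0);

// Keep the mainline busy for about 50 ms. Nothing else can run meanwhile.
while (Date.now() - start < 50) {
  // busy-wait
}

// Still on the mainline: the expired timer has not been serviced yet.
const ranBeforeMainlineEnded = callbackRan;
console.log('Callback ran before mainline ended? ' + ranBeforeMainlineEnded);
```

The second console.log() prints first, and the elapsed time reported by the callback is at least as long as the busy-wait.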
Until you get your head wrapped around how Node uses V8 to run your JavaScript code, it will cause you no end of grief when trying to debug weird timing problems. Keep that in mind as you move through the learning path. If reality isn’t matching up with your mental model of the event loop, it’s probably because your mental model is wrong. It’s okay, just keep working at it (remember, learn iteratively).
The thread pool
The event loop is what allows Node to provide its impressive scalability, primarily through its non-blocking I/O model. But let’s suppose the Node API provides some functionality that is not I/O-intensive but, rather, is CPU-intensive. Can the event loop help? Absolutely! libuv uses a pool of threads called the worker pool (a.k.a. the thread pool) to offload both I/O and CPU-intensive tasks.
Whenever V8 detects a bound Node API function call, it invokes the Node function template, which passes control to libuv, which can then offload the work to the worker pool. Again, when you make the Node API call, you pass it a callback, and when the worker pool thread has completed the work, the Event Loop requests that V8 invoke your callback function with the results.
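As a hedged illustration of a CPU-intensive call being offloaded, the sketch below uses the asynchronous crypto.pbkdf2() function, which Node's documentation lists among the APIs that use the libuv thread pool. The password, salt, iteration count, and key length are placeholder values chosen only for this example.

```javascript
const crypto = require('crypto');

let derivedKeyLength = null; // filled in by the callback
console.log('Requesting key derivation...');

// The asynchronous form hands the CPU-heavy work to a worker-pool thread.
crypto.pbkdf2('password', 'salt', 100000, 32, 'sha256', function (err, key) {
  if (err) throw err;
  derivedKeyLength = key.length;
  console.log('Callback: derived a ' + derivedKeyLength + '-byte key');
});

// The mainline continues immediately; the key is not available yet.
const keyReadyOnMainline = derivedKeyLength !== null;
console.log('Key ready on the mainline? ' + keyReadyOnMainline);
```

The mainline reports that the key is not ready, and the callback's message appears only after the worker thread finishes and the event loop invokes the callback.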
We will look much deeper at the thread pool in Unit 5 of this learning path.
Userland
The term “userland” refers to code that is outside of the platform, runtime, or library that we’re using to run that code. In the context of Node, userland code is anything that is not provided by the Node runtime, such as:
Application – At the top of the stack is your application, which you write in JavaScript using the Node API and (probably) some npm community contribs (packages).
npm contribs – While you can write an entire application from scratch, I don’t recommend it. Instead, you will leverage the work of thousands of your fellow developers in the Node community, whose contributions are the heart and soul of Node. As you become more expert at using Node, you will use these extensively.
We’ll talk about npm (in particular the npm utility) extensively in unit 7.
The REPL
Okay, enough of this theoretical stuff. Are you ready to write some JavaScript code?
When you install Node, you automatically get a Read-Eval-Print-Loop (REPL) environment as well. It’s not some kind of Node emulator. It is Node, just a bit more laid back, and is the perfect place to start.
Open a terminal window or command prompt, type node, and press Enter. You should see something like this:
$ node
>
Not very exciting, I admit. Type .help and press Enter and you’ll see this:
> .help
.break Sometimes you get stuck, this gets you out
.clear Alias for .break
.editor Enter editor mode
.exit Exit the repl
.help Print this help message
.load Load JS from a file into the REPL session
.save Save all evaluated commands in this REPL session to a file
>
You can type one of two things into a REPL line that it understands: a line of JavaScript code or one of the above commands. They’re pretty self-explanatory, but if you want to learn more about them, check out the Node REPL API doc.
Enter the following lines into the REPL, one at a time (make sure to press the Enter key between each line):
Example 1. Hello World
let hello = "Hello"
let world = "world"
hello + ' ' + world
You should see the following output:
> let hello = "Hello"
undefined
> let world = "world"
undefined
> hello + ' ' + world
'Hello world'
>
The REPL prints undefined for the first two statements (since they don’t evaluate to anything). When you type the expression hello + ' ' + world the REPL evaluates that to the string 'Hello world'.
To exit the REPL, type .exit and press Enter.
The REPL has an “editor” mode. Go back into the REPL, then type .editor and press Enter. Then enter the Example 1 code again. When you’re ready for the REPL to run it, press Ctrl+D. You’ll see output like this:
$ node
> .editor
// Entering editor mode (^D to finish, ^C to cancel)
let hello = "Hello"
let world = "world"
hello + ' ' + world
'Hello world'
>
If you need to enter a multi-line statement, the REPL detects this when it runs across an unmatched left brace ({) and prints an ellipsis (...) to indicate this. For example, enter the following code one line at a time into the REPL:
Example 2. Multiline statements
var array = ['Hello' , ' ', 'there', ' ', 'REPL'];
var message = '';
for (let word of array) {
message += word;
}
message += '!';
You should see output like this:
$ node
> var array = ['Hello' , ' ', 'there', ' ', 'REPL'];
undefined
> var message = '';
undefined
> for (let word of array) {
... message += word;
... }
'Hello there REPL'
> message += '!';
'Hello there REPL!'
>
If you want to run a script for which you have a JavaScript file, you can use the .load command to load it into the REPL.
Exit the REPL, navigate to the directory where you cloned the source code for the course (see Unit 2 if you need a refresher), navigate to the IBM-Developer/Node.js/Course/Unit-3 directory, and then start the REPL (I cloned the code in the src/projects directory immediately subordinate to my home folder):
$ cd src/projects/IBM-Developer/Node.js/Course/Unit-3/
$ node
> .load example2.js
var array = ['Hello' , ' ', 'there', ' ', 'REPL'];
var message = '';
for (let word of array) {
message += word;
}
message += '!';
'Hello there REPL!'
>
Note: When you run the .load command on example2.js, you’ll see an Apache 2.0 copyright statement that I’ve removed from the listing above to save space.
It’s time to write your first program that uses the Node API. In this case, the File System API.
Exit and then start the REPL again to get a fresh Node instance.
Enter the following lines of code into the REPL, either one line at a time or in editor mode (or you can load it from example3.js using the .load command):
var fs = require('fs');
var fileContents = fs.readFileSync('../data/50Words.txt', 'utf8');
var numberOfWords = fileContents.split(/[ ,.\n]+/).length;
Have the REPL evaluate the numberOfWords variable, then the fileContents variable. You should see output like this (I’m using editor mode for this example):
$ node
> .editor
// Entering editor mode (^D to finish, ^C to cancel)
var fs = require('fs');
var fileContents = fs.readFileSync('../data/50Words.txt', 'utf8');
var numberOfWords = fileContents.split(/[ ,.\n]+/).length;
undefined
> numberOfWords
51
> fileContents
'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Praesent scelerisque libero nec nulla aliquet, faucibus efficitur massa sollicitudin. Aliquam hendrerit hendrerit est, sed dictum lectus. Nulla eu placerat elit, at volutpat dui. Mauris gravida tortor quis tempus posuere. Vestibulum ultrices leo quis nisi suscipit pellentesque. Nullam congue maximus odio, eu.'
>
As you can see, the REPL prints the value of numberOfWords and fileContents when you ask it to (note: the return value of readFileSync() is a String containing the file’s contents). What do you think happens if you ask the REPL to evaluate the fs variable? If you answered that it will print a readable string representation of the object (similar in appearance to JavaScript Object Notation, or JSON, though not strictly JSON), you are correct. Go ahead and try it (I won’t show the output here to save space).
To summarize the REPL, it’s:
- A non-graphical, interactive environment
- Good for learning and prototyping
For more information about the REPL, see the REPL API documentation.
Non-Blocking I/O
We’ve looked briefly at the non-blocking I/O model and how the event loop invokes the callback you provide when it finishes whatever operation you’ve asked the Node API to perform.
In this section, you write some code. I recommend you browse the source code, and run the examples in VSCode. Make sure in VSCode to File > Open on the directory that corresponds to the Unit you’re working on (Unit 3 in this case), and the examples will run just fine in VSCode. If you don’t, you’ll have to adjust the first argument to fs.readFile() in the source code accordingly.
Asynchronous I/O
- In VSCode, choose File > Open, and select the IBM-Developer/Node.js/Course/Unit-3 directory. Then open example3.js, which looks like this (I’ve removed the comments to save space, and added line numbers to the listing below):
Example 3. Read a file (synchronously)
01 var fs = require('fs');
02 console.log('Starting program...');
03 var fileContents = fs.readFileSync('../data/50Words.txt', 'utf8');
04 var numberOfWords = fileContents.split(/[ ,.\n]+/).length;
05 console.log('There are ' + numberOfWords + ' words in this file.');
06 console.log('Program finished.');
This output is produced (I’ve elided the prologue lines from the VSCode DEBUG OUTPUT window to save space):
. . .
Debugger attached.
Starting program...
There are 51 words in this file.
Program finished.
Again, let’s walk through the (relevant) code.
- Line 2: The console.log() call on the mainline shows up on the console, just as we would expect.
- Line 3: Ask Node to read the file synchronously, so the V8 thread (of which there is but one, remember) will be blocked until the file has been read, and the file’s contents are returned from the readFileSync() call.
- Lines 5-6: The console.log() statement on line 5 is executed next, followed by another on line 6.
So, that explains the order of the output. Contrast this to example4.js below. Again, I’ve elided the comments to save space and added line numbers:
Example 4. Read a file (asynchronously)
```
01 var fs = require('fs');
02 console.log('Starting program...');
03 fs.readFile('../data/50Words.txt', 'utf8', function(err, fileContents) {
04 if (err) throw err;
05 let numberOfWords = fileContents.split(/[ ,.\n]+/).length;
06 console.log('There are ' + numberOfWords + ' words in this file.');
07 });
08 console.log('Program finished');
```
Click the Debug View on the left side of VSCode, and make sure example4.js is open and active in the editor. Then click the Run button. The Debugger window pops up at the bottom of VSCode, and you should see output like this in the DEBUG CONSOLE window:
/Users/sperry/.nvm/versions/node/v10.4.1/bin/node --inspect-brk=20417 example4.js
Debugger listening on ws://127.0.0.1:20417/de1172f6-f226-44c7-9d0d-fd8bd1691544
For help, see: https://nodejs.org/en/docs/inspector
Debugger attached.
Starting program...
Program finished
There are 51 words in this file.
What’s going on? Let’s walk through example4.js from top to bottom and see.
- Line 1: A reference to the fs module is retrieved through the require() function.
- Line 2: The console.log() call on the mainline shows up on the console, just as we would expect.
- Line 3: The mainline call to fs.readFile() is made, passing the file to be read (../data/50Words.txt), the file’s encoding ('utf8'), and an anonymous callback function, which will be executed after the file has been read.
- Line 8: Since all code on the mainline executes from start to finish before the event loop runs, the console.log() on line 8 executes on the mainline.
- Lines 4-6: When the file has been read, the event loop invokes the callback, which eventually calls the console.log() on line 6.
I encourage you to play around with examples 3 and 4, run them, and get a good feeling for how they’re different, and why the console output comes out differently.
When to use synchronous I/O
Synchronous I/O blocks the V8 thread until the I/O operation completes, as you saw in example 3.
Why would you ever use an I/O call that blocks the V8 thread?
Well, there are times when doing synchronous I/O is just fine. In fact, more often than not, synchronous I/O is faster than asynchronous I/O because of the overhead involved in setting up and using callbacks, polling the OS for I/O status, and so on.
Let’s suppose you’re using Node to write a one-off utility that just processes a file (you do this in Unit 6). You fire up Node from the command line and pass it your JavaScript utility’s file name. Your utility is the only thing running, so if it blocks the V8 thread, who cares? In this case, using synchronous I/O is fine.
I suggest this rule of thumb: If other code needs to run in the background while your I/O operation is running, use asynchronous I/O. If not, use synchronous I/O. If you’re not sure, use asynchronous I/O. In fact, the Node design philosophy is that “an API should always be asynchronous even where it doesn’t have to be”.
Use synchronous Node API calls conscientiously, and you’ll be fine.
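The one-off utility case might look something like the sketch below. It is an assumption-laden illustration rather than the utility built in Unit 6: the countWords() helper, the sample text, and the temporary file are all hypothetical. Unlike the course's examples, it adds filter(Boolean) to discard the empty string that split() produces when the text ends with a separator, so its count can differ by one from a plain split().

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Count words by splitting on spaces, commas, periods, and newlines.
// filter(Boolean) drops empty strings left by a trailing separator.
function countWords(filePath) {
  const contents = fs.readFileSync(filePath, 'utf8'); // blocks until read
  return contents.split(/[ ,.\n]+/).filter(Boolean).length;
}

// Demo input: a throwaway file in the temp directory (hypothetical content).
const samplePath = path.join(os.tmpdir(), 'sync-io-sample-' + process.pid + '.txt');
fs.writeFileSync(samplePath, 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n');

const wordCount = countWords(samplePath);
console.log('There are ' + wordCount + ' words in ' + path.basename(samplePath));

fs.unlinkSync(samplePath); // clean up the demo file
```

Because the script does nothing else while the file is being read, blocking the V8 thread here costs nothing, and the code reads top to bottom without callbacks.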
npm package ecosystem
In the early Node days, npm was treated as an acronym for Node Package Manager. It’s probably still okay to think of npm that way, but along the way the expansion was dropped (to be clear, npm is officially NOT an acronym).
Not sure which modules to use in your application? Check out the CloudNativeJS.io Module Insights page for more information.
According to the npmjs.com website: “npm is the package manager for JavaScript”.
The more you use Node, the more you come to rely on the contributions of your fellow Node developers, who have contributed thousands of modules to the central registry at npmjs.com. To use these modules (also called packages) in your Node project, you install them (using npm install) and then require() them in whichever of your JavaScript programs needs them.
We will look at npm and managing your Node project’s dependencies in more detail in Units 7 and 8.
Conclusion to Unit 3
In this tutorial you learned about:
- The architecture of Node.js and how it is comprised of:
  - The Node runtime
  - Userland
- You saw how the Node runtime is comprised of:
  - The event loop (libuv)
  - A Chrome V8 JavaScript engine
- You worked with the REPL
- You learned a little about npm
In Unit 4, you dive deeper into Node’s asynchronous programming style.
Test your understanding
Take this quiz to test your Node knowledge. Answers are below.
True or False
True/False: Node runs your JavaScript code in multiple threads to take advantage of the Event Loop’s thread pool.
True/False: JavaScript callbacks are invoked by the Chrome V8 engine running in multiple, parallel threads to achieve JavaScript scalability.
Choose the best answer
Chrome V8 is:
A – A refreshing vegetable drink enjoyed by Node developers
B – A type of combustion engine used by Node developers so they are never late to meetings
C – A high-performance JavaScript engine used by Node
D – None of the above
The Event Loop:
A – Enables the Node non-blocking I/O model
B – Has a single, multi-threaded phase: the I/O sub-loop
C – Consists of multiple phases during which callbacks may be invoked
D – A only
E – A and B
F – A and C
G – None of the above
The REPL:
A – Is not really Node, but a separate program that you can use to build prototypes
B – Stands for REPLicate, and is used to make copies of, and compile JavaScript for the V8 engine to run
C – Doesn’t mean anything, Steve, you are just making stuff up.
D – A console-based, interactive Node environment where you can run JavaScript code
E – A and B
F – C and D
G – None of the above
The Event Loop:
A – Runs forever until the V8 engine explicitly instructs it to exit
B – Runs as long as there are callbacks to invoke, then exits
C – Runs once and then sleeps for 10 seconds, wakes up and runs again, repeat forever
D – None of the above
Fill in the Blank
I/O that causes the V8 thread of execution to wait until the results are available is referred to as _ I/O.
I/O that provides the Event Loop with a callback it will invoke when the I/O operation is complete is referred to as _ I/O.
Choose the best answer from the following
A. Synchronous B. Asynchronous
Blocking I/O is commonly referred to as _ I/O.
Non-Blocking I/O is commonly referred to as __ I/O.
Callbacks are used to support the __ programming technique.
Study the following listing and provide your answer
00 // Top of file
01 var fs = require('fs');
02 var fileContents = fs.readFileSync('../data/50Words.txt', 'utf8');
03 console.log('Synchronous read finished.');
04 //
05 fs.readFile('../data/50Words.txt', 'utf8', function(err, fileContents) {
06 if (err) throw err;
07 console.log('File contents: ' + fileContents);
08 });
09 console.log('Program finished');
10 // Bottom of file
Which lines of code run on the JavaScript mainline? _
Which lines of code run on the Event Loop? _
Which lines of code are run by the V8 engine? _
Check your answers
Answers to true or false questions
False. Your JavaScript code runs in a single thread. Behind the scenes, the requests your JavaScript makes of the Node API (and other C++ extensions) may (but will not necessarily) be run in parallel by libuv.
False. JavaScript callbacks are invoked by the Event Loop, and run by the V8 engine.
Answers to multiple choice questions
C – Chrome V8 is a high-performance JavaScript engine used by Node. All of your JavaScript code is run by the Chrome V8 JavaScript engine.
F – The event loop is what enables Node’s non-blocking I/O model, and consists of multiple phases, during which callbacks are invoked (that is, if there are any to invoke).
D – The REPL stands for Read-Eval-Print-Loop and is a fully functional Node environment that is console-based and interactive. You can use the REPL to quickly test JavaScript syntax, build prototypes, load and execute JavaScript files, and lots more.
B – The Event Loop runs only while there is something for it to do: namely, to execute any callbacks you have given it to invoke. When there are no more callbacks to invoke, it exits.
Answers for fill in the blank questions
Blocking – The V8 thread must wait for the I/O operation to complete, and is blocked during that time, until the results are available.
Non-Blocking – By providing a callback function, the JavaScript code receives the results later (asynchronously) when the I/O operation completes and the Event Loop invokes the callback function.
Answers to Choose One questions
A – Blocking I/O is synchronous, since it blocks the V8 engine from running other JavaScript code.
B – Non-Blocking I/O is asynchronous in nature, since it does not block the V8 engine from running other JavaScript code while the I/O is performed in the background.
B – Callbacks are a coding technique used to support asynchronous (non-blocking) I/O.
Answers to code questions
00 // Top of file
01 var fs = require('fs');
02 var fileContents = fs.readFileSync('../data/50Words.txt', 'utf8');
03 console.log('Synchronous read finished.');
04 //
05 fs.readFile('../data/50Words.txt', 'utf8', function(err, fileContents) {
06 if (err) throw err;
07 console.log('File contents: ' + fileContents);
08 });
09 console.log('Program finished');
10 // Bottom of file
Lines 1-5 and 9 – These lines of JavaScript (including the call to fs.readFile()) will be executed as soon as the file is loaded by Node.
Lines 6-7 – These lines of code are in an anonymous callback that is executed by the poll phase of the Event Loop as soon as the file has been read.
Lines 1-9 – The V8 engine runs all of the JavaScript code in your Node application.