Node.js Architecture
Node, V8, libuv and C++
Node.js architecture in terms of Node's dependencies: Node depends on a couple of libraries in order to work properly.
The Node runtime has several dependencies, and the most important ones are the V8 engine and libuv.
Basically, Node.js is a JavaScript runtime environment built on Google's V8 engine. Without V8, Node could not understand the JavaScript we write, which makes V8 fundamental to the Node architecture.
The V8 engine converts JavaScript code into machine code that a computer can actually understand.
The V8 engine alone is not enough to build a server-side framework like Node, which is why Node also includes libuv.
libuv is an open-source library with a strong focus on asynchronous I/O (input/output). This layer gives Node access to the underlying operating system, file system and networking.
Besides this, libuv also implements two important features of Node.js: the event loop and the thread pool.
The event loop is responsible for handling easy tasks like executing callbacks and network I/O, while the thread pool is responsible for heavy work like file access or compression.
libuv is completely written in C++, and V8 itself also uses C++ besides JavaScript. Therefore, Node itself is a program written in C++ and JavaScript. Node.js ties all these libraries together, whether they are written in C++ or JavaScript, and gives developers access to their functions in pure JavaScript.
Therefore the Node architecture allows us to write 100 percent JavaScript code running in Node.js and still access functions like file reading, which are implemented in C++ inside libuv.
Node relies not only on V8 and libuv, but also on:
- http-parser for parsing HTTP
- c-ares for DNS lookups
- OpenSSL for cryptography
- zlib for compression
Processes, Threads and Thread pool
Node Process and Threads
When we use Node on a computer, there is a Node process running on that computer. A process is just a program in execution.
In this process, Node.js runs in a single thread, and a thread is basically a sequence of instructions.
Because Node runs in just one thread, it is easy to block Node applications.
When the program is initialized, all the top-level code is executed (code that is not inside any callback function), all the modules the app requires are loaded, all the callbacks are registered, and finally the event loop starts running.
Initialize program -> Execute top-level code -> Require modules -> Register event callbacks -> Event loop starts running
The event loop is where most of the work in the app is done. It is the heart of the entire Node architecture.
Some tasks are too heavy to be executed in the event loop because they could block the single thread. That is when the thread pool comes into action.
The thread pool, like the event loop, is provided by libuv.
The thread pool gives us 4 additional threads that are completely separate from the main single thread. We can configure it to use up to 128 threads, but in most cases the default 4 threads are enough.
The event loop can automatically offload heavy tasks to the thread pool, which consists of these 4 additional threads.
The heavy tasks that get offloaded to the thread pool include the following (see the sketch after this list):
- File system APIs
- Cryptography (hashing passwords)
- Compression
- DNS lookups.
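A minimal sketch, assuming the built-in crypto module, of how one of these tasks (password hashing with pbkdf2) is offloaded to the thread pool. The UV_THREADPOOL_SIZE setting, the loop count and the hashing parameters are illustrative assumptions, not fixed values:

const crypto = require('crypto');

// Optional: change the pool size (default is 4). On some platforms this must be
// set in the shell environment, before any thread-pool work has started.
process.env.UV_THREADPOOL_SIZE = 2;

const start = Date.now();
for (let i = 1; i <= 4; i++) {
  // pbkdf2 (password hashing) runs in the thread pool, not in the event loop
  crypto.pbkdf2('password', 'salt', 100000, 64, 'sha512', () => {
    console.log(`Hash ${i} done after ${Date.now() - start} ms`);
  });
}

With the default 4 threads, all four hashes finish at roughly the same time; with a smaller pool they finish in batches.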
Event loop
The event loop is where all the application code that is inside the callback functions is executed. Thus, all the code that is not top level code will run in the event loop.
Some heavy tasks will be offloaded to thread pool by event loop.
Node.js is built around callback functions: functions that are called as soon as some piece of work is finished some time in the future.
Node uses an event-driven architecture.
The event loop receives events each time something important happens and will then call the necessary callbacks as we define in our code.
The event loop does the orchestration, which means that it receives events, calls their callback functions, and offloads the heavier tasks to the thread pool.
Event loop in detail
When we start our Node application, the event loop starts running. The event loop has multiple phases, and each phase has a callback queue that holds the callbacks coming from the events the event loop receives.
The four important phases of the event loop:
- The first phase takes care of callbacks of expired timers.
Example: the setTimeout() function. If there are callback functions from timers that just expired, these are the first ones to be processed by the event loop. If a timer expires later, while one of the other phases is being processed, its callback will only be called once the event loop comes back to this first phase.
- The second phase is I/O polling and execution of I/O callbacks. Polling means looking for new I/O events that are ready to be processed and putting them in the callback queue. In Node apps, I/O mostly means networking and file access.
- The third phase is setImmediate callbacks. setImmediate is a special kind of timer whose callbacks we want processed immediately after the I/O polling and execution phase.
- The fourth phase is close callbacks. In this phase all the close events are processed, for example when a web server or a web socket shuts down.
Start -> Timers -> I/O polling and callbacks -> setImmediate callbacks -> Close callbacks
Besides these four phases, there are two special callback queues: the process.nextTick() queue and the other microtasks queue (for example, resolved promises). If any callbacks are waiting in these queues, they are executed right after the current phase, without waiting for the entire loop to finish.
Once the event loop completes all the phases, it checks whether there are any pending timers or I/O tasks. If there are none, the event loop exits the application. If there are pending tasks, the event loop keeps running and goes straight into the next tick (the next iteration of the loop).
Thus, the event loop is what makes asynchronous code possible in Node.js.
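A minimal sketch of how the phases and the nextTick queue affect ordering; the file being read is this script itself and the log labels are made up:

const fs = require('fs');

fs.readFile(__filename, () => {
  // This callback runs in the I/O phase of the event loop
  setTimeout(() => console.log('timer'), 0);        // timers phase (next iteration)
  setImmediate(() => console.log('immediate'));     // setImmediate phase, right after I/O
  process.nextTick(() => console.log('nextTick'));  // runs before moving to the next phase
});
// Expected order inside the I/O callback: nextTick, immediate, timer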
Don't block the event loop
- Don't use the sync versions of functions in the fs, crypto and zlib modules in your callback functions (see the sketch after this list).
- Don't perform complex calculations (e.g. loops inside loops)
- Be careful with JSON in large objects
- Don't use too complex regular expressions (e.g. nested quantifiers)
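A minimal sketch of the first rule, contrasting the blocking and non-blocking ways to read a file; 'large-file.txt' is a hypothetical file name:

const fs = require('fs');

// Blocking: readFileSync halts the single thread until the whole file is read
const blocking = fs.readFileSync('large-file.txt', 'utf-8');

// Non-blocking: readFile lets the event loop keep handling other callbacks
fs.readFile('large-file.txt', 'utf-8', (err, data) => {
  if (err) return console.error(err);
  console.log('File read without blocking the event loop');
});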
Event-Driven architecture
Most of Node's core modules, like HTTP, the file system and timers, are built around an event-driven architecture.
In Node there are certain objects called event emitters, which emit named events as soon as something happens in the app, like a request hitting the server, a timer expiring, or a file finishing being read.
These events will be picked up by the event listeners that we set up, which will fire off the callback functions attached to them.
In short, event emitters emit events, event listeners pick them up, and the listeners call the callback functions attached to them.
For example:
const http = require('http');

const server = http.createServer();

server.on('request', (req, res) => {
  console.log('Request Received');
  res.end('Request Received');
});

server.listen(8000); // listen on a port so requests can actually arrive (port is arbitrary)
In the above example, server.on() sets up the listener for the 'request' event.
When a new request is made, the server acts as an emitter and emits an event called 'request'. Since we have a listener set up for this event, the listener automatically fires off the callback function attached to it.
In the example, the callback function sends a message back to the client.
* The event emitter and event listener logic is called the OBSERVER PATTERN in JavaScript programming in general.
The OBSERVER PATTERN is designed to react rather than to call.
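A minimal sketch of the same observer pattern using Node's built-in events module; the 'sale' event name and the quantity value are made-up examples:

const EventEmitter = require('events');

const emitter = new EventEmitter();

// Listener (observer): reacts whenever the 'sale' event is emitted
emitter.on('sale', quantity => {
  console.log(`Sale event received for ${quantity} items`);
});

// Emitter (subject): emits the named event when something happens
emitter.emit('sale', 9);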
Streams
Streams are used to process (read or write) data piece by piece (in chunks), without completing the whole read or write operation, and therefore without keeping all the data in memory.
* Streams are perfect for handling large volumes of data, like videos.
* With streams, we get more efficient data processing in terms of memory (no need to keep all the data in memory) and time (we don't have to wait until all the data is available).
Node.js Streams Fundamentals
Streams are also instances of the EventEmitter class. That means all streams can emit and listen to named events.
There are four fundamental types of streams in Node.js:
* Readable streams
* Writable streams
* Duplex streams
* Transform streams
Readable Streams
Streams from which we can read(consume) data.
Eg: http requests, fs read streams.
Important events:
data (emitted when there is a new piece of data to consume)
end (emitted when there is no more data to consume)
Important Functions:
pipe() -- allows us to plug streams together, passing data from one stream to another.
read()
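A minimal sketch of consuming a readable stream with the data and end events; 'big-file.txt' is a hypothetical file name:

const fs = require('fs');

const readable = fs.createReadStream('big-file.txt', 'utf-8');

// 'data' fires for every chunk that becomes available
readable.on('data', chunk => {
  console.log(`Received a chunk of ${chunk.length} characters`);
});

// 'end' fires when there is no more data to consume
readable.on('end', () => console.log('No more data to consume'));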
Writable Streams
Streams to which we can write data.
Eg: http responses, fs write streams.
Important events:
drain
finish
Important Functions:
write()
end()
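A minimal sketch of writing to a writable stream with write() and end(); 'output.txt' is a hypothetical file name:

const fs = require('fs');

const writable = fs.createWriteStream('output.txt');

writable.write('first chunk\n');
writable.write('second chunk\n');
writable.end('last chunk\n'); // signals that nothing more will be written

// 'finish' fires once all the data has been flushed
writable.on('finish', () => console.log('All data has been written'));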
Duplex Streams
Streams that are both readable and writable.
Eg: web sockets from the net module, a communication channel between client and server.
Transform Streams
Duplex Streams that transform data as it is written or read.
Eg: zlib Gzip creation
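A minimal sketch of a transform stream, plugging a readable stream through zlib's gzip into a writable stream with pipe(); the file names are hypothetical:

const fs = require('fs');
const zlib = require('zlib');

// readable -> transform (gzip) -> writable, all plugged together with pipe()
fs.createReadStream('big-file.txt')
  .pipe(zlib.createGzip())                       // compresses each chunk as it flows through
  .pipe(fs.createWriteStream('big-file.txt.gz'));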
Node.js Modules
- In the Node.js module system, each JavaScript file is treated as a separate module.
- Node.js uses the CommonJS module system: require(), exports and module.exports.
- The ES module system is used in browsers: import/export.
- There have been attempts to bring ES modules to Node.js (.mjs), but these are not widely adopted yet.
What happens when we require() a module
- The path to the required module is resolved and the file is loaded.
- Then a process called wrapping takes place.
- Then the module code is executed.
- Then module exports are returned.
- Finally, the entire module gets cached.
1. Resolving & Loading
We can load three different types of module:
* Core modules
* Developer modules
* 3rd-party modules (from NPM like express)
Path Resolving - How Node decides which module to load
1. Start with core modules.
2. If the path begins with './' or '../', try to load a developer module.
3. If no file is found, try to find a folder with an index.js file in it.
4. Else, go to node_modules/ and try to find the module there.
If the module is not found anywhere, an error is thrown and execution stops.
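A minimal sketch of the three kinds of require() calls and how their paths are resolved; './utils' is a hypothetical local file, and express is used as the 3rd-party example from the list above:

// 1. Core module: resolved first, just by its name
const fs = require('fs');

// 2./3. Developer module: the path starts with './' or '../';
//       './utils' may resolve to utils.js or to utils/index.js
const utils = require('./utils');

// 4. 3rd-party module: looked up inside node_modules/
const express = require('express');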
2. Wrapping
After the module is loaded, the module code is wrapped in a special function, which gives it access to a couple of special objects.
One of these special objects is require, with which we can import modules. Each module has this require function passed into its wrapper function.
Each module also gets a private scope, which is achieved by wrapping its code in this special function.
The wrapper function receives the following objects (demonstrated in the sketch after this list):
* require - function to require modules.
* module - reference to the current module.
* exports - a reference to module.exports, used to export objects from a module.
* __filename - absolute path of the current module's file.
* __dirname - directory name of the current module.
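A minimal sketch showing these injected objects from inside any CommonJS module; because the module code is wrapped in a function, even arguments is available:

// Log the objects the wrapper function passes into this module
console.log(__filename); // absolute path of this file
console.log(__dirname);  // directory of this file
console.log(typeof require, typeof module, typeof exports); // 'function object object'

// The wrapper function's arguments object shows all five injected values
console.log(arguments);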
3. Execution
Code in the wrapper function is executed by the Node.js runtime.
4. Returning Exports
* The require function returns the exports of the required module.
* module.exports is the returned object.
* Use module.exports to export a single variable, like one class or one function (module.exports = Calculator).
* Use exports to export multiple named variables (both styles are sketched below).
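A minimal sketch of the two exporting styles; the file names calculator.js, utils.js and app.js are hypothetical:

// calculator.js - export a single value with module.exports
class Calculator {
  add(a, b) { return a + b; }
}
module.exports = Calculator;

// utils.js - export multiple named values with exports
exports.multiply = (a, b) => a * b;
exports.divide = (a, b) => a / b;

// app.js - require() returns whatever was exported
const Calculator = require('./calculator');
const { multiply, divide } = require('./utils');

console.log(new Calculator().add(2, 3));   // 5
console.log(multiply(2, 3), divide(6, 3)); // 6 2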
5. Caching
Modules are cached after they are loaded for the first time. If we require the same module again, the result is retrieved from the cache instead of the module code being executed again.
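A minimal sketch of caching in action; logger.js and app.js are hypothetical file names:

// logger.js
console.log('Module executed'); // top-level code runs only on the first require
module.exports = () => console.log('Hello from the logger');

// app.js
require('./logger')(); // logs "Module executed", then "Hello from the logger"
require('./logger')(); // logs only "Hello from the logger" - the module came from the cache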