Preface

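I recently needed to build a JavaScript game engine for Android on top of OpenGL ES and Google's V8 engine, so I studied and practiced in some depth how to compile and build V8, embed it, and run JS code with it. A simple demo engine is already on GitHub. During the implementation, however, a serious problem came up: when several game JS files (modules) require one another, my initial require function could not cope, because it did not handle circular dependencies and ended up in infinite loops. After consulting many references, I decided to model the engine's require mechanism on the require mechanism and module system of NodeJS. To understand NodeJS module loading and require properly, one has to analyze the NodeJS startup flow and its underlying implementation from the ground up. This article therefore first walks through the NodeJS startup source code, and then implements its own require and module loading mechanism based on how NodeJS loads modules.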

NodeJS startup and loading analysis

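This article uses the master branch of the NodeJS source as of the commit dated 2018-06-07, SHA1 22c826f5aa3811e758686fd00a8fe15728f6fc37, as its baseline. (After cloning the NodeJS repository from GitHub, run git checkout 22c826f5aa3811e758686fd00a8fe15728f6fc37 in the local repository directory to get the exact source version used here.) Against that baseline it traces the NodeJS startup flow and module loading mechanism, that is, the source-level logic behind a command such as "node app.js".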

Entry code (node_main.cc)

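1. node_main.cc: the entry code, located in the src subdirectory of the NodeJS source tree. (The Unix and Linux version is shown here; on Windows the corresponding entry point is int wmain(int argc, wchar_t* wargv[]) at line 30.)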

94     int main(int argc, char* argv[]) {
95 #if defined(__POSIX__) && defined(NODE_SHARED_MODE)
96 // In node::PlatformInit(), we squash all signal handlers for non-shared lib
97 // build. In order to run test cases against shared lib build, we also need
98 // to do the same thing for shared lib build here, but only for SIGPIPE for
99 // now. If node::PlatformInit() is moved to here, then this section could be
100 // removed.
101 {
102 struct sigaction act;
103 memset(&act, 0, sizeof(act));
104 act.sa_handler = SIG_IGN;
105 sigaction(SIGPIPE, &act, nullptr);
106 }
107 #endif
108
109 #if defined(__linux__)
110 char** envp = environ;
111 while (*envp++ != nullptr) {}
112 Elf_auxv_t* auxv = reinterpret_cast<Elf_auxv_t*>(envp);
113 for (; auxv->a_type != AT_NULL; auxv++) {
114 if (auxv->a_type == AT_SECURE) {
115 node::linux_at_secure = auxv->a_un.a_val;
116 break;
117 }
118 }
119 #endif
120 // Disable stdio buffering, it interacts poorly with printf()
121 // calls elsewhere in the program (e.g., any logging from V8.)
122 setvbuf(stdout, nullptr, _IONBF, 0);
123 setvbuf(stderr, nullptr, _IONBF, 0);
124 return node::Start(argc, argv);
125 }

As you can see, the key line is line 124. The Start function it calls is defined in node.cc.

Main startup logic (node.cc)

2. node.cc: this is where the key loading and startup logic lives; it is in the src subdirectory of the NodeJS source tree.

4180    int Start(int argc, char** argv) {
4181 atexit([] () { uv_tty_reset_mode(); });
4182 PlatformInit();
4183 performance::performance_node_start = PERFORMANCE_NOW();
4184
4185 CHECK_GT(argc, 0);
4186
4187 // Hack around with the argv pointer. Used for process.title = "blah".
4188 argv = uv_setup_args(argc, argv);
4189
4190 // This needs to run *before* V8::Initialize(). The const_cast is not
4191 // optional, in case you're wondering.
4192 int exec_argc;
4193 const char** exec_argv;
4194 Init(&argc, const_cast<const char**>(argv), &exec_argc, &exec_argv);
4195
4196 #if HAVE_OPENSSL
4197 {
4198 std::string extra_ca_certs;
4199 if (SafeGetenv("NODE_EXTRA_CA_CERTS", &extra_ca_certs))
4200 crypto::UseExtraCaCerts(extra_ca_certs);
4201 }
4202 #ifdef NODE_FIPS_MODE
4203 // In the case of FIPS builds we should make sure
4204 // the random source is properly initialized first.
4205 OPENSSL_init();
4206 #endif // NODE_FIPS_MODE
4207 // V8 on Windows doesn't have a good source of entropy. Seed it from
4208 // OpenSSL's pool.
4209 V8::SetEntropySource(crypto::EntropySource);
4210 #endif // HAVE_OPENSSL
4211
4212 v8_platform.Initialize(v8_thread_pool_size);
4213 V8::Initialize();
4214 performance::performance_v8_start = PERFORMANCE_NOW();
4215 v8_initialized = true;
4216 const int exit_code =
4217 Start(uv_default_loop(), argc, argv, exec_argc, exec_argv);
4218 v8_platform.StopTracingAgent();
4219 v8_initialized = false;
4220 V8::Dispose();
4221
4222 // uv_run cannot be called from the time before the beforeExit callback
4223 // runs until the program exits unless the event loop has any referenced
4224 // handles after beforeExit terminates. This prevents unrefed timers
4225 // that happen to terminate during shutdown from being run unsafely.
4226 // Since uv_run cannot be called, uv_async handles held by the platform
4227 // will never be fully cleaned up.
4228 v8_platform.Dispose();
4229
4230 delete[] exec_argv;
4231 exec_argv = nullptr;
4232
4233 return exit_code;
4234 }

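Here we can clearly see that NodeJS initializes the V8 engine and then, at the key line 4217, jumps into the event loop that NodeJS wraps (NodeJS is a single-threaded JS execution environment built on an event loop). Start(uv_default_loop(), argc, argv, exec_argc, exec_argv); is defined at line 4135 of node.cc, as follows: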

4135    inline int Start(uv_loop_t* event_loop,
4136 int argc, const char* const* argv,
4137 int exec_argc, const char* const* exec_argv) {
4138 std::unique_ptr<ArrayBufferAllocator, decltype(&FreeArrayBufferAllocator)>
4139 allocator(CreateArrayBufferAllocator(), &FreeArrayBufferAllocator);
4140 Isolate* const isolate = NewIsolate(allocator.get());
4141 if (isolate == nullptr)
4142 return 12; // Signal internal error.
4143
4144 {
4145 Mutex::ScopedLock scoped_lock(node_isolate_mutex);
4146 CHECK_NULL(node_isolate);
4147 node_isolate = isolate;
4148 }
4149
4150 int exit_code;
4151 {
4152 Locker locker(isolate);
4153 Isolate::Scope isolate_scope(isolate);
4154 HandleScope handle_scope(isolate);
4155 std::unique_ptr<IsolateData, decltype(&FreeIsolateData)> isolate_data(
4156 CreateIsolateData(
4157 isolate,
4158 event_loop,
4159 v8_platform.Platform(),
4160 allocator.get()),
4161 &FreeIsolateData);
4162 if (track_heap_objects) {
4163 isolate->GetHeapProfiler()->StartTrackingHeapObjects(true);
4164 }
4165 exit_code =
4166 Start(isolate, isolate_data.get(), argc, argv, exec_argc, exec_argv);
4167 }
4168
4169 {
4170 Mutex::ScopedLock scoped_lock(node_isolate_mutex);
4171 CHECK_EQ(node_isolate, isolate);
4172 node_isolate = nullptr;
4173 }
4174
4175 isolate->Dispose();
4176
4177 return exit_code;
4178 }

The key point of this function is line 4166, Start(isolate, isolate_data.get(), argc, argv, exec_argc, exec_argv);. That function is defined at line 4029 of node.cc, as follows:

4029    inline int Start(Isolate* isolate, IsolateData* isolate_data,
4030 int argc, const char* const* argv,
4031 int exec_argc, const char* const* exec_argv) {
4032 HandleScope handle_scope(isolate);
4033 Local<Context> context = NewContext(isolate);
4034 Context::Scope context_scope(context);
4035 Environment env(isolate_data, context, v8_platform.GetTracingAgent());
4036 env.Start(argc, argv, exec_argc, exec_argv, v8_is_profiling);
4037
4038 TRACE_EVENT_METADATA1("__metadata", "version", "node", NODE_VERSION_STRING);
4039 TRACE_EVENT_METADATA1("__metadata", "thread_name", "name",
4040 "JavaScriptMainThread");
4041
4042 const char* path = argc > 1 ? argv[1] : nullptr;
4043 StartInspector(&env, path, debug_options);
4044
4045 if (debug_options.inspector_enabled() && !v8_platform.InspectorStarted(&env))
4046 return 12; // Signal internal error.
4047
4048 env.set_abort_on_uncaught_exception(abort_on_uncaught_exception);
4049
4050 if (no_force_async_hooks_checks) {
4051 env.async_hooks()->no_force_checks();
4052 }
4053
4054 {
4055 Environment::AsyncCallbackScope callback_scope(&env);
4056 env.async_hooks()->push_async_ids(1, 0);
4057 LoadEnvironment(&env);
4058 env.async_hooks()->pop_async_id(1);
4059 }
4060
4061 env.set_trace_sync_io(trace_sync_io);
4062
4063 {
4064 SealHandleScope seal(isolate);
4065 bool more;
4066 env.performance_state()->Mark(
4067 node::performance::NODE_PERFORMANCE_MILESTONE_LOOP_START);
4068 do {
4069 uv_run(env.event_loop(), UV_RUN_DEFAULT);
4070
4071 v8_platform.DrainVMTasks(isolate);
4072
4073 more = uv_loop_alive(env.event_loop());
4074 if (more)
4075 continue;
4076
4077 RunBeforeExit(&env);
4078
4079 // Emit `beforeExit` if the loop became alive either after emitting
4080 // event, or after running some callbacks.
4081 more = uv_loop_alive(env.event_loop());
4082 } while (more == true);
4083 env.performance_state()->Mark(
4084 node::performance::NODE_PERFORMANCE_MILESTONE_LOOP_EXIT);
4085 }
4086
4087 env.set_trace_sync_io(false);
4088
4089 const int exit_code = EmitExit(&env);
4090
4091 WaitForInspectorDisconnect(&env);
4092
4093 env.set_can_call_into_js(false);
4094 env.stop_sub_worker_contexts();
4095 uv_tty_reset_mode();
4096 env.RunCleanup();
4097 RunAtExit(&env);
4098
4099 v8_platform.DrainVMTasks(isolate);
4100 v8_platform.CancelVMTasks(isolate);
4101 #if defined(LEAK_SANITIZER)
4102 __lsan_do_leak_check();
4103 #endif
4104
4105 return exit_code;
4106 }

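Here, line 4036, env.Start(argc, argv, exec_argc, exec_argv, v8_is_profiling);, sets up the startup environment, that is, the Environment object and its process object. (Command-line options such as node --help are parsed earlier, inside Init, before V8 is initialized.) The real core startup logic is at line 4057, LoadEnvironment(&env);. That function is also defined in node.cc, at line 2870, as follows: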

2870    void LoadEnvironment(Environment* env) {
2871 HandleScope handle_scope(env->isolate());
2872
2873 TryCatch try_catch(env->isolate());
2874 // Disable verbose mode to stop FatalException() handler from trying
2875 // to handle the exception. Errors this early in the start-up phase
2876 // are not safe to ignore.
2877 try_catch.SetVerbose(false);
2878
2879 // The bootstrapper scripts are lib/internal/bootstrap/loaders.js and
2880 // lib/internal/bootstrap/node.js, each included as a static C string
2881 // defined in node_javascript.h, generated in node_javascript.cc by
2882 // node_js2c.
2883 Local<String> loaders_name =
2884 FIXED_ONE_BYTE_STRING(env->isolate(), "internal/bootstrap/loaders.js");
2885 MaybeLocal<Function> loaders_bootstrapper =
2886 GetBootstrapper(env, LoadersBootstrapperSource(env), loaders_name);
2887 Local<String> node_name =
2888 FIXED_ONE_BYTE_STRING(env->isolate(), "internal/bootstrap/node.js");
2889 MaybeLocal<Function> node_bootstrapper =
2890 GetBootstrapper(env, NodeBootstrapperSource(env), node_name);
2891
2892 if (loaders_bootstrapper.IsEmpty() || node_bootstrapper.IsEmpty()) {
2893 // Execution was interrupted.
2894 return;
2895 }
2896
2897 // Add a reference to the global object
2898 Local<Object> global = env->context()->Global();
2899
2900 #if defined HAVE_DTRACE || defined HAVE_ETW
2901 InitDTrace(env, global);
2902 #endif
2903
2904 #if defined HAVE_PERFCTR
2905 InitPerfCounters(env, global);
2906 #endif
2907
2908 // Enable handling of uncaught exceptions
2909 // (FatalException(), break on uncaught exception in debugger)
2910 //
2911 // This is not strictly necessary since it's almost impossible
2912 // to attach the debugger fast enough to break on exception
2913 // thrown during process startup.
2914 try_catch.SetVerbose(true);
2915
2916 env->SetMethod(env->process_object(), "_rawDebug", RawDebug);
2917
2918 // Expose the global object as a property on itself
2919 // (Allows you to set stuff on `global` from anywhere in JavaScript.)
2920 global->Set(FIXED_ONE_BYTE_STRING(env->isolate(), "global"), global);
2921
2922 // Create binding loaders
2923 v8::Local<v8::Function> get_binding_fn =
2924 env->NewFunctionTemplate(GetBinding)->GetFunction(env->context())
2925 .ToLocalChecked();
2926
2927 v8::Local<v8::Function> get_linked_binding_fn =
2928 env->NewFunctionTemplate(GetLinkedBinding)->GetFunction(env->context())
2929 .ToLocalChecked();
2930
2931 v8::Local<v8::Function> get_internal_binding_fn =
2932 env->NewFunctionTemplate(GetInternalBinding)->GetFunction(env->context())
2933 .ToLocalChecked();
2934
2935 Local<Value> loaders_bootstrapper_args[] = {
2936 env->process_object(),
2937 get_binding_fn,
2938 get_linked_binding_fn,
2939 get_internal_binding_fn
2940 };
2941
2942 // Bootstrap internal loaders
2943 Local<Value> bootstrapped_loaders;
2944 if (!ExecuteBootstrapper(env, loaders_bootstrapper.ToLocalChecked(),
2945 arraysize(loaders_bootstrapper_args),
2946 loaders_bootstrapper_args,
2947 &bootstrapped_loaders)) {
2948 return;
2949 }
2950
2951 // Bootstrap Node.js
2952 Local<Object> bootstrapper = Object::New(env->isolate());
2953 SetupBootstrapObject(env, bootstrapper);
2954 Local<Value> bootstrapped_node;
2955 Local<Value> node_bootstrapper_args[] = {
2956 env->process_object(),
2957 bootstrapper,
2958 bootstrapped_loaders
2959 };
2960 if (!ExecuteBootstrapper(env, node_bootstrapper.ToLocalChecked(),
2961 arraysize(node_bootstrapper_args),
2962 node_bootstrapper_args,
2963 &bootstrapped_node)) {
2964 return;
2965 }
2966 }

The function is fairly long, but its core logic comes down to the following steps:

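  • a. Load the bootstrap scripts written in JS (lib/internal/bootstrap/loaders.js and lib/internal/bootstrap/node.js in the NodeJS source) into the execution environment. The node executable itself is written in C++ and JS execution depends on the V8 engine, so the functions defined in these JS files have to be loaded and then run through V8. Note: a JS function is represented in V8 as a Local<Function>, whose Call method invokes it.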

    2879      // The bootstrapper scripts are lib/internal/bootstrap/loaders.js and
    2880 // lib/internal/bootstrap/node.js, each included as a static C string
    2881 // defined in node_javascript.h, generated in node_javascript.cc by
    2882 // node_js2c.
    2883 Local<String> loaders_name =
    2884 FIXED_ONE_BYTE_STRING(env->isolate(), "internal/bootstrap/loaders.js");
    2885 MaybeLocal<Function> loaders_bootstrapper =
    2886 GetBootstrapper(env, LoadersBootstrapperSource(env), loaders_name);
    2887 Local<String> node_name =
    2888 FIXED_ONE_BYTE_STRING(env->isolate(), "internal/bootstrap/node.js");
    2889 MaybeLocal<Function> node_bootstrapper =
    2890 GetBootstrapper(env, NodeBootstrapperSource(env), node_name);
    2891
    2892 if (loaders_bootstrapper.IsEmpty() || node_bootstrapper.IsEmpty()) {
    2893 // Execution was interrupted.
    2894 return;
    2895 }
  • b. Execute the bootstrap functions loaded in the previous step through the V8 engine (lines 2944 and 2960):

    2942      // Bootstrap internal loaders
    2943 Local<Value> bootstrapped_loaders;
    2944 if (!ExecuteBootstrapper(env, loaders_bootstrapper.ToLocalChecked(),
    2945 arraysize(loaders_bootstrapper_args),
    2946 loaders_bootstrapper_args,
    2947 &bootstrapped_loaders)) {
    2948 return;
    2949 }
    2950
    2951 // Bootstrap Node.js
    2952 Local<Object> bootstrapper = Object::New(env->isolate());
    2953 SetupBootstrapObject(env, bootstrapper);
    2954 Local<Value> bootstrapped_node;
    2955 Local<Value> node_bootstrapper_args[] = {
    2956 env->process_object(),
    2957 bootstrapper,
    2958 bootstrapped_loaders
    2959 };
    2960 if (!ExecuteBootstrapper(env, node_bootstrapper.ToLocalChecked(),
    2961 arraysize(node_bootstrapper_args),
    2962 node_bootstrapper_args,
    2963 &bootstrapped_node)) {
    2964 return;
    2965 }

ExecuteBootstrapper is defined at line 2849 of node.cc. It is very simple: it takes the Local<Function> bootstrapper passed in and invokes its Call method. The code is as follows:

2849    static bool ExecuteBootstrapper(Environment* env, Local<Function> bootstrapper,
2850 int argc, Local<Value> argv[],
2851 Local<Value>* out) {
2852 bool ret = bootstrapper->Call(
2853 env->context(), Null(env->isolate()), argc, argv).ToLocal(out);
2854
2855 // If there was an error during bootstrap then it was either handled by the
2856 // FatalException handler or it's unrecoverable (e.g. max call stack
2857 // exceeded). Either way, clear the stack so that the AsyncCallbackScope
2858 // destructor doesn't fail on the id check.
2859 // There are only two ways to have a stack size > 1: 1) the user manually
2860 // called MakeCallback or 2) user awaited during bootstrap, which triggered
2861 // _tickCallback().
2862 if (!ret) {
2863 env->async_hooks()->clear_async_id_stack();
2864 }
2865
2866 return ret;
2867 }
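
The two-stage bootstrap walked through above can be pictured in plain JavaScript. The sketch below is only an illustration under stated assumptions: loadersBootstrapper, nodeBootstrapper, executeBootstrapper and fakeProcess are hypothetical stand-ins invented here, not NodeJS internals, and the tiny NativeModule uses new Function where NodeJS uses its own compiled-in sources. What it shows is the data flow: the value returned by the first bootstrap function is handed to the second one as an argument.

```javascript
// Hypothetical stand-ins; none of these names are NodeJS internals.

// Stage 1: plays the role of lib/internal/bootstrap/loaders.js.
function loadersBootstrapper(processObject, getBinding) {
  const NativeModule = {
    // Assumed in-memory source table for one "core module".
    _source: { path: 'exports.sep = "/";' },
    _cache: {},
    require(id) {
      if (this._cache[id]) return this._cache[id].exports;
      const mod = { id, exports: {} };
      this._cache[id] = mod;
      // Compile the core module source and run it with its own exports/module.
      new Function('exports', 'module', this._source[id])(mod.exports, mod);
      return mod.exports;
    },
  };
  return { internalBinding: getBinding, NativeModule };
}

// Stage 2: plays the role of lib/internal/bootstrap/node.js.
function nodeBootstrapper(processObject, bootstrapper, loaderExports) {
  const { NativeModule } = loaderExports;
  processObject.sep = NativeModule.require('path').sep;
  return processObject;
}

// The "C++ side": just call the function with an argument array.
function executeBootstrapper(fn, args) {
  return fn.apply(null, args);
}

const fakeProcess = { argv: ['node', 'app.js'] };
const loaders = executeBootstrapper(
  loadersBootstrapper, [fakeProcess, (name) => ({ name })]);
const booted = executeBootstrapper(
  nodeBootstrapper, [fakeProcess, {}, loaders]);
console.log(booted.sep);
```

Passing the loaders in as an argument, instead of publishing them on the global object, is what keeps NativeModule and internalBinding out of reach of user code.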

Native JS module loading and the module mechanism

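3. The roles of lib/internal/bootstrap/loaders.js and lib/internal/bootstrap/node.js are explained very clearly in the documentation comments at the top of the two files, which also help a great deal in understanding how the NodeJS module system loads modules under the hood. The documentation comment of lib/internal/bootstrap/loaders.js is reproduced here for reference: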

// This file creates the internal module & binding loaders used by built-in
// modules. In contrast, user land modules are loaded using
// lib/internal/modules/cjs/loader.js (CommonJS Modules) or
// lib/internal/modules/esm/* (ES Modules).
//
// This file is compiled and run by node.cc before bootstrap/node.js
// was called, therefore the loaders are bootstraped before we start to
// actually bootstrap Node.js. It creates the following objects:
//
// C++ binding loaders:
// - process.binding(): the legacy C++ binding loader, accessible from user land
// because it is an object attached to the global process object.
// These C++ bindings are created using NODE_BUILTIN_MODULE_CONTEXT_AWARE()
// and have their nm_flags set to NM_F_BUILTIN. We do not make any guarantees
// about the stability of these bindings, but still have to take care of
// compatibility issues caused by them from time to time.
// - process._linkedBinding(): intended to be used by embedders to add
// additional C++ bindings in their applications. These C++ bindings
// can be created using NODE_MODULE_CONTEXT_AWARE_CPP() with the flag
// NM_F_LINKED.
// - internalBinding(): the private internal C++ binding loader, inaccessible
// from user land because they are only available from NativeModule.require()
// These C++ bindings are created using NODE_MODULE_CONTEXT_AWARE_INTERNAL()
// and have their nm_flags set to NM_F_INTERNAL.
//
// Internal JavaScript module loader:
// - NativeModule: a minimal module system used to load the JavaScript core
// modules found in lib/**/*.js and deps/**/*.js. All core modules are
// compiled into the node binary via node_javascript.cc generated by js2c.py,
// so they can be loaded faster without the cost of I/O. This class makes the
// lib/internal/*, deps/internal/* modules and internalBinding() available by
// default to core modules, and lets the core modules require itself via
// require('internal/bootstrap/loaders') even when this file is not written in
// CommonJS style.
//
// Other objects:
// - process.moduleLoadList: an array recording the bindings and the modules
// loaded in the process and the order in which they are loaded.

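At this point it is clear that the functions wrapped by lib/internal/bootstrap/loaders.js and lib/internal/bootstrap/node.js are the core of the whole startup flow and hold its execution logic.
With its code folded, lib/internal/bootstrap/loaders.js looks as shown in the figure below: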

With its code folded, lib/internal/bootstrap/node.js looks as shown in the figure below:

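The internal logic of each file is wrapped in a single top-level function. In step a (lines 2879-2895), these two top-level JS functions are loaded and mapped to their V8 counterparts, the Local<Function> values loaders_bootstrapper.ToLocalChecked() and node_bootstrapper.ToLocalChecked(), so that they can be executed.

The logic inside lib/internal/bootstrap/loaders.js mainly binds some classes and functions from NodeJS's native C++ code into the JS execution environment (which V8 interprets and executes). It also creates NativeModule and uses it to load the native JS modules under the lib subdirectory of the NodeJS source. Those native JS modules call require as well, and that require is implemented here. User-written JS files (user modules), by contrast, are loaded through Module and its machinery, created in lib/internal/modules/cjs/loader.js. The require that a user JS file calls is not a global: it belongs to that user module object. Be sure to keep the two require functions and mechanisms apart. The rest of this article concentrates on the require used by user JS files and on how to imitate it, so as to obtain an independent module system and loading mechanism for other C++ programs built on V8.

In addition, lib/internal/bootstrap/loaders.js exports the object { internalBinding, NativeModule } (see lines 127 and 303 of lib/internal/bootstrap/loaders.js) to the function defined in lib/internal/bootstrap/node.js; it is passed in as an argument, as the function parameters in the screenshot show. The function defined in lib/internal/bootstrap/node.js is long, but the core logic that actually runs lies between lines 243 and 268: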

243    } else if (process.argv[1] && process.argv[1] !== '-') {
244 perf.markMilestone(NODE_PERFORMANCE_MILESTONE_MODULE_LOAD_START);
245 // Make process.argv[1] into a full path.
246 const path = NativeModule.require('path');
247 process.argv[1] = path.resolve(process.argv[1]);
248
249 const CJSModule = NativeModule.require('internal/modules/cjs/loader');
250
251 perf.markMilestone(NODE_PERFORMANCE_MILESTONE_MODULE_LOAD_END);
252 perf.markMilestone(
253 NODE_PERFORMANCE_MILESTONE_PRELOAD_MODULE_LOAD_START);
254 preloadModules();
255 perf.markMilestone(
256 NODE_PERFORMANCE_MILESTONE_PRELOAD_MODULE_LOAD_END);
257 // Check if user passed `-c` or `--check` arguments to Node.
258 if (process._syntax_check_only != null) {
259 const fs = NativeModule.require('fs');
260 // Read the source.
261 const filename = CJSModule._resolveFilename(process.argv[1]);
262 const source = fs.readFileSync(filename, 'utf-8');
263 checkScriptSyntax(source, filename);
264 process.exit(0);
265 }
266 perf.markMilestone(NODE_PERFORMANCE_MILESTONE_BOOTSTRAP_COMPLETE);
267 CJSModule.runMain();
268 } else {

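Here, process.argv[1] is the script file named on the command line (for example, if you type node a.js, process.argv[1] is a.js). On line 249, the NativeModule obtained by loading and running lib/internal/bootstrap/loaders.js loads lib/internal/modules/cjs/loader.js from the lib subdirectory of the NodeJS source as CJSModule. The abbreviation indicates that JS files written by NodeJS users currently follow the CommonJS specification, so every user-written JS file is treated as a CommonJS module. In lib/internal/modules/cjs/loader.js, the user-facing CommonJS module is defined as follows: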

52    module.exports = Module;
.
.
.
102 function Module(id, parent) {
103 this.id = id;
104 this.exports = {};
105 this.parent = parent;
106 updateChildren(parent, this, false);
107 this.filename = null;
108 this.loaded = false;
109 this.children = [];
110 }
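
To make the parent/children bookkeeping of this constructor concrete, here is a small sketch. SketchModule is a hypothetical copy of the constructor above, and the body of updateChildren is an assumption: the excerpt only shows how it is called, so the version below is one plausible reading in which the scan flag prevents duplicate entries.

```javascript
// Assumed body for updateChildren; the excerpt above only shows its call sites.
function updateChildren(parent, child, scan) {
  const children = parent && parent.children;
  // With scan === true, skip the push if the child is already recorded.
  if (children && !(scan && children.includes(child))) {
    children.push(child);
  }
}

// Hypothetical stand-in mirroring the Module constructor shown above.
function SketchModule(id, parent) {
  this.id = id;
  this.exports = {};
  this.parent = parent;
  updateChildren(parent, this, false);
  this.filename = null;
  this.loaded = false;
  this.children = [];
}

const main = new SketchModule('.', null);
const player = new SketchModule('/game/player.js', main);

// A later cache hit for the same module passes scan = true,
// so the parent does not record the child twice.
updateChildren(main, player, true);

console.log(main.children.map((m) => m.id));
```

Each module instance therefore carries its own exports object and knows both who required it first and what it required, which is the state the loader needs later.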

Line 267 of lib/internal/bootstrap/node.js mentioned above, CJSModule.runMain();, runs the user module and thereby enters the user program's logic. runMain() is defined at line 741 of lib/internal/modules/cjs/loader.js, as follows:

741    // bootstrap main module.
742 Module.runMain = function() {
743 // Load the main module--the command line argument.
744 Module._load(process.argv[1], null, true);
745 // Handle any nextTicks added in the first tick of the program
746 process._tickCallback();
747 };

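This takes us into Module._load. From the analysis above, every JS file a user writes in NodeJS corresponds to one module, and every module is an instance object of function Module(id, parent) (which can be thought of as a function object, since JavaScript is a class-free object-oriented language). For how objects work in JavaScript, see the passage below, taken from the V8 embedding documentation: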

JavaScript is a class-free, object-oriented language, and as such, it uses prototypal inheritance instead of classical inheritance. This can be puzzling to programmers trained in conventional object-oriented languages like C++ and Java.

Class-based object-oriented languages, such as Java and C++, are founded on the concept of two distinct entities: classes and instances. JavaScript is a prototype-based language and so does not make this distinction: it simply has objects. JavaScript does not natively support the declaration of class hierarchies; however, JavaScript's prototype mechanism simplifies the process of adding custom properties and methods to all instances of an object. In JavaScript, you can add custom properties to objects. For example:

// Create an object "bicycle"
function bicycle(){
}
// Create an instance of bicycle called roadbike
var roadbike = new bicycle()
// Define a custom property, wheels, on roadbike
roadbike.wheels = 2
A custom property added this way only exists for that instance of the object. If we create another instance of bicycle(), called mountainbike for example, mountainbike.wheels would return undefined unless the wheels property is explicitly added.

Sometimes this is exactly what is required, at other times it would be helpful to add the custom property to all instances of an object - all bicycles have wheels after all. This is where the prototype object of JavaScript is very useful. To use the prototype object, reference the keyword prototype on the object before adding the custom property to it as follows:

// First, create the "bicycle" object
function bicycle(){
}
// Assign the wheels property to the object's prototype
bicycle.prototype.wheels = 2
All instances of bicycle() will now have the wheels property prebuilt into them.

The real entry point for JavaScript code

4. lib/internal/modules/cjs/loader.js: here, Module._load serves as the entry point into the user program:

495    // Check the cache for the requested file.
496 // 1. If a module already exists in the cache: return its exports object.
497 // 2. If the module is native: call `NativeModule.require()` with the
498 // filename and return the result.
499 // 3. Otherwise, create a new module for the file and save it to the cache.
500 // Then have it load the file contents before returning its exports
501 // object.
502 Module._load = function(request, parent, isMain) {
503 if (parent) {
504 debug('Module._load REQUEST %s parent: %s', request, parent.id);
505 }
506
507 if (experimentalModules && isMain) {
508 if (asyncESM === undefined) lazyLoadESM();
509 asyncESM.loaderPromise.then((loader) => {
510 return loader.import(getURLFromFilePath(request).pathname);
511 })
512 .catch((e) => {
513 decorateErrorStack(e);
514 console.error(e);
515 process.exit(1);
516 });
517 return;
518 }
519
520 var filename = Module._resolveFilename(request, parent, isMain);
521
522 var cachedModule = Module._cache[filename];
523 if (cachedModule) {
524 updateChildren(parent, cachedModule, true);
525 return cachedModule.exports;
526 }
527
528 if (NativeModule.nonInternalExists(filename)) {
529 debug('load native module %s', request);
530 return NativeModule.require(filename);
531 }
532
533 // Don't call updateChildren(), Module constructor already does.
534 var module = new Module(filename, parent);
535
536 if (isMain) {
537 process.mainModule = module;
538 module.id = '.';
539 }
540
541 Module._cache[filename] = module;
542
543 tryModuleLoad(module, filename);
544
545 return module.exports;
546 };

The module caching logic inside Module._load is discussed later. The function finally jumps to tryModuleLoad(module, filename); on line 543:

548    function tryModuleLoad(module, filename) {
549 var threw = true;
550 try {
551 module.load(filename);
552 threw = false;
553 } finally {
554 if (threw) {
555 delete Module._cache[filename];
556 }
557 }
558 }
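
The threw flag together with finally is worth a closer look: a module is put into the cache before it runs, so a module that throws while loading has to be evicted again, otherwise a later require would return its half-built exports. The sketch below reproduces that pattern in isolation. The names cache and tryLoad and the file keys are hypothetical and exist only for this illustration.

```javascript
// Hypothetical cache and loader, mirroring the try/finally pattern above.
const cache = {};

function tryLoad(entry, key, loadFn) {
  let threw = true;
  try {
    loadFn(entry);
    threw = false; // only reached when loadFn returned normally
  } finally {
    // Evict the entry if loading failed; the error itself keeps propagating.
    if (threw) delete cache[key];
  }
}

// A module that loads fine stays cached.
cache['/ok.js'] = { exports: {} };
tryLoad(cache['/ok.js'], '/ok.js', (m) => { m.exports.value = 1; });

// A module that throws is removed from the cache again.
cache['/bad.js'] = { exports: {} };
let caught = null;
try {
  tryLoad(cache['/bad.js'], '/bad.js', () => {
    throw new Error('error while running module body');
  });
} catch (err) {
  caught = err;
}

console.log(Object.keys(cache), caught && caught.message);
```

Using finally without a catch means the original exception reaches the caller untouched, with its stack intact.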

Following the call to module.load(filename); on line 551, that function is actually defined at line 603, as follows:

602    // Given a file name, pass it to the proper extension handler.
603 Module.prototype.load = function(filename) {
604 debug('load %j for module %j', filename, this.id);
605
606 assert(!this.loaded);
607 this.filename = filename;
608 this.paths = Module._nodeModulePaths(path.dirname(filename));
609
610 var extension = path.extname(filename) || '.js';
611 if (!Module._extensions[extension]) extension = '.js';
612 Module._extensions[extension](this, filename);
613 this.loaded = true;
614
615 if (experimentalModules) {
616 if (asyncESM === undefined) lazyLoadESM();
617 const ESMLoader = asyncESM.ESMLoader;
618 const url = getURLFromFilePath(filename);
619 const urlString = `${url}`;
620 const exports = this.exports;
621 if (ESMLoader.moduleMap.has(urlString) !== true) {
622 ESMLoader.moduleMap.set(
623 urlString,
624 new ModuleJob(ESMLoader, url, async () => {
625 const ctx = createDynamicModule(
626 ['default'], url);
627 ctx.reflect.exports.default.set(exports);
628 return ctx;
629 })
630 );
631 } else {
632 const job = ESMLoader.moduleMap.get(urlString);
633 if (job.reflect)
634 job.reflect.exports.default.set(exports);
635 }
636 }
637 };

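For now we only care about files ending in .js, the default file format for JavaScript source code. Line 612, Module._extensions[extension](this, filename);, is then equivalent to running Module._extensions['.js'](this, filename);, and Module._extensions['.js'] is a function defined as follows: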

710    // Native extension for .js
711 Module._extensions['.js'] = function(module, filename) {
712 var content = fs.readFileSync(filename, 'utf8');
713 module._compile(stripBOM(content), filename);
714 };
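
The extension dispatch on lines 610-612 of Module.prototype.load follows a simple rule: take the file extension, and fall back to the .js handler when there is no extension or no handler is registered for it. A minimal sketch of that rule, assuming a hypothetical handlers table and a hypothetical pickHandler helper (only path.extname is a real NodeJS API here):

```javascript
const path = require('path');

// Hypothetical handler table; in NodeJS this role is played by Module._extensions.
const handlers = {
  '.js': (filename) => `compile ${filename} as JavaScript`,
  '.json': (filename) => `parse ${filename} as JSON`,
};

// Returns the extension key whose handler would be used for a file name.
function pickHandler(filename) {
  let extension = path.extname(filename) || '.js'; // no extension: assume .js
  if (!handlers[extension]) extension = '.js';     // unknown extension: assume .js
  return extension;
}

for (const name of ['main.js', 'level.json', 'game', 'shader.glsl']) {
  const ext = pickHandler(name);
  console.log(name, '->', handlers[ext](name));
}
```

For a game engine this table is a natural extension point, for example to register a handler for a custom asset format.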

The .js handler in turn jumps to module._compile, which is defined as follows:

659    // Run the file contents in the correct scope or sandbox. Expose
660 // the correct helper variables (require, module, exports) to
661 // the file.
662 // Returns exception, if any.
663 Module.prototype._compile = function(content, filename) {
664
665 content = stripShebang(content);
666
667 // create wrapper function
668 var wrapper = Module.wrap(content);
669
670 var compiledWrapper = vm.runInThisContext(wrapper, {
671 filename: filename,
672 lineOffset: 0,
673 displayErrors: true
674 });
675
676 var inspectorWrapper = null;
677 if (process._breakFirstLine && process._eval == null) {
678 if (!resolvedArgv) {
679 // we enter the repl if we're not given a filename argument.
680 if (process.argv[1]) {
681 resolvedArgv = Module._resolveFilename(process.argv[1], null, false);
682 } else {
683 resolvedArgv = 'repl';
684 }
685 }
686
687 // Set breakpoint on module start
688 if (filename === resolvedArgv) {
689 delete process._breakFirstLine;
690 inspectorWrapper = process.binding('inspector').callAndPauseOnStart;
691 }
692 }
693 var dirname = path.dirname(filename);
694 var require = makeRequireFunction(this);
695 var depth = requireDepth;
696 if (depth === 0) stat.cache = new Map();
697 var result;
698 if (inspectorWrapper) {
699 result = inspectorWrapper(compiledWrapper, this.exports, this.exports,
700 require, this, filename, dirname);
701 } else {
702 result = compiledWrapper.call(this.exports, this.exports, require, this,
703 filename, dirname);
704 }
705 if (depth === 0) stat.cache = null;
706 return result;
707 };

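From the function's parameters and documentation comment we can see that it takes the content of the original JS file and first wraps it with Module.wrap on line 668.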

124    Module.wrap = function(script) {
125 return Module.wrapper[0] + script + Module.wrapper[1];
126 };
127
128 Module.wrapper = [
129 '(function (exports, require, module, __filename, __dirname) { ',
130 '\n});'
131 ];

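This is what much of the literature refers to when it says that, under the hood, NodeJS wraps the source of every user-written JS file in a function (as the code above shows). Once wrapped, the source string is compiled by the call on line 670 (V8 does the work, of course) into code that V8 can execute, and the result is assigned to compiledWrapper. We ignore the inspector mechanism here (used for debugging and similar features). The code is then actually executed on line 702 by compiledWrapper.call: the first argument is the this value, and the five arguments after it correspond, in order, to the parameters of the outer function added during wrapping, function (exports, require, module, __filename, __dirname). In essence, it simply calls that function. vm and its runInThisContext function are defined in NodeJS's built-in JS module vm.js (lib/vm.js in the NodeJS source).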
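
The wrap, compile and call sequence can be tried out directly with the public vm module. The following is a minimal sketch, not the NodeJS implementation: compileAndRun, fakeRequire, the sample source string and the path /game/main.js are all hypothetical, and the module object is a plain object literal. Only vm.runInThisContext and path.dirname are real NodeJS APIs.

```javascript
const vm = require('vm');
const path = require('path');

// Same wrapper strings as in the loader excerpt above.
const wrapper = [
  '(function (exports, require, module, __filename, __dirname) { ',
  '\n});',
];

function wrap(script) {
  return wrapper[0] + script + wrapper[1];
}

// Hypothetical helper: wraps, compiles and runs one source string.
function compileAndRun(source, filename) {
  const module = { id: filename, exports: {}, filename, loaded: false };

  // Evaluating the wrapped text yields the wrapper function itself.
  const compiledWrapper = vm.runInThisContext(wrap(source), {
    filename,
    lineOffset: 0,
    displayErrors: true,
  });

  // Placeholder require; a real loader would resolve and load the request.
  const fakeRequire = (id) => {
    throw new Error(`require('${id}') is not wired up in this sketch`);
  };

  // this === module.exports, followed by the five wrapper arguments.
  compiledWrapper.call(module.exports, module.exports, fakeRequire, module,
                       filename, path.dirname(filename));
  module.loaded = true;
  return module.exports;
}

const source =
  'var secret = 42; exports.answer = secret; exports.file = __filename;';
const result = compileAndRun(source, '/game/main.js');
console.log(result, typeof secret);
```

Because the file body runs inside a function, its top-level var declarations stay local to the module, and exports, require and module reach it as ordinary function parameters instead of globals.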

Imitating NodeJS's require and module system (module loading mechanism)

Approach:

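  1. With that understood, we can imitate NodeJS closely and implement our own module system. Two approaches come to mind so far. The first is to follow the logic in lib/internal/modules/cjs/loader.js directly: create a module.js in the root directory of our scripts, copy the whole content of that file into it, and adapt it while keeping most of the logic. When V8 is initialized, the C++ layer loads this module.js into the global scope first. Every other game-related JS source file is then still wrapped in function (exports, require, module, __filename, __dirname), and whenever a game JS source file is loaded, it is loaded through the functions in module.js, which also supply the arguments. The execution flow then matches NodeJS, and so do require and the module loading mechanism.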

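  2. Implement module.js in the C++ layer, that is, write the logic of lib/internal/modules/cjs/loader.js directly in C++ as a class or a set of functions. Each game-related JS file is still wrapped in function (exports, require, module, __filename, __dirname) when it is loaded, and when it is executed the arguments are passed in from the C++ layer. That requires mapping C++ objects to the JS argument objects, which we have already implemented (see V8Android).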
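
As a rough illustration of the first approach, the sketch below condenses the loader logic traced in this article into a few lines and shows why it does not loop forever on circular dependencies: the module is registered in the cache before its body runs, so a require that cycles back receives the partially filled exports object instead of triggering a second load. Everything here is an assumption made for the example: MiniModule, load, the in-memory files table and the /game paths are hypothetical, and this is not the code used in V8Android.

```javascript
const vm = require('vm');
const path = require('path');

// Hypothetical in-memory "files" standing in for game scripts on disk.
// a.js and b.js require each other.
const files = {
  '/game/a.js':
    'exports.name = "a"; var b = require("./b.js"); exports.seenB = b.name;',
  '/game/b.js':
    'var a = require("./a.js"); exports.name = "b"; exports.seenA = a.name;',
};

const cache = {};

function MiniModule(id, parent) {
  this.id = id;
  this.exports = {};
  this.parent = parent;
  this.loaded = false;
}

function load(request, parent) {
  const base = parent ? path.posix.dirname(parent.id) : '/';
  const filename = path.posix.resolve(base, request);

  // Cache hit: may be a module that is still loading (partial exports).
  if (cache[filename]) return cache[filename].exports;
  if (!(filename in files)) throw new Error(`Cannot find module '${request}'`);

  const module = new MiniModule(filename, parent);
  cache[filename] = module; // register BEFORE running the module body

  let threw = true;
  try {
    const wrapped =
      '(function (exports, require, module, __filename, __dirname) { ' +
      files[filename] + '\n});';
    const fn = vm.runInThisContext(wrapped, { filename });
    // Each module gets its own require, bound to itself as the parent.
    const localRequire = (req) => load(req, module);
    fn.call(module.exports, module.exports, localRequire, module,
            filename, path.posix.dirname(filename));
    module.loaded = true;
    threw = false;
  } finally {
    if (threw) delete cache[filename];
  }
  return module.exports;
}

const mainExports = load('/game/a.js', null);
console.log(mainExports, cache['/game/b.js'].exports);
```

In this run b.js sees a.name because a.js assigned it before requiring b.js; anything a.js exports after that require call would still be missing when b.js executes, which is the usual CommonJS behavior for cycles.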

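Implementation:
A require mechanism and module system modeled on NodeJS have now been implemented, and they resolve the infinite-loop bug caused by circular dependencies. See V8Android. Discussion is welcome (email: cstsinghua@126.com, QQ: 476590410, WeChat: cstsinghua).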

Appendix (References)