Other bundlers support #117

Open
ivan-collab-git opened this issue Sep 26, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@ivan-collab-git

I want to contribute support for other bundlers. Right now I am working on a website that uses the Metro bundler. Which files in the project should I take into consideration for this? And would you consider this feasible to implement?

@ivan-collab-git ivan-collab-git added the enhancement New feature or request label Sep 26, 2024
@j4k0xb
Owner

j4k0xb commented Sep 26, 2024

Hey, take a look at these files for reference:

> and would you consider this as something feasible to implement?

Yeah the bundle format looks pretty straight-forward.

I suggest you look through https://github.com/j4k0xb/webcrack/blob/master/CONTRIBUTING.md first and try debugging to get a better idea on how the existing unpacker works (only reading the code may be a bit confusing).

Then start by creating the boilerplate files and use a { VariableDeclaration(path) { ... } } visitor in src/unpack/metro/index.ts to check if var __BUNDLE_START_TIME__ is present.
Highly recommend using @codemod/matchers to help with finding AST nodes.

The next easiest thing would be extracting the entry module id from __r(0); (could use path.getAllNextSiblings()).

Example bundle:

var __BUNDLE_START_TIME__ = this.nativePerformanceNow ? nativePerformanceNow() : Date.now();
var __DEV__ = false;
var process = this.process || {};
var __METRO_GLOBAL_PREFIX__ = "";
process.env = process.env || {};
process.env.NODE_ENV = process.env.NODE_ENV || "production";
(function (global) {
  // ...
})(typeof globalThis !== "undefined" ? globalThis : typeof global !== "undefined" ? global : typeof window !== "undefined" ? window : this);
__d(function (global, _$$_REQUIRE, _$$_IMPORT_DEFAULT, _$$_IMPORT_ALL, module, exports, _dependencyMap) {
  "use strict";

  const lib = _$$_REQUIRE(_dependencyMap[0]);
  console.log(lib.foo);
}, 0, [1]);
__d(function (global, _$$_REQUIRE, _$$_IMPORT_DEFAULT, _$$_IMPORT_ALL, module, exports, _dependencyMap) {
  "use strict";

  exports.foo = "bar";
}, 1, []);
__r(0);
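
A minimal sketch of the two detection steps described above, under some loud assumptions: it walks a hand-written array of top-level statements instead of a real Babel path, the node shapes are meant to mirror Babel's AST, and the helper names (isBundleStart, findEntryIds) are hypothetical rather than part of webcrack. In the actual unpacker the same checks would presumably live inside the VariableDeclaration visitor, with path.getAllNextSiblings() supplying the following statements.

```javascript
// Tiny builders for a hand-written AST; the node shapes are assumed to follow Babel's.
const identifier = (name) => ({ type: "Identifier", name });
const callStatement = (callee, args) => ({
  type: "ExpressionStatement",
  expression: { type: "CallExpression", callee: identifier(callee), arguments: args },
});

// Stand-in for the top-level statements of the example bundle (most details omitted).
const programBody = [
  {
    type: "VariableDeclaration",
    kind: "var",
    declarations: [
      { type: "VariableDeclarator", id: identifier("__BUNDLE_START_TIME__"), init: null },
    ],
  },
  callStatement("__d", []), // module definitions; arguments left out here
  callStatement("__d", []),
  callStatement("__r", [{ type: "NumericLiteral", value: 0 }]),
];

// Step 1: is this statement `var __BUNDLE_START_TIME__ = ...`?
function isBundleStart(node) {
  return (
    node.type === "VariableDeclaration" &&
    node.declarations.some(
      (decl) => decl.id.type === "Identifier" && decl.id.name === "__BUNDLE_START_TIME__",
    )
  );
}

// Step 2: collect the literal argument of every later top-level `__r(<id>)` call.
function findEntryIds(body) {
  const start = body.findIndex(isBundleStart);
  if (start === -1) return null; // does not look like a Metro bundle
  const entryIds = [];
  for (const node of body.slice(start + 1)) {
    if (node.type !== "ExpressionStatement") continue;
    const call = node.expression;
    if (call.type !== "CallExpression") continue;
    if (call.callee.type !== "Identifier" || call.callee.name !== "__r") continue;
    const [arg] = call.arguments;
    if (arg && (arg.type === "NumericLiteral" || arg.type === "StringLiteral")) {
      entryIds.push(arg.value);
    }
  }
  return entryIds;
}

console.log(findEntryIds(programBody)); // [ 0 ]
```

Returning an array rather than a single id keeps the sketch usable for bundles that contain more than one top-level __r call.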

@ivan-collab-git
Author

Great, I'll give it a check!

@ivan-collab-git
Author

Hey, what's up. I made these notes the other day and was expecting to work on the implementation, but I have had little to no time, so in the meantime I'll share them with you so you can give me a sanity check. I tried to map the concepts from a post I read about the webpack require function, and I think it is pretty similar. Basically the same.

MetroRequire.<hash>.js

The runtime is usually called MetroRequire.<hash>.js and it defines these variables:

var __BUNDLE_START_TIME__ = this.nativePerformanceNow
    ? nativePerformanceNow()
    : Date.now(),
  __DEV__ = false,
  process = this.process || {};
process.env = process.env || {};
process.env.NODE_ENV = process.env.NODE_ENV || "production";
  • It then calls an IIFE that accepts a single parameter, which is meant to be the global object:

    (function(e){ ... })(
    "undefined" != typeof globalThis
        ? globalThis
        : "undefined" != typeof global
        ? global
        : "undefined" != typeof window
            ? window
            : this,
    );
  • We will now define five fundamental objects (I'll explain what each one does further on):

    • The global object, e, is the only parameter received by the IIFE.

    • Modules. A registered module is an object stored in a cache object (the next fundamental object). The cache stores each module as:

      {
          dependencyMap: o,
          factory: e,
          hasError: !1,
          importedAll: r,
          importedDefault: r,
          isInitialized: !1,
          publicModule: { exports: {} },
      }

      In which:

      • factory is the module itself, wrapped in a function that receives a fixed number of parameters
      • dependencyMap is an array of module ids that is passed to module.factory as a parameter so the module can load its dependencies (more on this later)
      • publicModule has an exports property, which contains the module's exported objects once its factory function has been called
    • The t object: It is basically a cache for modules that have already been registered. It stores every module as an object keyed by its id, so a module can be accessed through t[moduleId]. Note that t is declared with var inside the IIFE, so it is local to the runtime; the only cache-related handle attached to the global object is __c (described next).

      • The function o (exposed on the global object through e.__c = o) is a sort of constructor for the cache variable t, because it is defined as:
      function o() {
          return (t = Object.create(null));
      }
      • It creates an object with null as its prototype, assigns it to the cache variable t, and returns t. Calling __c again later therefore replaces the cache with a fresh, empty one.
      • After o is assigned to e.__c, the cache variable is initialized by calling this function: var t = o();
    • The require function. __r is a function, also attached to the global object, that is assigned from i (e.__r = i). It serves as the require function because it loads any module given its id. It works as follows:

      • This i is defined as:
        function i(e) {
          const r = e,
            n = t[r]; // t is the cache object
          return n && n.isInitialized ? n.publicModule.exports : d(r, n);
        }

      For this section I will use r as the module id and n as the module object, both initially passed to d(r, n) in i() (the require function defined above). If the module has already been initialized, i returns its exports. If not, it calls d(r, n), which does nothing but return the result of calling m(r, n) with the exact same parameters (and report errors, if any).

      This m function is where the interesting stuff happens. It makes some checks on the module n, sets its property n.isInitialized to true, and calls the module's factory, eventually returning its exports, as shown below:

          n.isInitialized = !0;
          const { factory: c, dependencyMap: d } = n;
          try {
            const t = n.publicModule;
            return (
              (t.id = r),
              c(e, i, l, u, t, t.exports, d),
              (n.factory = void 0),
              (n.dependencyMap = void 0),
              t.exports
            );
          } catch (e) { ... }

      This is interesting because it calls the factory function (c above), which is the module itself. The function receives 7 parameters that make certain objects accessible to the module:

      1. The first parameter, e, is the global object.
      2. The second parameter, i, is the __r function, i.e. the require function (it checks whether a module is initialized in the cache and returns its exports; if not, it calls the module's factory, marks it as initialized, and then returns its exports).
      3. The third parameter is a function that receives a module id and returns its importedDefault property, which from what I've seen is a shared variable initialized to an empty object {} and, as far as I can tell, is not modified in this file.
      4. The fourth parameter is similar to the third: a function that receives a module id and returns its importedAll property, which for every module is initially the same object as importedDefault.
      5. The fifth is the publicModule property of the module (the object that holds exports).
      6. The sixth is publicModule.exports, i.e. the exports property of the fifth parameter.
      7. The seventh is the dependencyMap property of the module.
    • The __d method. The __d method registers modules in the cache. It is also attached to the global object and is defined as:

      (e.__d = function (e, n, o) {
          if (null != t[n]) return;
          const i = {
              dependencyMap: o,
              factory: e,
              hasError: !1,
              importedAll: r,
              importedDefault: r,
              isInitialized: !1,
              publicModule: { exports: {} },
          };
          t[n] = i;
      }),
      • It receives three parameters: the first is the module's factory function (the module source code), the second is the module id, and the third is the array of dependency module ids, called dependencyMap. An example of a call to this method is:
      __d(
          function (g, r, i, a, m, e, d) {
              "use strict";
              const t = r(d[0]).default || r(d[0]);
              let c;
              m.exports = () => c || ((c = t("locale")), c || "en");
          },
          "44cd5c",
          ["b2dff4"],
      );
      • In the __d definition, we can see that it checks whether the module id (second parameter) is already a key in t (the cache). If it is, it returns immediately (the module is already registered). If not, it assigns the object i to t[n] (where n is the module id), so i represents the registered module in t.
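
To make the interplay of the cache, __d and __r easier to follow, here is a minimal, de-minified sketch of the runtime described in these notes. It is a simplification under stated assumptions, not Metro's actual source: all names (createMetroRuntime, define, metroRequire, loadModule) are hypothetical, and segments, nativeRequire, ErrorUtils, the error bookkeeping and the importDefault/importAll helpers are left out.

```javascript
// Simplified stand-in for the runtime IIFE; the names are illustrative only.
function createMetroRuntime(globalObject) {
  const modules = Object.create(null); // the cache `t`
  const EMPTY = {}; // the shared sentinel `r`

  // Counterpart of __d: register a factory under its id, unless already present.
  function define(factory, moduleId, dependencyMap) {
    if (modules[moduleId] != null) return;
    modules[moduleId] = {
      dependencyMap,
      factory,
      hasError: false,
      importedAll: EMPTY,
      importedDefault: EMPTY,
      isInitialized: false,
      publicModule: { exports: {} },
    };
  }

  // Counterpart of __r (function `i`): return cached exports or run the factory.
  function metroRequire(moduleId) {
    const mod = modules[moduleId];
    return mod && mod.isInitialized ? mod.publicModule.exports : loadModule(moduleId, mod);
  }

  // Counterpart of `m`, reduced to the happy path.
  function loadModule(moduleId, mod) {
    if (!mod) throw Error('Requiring unknown module "' + moduleId + '".');
    mod.isInitialized = true;
    const { factory, dependencyMap, publicModule } = mod;
    publicModule.id = moduleId;
    // Parameters 3 and 4 (importDefault / importAll) are omitted in this sketch.
    factory(globalObject, metroRequire, undefined, undefined, publicModule, publicModule.exports, dependencyMap);
    mod.factory = undefined;
    mod.dependencyMap = undefined;
    return publicModule.exports;
  }

  return { define, require: metroRequire };
}

// Usage: two modules, where module 0 depends on module 1.
const runtime = createMetroRuntime({});
let factoryRuns = 0;
runtime.define(function (g, r, i, a, m, e, d) {
  factoryRuns += 1;
  e.foo = "bar";
}, 1, []);
runtime.define(function (g, r, i, a, m, e, d) {
  m.exports = r(d[0]).foo.toUpperCase();
}, 0, [1]);

console.log(runtime.require(0)); // BAR
runtime.require(1); // served from the cache, the factory is not run again
console.log(factoryRuns); // 1
```

The second require(1) shows why isInitialized matters: the factory of module 1 runs only once, and later callers receive the same exports object.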

asyncRequire.<hash>.js

  • The asyncRequire file is apparently what loads lazy-loaded files. Modules that need a lazy-loaded file call this module as r(d[4])(d[3]), where d[4] resolves, inside the calling module, to the id of the asyncRequire module. So r(d[4]) loads asyncRequire, and the returned function then receives d[3], which points to the id of the lazy-loaded file. In this file a function is passed as a parameter to __d, and inside it a function T is defined. T is then assigned to a method called setData on the exports object (the one returned by __r). This method is called at the end of the file as __r("057569").setData( .. ), and what it seems to do is:

    • It receives three parameters: the first is a URL to a CDN package repo, the second is an array of JS bundle names, and the third is an object that seems to map a sort of bundle id (segmentId in the error message) to an array of indexes pointing to elements of the second argument (that is, to names of JS bundle files). For each of those ids o, the function assigns to h[o] (h is a variable defined outside T) an array containing the full URLs of the referenced bundles, built by prepending the first argument to each bundle name:
        function T(t, n, o) {
          Object.entries(o).forEach(([o, s]) => {
            const c = s.map((s) => {
              if (void 0 === n[s])
                throw new ReferenceError(
                  `Bad async module data, cannot locate index ${s} in the bundleRequestPaths array for segmentId=${o}`,
                );
              return `${t}${n[s]}`;
            });
            h[o] = c;
          });
        }
```
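
A de-minified version of T with a small usage example may make the data flow clearer. This is only a sketch: the readable names (setData, segmentUrls, baseUrl, bundleNames, segments) are guesses derived from the error message, and the URL and bundle names are made up for illustration.

```javascript
// `segmentUrls` plays the role of `h`; every name here is a readable guess.
const segmentUrls = {};

function setData(baseUrl, bundleNames, segments) {
  Object.entries(segments).forEach(([segmentId, indexes]) => {
    segmentUrls[segmentId] = indexes.map((index) => {
      if (bundleNames[index] === undefined) {
        throw new ReferenceError(
          `Bad async module data, cannot locate index ${index} in the bundleRequestPaths array for segmentId=${segmentId}`,
        );
      }
      // Full URL = base URL + bundle file name.
      return `${baseUrl}${bundleNames[index]}`;
    });
  });
}

// Hypothetical input: two segments that reference three bundle files by index.
setData("https://cdn.example/", ["a.js", "b.js", "c.js"], {
  checkout: [0, 2],
  profile: [1],
});

console.log(segmentUrls.checkout); // [ 'https://cdn.example/a.js', 'https://cdn.example/c.js' ]
console.log(segmentUrls.profile); // [ 'https://cdn.example/b.js' ]
```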

So how does all of this hold together?

  • It all begins by running the IIFE in the metroRequire file, which defines the cache t, the require function __r, and the __d method, and attaches __r and __d (plus __c) to the global object. Then there are some bundles that call the __r function directly. I think these are the entry points. These bundles are:
    • "routeHandler.hash.js"
    • "initializer.hash.js"
    • "coreV2.hash.js": It looks like it manages some different domains for translations to other languages and countries.
    • "shims_post_modules.hash.js": Doesn't load any module; it seems to just add some scrolling behaviour.
    • "client.hash.js"
      (I need to see which one is the entry point; I'd guess all of them are entry points.) From there a chain begins by loading modules that are dependencies of the current module. If these modules are lazy loaded, then they'll call the module defined in asyncRequire (__r("<asyncRequireModuleId>")("<lazyLoadedModuleId>")).

Translations

  • Other than the special bundles that call __r directly, other bundles all call __r with the same module id and then run the extends() method on the exports object that __r returns. These are bundles that need translations to another language:
    • When extends() is called on the __r return value, it is used to make translations. These bundles use the module id __r("a9f4b1") to load the translations, and every module in the bundle that needs translations loads it.

@ivan-collab-git
Author

ivan-collab-git commented Oct 2, 2024

This is the metroRequire file, the one that contains the runtime. I talked about other files at the end of the notes, but this is the main one.

var __BUNDLE_START_TIME__ = this.nativePerformanceNow
    ? nativePerformanceNow()
    : Date.now(),
  __DEV__ = false,
  process = this.process || {};
process.env = process.env || {};
process.env.NODE_ENV = process.env.NODE_ENV || "production";
!(function (e) {
  "use strict";
  (e.__r = i),
    (e.__d = function (e, n, o) {
      if (null != t[n]) return;
      const i = {
        dependencyMap: o,
        factory: e,
        hasError: !1,
        importedAll: r,
        importedDefault: r,
        isInitialized: !1,
        publicModule: { exports: {} },
      };
      t[n] = i;
    }),
    (e.__c = o),
    (e.__registerSegment = function (e, r, n) {
      (p[e] = r),
        n &&
          n.forEach((r) => {
            t[r] || h.has(r) || h.set(r, e);
          });
    });
  var t = o();
  const r = {},
    { hasOwnProperty: n } = {};
  function o() {
    return (t = Object.create(null));
  }
  function i(e) {
    const r = e,
      n = t[r];
    return n && n.isInitialized ? n.publicModule.exports : d(r, n);
  }
  function l(e) {
    const n = e;
    if (t[n] && t[n].importedDefault !== r) return t[n].importedDefault;
    const o = i(n),
      l = o && o.__esModule ? o.default : o;
    return (t[n].importedDefault = l);
  }
  function u(e) {
    const o = e;
    if (t[o] && t[o].importedAll !== r) return t[o].importedAll;
    const l = i(o);
    let u;
    if (l && l.__esModule) u = l;
    else {
      if (((u = {}), l)) for (const e in l) n.call(l, e) && (u[e] = l[e]);
      u.default = l;
    }
    return (t[o].importedAll = u);
  }
  (i.importDefault = l), (i.importAll = u);
  let c = !1;
  function d(t, r) {
    if (!c && e.ErrorUtils) {
      let n;
      c = !0;
      try {
        n = m(t, r);
      } catch (t) {
        e.ErrorUtils.reportFatalError(t);
      }
      return (c = !1), n;
    }
    return m(t, r);
  }
  const s = 16,
    a = 65535;
  function f(e) {
    return { segmentId: e >>> s, localId: e & a };
  }
  (i.unpackModuleId = f),
    (i.packModuleId = function (e) {
      return (e.segmentId << s) + e.localId;
    });
  const p = [],
    h = new Map();
  function m(r, n) {
    if (!n && p.length > 0) {
      const e = h.get(r) ?? 0,
        o = p[e];
      null != o && (o(r), (n = t[r]), h.delete(r));
    }
    const o = e.nativeRequire;
    if (!n && o) {
      const { segmentId: e, localId: i } = f(r);
      o(i, e), (n = t[r]);
    }
    if (!n) throw g(r);
    if (n.hasError) throw w(r, n.error);
    n.isInitialized = !0;
    const { factory: c, dependencyMap: d } = n;
    try {
      const t = n.publicModule;
      return (
        (t.id = r),
        c(e, i, l, u, t, t.exports, d),
        (n.factory = void 0),
        (n.dependencyMap = void 0),
        t.exports
      );
    } catch (e) {
      throw (
        ((n.hasError = !0),
        (n.error = e),
        (n.isInitialized = !1),
        (n.publicModule.exports = void 0),
        e)
      );
    }
  }
  function g(e) {
    return Error('Requiring unknown module "' + e + '".');
  }
  function w(e, t) {
    return Error(
      'Requiring module "' + e + '", which threw an exception: ' + t,
    );
  }
})(
  "undefined" != typeof globalThis
    ? globalThis
    : "undefined" != typeof global
      ? global
      : "undefined" != typeof window
        ? window
        : this,
);
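
One detail of the runtime above that the notes do not cover is how f and i.packModuleId combine a segment id and a local id into a single number. A small worked sketch, with the constants s and a given readable (hypothetical) names:

```javascript
const SEGMENT_SHIFT = 16; // `s` in the runtime
const LOCAL_MASK = 65535; // `a` in the runtime, i.e. 0xffff

// Counterpart of `f`: split a numeric module id into its two parts.
function unpackModuleId(moduleId) {
  return { segmentId: moduleId >>> SEGMENT_SHIFT, localId: moduleId & LOCAL_MASK };
}

// Counterpart of `i.packModuleId`: the segment id goes into the upper 16 bits.
function packModuleId({ segmentId, localId }) {
  return (segmentId << SEGMENT_SHIFT) + localId;
}

const packed = packModuleId({ segmentId: 2, localId: 5 });
console.log(packed); // 131077, i.e. 2 * 65536 + 5
console.log(unpackModuleId(packed)); // { segmentId: 2, localId: 5 }
```

In the runtime this split is only used on the nativeRequire path, so it presumably matters only for bundles with numeric module ids.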

@j4k0xb
Owner

j4k0xb commented Oct 2, 2024

seems right so far
for analyzing how the runtime works, it's easier to create your own test bundles without minifying or to search in the source code of metro.
also helpful to compare it with other bundles:

supporting multi-file bundles is not gonna be easy but can be added later, so for now it would be enough to focus on the __d(...) calls of a single script.

example:

__d(
    function (g, r, i, a, m, e, d) {
        "use strict";
        const t = r(d[0]).default || r(d[0]);
        let c;
        m.exports = () => c || ((c = t("locale")), c || "en");
    },
    "44cd5c",
    ["b2dff4"],
);

to

"use strict";
const t = require("./b2dff4.js").default || require("./b2dff4.js");
let c;
module.exports = () => c || ((c = t("locale")), c || "en");
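
The conversion above can be approximated with a purely string-based sketch. This is deliberately naive and only meant to illustrate the mapping: it assumes an unminified function expression with all seven parameters, rewrites only require(dependencyMap[N]) calls and module.exports references, and uses a hypothetical helper name (metroFactoryToCommonJs). A real implementation would work on the AST and rename the parameter bindings through scope analysis instead of regular expressions.

```javascript
// Escape a string so it can be used literally inside a RegExp.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function metroFactoryToCommonJs(factory, dependencyIds) {
  const source = factory.toString();
  const header = /^function\s*[\w$]*\s*\(([^)]*)\)\s*\{/.exec(source);
  if (!header) throw new Error("expected a plain function expression");

  // Parameter order: global, require, importDefault, importAll, module, exports, dependencyMap.
  const params = header[1].split(",").map((name) => name.trim());
  const requireName = params[1];
  const moduleName = params[4];
  const depMapName = params[6];

  let body = source.slice(header[0].length, source.lastIndexOf("}"));

  // r(d[0]) -> require("./<dependency id>.js")
  const requireCall = new RegExp(
    "(?<![\\w$.])" + escapeRegExp(requireName) + "\\(" + escapeRegExp(depMapName) + "\\[(\\d+)\\]\\)",
    "g",
  );
  body = body.replace(requireCall, (match, index) => {
    const id = dependencyIds[Number(index)];
    if (id === undefined) throw new Error("no dependency at index " + index);
    return 'require("./' + id + '.js")';
  });

  // m.exports -> module.exports
  const moduleRef = new RegExp("(?<![\\w$.])" + escapeRegExp(moduleName) + "\\.exports\\b", "g");
  body = body.replace(moduleRef, "module.exports");

  // Drop the function-body indentation and empty lines.
  return body.split("\n").map((line) => line.trim()).filter(Boolean).join("\n");
}

const converted = metroFactoryToCommonJs(
  function (g, r, i, a, m, e, d) {
    "use strict";
    const t = r(d[0]).default || r(d[0]);
    let c;
    m.exports = () => c || ((c = t("locale")), c || "en");
  },
  ["b2dff4"],
);
console.log(converted);
```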

> The third parameter is a function that receives a module id and returns its importedDefault property, which from what I've seen is a shared variable initialized to an empty object {} and, as far as I can tell, is not modified in this file.
> The fourth parameter is similar to the third: a function that receives a module id and returns its importedAll property, which for every module is initially the same object as importedDefault.

they are used for import v from 'foo' and import * as w from 'bar'; in ESM
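
A rough sketch of that interop logic, de-minified from the functions l and u in the runtime posted earlier. To stay self-contained it takes an exports object directly, whereas the real functions take a module id, call the require function and cache the result on the module record. The sample modules are made up.

```javascript
// `import v from "foo"`: use .default for ES modules, the whole exports object otherwise.
function importDefault(exportsObject) {
  return exportsObject && exportsObject.__esModule ? exportsObject.default : exportsObject;
}

// `import * as w from "bar"`: ES modules are returned as-is, CommonJS exports are
// copied into a namespace object whose .default is the original exports.
function importAll(exportsObject) {
  if (exportsObject && exportsObject.__esModule) return exportsObject;
  const namespace = {};
  if (exportsObject) {
    for (const key in exportsObject) {
      if (Object.prototype.hasOwnProperty.call(exportsObject, key)) {
        namespace[key] = exportsObject[key];
      }
    }
  }
  namespace.default = exportsObject;
  return namespace;
}

// Hypothetical exports objects for illustration.
const esModule = { __esModule: true, default: "v", named: 1 };
const commonJsModule = { foo: "bar" };

console.log(importDefault(esModule)); // v
console.log(importDefault(commonJsModule) === commonJsModule); // true
console.log(importAll(esModule) === esModule); // true
console.log(importAll(commonJsModule).foo); // bar
console.log(importAll(commonJsModule).default === commonJsModule); // true
```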

> Then there are some bundles that call the __r function directly, this i think are the entrypoints

yes.
apparently there can even be multiple top-level __r calls: https://github.com/getsentry/sentry-cli/blob/844cee0d263204b0b2fb75688b58fd83f13b15b9/tests/integration/_fixtures/file-ram-bundle/index.android.bundle
