Lua Module Function Critiqued


The argument presented here is that the Lua 5.1 module function [1] has design flaws that encourage poor practices in module design, potentially leading to code bugs and ambiguities through side-effects in global variables, and this function should be avoided. It is the hope that this article will further deter the use of the module function and that this function would be either removed or improved upon in a future version of Lua.

(It is acknowledged that there are proponents of this view, as well as detractors and those indifferent -- e.g. in the thread [15].)

Before detailing the perils of the module function, we'll note that whether or not to use the module function is more than a personal choice, because it affects other authors. It is quite easy for a Lua module author to avoid writing module calls. Indeed, this function is never required for defining modules: it is just a simple helper function that wraps common behaviors, none of which are required by Lua or by the other, much more useful parts of the Lua 5.1 module system such as require. [*A] However, modules often use other modules written by other authors, who might themselves have used the module function, and the module function causes global side-effects. Its effects therefore cannot be avoided entirely by choice without modifying the implementation of those other modules. In practice, the use of the module function is somewhat common, likely because it is included in the Lua standard libraries, presumably as a convenience and standardized best practice for module definition, and because a number of official or reputable Lua sources, such as the Lua Reference Manual [2] and Programming in Lua (PiL) [3], encourage its use and even suggest it is good practice. Therefore, new users quickly become accustomed to using the module function.

The usual way to define a module with the module function is like this:

-- hello/world.lua

module(..., package.seeall)

local function test(n) print(n) end

function test1() test(123) end

function test2() test1(); test1() end

and it is used like this:

require "hello.world"

require "anothermodule"

hello.world.test2()

There are two main complaints about the module function, both of which are seen if anothermodule is defined like this:

-- anothermodule.lua

module(..., package.seeall)

assert(hello.world.hello.world.print == _G.print)  -- weird

assert(hello ~= nil) -- where'd this come from anyway?

First, the global namespace is accessible by indexing the module table; second, hello is visible in this module even though it was not requested by it.

The first complaint is less inherent to the module function and is due only to the package.seeall option. package.seeall allows a module to see global variables, which are normally hidden since the module function replaces the current environment of the module with a local one. What package.seeall does is muck with the metatable of the module's environment so that lookups fall back to _G. This allows not only the module itself to access _G, but the variables in _G also become part of the module's interface. Among various things, the behavior of exposing the global environment through the module table could be detrimental to sandboxing (see SandBoxes), and these variables might be used accidentally, but more glaringly it's just plain weird.
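
The leak can be reproduced without calling module at all. The following is a minimal sketch, assuming that package.seeall amounts to giving the module table a metatable whose __index is _G; the table M here is a plain stand-in for the table that module would create.

```lua
-- Sketch only: M stands in for the module table that module() would create.
local M = {}
setmetatable(M, { __index = _G })   -- roughly the effect of package.seeall

function M.test() return 123 end

print(M.print == print)      -- true: a global leaks out through the module table
print(rawget(M, "print"))    -- nil: the module itself never defined it
```

Anyone holding M can therefore reach every global through it, which is the sandboxing concern raised above.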

Luckily, package.seeall is only a convenience option and can be avoided as such:

-- hello/world.lua

local _G = _G

module(...)

function test() _G.print(123) end

or

-- hello/world.lua

local print = print

module(...)

function test() print(123) end

Those are a bit awkward, but there may be other more syntactically pleasing ways to avoid it, such as by recognizing that the module table and the module environment table need not be the same (e.g. see ModuleDefinition -- "Module System with Public/Private Namespaces"). We won't go into further detail on this first point.

The second point is that the module function has the side effect of creating global variables named in ways the programmer doesn't fully control. On executing module("hello.world"), the function creates a table named "hello" in the global environment (the initial global environment, not the current environment set through setfenv), and stores the module table under the key "world" in that table. However, if any of those names is already taken by a non-table value (e.g. someone else placed it there), the function raises an error, which at least provides some level of safety. The behavior of the module function can best be understood with the following representation of it in Lua taken from LuaCompat [4] (the real version is in loadlib.c).

local _LOADED = package.loaded

function _G.module (modname, ...)

  local ns = _LOADED[modname]

  if type(ns) ~= "table" then

    ns = findtable (_G, modname)

    if not ns then

      error (string.format ("name conflict for module '%s'", modname))

    end

    _LOADED[modname] = ns

  end

  if not ns._NAME then

    ns._NAME = modname

    ns._M = ns

    ns._PACKAGE = gsub (modname, "[^.]*$", "")

  end

  setfenv (2, ns)

  for i, f in ipairs (arg) do

    f (ns)

  end

end
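
The listing above relies on a findtable helper that is not shown. The following is a hypothetical stand-in, not the LuaCompat source: it walks a dotted name, creates missing tables along the way, and returns nil when a non-table value is in the way. A plain table named globals is used instead of _G so the sketch has no side effects.

```lua
-- Hypothetical stand-in for the findtable helper; not the LuaCompat code.
local function findtable(root, name)
  local t = root
  for key in string.gmatch(name, "[^.]+") do
    local v = t[key]
    if v == nil then
      v = {}               -- create the missing level
      t[key] = v
    elseif type(v) ~= "table" then
      return nil           -- name conflict: something else already lives here
    end
    t = v
  end
  return t
end

local globals = {}          -- stands in for the global environment
local world = findtable(globals, "hello.world")
print(globals.hello.world == world)           -- true: nested tables were created

findtable(globals, "hello.again")
print(globals.hello.again ~= nil)             -- true: the existing "hello" table is reused

globals.test = function() end
print(findtable(globals, "test.more"))        -- nil: "test" is a function, so this conflicts
```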

The problem arises because different modules maintained by different people all write to the global environment. Furthermore, an application using those modules may be writing to the global environment as well. Due to information hiding, [5] the modules and the application should have no knowledge of the internal workings / implementation of those modules--nor, possibly, even the names of the modules those modules require. The result is that a program lacks control over which global variables get set. Several forms of this problem are illustrated below.

In the following examples, we will as a convenience define modules inline rather than in separate files. For example, rather than creating two files like this

-- mymodule.lua

module(...)

function test() return 1+2 end



-- mymodule_test.lua

require "mymodule"

print(mymodule.test())

we will simply write

(function()

  module("mymodule")

  function test() return 1+2 end

end)();

print(mymodule.test())

Here is the first example:

(function()

  local require = require

  local print = print

  local module = module

  module("yourmodule");



  (function() module("mymodule") end)()



  print(mymodule ~= nil) -- prints false (where is it?)

end)();



print(mymodule ~= nil) -- prints true (where did this come from?)

As shown, loading modules like "mymodule" always populates the global environment rather than the current environment where the module is used. This is the reverse of what is needed. Many such module loads can fill the global environment with variables intended to be private.

Another problem, as Mark Hamburg notes [16], is that putting modules into the global namespace hides dependencies. Assume your program loads module "bar" and loading module "bar" also loads module "foo". Now module "foo" will also be available in the global namespace. In your program you start using module "foo" from the global namespace. If module "bar" later drops its dependency on module "foo", then "foo" will no longer be available in the global namespace and your program breaks. It is not immediately apparent where foo in the global namespace came from, nor that it is actually a module (that used to be a dependency of module "bar").

The following two examples are related to each other:

function test() return 1+2 end



(function()

  module("mymodule", package.seeall);



  (function()

    module("test.more") -- fails: name conflict for module 'test.more'

    function hello() return 1+2 end

  end)()

end)()

and

(function()

  module("test")

  function check() return true end

end)();



(function()

  module("test.check") -- fails: name conflict for module 'test.check'

  function hello() return 1+2 end

end)();

As seen, package names and regular variable names conflict. The module function does detect and raise an error if a global variable it's overwriting already exists. That's what we want, right? Well, this also means that it's particularly indeterminate whether loading a module will succeed, since the module may load other modules whose names (and names of their members) we might not know and that conflict with global variables.

As a side note, in some other languages (e.g. Perl), variables and package names are maintained in separate namespaces and so are prevented from conflicting. [*3] It's also noteworthy that module naming conventions affect if and how names conflict. For example, Java package names [6] are conventionally prefixed by a (unique) domain name under the author's control, which is verbose but provides a mechanism to avoid conflict. In Perl, CPAN provides a central naming registry to prevent conflicts, and modules with the same prefix indicate a common function rather than a common maintainer (e.g. "CGI" [7] and "CGI::Minimal" [8] are maintained independently by different authors, and "CGI::Minimal" is not stored inside the "CGI" table).

(function()

  module("mymodule", package.seeall);



  (function()

    module("test.more")

    function hello() return 1+2 end

  end)()



  function greet()

    test.more.hello()  -- fails -- attempt to index global 'test' (a function value)

  end

end)();



function test()

  mymodule.greet()

end



test()

Here, the program inadvertently overwrites a global variable set by the module function. The module function does not detect this. Rather, there is program failure (possibly a silent one) when a module that depends on this global variable attempts to access this variable.

(function()

  local require = require

  local module = module

  local print = print

  local _P = package.loaded

  module('yourmodule.two');



  (function()

    module('mymodule.one')

  end)()



  print(_P['mymodule.one'] ~= nil) -- prints true

end)();



local _P = package.loaded

print(_P['mymodule.one'] ~= nil) -- prints true

Storing modules in the global environment is in fact somewhat redundant since they are also stored in package.loaded (though without creating nested tables for the periods in the module name).
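
A minimal sketch of that last point, using package.preload so that no files are needed. The module name mymodule.one is reused from the example above; the answer field is made up for illustration.

```lua
-- Register a loader under a dotted name; the module body is illustrative.
package.preload["mymodule.one"] = function()
  return { answer = 42 }
end

local one = require "mymodule.one"

print(package.loaded["mymodule.one"] == one)  -- true: stored under the full dotted name
print(package.loaded.mymodule)                -- nil: no nested table is created for the prefix
```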

~~~

The problems above can be avoided by not using the module function but instead defining modules in the following simple way: [*1][*2]

-- hello/world.lua

local M = {}



local function test(n) print(n) end

function M.test1() test(123) end

function M.test2() M.test1(); M.test1() end



return M

and importing modules this way:

local MT = require "hello.world"

MT.test2()

Note that the public functions are clearly indicated with the M. prefix. Unlike when using module, the global environment is not visible through the MT table (i.e. MT.print == nil), the hello.world table has not been exported to (or polluted) the global environment but is rather held in a lexical (local) variable, and modules with the same prefix (e.g. hello.world.again) would not alter the hello.world table. In the client code, the module hello.world can be given a short abbreviation local to that module (e.g. MT). The approach also works well with DetectingUndefinedVariables. This is great. The one complaint is that public functions need to be prefixed with M. in the module itself, but the alternatives that are often proposed introduce their own problems and complexities, such as package.seeall noted above. It does not particularly hurt to be explicit with M. (two characters), especially when code size gets larger.
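
These properties can be checked directly. The sketch below registers two "M" table modules through package.preload instead of separate files; hello.world.again and its again function are hypothetical additions made only for this illustration.

```lua
package.preload["hello.world"] = function()
  local M = {}
  local function test(n) return n end      -- private helper
  function M.test1() return test(123) end  -- public function
  return M
end

-- Hypothetical sibling module sharing the same name prefix.
package.preload["hello.world.again"] = function()
  local M = {}
  function M.again() return "again" end
  return M
end

local MT = require "hello.world"
local again = require "hello.world.again"

print(MT.test1())            -- 123
print(MT.print)              -- nil: globals are not visible through the module table
print(rawget(_G, "hello"))   -- nil: nothing was written to the global environment
print(MT.again)              -- nil: the sibling module did not alter hello.world
```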

A related note on C code: The luaL_register [9] function in C is somewhat analogous to the module function in Lua, so luaL_register shares similar problems, at least when a non-NULL libname is used. Furthermore, the luaL_newmetatable/luaL_getmetatable/luaL_checkudata functions use a C string as a key into the global registry. This poses some potential for name conflicts--either because the modules were written by different people or because they are different versions of the same module loaded simultaneously. To address this, one may instead use a lightuserdata (pointer to variable of static linkage to ensure global uniqueness) for this key or store the metatable as an upvalue--either way is a bit more efficient and less error prone.

The module function (and its ilk) may introduce more problems than it solves.

--DavidManura

Footnotes

[*1] (Advocates of the above style include RiciLake, DavidManura, others who have mentioned it on IRC, MikePall [17][18][19], ... (add your name here))

[*2] There has also been the suggestion to move the standard libraries in this direction [20].

[*3] Example in Perl where modules and variables of the same name do not conflict:


package One;

our $Two = 2;

package One::Two;

our $Three = 3;

package main;

print "$One::Two,$One::Two::Three" # prints 2,3

Additional Points

Many of the additional points below were taken from the Oct 2011 discussion on module [21][22].

Bundling

With modules defined using the module function, we can sometimes just concatenate them (cat *.lua > bundle.lua) if it's desired to bundle them into a single file [21]. However, this does not work in the general case:

module("one", package.seeall)

require "two"  -- This fails unless you sort the modules according to their dependency graph

               -- (assuming, as is best design, it has no cycles and can be computed statically)

local function foo() print 'one.foo' end

function bar() foo() two.foo() end



module("two", package.seeall)

function foo() print 'two.foo' end  -- This overwrites the earlier local foo



module("main", package.seeall)

require "one"

one.bar()

A general solution, which works for modules both with and without module, involves package.preload as follows:

package.preload['one'] = function()

  module("one", package.seeall)

  require "two"

  local function foo() print 'one.foo' end

  function bar() foo() two.foo() end

end



package.preload['two'] = function()

  module("two", package.seeall)

  function foo() print 'two.foo' end

end



package.preload['main'] = function()

  module("main", package.seeall)

  require "one"

  one.bar()

end



require 'main'

A number of bundling utilities listed on the bottom of BinToCee utilize approaches like this.
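
For comparison, here is a sketch of the same bundle written with "M" table modules. The module names follow the example above; the functions return strings instead of printing so the result is easy to check. Because each loader runs only when require asks for it, the order in which the loaders appear in the bundle does not matter.

```lua
-- "one" is registered before its dependency "two" on purpose.
package.preload["one"] = function()
  local two = require "two"
  local M = {}
  local function foo() return "one.foo" end   -- private to "one"
  function M.bar() return foo() .. " " .. two.foo() end
  return M
end

package.preload["two"] = function()
  local M = {}
  function M.foo() return "two.foo" end       -- does not clash with the local foo in "one"
  return M
end

local one = require "one"
print(one.bar())   -- one.foo two.foo
```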

Switching between private and public

One criticism placed on the "M" table style of module definition is that if a function definition in the module is changed from public to private then all references to that function must be renamed (e.g. M.foo to foo) [23].

function M.foo() end          -- change to "local function foo() end"

function M.bar() M.foo() end  -- and also change "M.foo()" to "foo()"

A mitigating factor is that references to M.foo() are localized to the current module and may typically be relatively few in number. The refactoring operation required here is also the same for when you want to rename a function, which you'll need anyway. Text editors can assist in this refactoring, and some editors with knowledge of the Lua language can also rename variables quite robustly. In some languages, e.g. Python, private variables are informally differentiated from public variables with leading underscores, so the same criticism would apply.

One technique to avoid renaming is to keep all functions local, and insert any functions that should be public into the public table right after their definition:

local function foo() end; M.foo = foo

Some performance critical code does that anyway for the small performance advantage. The triplicate use of foo in the definition is unfortunate, and workarounds to avoid this (such as localmodule in ModuleDefinition or token filters) are likely not worth it.

Finally, note that changing a function from public (table or global variable) to private (local variable) may also require moving the function definition. local variables, unlike table or global variables, are lexically scoped, so they must be declared (or forward declared) prior to use. New users not versed in lexical scoping can be confused by this. We can avoid this by declaring all variables (public and private) uniformly with either locals (as in the example above) or table/global variables (as will be shown below). The latter can involve a Python-like technique of prefixing private variables with underscores or using two tables:

local M = {}

function M._foo() print 'foo' end

function M.bar() M._foo() end

return M

or

local M = {} -- public

local V = {} -- private

function V.foo() print 'foo' end

function M.bar() V.foo() end

return M

Neither of those addresses the problem, however, of needing to replace references when a function is changed from public to/from private. We may also solve that problem by using a technique like this:

local M = {} -- public

local V = setmetatable({}, {__index = M}) -- private and public

function V.foo() print 'foo' end

function M.bar() V.foo() end

return M

Now, we can always safely change a function from public to/from private by changing only one character in the file. If we wanted to avoid some of the cruft, we could move some of it into the module loader so that modules need only be written as

function V.foo() print 'foo' end

function M.bar() V.foo() end

It may not be the tersest, but the differentiation between public and private scopes (V/M) is explicit.
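
One way such a loader could look is sketched below. The function load_module is hypothetical: it compiles the module source with a prefix that binds M and V as locals, then hands back only the public table. It uses loadstring where that exists (Lua 5.1) and load otherwise.

```lua
-- Hypothetical loader: supplies M (public) and V (private + public) to the module body.
local function load_module(source)
  local compile = loadstring or load          -- loadstring in 5.1, load in later versions
  local chunk = assert(compile("local M, V = ...; " .. source))
  local M = {}
  local V = setmetatable({}, { __index = M }) -- V falls back to M for public names
  chunk(M, V)
  return M
end

local mod = load_module([[
  function V.foo() return "foo" end
  function M.bar() return V.foo() end
]])

print(mod.bar())   -- foo
print(mod.foo)     -- nil: foo stayed private
```

A real searcher would read the source from a file instead of a string, but the binding of M and V would work the same way.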

Mechanism not Policy

"Despite our “mechanisms, not policy” rule — which we have found valuable in guiding the evolution of Lua — we should have provided a precise set of policies for modules and packages earlier. The lack of a common policy for building modules and installing packages prevents different groups from sharing code and discourages the development of a community code base. Lua 5.1 provides a set of policies for modules and packages that we hope will remedy this situation." -- The Evolution of Lua, http://www.lua.org/doc/hopl.pdf

"Usually, Lua does not set policies. Instead, Lua provides mechanisms that are powerful enough for groups of developers to implement the policies that best suit them. However, this approach does not work well for modules. One of the main goals of a module system is to allow different groups to share code. The lack of a common policy impedes this sharing." -- http://www.inf.puc-rio.br/~roberto/pil2/chapter15.pdf

More recent clarification: LuaList:2011-10/msg00485.html

See also MechanismNotPolicy.

Prevalence of the use of the module function

The majority of pure-Lua modules in repositories currently use the module function:

Use of "..." in the module function

LuaList:2011-10/msg00686.html argues that it's preferable for the name of the module to be explicitly specified in the module text so that it's clear how to load it:

module("foo.bar")    -- encouraged

module(...)          -- discouraged

-- module: foo.bar   -- name in informal comment better than nothing

local M = {} return M                        -- anonymous and likewise discouraged

local M = {}; M._NAME = "foo.bar"; return M  -- better than above

On the other hand, this doesn't make the package as easily relocatable.
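
The relocation trade-off can be seen in a short sketch. require passes the requested name to the loader as its first argument (in a file-based module this is what ... holds), so a module that records that value follows whatever name it is installed under. The vendor.foo.bar name is made up for illustration.

```lua
-- The same loader registered under two names; in a file this would be `local name = ...`.
local function loader(name)
  local M = {}
  M._NAME = name      -- recorded from the require call, not hard-coded
  return M
end

package.preload["foo.bar"] = loader
package.preload["vendor.foo.bar"] = loader   -- hypothetical relocated copy

print(require("foo.bar")._NAME)          -- foo.bar
print(require("vendor.foo.bar")._NAME)   -- vendor.foo.bar
```

A hard-coded M._NAME = "foo.bar" would document the intended name but would be wrong for the relocated copy.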

Lua version compatibility

The Lua 5.1 module function can be used in Lua 5.0 via [LuaCompat]. Lua 5.2.0-beta has a compatibility mode, and furthermore "It is quite easy to write a 'module' function in 5.2, using the debug library. (But it will not allow multiple modules in a single file, which is a kind of hack anyway)" (Roberto, LuaList:2011-10/msg00488.html).

The "M" table style of module definition is also compatible in 5.0, 5.1, and 5.2.0-beta.

_ENV is not supported directly in 5.1, so its use can prevent a module from remaining compatible with 5.1. Maybe you can simulate _ENV with setfenv and trapping gets/sets to it via __index/__newindex metamethods, or just avoid _ENV.

Lua 5.2 module definition

In 5.2.0-beta, you can just continue to use the "M" table style of module definition, and there are those that recommend it [24]. On the other hand, some have suggested that in Lua 5.2, modules will be written like this, using the new _ENV variable (which largely supplants setfenv):


_ENV = module(...)

function foo() end

Some argue against new users needing to be aware of the obscure looking _ENV. Although that might be avoided by having someone else set up _ENV when the chunk is loaded (e.g. by require or the searcher function), others continue to argue that the module's private environment and public tables should not be mixed, and there is no need for _ENV at all.

Fostering module development

The argument in LuaModuleFunctionCritiqued was that module has technical defects (side-effects and obscure corner cases), which hinder a core property of modularity: composability. In practice, this means that an application loading two different modules written by two different authors should not experience any surprising interactions between the two modules. On the other hand, the "M" table style (if properly used) does not have these defects, due to its very simple semantics without side-effects (formally, the module loader can often be thought of as a [pure function] in the functional programming sense).

Hisham has argued [25] that even though module has technical defects, these can largely be fixed and they are minor compared to its success in promoting a more standard policy for module definition (absent in 5.0). This success appears to be on a sociological rather than technical level. module is said to have fostered code sharing and development of a community code base, and most modules in LuaRocks use module. Moreover, the use of module (which is a built-in keyword in some other languages) has a certain self-documenting property, announcing the intention of the code (I am a module, this is my name, and here are my public functions) with minimal boilerplate. We also all seem to agree that obscure boilerplate (setfenv/_ENV things) in modules is a negative for readability.

It can be argued, however, that some years after the introduction of the 5.1 module system, complaints are still heard with some frequency about the quantity, quality, and consistency of Lua modules, even for the basics like StandardLibraries. There are other factors in play, and efforts like LuaRocks are addressing some of these areas, but the more difficult question is separating out whether module has helped or hurt and whether changes in 5.2 will help.

Given that a "standard library" like penlight [10] has removed module calls from its implementation [26], it's not apparent this has negatively affected anyone. However, the fact that Lua 5.2.0-beta has deprecated module, suggesting that existing modules using it might no longer work without loading a compatibility function or being rewritten, has caused some concern [27].

Mixing global and module namespaces

In putting globals (e.g. print) and module functions (e.g. foo.print) in the same namespace, as package.seeall does, the module function may overshadow a global of the same name. This is one reason some prefer to be explicit by giving module functions a unique prefix (e.g. M.print). This may also help readability in that it's obvious that M.print is a public variable exported from the current module and not a local, global, or imported package name. Sometimes this type of prefixing is needed anyway in other parts of the code (e.g. "self." or "ClassName:"). This explicitness also avoids any issues or overhead with merging the two namespaces with a metatable. This practice does, however, introduce some repetition (see "Switching between private and public" above).

A similar debate has occurred in the C++ community concerning things like "using namespace std;" [11], which imports possibly conflicting names into the current namespace and therefore is safest to avoid. Moreover, in C, there's a common practice of prepending "g_" or "s_" to global or static variables (and similarly "m_" for members in C++), although intelligent IDEs can mitigate some of the need for this.

Python and Perl name leakage

Python has an issue similar to package.seeall [23]:


# bar.py

import logging # logging is now available as bar.logging

Perl can also have this issue:


package One;

use Carp qw(carp);  # carp is now available via One::carp

carp("test");

1

unless you avoid symbol imports (and instead fully qualify names):


package One;

use Carp qw();  # nothing is imported, so One::carp is not defined

Carp::carp("test");

1

or use [namespace::clean], [namespace::autoclean], or [namespace::sweep] modules.

Python (like Lua module) imposes a relationship between the modules foo and foo.baz. If your module loads foo and another module loads foo.baz, then baz will then be placed inside your module. This likely accounts for why the Hitchhiker's Guide to Packaging [12] suggests that the first part of the module name ("foo.") be globally unique, and it seems that Python packages tend to share the same prefix only if they are managed by the same entity (e.g. numerous, but not all, Zope packages/modules are under a "zope." prefix). The same guideline should apply to Lua in its current state.

Perl is not quite the same since you can have a package Foo with variable $Foo::Baz and another package Foo::Bar, and these do not conflict since packages exist in their own namespace. Perl does, however, share the following issue with Lua module:


# One.pm

package One; sub test { }; 1



# Two.pm

package Two; One::test(); 1



# main.pl

use One;

use Two;   # This succeeds as written but fails if the lines are later reversed

Global variables through package.loaded

require globally registers modules in package.loaded. So, regardless of how modules are defined, you still have global access to loaded modules if you want it:

local L = package.loaded

.....

L['foo.bar'].baz()

L.foo.bar.baz()  -- if require'foo'.bar == require 'foo.bar'

Doing this may be considered laborious:

local FBAR = require 'foo.bar'

local FBAZ = require 'foo.baz'

local FBUZ = require 'foo.buz'

...

You may just accept it, or you could find ways to simplify it:

-- foo.lua

return {

  bar = require 'foo.bar',

  baz = require 'foo.baz',

  buz = require 'foo.buz'

}

or ways to automate it:

local _G = require 'autoload' -- under appropriate definition of autoload

_G.foo.bar.qux()

-- note: penlight offers something like this

The latter does not have problems with missing hidden dependencies since modules are always loaded on demand if needed. On the other hand, module loading is not localized to the module loader function but rather can occur later wherever the module functions are used, which means that error detection may be delayed and usages of these functions may be more prone to fail (e.g. module loading failure due to module not installed), complicating error handling. There are ways this might be addressed though.
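
A rough sketch of what such an autoload table could look like follows. It is an assumption about one possible design, not penlight's implementation: each proxy represents a name prefix, and indexing it either returns the module of that dotted name (loaded on demand) or another proxy one level deeper. The foo.bar module and its qux function are registered through package.preload for the demonstration, and foo.missing is assumed not to exist.

```lua
-- Hypothetical autoload helper: indexing triggers require on the dotted name.
local function autoload(prefix)
  return setmetatable({}, {
    __index = function(self, key)
      local name = prefix and (prefix .. "." .. key) or key
      local ok, mod = pcall(require, name)
      local value = ok and mod or autoload(name)  -- fall back to a deeper proxy
      rawset(self, key, value)                    -- cache so require runs once per name
      return value
    end,
  })
end

package.preload["foo.bar"] = function()
  return { qux = function() return "qux" end }
end

local A = autoload()
print(A.foo.bar.qux())   -- qux: "foo.bar" was loaded on first use

-- A missing module only shows up when it is finally used.
local ok = pcall(function() return A.foo.missing.qux() end)
print(ok)                -- false
```

The last two lines show the delayed error detection mentioned above: nothing fails until the call is attempted.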

Defining Classes

A concern similar to standardizing module definition is standardizing class definition (ObjectOrientedProgramming). Moreover, some modules are also classes.

As an example, ratchet's code [13] does something like this:

.....

module("ratchet.http.server", package.seeall)

local class = getfenv()  -- why not _M?

__index = class



function new(socket, from, handlers, send_size)

    local self = {}

    setmetatable(self, class)

    .....

    return self

end



function handle(self) ..... end

If modules should utilize module, then we should ask if this is how class modules should be defined.
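
For contrast, a class module in the "M" table style might look like the sketch below. It is not ratchet's code: the demo.server name, the handlers table, and the return values are made up for illustration, and package.preload stands in for a separate file.

```lua
package.preload["demo.server"] = function()
  local Server = {}
  Server.__index = Server          -- instances look up methods in Server

  function Server.new(socket, handlers)
    local self = setmetatable({}, Server)
    self.socket = socket
    self.handlers = handlers or {}
    return self
  end

  function Server:handle(request)
    local handler = self.handlers[request]
    return handler and handler(self) or "unhandled"
  end

  return Server                    -- the class table is the module
end

local Server = require "demo.server"
local s = Server.new("sock", { GET = function() return "ok" end })

print(s:handle("GET"))         -- ok
print(s:handle("PUT"))         -- unhandled
print(rawget(_G, "demo"))      -- nil: no global was created
```

Here the class table and the module table are the same object, with no need for getfenv or package.seeall.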

lua -l switch

The lua -l command line switch is not useful if it doesn't have some side-effect. With M-style modules in 5.1, -l will at least create a variable in package.loaded, but accessing that is long-winded, and the purpose of -l is typically to serve as a short-hand when invoking the interpreter (after all, the same can be achieved via -e "require....."). In 5.2.0-beta, -l will create a global table using lua_pushglobaltable even for M-style modules. The global variable created via lua_pushglobaltable may be longer than desired though, and you might instead want the effect of -e "FB = require 'mypackage.foo.bar'". See the discussion LuaList:2011-11/msg00016.html .

If -l is used to create a global variable, should this be added to _G? or somehow limited to just the main program chunk (e.g. the effect of -lfoo would be to add local foo = require 'foo' or _ENV = setmetatable({foo = require 'foo'}, {__index = _G}) to the top of the main chunk)? The latter is cleaner.

There was also a suggestion that -l should accept parameters [14][19].

Static Analysis

Modules defined with "M" tables, at least without metatables, even though they may have some variation in form, can be statically analyzed from first principles (e.g. behavior of lexical variables and tables), as LuaInspect does [28].

The 5.1 module function has more complicated semantics (side-effects and metatable behavior). Nevertheless, we can still infer meaning on a higher level, particularly if conventions are followed, as tools like [LuaDoc] have done. Changes in 5.2 and suggestions for improving the module function may also affect this area.

Improving "module"

Some proposals for making module better rather than tossing it out are in the thread http://lua-users.org/lists/lua-l/2011-10/threads.html#00481 . (TODO: post best recommendations here)

Other User Comments

Not using the module function means that by omitting the local keyword, it could be very easy to pollute the global environment (which is bad, that's the purpose of that article). So we can improve the module function by changing the environment to something private (that can inherit from _G) and defining in it the _M table (as now) that will contain the module's public interface. I was also concerned about these issues, and there is a tricky way to use the module function and not clutter the global environment. Here it is:

package.loaded[...]={}

module(...) -- you might want to add package.seeall

However, this does not solve the problem of the global namespace being accessible through the module. For that we would need a modified module function. Hopefully in the next Lua release.

--MildredKiLya

"Not using the module function means that by omitting the local keyword, it could be very easy to pollute the global environment (which is bad,...)" -- True, but unwanted global accesses are detectable prior to run-time using a method in DetectingUndefinedVariables, and I consider them errors that should be fixed.

Your approach using the module function might be done as follows, though this is going out of its way to circumvent the current behavior of the module function:

-- mod.lua

local _E = setmetatable({}, {__index=_G})

local _M = {}

package.loaded[...] = _M

module(...)

_E.setfenv(1, _E)

function _M.test()

  return math.sqrt(9)

end

test2 = 1



--modtest.lua

local m = require "mod"

assert(not mod)

assert(m.test() == 3)

assert(not test)

assert(not test2)

assert(not m.print)

print 'done'



$ luac -p -l mod.lua | lua /usr/local/lua-5.1.3/test/globals.lua

setmetatable    1

_G      1

package 3

module  4

test2   9*

math    7

--DavidManura

See Also


Last edited January 10, 2012 2:05 am GMT