Sven Olsen
What I found there was something far better -- Peter Shook's beautifully clever table unpack patch. Since that discovery, tweaking the syntax rules of my Lua parser has become a bit of a guilty pleasure.
Rather than clutter up the power patch page with a long list of small, debatably useful patches, I've decided to try documenting most of them here. Again, I'm following PeterShook's lead in this, as he also seems to have decided to move the docs for some of his more debatably useful language tweaks to his personal bio page.
a=b
(f or g)()
In Lua 5.1, the parser will throw an error given the above code, complaining about ambiguous syntax. Lua 5.2 will accept the code, interpreting the two lines as a single statement, one that executes:
a=b( f or g)()
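One way to see which of the two behaviors a given interpreter has is to compile the snippet from a string and check whether the parser accepts it. The following is only a sketch: the stub definitions of b, f, and g are hypothetical, and `loadstring or load` is used so that the same code runs on 5.1 and on 5.2 or later.

```lua
-- Compile the two-line snippet from a string so a parse error can be caught.
local compile = loadstring or load   -- loadstring on 5.1, load on 5.2+
local chunk, err = compile("a = b\n(f or g)()")

local g_called = false
if chunk then
  -- Hypothetical stubs, only here to make the chosen parse visible.
  b = function(arg)                  -- receives (f or g) as its argument
    return function() return "b's result, called" end
  end
  f = nil
  g = function() g_called = true end
  chunk()
  -- 5.2-style parse: a = b(f or g)(), so g is passed along but never called.
  print("accepted as one statement; a = " .. tostring(a))
else
  -- 5.1-style check: the parser refuses the chunk outright.
  print("rejected: " .. err)
end
```

If the chunk is accepted, g is never called even though the second line looks like a call to it, which is exactly the kind of bug the 5.1 check used to catch.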
In general, I prefer 5.1's behavior. After upgrading to Lua 5.2, I occasionally found myself writing bugs that the old ambiguous syntax check would have caught.
However, as Roberto has pointed out, there are problems with 5.1's ambiguous syntax check. For one, it doesn't actually check for ambiguous syntax -- a task that would be effectively impossible inside a single-pass parser. Instead, the check is implemented by simply throwing an error anytime a function argument list starts on a new line. Thus, under 5.1,
print
( "long string one",
  "long string two" )
results in an error for ambiguous syntax, though there's clearly nothing ambiguous about the code.
I've tweaked my own Lua parser to have behavior somewhere between Lua 5.1's and 5.2's. My check works by adding a second condition to the one used in 5.1, restricting errors to the case of expressions that include 2+ function calls. I've also changed the text of the error message, in hopes of making it more obvious that the error should be interpreted as a warning about dangerous formatting.
My modified check isn't perfect -- like Lua 5.1, it will still sometimes throw errors in response to code that only has one possible interpretation. For example, the following triggers an error, even though there's only one valid way of parsing the code:
new_object
(f or g)(state)
The beautiful thing about Lua's syntax is that these sorts of troublesome syntax ambiguities very rarely come up in practice. Even under the very aggressive 5.1-style newline handling, programmers would only rarely see ambiguous syntax errors. Under my own more cautious check, such situations are even less common.
But while it may be a rare edge case, I think it's a mistake to ignore the issue completely.
Of all the patches I've written, this is the only one which I'd seriously recommend as an addition to the official Lua branch. It does prevent bugs, and its costs are tiny.
This is perhaps the simplest powerpatch you'll ever find. It's a one-liner that Brian Palmer included in his concise anonymous functions patch -- one that removes the need to follow if statements with a then token. I've extended it to similarly make optional following a for statement with a do token. Using the patch will add some potential parsing quirks to the language. For example, if you have a statement like this:
if a then (f or g)() end
and you remove the then, you'll end up generating code that simply executes the line:
a( f or g )()
But as discussed above, potentially ambiguous syntax is something that Lua programmers will always need to be mindful of -- it's a necessary consequence of making semicolons optional. In the version posted here, I've packaged it with my newline handling patch, as using it along with 5.2's relaxed parse rules seems unnecessarily dangerous.
If you're curious to try out any of the rest of my patches, you can download them all as one [super-patch], based on 5.2.2. While most of my mods are fairly self-contained, there's just enough overlap between them to make maintaining independent patch files troublesome. Peter's Table Unpack patch overlaps with both Compound Assignment and the Required Fields semantic. Required Fields, meanwhile, shares a VM change with the Safe Navigation patch. The Stringification patches are both small and self-contained -- though if you want a clean version of those two, you can find my walk-through of the diffs in the lua-l archives [1].
As I've mentioned above, this is my favorite powerpatch. And if you're going to try any of my own syntax mods, you should certainly try Peter's as well. The syntax converts:
a,b,c in t ==> a,b,c=t.a,t.b,t.c.
It's a wonderful transformation -- and one that's become even more useful as a result of the new _ENV rules in 5.2. For example, if you're planning to change _ENV to something unusual, but want to keep certain standard global functions in scope, you simply write:
local pairs, ipairs, tostring, print in _ENV
However, there are more subtle idioms you can try as well, most of which come from combining the syntax with metamethods. Consider:
local x,y,z,vx,vy,vz in INIT(0)
If INIT(a) returns a table with __index = function() return a end, then the above will initialize all the given variables to 0.
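To make the metamethod trick concrete, here is a minimal sketch in stock Lua. INIT is written out as one possible definition matching the description above, and the `in` statement is expanded by hand, assuming the right-hand expression is evaluated once.

```lua
-- One possible INIT: returns a table whose every missing field reads as `a`.
local function INIT(a)
  return setmetatable({}, { __index = function() return a end })
end

-- Hand expansion of: local x,y,z,vx,vy,vz in INIT(0)
local t = INIT(0)
local x, y, z, vx, vy, vz = t.x, t.y, t.z, t.vx, t.vy, t.vz

print(x, y, z, vx, vy, vz)
```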
A similar __index trick will let you significantly simplify most require statement boilerplate. For example, you can define a REQUIRE proxy object that lets you replace
local socket = require 'socket'
local lxp = require 'lxp'
local ml = require 'ml'
with,
local socket, lxp, ml in REQUIRE
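The proxy itself is not spelled out here, so the following is just one way it might be written. Standard library modules stand in for socket, lxp, and ml so that the sketch runs without extra libraries, and the `in` statement is again expanded by hand.

```lua
-- Hypothetical REQUIRE proxy: indexing it with a name loads that module.
local REQUIRE = setmetatable({}, {
  __index = function(_, name) return require(name) end
})

-- Hand expansion of: local string, table, math in REQUIRE
local string, table, math = REQUIRE.string, REQUIRE.table, REQUIRE.math

print(string == _G.string, table == _G.table, math == _G.math)
```

Because nothing is ever stored in the proxy table, every lookup goes through __index and on to require, which does its own caching in package.loaded.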
Peter's syntax is as powerful as it is because it gives programmers a tool for converting variable names into strings. However, it only does so inside the context of variable assignment. A more general tool for transforming variables into their matching string representations would be helpful; and while the patch I've come up with isn't as elegant or as clear as Peter's, I do often use it.
The patch applies two transformations. First, in the context of a table constructor, writing
t = {..star, ..planet, ..galaxy} ==> t = {star=star, planet=planet, galaxy=galaxy}
Similarly, in the context of function argument lists, writing
f(..star,..planet,..galaxy) ==> f('star',star,'planet',planet,'galaxy',galaxy)
Due to a quirk of the implementation, using .. on a complex expression will return the last string, name, or numeric constant encountered while parsing that expression. Thus
{..planet.star, ..planet, ..moon 'luna' } ==> { star=planet.star, planet=planet, luna = moon 'luna' }
The name for this handy semantic is borrowed from Groovy, though CoffeeScript also has a similar feature. The idea is to make it possible to check for values without triggering an 'attempt to index nil' error.
For example, the following expression will evaluate to the glow color of an object's icon, if a glow color is defined, or white, otherwise:
color = object?.icon?.glow?.color or white
Failing to define object, or object.icon, or object.icon.glow, will result in the first part of the expression evaluating to nil.
When I proposed this patch on lua-l [2], there was quite a bit of enthusiasm for it. However, there was also some disagreement over the details of how, exactly, the semantics ought to work.
My own preference is to define ? as a relatively simple piece of syntax sugar, though one that relies on adding a new variable to the global namespace. Thus, when a new Lua state is initialized, I add a userdata called _SAFE to the global table, where _SAFE has __index, __newindex, and __call all set to nullops, __len set to always return 0, and __pairs and __ipairs defined as functions that themselves return a nullop.
Once we have such a userdata, it's fairly simple to add some syntax sugar that converts
(<expr>?) ==> (<expr> or _SAFE*)
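The effect of the transformation can be approximated in stock Lua. In this sketch a plain table named SAFE stands in for the _SAFE userdata (an assumption made so that no C code is needed), and the ? operators are expanded by hand.

```lua
-- A table stands in for the _SAFE userdata; its metamethods follow the
-- description above.
local function nullop() end
local SAFE = setmetatable({}, {
  __index    = nullop,                         -- any field reads as nil
  __newindex = nullop,                         -- writes are discarded
  __call     = nullop,                         -- calling it does nothing
  __len      = function() return 0 end,
  __pairs    = function() return nullop end,   -- iteration ends immediately
  __ipairs   = function() return nullop end,
})

local white = "white"
local object = { icon = {} }                   -- icon exists, glow does not

-- Hand expansion of: color = object?.icon?.glow?.color or white
local color = (((object or SAFE).icon or SAFE).glow or SAFE).color or white
print(color)
```

Unlike the patched version, this expansion looks SAFE up as an ordinary local, so it sidesteps the question of how the name is resolved when _ENV has been replaced.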
Given the default definition of _SAFE, this results in indexing operations that work as desired. But the semantic also opens up some neat new features. For example, if you're also using Peter's table unpack patch, you can write:
local update, display in object?
With the result that update and display will be nil if object isn't defined.
Similarly, you can call
update?()
with the result that update will only execute if it's defined. Or you can write the following, which iterates through all of object's icons, provided they're defined.
for k,v in pairs(object.icons?)
But there's one caveat to this otherwise fairly elegant definition. For the patch to work as intended, the version of _SAFE referenced from the syntax sugar needs to be evaluated as if the upvalue _ENV was equal to _G -- otherwise, a seemingly harmless line like _ENV = {} will change the meaning of the shorthand. (This is why I've included an asterisk in the transformation definition.)
So while implementing this patch requires only a small parser change, it also requires reintroducing the opcode OP_GETGLOBAL into the VM.
It's probably also worth pointing out that the semantics do have a bit of a quirk, one related to the way or operations are interpreted. Specifically:
v = (nil)?.v ==> v=nil
v = (false)?.v ==> v=nil
v = (true)?.v ==> runtime error: attempt to index a boolean value
Several lua-l users considered that behavior a bit unintuitive; though personally, I prefer it to throwing an error on (false)?.v.
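The quirk follows directly from how `or` treats nil and false, which a hand expansion in stock Lua makes visible. As a simplifying assumption, SAFE here is a bare table whose fields all read as nil, standing in for _SAFE.

```lua
-- Minimal stand-in for _SAFE: a table whose fields all read as nil.
local SAFE = setmetatable({}, { __index = function() end })

-- Hand expansion of v = (<expr>)?.v for three values of <expr>.
local from_nil   = (nil   or SAFE).v   -- nil:   `or` falls through to SAFE
local from_false = (false or SAFE).v   -- false: falls through as well
local ok, err    = pcall(function() return (true or SAFE).v end)

print(from_nil, from_false, ok, err)
```

Only true survives the `or`, so only the third line ends up indexing a boolean.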
A few months after writing up the safe navigation patch, there was a long discussion on lua-l about the various ways in which having undefined table references return nil can lead to bugs [3]. For example, if you're writing code to build a list of all object names, iterating along and setting
name[i] = object.name
can lead to strange behaviors on down the line if you happen to come across an object that lacks a name.
In a way, this is the exact opposite of the situation that motivated the safe navigation patch. The purpose of ? is to make Lua return nil when it would otherwise throw an error. However, there are certainly cases where the opposite behavior is desirable; where we'd like Lua to throw an error, rather than returning nil. Thus, I've written a patch for a "required field" operator. On the surface, it's quite similar to my safe navigation patch. In its simplest form, it converts:
(object!) ==> (object or _MISSING*( "object" ) )
Here _MISSING is a global variable that's fetched via OP_GETGLOBAL as per the Safe Navigation patch. Thus, a line like
name = get_name(object!,field!)
will result in
error: missing required value "object", if object==nil
or
error: missing required value "field", if field==nil
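The simple form of the operator can be imitated in stock Lua by expanding it by hand. In the sketch below, MISSING is a hypothetical local stand-in for _MISSING, and get_name and lookup are placeholders invented for the example.

```lua
-- Hypothetical stand-in for _MISSING: raises the error quoted above.
local function MISSING(name)
  error(('missing required value "%s"'):format(name), 0)
end

-- Placeholder for whatever get_name does in real code.
local function get_name(object, field) return object[field] end

-- Hand expansion of: name = get_name(object!, field!)
local function lookup(object, field)
  return get_name(object or MISSING("object"), field or MISSING("field"))
end

print(lookup({ title = "Lua" }, "title"))
print(pcall(lookup, nil, "title"))
print(pcall(lookup, {}, nil))
```

Since the expansion relies on `or`, a value of false is reported as missing in the same way as nil.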
I've taken things a bit farther, however, augmenting the syntax so that table lookups that include a '!' will throw errors if they return nil or false. Thus, for a line like:
age,height in record![name]!
The errors that may be generated include:
error: missing required value "record", if record==nil
error: <expr> is missing required field ["age"], if record[name].age==nil
error: <expr> is missing required field [name -> "foo"], if name=='foo', and record.foo==nil
This is not an elegant patch -- dispatching between the various error cases gets messy, and the generated bytecode is not terribly efficient. Even so, I've been finding it useful. It removes the need for much of the boilerplate sanity checking code I'd otherwise write.
This was the syntax sugar I originally went looking for. All the goodies I'd been missing from C: +=, -=, *=, etc.
The implementation included in the bundle allows vector increments, for example:
vx,vy,vz += ax, ay, az
You can also use an open function call to provide data for an arbitrary list of additional values. For example:
px,py,pz += 1, calc_yz_vel()
However, if there's a clear mismatch between the number of right hand and left hand values, the parser will throw an error.
Unlike standard assignments, the compound assignments are evaluated left-to-right. So,
local a,b=2,4
a,b+=b,a ==> a==6, b==10
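The difference from a standard multiple assignment is easiest to see by writing both out in stock Lua. This is a hand expansion of the example above, not the code the patch generates.

```lua
-- Hand expansion of: a,b += b,a   (each target is updated in turn)
local a, b = 2, 4
a = a + b      -- a is now 6
b = b + a      -- uses the updated a, so b is now 10
print(a, b)

-- For contrast, a standard multiple assignment evaluates the whole
-- right-hand side before storing anything.
local c, d = 2, 4
c, d = c + d, d + c
print(c, d)    -- both are 6
```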
Also -- because I was having altogether too much fun with this hack -- I added a '++' sugar as well. In the case of multiple left hand values, all will be incremented by 1.
i,j,k++ ==> i,j,k=i+1,j+1,k+1