String Indexing


In some languages, e.g. Python, C and Pascal, you can write a[5] for the fifth (or, with zero-based indexing, the sixth) character of the string a. Not in Lua: you have to write a:sub(5,5) or string.sub(a,5,5). Can we do something about it?

From Lua 5.1 on, yes. Thus:

getmetatable('').__index = function(str,i) return string.sub(str,i,i) end

-- demo
a='abcdef'
return a[4]      --> d
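The edges are worth a quick look, since the indexing above simply forwards to string.sub. The following is a minimal sketch, not part of the original page; the names `saved_index`, `fourth`, `last` and `beyond` are made up for illustration. It assumes the default string metatable, whose __index is normally the `string` table, and restores it at the end.

```lua
-- Sketch: the one-character __index from the text, plus a few edge cases.
local mt = getmetatable('')
local saved_index = mt.__index   -- normally the `string` table itself

mt.__index = function(str, i) return string.sub(str, i, i) end

a = 'abcdef'
fourth = a[4]     -- 'd'
last   = a[-1]    -- 'f': negative indices count from the end, as in string.sub
beyond = a[10]    -- '':  out of range yields an empty string, not nil

mt.__index = saved_index   -- put the default lookup back
print(fourth, last, beyond)
```

Note that an out-of-range index gives '' rather than nil, so a test such as `if a[10] then` will always succeed.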

But what about substrings, say a[3,5]? No, that's illegal. We have to use the __call metamethod instead.

getmetatable('').__call = string.sub

-- demo
a='abcdef'
return a(3,5)    --> cde
return a(4)      --> def -- equivalent to a(4,-1)
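Because __call is string.sub itself, a call such as a(i,j) is just string.sub(a,i,j), so everything string.sub accepts carries over. A small sketch under that assumption follows; the variable names `tail`, `middle` and `empty` are illustrative only.

```lua
-- Sketch: calling a string forwards its arguments to string.sub.
getmetatable('').__call = string.sub

a = 'abcdef'
tail   = a(-3)      -- 'def':  a negative start counts from the end
middle = a(2, -2)   -- 'bcde': second byte up to the second-to-last byte
empty  = a(5, 3)    -- '':     start after end yields an empty string
print(tail, middle, empty)
```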

Let's get really fancy and implement a suggestion of Luiz himself. [1]

getmetatable('').__index = function(str,i) return string.sub(str,i,i) end

getmetatable('').__call = function(str,i,j)
  if type(i)~='table' then
    return string.sub(str,i,j)
  else
    local t={}
    for k,v in ipairs(i) do t[k]=string.sub(str,v,v) end
    return table.concat(t)
  end
end

-- demo
a='abcdef'
return a[4]       --> d
return a(3,5)     --> cde
return a{1,-4,5}  --> ace

So there you have it: one-byte substrings with square brackets, to-from substrings with round, selected bytes with curly.
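The curly-bracket form can also reorder or repeat bytes, since the index list is walked in order. Here is a hedged sketch that repeats the __call function from above so it runs on its own; the result names `swapped`, `doubled` and `clipped` are illustrative.

```lua
-- Sketch: same __call as in the text, restated so the block is self-contained.
getmetatable('').__call = function(str, i, j)
  if type(i) ~= 'table' then return string.sub(str, i, j) end
  local t = {}
  for k, v in ipairs(i) do t[k] = string.sub(str, v, v) end
  return table.concat(t)
end

a = 'abcdef'
swapped = a{2, 1}         -- 'ba':   indices are taken in list order
doubled = a{1, 1, 6, 6}   -- 'aaff': an index may appear more than once
clipped = a{1, 9, 2}      -- 'ab':   index 9 is out of range and adds ''
print(swapped, doubled, clipped)
```

Since the list is traversed with ipairs, a nil hole in the table ends the selection early.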

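Note: with this simple __index function you lose the ability to call methods on strings, such as a:match('abc'), because the method name is itself looked up through __index and ends up being passed to string.sub as an index. To keep both, modify __index as follows: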

getmetatable('').__index = function(str,i)
  if type(i) == 'number' then
    return string.sub(str,i,i)
  else
    return string[i]
  end
end
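A quick check that both uses coexist may help. This is a sketch only, restating the function above; the names `fourth`, `upper`, `found` and `missing` are hypothetical.

```lua
-- Sketch: number keys index bytes, every other key falls through to `string`.
getmetatable('').__index = function(str, i)
  if type(i) == 'number' then
    return string.sub(str, i, i)
  else
    return string[i]
  end
end

a = 'abcdef'
fourth  = a[4]               -- 'd'
upper   = a:upper()          -- 'ABCDEF': method calls still work
found   = a:match('c(.)e')   -- 'd'
missing = a['4']             -- nil: the string key '4' is looked up in `string`
print(fourth, upper, found, missing)
```

Only keys of type number select a byte; a numeric string such as '4' is treated as a method name and finds nothing.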

If you don't like that, you can omit the redefinition of __index and use a{4} instead of a[4].

Characters versus bytes

Always remember that these indexing functions select bytes, not characters. For example, UTF-8 characters occupy a variable number of bytes: see the discussion ValidateUnicodeString.

Lua = 'Ｌｕａ'          -- fullwidth letters: three bytes each when the source is saved as UTF-8

print (Lua(1,3))    -->   Ｌ
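To make the byte-versus-character gap concrete, here is a minimal sketch that writes the UTF-8 bytes out as decimal escapes, so it does not depend on the encoding of the source file. The example string and the variable names are assumptions for illustration; the character count relies on the fact that UTF-8 continuation bytes lie in the range 128 to 191.

```lua
-- Sketch: byte indexing can cut a multi-byte UTF-8 character in half.
getmetatable('').__call = string.sub

-- 'é' is the two bytes 195, 169 in UTF-8, so this string has
-- four characters but five bytes.
s = 'caf\195\169'

bytes  = #s                                           -- 5
-- Count every byte that is not a continuation byte (128..191).
chars  = select(2, string.gsub(s, '[^\128-\191]', ''))  -- 4
broken = s(1, 4)   -- 'caf\195': ends in the middle of the 'é'
whole  = s(4, 5)   -- '\195\169': both bytes of the 'é'
print(bytes, chars)
```

Selecting the fourth character therefore needs bytes 4 to 5, not byte 4 alone.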


Last edited July 16, 2011 12:48 pm GMT