Bringing More to the Table with Lua (Part 2)

A visual example of how a hash function works to create a hash table.A visual example of how a hash function works to create a hash table.

This is the sixth entry in my Lua Tutorial series. See here for an introduction for this series, or here for our previous entry covering using tables as arrays.

We’re going to cover using tables as hash tables which are a bit different than arrays. Hash tables or hash maps are the basis of many useful data structures. Ultimately, they end up being basically arrays on the back end with a way to map and retrieve values based on keys. See the example below for a visual explanation before we move on to the full explanation.

This example shows a few keys being passed into a hash function and mapping into values. The values are shuffled around because they will typically be located in an array behind the scenes and the hash function exists as a way for the system to find these variables again. How a language with these built in actually handles this is pretty much immaterial to our understanding until a much higher level since it will not affect operation in any notable way for most (if any) use cases. Even though these values map one-to-one with a key, they typically will not necessarily be stored in order on the system. This behavior can definitely affect your usage.

Hash tables or, for the rest of this article, tables are instantiated in a similar fashion to arrays:

exampletable = {}

You can instantiate this type of table similar to an array, but note there are some key differences as noted below:

 exampletable = { ["Chicken"] = "Egg", ["Dog"] = "Puppy", ["Rabbit"] = "Kit" }

Accessing a table is very similar to accessing an array. You work with tables like follows:

table[ "key" ] = value

In this example table is the table name and key is the desired key. This will set the value as desired. So, you can write something like:

 exampletable = { ["Chicken"] = "Egg", ["Dog"] = "Puppy", ["Rabbit"] = "Kit" }

exampletable["Chicken"] = "Egg"
exampletable["Dog"] = "Puppy"
exampletable["Rabbit"] = "Kit"

print( exampletable[ "Chicken" ] )
print( exampletable[ "Dog" ] )
print( exampletable[ "Rabbit" ] )

We will get back:

Egg
Puppy
Kit

Just like with our arrays, this is extremely cumbersome. Luckily, we have something similar to ipairs, pairs. Pairs works basically just like ipairs except instead of an iterator and a value, it returns a key and a value. Also, like we mentioned before, items in a hash table may not be stored in any specific order. This is especially true in Lua. Do not ever expect the results in a table aside from an array to ever be returned in a sane or consistent order.

exampletable = { ["Chicken"] = "Egg", ["Dog"] = "Puppy", ["Rabbit"] = "Kit" }

for key, value in pairs( exampletable ) do
	print( "exampletable[ " .. key .. " ] = " .. value )
end

Which may get back:

exampletable[ Chicken ] = Egg
exampletable[ Dog ] = Puppy
exampletable[ Rabbit ] = Kit

We can make an entry easily, but how do we get rid of one? The answer is deceptively simple, like with an array, where if we request a value which isn’t assigned, we get nil, we can do the same with a hash table. By a minor extra bit of abstraction, we can take this to be interpreted as a nil value is for all intents and purposes unassigned. So, by setting a key to nil, we basically clear it out from existence.

exampletable = {}

exampletable["Chicken"] = "Egg"
exampletable["Dog"] = "Puppy"
exampletable["Rabbit"] = "Kit"

for key, value in pairs( exampletable ) do
	print( "exampletable[ " .. key .. " ] = " .. value )
end

exampletable[ "Chicken" ] = nil
print( "--------" )

for key, value in pairs( exampletable ) do
	print( "exampletable[ " .. key .. " ] = " .. value )
end

We (may) end up with:

exampletable[ Chicken ] = Egg
exampletable[ Dog ] = Puppy
exampletable[ Rabbit ] = Kit
--------
exampletable[ Dog ] = Puppy
exampletable[ Rabbit ] = Kit

We did something interesting above with our for loops. We reused the variables entirely. Lua doesn’t really have any stylistic or syntactic restrictions about wholesale reusing loop variables and variables in general. As we get along further, we will go over some variable usage with keywords like local and other stylistic points. In the previous loops article, I went ahead and did not reuse variables in any example for clarity’s sake. You can give an iterator almost any name you want aside from reserved words, which basically just excludes the language’s key structures. This is a bit of a double edged sword as this means you can easily override functions and variables from imported libraries (keep this in mind even though we will go over it again). We will go over some prospective best practices in a later lesson in order to make it easier to mitigate these types of issues.

Hash tables are extremely useful, but what if we want our information back in order? Fortunately, there is a sort function for tables (specifically arrays, but we will get to that in short order). It is extremely useful for both arrays and hash tables. The sort function is based on the quicksort algorithm. Quicksort is an extremely fast, reliable sort algorithm for a lot of different data which is why it is the default in many languages. There are other sorts which suit other data or organizations, but they are not important at this stage. In order to sort a table or array you do the following:

table.sort( [table] )

Or:

table.sort( [table], [sort function] )

When you do a plain sort, you get an array ordered in ascending order. If you supply a sort function, you will get an array ordered by whatever arbitrary function you supply. We will cover these in a bit. When you sort an array, you just run table.sort( array ) and it sorts the actual array you supplied for “array”. If we want to sort a hash table, we need to sort the keys, or create a key set based on whatever sort criteria we may have. For example:

exampletable = {}

exampletable["Dog"] = "Puppy"
exampletable["Rabbit"] = "Kit"
exampletable["Chicken"] = "Egg"

array = { "Dog", "Chicken", "Rabbit" }
table.sort( array )

for i, key in ipairs( array ) do
	print( "exampletable[ " .. key .. " ] = " .. exampletable[ key ] )
end

This will then give us:

exampletable[ Chicken ] = Egg
exampletable[ Dog ] = Puppy
exampletable[ Rabbit ] = Kit

We can start diving a little deeper. If you want to sort by a given sort function, you basically need a function which takes a and b and returns a boolean which is either true if a>b or false if b<a. This is not extremely important yet and we will go over it, so do not worry too much about the details. The default sort function looks something like:

exampletable = {}

exampletable["Dog"] = "Puppy"
exampletable["Rabbit"] = "Kit"
exampletable["Chicken"] = "Egg"

array = { "Dog", "Chicken", "Rabbit" }
table.sort( array, function (a, b) return a < b end ) --this is the default

for i, key in ipairs( array ) do
	print( "exampletable[ " .. key .. " ] = " .. exampletable[ key ] )
end

Which returns:

exampletable[ Chicken ] = Egg
exampletable[ Dog ] = Puppy
exampletable[ Rabbit ] = Kit

As you can see, you can pass a function inline. Functions are first-class in Lua. You can pass them and treat them as variables for all intents and purposes. So, you could just have easily have passed the function as follows:

exampletable = {}

exampletable["Dog"] = "Puppy"
exampletable["Rabbit"] = "Kit"
exampletable["Chicken"] = "Egg"

array = { "Dog", "Chicken", "Rabbit" }
f =  function (a, b) return a < b end
table.sort( array, f ) --this is the default

for i, key in ipairs( array ) do
	print( "exampletable[ " .. key .. " ] = " .. exampletable[ key ] )
end

Which returns:

exampletable[ Chicken ] = Egg
exampletable[ Dog ] = Puppy
exampletable[ Rabbit ] = Kit

We will get into function declaration and many other fun features as we square away the basics of the language. The next feature we can take advantage of is the fact that all variables are pretty much equal in terms of functionality, so we can actually assign tables into tables, though how we do it is a little weird for certain cases. This is extremely useful for propping up certain data structures in a way which takes some heavy lifting in other languages. A quick example of this is as follows:

exampletable = {}

exampletable[ "Dog" ] = { ["head"] = 1, ["body"] = 2, ["skin"] = 3 }
exampletable[ "Rabbit" ] = { ["head"] = 2, ["body"] = 3, ["skin"] = 1 }
exampletable[ "Chicken" ] = { ["head"] = 3, ["body"] = 1, ["skin"] = 2 }

We can’t just print our values like we could previously. We need to get a little more creative and do something like follows:

exampletable = {}

exampletable[ "Dog" ] = { ["head"] = 1, ["body"] = 2, ["skin"] = 3 }
exampletable[ "Rabbit" ] = { ["head"] = 2, ["body"] = 3, ["skin"] = 1 }
exampletable[ "Chicken" ] = { ["head"] = 3, ["body"] = 1, ["skin"] = 2 }

for animal, features in pairs( exampletable ) do
	print( animal .. ":" )
	for key, value in pairs( features ) do
		print( key .. " = " .. value )
	end
end

Which results in:

Chicken:
skin = 2
body = 1
head = 3
Dog:
skin = 3
body = 2
head = 1
Rabbit:
skin = 1
body = 3
head = 2

Basically, when you work with structures like this, you need to nest for loops. This is something we covered previously specifically for these types of situations. We can get even more creative with some of our data manipulation and sort by a specific value, but this is not something extremely intuitive. It would look like follows for our previous examples:

exampletable = {}

exampletable[ "Dog" ] = { ["head"] = 1, ["body"] = 2, ["skin"] = 3 }
exampletable[ "Rabbit" ] = { ["head"] = 2, ["body"] = 3, ["skin"] = 1 }
exampletable[ "Chicken" ] = { ["head"] = 3, ["body"] = 1, ["skin"] = 2 }

array = {}

for animal, features in pairs( exampletable ) do
	table.insert( array, animal )
end

f = function (a, b) return exampletable[ a ][ "head" ] < exampletable[ b ][ "head" ] end
table.sort( array, f )

for i, animal in ipairs( array ) do
	print( animal .. ":" )
	for key, value in pairs( exampletable[ animal ] ) do
		print( key .. " = " .. value )
	end
end

This prints out:

Dog:
skin = 3
body = 2
head = 1
Rabbit:
skin = 1
body = 3
head = 2
Chicken:
skin = 2
body = 1
head = 3

Don’t worry too much about the function we set up to sort. This will be covered in more detail when we cover functions and how to work with them. Right now, we just want to see kind of what is possible so we can know what to do with the language overall and to establish patterns. One thing which may have piqued your curiosity is “What happens if we change the ipairs to a pairs statement for the array?” For all intents and purposes, an array is a hash table is a table. These constructs are all similar and things like ipairs are effectively syntactic sugar as Lua also stores these arrays in tables too so you really won’t notice much a difference unless you start trying to store hash table values in the table. This technically works, but isn’t typically ideal.

Some Dude: