Binary Relations
Three new associative container classes
|
This is a small C++ library (a single header file) that adds three new associative container types that help you organize and query data in memory.
std::vector
and std::unordered_map
.There is a data structure I use that has served me very well for many years in the Insomniac Core tools group. I don’t hear other programmers refer to it, so I want to introduce it to you. I call it a binary relation. It’s a bidirectional associative container.
The term “binary relation” and the concept are borrowed from mathematics, specifically set theory. But this is not a library for mathematicians. It is for programmers needing to organize their data.
Binary relations are the association between elements of two sets. It can be seen as a set consisting of related pairs (x,y) where x is the input or the domain and where y is the output or the range. The notation x R y means that x is related to y by R, where R can be the relation that links x and y.
In this library x and y are called left and right, indicating their position in the diagram.
There are four kinds of binary relations. They are: one-to-one, one-to-many, many-to-one, and many-to-many. But because one-to-many and many-to-one are interchangeable if you swap the left and the right side, we ignore many-to-one. You won’t need it, as will become clear later.
The real world examples come from my experience as a game tools programmer, specifically as a programmer of the Insomniac world editor. So I will use that as an example.
So in the world editor then, every object in the world is represented by a handle. They need to be organized at the global level. For example, objects can be connected in a parent-child relationship. A parent object can have any number of children. Each child has exactly one parent. This is a binary relation, a one-to-many. And each child may itself be a parent and have children. So we have a hierarchy tree.
Typically, this is kind of relationship is expressed in the object data itself. Every object that is a child contains a handle (or pointer) to a parent. And each parent contains an array of handles (or pointers) to its children.
This organization requires that when a child changes parent, you need to update both the m_Parent
of the child as well as the m_Children
list of the parent. This kind of connection is a one-to-many binary relation. With binary relations, the relationship between parents and children is stored outside of the object data structure, in its own container. All parent-to-child relationships are stored in one container that lives alongside the game objects.
With this data structure, you can look up the parent handle for any object, and get a list of handles of its children.
This is a key concept, and I want to emphasize it here: relations between objects (such as parent-child) are not stored in the objects, but in a separate worldwide relationship table.
Once you get the hang of storing relationships outside of the objects, you will find uses for it everywhere. For example, game objects can be member of multiple groups. That’s a many-to-many. Given the group’s handle, you can look up all its members. And when you have a game object, you can get a list of the groups it belongs to.
Perhaps a more surprising example is this. Say you have an object type to classify people, vehicles, and buildings. This, too, is a binary relation. In this case it’s a one-to-many. Given an object handle, you can look up what type it is. Given an object type, you can get a list of all objects of that type.
Here are some more use cases:
The library is located in the BinaryRelations directory. It consists of a single C++ header file. There are three class templates that you need to know of. The classes are: OneToMany
, ManyToMany
, and OneToOne
.
Each binary relation type is a template, with type arguments LeftType
and RightType
. Both types need to be small, hashable, and immutable. I recommend that you only use simple types, such as int
and enum
, and possibly std::string
.
This is the entire OneToMany
API. OneToOne
and ManyToMany
are near identical. Click here for the full documentation.
That’s all.
Output:
In the code example, note that the name of the OneToMany
has the form of “SingularToPlural”, like “ParentToChildren”. Similarly, ManyToMany names would be “PluralToPlural”, OneToOne would be “SingularToSingular”.
The efficiency for lookup such as FindLeft()
and FindRight()
is constant time, or O(1). All operations on a OneToOne
are also constant.
Things get more complicated with OneToMany
and ManyToMany
. They maintain sorted arrays. Insertion and erasure of elements in an array involves shifting everything between the point of insertion/erasure and the end of the array. This is O(n).
There is a new bulk insert/erase that will speed up insertions and erasures by bundling them up.
Performance measurements in Xcode on an iMac M1, release build.
The worst case is inserting random numbers on the right with the same left value. This test uses the optimized bulk insert. Results in milliseconds.
one-to-one | one-to-many | many-to-many | |
---|---|---|---|
10 | 0.001875 | 0.00225 | 0.003 |
100 | 0.008333 | 0.011 | 0.017833 |
1,000 | 0.078292 | 0.097208 | 0.130792 |
10,000 | 0.780334 | 2.27113 | 2.63937 |
100,000 | 13.2366 | 18.0005 | 18.4522 |
1,000,000 | 62.1115 | 199.653 | 268.983 |
This is the time it takes for a single lookup in tables of different sizes. Results microseconds.
one-to-one | one-to-many | many-to-many | |
---|---|---|---|
10 | 0.049725 | 0.04265 | 0.0369667 |
100 | 0.0334625 | 0.0351042 | 0.0344917 |
1,000 | 0.0370417 | 0.0376333 | 0.0379292 |
10,000 | 0.0473625 | 0.0477541 | 0.04305 |
100,000 | 0.0406292 | 0.0389792 | 0.0372666 |
1,000,000 | 0.0652959 | 0.0715042 | 0.0902375 |
std::unordered_map
is not the fastest hash map. I’m aware of faster ones, but all those I have found have a license that is more restrictive than the MIT license.
This section is for those who want to venture into the code.
I will show you, and talk you through the structure of the OneToMany
template class. The OneToOne
and ManyToMany
template classes are structured along the same lines.
I used the above diagram to write the code. The l2r_it
style labels refer to local variable names in the code. m_LeftToRight
and m_RightToLeft
are both hash tables. Each entry on the m_LeftToRight
table contains a std::pair
with a left value in the first
slot, and a pointer to a std::vector
in the second
. The vector
has one or more right
values in it, sorted by value. The m_RightToLeft
hash table contains pair
s with a right
value in the first
slot, and a left
value in the second
slot.
When findRight()
is called, the OneToMany
is returns the pointer to the vector
of right
values from the second
slot in the m_LeftToRight
hash table. When findLeft()
is called, it returns the left
value from the second
slot in the m_RightToLeft
hash table.