R* Tree Implementation for C++

So a few weeks ago I finished preliminary work on an R* Tree implementation in C++. My goal for this implementation is as follows:

  • Simple API – Most existing implementations I’ve seen are far from simple
  • Generic enough to use with most types, with low overhead
  • N dimensions for bounding boxes
  • Easy to integrate and use for Roadnav
  • Take advantage of modern C++ features to get decent performance

This is a header-only implementation of an R tree with an R* index, and makes heavy use of templates, STL, and STL-style functors; but it should work in any relatively modern C++ compiler. To do searching for nodes, this implementation uses the Visitor pattern via functors — there are examples and documentation in RStarVisitor.h.

If you use this, *please* let me know, I’m very interested to see what bugs may come up, and possible performance problems. Unfortunately, I haven’t gotten around to writing any comprehensive tests for it (but given the tests I’ve run so far, I’m reasonably sure the implementation is mostly correct), but I expect that once I get it integrated into Roadnav that this will follow shortly after. Also, integration of the R* tree into Roadnav will make a lot of potential errors pop out. Look for this in the near future.

Additionally, I am thinking about some of the following ideas too:

  • Adding the ability to work with extra-large datasets by saving/loading to/from disk (after all, that is the point of R trees…)
  • Adding a bulk insert function — there is pseudocode already for this, but I haven’t actually written an implementation for this yet. Also, a more efficient bulk delete function
  • Trying to implement it as a Boost intrusive container. I’ve been playing with some Boost stuff lately at work, and theres a lot of neat things there.

I’ll also use this opportunity to put in a plug for Roadnav — if you’re looking to work on an open source project, theres a bunch of (mostly) interesting stuff that I’m working on/planning for Roadnav. What I’m working on right now is totally removing US-specific stuff from Roadnav, and moving to an OpenStreetMap-only implementation. Theres a number of features that are getting thrown away, but I think that the dynamic community that OSM opens up to Roadnav will be well worth it.

Note: I’ve updated this quite a bit (though, remove functionality has been removed and it no longer uses the visitor pattern for data access), and its currently a part of libsdbx. Libsdbx can be found in Roadnav SVN at https://roadnav.svn.sourceforge.net/svnroot/roadnav/libsdbx/trunk/. I’ll write about this soon.. but its pretty neat. It supports R* and modified B+ trees, in memory or on disk. Check it out.

Note II: I’ve pushed the source code to github at https://github.com/virtuald/r-star-tree

Download the sourcecode here

22 Responses to “R* Tree Implementation for C++”

  1. Mathieu Leocmach says:

    Hello,

    I’m using your code and it’s pretty usefull for my purpose (N particles spatial statistics, local structure detection, etc.), even if some documentation would be great.

    For example, how would you get the minimum bounding box including all a tree ?

    I thought about making a Visitor structure that would have a bounding box as attribute. Each time the Visitor visits a leaf not totally enclosed in the bounding box, the bonding box is stretched to enclose the leaf.

    But is their any other way ?

  2. dustin says:

    Awesome to hear that. Sorry about the lack of documentation, still very beta. I’ve tried to document a little bit. 🙂 I still haven’t done any heavy duty testing yet — I’m working to integrate it into Roadnav…

    Each intermediate node of the tree has a bounding box associated with it. So if you looked at the bounding box of the m_root member of the tree, that will give you the bounding box for the entire tree.

    Let me know if you have questions or find bugs!

  3. Arnold Eibel says:

    Hi Dustin,

    this is was I was looking for for a long time: no lib, template based –> easy to integrate into my project, thanks.

    Anyway here are some questions/comments:
    1. What are the “best” values for “min_child_items” and “max_child_items”. Because of my limited understanding of the R* algorithm I’m not sure about the meaning of these template parameters and what values are the best for me.
    2. When calling RemoveItem on an empty tree I got an access violation, I changed
    line 62 and 83 of rstarvisitor.h from
    return m_bound.overlaps(node->bound);
    to
    return (node && m_bound.overlaps(node->bound));
    line 67 and 88 from
    return m_bound.overlaps(leaf->bound);
    to
    return (leaf && m_bound.overlaps(leaf->bound));
    3. Compiling with MSVC I got this error: C2166: l-value specifies const object. To get around this, I changed line 147 of rstarvisitor from
    bool ContinueVisiting;
    to
    mutable bool ContinueVisiting;

    Arnold

  4. Arnold Eibel says:

    sorry I forgot this:

    changed line 99 and 100 of rstarvisitor.h form
    bool operator()(const Node * const node) const { return true; }
    bool operator()(const Leaf * const leaf) const { return true; }
    to
    bool operator()(const Node * const node) const { return (node != NULL); }
    bool operator()(const Leaf * const leaf) const { return (leaf != NULL); }

  5. crg says:

    In your tests, in main.cpp line 31, there is this code:

    typedef RStarTree RTree;

    this violates your assertions, since minchild = 2, maxchild = 3, but assert requires that
    minchild <= maxchild /2. Changing this code to

    typedef RStarTree RTree;

    fixes it.

  6. crg says:

    The blog dropped the template parameters. Some problem with the less than/greater than symbols. Oh well. A preview button would be nice 🙂

    in line 31, change the “3” to “4”. This way, minchild <= maxchild/2 is correct (resolves to 2 <= 2).

  7. Anton says:

    A good candidate to be a part of boost. I would propose to make bounding box templated (now it is hard typed by int) and use namespaces. Design looks pretty good to me. Do you plan to support this code?

  8. dustin says:

    The bounding box in the R* Tree in libsdbx is templated, actually. namespaces probably aren’t a bad idea, I’ve never really used them in the past but as I’ve been using boost more and more I’ve been getting used to them. 🙂

    libsdbx currently lives in roadnav SVN: https://roadnav.svn.sourceforge.net/svnroot/roadnav/libsdbx/trunk/ . The R* Tree there has a slightly different design, and supports a disk or memory based storage mechanism (and the disk mechanism is backed by an LRU cache). libsdbx also has a B+ tree in it that shares the same design structure, since an R* tree and a B+ tree are quite similar. Unfortunately, I removed the ‘remove node’ functionality from that version, and replaced the visitor functions with an iterator.

    I plan to support this code (well, the code in libsdbx), and hopefully improve it some more once I start contributing to Roadnav again (I’ve taken a small hiatus). For example, someone wrote a more comprehensive unit test for the R* Tree but I haven’t had the time to add that to the libsdbx distribution.

    Of course, a problem with a very small part of libsdbx is that it contains some wxwidgets specific code. But, its a rather small portion if I remember correctly (just the file operations).

    Dustin

  9. Florent says:

    Hi Dustin

    I’m trying to insert some data containing a std::vector in the R-tree. It seems I can’t do it since the sdbx library’s serialization concept requires serialized types to have a static type size, through the use of the record_size enum. Is there a workaround?

    Thanks
    Florent

  10. dustin says:

    For my implementation, I think I decided to not deal with things that were of a dynamic size. The way you would get around this is insert pointers into the tree — or in a file-based implementation, insert an offset or index number that allows you to find the data in another file, and just use the R* Tree as an index to that file. Of course, then you have to do your own serialization…

  11. Jay says:

    Thanks for posting. I’m interested in a geospatial database application and this should be very helpful

  12. lieutenant7 says:

    Hello,
    I downloaded your code for skyline query algorithm on sliding windows. it really helped me a lot. I found an interesting thing that your program runs much faster in Ubuntu OS(about 5 seconds) than in Windows OS(about 1 min). I guess it is because g++ compiler is more optimize than vs2008’s compiler.
    I have some question about skyline query processing using R* tree, I wonder if you have researched the skyline querys. it would be great help if we can talk about this.
    thank you!

  13. Hey, I was reading through your implementation of the R* Tree and I noticed that you’re calculating overlap enlargement differently from how the paper defines it. It’s computationally simpler in your implementation, but I can’t find any obvious reasons why the two would result in the same or similar quality trees.

    Any feedback on this would be nice. I asked a question over at StackOverflow which goes into a few more details about this discrepancy: http://stackoverflow.com/questions/14346549/r-tree-overlap-computation

  14. David McKee says:

    Just to let you know that your implementation here has been grabed for use in LArSoft (a particle physics analysis code being developed for new class of detectors called liquid argon time projection chambers). I did look at libsdbx version because I was not terribly comfortable with the visitor implementation but I found the newer code difficult to extract from it’s context, and the whole library was both more than we needed and not easy to compile in the context of our build system. Several other R* tree from around the internet also failed to meet all of our needs, so you code wins.

    We’ve switch to using floating point positions.

    We needed a reliable R* Tree for use in a DBSCAN implementation (for which someone had initially provided a O(n) neighbors search) which was running far too slowly, and the code has been serving us well for rather more than a year now.

  15. dustin says:

    It’s possible (probable?) there is a bug in my implementation. However, I haven’t really looked at the code in years, and don’t currently have the time to test it.

  16. preethi says:

    sir can you send me the r tree insertion code with a dataset

  17. ng says:

    hi, nice work !! How can I get the list of bounding rectangles for all the nodes at all the levels of the trees ? An example would be very helpful !!

  18. Kunal says:

    I was not able to find a useful library for n dimensional R tree ,basically I have 4D data set ie latitude,longitude,rating and a hash index so does this work in 4D case too ,and secondly I also want to store some data at each node so can you help me in that too ; how can I modify your code to enable that

  19. Muthu says:

    I am working on a datasets with 16, 32 dimensions. There isn’t any documentation saying where the code takes input dataset. Could you tell where I need to modify and give my input Dataset. And all your examples are with 2D dataset only. Does this work for nD datasets.

  20. Mahnaz says:

    i need to cource code R* tree with c# . I want to generate db by entityframework by using this code.
    who can help me????
    mahnazfathyan@gmail.com

  21. Mahnaz says:

    if any body knows how can I convert R+Tree to R*, it is useful for me, bcuz I implement R+Tree

  22. zafar says:

    Does the code also allows point insertion?? or just bounding box?

Leave a Reply