Office of Secret Intelligence

Who taught you to be a spy, fucking Gallagher!?!

Quick and Dirty How To - Trees in SQL + Postgres + Rails 4

Intro

I'm going to give a quick rundown on how I implemented threaded comments in this app.

I'm using Postgres 9.4 (recursive queries have been available since 8.4), and Rails 4.

 

The Query

 

I wrote a little gem called Treeify, and (right now) it just gives us a little wrapper around some recursive SQL queries.  Here's the main method we are concerned with:

def tree_sql(instance)
"WITH RECURSIVE cte (id, path) AS (
SELECT id,
array[id] AS path
FROM #{table_name}
WHERE id = #{instance.id}
UNION ALL
SELECT #{table_name}.id,
cte.path || #{table_name}.id
FROM #{table_name}
JOIN cte ON #{table_name}.parent_id = cte.id
)"
end

 

This generates some SQL that ends up looking like this:

SELECT "posts".* FROM "posts" WHERE (posts.id IN (WITH RECURSIVE cte (id, path) AS (
SELECT id,
array[id] AS path
FROM posts
WHERE id = 7
UNION ALL
SELECT posts.id,
cte.path || posts.id
FROM posts
JOIN cte ON posts.parent_id = cte.id
)
SELECT id FROM cte
ORDER BY path)) ORDER BY posts.id

This does alright performance-wise, although I'd much rather not have the "IN" portion there and have it do a JOIN or something instead, as I believe that would be faster, but I digress.

So, moving on, we have a method called "descendents" which basically grabs all the desecendents for a given post:

def descendents
self_and_descendents - [self]
end

self_and_descendents simply grabs the whole tree, descendents just removes the root of the tree.  This gives us our tree of descendents, which ends up looking something like this (after a little bit of serialization - we'll get to that):

[{"id"=>20,
"title"=>"RE: testing",
"body"=>"<p>asfsafasd</p>",
"parent_id"=>7,
"user_id"=>1,
"created_at"=>Thu, 02 Oct 2014 20:04:45 UTC +00:00,
"updated_at"=>Thu, 02 Oct 2014 20:04:45 UTC +00:00,
"category_id"=>1,
"tsv"=>"'asfsafasd':3 're':1 'test':2",
"slug"=>"re-testing"},
{"id"=>21,
"title"=>"RE: testing",
"body"=>"<p>poop</p>",
"parent_id"=>7,
"user_id"=>1,
"created_at"=>Fri, 03 Oct 2014 02:01:17 UTC +00:00,
"updated_at"=>Fri, 03 Oct 2014 02:01:17 UTC +00:00,
"category_id"=>1,
"tsv"=>"'poop':3 're':1 'test':2",
"slug"=>"re-testing-4d35d96b-1c8b-4749-bf4b-052af7baf3cf"},
{"id"=>22,
"title"=>"RE: RE: testing",
"body"=>"<p>poop fart</p>",
"parent_id"=>21,
"user_id"=>1,
"created_at"=>Fri, 03 Oct 2014 02:01:28 UTC +00:00,
"updated_at"=>Fri, 03 Oct 2014 02:01:28 UTC +00:00,
"category_id"=>1,
"tsv"=>"'fart':5 'poop':4 're':1,2 'test':3",
"slug"=>"re-re-testing"},
{"id"=>23,
"title"=>"RE: RE: RE: testing",
"body"=>"<p>poop and fart</p>",
"parent_id"=>22,
"user_id"=>1,
"created_at"=>Fri, 03 Oct 2014 02:01:40 UTC +00:00,
"updated_at"=>Fri, 03 Oct 2014 02:01:40 UTC +00:00,
"category_id"=>1,
"tsv"=>"'fart':7 'poop':5 're':1,2,3 'test':4",
"slug"=>"re-re-re-testing"}]

Cool!  Our whole tree in one query.

But, it's not a tree, it's just a hash.  We need a tree or it will look really weird when we display it.  Let's fix that.

Let's create a method in our model called "build_tree".  We can pass it our results from our descendents method, which I do like so:

 def reply_tree
# give build_tree an array of hashes with
# the AR objects serialized into a hash
build_tree(descendents.to_a.map(&:serializable_hash))
end 

This just turns our descendents data into a serializable hash, which could be turned into JSON, or mangled more easily, like so:

 

def build_tree(data)
# turn our AoH into a hash where we've mapped the ID column
# to the rest of the hash + a comments array for nested comments
nested_hash = Hash[data.map{|e| [e['id'], e.merge('comments' => [])]}]
# if we have a parent ID, grab all the comments
# associated with that parent and push them into the comments array
nested_hash.each do |id, item|
nested_hash[id]['name'] = item['user_id'] ? User.find(item['user_id']).name : "Anonymous"
parent = nested_hash[item['parent_id']]
parent['comments'] << item if parent
end
# return the values of our nested hash, ie our actual comment hash data
# reject any descendents whose parent ID already exists in the main hash so we don't
# get orphaned descendents listed as their own comment
nested_hash.reject{|id, item|
nested_hash.has_key? item['parent_id']
}.values
end

Let's walk through this a little bit.

First, we want to turn our array of hashes into a nested hash, since we are dealing with tree data.

nested_hash = Hash[data.map{|e| [e['id'], e.merge('comments' => [])]}]

This casts the data variable (our array of hashes) as a hash, and maps each id to a the original hash (the comment data itself), and merges in a new key called "comments" that's assigned to an empty array.  This sets us up for our nested comments.

At this point, our data structure looks like this: 

{20=>
{"id"=>20,
"title"=>"RE: testing",
"body"=>"<p>asfsafasd</p>",
"parent_id"=>7,
"user_id"=>1,
"created_at"=>Thu, 02 Oct 2014 20:04:45 UTC +00:00,
"updated_at"=>Thu, 02 Oct 2014 20:04:45 UTC +00:00,
"category_id"=>1,
"tsv"=>"'asfsafasd':3 're':1 'test':2",
"slug"=>"re-testing",
"comments"=>[]},
21=>
{"id"=>21,
"title"=>"RE: testing",
"body"=>"<p>poop</p>",
"parent_id"=>7,
"user_id"=>1,
"created_at"=>Fri, 03 Oct 2014 02:01:17 UTC +00:00,
"updated_at"=>Fri, 03 Oct 2014 02:01:17 UTC +00:00,
"category_id"=>1,
"tsv"=>"'poop':3 're':1 'test':2",
"slug"=>"re-testing-4d35d96b-1c8b-4749-bf4b-052af7baf3cf",
"comments"=>[]},
22=>
{"id"=>22,
"title"=>"RE: RE: testing",
"body"=>"<p>poop fart</p>",
"parent_id"=>21,
"user_id"=>1,
"created_at"=>Fri, 03 Oct 2014 02:01:28 UTC +00:00,
"updated_at"=>Fri, 03 Oct 2014 02:01:28 UTC +00:00,
"category_id"=>1,
"tsv"=>"'fart':5 'poop':4 're':1,2 'test':3",
"slug"=>"re-re-testing",
"comments"=>[]},
23=>
{"id"=>23,
"title"=>"RE: RE: RE: testing",
"body"=>"<p>poop and fart</p>",
"parent_id"=>22,
"user_id"=>1,
"created_at"=>Fri, 03 Oct 2014 02:01:40 UTC +00:00,
"updated_at"=>Fri, 03 Oct 2014 02:01:40 UTC +00:00,
"category_id"=>1,
"tsv"=>"'fart':7 'poop':5 're':1,2,3 'test':4",
"slug"=>"re-re-re-testing",
"comments"=>[]}}

As you can see like I mentioned earlier, we have a hash with each comment's ID as the key and the value is the actual comment data.

Next step, we want to load up the sub-comments.

 nested_hash.each do |id, item|
nested_hash[id]['name'] = item['user_id'] ? User.find(item['user_id']).name : "Anonymous"
parent = nested_hash[item['parent_id']]
parent['comments'] << item if parent
end

This basically traverses the current hash and checks to see if the current node has a parent ID that matches an ID in the hash, and pushes that data into the 'comments' array.

This is what it ends up looking like:

{20=>
{"id"=>20,
"title"=>"RE: testing",
"body"=>"<p>asfsafasd</p>",
"parent_id"=>7,
"user_id"=>1,
"created_at"=>Thu, 02 Oct 2014 20:04:45 UTC +00:00,
"updated_at"=>Thu, 02 Oct 2014 20:04:45 UTC +00:00,
"category_id"=>1,
"tsv"=>"'asfsafasd':3 're':1 'test':2",
"slug"=>"re-testing",
"comments"=>[],
"name"=>"Devin"},
21=>
{"id"=>21,
"title"=>"RE: testing",
"body"=>"<p>poop</p>",
"parent_id"=>7,
"user_id"=>1,
"created_at"=>Fri, 03 Oct 2014 02:01:17 UTC +00:00,
"updated_at"=>Fri, 03 Oct 2014 02:01:17 UTC +00:00,
"category_id"=>1,
"tsv"=>"'poop':3 're':1 'test':2",
"slug"=>"re-testing-4d35d96b-1c8b-4749-bf4b-052af7baf3cf",
"comments"=>
[{"id"=>22,
"title"=>"RE: RE: testing",
"body"=>"<p>poop fart</p>",
"parent_id"=>21,
"user_id"=>1,
"created_at"=>Fri, 03 Oct 2014 02:01:28 UTC +00:00,
"updated_at"=>Fri, 03 Oct 2014 02:01:28 UTC +00:00,
"category_id"=>1,
"tsv"=>"'fart':5 'poop':4 're':1,2 'test':3",
"slug"=>"re-re-testing",
"comments"=>
[{"id"=>23,
"title"=>"RE: RE: RE: testing",
"body"=>"<p>poop and fart</p>",
"parent_id"=>22,
"user_id"=>1,
"created_at"=>Fri, 03 Oct 2014 02:01:40 UTC +00:00,
"updated_at"=>Fri, 03 Oct 2014 02:01:40 UTC +00:00,
"category_id"=>1,
"tsv"=>"'fart':7 'poop':5 're':1,2,3 'test':4",
"slug"=>"re-re-re-testing",
"comments"=>[],
"name"=>"Devin"}],
"name"=>"Devin"}],
"name"=>"Devin"},
22=>
{"id"=>22,
"title"=>"RE: RE: testing",
"body"=>"<p>poop fart</p>",
"parent_id"=>21,
"user_id"=>1,
"created_at"=>Fri, 03 Oct 2014 02:01:28 UTC +00:00,
"updated_at"=>Fri, 03 Oct 2014 02:01:28 UTC +00:00,
"category_id"=>1,
"tsv"=>"'fart':5 'poop':4 're':1,2 'test':3",
"slug"=>"re-re-testing",
"comments"=>
[{"id"=>23,
"title"=>"RE: RE: RE: testing",
"body"=>"<p>poop and fart</p>",
"parent_id"=>22,
"user_id"=>1,
"created_at"=>Fri, 03 Oct 2014 02:01:40 UTC +00:00,
"updated_at"=>Fri, 03 Oct 2014 02:01:40 UTC +00:00,
"category_id"=>1,
"tsv"=>"'fart':7 'poop':5 're':1,2,3 'test':4",
"slug"=>"re-re-re-testing",
"comments"=>[],
"name"=>"Devin"}],
"name"=>"Devin"},
23=>
{"id"=>23,
"title"=>"RE: RE: RE: testing",
"body"=>"<p>poop and fart</p>",
"parent_id"=>22,
"user_id"=>1,
"created_at"=>Fri, 03 Oct 2014 02:01:40 UTC +00:00,
"updated_at"=>Fri, 03 Oct 2014 02:01:40 UTC +00:00,
"category_id"=>1,
"tsv"=>"'fart':7 'poop':5 're':1,2,3 'test':4",
"slug"=>"re-re-re-testing",
"comments"=>[],
"name"=>"Devin"}}

 

We now have populated sub-comments.

The final step is to make sure sub-comments are only displayed in their respective array.

nested_hash.reject{|id, item| 
nested_hash.has_key? item['parent_id']
}.values

Iterate over the hash, rejecting anything that has a parent_id that exists in the top-most level of the hash, and return the values of the "good" keys.

Giving us:

[{"id"=>20,
"title"=>"RE: testing",
"body"=>"<p>asfsafasd</p>",
"parent_id"=>7,
"user_id"=>1,
"created_at"=>Thu, 02 Oct 2014 20:04:45 UTC +00:00,
"updated_at"=>Thu, 02 Oct 2014 20:04:45 UTC +00:00,
"category_id"=>1,
"tsv"=>"'asfsafasd':3 're':1 'test':2",
"slug"=>"re-testing",
"comments"=>[],
"name"=>"Devin"},
{"id"=>21,
"title"=>"RE: testing",
"body"=>"<p>poop</p>",
"parent_id"=>7,
"user_id"=>1,
"created_at"=>Fri, 03 Oct 2014 02:01:17 UTC +00:00,
"updated_at"=>Fri, 03 Oct 2014 02:01:17 UTC +00:00,
"category_id"=>1,
"tsv"=>"'poop':3 're':1 'test':2",
"slug"=>"re-testing-4d35d96b-1c8b-4749-bf4b-052af7baf3cf",
"comments"=>
[{"id"=>22,
"title"=>"RE: RE: testing",
"body"=>"<p>poop fart</p>",
"parent_id"=>21,
"user_id"=>1,
"created_at"=>Fri, 03 Oct 2014 02:01:28 UTC +00:00,
"updated_at"=>Fri, 03 Oct 2014 02:01:28 UTC +00:00,
"category_id"=>1,
"tsv"=>"'fart':5 'poop':4 're':1,2 'test':3",
"slug"=>"re-re-testing",
"comments"=>
[{"id"=>23,
"title"=>"RE: RE: RE: testing",
"body"=>"<p>poop and fart</p>",
"parent_id"=>22,
"user_id"=>1,
"created_at"=>Fri, 03 Oct 2014 02:01:40 UTC +00:00,
"updated_at"=>Fri, 03 Oct 2014 02:01:40 UTC +00:00,
"category_id"=>1,
"tsv"=>"'fart':7 'poop':5 're':1,2,3 'test':4",
"slug"=>"re-re-re-testing",
"comments"=>[],
"name"=>"Devin"}],
"name"=>"Devin"}],
"name"=>"Devin"}]

...a nice tree-like structure we can iterate over in whatever we choose for a view.  Disregard the extra "name"=>".." bits, I'm still working out how to best retrieve author data and am currently using a hacky and ugly method to do so.

 

That's all for now.  Hopefully this sheds some light on this sort of thing.  Some improvements right off the bat would be to put the nested tree construction in the treeify gem, and to make the SQL less clunky so we can mold it a little more, and get associated data easier (like author info). 


Replies Are Enabled

So I've managed to get replies working to my liking.  I'll write more about the implementation later.


Post Every Day

I've been trying to post every day, but haven't been superbly successful.  I've been fairly busy, with this, and a bunch of migration tasks at work, and not sitting down and drafting up a useful blog post.

 

My whole goal for this blog is really to be a repository for things I've learned, notes I've taken, the quick and dirty to remember how to do something.  That, and post as much Venture Bros,Big Lebowski, Mega Man, Battletech and Halo content as humanly possible.  SOMETHING has to drive traffic here for revenue.

 

I'll hopefully have a more content-rich post up later today.  Right now, I'm working on finishing up reply trees, but am stuck on displaying a subtree properly.

 

MORE SOON.


PHAT Workout "Regiment"

I've been doing Layne Norton's PHAT training system for a few months now.  I've seen some pretty great results in terms of strength increases on "big lifts" (squat, bench, rows and cleans to an extent), and I've noticed a reasonable amount of change in things I've had trouble with in the past, including my arms.  I've done Strong Lifts, various circuit training programs, P90X, and serious kung fu training, so I have a good foundation for fitness, but I've still seen good strength increases, progressively at that, with this program.

 

Here's what my week usually looks like:

 

Sunday Monday (upper body power day) Tuesday (lower body power day) Wednesday Thursday (shoulder/arm hypertrophy) Friday (lower body hypertrophy) Saturday (chest and arms hypertrophy)
rest 3x5 bent over rows 3x5 squats rest/jog 6x3 power cleans 6x3 squats punching bag work
  2x8 perfect form pull ups     6x3 bent over barbell rows jog skull crushers
  3x5 barbell bench press     3x8 barbell overhead press    
  2x8 overhead dumb bell shoulder press     3x12 barbell shrugs    
  2x10 skull crushers          

 

This isn't 100% true to the program, but it gets me 2 days of squats (3 leg days, if you include cleans), and some good back work.

 

I'd like to modify this in the future so I'm doing more running and I'm doing more back work.  I'd like to do a pull up program, like the Armstrong pull up program, and be more diligent with tricep work.  I'd ALSO like to even out my strength more.  My left arm has always been stronger, but even more so since my shoulder surgery in 2006.

 

I'll post some stats with graphs and what not at some point.  All of my progress is being tracked manually as it's easiest to go over to a book and journal what I've done right after I complete a set.


Bitching About Bootstrap 3 Dropdown Menus

I decided to implement dropdown menus for a few items, including the "posts" link in the navbar (for logged in users).  It works great, but I'm frustrated that there isn't really an option that allows you to make the parent item clickable, so I can't click the item that actually says "posts", and say, go to the posts listing page.  I have to hover over (which was sort of a pain in the ass to get working properly), and click on a link that will take me to the listing page.  That's too much work.

Bootstrap is pretty great, I sure wouldn't want to put this together manually in any capacity, but some of the assumptions are irritating.