Ok, I have been meaning to run a few tests with actual data to see if Tableau join culling feature
- Whatever you can do in DB, do it in DB including performance optimizations. Tableau is great being able to overcome some of the DBA laziness (with features like Join culling) but it still won’t overcome everything
- You must know Join Culling doesn’t always work such as in scenarios like outer joins, etc. (if you need a list of those, there are other blogs that cover that)
- Now, also keep in mind that depending on how you build the Tableau model, you might get confused why it doesn’t always work even if it is all inner joins and even we have checked the “Assume Referential Integrity” dialog box
- You should also understand that the SQL generation is also affected by the the order in which you bring the tables into the Tableau model
So, regarding the test, here are specifics
Here are some of the tests that I just ran
- There are 4 scenarios using two large tables with identical data
- Both fact tables have 2 million records, exactly same data
- One fact table has DB keys, the other doesn’t
- There are 4 datasources in Tableau, both of these fact tables joined in Tableau; once with Assume RI checked and other without checked
- This makes a total of 4 scenarios, the dashboard below has 2 queries to each of the 4 scenarios
I ran the following dashboard that covers these scenarios across various tables:-
Here are results of performance recording. If you see and compare, you will see it’s better to have the keys in the DB.
- Compare the Orange with Red, this is where we are telling Tableau that there are no Keys in DB so Assume RI. The performance of these queries is worse than the case (Red) where DB keys are present.
- Also, adding Assume RI flag may add some overhead (comparing Blue to Red). In this case, both sources have DB Keys. Adding this option only adds overhead because of our logic.
- Now, if you compare Orange with Teal, this is where we are telling Tableau to overcome lack of Keys, you see that we improve the experience but not by a whole lot.
But, don’t think that Assume Referential Integrity check-box will overcome the “lack of optimizations” you should do on the database. In other words, DBA laziness
I will also write more follow on blogs on this topic of performance