I was advised by a senior DBA that you should never use Partitioning, instead use schema to group data. I am talking on large databases. I am not convinced with this, please advice
An offline application deployed in 350+ locations was converted into an Online Application Single Database (10 Million rows).
They created schema's for 350+ locations and created tables inside these respective schemas.
When I was asked to look into the issues, I wondered why this setup and was because they were advised by Senior DBA to go like this (My Concerns were, How to perform search operations across all schema, how to manage any changes in table 350+ is huge, if it is a single table with partitioning things will become easy)
When I consulted the senior DBA regarding this and why he didnt propose Table Partitioning Instead of different schema, "He said partitioning is logical" and will not work, if you split it into schema the performance will be better because database operations will be limited to that schema.
These statements confused me and thus seeking your views and experiences
When to use many schemas?
In PostgreSQL databases of multi-tenant setup, it is not unusual to see a separate schema for every tenant. This serves mostly to separate data of the different tenants, as schemas give a very convenient way of setting security/access boundaries. This means, if I used such a setup, customer A would not have access to Customer B's data. If necessary, it is easy to set up overarching roles that can work across the different schemas (doing maintenance, producing reports, and so on).
In these scenarios, one might have data that is common across all tenants. I would store these in a common(ly accessible) schema, to prevent unnecessary duplications, thus making maintenance a bit easier.
Addressing one of your concerns, doing cross-schema queries is easy when you know which schemas to go for. If not, it might be tricky. However, maintenance in such cases might need some (self-built) infrastructure (as a script to roll out table changes on all tables named
your_table one by one) - but when it's already in place, the single tasks are not so complicated.
Do schemas help with performance?
On the other hand, having 350+ schemas for 10M rows is definitely not about performance. With 10M (not too wide) rows on a decent machine one would not expect serious performance issues in the first place. If they still appear, there are other means than partitioning that one should try first (meaning different indexing strategies, usually). A meaningful partitioning scheme for such data is very hard to come up with: 350+ partitions are just that much a pain in the ass as 350+ schemas. Not to mention that the per-partition data volume will be small (like 30k rows), and in certain cases planning the queries would take a hit - especially, when you try to have multiple tables partitioned the same way.
I believe the "Sr DBA" may be familiar with some other database and is trying to use his previous knowledge on Postgresql.
If you have data which is completely identical you are far better using partitioning than having it all in a single table for a large data set. Please note that by postgresql standards, and most DBs, a set of 10 million records is really not all that large.
There is added complexity on using partitioning so for a 10 million data set I would recommend to just put it all in a single table.
A few things to consider:
How wide are your rows? If you have narrow rows then for sure you can just put it all in a single table.
Is this OLAP or OLTP? If you are doing lots of small transactions and primarily accessing a few of them at a time using indexes (OLTP) then the size of the table, aside from maintenance, is not very important.
If you are doing OLAP/analytics and you need to do big reports depending on how often data is refreshed you could do materialized views or do summary tables.
More information on what this database is used for and it's usage patter could help us give better advice, but in general I don't see any scenario where having 350 schemas would be advisable. If anything it would be much harder to manage.