I have a system that has been up-and-running for almost 3 years now. Previously (and currently) the front-end was using NHibernate for all of the database access.
Right now, I'm in the process of converting over to Dapper and utilizing stored procedures for EVERYTHING (reads, writes, etc).
Now that NHibernate is gone, I don't believe the current structure of tablename, tablenameid (clustered primary key) is optimal anymore for every table. Lots of my tables will NEVER be accessed via the primary key, UNLESS it's for a delete (even that can be avoided).
So, here are a couple examples of table structures, and which fields I believe should be the clustered index.
Event table ------------------------------------------ EventId int primary key clustered, Season int not null, EventDate datetime not null, EventName varchar(100) not null, EventType varchar(10) not null,-- (soon to be it's own table, but not right now...there are only 2 possible values, Tournament or Dual) SchoolId int not null, OpponentId int NULL, etc...
So currently, I have the clustered index on the primary key. This is one table that is accessed 99% of the time by
SchoolId. (Very rare that the
EventId is used, when it is, it's for a delete)
Here's where the trickiness comes in. If the
EventType is 'Dual', then uniqueness is by
OpponentId. If the
EventType is 'Tournament', Then uniqueness is on
OpponentId will be null).
Under this architecture, I don't believe I can/should have a unique key, which isn't really my issue...
Am I safe to assume that the clustered index for this table should be on the
WrestlerRanking table ----------------------------- WrestlerRankingId int primary key clustered, Season int not null, Week int not null, IsCurrent bit not null, WrestlerId int not null, etc...
In this scenario, similar to the one above, 99% of the this table is accessed by
IsCurrent. Uniqueness can be set by
Would/should the clustered index for this table be on
WrestlerId, even though the majority of joins to this table will include the
IsCurrent column? This is definitely not unique as there could be around 20 records for the Season and WrestlerId combination (20 different weeks of rankings).
Have you tried to run an execution plan on the series of queries/statements that will be running against the table? The execution plan (especially in the later versions) is pretty good at suggesting indexes. If you have and it is not suggesting any, can you post the execution plans for review?
is there a particular reason you need it to be clustered? Obviously it will perform better, but I would try non-clustered first to see if it gets you the desired performance.
Here's where the trickiness comes in. If the EventType is 'Dual', then uniqueness >is by Season, EventDate, SchoolId, OpponentId. If the EventType is 'Tournament', >Then uniqueness is on Season, EventDate, SchoolId, EventName (OpponentId will be >null).
I'd look at what's common between these two queries to determine what columns to use in the Clustered Index. Remember, the Clustered index is effectively how the data is arranged on the disk, so it should have a structure similar to the hierarchy that queries against the table will reference. If every query, or the most critical ones, all look at Season, SchoolId and EventDate regardless of EventType, then I'd use those in the Clustered index. Additional Non-clustered covering indexes can be created for specific queries (around EventType or OpponentId, for example) which INCLUDE the columns returned to prevent key lookups.
Clustered indexes do not need to be unique, they need to specify how the data should be arranged so that queries can quickly seek to the desired location. The primary key should be unique, but it doesn't have to be a clustered index. If it's just a reference for a row value that doesn't have any useful information that queries will use (i.e. you never join on it), then it should not be part of the clustered index.
For the second example, I would probably use Season, Week, and WrestlerId if those are most queried. The order should be in the logical heirarchy that the data is queried in...which brings me to my next point:
All this advice is contingent on the execution plans of your queries.
Always, ALWAYS, use actual execution information to determine what the best indexes for your queries are. Got some table scan issues? Implement a clustered index. Got a Clustered index scan slowing you down? Look into implementing a non-clustered covering index for that query based on the columns it's looking up (often suggested by the DB engine itself). Got key lookup? Try using an INCLUDES on the affected index with the looked-up columns. Indexing hypothetically, before you're super familiar with the DB engine, often leads to wasted time and rework. I'm not saying to shut your brain off and just blindly follow the query plan's suggested indexes, but instead use the query plan to build your indexing strategy rather than trying to preempt it, or second guess it.