Multi-page report: Multiple Data Files

We have a large Database (91 GB) that is currently in one large data file. Now that we have muliple disk arrays I can split that up on I would like to have a couple data files. My question is, what is the best way to split this up? Should I keep one primary file group and just create another file, or should I create a file group for indexes and put those on that? This database is used for reporting only so it doesn't really have any writes being done on it.

Thanks much.do you have a lot aof medium size tables, or several very large ones?

The answer is always..."it depends"|||Unfortunately it is pretty much one big table. It is horribly designed.|||What's the DDL of the table?

How many physical arrarys do you have?

Let's say you have 3

Create 9 partitions for a partitioned view

put file 1,4,7 on array 1
put 2,5,8 on array 2
put 3,6,9 on array 3

Arrange the partitions so they are approximatley 1 GB each

The more files on to independant drives the better..and I think I would keep the indexes with the tables...

Anyone?|||Well, here is the DDL, it is a beast though.

CREATE TABLE [dbo].[CLARITY_TDL] (
[TDL_ID] [numeric](19, 0) NOT NULL ,
[DETAIL_TYPE] [int] NOT NULL ,
[TYPE] [int] NOT NULL ,
[POST_DATE] [datetime] NULL ,
[ORIG_POST_DATE] [datetime] NULL ,
[ORIG_SERVICE_DATE] [datetime] NULL ,
[PERIOD] [varchar] (6) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[TX_ID] [int] NULL ,
[TRAN_TYPE] [int] NULL ,
[ALLOWED_AMOUNT] [numeric](12, 2) NULL ,
[CHARGE_SLIP_NUMBER] [varchar] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[TYPE_OF_SERVICE] [int] NULL ,
[MATCH_TRX_ID] [int] NULL ,
[MATCH_TX_TYPE] [int] NULL ,
[MATCH_PROC_ID] [int] NULL ,
[MATCH_PROV_ID] [varchar] (12) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[MATCH_LOC_ID] [int] NULL ,
[ACCOUNT_ID] [int] NULL ,
[PAT_ID] [varchar] (12) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[AMOUNT] [numeric](12, 2) NULL ,
[PATIENT_AMOUNT] [numeric](12, 2) NULL ,
[INSURANCE_AMOUNT] [numeric](12, 2) NULL ,
[RELATIVE_VALUE_UNIT] [numeric](12, 2) NULL ,
[CUR_CVG_ID] [int] NULL ,
[CUR_PLAN_ID] [int] NULL ,
[CUR_PAYOR_ID] [int] NULL ,
[CUR_FIN_CLASS] [int] NULL ,
[PERFORMING_PROV_ID] [varchar] (12) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[BILLING_PROVIDER_ID] [varchar] (12) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[ORIGINAL_CVG_ID] [int] NULL ,
[ORIGINAL_PLAN_ID] [int] NULL ,
[ORIGINAL_PAYOR_ID] [int] NULL ,
[ORIGINAL_FIN_CLASS] [int] NULL ,
[PROC_ID] [int] NULL ,
[PROCEDURE_QUANTITY] [int] NULL ,
[CPT_CODE] [varchar] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[MODIFIER_ONE] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[MODIFIER_TWO] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[MODIFIER_THREE] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[MODIFIER_FOUR] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[DX_ONE_ID] [int] NULL ,
[DX_TWO_ID] [int] NULL ,
[DX_THREE_ID] [int] NULL ,
[DX_FOUR_ID] [int] NULL ,
[DX_FIVE_ID] [int] NULL ,
[DX_SIX_ID] [int] NULL ,
[SERV_AREA_ID] [int] NULL ,
[LOC_ID] [int] NULL ,
[DEPT_ID] [int] NULL ,
[POS_ID] [int] NULL ,
[CUSTOMER_ITEM_ONE] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[CUSTOMER_ITEM_TWO] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[CUSTOMER_ITEM_THREE] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[CUSTOMER_ITEM_FOUR] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[CUSTOMER_ITEM_FIVE] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[BILL_CLAIM_AMOUNT] [numeric](12, 2) NULL ,
[INVOICE_NUMBER] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[ABF_FORM_ID] [int] NULL ,
[CLM_CLAIM_ID] [int] NULL ,
[BILL_HOLD_AMOUNT] [numeric](12, 2) NULL ,
[PAT_AGING_DAYS] [int] NULL ,
[INS_AGING_DAYS] [int] NULL ,
[ACTION_CVG_ID] [int] NULL ,
[ACTION_PLAN_ID] [int] NULL ,
[ACTION_PAYOR_ID] [int] NULL ,
[ACTION_FIN_CLASS] [int] NULL ,
[DEBIT_GL_NUM] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[CREDIT_GL_NUM] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[REASON_CODE_ID] [int] NULL ,
[USER_ID] [varchar] (12) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[COPAY_INDICATOR] [int] NULL ,
[TX_NUM] [int] NULL ,
[ORIG_PRICE] [numeric](12, 2) NULL ,
[PRICE_CONTRACT_ID] [int] NULL ,
[CONTRACT_DISCOUNT] [numeric](12, 2) NULL ,
[INT_PAT_ID] [varchar] (12) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[ORIG_AMT] [numeric](12, 2) NULL ,
[PAT_TYPE_C] [int] NULL ,
[PRIM_CARE_PROV] [varchar] (12) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[REFERRAL_ID] [int] NULL ,
[REFERRAL_SOURCE_ID] [varchar] (12) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[PRIM_LOCATION] [int] NULL ,
[RVU_WORK] [numeric](12, 2) NULL ,
[RVU_OVERHEAD] [numeric](12, 2) NULL ,
[RVU_MALPRACTICE] [numeric](12, 2) NULL ,
[POSTING_BATCH_NUM] [varchar] (12) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[VOID_USER_ID] [varchar] (12) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[R_NEW_CHG_TX_ID] [int] NULL ,
[R_ORIG_CHG_TX_ID] [int] NULL ,
[R_ORIG_POST_DATE] [datetime] NULL ,
[R_ORIG_AMOUNT] [numeric](12, 2) NULL ,
[R_ORIG_PAT_AMOUNT] [numeric](12, 2) NULL ,
[R_ORIG_FIN_CLASS] [int] NULL ,
[R_ORIG_PAYOR_ID] [int] NULL ,
[R_ORIG_PLAN_ID] [int] NULL ,
[R_ORIG_CVG_ID] [int] NULL ,
[REFERENCE_NUMBER] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[PAYMENT_SOURCE_C] [int] NULL ,
[TX_COMMENT] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[ANESTHESIA_TYPE_C] [int] NULL ,
[EMERGENCY_STATUS_C] [int] NULL ,
[PHYSICAL_STATUS_C] [int] NULL ,
[CONCURRENCY_CALC] [numeric](6, 2) NULL ,
[CONCURRENCY_OVER] [numeric](6, 2) NULL ,
[BASE_UNITS] [numeric](6, 2) NULL ,
[TIMED_UNITS] [numeric](6, 2) NULL ,
[PHY_STATUS_UNITS] [numeric](6, 2) NULL ,
[EMERGENCY_UNITS] [numeric](6, 2) NULL ,
[AGE_UNITS] [numeric](6, 2) NULL ,
[MED_SUP_MOD_OVER_C] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[ANESTHESIA_CERT] [varchar] (3) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[TECH_CHARGE_FLAG] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[NIA_OUTCOME_CODE_C] [int] NULL ,
[PROC_MED_NEC_FLAG] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[CRD_CHARGE_SLIP_NO] [varchar] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[VISIT_NUMBER] [varchar] (12) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[CRNA_CHARGE_ID] [int] NULL ,
[ORIG_ANES_CHG_ID] [int] NULL ,
[CONTESTED_FLAG_YN] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[PAT_ENC_CSN_ID] [int] NULL ,
[ACTION_USER_ID] [varchar] (12) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[TDL_EXTRACT_DATE] [datetime] NULL ,
[INSURANCE_ID] [varchar] (17) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[SPECIMEN_ID] [varchar] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[TEST_ID] [varchar] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[LAB_ID] [varchar] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[PANEL_ID] [varchar] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[PROV_SPECIALTY_C] [int] NULL
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[CLARITY_TDL] WITH NOCHECK ADD
CONSTRAINT [PK__CLARITY_TDL__746F28F1] PRIMARY KEY CLUSTERED
(
[TDL_ID]
) WITH FILLFACTOR = 80 ON [PRIMARY]
GO

CREATE INDEX [CLARITY_TDL__DETAIL_TYPE] ON [dbo].[CLARITY_TDL]([DETAIL_TYPE]) WITH FILLFACTOR = 80, PAD_INDEX ON [PRIMARY]
GO

CREATE INDEX [CLARITY_TDL__POST_DATE] ON [dbo].[CLARITY_TDL]([POST_DATE]) WITH FILLFACTOR = 80, PAD_INDEX ON [PRIMARY]
GO

CREATE INDEX [CLARITY_TDL__PERIOD] ON [dbo].[CLARITY_TDL]([PERIOD]) WITH FILLFACTOR = 80, PAD_INDEX ON [PRIMARY]
GO

CREATE INDEX [CLARITY_TDL__SERV_AREA_ID] ON [dbo].[CLARITY_TDL]([SERV_AREA_ID]) WITH FILLFACTOR = 80, PAD_INDEX ON [PRIMARY]
GO

CREATE INDEX [CLARITY_TDL__ORIG_POST_DATE] ON [dbo].[CLARITY_TDL]([ORIG_POST_DATE]) WITH FILLFACTOR = 80, PAD_INDEX ON [PRIMARY]
GO

CREATE INDEX [CLARITY_TDL__TX_ID] ON [dbo].[CLARITY_TDL]([TX_ID]) WITH FILLFACTOR = 80, PAD_INDEX ON [PRIMARY]
GO

CREATE INDEX [CLARITY_TDL__ORIG_SERVICE_DATE] ON [dbo].[CLARITY_TDL]([ORIG_SERVICE_DATE]) WITH FILLFACTOR = 80, PAD_INDEX ON [PRIMARY]
GO

We do have 3 physical disk arrays for this.

What is the advantage of putting multiple files on one array? That wouldn't decrease the load at all.|||I think Brett's idea of splitting out the filegroups is to increase the odds that subsequent reads will come from a different array. It doesn't decrease your load, but it should improve the distribution of your load across your arrays. I don't know what the optimum number is...

The only thing that jumps out at me, other than the fact most of your columns are NULL, is that you are leaving 20% of your pages empty because you don't really have any writes. My guess is your writes are done based on your PK, TDL_ID, which I'm also guessing is a monotonically increasing value that will result in all your inserts being done at the end of your page chain. Thus, your pages stay 20% empty.

If these assumptions are correct, changing your FILLFACTOR to 100% will decrease your i/o's by ~ 20%.|||I left the pages at 20% empty because we get a fairly large load on the database every night (about 50 MB). TDL_ID is however monotonically increasing, so you are right, the inserts would be at the end of the chain, I never thought of that. Thanks for the help.|||It'll increase the number of threads due to parallelism AND will reduce contention because the file will be on different arrays. And by placing the "next of kin" data on separate arrays, I'm assuming that a particular request for data that is "close" will also thread out.

I was just reading about Federated Servers (http://vyaskn.tripod.com/what_are_federated_databases.htm), but I've never used them, and I suspect that not many people do.

Since they're also using partitioned views as well, I don't see how dragging the network in to the equation is a plus.

I would just try the partitioning on the single box first.

Actually, I would consider normalizing the table first.|||Great, thanks for all of your help on this.

I would love to normalize that table, but unfortunately it is out of my hands. That table is created and used by a third party product. I know it is ugly, but there isnt much I can do about it.|||I wonder how many level you can do partitioning...I'll have to play with that..

BUT...if you could, you could partition out the data, AND create another view that is a join to normalized tables and a partition view.

You would just have the final view as the exact same name as your messy table.

AND, can someone PLEASE tell me why all 3rd party stuff looks like crap all the time?|||That is a good idea. I have been wanting to normalize that thing since the first time I saw it.

Wouldn't that have a problem with inserts though? The data is also loaded through a third party product.

If I partition the table and use a view to UNION them all together, the load would have to be changed, correct? That is why I originally just thought I would put all the indexes on a seperate data file.

Friday, March 9, 2012

Multiple Data Files

No comments:

Post a Comment

Multi-page report

Blog Archive

About Me