In Sync Framework, the scope definition specifies the tables to be synchronized, the columns to include, and the corresponding filters. In SQL Azure Data Sync Service, the counterpart is the Sync Group's Dataset definition (internally a Sync Framework scope, since SQL Azure Data Sync Service is built on top of Sync Framework).
I've seen more than a few forum questions asking for best practices, guidelines, or considerations to take into account when defining scope definitions (or Datasets in the case of SQL Azure Data Sync).
The following are the factors you should consider before creating scopes or datasets.
What needs to be synched?
Not all tables, columns, or rows need to be synchronized. Choose which tables you want to synchronize, then decide whether you need all rows and all columns. In some scenarios, you might only need a subset of the data, either vertically sliced (a subset of columns) or horizontally sliced (a subset of rows). Lookup tables like States (WA, CA, IL, etc.) or Salutations (Mr., Ms., Dr., etc.) are largely static; rather than being queried for changes on every sync, they can be copied once, or even wiped out and refreshed periodically.
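To make the slicing idea concrete, here is a minimal sketch of a dataset definition with a vertical slice, a horizontal slice, and a static lookup table. The structure and all names are illustrative assumptions, not the Sync Framework or Data Sync API:

```python
# Hypothetical dataset definition (illustrative only -- not the real API).
dataset = {
    "Customers": {
        # Vertical slice: sync only the columns the client actually needs.
        "columns": ["CustomerId", "Name", "Region"],
        # Horizontal slice: sync only the rows for one region.
        "filter": "Region = 'WA'",
    },
    # Static lookup table: copy once (or refresh periodically) instead of
    # enumerating changes on it every sync.
    "States": {"columns": ["Code", "Name"], "filter": None, "static": True},
}

def tables_to_track(ds):
    """Return only the tables that need change tracking (non-static ones)."""
    return [name for name, spec in ds.items() if not spec.get("static")]
```

With this split, only `Customers` would carry change-tracking overhead; `States` is excluded from change enumeration entirely.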
What’s the frequency/volume of updates?
Not all tables receive the same volume of updates. Create separate scope or dataset definitions that keep less frequently updated tables apart from frequently updated ones. If a table is rarely updated, you save the cost of running change enumeration queries when there are no changes to sync. Likewise, you can vary the sync schedules so that rarely updated tables sync less frequently and heavily updated tables sync more often. Combining heavily updated tables with rarely updated ones in the same scope or dataset means the heavily updated tables end up holding up the rarely updated ones in the same sync session.
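A sketch of this grouping, with per-group schedules. The group names, table names, and intervals are made-up assumptions; the point is only that each group gets its own cadence:

```python
# Hypothetical sync groups split by update frequency, each with its own
# schedule (intervals in minutes). All names and values are illustrative.
sync_groups = {
    "hot":  {"tables": ["Orders", "OrderDetails"], "interval_min": 5},
    "cold": {"tables": ["States", "Salutations"], "interval_min": 1440},
}

def groups_due(elapsed_min, groups):
    """Return the names of the groups whose schedule interval has elapsed."""
    return [name for name, spec in groups.items()
            if elapsed_min % spec["interval_min"] == 0]
```

Five minutes in, only the "hot" group syncs; the "cold" group is touched once a day, so its change enumeration queries never run in the frequent sessions.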
What’s the likelihood of a schema change?
The sync knowledge in both Sync Framework and SQL Azure Data Sync is stored at the scope level. Since neither currently supports modifying a scope or sync group dataset to add or remove tables and columns, you are forced to de-provision and re-provision the scope or sync group to accommodate the change. De-provisioning wipes out the sync knowledge, effectively forcing re-initialization of existing replicas. If you have a 100-table scope/dataset and need to add a column to one table, de-provisioning wipes out the knowledge for everything, including the 99 other tables whose structure hasn't changed, forcing a re-initialization of those tables as well.
What’s the likelihood of conflicts?
Conflicts are expensive, especially in I/O and memory, since resolving one requires loading copies of the conflicting row. Conflict resolution also slows down the sync because a resolution policy has to be applied before the sync can continue. Consider putting tables with a high chance of conflicts into separate scopes/datasets, so these tables don't hold up other tables' transactions while conflicts are resolved.
What’s the conflict resolution policy?
Group tables according to how you want to resolve conflicts. You might group all tables whose conflict resolution policy is Client Wins, and another group for tables where the policy is Server (or Hub) Wins. In Sync Framework, for example, if tables in the same scope need different policies, the conflict handling event has to check the table name before it can decide which row wins.
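The per-table branching that mixed policies force on you can be sketched as below. This is a simplified stand-in for a conflict handler, not the real Sync Framework event signature; the table names and policies are hypothetical:

```python
# Hedged sketch of per-table conflict resolution. With mixed policies in one
# scope, the handler must branch on the table name for every conflict.
CLIENT_WINS = "client"
SERVER_WINS = "server"

# Hypothetical policy map -- table names are illustrative.
POLICY = {
    "CustomerNotes": CLIENT_WINS,   # edits made in the field win
    "PriceList":     SERVER_WINS,   # the hub is authoritative
}

def resolve_conflict(table_name, client_row, server_row):
    """Pick the winning row based on the table's configured policy."""
    policy = POLICY.get(table_name, SERVER_WINS)  # default: hub wins
    return client_row if policy == CLIENT_WINS else server_row
```

If each scope held only tables sharing one policy, the lookup and branch would disappear: the handler could unconditionally apply that scope's single policy.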
What’s the Sync Direction?
Some tables may be Download only, Upload only, or both (Bidirectional). Create separate scope or dataset definitions by sync direction, so that you have one grouping of tables for Download-only syncs, one for Upload-only syncs, and one for Bidirectional syncs. That way you can schedule the syncs separately or even run them in parallel.
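Because direction-grouped scopes touch disjoint tables, they can run concurrently. A minimal sketch, where `sync_scope` is a placeholder for whatever actually performs the sync session and all scope/table names are assumptions:

```python
# Hypothetical scopes grouped by sync direction; names are illustrative.
from concurrent.futures import ThreadPoolExecutor

scopes = {
    "download_only": ["ProductCatalog", "PriceList"],
    "upload_only":   ["DeviceTelemetry"],
    "bidirectional": ["Orders", "OrderDetails"],
}

def sync_scope(name, tables):
    # Placeholder for the real sync session; returns a summary string.
    return f"{name}: synced {len(tables)} table(s)"

def sync_all_parallel(scope_map):
    """Run each direction-grouped scope on its own thread."""
    with ThreadPoolExecutor(max_workers=len(scope_map)) as pool:
        futures = [pool.submit(sync_scope, n, t) for n, t in scope_map.items()]
        return sorted(f.result() for f in futures)
```

One scope mixing all three directions would serialize all of this into a single session.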
What’s the relationship between the tables?
You should group related tables together. For example, Order and Order Details should be in the same group, because when you sync an Order you will certainly need the corresponding Order Details.
What’s the size of the tables?
Tables with fewer rows will most likely have less metadata (unless you do a lot of delete operations). Fewer rows mean faster joins between the table being synched and its tracking table during change enumeration. Grouping smaller tables separately from the bigger ones means the small tables' changes can be applied and made available immediately, rather than waiting for the bigger tables to finish in the same scope and sync session.