Sync Framework WCF-Based Synchronization for Offline Scenarios – Using Custom DataSet Serialization


One of the key issues with Sync Framework WCF-based synchronization in offline scenarios is the performance, memory usage, and message size cost of serializing DataSets.

During one of my projects, I was faced with the same dilemma of how to optimize the message size, and consequently the memory usage and performance, over WCF. (I'm based in Australia, where internet plans are capped, e.g., 15 GB plans, 30 GB plans, etc.)

If you've looked at some examples, you've probably come across this walkthrough on how to build occasionally connected applications on devices using Sync Framework: Walkthrough- Creating an Occasionally Connected Smart Device Application

If you look at the ApplyChanges operation, you'll notice that one of its parameters is a DataSet. Likewise, GetChanges returns a SyncContext, inside of which are DataSets containing the changes to be downloaded.

Code Snippet
public virtual SyncContext ApplyChanges(SyncGroupMetadata groupMetadata, DataSet dataSet, SyncSession syncSession)
{
    return this._serverSyncProvider.ApplyChanges(groupMetadata, dataSet, syncSession);
}

public virtual SyncContext GetChanges(SyncGroupMetadata groupMetadata, SyncSession syncSession)
{
    return this._serverSyncProvider.GetChanges(groupMetadata, syncSession);
}

The problem with DataSet serialization is that the actual payload size explodes because of all the XML tags surrounding the data. Moreover, each row is represented twice in the XML: a before and an after copy of the row (the DiffGram format). In Sync Framework, this becomes even worse because of the multiple DataSets involved.
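A minimal sketch (plain ADO.NET, nothing Sync Framework specific; the table and column names here are invented for illustration) makes the DiffGram overhead concrete by serializing the same one-row change both ways and comparing the sizes:

```csharp
using System;
using System.Data;
using System.IO;

// Sketch: quantify the DiffGram overhead for a single modified row.
class DiffGramSizeDemo
{
    static long SizeOf(DataSet ds, XmlWriteMode mode)
    {
        using (var ms = new MemoryStream())
        {
            ds.WriteXml(ms, mode);
            return ms.Length;
        }
    }

    // Returns { DiffGram size, plain XML size } for the same one-row change.
    public static long[] Measure()
    {
        var ds = new DataSet("Changes");
        var table = ds.Tables.Add("Customer");
        table.Columns.Add("CustomerID", typeof(int));
        table.Columns.Add("Name", typeof(string));
        table.Rows.Add(1, "Alice");
        ds.AcceptChanges();
        table.Rows[0]["Name"] = "Alicia"; // a modified row makes the DiffGram keep before *and* after copies

        return new[] { SizeOf(ds, XmlWriteMode.DiffGram), SizeOf(ds, XmlWriteMode.IgnoreSchema) };
    }

    static void Main()
    {
        long[] sizes = Measure();
        Console.WriteLine("DiffGram: {0} bytes, plain XML: {1} bytes", sizes[0], sizes[1]);
    }
}
```

Running this shows the DiffGram is considerably larger than the plain XML for the same data, before Sync Framework multiplies the copies further.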

Take ApplyChanges, for example. For uploads, you pass a dataSet parameter, and notice that the return value is a SyncContext. The SyncContext actually contains several copies of the DataSet that has just been uploaded (in the worst case, bigger than what was uploaded).

This is where you’d find other copies of data:

Code Snippet
syncContext.DataSet
syncContext.GroupProgress.Changes
syncContext.GroupProgress.TablesProgress

To reduce the payload during sync, you've probably come across some postings on using DataSetSurrogates (see KB article 829740) or custom serialization. Unfortunately, you probably haven't come across an example of how to go about implementing it.

You'll find performance comparisons at the following links:
DbSyncProvider- Improving Memory Performance In WCF Based Synchronization 
and
DbSyncProvider WCF Based Synchronization– Memory Performance Analysis Of DataSet Binary SerializationFormat Vs DataSet Surrogates

Unfortunately, no sample code is provided there either. However, if you look closer inside Sync Framework v2.0, you'll find that it contains a DataSetSurrogate implementation, and this is the same approach used by the memory-based batching support in v2.0 for collaboration scenarios.

If you're using the offline scenario (either hand-coded or via the designer-generated Local Database Cache in Visual Studio), then you might still be wondering how to go about it.

I've been spending some time in the Sync Framework forums, where I've suggested the use of DataSetSurrogate for some time, and I said I'd do a post on how to go about it. So I created a sample that you can download from here: OCSDemo.zip

This example app is similar to the Walkthrough- Creating an Occasionally Connected Smart Device Application, except that I used the AdventureWorksLT database. Likewise, I created a custom sync provider proxy on the client side and used a freely downloadable custom serializer (http://www.codeproject.com/KB/cs/CF_serializer.aspx). (I wanted to demonstrate passing the DataSet as a byte[] instead of a DataSetSurrogate, and the Compact Framework doesn't support BinaryFormatter.)
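As an aside, if you want to experiment with the byte[] round trip on the desktop without the CompactFormatter dependency, the same transport pattern can be sketched with the DataSet's own DiffGram reader/writer. This is only a stand-in to illustrate the pattern (serialize to a MemoryStream, ship the byte[], rebuild on the other side); it does not reproduce CompactFormatter's compact binary format:

```csharp
using System.Data;
using System.IO;

// Stand-in for the CompactFormatter round trip: a DataSet is turned into a
// byte[] for the WCF call and rebuilt on the receiving side. DiffGram mode
// is used so row states (Added/Modified/Deleted) survive the trip.
static class DataSetTransport
{
    public static byte[] ToBytes(DataSet ds)
    {
        using (var ms = new MemoryStream())
        {
            ds.WriteXml(ms, XmlWriteMode.DiffGram);
            return ms.ToArray();
        }
    }

    public static DataSet FromBytes(byte[] payload, DataSet schema)
    {
        // ReadXml needs the table schema up front to type the columns,
        // so the receiver clones a schema-only copy first.
        var ds = schema.Clone();
        using (var ms = new MemoryStream(payload))
        {
            ds.ReadXml(ms, XmlReadMode.DiffGram);
        }
        return ds;
    }
}
```

In the sample itself this role is played by CompactFormatterPlus.Serialize/Deserialize, which additionally produces a much smaller binary payload than XML.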

So let’s look at the code changes.

Here's how it looks on the WCF service interface and the service itself (I've included the original sample code in comments):

Code Snippet
//[OperationContract()]
//SyncContext ApplyChanges(SyncGroupMetadata groupMetadata, DataSet dataSet, SyncSession syncSession);

[OperationContract()]
SyncContext ApplyChanges(SyncGroupMetadata groupMetadata, byte[] dataSetSurrogate, SyncSession syncSession);

//[OperationContract()]
//SyncContext GetChanges(SyncGroupMetadata groupMetadata, SyncSession syncSession);

[OperationContract()]
SyncContext GetChanges(SyncGroupMetadata groupMetadata, SyncSession syncSession, out byte[] dataSetSurrogateByteArray);

The code below simply converts the byte array back to a DataSet before invoking the built-in sync provider's ApplyChanges and GetChanges.

Code Snippet
[System.Diagnostics.DebuggerNonUserCodeAttribute()]
//public virtual SyncContext ApplyChanges(SyncGroupMetadata groupMetadata, DataSet dataSet, SyncSession syncSession)
//{
//    return this._serverSyncProvider.ApplyChanges(groupMetadata, dataSet, syncSession);
//}

public virtual SyncContext ApplyChanges(SyncGroupMetadata groupMetadata, byte[] dataSetSurrogate, SyncSession syncSession)
{
    CompactFormatter.CompactFormatterPlus cf = new CompactFormatter.CompactFormatterPlus();

    // deserialize the byte array back into a DataSet
    MemoryStream memStream = new MemoryStream(dataSetSurrogate);
    DataSet dataSet = cf.Deserialize(memStream) as DataSet;

    // pass the DataSet to Sync Framework
    SyncContext syncContext = this._serverSyncProvider.ApplyChanges(groupMetadata, dataSet, syncSession);

    // the SyncContext return value contains the same changes we uploaded,
    // so we clear them so they don't get sent back down to the client
    syncContext.DataSet = null;
    syncContext.GroupProgress.Changes.Clear();

    // assuming conflicts have been handled on the service side,
    // we don't need to download them, so clear the conflict collections
    foreach (SyncTableProgress syncTableProgress in syncContext.GroupProgress.TablesProgress)
    {
        syncTableProgress.Conflicts.Clear();
    }

    return syncContext;
}

[System.Diagnostics.DebuggerNonUserCodeAttribute()]
//public virtual SyncContext GetChanges(SyncGroupMetadata groupMetadata, SyncSession syncSession)
//{
//    return this._serverSyncProvider.GetChanges(groupMetadata, syncSession);
//}

public virtual SyncContext GetChanges(SyncGroupMetadata groupMetadata, SyncSession syncSession, out byte[] dataSetSurrogateByteArray)
{
    SyncContext syncContext = this._serverSyncProvider.GetChanges(groupMetadata, syncSession);

    MemoryStream memStream = new MemoryStream();
    CompactFormatter.CompactFormatterPlus cf = new CompactFormatter.CompactFormatterPlus();

    cf.Serialize(memStream, syncContext.DataSet);
    dataSetSurrogateByteArray = memStream.ToArray();

    // we're sending the DataSet back via a byte array,
    // so clear it from the SyncContext
    syncContext.DataSet = null;
    syncContext.GroupProgress = null;

    return syncContext;
}

Notice how we also clear the various DataSets from the SyncContext before returning.

And on the client side, here's the proxy code, which is more or less the same as the default implementation of ServerSyncProviderProxy:

Code Snippet
public class RemoteServerSyncProviderProxy : Microsoft.Synchronization.Data.ServerSyncProviderProxy
{
    private AWServiceWebRef.AWCacheSyncService _serviceProxy;

    public RemoteServerSyncProviderProxy(AWServiceWebRef.AWCacheSyncService serviceProxy)
        : base(serviceProxy)
    {
        this._serviceProxy = serviceProxy;
    }

    public override SyncContext ApplyChanges(SyncGroupMetadata groupMetadata, DataSet dataSet, SyncSession syncSession)
    {
        MemoryStream memStream = new MemoryStream();
        CompactFormatter.CompactFormatterPlus cf = new CompactFormatter.CompactFormatterPlus();

        // serialize the DataSet into the stream
        cf.Serialize(memStream, dataSet);

        // pass the serialized DataSet as a byte array
        return this._serviceProxy.ApplyChanges(groupMetadata, memStream.ToArray(), syncSession);
    }

    public override void Dispose()
    {
        base.Dispose();
    }

    public override SyncContext GetChanges(SyncGroupMetadata groupMetadata, SyncSession syncSession)
    {
        CompactFormatter.CompactFormatterPlus cf = new CompactFormatter.CompactFormatterPlus();

        byte[] dataSetSurrogateByteArray;
        SyncContext syncContext = this._serviceProxy.GetChanges(groupMetadata, syncSession, out dataSetSurrogateByteArray);

        // deserialize the byte array back into a DataSet
        MemoryStream memStream = new MemoryStream(dataSetSurrogateByteArray);
        memStream.Position = 0;
        DataSet dataSet = cf.Deserialize(memStream) as DataSet;

        // assign the DataSet back to the SyncContext
        syncContext.DataSet = dataSet;
        syncContext.GroupProgress = new SyncGroupProgress(groupMetadata, dataSet);

        return syncContext;
    }

    public override SyncSchema GetSchema(Collection<string> tableNames, SyncSession syncSession)
    {
        return this._serviceProxy.GetSchema(tableNames.ToArray(), syncSession);
    }

    public override SyncServerInfo GetServerInfo(SyncSession syncSession)
    {
        return this._serviceProxy.GetServerInfo(syncSession);
    }
}

We need the code above so we can intercept the calls to ApplyChanges and GetChanges and create the byte arrays for transmission over WCF.

And here's how we substitute our proxy in the SyncAgent; notice how our RemoteServerSyncProviderProxy replaces the original ServerSyncProviderProxy from the walkthrough.

Code Snippet
// The WCF service
AWServiceWebRef.AWCacheSyncService webSvcProxy = new
    OCSDeviceApp.AWServiceWebRef.AWCacheSyncService();

// The remote server provider proxy
RemoteServerSyncProviderProxy serverProvider = new
    RemoteServerSyncProviderProxy(webSvcProxy);

// The sync agent
AWCacheSyncAgent syncAgent = new AWCacheSyncAgent();
syncAgent.RemoteProvider = serverProvider;
syncAgent.SalesLT_Customer.SyncDirection = Microsoft.Synchronization.Data.SyncDirection.Bidirectional;

// Synchronize the databases
Microsoft.Synchronization.Data.SyncStatistics stats = syncAgent.Synchronize();

 

Complete source code: OCSDemo.zip

I hope this helps. Again, I'd love to hear your feedback.

9 comments

  1. June, I have on my to-do list to optimize the WCF transmission. I am using WCF with a contract based on one found in Microsoft's examples; I think it is peer-to-peer. Anyway, it is SqlSyncProvider/RelationalSyncProvider based. The ApplyChanges in the contract is: [OperationContract] SyncSessionStatistics ApplyChanges(ConflictResolutionPolicy resolutionPolicy, ChangeBatch sourceChanges, object changeData); So the change data is passed as an object, which you then cast back: DbSyncContext dataRetriever = changeData as DbSyncContext; What are your thoughts on byte[] versus object? Clearly the latter is simpler to code, but at what cost? The contract does use byte[] for moving files: [OperationContract(IsOneWay = true)] void UploadBatchFile(string batchFileId, byte[] batchContents, string remotePeerId); [OperationContract] byte[] DownloadBatchFile(string batchFileId); I was starting to research the use of MTOM for optimizing byte[] in WCF. Do you have any thoughts on this? And what about WCF compression? Finally, on a tangential note, you are doing WCF-based proxy sync here. What is your threading model? I have STA because of GUI requirements, but on the SyncFx forum I have been advised that for proxy and WCF I need to do MTA (see http://social.microsoft.com/Forums/en/syncdevdiscussions/thread/c2961e27-7435-404d-87ce-514a6cc2c33d). Are you using MTA, and if so, how did you go about it?

  2. Unknown:

    Hi June, thanks for your sample code; it works quite well. Nevertheless, let me share what I experienced while trying it, in case someone else runs into similar problems. The first time I tried your sample code, it took even more time to sync than the basic "autogenerated" version: "formatting" and "unformatting" each sync group on the device took more time than handling the DataSets themselves. I was able to solve this by reducing the number of sync groups from 10 to 3. With your sample code, synchronization time then went down from 9 minutes to 6 minutes (applying ~13,000 changes). Thanks.

  3. Hi June,

    I am running SyncFx 1.0 for Devices SP1 on WM 6.0. I am running into memory issues when UPLOADING 2000+ records. Memory skyrockets to 24+ MB and then the application crashes (I believe CF only allows 24 MB of memory per process).

    So my question is will implementing the DataSetSurrogate help with these memory issues? I looked at the links you provided and it appeared that those benchmarks were all done on desktop.

    I realize that serializing that much data takes a lot of memory and will always be an issue. However, I would still like a resolution. As far as I know, batching is for download only, so that does not help me.

    Is there a way to limit the rows sent up for a table? Something like sync these 500, then the next 500, and so forth? That would also solve my problem, if something like that is possible.

    Thanks so much,
    Shane

  4. There is a hotfix for Microsoft Sync Framework for ADO.NET (2009 release). Was that the version you used with surrogate serialization, or the previous, original one? And a second question: for 24,000 records it takes 1:20 to take a snapshot. Can I somehow decrease the amount of time?

    1. Not sure I applied any hotfix when I wrote this post. What's the environment where you got your benchmarks?

  5. Jagadeesh:

    Will this work for compact framework based windows mobile sync application?

    1. If you read it, you'll find it's based on a CF/WM app.

  6. Hi,
    my client apps don't have a static or valid IP for connecting to the server, so how can I sync my DBs without a connection string? Can I use something like an HTTP request (like Web API)?

  7. I know this is a bit old… However, I need this solution for another project I have that has memory issues syncing 300+ items. So I loaded this up in VS 2008, loaded the AW database, reconfigured the data sync, and built the solution, and apparently the sync does not work. It shows a message of 700+ items uploaded one time and 3000+ another after adding rows to the DBs, and when I go back and check after the sync, the table is exactly the same; no sync occurs… Any ideas?
