Code Focused
Use the Microsoft Sync Framework to synchronize data changes between SQL Server 2008 Standard, SQL Server 2008 Express and SQL Server Compact Edition 3.5.
Despite the availability of wireless and cellular networks, many applications must support data presentation and collection services while offline or in remote offices. The Microsoft Sync Framework is a highly flexible framework for synchronizing files and data between a client and a master data store. With great flexibility often comes complexity and confusion, however.
Last spring, I lead the Allen Park Neighborhood Center (APNC) team at the annual Lansing Michigan Give Camp during a very sleep-deprived weekend. At the event, more than 100 developers and designers donated their time to create a dozen Web sites and database applications for local charities. The APNC has a service program in which volunteers canvas the entire neighborhood every spring. They survey residents on numerous community health and well-being measures and then work to connect people with available services. The APNC needed a way to collect these surveys while out in the neighborhood and then synchronize the data back to the SQL Server database.
This type of scenario occurs at many businesses and non-profit organizations. A central SQL Server 2008 Standard (or Enterprise) instance will synchronize to multiple SQL Server 2008 Express instances at remote locations, perhaps connected over slower network lines, with the need to take the data offline on laptops or tablets at each remote location. Once the data is updated on the laptop, it needs to be synchronized with the SQL Server 2008 Express instance, and the Express instance is then synchronized with the central SQL Server 2008 Standard/Enterprise instance. The code in this article is a sample database synchronization library for this common scenario – the need to synchronize SQL Server between server editions and offline databases. The sample application is written as a hybrid C# and Visual Basic client.
Synchronization might appear relatively straightforward, but it can quickly become complicated and problematic, driving the need for a tool such as the Microsoft Sync Framework. The Microsoft Sync Framework 2.1 Software Development Kit (SDK) is available here.
Developer Challenges
Synchronization typically refers to file or database row-level synchronization. Most of us have used applications with offline synchronization such as Microsoft Outlook or Live Mesh, or third-party applications such as Dropbox or Evernote. These applications work well and can lull us into believing that synchronization is easy, when it’s quite the opposite. Synchronization is hard for the developer, particularly when multiple nodes are independently updating a master database.
How can records be created at each node so that primary keys remain truly unique across all nodes? If multiple nodes attempt to update the same row, how are physical and logical conflicts resolved? The obvious conflict is if two nodes modify the same record. Often, both updates are permitted, if they impact different columns. But what if one node updates the first name and another node updates the last name in the same row? It’s not a physical column conflict, but it’s reasonable to believe that the nodes were not referring to the same person. Therefore, the updates present a logical conflict. These conflicts need to be detected and addressed.
Synchronization loops can occur in poorly designed architectures when one node synchronizes a change with one or more nodes. The same update is ultimately received back by that node as new data from another node, which is unaware that the receiving node already has the update. This problem is avoided by carefully defining the synchronization topology; a hub and spoke arrangement is often used to prevent loops.
Each table to be synchronized must have a unique primary key for each row. New records must generate primary key values that are unique across all nodes of the master database. If the schema for all nodes is exactly the same, the commonly used Identity Seed integer primary keys can cause problems. Each node would generate the same ID value upon its next insert. GUIDs are an obvious choice for guaranteed unique keys, but they’re difficult to work with and their 16-byte size can bloat the database. The recommended option is to create a composite primary key for each table, consisting of a constant Node ID (per node), plus the Identity Seed integer column. This method is used in the tables of the database included in the accompanying code download, which can be accessed at VisualStudioMagazine.com/Kunk0612. (For more information, see the MSDN Library page, “Selecting an Appropriate Primary Key for a Distributed Environment.”)
Tracking Tables Method
In order for any SQL database to synchronize with an offline application, both the remote and master databases need the ability to track changes made to the database between synchronization actions. The Microsoft Sync Framework providers currently support the Tracking Tables method.
SQL Server 2008 offers two options for collecting change tracking information within the database engine, using built-in features that offer reliability and performance advantages over tracking tables. I’ll briefly introduce these options, but they’re not fully compatible with the Microsoft Sync Framework database providers at this time. They can be used by writing a custom synchronization provider (see the MSDN Library page, “Synchronizing Data Stores by Using Custom Providers“).
In order for SQL Server to track the changes made to the database, the database is modified to add a table with the _tracking suffix for each table to be synchronized, plus the three tables (schema_info, scope_config and scope_info) in a process called provisioning the database.
Figure 1 shows the tables in the Neighborhood_TrackingTables database after it has been provisioned. The database originally had four tables named Neighbor, Question, Response and Street. The provisioning process also adds three triggers to each original table as shown under the Street table in Figure 1.
The Tracking Tables method has the obvious disadvantage of requiring schema modifications to the database, which might conflict with existing schema triggers or not even be possible, in some cases. Despite these potential disadvantages, it’s the only tracking method supported out of the box by the Microsoft Sync Framework.
SQL Server 2008 Change Data Capture has the advantage of automatically maintaining parallel tracking tables with change metadata, and filling these tables via asynchronous threads that parse the database logs to populate the tracking tables. This approach provides very low impact to core database performance. It also provides a full history of all data changes made – and not just the net result of those changes – making it useful for populating data warehouses and data marts. This tracking feature is available only in the Enterprise edition of SQL Server 2008.
Change Tracking
All versions of SQL Server 2008, including SQL Express, have the Change Tracking feature, which can be used in the Microsoft Sync Framework with a custom provider. Change Tracking differs from Change Data Capture in that it doesn’t maintain a history of discrete changes. Using Change Tracking to track changes offers advantages over tracking tables:
SQL Server 2008 Change Tracking isn’t enabled by default. Queries for changes should always use snapshot isolation transactions in order to have consistent results. The T-SQL to enable snapshot isolation transactions and Change Tracking with a two-day expiration of tracking data is shown here (note that the expiration period should be a multiple of the expected synchronization frequency to ensure changes are not lost before the next synchronization):
Change Tracking must be enabled for each table to be tracked. Only tables that are actively synchronized should have Change Tracking enabled. Databases have numerous tables, so I recommend using the following query to generate an ALTER statement string for each table (then copy the ALTER statements for the desired tables to a new query window to be executed):
For more information on Change Tracking in SQL Server 2008, see the MSDN Library pages, “Enable and Disable Change Tracking” and “Obtaining Changes by Using the Change Tracking Functions“.
Synchronization Application
In order to make adding Microsoft Sync Framework synchronization to an application as easy as possible, I created a custom class library to perform the synchronizations. In the common APNC scenario, you’d write your application to update the SQL Server Standard/Enterprise database when running local to that server, or the SQL Server 2008 Express database when in a remote office, or the SQL Server Compact Edition database when offline.
In each case, the application functions as a standard database maintenance application with little concern for the synchronization process. The custom library DoSync function is invoked when the databases need to be synchronized as a separate application function. The DoSync method resides in the SynchronizationHelper class and uses a custom DatabaseSyncInfo class to hold the needed configuration settings.
The database synchronization providers are found in the Microsoft.Synchronization.Data namespace. SQL Server 2008 Standard and SQL Server 2008 Express use the same SqlServer.SqlSyncProvider class. The database synchronization provider for SQL Server Compact Edition version 3.5 uses the SqlServerCe.SqlCeSyncProvider class. Because the SQL Server 2008 Standard and Express versions both use the same database synchronization provider, the library supports the bidirectional synchronization scenarios shown in Table 1 .
Table 1 Bidirectional synchronization scenarios available with the custom library.
The application assumes the local database has all the tables and data that need to be synchronized. For the first synchronization, it’s best if the remote database is an exact copy of the local database, but it doesn’t need to be – it can be empty or a copy of the local database with modified data.
At each synchronization, DoSync will create tracking tables in the local database and the base and tracking tables in the remote database as needed, and then synchronize the data between both databases.
In Figure 2, the SyncSQL2008_Standard_Express_Compact35 project is a C# project. That’s because the project is an enhanced composite of several sample projects written in C#, which are available at the Microsoft Sync Framework Development Center. Keeping the code in C# makes it easier to update the library to include new functionality from other samples.
Printable Format
Microsoft is always soliciting feedback to help guide its development tooling efforts, most recently asking developers to weigh in with their requests to Santa for new debugging functionality in the form of a “debug-tastic” gift.
As we have reported before, Microsoft’s super-popular Visual Studio Code is the No. 1 code editor for multiple programming languages, and Google’s Go is on that list.
Microsoft’s C# programming language led the race to be named “programming language of the year” for 2022 by the TIOBE Index before being overtaken at the very end, but it’s expected to attain the throne this year.
What better way to learn about cutting-edge AI than creating a “copilot” assistant that can act as a mixologist, suggesting new cocktail recipes based on available ingredients?
New features and functionality boost productivity, C++ game development and .NET/C# (.NET MAUI).
Subscribe on YouTube
> More Webcasts
Problems? Questions? Feedback? E-mail us.
Leave a Reply