Introducing Job Hooks in Neosync
You can now run pre and post job hooks to further customize your workflows
December 9th, 2024
One of the biggest challenges that data synchronization tools like Neosync face is dealing with foreign key relationships correctly. You need to ensure that when you sync data across databases, you maintain referential integrity - meaning that if table A references table B, you need to sync table B's data first. But it gets even more complicated because there could be circular dependencies in the table, across tables or both!
In this blog, I'm going to walk through a basic, sample implementatinhow we can implement an in-memory foreign key tracker that helps us handle these dependencies efficiently. We'll use Go for the implementation since we write in Go. This blog won't cover every edge case (there are too many to list), but the goal is for you to get an idea of how to approach this problem.
Before we dive into the code, let's talk about why we'd want to track foreign keys in memory versus just querying the database for this information.
The data structure is arguably the most important part of this. Nailing the data structure will make writing the actual algorithm much easier. Since we're going to be doing a lot of look-ups, we'll heavily rely on maps.
Let's start with defining our core types:
type ForeignKey struct {
TableName string
ColumnName string
RefTable string
RefColumn string
}
type TableGraph struct {
// Map of table name to its foreign key constraints
tables map[string][]ForeignKey
// Map to track circular dependencies
visited map[string]bool
// Mutex for concurrent access
mu sync.RWMutex
}
func NewTableGraph() *TableGraph {
return &TableGraph{
tables: make(map[string][]ForeignKey),
visited: make(map[string]bool),
}
}
The ForeignKey
struct represents a single foreign key relationship. The TableGraph
struct is our main data structure that tracks all relationships between tables.
Now let's implement methods to add foreign key relationships:
func (g *TableGraph) AddForeignKey(fk ForeignKey) error {
g.mu.Lock()
defer g.mu.Unlock()
// Initialize slice if table doesn't exist
if _, exists := g.tables[fk.TableName]; !exists {
g.tables[fk.TableName] = make([]ForeignKey, 0)
}
// Add the foreign key
g.tables[fk.TableName] = append(g.tables[fk.TableName], fk)
return nil
}
One of the trickiest parts of foreign key tracking is handling circular dependencies. For example, if table A references table B and table B references table A, we need a way to detect and handle this.
Here's how we can implement cycle detection:
func (g *TableGraph) HasCycle(tableName string, visited map[string]bool, path map[string]bool) bool {
visited[tableName] = true
path[tableName] = true
for _, fk := range g.tables[tableName] {
next := fk.RefTable
if !visited[next] {
if g.HasCycle(next, visited, path) {
return true
}
} else if path[next] {
return true // Found a cycle
}
}
path[tableName] = false
return false
}
This uses a depth-first search approach to detect cycles in our graph of table relationships.
Now for the most important part - determining the order in which to sync tables. We'll use a topological sort to get this:
func (g *TableGraph) GetSyncOrder() ([]string, error) {
g.mu.RLock()
defer g.mu.RUnlock()
visited := make(map[string]bool)
order := make([]string, 0)
var visit func(string) error
visit = func(table string) error {
if visited[table] {
return nil
}
visited[table] = true
// Visit all dependencies first
for _, fk := range g.tables[table] {
if err := visit(fk.RefTable); err != nil {
return err
}
}
order = append(order, table)
return nil
}
// Visit all tables
for table := range g.tables {
if err := visit(table); err != nil {
return nil, err
}
}
return order, nil
}
This returns tables in an order where dependencies are synced before the tables that depend on them.
Here's how you might use this in a real application:
func main() {
graph := NewTableGraph()
// Add some relationships
graph.AddForeignKey(ForeignKey{
TableName: "orders",
ColumnName: "user_id",
RefTable: "users",
RefColumn: "id",
})
graph.AddForeignKey(ForeignKey{
TableName: "order_items",
ColumnName: "order_id",
RefTable: "orders",
RefColumn: "id",
})
// Get sync order
order, err := graph.GetSyncOrder()
if err != nil {
log.Fatal(err)
}
// order will be: [users, orders, order_items]
for _, table := range order {
fmt.Printf("Sync table: %s\n", table)
}
}
This memory footprint of this implementation is O(N) where N is the number of foreign key relationships in your database. This would probably work for most applications - even those with thousands of tables and relationships. There are definitely edge cases with this like virtual foreign keys (which we didn't account for) and others but this is a good enough start.
Building an in-memory foreign key tracker might seem complex at first, but breaking it down into these components makes it manageable. The key is having a clear data structure and handling the edge cases like circular dependencies properly.
We use this pattern extensively in Neosync to ensure data integrity when syncing across databases. It's also useful for other database tools like schema migration systems or backup tools.
The complete code for this implementation can be found in our open source repo if you want to see how we use it in production.
You can now run pre and post job hooks to further customize your workflows
December 9th, 2024
Product highlights from November
December 2nd, 2024
Nucleus Cloud Corp. 2024