What are XML feed best practices?
31st January 2012
My answer to What are XML feed best practices? on Quora
It sounds like you’re pretty much screwed already, if you’re dealing with companies that still think FTPing XML around is a sensible thing to do.
I would suggest focusing on what you can control. Assume that you will be passed bad data: weird formats, XML that is not well-formed, duplicate entries and so on. Your job is to handle all of this without going mad, and without your codebase turning into an unmanageable ball of mud.
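As a rough sketch of what "assume bad data" can look like in practice, here is a parser that reports problems instead of raising. It uses only the Python standard library. The element names item and id are assumptions for illustration; real feeds will use whatever names the supplier picked.

```python
import xml.etree.ElementTree as ET


def parse_feed(raw_xml):
    """Return (items, errors). Bad input is reported, never raised."""
    items, errors, seen = [], [], set()
    try:
        root = ET.fromstring(raw_xml)
    except ET.ParseError as exc:
        # Not well-formed: give up on this payload but keep the reason.
        return items, [f"not well-formed: {exc}"]

    # "item" and "id" are hypothetical element names.
    for element in root.iter("item"):
        # Flatten direct children into a dict, tolerating empty elements.
        record = {child.tag: (child.text or "").strip() for child in element}
        key = record.get("id") or None
        if key is not None and key in seen:
            errors.append(f"duplicate id: {key}")
            continue
        if key is not None:
            seen.add(key)
        items.append(record)
    return items, errors
```

Items without an id are kept rather than dropped, on the assumption that something downstream can decide what to do with them.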
So, start by figuring out your own core data model / abstraction. It will need to be VERY loose, with as few required fields as possible, since you can be sure some of the feeds you are consuming will come in with stuff missing at some point or another.
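One possible shape for a loose core model, sketched in Python. The field names (source, title, url, extra) are made up for this example. The point is only that almost nothing is required and unknown fields are kept rather than rejected.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Item:
    source: str                     # which feed it came from: the only required field
    title: Optional[str] = None
    url: Optional[str] = None
    extra: Dict[str, Any] = field(default_factory=dict)  # anything we don't model yet


KNOWN_FIELDS = ("title", "url")


def item_from_record(source, record):
    """Build an Item from a parsed feed record (a plain dict)."""
    # Missing or empty known fields simply stay as None.
    known = {name: record[name] for name in KNOWN_FIELDS if record.get(name)}
    # Unrecognised fields are preserved so nothing is silently lost.
    extra = {k: v for k, v in record.items() if k not in KNOWN_FIELDS}
    return Item(source=source, **known, extra=extra)
```

For example, item_from_record("acme", {"title": "Widget", "colour": "red"}) gives an Item with no url and with colour tucked away in extra.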
Separate your feed consumers from the rest of your code. Having your own good internal Web API (which could consume JSON rather than XML since you control it) might be smart, since that will provide a solid separation and you can then write all of your feed consumers as separate pieces of code that just POST new items to the API.
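A minimal sketch of the consumer side of that split, using only the standard library. The endpoint URL is hypothetical, and I'm assuming the internal API accepts one JSON object per POST.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/items"  # hypothetical internal endpoint


def build_request(item, api_url=API_URL):
    """Turn one item (a plain dict) into a JSON POST request."""
    body = json.dumps(item).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_item(item, api_url=API_URL):
    """Send the item to the internal API and return the HTTP status code."""
    with urllib.request.urlopen(build_request(item, api_url), timeout=10) as response:
        return response.status
```

Each feed consumer then only needs to know how to turn its own supplier's XML into dicts and call post_item. Everything past the API can change without touching the consumers.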
Learn to love, respect and cherish unique identifiers... but be very wary of supposedly unique identifiers from external sources unless you can be absolutely sure they won’t change on you. Create your own unique IDs at the first available opportunity, treat them properly within your own system and map external identifiers to them whenever you can.
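The mapping idea can be sketched like this. It's an in-memory toy; a real system would keep the mapping in persistent storage. The class and method names are invented for the example.

```python
import uuid


class IdRegistry:
    """Maps (source, external_id) pairs to IDs that we generate and control."""

    def __init__(self):
        self._by_external = {}

    def internal_id(self, source, external_id):
        # Key on the source as well: two suppliers can both have an item "42".
        key = (source, external_id)
        if key not in self._by_external:
            self._by_external[key] = uuid.uuid4().hex
        return self._by_external[key]
```

If a supplier later renumbers their items, the damage is limited to this mapping. Nothing else in the system ever referred to their identifiers directly.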
Write your consumers in a dynamic language with a solid interactive prompt, like Python or Ruby. This will make them much easier to write and debug. Use whatever you like for your core data storage / API.
Since your incoming data will come in all shapes and sizes, consider a document store such as MongoDB or Riak over a SQL database. Avoiding SQL migrations will help you out a lot.
Log and store absolutely everything. Ideally you should be able to re-execute every import that the system has ever executed, in order, to make debugging and fixing errors less terrifying. That will almost certainly prove impossible, but it’s a nice thought.
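Here is one way the "store everything, replay in order" idea might look, assuming raw payloads are small enough to keep on disk. The directory layout and file naming are my own invention, and source is assumed to be a simple slug with no slashes in it.

```python
import hashlib
import json
import time
from pathlib import Path


def archive_payload(archive_dir, source, raw_bytes, received_at=None):
    """Save the untouched payload plus a small metadata file next to it."""
    received_at = time.time() if received_at is None else received_at
    digest = hashlib.sha256(raw_bytes).hexdigest()
    directory = Path(archive_dir)
    directory.mkdir(parents=True, exist_ok=True)
    # Zero-padded timestamp first, so sorting by filename is chronological.
    stem = f"{received_at:017.6f}-{source}-{digest[:12]}"
    (directory / f"{stem}.xml").write_bytes(raw_bytes)
    meta = {"source": source, "received_at": received_at, "sha256": digest}
    (directory / f"{stem}.json").write_text(json.dumps(meta))
    return directory / f"{stem}.xml"


def replay(archive_dir):
    """Yield (metadata, raw_bytes) for every archived import, oldest first."""
    for meta_path in sorted(Path(archive_dir).glob("*.json")):
        meta = json.loads(meta_path.read_text())
        yield meta, meta_path.with_suffix(".xml").read_bytes()
```

Archiving the bytes before parsing them means that even payloads which blow up the parser are still there to be replayed once the bug is fixed.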
Good luck!