Feature A: Since the API will be used internally and consumed by other API applications, fuzzy searching doesn't make sense. The SQL operator ILIKE is well suited to this case.
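To illustrate the kind of matching meant here, below is a minimal sketch of a case-insensitive substring search. It is not the project's actual code: the `games` table, its `name` column and the `search_games` helper are assumptions made for the example. Note that ILIKE is a PostgreSQL operator; in SQLite, plain LIKE is already case-insensitive for ASCII characters by default, which is what the sketch relies on.

```python
import sqlite3


def escape_like(term: str) -> str:
    """Escape LIKE wildcards so the caller's input is matched literally."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def search_games(conn: sqlite3.Connection, name: str = "") -> list:
    """Return games whose name contains `name`, ignoring ASCII case.

    The table and column names are hypothetical. The search term is bound as
    a parameter, never concatenated into the SQL string.
    """
    pattern = f"%{escape_like(name)}%"
    sql = "SELECT name FROM games WHERE name LIKE ? ESCAPE '\\' ORDER BY name"
    return conn.execute(sql, (pattern,)).fetchall()
```

An empty search term matches every row, and a term such as `%` only matches names that really contain a percent sign, because the wildcard is escaped before it reaches the query.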
Feature B: The JSON provides multiple top-tier app lists (free, paid, grossing). I got stuck deciding which list to use, because I assumed that feature A should not display duplicates. I decided to take the free list only, but afterwards I realized that this underserves the business goal, which I read a bit too late. A better solution would have been to write a table dedicated to rankings, so that ranks are separated from games, and to populate the rankings table and the games table accordingly. As you asked not to spend too much time on this exercise, I left it as implemented.
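The dedicated ranking table could look roughly like the sketch below. This is only an illustration under assumptions: the column names (`store_id`, `chart`, `position`) and the helpers `upsert_game` and `record_rank` are hypothetical and do not come from the existing schema. The idea is that a game is stored once, while each appearance in a list becomes a row in `rankings`.

```python
import sqlite3

# Hypothetical schema: one row per game, one row per (chart, position).
SCHEMA = """
CREATE TABLE games (
    id INTEGER PRIMARY KEY,
    store_id TEXT NOT NULL UNIQUE,
    name TEXT NOT NULL
);
CREATE TABLE rankings (
    game_id INTEGER NOT NULL REFERENCES games(id),
    chart TEXT NOT NULL CHECK (chart IN ('free', 'paid', 'grossing')),
    position INTEGER NOT NULL,
    PRIMARY KEY (chart, position)
);
"""


def upsert_game(conn: sqlite3.Connection, store_id: str, name: str) -> int:
    """Insert the game if it is new and return its id either way."""
    conn.execute(
        "INSERT OR IGNORE INTO games (store_id, name) VALUES (?, ?)",
        (store_id, name),
    )
    row = conn.execute(
        "SELECT id FROM games WHERE store_id = ?", (store_id,)
    ).fetchone()
    return row[0]


def record_rank(conn: sqlite3.Connection, game_id: int, chart: str, position: int) -> None:
    """Store the game's position in one chart, replacing any previous holder."""
    conn.execute(
        "INSERT OR REPLACE INTO rankings (game_id, chart, position) VALUES (?, ?, ?)",
        (game_id, chart, position),
    )
```

With this split, a game that appears in both the free and grossing lists yields one `games` row and two `rankings` rows, so the search from feature A never shows duplicates.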
Question 1: We are planning to put this project in production. According to you, what are the missing pieces to make this project production ready?
There's a lot that could be done to make this project production ready. We could/should:
- switch to PostgreSQL: SQLite writes to a single file, so concurrent writes would be a problem
- add a transaction context system to ensure ACID semantics
- validate incoming queries to guard against security issues
- avoid the hardcoded S3 URL by using environment variables
- add API rate limiting and authentication, which would be welcome as it's an internal service
- prevent bad code from being pushed by adding a CI/CD pipeline
- add structured logging and ingest those logs into OpenSearch or any other observability tool to monitor the API's health.
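Two of the points above (configuration through environment variables and structured logging) can be sketched as follows. This is a minimal illustration, not the project's code: the variable name `GAMES_FEED_URL`, the `load_feed_url` helper and the `context` attribute on log records are assumptions made for the example.

```python
import json
import logging
import os


def load_feed_url(env=None) -> str:
    """Read the feed URL from the environment instead of hardcoding it.

    GAMES_FEED_URL is a hypothetical variable name.
    """
    env = os.environ if env is None else env
    url = env.get("GAMES_FEED_URL")
    if not url:
        raise RuntimeError("GAMES_FEED_URL is not set")
    return url


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Extra fields passed via `extra={"context": {...}}` are merged in.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)
```

A handler configured with `JsonFormatter()` emits lines that a log pipeline can index field by field, for example after `logger.info("populate done", extra={"context": {"rows": 200}})`.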
Question 2: Let's pretend our data team is now delivering new files every day into the S3 bucket, and our service needs to ingest those files every day through the populate API. Could you describe a suitable solution to automate this? Feel free to propose architectural changes.
I would replace the HTTP call with a scheduled cron job, and add alerts plus a dashboard to catch failed ingestions. I would make ingestion idempotent by adding a feed date to each ingested record, and modify the routes internally so that they always serve the most recent feed. Finally, I would decide on a lifecycle strategy to keep the database from growing over time.
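The idempotency and lifecycle ideas can be sketched as below. This is one possible shape under assumptions, not a definitive design: the `feed_entries` table, its columns and the functions `ingest`, `latest_games` and `prune` are hypothetical, and the 30-day retention is an arbitrary example value. The scheduler (cron or similar) would simply call `ingest` once per day with that day's file.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical table: every row is tagged with the date of the feed it came from.
SCHEMA = """
CREATE TABLE IF NOT EXISTS feed_entries (
    feed_date TEXT NOT NULL,
    store_id TEXT NOT NULL,
    name TEXT NOT NULL,
    PRIMARY KEY (feed_date, store_id)
);
"""


def ingest(conn: sqlite3.Connection, feed_date: date, games: list) -> None:
    """Replace all rows for `feed_date` in one transaction, so reruns are safe."""
    day = feed_date.isoformat()
    with conn:  # commits on success, rolls back on any exception
        conn.execute("DELETE FROM feed_entries WHERE feed_date = ?", (day,))
        conn.executemany(
            "INSERT INTO feed_entries (feed_date, store_id, name) VALUES (?, ?, ?)",
            [(day, g["store_id"], g["name"]) for g in games],
        )


def latest_games(conn: sqlite3.Connection) -> list:
    """Serve only the most recent feed; ISO dates sort correctly as text."""
    return conn.execute(
        "SELECT store_id, name FROM feed_entries "
        "WHERE feed_date = (SELECT MAX(feed_date) FROM feed_entries) "
        "ORDER BY store_id"
    ).fetchall()


def prune(conn: sqlite3.Connection, today: date, keep_days: int = 30) -> None:
    """Lifecycle rule: drop feeds older than `keep_days` days."""
    cutoff = (today - timedelta(days=keep_days)).isoformat()
    with conn:
        conn.execute("DELETE FROM feed_entries WHERE feed_date < ?", (cutoff,))
```

Because a rerun deletes and reinserts the same day's rows atomically, a retried or duplicated job leaves the table in the same state as a single successful run.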