Blog: Batch processing in Fatman API
As we mentioned in our previous technical blog post https://fatman.fi/en/blog-from-point-to-point-integrations-to-apis-and-beyond/, an API is a modern way of communication between technical systems. In simpler words, it is a language which lets the systems communicate like humans do, with a specified set of commands.
Millions of API requests are happening as we speak. Most modern APIs follow REST https://en.wikipedia.org/wiki/Representational_state_transfer principles and are described with the OpenAPI https://www.openapis.org/about specification. In that design, an API is a collection of resources, and each resource is a collection of homogeneous objects. A typical set of operations includes retrieving a single object by its identifier, retrieving a set of objects matching some search criteria, creating a new object in the system, or updating an existing object. The system providing the API aims to guarantee the consistency and security of these operations. To facilitate this, in a standardized API each action usually operates on one object at a time.
However, with growing API usage and business demands, some systems need to process large numbers of objects, and sometimes move them between systems. This can happen in industries where large amounts of data are naturally present, such as measured time series or satellite data. Additionally, some operations might be scheduled to process their target data at a later point in time, which increases the size of the final operation. For example, if all the relevant reports are recalculated at the end of the month, that final operation may have to process a large amount of data.
One batch to rule them all
In cases like this, the combined cost of one API request per processed object becomes high, and optimization becomes desirable. In API systems, some code is executed on every request regardless of the object being processed, such as opening a network connection between the client and the server or initializing and disposing of a connection to the persistent storage (database). Because of that, one request adding a thousand objects is significantly more efficient than a thousand requests adding one object each. Combining several individual object operations into one operation on many objects is called batch processing.
With that in mind, it can be beneficial to postpone operations on objects and execute them later as a single operation in order to save on performance costs. This optimization is already used in various low-level operating system jobs. However, it only applies when the business requirements do not demand that changes to objects take effect immediately, which is not always the case. Most often, the API consumer expects to see the effect of their operation immediately after a success response is returned.
So technically it is certainly possible to collect several operations that are known in advance and process them as a batch. But what are the implications of such an API architecture? Operating on multiple objects at the same time creates several technical challenges.
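The overhead argument above can be made concrete with a toy cost model in Python. This is only a sketch: the `CostModel` name and the numbers (20 ms of fixed work per request, 0.5 ms per object) are illustrative assumptions, not measurements of any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    """Toy model: every request pays a fixed cost, every object a variable one."""

    per_request_ms: float  # assumed: connection setup, authentication, DB session
    per_object_ms: float   # assumed: validating and storing a single object

    def total_ms(self, requests: int, objects: int) -> float:
        return requests * self.per_request_ms + objects * self.per_object_ms


model = CostModel(per_request_ms=20.0, per_object_ms=0.5)

# A thousand single-object requests versus one request carrying a thousand objects.
one_by_one_ms = model.total_ms(requests=1000, objects=1000)
batched_ms = model.total_ms(requests=1, objects=1000)

print(f"one by one: {one_by_one_ms} ms, batched: {batched_ms} ms")
```

In this model the per-object work is identical in both cases. The whole difference comes from paying the fixed per-request cost once instead of a thousand times.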
Modern solutions create modern challenges
Firstly, input validation must be performed on several objects at once, and potentially on the objects against each other. For example, if a field must have a unique value, it has to be checked not only against the existing data in the system, but also against the other items in the same batch, keeping in mind the side effects of transformations such as business rules that change the incoming field values as the system processes them. Then, if one of the validations fails, should the whole operation be aborted, or should the system ignore the faulty item and process the rest of the request, risking the consistency of the outcome? The API design should handle these cases intuitively and consistently so that consumer integrations are reliable and easy to implement and maintain.
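The uniqueness example can be sketched in Python as follows. Everything here is hypothetical: the `code` field, the `normalize_code` business rule (trim and upper-case), and the function names are assumptions made for illustration, not part of any real API.

```python
from __future__ import annotations


def normalize_code(raw: str) -> str:
    """Hypothetical business rule: codes are stored trimmed and upper-cased."""
    return raw.strip().upper()


def validate_batch(existing_codes: set[str], batch: list[dict]) -> list[str]:
    """Return a description of every uniqueness problem found in the batch."""
    errors: list[str] = []
    seen: dict[str, int] = {}  # normalized code -> index of its first use in the batch
    for index, item in enumerate(batch):
        raw = item.get("code")
        if not isinstance(raw, str) or not raw.strip():
            errors.append(f"item {index}: 'code' is required")
            continue
        # Compare the value as it will be stored, not as it was sent.
        code = normalize_code(raw)
        if code in existing_codes:
            errors.append(f"item {index}: code '{code}' already exists in the system")
        elif code in seen:
            errors.append(f"item {index}: code '{code}' duplicates item {seen[code]} in the same batch")
        else:
            seen[code] = index
    return errors


existing = {"A1"}
incoming = [{"code": "b2"}, {"code": " B2 "}, {"code": "a1"}, {}]
for problem in validate_batch(existing, incoming):
    print(problem)
```

Note that `"b2"` and `" B2 "` only collide after the business rule has been applied, which is why the check runs on the normalized value. Whether the caller then rejects the whole batch or skips the faulty items is a separate design decision.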
Secondly, implementing batch operations requires careful handling of concurrency. If two batch operations that touch a shared object run at the same time, it is much more likely than with single-object handling that one consumer's changes are silently lost. Luckily, modern software development frameworks provide well-tested and documented mechanisms for implementing safe and secure API operations. By following API development guidelines and good software practices, developers can reliably incorporate batch processing into their projects.
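One common way to avoid such lost updates is optimistic concurrency control with version numbers, sketched below in Python. This is a swapped-in illustration under stated assumptions, not a description of how any particular framework or the Fatman API does it. The in-memory `VersionedStore` and `VersionConflict` names are hypothetical, and a real system would rely on database transactions rather than a process-local lock.

```python
import threading


class VersionConflict(Exception):
    """Raised when an object changed after the consumer last read it."""


class VersionedStore:
    def __init__(self):
        self._lock = threading.Lock()
        self._rows = {}  # object id -> (version, value)

    def read(self, object_id):
        with self._lock:
            return self._rows.get(object_id, (0, None))

    def write_batch(self, updates):
        """updates: list of (object_id, expected_version, new_value) tuples."""
        with self._lock:  # checking and writing happen as one atomic step
            for object_id, expected_version, _value in updates:
                current_version, _current = self._rows.get(object_id, (0, None))
                if current_version != expected_version:
                    raise VersionConflict(
                        f"'{object_id}' is at version {current_version}, "
                        f"expected {expected_version}"
                    )
            # Only reached when every item in the batch is still up to date.
            for object_id, expected_version, new_value in updates:
                self._rows[object_id] = (expected_version + 1, new_value)


store = VersionedStore()
store.write_batch([("a", 0, "initial"), ("b", 0, "initial")])

# Two consumers read the same state, then both send a batch touching "a".
version_a, _ = store.read("a")
version_b, _ = store.read("b")
store.write_batch([("a", version_a, "from consumer 1")])
try:
    store.write_batch([("b", version_b, "from consumer 2"),
                       ("a", version_a, "from consumer 2")])
    conflict_detected = False
except VersionConflict:
    conflict_detected = True
```

The second consumer gets an explicit conflict instead of silently overwriting the first consumer's change, and because all versions are checked before anything is written, its update to `"b"` is not applied either.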
The way of Fatman
In Fatman API, we provide batch operations for cases where a large number of objects needs to be added at the same time. They have a separate endpoint and accept an array of objects instead of a single object. If validation fails, no changes are made to the system, and a description of the first failure is returned. If the input contains no errors, the data is durably written into the database and can be retrieved immediately afterwards. The server responds with the success status code 204 and no additional data. Make sure to use batch processing operations where appropriate, as this helps to reduce the traffic between the systems.
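The contract described above (all or nothing, first failure reported, 204 on success) can be sketched in Python with an in-memory handler. This is not the actual Fatman API implementation. The function names, the `name` validation rule, the use of status code 400 for failures, and the shape of the error body are all assumptions made for illustration.

```python
from __future__ import annotations


def validate(item: object, index: int) -> str | None:
    """Hypothetical rule: every object needs a non-empty string 'name'."""
    if not isinstance(item, dict):
        return f"item {index}: expected an object"
    name = item.get("name")
    if not isinstance(name, str) or not name.strip():
        return f"item {index}: 'name' is required"
    return None


def handle_batch_create(database: list[dict], payload: object) -> tuple[int, dict | None]:
    """Return (status code, response body) for a batch create request."""
    if not isinstance(payload, list):
        return 400, {"error": "expected an array of objects"}
    # Validate everything before touching the database.
    for index, item in enumerate(payload):
        failure = validate(item, index)
        if failure is not None:
            # Report only the first failure; nothing has been written.
            return 400, {"error": failure}
    database.extend(dict(item) for item in payload)
    return 204, None  # success, no response body


database: list[dict] = []
rejected = handle_batch_create(database, [{"name": "a"}, {"name": ""}, {}])
rows_after_rejection = len(database)
accepted = handle_batch_create(database, [{"name": "a"}, {"name": "b"}])
```

Because validation finishes before the first write, a rejected batch leaves the stored data exactly as it was, and the consumer can fix the reported item and resend the whole array.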
Aleksandr Makarov, Lead Developer