
Senior Software Engineer (Orchestration), Spydra Technologies
Research and Development Engineer, Kerala Blockchain Academy
SAP HANA Database Administrator, Tata Consultancy Services
Helm

Ansible

Kubernetes

Git

Terraform

AWS
Azure

GCP

Prometheus
Grafana
Jenkins
Docker
Let me start with my career, but before that, I'd be happy to explain my graduation details. I pursued my Bachelor of Technology at Fitness University, Gondor. Once that was done, I was selected in a campus interview for Tata Consultancy Services. After I was recruited, I was sent for training in Trivandrum and then posted in Bangalore on an SAP project, where I worked as a HANA database administrator for one and a half years. That is where I learned how a production environment works. After that, I learned that blockchain helps enterprises a lot, especially for tracking and tracing. I studied blockchain out of personal interest, and based on my own personal project, I applied to Kerala Blockchain Academy, which was a project under IIITM-K, Kerala, and is now part of Digital University Kerala. Based on my profile, my interest, and the personal project I had done on Hyperledger Fabric, they were interested and gave me an offer, and I worked there as a research and development engineer for a couple of years. There, I worked as a trainer and trained the public directly. Apart from training, I worked as a network engineer on a couple of projects. One was Singapore-based, and the other was with the Department of Excise and Customs. They wanted to keep a record of the customs products they were importing, they used Hyperledger Fabric for that, and we deployed it on-premise. At present, I am working at Spydra Technologies, a platform built on Hyperledger Fabric, which we have re-engineered to scale up the nodes. We can now bootstrap a Hyperledger Fabric network in just 5 to 10 minutes.
For this, I have used Ansible and Terraform, and we have deployed it on Kubernetes on AWS. We have engineered fully automated orchestration of Hyperledger Fabric: creating the CA certificates, creating the peers and orderers and joining them to channels, and deploying chaincode according to the Hyperledger Fabric chaincode lifecycle. All of this runs on Kubernetes. We also support other operations, like creating a new organization or adding a peer, and all of them are automated and done with a single click. So the end-to-end orchestration covers all these operations on the latest version of Hyperledger Fabric, which is 2.5.
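The chaincode deployment step above follows the Fabric 2.x chaincode lifecycle, which on the CLI is driven by the `peer lifecycle chaincode` subcommands `package`, `install`, `approveformyorg`, and `commit`. The sketch below is not the Spydra orchestrator; it is a hypothetical in-memory model of the bookkeeping an orchestrator has to do, assuming the channel keeps the default LifecycleEndorsement policy (a majority of organizations must approve before the definition can be committed). The class and method names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ChaincodeDefinition:
    """Hypothetical orchestrator-side record of one chaincode definition on a channel."""

    name: str
    version: str
    sequence: int
    channel_orgs: List[str]
    installed_on: Set[str] = field(default_factory=set)
    approvals: Set[str] = field(default_factory=set)
    committed: bool = False

    def install(self, org: str) -> None:
        # Each org installs the same chaincode package on its own peers.
        self.installed_on.add(org)

    def approve(self, org: str) -> None:
        # Mirrors "approveformyorg": an org approves the definition for itself.
        if org not in self.channel_orgs:
            raise ValueError(f"{org} is not a member of the channel")
        self.approvals.add(org)

    def ready_to_commit(self) -> bool:
        # Assumes the default MAJORITY policy: strictly more than half of the orgs approved.
        return len(self.approvals) * 2 > len(self.channel_orgs)

    def commit(self) -> None:
        if self.committed:
            return
        if not self.ready_to_commit():
            raise RuntimeError("not enough approvals to commit the definition")
        self.committed = True


# Usage: with three orgs, two approvals are enough to commit.
cc = ChaincodeDefinition("asset", "1.0", 1, ["Org1", "Org2", "Org3"])
for org in cc.channel_orgs:
    cc.install(org)
cc.approve("Org1")
cc.approve("Org2")
cc.commit()
print(cc.committed)  # True
```

With two organizations, MAJORITY means both must approve, which is why the check uses a strict "more than half" comparison.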
There was a situation at Kerala Blockchain Academy (IIITM-K) where the measured throughput, in transactions per second (TPS), was not up to the mark, and the client was not satisfied with the throughput we offered. So we introduced an intermediary database, MongoDB. Hyperledger Fabric uses CouchDB as its state database, and it is not very performant, which the documentation itself points out. We modified our API service, which calls the chaincode, so that it first stores the data in MongoDB, which offers a throughput of 300 to 500 TPS for the client's transactions. The client's transactions are always kept in MongoDB. We also introduced a cron job between the API service and the ledger: it takes a batch of transactions from MongoDB, selecting them based on the state of the payload. For example, when a user creates a payload, its status while it sits in MongoDB is inactive. Once we commit that transaction on Hyperledger Fabric, the status changes to active: it is now on the ledger, on-chain, and MongoDB acts as the off-chain store. So this works as a kind of queuing service; instead of MongoDB, another queuing service such as RabbitMQ could also be used. To measure the throughput of the network, we used Hyperledger Caliper as well as JMeter. Apart from that, we fine-tuned parameters such as the orderer batch size and the maximum payload size in the orderer configuration.
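The orderer tuning mentioned above decides when the ordering service cuts a block. The Python below is a simplified sketch modeled loosely on Fabric's block-cutting rules, assuming each transaction is represented only by its size in bytes and leaving out BatchTimeout and AbsoluteMaxBytes. It shows how MaxMessageCount and PreferredMaxBytes interact; the class and field names are illustrative, not Fabric's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BatchSize:
    # Named after the orderer BatchSize settings (MaxMessageCount, PreferredMaxBytes);
    # the values used in the usage example are illustrative.
    max_message_count: int
    preferred_max_bytes: int


class BlockCutter:
    """Simplified model of how an orderer groups transactions into blocks."""

    def __init__(self, batch_size: BatchSize) -> None:
        self.batch_size = batch_size
        self.pending: List[int] = []  # sizes in bytes of transactions not yet in a block
        self.pending_bytes = 0

    def _cut(self) -> List[int]:
        # Emit the pending batch as a block and start a new, empty one.
        batch = self.pending
        self.pending, self.pending_bytes = [], 0
        return batch

    def ordered(self, msg_bytes: int) -> List[List[int]]:
        """Add one transaction and return any blocks cut as a result."""
        blocks: List[List[int]] = []
        if msg_bytes > self.batch_size.preferred_max_bytes:
            # Oversized transaction: flush what is pending, then give it its own block.
            if self.pending:
                blocks.append(self._cut())
            blocks.append([msg_bytes])
            return blocks
        if self.pending_bytes + msg_bytes > self.batch_size.preferred_max_bytes:
            # Adding it would overflow the preferred block size: cut first.
            blocks.append(self._cut())
        self.pending.append(msg_bytes)
        self.pending_bytes += msg_bytes
        if len(self.pending) >= self.batch_size.max_message_count:
            blocks.append(self._cut())
        return blocks


cutter = BlockCutter(BatchSize(max_message_count=3, preferred_max_bytes=1000))
print([cutter.ordered(size) for size in (100, 100, 100)])  # [[], [], [[100, 100, 100]]]
```

Raising MaxMessageCount or PreferredMaxBytes produces fewer, larger blocks, which usually helps throughput under heavy load at the cost of latency; BatchTimeout bounds how long a partly filled block waits when traffic is light.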
This is how we optimized it. Keeping a queuing mechanism in between gives better performance at the user end, and the client has both an off-chain and an on-chain record. The off-chain copy also acts as proof of the transaction, and on-chain tracking and tracing stays easy because they can query the ledger.
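The queuing flow described above (write off-chain first, then let a cron job push inactive records to the ledger and flip them to active) can be sketched as follows. This is a minimal sketch, assuming an in-memory dictionary in place of the MongoDB collection and a caller-supplied `submit` function in place of the real chaincode invocation; `OffChainStore`, `run_batch`, and the status strings are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

INACTIVE, ACTIVE = "inactive", "active"


@dataclass
class Record:
    tx_id: str
    payload: dict
    status: str = INACTIVE  # off-chain only until it is committed to the ledger


class OffChainStore:
    """In-memory stand-in for the MongoDB collection (hypothetical)."""

    def __init__(self) -> None:
        self.records: Dict[str, Record] = {}

    def insert(self, tx_id: str, payload: dict) -> None:
        # Fast path used by the API service: write off-chain and return to the user.
        self.records[tx_id] = Record(tx_id, payload)

    def pending(self, limit: int) -> List[Record]:
        # Oldest first, because dicts keep insertion order.
        return [r for r in self.records.values() if r.status == INACTIVE][:limit]


def run_batch(store: OffChainStore, submit: Callable[[Record], bool], batch_size: int = 50) -> int:
    """One cron tick: push up to batch_size inactive records to the ledger.

    `submit` is a placeholder for the chaincode invocation; it returns True once
    the transaction is committed. Failed records stay inactive and are retried.
    """
    committed = 0
    for record in store.pending(batch_size):
        if submit(record):
            record.status = ACTIVE  # now on-chain; the off-chain copy is kept as proof
            committed += 1
    return committed


store = OffChainStore()
for i in range(5):
    store.insert(f"tx{i}", {"n": i})
print(run_batch(store, submit=lambda r: True, batch_size=3))  # 3
```

Keeping failed records as inactive rather than deleting them means a later tick retries them automatically, which is the main reason to drive the batch off a status field.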
Network infrastructure issues. The main issue I have faced is pods going down. In Hyperledger Fabric, a common case is a pod using up all the storage allocated to it through persistent volumes in Kubernetes. So first of all, we need alerts on the volumes: once usage crosses a particular threshold, the alert is triggered, and we immediately expand the volume. More generally, whatever the issue is, whether a volume, a certificate issue, or a communication issue, there should be a specific alert for each. Apart from that, there can be logical issues; for example, a chaincode fails and the chaincode stream is terminated. The pod will restart anyway, but there should be alerts even for pod restarts. When an alert comes, the first priority is to check the logs to see why it failed, trace back the root cause, and fix it immediately. There can also be certificate renewal issues. In Hyperledger Fabric, certificates issued by the CA have a one-year validity by default. So, as I said, there should be an alerting mechanism: a cron job that reads the certificates issued to the different entities, such as peers, orderers, and the users on the network, and, once the expiry comes within a threshold, renews the certificate immediately. Apart from this, there can be communication issues, for example between two servers.
Let's say a network runs on several servers, which can be distributed across different cloud regions. For example, AWS has different regions, so one server can be in one region and another server in a different region. In some scenarios there will also be disaster recovery and high availability setups. With high availability and disaster recovery in place, if anything fails in the primary region, we need to immediately route the API service, that is, the user calls, to the secondary region, and we need to regularly test this failover.
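The certificate-expiry cron job described in the previous answer reduces to comparing each certificate's expiry against a threshold. The sketch below assumes the expiry dates have already been collected (one way is `openssl x509 -enddate -noout -in cert.pem` for each certificate); the entity names and the 30-day threshold are illustrative assumptions, and renewal itself is left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def expiring_certificates(not_after: Dict[str, datetime], now: datetime,
                          threshold: timedelta = timedelta(days=30)) -> List[str]:
    """Entities whose certificate expires within `threshold` of `now`, soonest first.

    `not_after` maps an entity (peer, orderer, user) to its certificate's expiry.
    Already-expired certificates are included.
    """
    due = sorted((expiry, name) for name, expiry in not_after.items()
                 if expiry - now <= threshold)
    return [name for _, name in due]


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
certs = {
    "peer0.org1": now + timedelta(days=10),
    "orderer0.org1": now + timedelta(days=200),
    "admin@org1": now - timedelta(days=1),
}
print(expiring_certificates(certs, now))  # ['admin@org1', 'peer0.org1']
```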
Network throughput is measured in transactions per second (TPS): how many transactions the network processes per second. When a user hits an API on the network, our API service should accept the call and pass it to the client, and the client sends it to the Hyperledger Fabric peers. The peers execute the chaincode function the user is calling and return a response, the client sends the endorsed transaction to the orderers, and once it is committed, the user gets the final response. This whole flow counts as one transaction, and the number of such transactions processed per second, in parallel, is what we define as throughput in Hyperledger Fabric. To measure it, we have tools like JMeter, where we can configure the number of users as well as the payload and send transactions based on those. The input rate is also important: for example, if JMeter is configured with 100 users sending transactions in parallel within one second, the input is 100 TPS. We can also start with one user per second and increase the load over time. Hyperledger Caliper is a benchmarking tool from Hyperledger itself, used specifically for Hyperledger projects, where we define the workload in terms of the send rate and the number of workers. It is similar to JMeter, though personally I found JMeter better. Caliper also sends transactions with the payload we configure, measures the throughput, and reports the results, and based on those results we take action.
If the measured throughput is much lower than the requirement, we need to use a queuing mechanism, because blockchain transactions definitely take time to process. It is not like just writing data to a database: each transaction has to be endorsed, ordered, validated, and hashed into a block.
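Tools like JMeter and Caliper report throughput and latency from per-transaction timings. The sketch below shows that arithmetic, assuming we have, for every transaction, the time it was submitted and the time its commit was confirmed (None if it failed). `TxResult` and `summarize` are hypothetical names, and this is one common definition of committed TPS rather than either tool's exact formula.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TxResult:
    submitted_at: float            # seconds: when the client sent the transaction
    committed_at: Optional[float]  # seconds: when the commit was confirmed; None if it failed


def summarize(results: List[TxResult]) -> Dict[str, float]:
    """Committed-transaction throughput, average latency, and success rate for one run."""
    ok = [r for r in results if r.committed_at is not None]
    if not ok:
        return {"tps": 0.0, "avg_latency_s": 0.0, "success_rate": 0.0}
    start = min(r.submitted_at for r in results)
    end = max(r.committed_at for r in ok)
    duration = end - start
    if duration <= 0:
        raise ValueError("run duration must be positive")
    latencies = [r.committed_at - r.submitted_at for r in ok]
    return {
        "tps": len(ok) / duration,
        "avg_latency_s": sum(latencies) / len(latencies),
        "success_rate": len(ok) / len(results),
    }


run = [TxResult(0.0, 0.5), TxResult(0.0, 1.0), TxResult(0.5, 2.0), TxResult(1.0, None)]
print(summarize(run))  # {'tps': 1.5, 'avg_latency_s': 1.0, 'success_rate': 0.75}
```

Comparing this committed TPS against the configured input rate shows whether the network keeps up; when it falls well short, the queuing approach above is the fallback.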
To test a network for weaknesses, the first thing I do is check the DNS records. In Kubernetes, we have a number of ingresses defined, and those ingresses may be publicly reachable. Let's say we have defined an ingress for CouchDB. If that CouchDB ingress endpoint is public, there is no point in having a private blockchain. So I first test whether the communications are secured: whether the nodes and servers are inside the private cloud, which we call a virtual private cloud (VPC). Every node and server should be part of that VPC so that no external traffic can reach the servers. At the same time, the servers need outbound access to the public network, for example to download dependencies and update their repositories for whatever tools they use, such as Docker, Kubernetes, or vim. For this, we need a NAT gateway: the route tables of the subnets inside the VPC send outbound traffic to the NAT gateway, so all outbound traffic goes through it. And if another machine is expected to receive traffic from this VPC, the NAT gateway's IP can be whitelisted on that machine's load balancer. In this way, the servers communicate privately over a secured network, while we still have separate, controlled channels for public access. Coming to tools, I use dig.
The dig tool queries DNS and returns the records for a name, and ping or curl also help. Other weaknesses can be around security credentials. Credentials, including certificates, should be stored in a secure place: the private keys of the nodes should be kept in something like a vault or a secrets manager, so that all these credentials are saved securely in one place and are not compromised.
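The DNS check described above (does an ingress hostname, such as the CouchDB one, resolve to a public address?) can be scripted. This is a minimal sketch, assuming the system resolver is an acceptable stand-in for `dig +short` and treating any address for which Python's `ipaddress` reports `is_global` as public. The hostnames and addresses in the usage example are made up, and a fake resolver is injected so the check runs without network access.

```python
import ipaddress
import socket
from typing import Callable, Dict, Iterable, List


def resolve(host: str) -> List[str]:
    """Addresses a hostname resolves to via the system resolver (roughly what dig +short shows)."""
    infos = socket.getaddrinfo(host, None)
    # Strip any IPv6 zone index such as "%eth0" before parsing.
    return sorted({info[4][0].split("%")[0] for info in infos})


def publicly_exposed(hosts: Iterable[str],
                     resolver: Callable[[str], List[str]] = resolve) -> Dict[str, List[str]]:
    """Hosts that resolve to at least one globally routable (public) address."""
    exposed: Dict[str, List[str]] = {}
    for host in hosts:
        public = [ip for ip in resolver(host) if ipaddress.ip_address(ip).is_global]
        if public:
            exposed[host] = public
    return exposed


# Hypothetical DNS answers: CouchDB stays private, the API host also has a public address.
fake_dns = {"couchdb.org1.internal": ["10.0.3.7"], "api.example.com": ["10.0.3.8", "34.201.10.20"]}
print(publicly_exposed(fake_dns, resolver=fake_dns.__getitem__))  # {'api.example.com': ['34.201.10.20']}
```

A public answer is not automatically a problem (the API may be meant to be public), but any state database or node-internal endpoint showing up in this output deserves a closer look.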
Hyperledger Fabric is a permissioned, private blockchain platform. There are many other blockchain platforms, like Polygon, Ethereum, and Bitcoin, but those are public chains, whereas Hyperledger Fabric is completely private and permission-based. It is called permissioned because the network can have its own certificate authority, which we bootstrap along with the network, and identities get their certificates from it, so the certificates are completely under our control. Since it is private, not everyone can come and transact on Hyperledger Fabric: only users with certificates issued by the network's certificate authorities can. There can be different certificate authorities, and we can have intermediate certificate authorities as well. Apart from that, we have control over the nodes, meaning the peers and orderers. Smart contracts have their own importance too: we can write business contracts and run them on a private blockchain. If your use case needs a private blockchain, Hyperledger Fabric is a good choice. Hyperledger Fabric is a project under the Hyperledger umbrella, hosted by the Linux Foundation. Basically, it is a private, permissioned blockchain system with its own certificate authority, its own peers to commit transactions, and its own ordering service for consensus, which is an essential part of any blockchain. Coming to its role in network infrastructure, the Hyperledger Fabric documentation suggests using Kubernetes. Hyperledger Fabric needs its own infrastructure, and the number of nodes can be designed based on the use case.
It is completely customizable and can follow two architectures. One is the system channel architecture, where the orderers belong to a separate ordering organization and the peers belong to the business organizations. The other is an architecture where every organization is a business organization that runs its own orderers and peers. Each organization can define its number of peers and orderers based on the business requirement, and this flexibility is why Fabric is used for traceability and most other business use cases.
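The permissioning described above rests on every identity chaining back to a certificate authority the network trusts, possibly through intermediate CAs. The toy sketch below models only that "who issued what" walk, assuming issuer links are given as a simple dictionary. It deliberately skips signature, expiry, and revocation checks that a real Fabric MSP also performs, and all CA and user names are hypothetical.

```python
from typing import Dict, Set


def chains_to_trusted_root(identity: str, issuer_of: Dict[str, str],
                           trusted_roots: Set[str], max_depth: int = 10) -> bool:
    """Follow issuer links from an identity up to a self-signed root CA (toy model)."""
    current = identity
    for _ in range(max_depth):
        issuer = issuer_of.get(current)
        if issuer is None:
            return False  # unknown issuer: not part of this network's PKI
        if issuer == current:
            return current in trusted_roots  # reached a self-signed root
        current = issuer  # climb one level, e.g. through an intermediate CA
    return False  # chain too long or cyclic


issuer_of = {
    "rca.org1": "rca.org1",  # root CA (self-signed)
    "ica.org1": "rca.org1",  # intermediate CA
    "user1@org1": "ica.org1",
    "ca.outsider": "ca.outsider",
    "mallory": "ca.outsider",
}
print(chains_to_trusted_root("user1@org1", issuer_of, {"rca.org1"}))  # True
print(chains_to_trusted_root("mallory", issuer_of, {"rca.org1"}))     # False
```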
Best practices for avoiding downtime: we need multiple nodes in the network. In Hyperledger Fabric, let's say there are two peers, peer 0 and peer 1. If peer 1 goes down, because its volume is exhausted or for some other reason, transactions are still processed by peer 0, which is the beauty of Hyperledger Fabric: there is no downtime at all. Since peer 0 keeps running, we can troubleshoot what happened to peer 1 and bring it back up, and after that, peer 1 syncs up with peer 0, provided they are in the same channel. Apart from that, if the infrastructure spans three zones, as I explained in an earlier answer, say a primary zone in the US, one in India, and a third zone, and the orderers of the network are distributed across these three zones, there will be no downtime at all. The orderers stay in consensus (with Raft, the usual consensus type in Fabric 2.x), so as long as a majority of them, for example 2 out of 3, is running, ordering continues even if one orderer fails. And even if a peer in one of the zones goes down, the distributed architecture keeps the network running. So with a distributed architecture, Hyperledger Fabric needs no downtime.
Performance metrics. Not only Hyperledger Fabric but also Kubernetes provides network and performance metrics, which we can read to analyze the network's performance. For example, Hyperledger Fabric exposes a metric for the number of blocks committed, and based on it we can check whether blocks are being committed as expected. Kubernetes gives us metrics for the number of pods alive. Since we know how many pods should be running, if the number of pods alive is less than the number we deployed, we know that at least one pod is down and there is a problem. Similarly, for storage, if a storage metric crosses the threshold we have set, we know a volume is nearly full and we need to increase the size of that persistent volume.
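The two checks above (fewer pods alive than deployed, and volume usage above a threshold) are simple comparisons. In a Prometheus setup the inputs could come from kube-state-metrics (for example `kube_statefulset_status_replicas_ready` or `kube_deployment_status_replicas_available`) and from the kubelet (`kubelet_volume_stats_used_bytes` and `kubelet_volume_stats_capacity_bytes`), and the checks would normally live in alerting rules. The Python below only illustrates the comparison, assuming the values have already been scraped; the workload names and the 80% threshold are hypothetical.

```python
from typing import Dict, List


def pod_alerts(expected: Dict[str, int], running: Dict[str, int]) -> List[str]:
    """Workloads with fewer running pods than deployed."""
    return [f"{name}: {running.get(name, 0)}/{want} pods running"
            for name, want in expected.items() if running.get(name, 0) < want]


def volume_alerts(used_bytes: Dict[str, int], capacity_bytes: Dict[str, int],
                  threshold: float = 0.8) -> List[str]:
    """Persistent volume claims whose usage ratio has reached the threshold."""
    alerts = []
    for pvc, capacity in capacity_bytes.items():
        ratio = used_bytes.get(pvc, 0) / capacity
        if ratio >= threshold:
            alerts.append(f"{pvc}: {ratio:.0%} used")
    return alerts


print(pod_alerts({"peer0-org1": 1, "orderer": 3}, {"peer0-org1": 1, "orderer": 2}))
# ['orderer: 2/3 pods running']
print(volume_alerts({"data-peer0": 85, "data-peer1": 10}, {"data-peer0": 100, "data-peer1": 100}))
# ['data-peer0: 85% used']
```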
I always look at the protocols used in the network. Take HTTP as an example: plain HTTP is not enough, and we definitely need HTTPS, where we provide an SSL/TLS certificate. It is always good to have a certificate between two communicating entities, on whatever layer. We have used ZeroSSL as the certificate issuer, and we have also used gRPC. In Hyperledger Fabric, internal communication happens over gRPC, secured with the TLS certificates we generated using a certificate authority; those are used for communication between peers and orderers. The connection profile is another example: when we create a transaction from the client, the client uses the connection profile to interact with the discovery service. The discovery service returns the network's nodes, their endpoints, and their certificates, which is very important. So every communication should use a certificate issued by a certificate authority, similar to SSL. These are the two protocols I have worked with, gRPC and HTTPS, and I always recommend certificates for communication between entities.
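As a small illustration of the "always use certificates" principle, here is a client-side TLS context built with Python's standard `ssl` module. It verifies the server against a given CA bundle (or the system store) and can optionally present a client certificate for mutual TLS. Fabric's peers and orderers configure TLS for gRPC through their own configuration, so this is a sketch of the principle for an HTTPS client such as an API service, not Fabric's mechanism; the parameter names are this example's own.

```python
import ssl
from typing import Optional


def client_tls_context(cafile: Optional[str] = None,
                       certfile: Optional[str] = None,
                       keyfile: Optional[str] = None) -> ssl.SSLContext:
    """Client-side TLS context that verifies the server certificate.

    cafile: CA bundle to trust (for example the network's TLS CA); None uses the system store.
    certfile/keyfile: optional client certificate for mutual TLS.
    """
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=cafile)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    if certfile:
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return ctx


ctx = client_tls_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)  # True True
```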
Innovative approach. One approach is to provide more CPU and memory resources to the network, meaning the pods in Hyperledger Fabric, to get a more performant network. I am not saying we should keep adding more CPU and memory; it is better to fine-tune. When we run performance tests, we should also measure the CPU and memory used, and based on that, fine-tune our servers for maximum performance. Even at the pod level in Kubernetes, we need to set resource limits properly. Coming to the servers, server performance is one thing: we should run processes in parallel where needed. With Kubernetes, the approach is fully parallel: different pods run, each with its own CPU and memory, and they use their resources in parallel. The CPU architecture also matters. We need to choose the CPU architecture, like Intel, AMD, or ARM-based processors, based on performance, how many cores it has, and how many threads it can process at a time. Storage matters as well: it can be SSD or hard disk. If the server is attached to an SSD, write performance improves, so if our volumes are on SSDs, there will be a significant improvement in transactions, because committing transactions means writing data to storage.
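The fine-tuning idea above (measure CPU and memory during performance tests, then set pod limits from the measurements) can be expressed as a small heuristic. The rule used here is an assumption for illustration, not a Kubernetes or Fabric recommendation: requests at the median observed usage, the CPU limit at the 95th percentile times a headroom factor, and the memory limit at the observed peak times the same factor. The sample values are made up.

```python
import math
from typing import Dict, List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile, with pct in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_resources(cpu_millicores: List[float], mem_mib: List[float],
                      headroom: float = 1.3) -> Dict[str, str]:
    """Suggest pod requests/limits from usage sampled during a load test (heuristic)."""
    return {
        "requests.cpu": f"{math.ceil(percentile(cpu_millicores, 50))}m",
        "limits.cpu": f"{math.ceil(percentile(cpu_millicores, 95) * headroom)}m",
        "requests.memory": f"{math.ceil(percentile(mem_mib, 50))}Mi",
        "limits.memory": f"{math.ceil(max(mem_mib) * headroom)}Mi",
    }


cpu = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
mem = [256, 300, 512]
print(suggest_resources(cpu, mem, headroom=1.5))
# {'requests.cpu': '500m', 'limits.cpu': '1500m', 'requests.memory': '300Mi', 'limits.memory': '768Mi'}
```

Memory gets its limit from the peak rather than a percentile because a container that exceeds its memory limit is killed, while exceeding a CPU limit only throttles it.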