Senior Software Engineer (Orchestration), Spydra Technologies
Research and Development Engineer, Kerala Blockchain Academy
SAP HANA Database Administrator, Tata Consultancy Services

Skills: Helm, Ansible, Kubernetes, Git, Terraform, AWS, Azure, GCP, Prometheus, Grafana, Jenkins, Docker
I started my career with Tata Consultancy Services, but before that let me cover my education. I completed my Bachelor of Technology at Fitness University, Gondor, and through campus placement I was selected by Tata Consultancy Services. After training in Trivandrum I was posted to Bangalore on an SAP project, where I worked as a HANA database administrator for one and a half years and learned what a production environment looks like. During that time I realized that blockchain can help enterprises a great deal, especially for tracking and tracing, so I learned blockchain out of personal interest and built a personal project on Hyperledger Fabric. On the strength of my profile, my interest, and that project, I applied to the Kerala Blockchain Academy, a project under IIITM-K, Kerala, which is now Digital University Kerala. They were interested, made me an offer, and I worked there as an R&D engineer for a couple of years. I worked as a trainer, teaching the public directly, and beyond training I worked as a network engineer on a couple of projects: one Singapore-based, and another for the Department of Excise and Customs, which wanted a system for recording whatever products they were importing, built on Hyperledger Fabric and deployed on-premise. At present I am working at Spydra Technologies, a platform built on Hyperledger Fabric, which we have re-engineered to scale up the nodes.
We can bootstrap a Hyperledger Fabric network in just 5 to 10 minutes. I used Ansible and Terraform, and we deployed on Kubernetes on AWS. We engineered fully automated orchestration of Hyperledger Fabric: creating the CA certificates, creating the peers and orderers and joining them, and deploying chaincode following the chaincode lifecycle defined by Hyperledger Fabric. Beyond that, operations like creating a new organization or a new peer are all automated and done with a single click, so end-to-end orchestration is covered for all operations on the latest version of Hyperledger Fabric, which is 2.5.
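The bootstrap flow described above can be sketched as an ordered pipeline. This is a minimal illustration only: the function names are hypothetical stand-ins for the actual Ansible playbooks and Terraform modules, not the real pipeline.

```python
# Hypothetical sketch of the end-to-end bootstrap order; in the real
# pipeline each step wraps an Ansible playbook or Terraform module.
def enroll_ca():         return "ca-certs"
def create_peers():      return ["peer0", "peer1"]
def create_orderers():   return ["orderer0", "orderer1", "orderer2"]
def join_channel(nodes): return {"channel": "mychannel", "members": nodes}
def deploy_chaincode():  return "committed"  # package -> install -> approve -> commit

def bootstrap_network():
    enroll_ca()                               # 1. CA and certificates first
    nodes = create_peers() + create_orderers()# 2. bring up peers and orderers
    channel = join_channel(nodes)             # 3. join them to the channel
    status = deploy_chaincode()               # 4. chaincode lifecycle last
    return channel, status
```

The point of the sketch is the ordering: certificates must exist before nodes, and nodes must be on the channel before the chaincode lifecycle runs.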
There was a situation, in the Kerala Blockchain Academy (IIITM-K) context, where the measured throughput in TPS was not up to the mark; the client was not satisfied with the throughput we offered. So we introduced an intermediary database, MongoDB. Hyperledger Fabric uses CouchDB as its state database, and the documentation itself notes that it is not a highly performant database. We modified our API service, which invokes the chaincode, so that it first stores the data in MongoDB, which offers a throughput of 300 to 500 TPS. Whatever transactions the end user performs are kept in MongoDB first. We then introduced a cron job in between: it takes a batch of transactions from MongoDB, reading the batch based on the state of each payload. For example, when a user creates a payload, its status in MongoDB is "inactive"; once we commit those transactions on Hyperledger Fabric, the status is changed to "active". At that point the record is on-chain, and MongoDB acts as the off-chain store. It is essentially a queuing service, and instead of MongoDB you could use a dedicated queuing service such as RabbitMQ. When we measured throughput, we used Hyperledger Caliper as well as JMeter to measure the performance of the network.
Beyond that, we optimized and fine-tuned parameters such as the orderer's batch size and the payload size limits in the orderer configuration. By tuning those parameters and keeping a queuing mechanism in between, we got better performance at the user end, and the client gets both an off-chain and an on-chain record: the off-chain copy acts as proof of the transaction, and since the data is also on-chain, tracing and tracking remain easy when querying the ledger.
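A minimal sketch of the cron-job batching logic described above. A plain list of dicts stands in for the MongoDB collection, and commit_to_fabric is a hypothetical placeholder for the real chaincode invocation.

```python
# Sketch of the off-chain queue drain: pick up "inactive" payloads,
# commit them to Fabric, then flip their status to "active".
def commit_to_fabric(batch):
    # Placeholder: the real service submits the batch via the chaincode API.
    return True

def drain_queue(store, batch_size=2):
    """Cron-job body: commit one batch of pending transactions."""
    pending = [tx for tx in store if tx["status"] == "inactive"][:batch_size]
    if pending and commit_to_fabric(pending):
        for tx in pending:
            tx["status"] = "active"  # now on-chain; MongoDB copy is off-chain proof
    return len(pending)

store = [{"id": 1, "status": "inactive"},
         {"id": 2, "status": "inactive"},
         {"id": 3, "status": "inactive"}]
drain_queue(store)  # commits a batch of 2; id 3 waits for the next run
```

The user sees the fast MongoDB write immediately, while the slower on-chain commit happens asynchronously in batches.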
On network infrastructure issues: one issue I faced was pods going down. In Hyperledger Fabric terms, a pod can completely use up the storage allocated to it as a persistent volume in Kubernetes. The first approach is to have alerts in place, alerts on the volumes, so that once usage crosses a particular threshold we are notified and can immediately expand the volume. More generally, whether it is a volume issue, a certificate issue, or a communication issue, there should be a specific alert for each. Apart from that, there can be logical failures, say the chaincode fails with an error like "chaincode stream terminated". The pod will restart anyway, but even pod restarts should raise alerts. When an alert fires, the first priority is to look at the logs, find the reason for the failure, trace back why it happened, and fix it immediately. There can also be certificate renewal issues: by default, certificates issued by Hyperledger Fabric have one year of validity. So, as I said before, there should be an alerting mechanism, for example a cron job that reads the certificates issued to the different entities, the peers, the orderers, and the users on the network.
Once expiry is nearing, based on some threshold value, we need to trigger certificate renewal immediately. Apart from this, there can be communication issues. Say a network is running on a number of servers, and those servers are distributed over different cloud regions: AWS has different regions, and one server sits in one region and another in a different one. In some scenarios there will also be disaster recovery and high availability systems. We should be in a position where, if anything fails in the primary region, the API traffic or user calls are immediately routed to the DR region, and we need to regularly test that failover.
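The certificate-expiry cron check can be sketched with stdlib datetime. The entity names and the 30-day threshold are illustrative assumptions; the real job would parse the not-after date out of each issued certificate.

```python
# Sketch of the renewal cron: flag any entity whose certificate
# expires within the renewal threshold.
from datetime import datetime, timedelta

def certs_needing_renewal(certs, now, threshold_days=30):
    """certs maps entity name -> certificate not-after datetime."""
    cutoff = now + timedelta(days=threshold_days)
    return [name for name, not_after in certs.items() if not_after <= cutoff]

now = datetime(2024, 6, 1)
certs = {
    "peer0":    datetime(2024, 6, 15),  # 14 days left -> renew now
    "orderer0": datetime(2025, 5, 30),  # nearly a year left -> fine
}
certs_needing_renewal(certs, now)  # -> ["peer0"]
```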
A network throughput test measures, in Hyperledger Fabric terms, transactions per second: how many transactions the network processes per second. When a user hits an API on the network, our API service should accept the call and pass it to the client, and the client submits it to the Hyperledger Fabric peers. The peer executes whatever chaincode function the client is calling and returns a response, and once the transaction commits we get the final response. That whole cycle is one transaction, and the number of such transactions completed in parallel per second is what we define as throughput in Hyperledger Fabric. To measure it we have tools like JMeter, where we can configure the number of users as well as the payload. Based on the number of users and the payload size, we send transactions; say we configure JMeter to send 100 transactions per second. Input throughput (the send rate) also matters: 100 users sending transactions in parallel in one second is 100 TPS in JMeter terms. We can also start with one user per second and increase it over time. That is one tool; the other is Hyperledger Caliper, which integrates directly and is offered by the Hyperledger umbrella project itself.
Caliper is used specifically for Hyperledger projects; you configure it with the target throughput and the number of users, something similar to JMeter, though personally I felt JMeter was better. Caliper likewise sends transactions based on the payload we provide, measures the throughput, and gives us the result. Based on that we decide what can be done: if the measured throughput falls well short of the requirement, then, as I said before, we use the queuing mechanism, because blockchain transactions take time to process. It is definitely not like just writing data to a database; there is hashing and consensus processing as well.
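The distinction between the send rate and the measured commit rate can be shown with a trivial calculation; the numbers below are illustrative, not results from an actual test run.

```python
# Sketch of the two rates a load test reports: input throughput
# (send rate) vs. measured throughput (commit rate).
def rate_tps(tx_count, window_seconds):
    return tx_count / window_seconds

send_rate   = rate_tps(500, 1.0)  # 500 tx submitted in 1 s -> 500 TPS in
commit_rate = rate_tps(300, 1.0)  # 300 tx committed in 1 s -> 300 TPS out
backlog     = send_rate - commit_rate  # 200 tx/s that must queue somewhere
```

When backlog is persistently positive, the off-chain queue (MongoDB or RabbitMQ) is what absorbs it, which is exactly the motivation for the queuing mechanism described earlier.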
To test a network for weaknesses, the first thing I do is check the DNS records. In Kubernetes we have a number of ingresses defined, and those ingress endpoints may be reachable publicly. Say we have defined an ingress for CouchDB: if that CouchDB endpoint is public, there is no use in having a private blockchain. So I first test whether communications are secured, whether the nodes, the servers, sit inside a private network, which we call a virtual private cloud (VPC). Every server should be part of that VPC so that no external traffic can reach it directly. For the servers themselves to access the public network, for example to install dependencies and update their repositories (tools like Docker or Kubernetes components), we need a NAT gateway: the routes of the subnets created inside the VPC are routed to the NAT gateway, so all outbound traffic goes through it. And if another machine is expected to receive traffic from this VPC, the NAT gateway's address can be whitelisted on that machine's load balancer. In this way communication between the servers stays private, and the network stays secured.
We still need separate, deliberate channels for anything that genuinely requires public access. Coming to the tools: I use dig, which queries DNS and returns the records, or even ping or curl. Other weaknesses concern credentials. Credentials, and certificates too, should be stored in a secure place: the private keys and certificates of the nodes should be kept somewhere like Vault or a cloud secrets manager, so that these credentials are securely saved in one place and not exposed.
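An exposure check along these lines can be sketched with a plain TCP connect attempt from outside the VPC; anything that connects is publicly reachable. The hostnames below are hypothetical placeholders, not real endpoints.

```python
# Sketch: probe each ingress endpoint from outside the VPC.
# A successful connect means the endpoint is publicly exposed.
import socket

def is_publicly_reachable(host, port, timeout=3.0):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # DNS failure, refusal, or timeout -> not reachable
        return False

endpoints = [("couchdb.vpc.invalid", 5984),   # CouchDB ingress (placeholder)
             ("peer0.vpc.invalid", 7051)]     # peer endpoint (placeholder)
exposed = [(h, p) for h, p in endpoints if is_publicly_reachable(h, p)]
# `exposed` should come back empty if the VPC is locked down correctly.
```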
Hyperledger Fabric is a permissioned, private blockchain platform. We have many other blockchain platforms, like Polygon, Ethereum, and Bitcoin, but those are all public chains; Hyperledger Fabric is completely private and permissioned. I say "permissioned" because it can have its own certificate authority, and identities get their certificates from the CA we bootstrap as part of the network, so the certificates are completely under our control. Since it is private, not everyone can come and transact on Hyperledger Fabric: only users holding certificates issued by its certificate authority can. The certificate authorities can differ; we can have different CAs, and even intermediate CAs as well. Beyond that, we have control over the nodes, meaning the peers and the orderers. Smart contracts also have their own importance: we can write the business logic as a smart contract. If your use case needs a private blockchain, Hyperledger Fabric is definitely a good choice. It is a project under the Hyperledger umbrella, hosted by the Linux Foundation. On the whole, it is a private, permissioned blockchain system that has its own certificate authority, its own peers to commit transactions, and orderers for the consensus mechanism, which is an essential part of blockchain. As for its role in network infrastructure, the Hyperledger Fabric documentation suggests using Kubernetes.
Coming to the network infrastructure, Hyperledger Fabric needs its own infrastructure, and the number of nodes can be designed per use case; the network is completely customizable. It can have two architectures. One is the system-channel architecture, where the orderer acts as a separate organization and the peers belong to the business organizations. The other architecture makes every organization a business organization, with each running its own orderers and peers. Each organization can define its number of peers and orderers based on the business requirement. It is used for traceability use cases and many other business use cases.
On best practices for avoiding downtime: we definitely need multiple nodes in the network. In Hyperledger Fabric, say an organization has two peers, peer0 and peer1, and peer1 goes down for some time or exhausts its volume. Transactions are still processed by peer0; that is the beauty of Hyperledger Fabric, and it causes no downtime at all. Since peer0 keeps running in parallel, we can troubleshoot what happened to peer1 and bring it back up; after that, peer1 syncs with peer0, provided they are on the same channel. Apart from that, if the infrastructure spans three different zones, as I explained in one of the previous questions, and the orderers of the same network are distributed among those zones (primary, HA, and DR), then there is no downtime at all. The orderers are always in consensus, so even if one orderer fails, the others keep running, and the same applies to the peers if they are distributed across the three zones. So with a distributed, multi-zone architecture, there is no downtime in Hyperledger Fabric.
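The reason distributing orderers avoids downtime is quorum: Fabric's Raft-based ordering service tolerates f failures out of n = 2f + 1 orderers, so as long as a majority survives, consensus continues. A one-line sketch of that bound:

```python
# Raft quorum bound: how many orderers can fail while the
# ordering service still keeps a majority and makes progress.
def max_orderer_failures(n_orderers):
    return (n_orderers - 1) // 2

max_orderer_failures(3)  # -> 1: one zone can fail, the other two keep consensus
max_orderer_failures(5)  # -> 2
```

This is why a three-zone layout with one orderer per zone survives the loss of any single zone.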
Performance metrics: both Hyperledger Fabric and Kubernetes provide metrics based on which we can read and analyze the network's performance. One example provided by Hyperledger Fabric is a metric for the number of blocks committed. As an example of critically analyzing a metric: Kubernetes gives us the number of pods alive. Since we know how many pods we deployed, if the pods-alive metric is less than that total, we can see immediately that some pod is down; that Kubernetes metric helps us understand there is a problem with the running pods. Similarly, for storage we have metrics: if a volume-usage metric crosses the threshold we have set, we can say that a volume is nearly full and we need to expand that particular persistent volume.
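The two alert rules just described reduce to simple comparisons; the counts and the 80% threshold below are illustrative assumptions, not tied to any specific Prometheus query.

```python
# Sketch of the two metric checks: pods alive vs. deployed,
# and volume usage vs. an alert threshold.
def pods_down(deployed, alive):
    return max(deployed - alive, 0)

def volume_alert(used_bytes, capacity_bytes, threshold=0.8):
    return used_bytes / capacity_bytes >= threshold

pods_down(10, 9)       # -> 1 pod needs investigation
volume_alert(85, 100)  # -> True: usage past 80%, expand the persistent volume
```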
I always look at the networking protocols in use. Take HTTP as an example: it should never be plain HTTP; we need HTTPS, where we provide an SSL certificate. It is always good to have an SSL certificate between any two communicating entities, on top of any layer. We have used ZeroSSL as the certificate issuer, and we also use gRPC. In Hyperledger Fabric, the internal communication happens over gRPC with TLS (gRPCs), where the TLS certificates we generated using the certificate authority are used for communication between the peers and orderers. The connection profile is another example: when we create a transaction from the client, the client consumes the connection profile to interact with the discovery service, and the discovery service returns each entity's endpoint as well as its certificate, which is very important. So in brief, every communication should carry a CA-issued certificate, similar to SSL. These are the two protocols I have worked with, gRPC over TLS and HTTPS, and I always recommend using certificates for the communication between components.
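The client-side posture described here, always verify the peer's certificate, can be sketched with Python's stdlib ssl module. The function name is a hypothetical helper; ca_file would point at our CA chain (the Fabric CA root, or a ZeroSSL-issued chain).

```python
# Sketch of a strict client TLS context: certificate verification
# and hostname checking are required, never disabled.
import ssl

def secure_client_context(ca_file=None):
    # ca_file: path to our CA's root certificate; None uses system roots.
    ctx = ssl.create_default_context(cafile=ca_file)
    ctx.check_hostname = True            # reject mismatched endpoints
    ctx.verify_mode = ssl.CERT_REQUIRED  # no unverified peers allowed
    return ctx

ctx = secure_client_context()
```

The same principle applies whether the transport is HTTPS for the API service or gRPC over TLS between Fabric nodes: the CA-issued certificate is always verified, never skipped.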
An approach to ensuring maximum performance: one thing is providing adequate CPU and memory resources for the pods running the Hyperledger Fabric network. I am not saying to simply throw more and more CPU and memory at it; it is better to fine-tune. When we test performance, we should also measure the CPU and memory actually used, and based on that, fine-tune the servers. Even at the pod level in Kubernetes, the resource limits should be evaluated and set properly. We should also run processes in parallel where needed; with the Kubernetes approach that comes naturally, since different pods run in parallel, each with its own CPU and memory. The CPU architecture also matters: we need to consider which architecture the CPU is built on, whether Intel, Nvidia, or something else, how many cores it has, and how many threads it can process at a time. Storage matters too: it can be SSD or hard disk, and if the server's volumes are backed by SSDs, write performance improves, so there will be a significant improvement in transaction throughput, because committing a transaction ultimately means writing data to storage.