
Database Developer

Infosys - Database Developer
SunTrust Mortgage - Database Developer
TELUS MVNE

Skills: Microsoft Visual Studio, Business Intelligence, Python, C#, Jira
Hi, so I have around 5-6 years of experience in SQL, and this was my third company. My first job was with Infosys as an ETL developer, where I worked in the ETL sector on a financial project for SunTrust; I started as an ETL developer and was promoted to senior ETL developer. After that, I moved to Accenture as an Application Development Analyst and worked there for about 8 months; however, I had personal constraints and couldn't move out of the location, so I had to stay back in Bangalore and leave the company. I then moved to my last company, Capgemini, where I worked on Abbott, a healthcare solutions project, mostly as an ETL developer working on SQL and the ETL processes around it. Across these roles I have worked with the financial, healthcare, and telecom industries. That's my background, thank you.
I have done data validations with respect to ETL, that is, after doing an upgrade or introducing a change in the project. For the validation itself, we used to prepare validation scripts pertaining to the changes we made, covering data stability and data correctness. This involved the SQL jobs and everything else that ran during the process wherever the change was involved, whether that was monitoring SQL jobs or daily-running packages or scripts. We would then validate the change based on its data flow from end to end, and also the time taken along the way.
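A minimal sketch of the kind of validation script this describes, assuming hypothetical source and target tables (src.Orders and tgt.Orders are placeholder names):

    -- Row-count reconciliation between source and target after the change.
    SELECT
        (SELECT COUNT(*) FROM src.Orders) AS source_rows,
        (SELECT COUNT(*) FROM tgt.Orders) AS target_rows;

    -- Spot rows present in the source but missing from the target.
    SELECT s.OrderID
    FROM src.Orders AS s
    LEFT JOIN tgt.Orders AS t
        ON t.OrderID = s.OrderID
    WHERE t.OrderID IS NULL;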
When it comes to handling data conflicts and vendor conflicts, we would initially identify the kind of data we are dealing with and the end result we want to achieve. When combining data from multiple sources, we would assess the data types, the storage locations, and the potential conflicts between them. When there is a change in a source, we would evaluate the data types and storage locations to determine the best approach: add the data to the other source, remove data from the other source, or derive new columns through processing. We would take a data-driven approach to resolving vendor conflicts, focusing on the specific requirements and the outcome we want to achieve.
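As a sketch of one such resolution rule, assuming two hypothetical vendor feeds (vendorA.Customers and vendorB.Customers) where vendor A takes precedence whenever both supply a value:

    -- Combine the two feeds; COALESCE encodes the precedence rule per column.
    SELECT
        COALESCE(a.CustomerID, b.CustomerID) AS CustomerID,
        COALESCE(a.Email, b.Email)           AS Email,  -- prefer vendor A
        COALESCE(a.Phone, b.Phone)           AS Phone
    FROM vendorA.Customers AS a
    FULL OUTER JOIN vendorB.Customers AS b
        ON b.CustomerID = a.CustomerID;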
When a system requires both OLTP and OLAP operations on the same data set, it depends on the type of system we are going to define. If the data is accumulated separately for OLTP and OLAP, or if we are going to use the raw data present in a data lake, then we determine whether we need to perform transformations or analysis on it. Based on the type of data we are getting and the requirement, we would treat the data as an OLTP source or an OLAP result, and then move forward with whatever is required on it.
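A rough illustration of the split, using a hypothetical Sales table: the first statement is the narrow, single-row write typical of OLTP, and the second is the wide scan-and-aggregate typical of OLAP, often run against a separate copy of the data:

    -- OLTP: a narrow transactional write touching one row.
    UPDATE dbo.Sales
    SET Quantity = Quantity + 1
    WHERE SaleID = 42;

    -- OLAP: an aggregate over the whole table for analysis.
    SELECT ProductID,
           YEAR(SaleDate)            AS SaleYear,
           SUM(Quantity * UnitPrice) AS Revenue
    FROM dbo.Sales
    GROUP BY ProductID, YEAR(SaleDate);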
So most of the time, we aim for schema stability. We would avoid making any major schema changes where possible; if we do proceed with a schema change, we try to maintain stability by propagating the schema to all the database references of that schema, and then we proceed with the changes as required by the business needs.
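A sketch of what a backward-compatible change can look like, with hypothetical names: the new column is added as nullable so existing inserts keep working, and downstream code reads through a view so the base table can evolve behind it:

    -- Additive, nullable column: existing INSERTs and SELECTs keep working.
    ALTER TABLE dbo.Customers
        ADD LoyaltyTier VARCHAR(20) NULL;

    -- Downstream references go through the view, insulating them from
    -- further changes to the base table.
    CREATE OR ALTER VIEW dbo.vCustomers
    AS
    SELECT CustomerID, Name, Email, LoyaltyTier
    FROM dbo.Customers;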
The most efficient indexes for query performance depend on how the table is used and, most importantly, on the type of data we are storing. Indexes are mostly used to reduce search time, that is, to avoid full scans of the table's data; a good index cuts down the time otherwise spent fetching data or doing operations on it. So the primary thing is to use viable indexes that actually improve the performance of the tables, the database, or the process in question.
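For instance, assuming a hypothetical Orders table that is mostly searched by customer, a nonclustered index on the search column lets the query seek instead of scanning the whole table:

    -- Index on the common search column; INCLUDE covers the selected columns
    -- so the query never has to touch the base table.
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
        ON dbo.Orders (CustomerID)
        INCLUDE (OrderDate, TotalAmount);

    -- Satisfied by an index seek rather than a full table scan.
    SELECT OrderDate, TotalAmount
    FROM dbo.Orders
    WHERE CustomerID = 1001;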
In this example, there is a potential optimization issue when dealing with large volumes of data; I will explain what it is and suggest how to optimize it. The main issue in this scenario is searching for a certain order ID: if it has multiple unit prices or multiple entries in the same table, the query has to multiply the unit price by the quantity for every row and then sum up the total money. With large data sets, the performance issue comes down to that per-row multiplication and summation. To improve this, I would prefer to add a GROUP BY clause so the summation over the unit price and quantity calculation is done once per order, which would improve performance.
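A sketch of the aggregated form, with a hypothetical OrderDetails table standing in for the one discussed:

    -- Aggregate the line totals once per order instead of row by row.
    SELECT OrderID,
           SUM(UnitPrice * Quantity) AS OrderTotal
    FROM dbo.OrderDetails
    GROUP BY OrderID;

    -- With an index on OrderID, restricting to a single order stays a cheap seek.
    SELECT OrderID,
           SUM(UnitPrice * Quantity) AS OrderTotal
    FROM dbo.OrderDetails
    WHERE OrderID = 42
    GROUP BY OrderID;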
To identify the issue that keeps the code from executing successfully: when an alias such as price_category is defined in the SELECT list, we cannot then use it in the WHERE clause. The alias is only just being defined at that point, so filtering on it with a value like 'Expensive' would make the query directly throw an error saying that the alias is not defined in that scope. The same applies to the 'Affordable' case, where the list price is not greater than 1000; referencing the alias there throws the error as well, and that is what causes the issue.
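A reconstruction of the pattern being described, with a hypothetical Products table: the first query fails because the WHERE clause is evaluated before the SELECT list assigns the alias; moving the expression into a CTE fixes it:

    -- Fails: price_category is assigned in the SELECT list, which runs after
    -- the WHERE clause, so the alias is not in scope here.
    SELECT ProductID,
           CASE WHEN ListPrice > 1000 THEN 'Expensive'
                ELSE 'Affordable' END AS price_category
    FROM dbo.Products
    WHERE price_category = 'Expensive';  -- error: invalid column name

    -- Works: define the alias in a CTE, then filter on it in the outer query.
    WITH categorized AS (
        SELECT ProductID,
               CASE WHEN ListPrice > 1000 THEN 'Expensive'
                    ELSE 'Affordable' END AS price_category
        FROM dbo.Products
    )
    SELECT ProductID, price_category
    FROM categorized
    WHERE price_category = 'Expensive';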
So for the applications where recovery is required, we would prefer to have a snapshot of the previous installation or of a known working state, so that we are able to do a database recovery. Alternatively, we can maintain a separate disaster recovery database for all the databases or systems, from which we would be able to recover the data. Archival is one more kind of recovery procedure or tool, where vast amounts of data are stored in a ready archive database; the reference stays there, and we can take whatever data is required back from the archival database. If the runtime on the main database is too high, we would go for archival; if there is a chance of crashing, we would go for disaster recovery.
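In SQL Server terms, a sketch of the snapshot approach described, with hypothetical database and file names (the logical name must match the source database's data file):

    -- Take a point-in-time snapshot before making the change.
    CREATE DATABASE SalesDB_PreUpgrade
    ON (NAME = SalesDB_Data,
        FILENAME = 'D:\Snapshots\SalesDB_PreUpgrade.ss')
    AS SNAPSHOT OF SalesDB;

    -- If the change goes wrong, revert the database to the snapshot.
    RESTORE DATABASE SalesDB
    FROM DATABASE_SNAPSHOT = 'SalesDB_PreUpgrade';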
The basic role of SQL Server Analysis Services (SSAS) in database solutions is to accumulate the data and then analyze it against business requirements wherever that is needed. The analysis majorly depends on the usage of the data: the type of data coming in from various sources and the type of data going out to various downstreams. Most of the time, the data varies from source to destination, or from destination to lower environments where it is needed for screening, reporting, or other uses. So analysis plays a major role in determining what has to be done and how it can be achieved in a simpler way, and it also helps in creating a predefined structure to analyze the data and process it efficiently.
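SSAS itself is queried through MDX or DAX rather than T-SQL, but as a relational sketch of the same "predefined analysis structure" idea, an indexed view (hypothetical names) materializes an aggregate so reporting queries read precomputed results:

    -- Schema-bound view over a hypothetical fact table.
    CREATE VIEW dbo.vSalesByProduct
    WITH SCHEMABINDING
    AS
    SELECT ProductID,
           SUM(Quantity * UnitPrice) AS Revenue,
           COUNT_BIG(*)              AS RowCnt  -- required for indexed aggregate views
    FROM dbo.FactSales
    GROUP BY ProductID;
    GO

    -- The unique clustered index materializes the aggregate on disk.
    CREATE UNIQUE CLUSTERED INDEX IX_vSalesByProduct
        ON dbo.vSalesByProduct (ProductID);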
We've been using a relational database management system to store our data. The schema is designed to support our current application, but we're considering a migration to a NoSQL database. The main advantage of a NoSQL database is its ability to handle large amounts of unstructured data, which is becoming increasingly important for our application. In terms of SQL code, we're using stored procedures to encapsulate complex logic and improve performance. However, we're also looking at using more modern SQL features, such as Common Table Expressions (CTEs) and window functions, to simplify our queries and improve readability. One area where we're struggling is with data normalization. While it's essential for maintaining data integrity, it can sometimes make it difficult to query and join related data. We're exploring ways to balance normalization with query performance, such as using denormalization techniques or indexing frequently used columns. Overall, our database design is serving us well, but we're always looking for ways to improve and adapt to changing requirements.
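As a sketch of those modern features, using a hypothetical Orders table: the CTE names an intermediate result, and the window function ranks rows within each customer without collapsing them the way GROUP BY would:

    -- CTE names the intermediate result; the window function ranks each
    -- customer's orders by amount without collapsing the rows.
    WITH recent_orders AS (
        SELECT CustomerID, OrderID, OrderDate, TotalAmount
        FROM dbo.Orders
        WHERE OrderDate >= '2024-01-01'
    )
    SELECT CustomerID,
           OrderID,
           TotalAmount,
           ROW_NUMBER() OVER (PARTITION BY CustomerID
                              ORDER BY TotalAmount DESC) AS amount_rank
    FROM recent_orders;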