
Data Scientist
INCIF Technologies Pvt. Ltd
Associate Consultant (DS)
Capgemini Technologies
AWS

DevOps

Tableau

ETL

Linux

HTML

CSS

Bootstrap

Git
Docker
PySpark

Airflow
Heroku
Jenkins

EC2

VPC

EBS

S3

Postman

Microsoft Azure
Could you help me understand more about your background by giving a brief introduction of yourself?
Yes, sure. Hello. First of all, thank you for giving me this opportunity to introduce myself. I'm Shumitin Ozre. Sorry, I think there was an interruption. Okay. So, let me start again from the beginning. Thank you for giving me this opportunity to introduce myself. I'm Shivniti Nozary. I'm delighted to introduce myself as a data scientist with 3 years of experience in the IT industry. I completed my BSc from Pune University in 2019, and recently, I completed a postgraduate diploma from the University of Texas at Austin. My passion for data analysis and problem solving led me to pursue a career in this ever-evolving and dynamic field. I work on diverse projects, from predictive modeling to data-driven business strategies. I excel at extracting value and insights from complex data with various tools and technologies, including Python, TensorFlow, PyTorch, PySpark, GitHub, Django, Docker, computer vision, NoSQL, SQL, deep learning, machine learning, and MLOps. I'm also well-versed in communicating technical findings to non-technical stakeholders and in supporting data-driven decisions within the organization. And, yes, I'm super excited about the opportunity to continue contributing my expertise and driving data-driven innovation in this role at Macno. I have worked on projects in various domains, like telecom, e-commerce, and payments. My contribution there was leveraging my expertise in Python and applying statistical analysis and data manipulation techniques to analyze and optimize workflows. I also collaborated with cross-functional teams to ensure alignment between data science initiatives and project objectives, and I employed ETL techniques.
Providing an example, how would you implement a sequence-to-sequence model in TensorFlow for a machine translation task?
To implement a sequence-to-sequence model in TensorFlow for machine translation, we follow these steps: data preparation, model architecture, defining the model, and then training and evaluation. For data preparation, tokenize and preprocess your source and target text. Use a TensorFlow tokenizer and pad_sequences to prepare the input sequences and target sequences. The model architecture has two sub-steps: the encoder and the decoder. The encoder uses an LSTM or GRU layer to encode the input sequence into a fixed-size context vector. We can also stack multiple layers for better performance. The decoder uses another LSTM or GRU layer with an attention mechanism to generate the output sequence. After that, we define the model in code. The last step, as I said, is training and evaluation: train the model using pairs of source and target sequences, and evaluate the performance using metrics like the BLEU score for translation quality.
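
The encoder and decoder steps in this answer can be sketched with the Keras functional API. This is a minimal sketch, not the speaker's implementation: the vocabulary sizes, layer sizes, and the toy batch of random token IDs are all assumptions made for illustration, tokenization and padding are assumed to have happened already, and the dot-product `Attention` layer is one possible choice of attention mechanism.

```python
import numpy as np
import tensorflow as tf

# Hypothetical sizes, chosen only for illustration.
SRC_VOCAB, TGT_VOCAB = 1000, 1200
EMBED_DIM, UNITS = 64, 128

# Encoder: embeds source token IDs and returns per-step outputs plus final states.
enc_inputs = tf.keras.Input(shape=(None,), dtype="int32", name="source_tokens")
enc_emb = tf.keras.layers.Embedding(SRC_VOCAB, EMBED_DIM)(enc_inputs)
enc_outputs, state_h, state_c = tf.keras.layers.LSTM(
    UNITS, return_sequences=True, return_state=True
)(enc_emb)

# Decoder: starts from the encoder's final states (the context vector).
dec_inputs = tf.keras.Input(shape=(None,), dtype="int32", name="target_tokens")
dec_emb = tf.keras.layers.Embedding(TGT_VOCAB, EMBED_DIM)(dec_inputs)
dec_outputs = tf.keras.layers.LSTM(UNITS, return_sequences=True)(
    dec_emb, initial_state=[state_h, state_c]
)

# Dot-product attention: decoder steps are the query, encoder steps the value.
context = tf.keras.layers.Attention()([dec_outputs, enc_outputs])
merged = tf.keras.layers.Concatenate()([dec_outputs, context])
probs = tf.keras.layers.Dense(TGT_VOCAB, activation="softmax")(merged)

model = tf.keras.Model([enc_inputs, dec_inputs], probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Toy batch of already-tokenized, already-padded sequences (random IDs).
src = np.random.randint(1, SRC_VOCAB, size=(2, 7))
tgt_in = np.random.randint(1, TGT_VOCAB, size=(2, 5))
pred = model.predict([src, tgt_in], verbose=0)
print(pred.shape)  # one probability distribution per target position
```

In training, the decoder input would be the target sequence shifted right by one token, and the labels the unshifted target. Padding is not masked in this sketch, which a real pipeline would need to handle.
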
So how would you benchmark the performance of a NoSQL database against SQL when dealing with large unstructured datasets using Python?
Okay, there are steps we can follow here as well. The first part is setup and configuration. Choose a NoSQL database, for example, MongoDB or Cassandra, and set it up. After that, choose an SQL database, for example, PostgreSQL or MySQL, and set it up. The second part is data preparation. Here, we generate test data and create a large unstructured dataset to use as a benchmark. This could be a collection of documents with varied fields for NoSQL and similar tables with large volumes of rows for SQL. Benchmarking tasks include measuring insertion performance, that is, the time taken to insert a large number of records or documents, and query performance, that is, executing various queries, such as simple retrievals and complex aggregations, and measuring response times for both databases. We can also test update performance and deletion performance. With Python, we can write code to analyze the results and compare performance metrics such as insertion time and query response time to determine which database performs better under the given conditions. We should consider factors like scalability, ease of use, and specific use case requirements in addition to raw performance metrics. While doing this, we should ensure the environment, hardware, and network are consistent when running benchmarks. After that, test with a variety of operations and dataset sizes to get a comprehensive view of performance.
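
The timing part of this answer can be sketched with the standard library alone. This is only a sketch of the harness: SQLite (in memory) stands in for the SQL side, and a plain Python dict stands in for a document store, so the numbers it produces say nothing about MongoDB, Cassandra, PostgreSQL, or MySQL. The record layout, table name, and helper names are assumptions made for illustration; in a real benchmark the `doc_insert` and `doc_query` callables would be replaced by calls through the actual database driver.

```python
import json
import sqlite3
import time


def timed(fn):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Hypothetical semi-structured records with a varying "tags" field.
records = [
    {"id": i, "username": f"user{i}", "tags": ["a", "b"] if i % 2 else [], "score": i % 100}
    for i in range(10_000)
]

# SQL side: SQLite in memory, with the full record kept as a JSON text column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE docs (id INTEGER PRIMARY KEY, username TEXT, score INTEGER, body TEXT)"
)


def sql_insert():
    conn.executemany(
        "INSERT INTO docs VALUES (?, ?, ?, ?)",
        [(r["id"], r["username"], r["score"], json.dumps(r)) for r in records],
    )
    conn.commit()


def sql_query():
    return conn.execute("SELECT COUNT(*) FROM docs WHERE score > 90").fetchone()[0]


# Document side: a dict used purely as a placeholder for a NoSQL client.
doc_store = {}


def doc_insert():
    for r in records:
        doc_store[r["id"]] = r


def doc_query():
    return sum(1 for d in doc_store.values() if d["score"] > 90)


results = {}
for name, insert, query in [("sql", sql_insert, sql_query), ("document", doc_insert, doc_query)]:
    _, insert_s = timed(insert)
    matches, query_s = timed(query)
    results[name] = {"insert_s": insert_s, "query_s": query_s, "matches": matches}

for name, row in results.items():
    print(name, row)
```

Checking that both sides return the same `matches` count is a cheap way to confirm the two queries are equivalent before comparing their timings. Repeating each measurement several times and reporting a median would make the comparison less sensitive to noise.
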
So what factors would you consider when choosing between convolutional neural networks and recurrent neural networks in a computer vision task?
Wait. I'm unable to recall it.
What factors would you consider when choosing between convolutional neural networks and recurrent neural networks in a computer vision task?
Something here. What is going on?
Okay. Which Python classes or frameworks will assist you in developing an anomaly detection system with PyTorch, and what will be your validation strategy?
Okay. For the strategy, we can follow various steps. The first step is importing the necessary libraries. After that come generating synthetic data, creating sequences, defining the autoencoder model, and converting the sequences to PyTorch tensors. Yeah, these are the steps we can follow.
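
The steps listed in this answer can be sketched as follows. This is a minimal sketch under stated assumptions: the sine-wave signal, the injected spike, the window length, the small dense autoencoder, and the 99th-percentile threshold are all choices made for illustration. The answer does not describe a validation strategy, so the held-out threshold step here is one possible approach, not the speaker's.

```python
import numpy as np
import torch
from torch import nn

torch.manual_seed(0)
rng = np.random.default_rng(0)

# Synthetic data: a noisy sine wave with one injected spike (hypothetical).
signal = np.sin(np.linspace(0, 20 * np.pi, 1000)) + 0.05 * rng.standard_normal(1000)
signal[700] += 5.0

# Create overlapping fixed-length sequences and convert them to a tensor.
WINDOW = 20
windows = np.stack([signal[i:i + WINDOW] for i in range(len(signal) - WINDOW + 1)])
x = torch.tensor(windows, dtype=torch.float32)


class AutoEncoder(nn.Module):
    """Small dense autoencoder: compress each window, then reconstruct it."""

    def __init__(self, n_in, n_hidden=4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, n_hidden), nn.ReLU())
        self.decoder = nn.Linear(n_hidden, n_in)

    def forward(self, inputs):
        return self.decoder(self.encoder(inputs))


model = AutoEncoder(WINDOW)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

# Train on the first 400 windows, which are assumed to contain no anomalies.
train_x, val_x = x[:400], x[400:500]
for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(train_x), train_x)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    # Per-window reconstruction error; high error suggests an unusual window.
    errors = ((model(x) - x) ** 2).mean(dim=1)
    val_errors = ((model(val_x) - val_x) ** 2).mean(dim=1)
    # Threshold taken from held-out normal windows (assumed validation step).
    threshold = val_errors.quantile(0.99).item()

flagged = torch.nonzero(errors > threshold).flatten().tolist()
print(f"threshold={threshold:.4f}, flagged windows={len(flagged)}")
```

With labeled anomalies available, the threshold could instead be tuned against precision and recall on a validation split.
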
Which Python tools would you use for text tokenization and sentiment analysis in an NLP pipeline, and why would you choose them?
Okay. So in that case, we use TextBlob. TextBlob is a must for developers who are starting NLP in Python and want to make the most of their first encounter with NLTK. It provides beginners with an easy interface to help them learn the most basic NLP tasks, like sentiment analysis, POS tagging, or noun phrase extraction. Yeah.
Given the following Python code snippet, what is the issue that will prevent it from correctly creating a machine learning model pipeline?
I'll go through the issues in the original code. First, there is a typo around the import. The statement from sklearn.svm import SVC is correct, but in the pipeline definition, svc should be replaced with SVC. So the correct class name is SVC in uppercase, not lowercase. Second, there is a syntax error in the pipeline steps. In the original code, the pipeline steps are incorrectly formatted: the steps are not separated by proper commas, and the brackets are used incorrectly. It should be a list of tuples, with each tuple containing the name of the step and the corresponding estimator or transformer. And, assuming X_train and y_train are defined: while this is not a syntax error, ensure that X_train and y_train are properly defined and contain the data you intend to use for fitting the model. So yeah, there are several issues.
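
The snippet under discussion is not reproduced in the transcript, so the following is only a sketch of what a correctly formed pipeline looks like once the points from this answer are applied: `SVC` in uppercase, and steps given as a list of (name, estimator) tuples separated by commas. The `StandardScaler` step, the step names, and the synthetic training data are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC  # class name is uppercase

# Hypothetical training data standing in for X_train and y_train.
X_train, y_train = make_classification(n_samples=100, n_features=5, random_state=0)

# Steps are a list of (name, estimator) tuples, separated by commas.
pipeline = Pipeline([
    ("scaler", StandardScaler()),  # assumed preprocessing step
    ("svc", SVC()),                # the step name may be lowercase; the class may not
])

pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_train)
print(predictions.shape)
```
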
First, the import statement has to be corrected. import sqlite3 should be on a separate line, and the corrected second line is from flask import Flask, jsonify. The second issue is the Flask app initialization. It should be app = Flask(__name__); the original uses the '==' operator where an assignment to the variable app is needed. The __init__ formatting is also wrong: instead of __init__ = lambda: [app = Flask(__name__)], use a normal assignment statement and make sure any list is properly formatted. The third issue is data fetching and return. data = db.fetch_all() should be data = db.fetch(), with the result assigned to data. Also, jsonify(data) works only as long as data is in a format that can be directly serialized to JSON. A SQLite fetch returns a list of tuples, which needs conversion to a JSON-serializable format, for example, a list of dictionaries if required. So yeah. We can also add some additional considerations. One is database column names: to convert the query results into dictionaries keyed by column name, you might need to know the column names or retrieve them from the cursor description. The other is error handling: for production, we should consider adding error handling to manage exceptions during database operations.
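
The corrections described in this answer can be sketched as a small Flask app. The original snippet is not shown in the transcript, so the table, route, and helper names here are hypothetical, and the in-memory database exists only to make the example self-contained. The sketch uses `fetchall()`, which is the method name on a standard `sqlite3` cursor; the `db` object in the original code may expose a different call.

```python
import sqlite3

from flask import Flask, jsonify

app = Flask(__name__)  # plain assignment with a single '='

# Hypothetical in-memory database, seeded so the example is self-contained.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (name) VALUES (?)", [("alpha",), ("beta",)])
conn.commit()


def rows_to_dicts(cursor):
    """Convert fetched tuples into dicts keyed by column name."""
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


@app.route("/items")
def list_items():
    try:
        cursor = conn.execute("SELECT id, name FROM items ORDER BY id")
        return jsonify(rows_to_dicts(cursor))
    except sqlite3.Error as exc:
        # Basic error handling for failed database operations.
        return jsonify({"error": str(exc)}), 500


if __name__ == "__main__":
    app.run()
```

Setting `conn.row_factory = sqlite3.Row` is an alternative way to get column names without reading `cursor.description` by hand.
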
Can you devise a Python workflow that applies both deep learning and NLP techniques to extract insights from visual and textual data simultaneously?
Yes, we can use fastai, a Python-based open source machine learning framework that offers a high-level abstraction for deep learning model training. So, yes, we can devise a Python workflow that applies both deep learning and NLP techniques. There are various libraries available. Besides PyTorch, TensorFlow is a popular choice, and Keras is also widely used. OpenCV is also a useful library for computer vision tasks.
Can you illustrate how version control with Jira would aid in collaboration for a remote data science team deploying a TensorFlow model?
Actually, Git is a version control system that tracks file changes. GitHub is a platform that allows developers to collaborate and store their code in the cloud. So think of it this way: Git is responsible for everything GitHub-related that happens locally on your computer. That's the main reason we can illustrate version control with the help of GitHub. We can also use another version control service as an alternative to GitHub. This lets developers on different local machines integrate their work, so they can work together. We get to know the status of task completion, pending tasks, and requirements, as well as any changes that have happened. Git, GitLab, and other platforms allow us to track these records.
One method is deleting the columns with missing data. In this case, let's delete that column, then feed the model and check the accuracy. After that, another method is imputation, that is, filling in the missing values; K-NN is one option. Deleting the rows with missing data is another. Suppose a column has more than half of its data missing or null; then we can just drop the whole column from the dataset. So, on that basis, we deal with the missing values and the corrupted data. Also, we can replace missing values with the mean or median, based on the type of data we get.
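
The three approaches mentioned in this answer (dropping mostly-empty columns, mean or median replacement, and K-NN imputation) can be sketched with pandas and scikit-learn. The toy DataFrame, its column names, and `n_neighbors=2` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical data with missing values and one mostly-empty column.
df = pd.DataFrame({
    "age": [25, np.nan, 31, 40, np.nan, 29],
    "income": [50_000, 62_000, np.nan, 58_000, 61_000, 1_000_000],
    "mostly_empty": [np.nan, np.nan, np.nan, 1.0, np.nan, np.nan],
})

# 1) Drop columns where more than half of the values are missing.
keep = df.columns[df.isna().mean() <= 0.5]
reduced = df[keep]

# 2) Simple imputation: mean for "age", median for the skewed "income".
simple = reduced.copy()
simple["age"] = simple["age"].fillna(simple["age"].mean())
simple["income"] = simple["income"].fillna(simple["income"].median())

# 3) K-NN imputation: fill each gap from the most similar rows.
knn = pd.DataFrame(
    KNNImputer(n_neighbors=2).fit_transform(reduced),
    columns=reduced.columns,
)

print(simple)
print(knn)
```

The median is used for `income` because the single very large value would pull the mean upward. Whichever method is chosen, comparing model accuracy before and after, as the answer suggests, is the practical check.
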