Vetted Talent

Shivam Nitin Vazare

A challenging career as a Data Scientist / Web Developer where my Python, Machine Learning, Data Intelligence, and Django REST skills can be effectively used and upgraded. Data Scientist with a strong Statistics and Mathematics background and 3 years of overall experience using Predictive Modeling, Data Processing, and Data Mining algorithms to solve challenging business problems. Involved in the Python open-source community and passionate about Deep Reinforcement Learning. Looking for a challenging career in the IT software industry, especially in roles such as Django REST developer, Data Scientist, or ML/AI and Python programmer, where my strong SQL and UNIX knowledge and my experience in programming concepts and software development methodologies are put to use and my all-round development is encouraged.
  • Role

    Data Scientist

  • Years of Experience

    3 years


Skillsets

  • XML
  • Python - 3.1 Years
  • SciPy
  • Seaborn
  • SOAP
  • SQL
  • Tableau
  • TCP/IP
  • TensorFlow - 3.1 Years
  • Unix
  • WebSphere
  • PySpark
  • KML
  • Beautiful Soup
  • MapReduce
  • WebLogic
  • Computer Vision - 3.1 Years
  • NoSQL - 3.1 Years
  • Deep Learning - 3.1 Years
  • PyTorch - 3.1 Years
  • NLP - 3.1 Years
  • HTML
  • AWS Cloud Computing
  • Bootstrap
  • CSS
  • DHTML
  • Django REST
  • ETL
  • Hadoop
  • HDFS
  • Hive
  • Apache Tomcat
  • HTTP/HTTPS
  • JavaScript
  • JBoss
  • jQuery
  • JSON
  • Matplotlib
  • MongoDB
  • NumPy
  • pandas

Vetted For

12 Skills
  • Role: Data Scientist (Remote), AI Screening
  • Result: 57%
  • Skills assessed: Communication Skills, Jira, Retrieval-Augmented Generation, Computer Vision, Deep Learning, PyTorch, TensorFlow, GitLab, Machine Learning, NLP, NoSQL, Python
  • Score: 51/90

Professional Summary

3 Years
  • Jun, 2023 - Present (3 yr)

    Data Scientist

    INCIF Technologies Pvt. Ltd
  • Jul, 2021 - May, 2023 (1 yr 10 months)

    Associate Consultant (DS)

    Capgemini Technologies

Applications & Tools Known

  • AWS
  • DevOps
  • Tableau
  • ETL
  • Linux
  • HTML
  • CSS
  • Bootstrap
  • Git
  • Docker
  • PySpark
  • Airflow
  • Heroku
  • Jenkins
  • EC2
  • VPC
  • EBS
  • S3
  • Postman
  • Microsoft Azure

Work History

3 Years

Data Scientist

INCIF Technologies Pvt. Ltd
Jun, 2023 - Present (3 yr)
    Highly efficient Data Scientist/Data Analyst with 3+ years of experience in Data Analysis, Machine Learning, and Data Mining with large sets of structured and unstructured data, covering Data Acquisition, Data Validation, Data Cleaning, Data Engineering, Feature Scaling, Feature Engineering, Statistical Modeling, Dimensionality Reduction, Testing and Validation, and Data Visualization.

Associate Consultant (DS)

Capgemini Technologies
Jul, 2021 - May, 2023 (1 yr 10 months)
    Experience in the design, development, testing, and implementation of various stand-alone and client-server enterprise applications in Python across different domains. Managed the entire data science project life cycle, including Data Acquisition, Data Cleaning, Data Engineering, Feature Scaling, Feature Engineering, Statistical Modeling, Dimensionality Reduction, Testing and Validation, and Data Visualization.

Achievements

  • Received Star Performer Award for good performance.
  • Received appreciation for E2E delivery from client Mercado Libre, Mexico.

Major Projects

2 Projects

Omni-Channel Merchandising

    This E-Commerce analytics solution is a key component of the digital transformation of businesses. It makes it possible to track the customer journey across Omni-channel touch points and build a comprehensive view of what drives revenue. This insight informs better business decision-making.

Cloud Networking for Industrial Ethernet Switches - [CNofIES]

Jun, 2023 - Present (3 yr)
    Cisco's Catalyst IE3400 Rugged Series switches combine full Gigabit Ethernet switch solutions with advanced features in a modular, future-proof design. Expandable up to 26 ports in a compact form factor, these rugged switches are optimized for size and power, and bring Cisco intent-based networking to Industrial Ethernet applications. Provides secure access for new high-speed applications in the industrial space.

Education

  • PG in Artificial Intelligence And Machine Learning

    Pravara College Of Engineering, Ahmednagar (2023)
  • Bachelor of Engineering (B.E.)

    Pravara College Of Engineering, Ahmednagar (2019)

Certifications

  • Google Cloud Data and Machine Learning Fundamentals

  • Domain Foundation by Automation Academy

  • Power BI by Automation Academy, Foundation Level Certification

  • Python Programming by Great Learning, 2023

  • Introduction to Cyber-Security

  • AWS Cloud Practitioner Essentials

  • ISTQB Foundation Level 1 Certified

Interests

  • Driving
  • Bike Rides
  • Technology Research
  • YouTube Learning
  • Travelling

AI-interview Questions & Answers

    Could you help me understand more about your background by giving a brief introduction of yourself?

    Yes, sure. First of all, thank you for giving me this opportunity to introduce myself. I'm Shivam Nitin Vazare, a data scientist with 3 years of experience in the IT industry. I completed my BSc from Pune University in 2019, and recently I completed a postgraduate diploma from the University of Texas at Austin. My passion for data analysis and problem solving led me to pursue a career in this ever-evolving and dynamic field. I work on diverse projects, from predictive modeling to data-driven business strategies. I excel at extracting value and insights from complex data with various tools and technologies, including Python, TensorFlow, PyTorch, PySpark, GitHub, Django, Docker, computer vision, NoSQL, SQL, deep learning, machine learning, and MLOps. I'm also well-versed in communicating technical findings to non-technical stakeholders and in supporting data-driven decisions within the organization. And, yes, I'm super excited about the opportunity to continue contributing my expertise and driving data-driven innovation in a role at Macno. I have worked in various project domains, like telecom, e-commerce, and payments. My contribution there has been leveraging my expertise in Python and applying statistical analysis and data manipulation techniques to analyze and optimize workflows. I also collaborated with cross-functional teams to ensure alignment between data science initiatives and project objectives, and I employed ETL techniques.

    Providing an example, how would you implement a sequence-to-sequence model in TensorFlow for a machine translation task?

    To implement a sequence-to-sequence model in TensorFlow for machine translation, we follow these steps: data preparation, model architecture, defining the model, and then training and evaluation. For data preparation, tokenize and preprocess your source and target text, using a TensorFlow tokenizer and padded sequences to prepare the input and target sequences. The model architecture has two sub-steps: the encoder and the decoder. The encoder uses an LSTM or GRU layer to encode the input sequence into a fixed-size context vector; we can also stack multiple layers for better performance. The decoder uses another LSTM or GRU layer with an attention mechanism to generate the output sequence. After that, we define the model in code. The last step, as I said, is training and evaluation: train the model using pairs of source and target sequences, and evaluate performance using metrics like the BLEU score for translation quality.
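
    The data-preparation step described above can be sketched without any framework. What follows is a minimal plain-Python illustration of tokenizing, indexing, and padding sentence pairs. The helper names (build_vocab, encode, pad), the special tokens, and the sample sentences are all hypothetical, and in a real TensorFlow project the library's own text utilities would normally take their place.

```python
# Minimal sketch: tokenize, index, and pad sentence pairs for a seq2seq model.
# All names and special tokens here are illustrative assumptions.

PAD, START, END, UNK = "<pad>", "<start>", "<end>", "<unk>"


def build_vocab(sentences):
    """Map each whitespace-separated token to an integer id (0 is padding)."""
    vocab = {PAD: 0, START: 1, END: 2, UNK: 3}
    for sentence in sentences:
        for token in sentence.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(sentence, vocab, add_markers=False):
    """Convert a sentence to a list of ids, optionally wrapped in start/end."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in sentence.lower().split()]
    if add_markers:
        ids = [vocab[START]] + ids + [vocab[END]]
    return ids


def pad(sequences, pad_id=0):
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]


source = ["I am happy", "Thank you very much"]
target = ["Je suis content", "Merci beaucoup"]

src_vocab = build_vocab(source)
tgt_vocab = build_vocab(target)

# The encoder sees plain source ids; the decoder target carries start/end markers.
encoder_input = pad([encode(s, src_vocab) for s in source])
decoder_target = pad([encode(t, tgt_vocab, add_markers=True) for t in target])
```

    The padded integer sequences are the kind of input an LSTM or GRU encoder-decoder would consume; the start and end markers on the target side let the decoder know where generation begins and stops.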

    How would you benchmark the performance of a NoSQL database against SQL when dealing with large unstructured datasets using Python?

    There are steps we can follow. First, set up and configure the databases: choose a NoSQL database, for example MongoDB or Cassandra, and set it up; then choose an SQL database, for example PostgreSQL or MySQL, and set it up. The second part is data preparation. Here we generate data and create a large unstructured dataset to use as a benchmark. This could be a collection of documents with varied fields for NoSQL, and similar tables with large volumes of rows for SQL. Benchmarking tasks include measuring insertion performance (the time taken to insert a large number of records or documents) and query performance (executing various queries, such as simple retrievals and complex aggregations, and measuring response times for both databases). We can also test update and deletion performance. With Python, we can then analyze the results and prepare performance metrics, such as insertion time and query response time, to determine which database performs better under the given conditions. We should consider factors like scalability, ease of use, and specific use case requirements in addition to raw performance metrics. We should also ensure the environment, hardware, and network are consistent when running benchmarks, and test with a variety of operations and test sizes to get a comprehensive view of performance.
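
    The shape of such a benchmark harness can be sketched with the standard library alone. This is an illustration under loud assumptions: in-memory SQLite stands in for the SQL side, and a plain Python list of dicts is a hypothetical stand-in for a document collection, so the timings say nothing about a real MongoDB or Cassandra deployment. Only the structure (same data, same operations, timed the same way) carries over.

```python
import sqlite3
import time


def timed(fn, *args):
    """Run fn(*args) once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def make_records(n):
    """Generate synthetic records; a real benchmark would vary the fields."""
    return [{"id": i, "category": f"cat{i % 10}", "value": i * 0.5} for i in range(n)]


# SQL side: in-memory SQLite.
def sql_insert(conn, records):
    conn.execute("CREATE TABLE items (id INTEGER, category TEXT, value REAL)")
    conn.executemany("INSERT INTO items VALUES (:id, :category, :value)", records)
    conn.commit()


def sql_query(conn):
    return conn.execute(
        "SELECT category, COUNT(*) FROM items GROUP BY category"
    ).fetchall()


# Document side: hypothetical stand-in for a NoSQL collection.
def doc_insert(store, records):
    store.extend(dict(r) for r in records)


def doc_query(store):
    counts = {}
    for doc in store:
        counts[doc["category"]] = counts.get(doc["category"], 0) + 1
    return sorted(counts.items())


records = make_records(10_000)
conn = sqlite3.connect(":memory:")
store = []

results = {}
_, results["sql_insert"] = timed(sql_insert, conn, records)
sql_rows, results["sql_query"] = timed(sql_query, conn)
_, results["doc_insert"] = timed(doc_insert, store, records)
doc_rows, results["doc_query"] = timed(doc_query, store)

for name, seconds in results.items():
    print(f"{name}: {seconds:.4f} s")
```

    In practice each operation would be repeated several times and across several data sizes, and the two stand-ins would be replaced by client calls to the actual databases running on the same hardware.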

    What factors would you consider when choosing between convolutional neural networks and recurrent neural networks in a computer vision task?

    Wait. I'm unable to recall it.

    Which Python classes or frameworks will assist you in developing an anomaly detection system with PyTorch, and what will be your validation strategy?

    Okay. We can follow various steps for that. The first step is importing the necessary libraries. After that come generating synthetic data, creating sequences, defining the autoencoder model, and converting the sequences to PyTorch tensors. These are the steps we can follow.
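
    Two of the steps named above, generating synthetic data and creating sequences, can be sketched in plain Python. The autoencoder itself is deliberately left out, and the signal shape, window length, and time-ordered train/validation split are all assumptions made for illustration; the answer does not specify a validation setup.

```python
import math
import random


def make_signal(n, anomaly_at, seed=0):
    """Synthetic sine wave with small noise and one injected spike."""
    rng = random.Random(seed)
    signal = [math.sin(0.1 * t) + rng.gauss(0, 0.05) for t in range(n)]
    signal[anomaly_at] += 5.0  # injected anomaly
    return signal


def make_windows(signal, window):
    """Slice the series into overlapping fixed-length sequences."""
    return [signal[i:i + window] for i in range(len(signal) - window + 1)]


signal = make_signal(n=200, anomaly_at=120)
windows = make_windows(signal, window=20)

# Time-ordered split: earlier windows for training, later ones held out.
# Here the training half contains only normal data, which is what an
# autoencoder-based detector is usually fitted on.
split = int(0.5 * len(windows))
train_windows, val_windows = windows[:split], windows[split:]
```

    Each window would then be converted to a tensor and fed to the autoencoder; windows whose reconstruction error exceeds a threshold chosen on held-out normal data would be flagged as anomalous.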

    Which Python tools would you use for text tokenization and sentiment analysis in an NLP pipeline, and why would you choose them?

    In that case, we use TextBlob. TextBlob is a must for developers who are starting NLP in Python and want to make the most of their first encounter with NLTK. It provides beginners with an easy interface to help them learn the most basic NLP tasks, like sentiment analysis, POS tagging, or noun phrase extraction.

    Given the following Python code snippet, what is the issue that will prevent it from correctly creating a machine learning model pipeline?

    I'll go through the issues in the original code. First, there is a typo around the import. from sklearn.svm import SVC is correct, but in the pipeline definition, svc should be replaced with SVC in capital letters. So the correct class name is SVC in uppercase, not lowercase. Second, there is a syntax error in the pipeline steps. In the original code, the pipeline steps are incorrectly formatted: they are not separated by proper commas, and the bracket usage is incorrect. The steps should be a list of tuples, with each tuple containing the name of the step and the corresponding estimator or transformer. Finally, while it is not a syntax error, ensure that X_train and y_train are properly defined and contain the data you intend to use for fitting the model. So there are several issues.
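
    The original snippet is not reproduced in the transcript, so the following is only a sketch of what a pipeline with the described fixes might look like. The StandardScaler step, the step names, and the synthetic data standing in for X_train and y_train are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC  # class name is uppercase

# Synthetic stand-in for X_train / y_train, which the snippet assumed to exist.
X_train, y_train = make_classification(n_samples=100, n_features=5, random_state=0)

# Steps are a list of (name, estimator) tuples, separated by commas.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC()),
])

pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_train)
```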

    First, the import statement has to be corrected. import sqlite3 should be on a separate line, and the corrected line is from flask import Flask, jsonify. The second issue is the Flask app initialization. The original uses the '==' operator where '=' is needed to assign the Flask instance to the variable app: app = Flask(__name__). There is also the __init__ formatting: __init__ = lambda: [app = Flask(__name__)] should be written as normal code, and the list should be properly formatted. Next is data fetching and return: data = db.fetch_all() should be data = db.fetch(), assigning the result. Also, jsonify(data) assumes the data is in a format that can be directly serialized to JSON. A SQLite fetch returns a list of tuples, which needs conversion to a JSON-serializable format, for example a list of dictionaries if required. We can add some additional considerations. To convert the query results into dictionaries keyed by column name, you might need to know the column names or retrieve them from the cursor description. For production, we should also consider adding error handling to manage exceptions during database operations.
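
    The conversion from SQLite rows to a JSON-serializable structure can be sketched as follows. Flask is left out so the example stays self-contained; the table, column names, and helper functions are hypothetical, and json.dumps stands in for the serialization a Flask view would hand to jsonify.

```python
import json
import sqlite3


def rows_to_dicts(cursor):
    """Pair each fetched tuple with column names from cursor.description."""
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def fetch_users_as_json(conn):
    """Return query results as a JSON string, or an error payload on failure."""
    try:
        cursor = conn.execute("SELECT id, name FROM users ORDER BY id")
        return json.dumps(rows_to_dicts(cursor))
    except sqlite3.Error as exc:
        return json.dumps({"error": str(exc)})


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("Asha",), ("Ravi",)])

payload = fetch_users_as_json(conn)
print(payload)
```

    Catching sqlite3.Error keeps a database failure from surfacing as an unhandled exception; a production service would likely also log the error and return an appropriate HTTP status.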

    Can you devise a Python workflow that applies both deep learning and NLP techniques to extract insights from visual and textual data simultaneously?

    Yes, we can use fastai, a Python-based open-source machine learning framework that offers a high-level abstraction for deep learning model training. So we can devise a Python workflow that applies both deep learning and NLP techniques, and there are various methods and libraries available. Besides PyTorch, TensorFlow is a popular choice, and Keras is also widely used. OpenCV is also a useful library for computer vision tasks.

    Can you illustrate how version control with Jira would aid in collaboration for a remote data science team deploying a TensorFlow model?

    Actually, Git is a version control system that tracks file changes, and GitHub is a platform that allows developers to collaborate and store their code in the cloud. Think of it this way: Git is responsible for everything GitHub-related that happens locally on your computer. That's the main reason we can illustrate version control with the help of GitHub. We can also use another version control service as an alternative to GitHub. This lets developers on different local machines integrate their work, so they can work together. We get to know the status of task completion, pending tasks, and requirements, as well as any changes that have happened. Git, GitLab, and other platforms allow us to track these records.

    One method is deleting the columns with missing data. In this case, we delete the column, then fit the model and check the accuracy. Another method is imputation, that is, filling in the missing values, for example with K-NN. There is also the option of dropping rows with missing data. Suppose a column has more than half of its data missing or null; then we can just drop the whole column from the dataset. On that basis, we deal with the missing values and the corrupted data. We can also replace missing values with the mean or median, depending on the type of data we have.
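
    Two of the strategies above, dropping a column that is more than half missing and filling the rest with the mean or median, can be sketched in plain Python. The sample rows, column names, and helper functions are hypothetical, and K-NN imputation is not shown; in practice a data-frame library would usually handle these steps.

```python
from statistics import mean, median


def drop_sparse_columns(rows, threshold=0.5):
    """Drop columns where more than `threshold` of the values are None."""
    columns = rows[0].keys()
    keep = [
        c for c in columns
        if sum(r[c] is None for r in rows) / len(rows) <= threshold
    ]
    return [{c: r[c] for c in keep} for r in rows]


def impute(rows, column, strategy=median):
    """Fill None values in `column` with the mean or median of observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = strategy(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


rows = [
    {"age": 25, "income": 50000, "notes": None},
    {"age": None, "income": 62000, "notes": None},
    {"age": 31, "income": None, "notes": "vip"},
    {"age": 40, "income": 56000, "notes": None},
]

cleaned = drop_sparse_columns(rows)        # "notes" is 75% missing, so it is dropped
cleaned = impute(cleaned, "age", median)   # median of the observed ages
cleaned = impute(cleaned, "income", mean)  # mean of the observed incomes
```

    The median is the safer default for skewed numeric columns, while the mean suits roughly symmetric ones; which to use depends on the type of data, as the answer notes.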