Every data science project starts with data, and training machine learning models usually requires a large amount of it. There are various ways to collect data. Browsing websites and downloading the structured datasets they host is one of the most common. But sometimes this is not enough: datasets for certain problem statements are not readily available on the web. In that situation, we need to create our own datasets.

In this article, we discuss how to create a custom image dataset and label it using Python. First, let us look at acquiring images through web scraping.

Web Scraping

Web scraping refers to the process of extracting data from websites. A scraper fetches pages from the World Wide Web and stores the extracted data on the local system. Beautiful Soup is one of the most popular Python libraries for parsing scraped pages, including finding the images on them. The requests library fetches the web page to be parsed.
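
As a rough illustration of the parsing step, the sketch below collects the src attribute of every img tag in a page. It swaps in Python's built-in html.parser module for Beautiful Soup so that it runs without extra installs; the class name ImageSrcCollector and the sample page are made up for this example. With the libraries named above, the page text would typically come from requests.get(url).text and the tags from soup.find_all("img").

```python
from html.parser import HTMLParser


class ImageSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


# A tiny stand-in page; a real page would be fetched over HTTP first.
page = '<html><body><img src="/a.jpg" alt="a"><p>text</p><img src="/b.png"></body></html>'

collector = ImageSrcCollector()
collector.feed(page)
print(collector.sources)
```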

How To:

When we right-click a picture on the web page and inspect it in the browser's developer tools, the image URL follows a pattern starting with images.pexels.com/photos, followed by a number that is unique to each photo. A regular expression (regex) that matches this pattern can pick out all such image links on the page.
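
The regex idea can be sketched as follows. The exact URL shape (host, numeric photo id, file name, extension) is an assumption based on the pattern described above, and the function name extract_photo_urls and the sample HTML are hypothetical. The sketch keeps one link per photo id, since the same photo often appears at several sizes.

```python
import re

# Assumed URL shape: images.pexels.com/photos/<numeric id>/<file name>.<ext>
PHOTO_URL = re.compile(
    r"https://images\.pexels\.com/photos/(\d+)/[\w.-]+\.(?:jpe?g|png)"
)


def extract_photo_urls(html):
    """Return one URL per photo id, in the order the ids first appear."""
    first_url_by_id = {}
    for match in PHOTO_URL.finditer(html):
        photo_id = match.group(1)
        # setdefault keeps the first URL seen for each id and skips repeats.
        first_url_by_id.setdefault(photo_id, match.group(0))
    return list(first_url_by_id.values())


# Made-up page fragment: the first photo appears twice at different sizes.
sample_html = """
<img src="https://images.pexels.com/photos/1108099/pexels-photo-1108099.jpeg?w=600">
<img src="https://images.pexels.com/photos/1108099/pexels-photo-1108099.jpeg?w=1200">
<img src="https://images.pexels.com/photos/45201/kitty-cat.jpg">
<img src="https://example.com/logo.png">
"""

urls = extract_photo_urls(sample_html)
for url in urls:
    print(url)
```

Query strings such as ?w=600 are left out of the match, so the returned links point at the base image file.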

Using this method, we collect the image links. We can print the links to inspect them and create a directory to hold the images. After this, we download the images. Once the process is complete, the scraped images can be found at the path specified for storing them.
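
A minimal sketch of the download step is shown below. It uses urllib.request from the standard library in place of requests so that it is self-contained (requests.get(url).content would serve the same purpose). The function names, the image_0000 file naming scheme, and the demo URL are assumptions made for illustration. The demonstration at the end passes a stand-in fetcher, so it runs without network access.

```python
import os
import tempfile
import urllib.request


def fetch_bytes(url):
    """Download a URL and return its raw bytes."""
    # Some sites reject requests that lack a browser-like User-Agent header.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


def download_images(urls, out_dir, fetch=fetch_bytes):
    """Save each URL into out_dir and return the list of written paths."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        # Reuse the extension from the URL (ignoring any query string).
        ext = os.path.splitext(url.split("?")[0])[1] or ".jpg"
        path = os.path.join(out_dir, f"image_{index:04d}{ext}")
        with open(path, "wb") as handle:
            handle.write(fetch(url))
        saved.append(path)
    return saved


# Offline demonstration: the lambda stands in for a real download.
demo_dir = os.path.join(tempfile.mkdtemp(), "scraped_images")
demo_paths = download_images(
    ["https://images.pexels.com/photos/45201/kitty-cat.jpg?w=600"],
    demo_dir,
    fetch=lambda url: b"fake image bytes",
)
print(demo_paths)
```

For a real run, call download_images(urls, "some/folder") and leave the fetch argument at its default.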

Labeling

After scraping and storing the images, we need to classify them through labeling. A labeling tool such as LabelImg is used for this purpose. It is a pip-installable annotation tool that supports two annotation formats: YOLO and PASCAL VOC.

How To:

You can open the labeling software from the command prompt. With LabelImg installed through pip, the command is labelImg: (base) C:\Users\Jayita> labelImg

The tool's options appear on the left-hand side of the screen, and the image file information on the right-hand side. Select ‘Open Dir’ to load all images in a folder. Press ‘a’ to view the previous image and ‘d’ to view the next image.

To create an annotation, press ‘w’ and draw a rectangular box around the object. A window will pop up asking for the class name of the boxed object. Once you are done drawing the box and labeling the image, it’s time to save it. To generate the annotations, save them in PASCAL VOC or YOLO format.
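
The two formats store the same box differently: PASCAL VOC keeps pixel corners (xmin, ymin, xmax, ymax) in an XML file, while YOLO keeps a class index plus the box center and size, each divided by the image width or height. The sketch below converts one to the other. The function name voc_to_yolo, the sample XML, and the class list are hypothetical, and the XML is a simplified version of what an annotation tool writes.

```python
import xml.etree.ElementTree as ET


def voc_to_yolo(xml_text, class_names):
    """Convert PASCAL VOC boxes to YOLO lines: class x_center y_center w h."""
    root = ET.fromstring(xml_text)
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    lines = []
    for obj in root.iter("object"):
        # YOLO refers to classes by their index in a fixed class list.
        class_id = class_names.index(obj.findtext("name"))
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        # Normalize the center and size by the image dimensions.
        x_center = (xmin + xmax) / 2 / width
        y_center = (ymin + ymax) / 2 / height
        box_w = (xmax - xmin) / width
        box_h = (ymax - ymin) / height
        lines.append(
            f"{class_id} {x_center:.6f} {y_center:.6f} {box_w:.6f} {box_h:.6f}"
        )
    return lines


# Simplified, made-up annotation for a 400x200 image with one box.
sample_xml = """<annotation>
  <filename>image_0000.jpg</filename>
  <size><width>400</width><height>200</height><depth>3</depth></size>
  <object>
    <name>cat</name>
    <bndbox><xmin>100</xmin><ymin>50</ymin><xmax>300</xmax><ymax>150</ymax></bndbox>
  </object>
</annotation>"""

yolo_lines = voc_to_yolo(sample_xml, ["dog", "cat"])
print(yolo_lines)  # ['1 0.500000 0.500000 0.500000 0.500000']
```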

One can learn about this in detail in a data science course. Web scraping and labeling are not hard once you understand the basics. Be careful while scraping a website and follow its rules so that you do not harm the site you are scraping. Take time to consider your requirements and research accordingly to find a suitable website. For example, if you plan to develop a model for fashion, then online shopping websites should be on your scraping list.

Learning web scraping and labeling is important if you want to build a data science career. It gives you a working understanding of how image datasets are put together. You can use these techniques to add data when the data available for a project is scarce. The same process can be applied to multiple classes, provided their images share the same folder.

About Imarticus
Imarticus Learning is India’s leading professional education institute that offers training in Financial Services, Data Analytics & Technology. We’ve successfully transformed the careers of over 35,000 individuals globally through our Certification, Prodegree, and Post Graduate programs offered in association with leading and renowned global organisations in the Financial Services, Data Analytics & Technology domain.