Beautiful Soup Tutorial – Web Scraping in Python



The Beautiful Soup module is used for web scraping in Python. Learn how to use the Beautiful Soup and Requests modules in this tutorial. After watching, you will be able to start scraping the web on your own.
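
A minimal sketch of the requests-plus-Beautiful-Soup workflow the video walks through, assuming both packages are installed (pip install requests beautifulsoup4); the URL is a placeholder:

    import requests
    from bs4 import BeautifulSoup

    # Fetch a page and parse its HTML; https://example.com is a placeholder URL.
    response = requests.get("https://example.com")
    soup = BeautifulSoup(response.text, "html.parser")

    # Print the page title and the href of every link on the page.
    print(soup.title.string)
    for a_tag in soup.find_all("a"):
        print(a_tag.get("href"))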

💻Code: https://github.com/vprusso/youtube_tutorials/tree/master/web_scraping_and_automation/beautiful_soup

Tutorial from Vincent Russo of Lucid Programming. Check out his YouTube channel: http://bit.ly/lucidcode

🐦Vincent on Twitter: @captainhamptons

Learn to code for free and get a developer job: https://www.freecodecamp.org

Read hundreds of articles on programming: https://medium.freecodecamp.org

And subscribe for new videos on technology every day: https://youtube.com/subscription_center?add_user=freecodecamp


This Post Has 44 Comments

  1. LucidProgramming

    Hi everyone, LucidProgramming here, the creator of this video series. If you enjoyed the video, I'm really thrilled to hear that. If you like this type of content, head on over to my channel and consider subscribing to stay up to date with similar videos. If you have any requests or recommendations, I take them to heart and am constantly trying to improve the content of my channel. Thanks again, and happy coding!

  2. Ali Alavi

    Thanks for the video. Could you explain how to scrape within the parents or siblings of a tag?
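
For anyone with the same question as above, a small sketch of moving from a tag to its parent and siblings; the HTML is made up for illustration:

    from bs4 import BeautifulSoup

    html = "<div><h2>Title</h2><p>First</p><p>Second</p></div>"
    soup = BeautifulSoup(html, "html.parser")

    h2 = soup.find("h2")
    print(h2.parent.name)                  # div: the enclosing tag
    print(h2.find_next_sibling("p").text)  # First: the next <p> at the same level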

  3. Aaron Rumley

    This is an awesome tutorial, thank you! Very clear and concise, but with lots of content!

  4. pooja kumari

    How can we fetch specific tags, for example an address, from the multiple files I have in .txt format?

    Please provide a solution…

    For one file I can fetch it, but how can I do it for multiple files? (See the sketch just after this comment.)

    from bs4 import BeautifulSoup

    addresses = []

    with open("/rawhtml/greerwilsonchapel.com_executives_contact_us.txt") as fp:
        soup = BeautifulSoup(fp, "html.parser")
        address = soup.find('div', class_="locator-titles").get_text().rstrip('\n').split('\n')
        addresses.append(address)

    print(addresses)
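
A hedged sketch of one way to extend the snippet above to many files, assuming the saved pages sit in a rawhtml/ directory and each contains the same locator-titles div (the glob pattern is an assumption):

    import glob
    from bs4 import BeautifulSoup

    addresses = []
    # Loop over every saved .txt page; "rawhtml/*.txt" is an illustrative pattern.
    for path in glob.glob("rawhtml/*.txt"):
        with open(path) as fp:
            soup = BeautifulSoup(fp, "html.parser")
            div = soup.find("div", class_="locator-titles")
            if div is not None:  # skip files that lack the expected div
                addresses.append(div.get_text().strip().split("\n"))
    print(addresses)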

  5. Abdul Raheem

    Thanks, it really helped me understand the uses of Beautiful Soup's functions.

  6. Project Vibe

    If you are getting an error on 32-bit Windows, use html.parser instead of lxml.
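
The swap the comment above describes is a one-argument change; a self-contained example:

    from bs4 import BeautifulSoup

    html = "<p>hello</p>"
    # html.parser ships with Python; lxml is faster but needs "pip install lxml".
    soup = BeautifulSoup(html, "html.parser")  # instead of BeautifulSoup(html, "lxml")
    print(soup.p.text)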

  7. Kadir Durak

    Guys, I have a problem with this and would appreciate any help:
    Whenever I attempt to run this program I get an error. I have followed all the steps, typed everything correctly, installed it into the same folder as my program and Python, and installed the other software mentioned in the comments section (lxml, the parser), but I still get this error. If I have missed anything, please tell me, as it is getting very annoying now. The error is "ModuleNotFoundError: No module named 'requests'".
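
For anyone hitting the same error: requests is a third-party package, so it has to be installed for the same Python that runs the script, typically with pip install requests (or python -m pip install requests when several Python installations are present).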

  8. Mike W

    Boom!
    At 16:45 you find the a tags inside the h2 tags! That's what I needed.
    Very happy! Thank you!

  9. Mentor 101

    If anyone is getting an lxml error while running the code, just do:
    pip install lxml

  10. Another Look

    I am only 11 minutes in, and this is one of the clearest explanations of the data extraction syntax that I have come across so far.

  11. octo1

    For anyone getting the
    AttributeError: 'NoneType' object has no attribute 'attrs'
    error – some of the h2 tags contain no <a> inside them, so find("a") returns None for those.

    Use
    if a_tag is not None:
    before you append to urls
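
A self-contained version of that fix, with made-up HTML standing in for the page from the video:

    from bs4 import BeautifulSoup

    html = "<h2><a href='/about'>About</a></h2><h2>No link here</h2>"
    soup = BeautifulSoup(html, "html.parser")

    urls = []
    for h2_tag in soup.find_all("h2"):
        a_tag = h2_tag.find("a")
        if a_tag is not None:  # h2 tags without an <a> make find() return None
            urls.append(a_tag.attrs["href"])
    print(urls)  # ['/about']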

  12. Biff Bifford

    I would like to say that my first computer was an Osborne 8088 (circa 1981). I began programming in the mid-1980s in dBase, Pascal, and C as a hobby. In the 1980s, we would log into bulletin boards that were someone's computer, which had games and other things to trade and share. This was my internet when I was a kid. I just want to say thank you, FreeCodeCamp, for bringing the joy back into computer programming for me with such rich and insightful content.

  13. Mian Ali

    We provide web scraping services. Web scraping (web harvesting or web data extraction) is a software technique for extracting information from websites. Usually, such programs simulate human exploration of the World Wide Web by either implementing the low-level Hypertext Transfer Protocol (HTTP) or embedding a fully-fledged web browser, such as Internet Explorer, Google Chrome, or Mozilla Firefox.

    Contact Us

  14. Ryan

    What editor is this? It makes moving around seem so easy.

  15. Wilson Yu

    I used to be a chef who made delicious soup; after seeing this, I became a programmer who makes beautiful soups.

  16. God Plaz

    For some reason, it says ModuleNotFoundError when I try to type in "from bs4 import BeautifulSoup"
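
That error usually means Beautiful Soup is not installed for the Python you are running: the package installs as beautifulsoup4 (pip install beautifulsoup4) even though it is imported as bs4.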

  17. pawan mishra

    He sounds like someone who caught a cold; btw, it was very helpful.

  18. Apoorv

    At 18:18, running the program I get the error below, please help:

        line 13, in <module>
            urls.append(a_tag.attrs['href'])
        AttributeError: 'NoneType' object has no attribute 'attrs'

  19. Business Opportunity in Ruvol

    I have invented a Board Game [still unpublished and not yet out in the market] that I believe is guaranteed to be as challenging and exciting as CHESS. I called it “RUVOL.”

    It is my hope that one day Ruvol may surpass chess as the “Number One Board Game in the World.”

    The weakness of chess is that it always starts from fixed positions, so the opening moves become memorizable. In fact, not a few players have so mastered the moves that they can play against their opponents blindfolded. It is for this very reason that the great Bobby Fischer introduced his so-called "Fischer Random Chess," where the starting position of the pieces is randomized to make the memorization of openings impracticable. It is also for this reason that I invented Ruvol, where every game has been calculated to be a challenging one to play.

    HOW IS RUVOL PLAYED, AND HOW CAN YOU MONETIZE IT?

    I detailed everything in my YouTube video. Here is the link: https://www.youtube.com/watch?v=jcqth0m3-R0

    BIG MONEY POTENTIAL IN RUVOL!

    It is worthwhile to note that the people who play chess will be the same people who will play Ruvol. In my Google search, I learned there are around 800 million chess players in the world. Even just a small percentage of these 800 million is good enough to earn big money from Ruvol either as an ONLINE GAME BUSINESS or as a PHYSICAL PRODUCT DISTRIBUTOR.

    You may contact me at: rodolfovitangcol@gmail.com.

    Thanks and God bless!

    RODOLFO MARTIN VITANGCOL
    The Ruvol Inventor

  20. WHO KNOWS?

    this dude lowkey sounds like charlie/penguinz0

  21. Ijon Tichy

    Apparently, on the White House website the first <a> in an <h2> is None, so to avoid the error, make sure a_tag is not None before appending it to urls:

        a_tag = …
        if a_tag is not None:
            urls.append(a_tag.attrs['href'])

  22. Tobor Two

    Searching with the "div" tag doesn't work with link.text. I don't think it recognizes a div as text. What do I need to do instead to find keywords?
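
A guess at what the comment above is running into: .text and .get_text() work on any tag, including div, but they return the concatenated text of everything nested inside it; a sketch with made-up HTML:

    from bs4 import BeautifulSoup

    html = "<div><p>web scraping</p><p>with Python</p></div>"
    soup = BeautifulSoup(html, "html.parser")

    for div in soup.find_all("div"):
        text = div.get_text(" ")   # all text nested inside the div, space-separated
        if "scraping" in text:     # simple keyword check
            print(text)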

  23. Ke H

    It doesn't return any links containing 'about' even with the correct code… any ideas?
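
One possibility, since the page from the video may have changed since recording: Python's in test is case-sensitive, so 'about' will not match 'About'. A sketch of a case-insensitive check:

    from bs4 import BeautifulSoup

    html = "<a href='/about-us'>About Us</a><a href='/home'>Home</a>"
    soup = BeautifulSoup(html, "html.parser")

    for link in soup.find_all("a"):
        if "about" in link.text.lower():  # lower() makes the match case-insensitive
            print(link.attrs["href"])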

  24. CodeCSE

    If you are a Mac user, use pip3 instead of pip.

  25. Hamza Mehmood

    urls.append(a_tag.attrs['href'])

    AttributeError: 'NoneType' object has no attribute 'attrs'

    It is giving me this error while appending to my list. Can anyone help?

  26. I am getting "AttributeError: 'NoneType' object has no attribute 'attrs'" on urls.append(a_tag.attrs['href'])

  27. Mike Waters

    Super helpful that, unlike other videos, this goes into depth on how to actually extract different things. Thank you.

  28. Seda Karanfil

    Hi, thanks for the tutorial. I am new to Python and almost totally ignorant. I downloaded Python 3.9.5 and have been trying to use IDLE for web scraping for a long while. You have this tab down there which starts with "NORMAL" and where you enter soup_and_requests.py. I don't have that part. Since I am totally ignorant, I do not know how to get that tab at the very bottom. I would be so grateful if you could answer my stupid question. Thanks so much.
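
For anyone similarly confused: the bar starting with "NORMAL" is the status line of the Vim editor used in the video, not a feature of IDLE or Python. Any editor works; save the code as soup_and_requests.py and run it with Python.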

  29. Adam Durrant

    I literally fall at the first hurdle with all of these tutorials:

    zsh: command not found: pip
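
On macOS, recent systems ship zsh with python3 on the PATH but often no bare pip command; python3 -m pip install requests beautifulsoup4 usually works where pip alone is not found.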

  30. Aref Billah

    for h2_tag in soup.find_all("h2"):
        a_tag = h2_tag.find("a")
        if a_tag is None:
            urls.append("Empty")
            continue
        urls.append(a_tag.attrs['href'])

  31. Rohit Methare

    How do I create a scraper to track products in a specific Amazon category? Can someone help?

  32. D Israel

    I really liked the GitHub notes; please make more videos like that.

  33. Samir EL ZEIN

    you just needed Jupyter to make your life easier with this demo 🙂
