Beautiful Soup - Beautiful Objects
The starting point of any Beautiful Soup project is the BeautifulSoup object. A BeautifulSoup object represents the input HTML/XML document from which it was created.
We can pass either a string or a file-like object to Beautiful Soup. The file-like object can be an open handle to a file stored locally on our machine or the response for a web page.
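As a minimal sketch of the two input styles described above, the snippet below builds one soup from a string and one from a file-like object. The sample markup is made up for illustration, and io.StringIO is only a stand-in for a real open file handle.

```python
import io
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for this illustration.
html = "<p>Learn <b>Python</b></p>"

# Option 1: pass the markup directly as a string.
soup_from_string = BeautifulSoup(html, "html.parser")

# Option 2: pass a file-like object (anything with a read() method).
# io.StringIO stands in for an open file handle here.
soup_from_file = BeautifulSoup(io.StringIO(html), "html.parser")

print(soup_from_string.b.string)
print(soup_from_file.b.string)
```

Both calls should produce the same tree, since Beautiful Soup reads the file-like object's contents before parsing.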
The most common BeautifulSoup Objects are −
Tag
NavigableString
BeautifulSoup
Comment
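The four kinds of object listed above can be seen in a single small document. This is an illustrative sketch only; the markup and variable names are assumptions chosen for the example.

```python
from bs4 import BeautifulSoup, Comment, NavigableString, Tag

# Hypothetical markup containing a tag, a comment and a text node.
soup = BeautifulSoup("<b><!--a note-->Java</b>", "html.parser")

tag = soup.b               # the <b> element
comment = tag.contents[0]  # the <!--a note--> comment
text = tag.contents[1]     # the text node "Java"

for obj in (soup, tag, comment, text):
    print(type(obj).__name__)

# Comment is a special kind of NavigableString,
# and BeautifulSoup is a special kind of Tag.
print(isinstance(comment, NavigableString))
print(isinstance(soup, Tag))
```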
Comparing objects for equality
In Beautiful Soup, two NavigableString or Tag objects are equal if they represent the same HTML/XML markup.
In the example below, the two <b> tags are treated as equal, even though they live in different parts of the object tree, because they both look like “<b>Java</b>”.
>>> from bs4 import BeautifulSoup
>>> markup = "<p>Learn Python and <b>Java</b> and advanced <b>Java</b>! from Tutorialspoint</p>"
>>> soup = BeautifulSoup(markup, "html.parser")
>>> first_b, second_b = soup.find_all("b")
>>> print(first_b == second_b)
True
>>> print(first_b.previous_element == second_b.previous_element)
False
However, to check whether two variables refer to the same object, use the is operator −
>>> print(first_b is second_b)
False
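One practical consequence of comparing by markup is that a plain list lookup can find the wrong element. The sketch below contrasts the list method contents.index(), which uses equality, with Tag.index(), which looks a child up by identity. It reuses the sample markup from this section; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

markup = "<p>Learn Python and <b>Java</b> and advanced <b>Java</b>! from Tutorialspoint</p>"
soup = BeautifulSoup(markup, "html.parser")
first_b, second_b = soup.find_all("b")

# list.index() compares with ==, so it stops at the first *equal* child,
# which is first_b rather than second_b.
by_equality = soup.p.contents.index(second_b)

# Tag.index() compares with "is", so it finds second_b itself.
by_identity = soup.p.index(second_b)

print(by_equality)
print(by_identity)
```

When the position of one specific element matters, the identity-based lookup is the safer choice.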
Copying Beautiful Soup objects
To create a copy of any Tag or NavigableString, use the copy.copy() function, as shown below −
>>> import copy
>>> p_copy = copy.copy(soup.p)
>>> print(p_copy)
<p>Learn Python and <b>Java</b> and advanced <b>Java</b>! from Tutorialspoint</p>
Although the original and the copy contain the same markup, they are not the same object −
>>> print(soup.p == p_copy)
True
>>> print(soup.p is p_copy)
False
The only real difference is that the copy is completely detached from the original Beautiful Soup object tree, just as if extract() had been called on it.
>>> print(p_copy.parent)
None
This is because two different Tag objects cannot occupy the same place in the tree at the same time.
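Because the copy is detached, changes made to it should not affect the original tree. The following sketch illustrates this with the sample markup from this section; the replacement text "Kotlin" and the variable names are assumptions made for the example.

```python
import copy
from bs4 import BeautifulSoup

markup = "<p>Learn Python and <b>Java</b> and advanced <b>Java</b>! from Tutorialspoint</p>"
soup = BeautifulSoup(markup, "html.parser")
original = soup.p
p_copy = copy.copy(original)

# The copy starts out detached from the tree.
print(p_copy.parent)

# Change the text of the first <b> in the copy only.
p_copy.b.string = "Kotlin"

print(original.b.string)   # text in the original tree
print(p_copy.b.string)     # text in the copy
print(original == p_copy)  # the markup now differs

# Attaching the copy to the tree gives it a parent.
soup.append(p_copy)
print(p_copy.parent is soup)
```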