Beautiful Soup - Beautiful Objects
  • Date: 2024-12-22


The starting point of any Beautiful Soup project is the BeautifulSoup object. A BeautifulSoup object represents the HTML/XML document it was created from.

We can pass either a string or a file-like object to Beautiful Soup. The file-like object can be a file stored locally on our machine or a web page.
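As a minimal sketch of both input styles, the snippet below parses the same markup once from a string and once from a file-like object. The sample markup and variable names are made up for illustration, and io.StringIO merely stands in for an open local file or a fetched web page.

```python
import io

from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for this illustration.
html = "<html><body><p>Hello, <b>world</b></p></body></html>"

# Parse directly from a string.
soup_from_string = BeautifulSoup(html, "html.parser")

# Parse from a file-like object (anything with a read() method).
# io.StringIO stands in for an open file or an HTTP response here.
file_like = io.StringIO(html)
soup_from_file = BeautifulSoup(file_like, "html.parser")

print(soup_from_string.b.string)
print(soup_from_file.b.string)
```

Both calls should produce equivalent trees, since Beautiful Soup reads the file-like object's contents before parsing.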

The most common Beautiful Soup objects are −

    Tag

    NavigableString

    BeautifulSoup

    Comment
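To see the four object types side by side, here is a small sketch that parses a short document and inspects one object of each kind. The markup is an invented example, not taken from the tutorial.

```python
from bs4 import BeautifulSoup, Comment, NavigableString, Tag

# Hypothetical markup containing a tag, some text, and a comment.
markup = "<p>Learn <b>Python</b><!-- a note --></p>"
soup = BeautifulSoup(markup, "html.parser")

tag = soup.b                    # a Tag
text = soup.b.string            # a NavigableString
comment = soup.p.contents[-1]   # a Comment (the last child of <p>)

for obj in (soup, tag, text, comment):
    print(type(obj).__name__)

# Comment is a special kind of NavigableString,
# and BeautifulSoup is a special kind of Tag.
print(isinstance(comment, NavigableString))
print(isinstance(soup, Tag))
```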

Comparing objects for equality

In Beautiful Soup, two NavigableString or Tag objects are equal if they represent the same HTML/XML markup.

In the example below, the two <b> tags are treated as equal even though they live in different parts of the object tree, because they both look like “<b>Java</b>”.


>>> from bs4 import BeautifulSoup
>>> markup = "<p>Learn Python and <b>Java</b> and advanced <b>Java</b>! from Tutorialspoint</p>"
>>> soup = BeautifulSoup(markup, "html.parser")
>>> first_b, second_b = soup.find_all("b")
>>> print(first_b == second_b)
True
>>> print(first_b.previous_element == second_b.previous_element)
False

However, to check whether two variables refer to exactly the same object, use the is operator −


>>> print(first_b is second_b)
False

Copying Beautiful Soup objects

To create a copy of any Tag or NavigableString, use the copy.copy() function, as shown below −


>>> import copy
>>> p_copy = copy.copy(soup.p)
>>> print(p_copy)
<p>Learn Python and <b>Java</b> and advanced <b>Java</b>! from Tutorialspoint</p>
>>>

The original and the copy contain the same markup, so they compare as equal, but they are not the same object −


>>> print(soup.p == p_copy)
True
>>>
>>> print(soup.p is p_copy)
False
>>>

The only real difference is that the copy is completely detached from the original Beautiful Soup object tree, just as if extract() had been called on it.


>>> print(p_copy.parent)
None

This is because two different Tag objects cannot occupy the same position in the tree at the same time.
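One practical consequence, sketched below with shortened example markup, is that a copy can be modified freely without touching the original tree. The replacement text "Kotlin" is an arbitrary value chosen for the illustration.

```python
import copy

from bs4 import BeautifulSoup

# Shortened, hypothetical version of the markup used above.
markup = "<p>Learn Python and <b>Java</b></p>"
soup = BeautifulSoup(markup, "html.parser")

p_copy = copy.copy(soup.p)

# Change the text inside the copy only.
p_copy.b.string = "Kotlin"

print(soup.p)                 # the original keeps its <b>Java</b>
print(p_copy)                 # the copy now contains <b>Kotlin</b>
print(soup.p == p_copy)       # the markup now differs
print(p_copy.parent is None)  # the copy is still detached from the tree
```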
