jsoup - Quick Guide
  • Date: 2024-12-22


jsoup - Overview

jsoup is a Java-based library for working with HTML content. It provides a convenient API to extract and manipulate data, using the best of DOM, CSS, and jQuery-like methods. It implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers do.

The jsoup library provides the following functionality:

    Multiple Read Support − It reads and parses HTML from a URL, a file, or a string.

    CSS Selectors − It can find and extract data, using DOM traversal or CSS selectors.

    DOM Manipulation − It can manipulate HTML elements, attributes, and text.

    Prevent XSS attacks − It can clean user-submitted content against a given safe-list, to prevent XSS attacks.

    Tidy − It outputs tidy HTML.

    Handles invalid data − jsoup can handle unclosed tags and implicit tags, and can reliably create the document structure.
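
As a rough illustration of the last point, the sketch below feeds jsoup markup with two unclosed p tags and no html, head, or body tags. The class name LenientParseSketch and the sample markup are made up for this illustration; only Jsoup.parse, title(), select(), and body().html() come from jsoup itself.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class LenientParseSketch {
   // Parses deliberately sloppy HTML: both <p> tags are left unclosed
   // and there is no html, head, or body wrapper.
   static Document parseSloppy() {
      return Jsoup.parse("<title>Demo</title><p>First<p>Second");
   }

   public static void main(String[] args) {
      Document document = parseSloppy();
      // jsoup supplies the implied html, head, and body elements and
      // closes each paragraph before the next one starts.
      System.out.println("Title: " + document.title());
      System.out.println("Paragraphs: " + document.select("p").size());
      System.out.println(document.body().html());
   }
}
```

Running it should report two separate paragraphs, which shows that the missing closing tags were inferred by the parser.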

jsoup - Environment Setup

Step 1: Verify Java Installation in Your Machine

First of all, open the console and execute a java command based on the operating system you are working on.

OS        Task                     Command
Windows   Open Command Console     c:\> java -version
Linux     Open Command Terminal    $ java -version
Mac       Open Terminal            machine:~ joseph$ java -version

Let's verify the output for all the operating systems −

OS Output
Windows

Java 11.0.11 2021-04-20 LTS

Java(TM) SE Runtime Environment 18.9 (build 11.0.11+9-LTS-194)

Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.11+9-LTS-194, mixed mode)

Linux

Java 11.0.11 2021-04-20 LTS

Java(TM) SE Runtime Environment 18.9 (build 11.0.11+9-LTS-194)

Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.11+9-LTS-194, mixed mode)

Mac

Java 11.0.11 2021-04-20 LTS

Java(TM) SE Runtime Environment 18.9 (build 11.0.11+9-LTS-194)

Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.11+9-LTS-194, mixed mode)

If you do not have Java installed on your system, download the Java Software Development Kit (SDK) from the following link: www.oracle.com/technetwork/java/javase/downloads/index.html. This tutorial assumes Java 11.0.11 as the installed version.

Step 2: Set JAVA Environment

Set the JAVA_HOME environment variable to point to the base directory where Java is installed on your machine. For example:

OS        Action
Windows   Set the environment variable JAVA_HOME to C:\Program Files\Java\jdk11.0.11
Linux     export JAVA_HOME=/usr/local/java-current
Mac       export JAVA_HOME=/Library/Java/Home

Append the Java compiler location to the system Path.

OS        Action
Windows   Append the string C:\Program Files\Java\jdk11.0.11\bin at the end of the system variable, Path.
Linux     export PATH=$PATH:$JAVA_HOME/bin/
Mac       Not required

Verify Java installation using the command java -version as explained above.

Step 3: Download jsoup Archive

Download the latest version of the jsoup jar file from the Maven Repository. At the time of writing this tutorial, we downloaded jsoup-1.14.3.jar and copied it into the C:\jsoup folder.

OS Archive name
Windows jsoup-1.14.3.jar
Linux jsoup-1.14.3.jar
Mac jsoup-1.14.3.jar

Step 4: Set jsoup Environment

Set the JSOUP_HOME environment variable to point to the base directory where the jsoup jar is stored on your machine. Let's assume we've stored jsoup-1.14.3.jar in the JSOUP folder.

Sr.No   OS        Description
1       Windows   Set the environment variable JSOUP_HOME to C:\JSOUP
2       Linux     export JSOUP_HOME=/usr/local/JSOUP
3       Mac       export JSOUP_HOME=/Library/JSOUP

Step 5: Set CLASSPATH Variable

Set the CLASSPATH environment variable to point to the JSOUP jar location.

Sr.No   OS        Description
1       Windows   Set the environment variable CLASSPATH to %CLASSPATH%;%JSOUP_HOME%\jsoup-1.14.3.jar;.;
2       Linux     export CLASSPATH=$CLASSPATH:$JSOUP_HOME/jsoup-1.14.3.jar:.
3       Mac       export CLASSPATH=$CLASSPATH:$JSOUP_HOME/jsoup-1.14.3.jar:.
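
Once CLASSPATH is set, one quick sanity check is to ask the JVM whether it can load the jsoup entry class. The helper below is a minimal sketch and not part of jsoup: ClasspathCheck and isOnClasspath are names invented here, and the only assumption is that the jar provides the class org.jsoup.Jsoup, which the imports in the examples that follow rely on.

```java
public class ClasspathCheck {
   // Returns true when the named class can be loaded from the current classpath.
   static boolean isOnClasspath(String className) {
      try {
         Class.forName(className);
         return true;
      } catch (ClassNotFoundException e) {
         return false;
      }
   }

   public static void main(String[] args) {
      String entryClass = "org.jsoup.Jsoup";
      if (isOnClasspath(entryClass)) {
         System.out.println("jsoup found on the classpath");
      } else {
         System.out.println("jsoup NOT found; check JSOUP_HOME and CLASSPATH");
      }
   }
}
```

Compile and run it from the same console in which the variables were set, since environment changes made in one console are not visible in consoles that were already open.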

jsoup - Parsing String

The following example showcases parsing an HTML String into a Document object.

Syntax


Document document = Jsoup.parse(html);

Where

    document − document object represents the HTML DOM.

    Jsoup − main class to parse the given HTML String.

    html − HTML String.

Description

The parse(String html) method parses the input HTML into a new Document. This Document object can then be used to traverse the HTML DOM and read its details.

Example

Create the following Java program using any editor of your choice in, say, C:\jsoup.

JsoupTester.java


import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupTester {
   public static void main(String[] args) {
      String html = "<html><head><title>Sample Title</title></head>"
         + "<body><p>Sample Content</p></body></html>";
      // Parse the complete HTML string into a Document.
      Document document = Jsoup.parse(html);
      System.out.println(document.title());
      // Print the text of every paragraph.
      Elements paragraphs = document.getElementsByTag("p");
      for (Element paragraph : paragraphs) {
         System.out.println(paragraph.text());
      }
   }
}

Verify the result

Compile the class using javac compiler as follows:


C:\jsoup>javac JsoupTester.java

Now run JsoupTester to see the result.


C:\jsoup>java JsoupTester

See the result.


Sample Title
Sample Content

jsoup - Parsing Body

The following example showcases parsing an HTML fragment String into an Element object as the HTML body.

Syntax


Document document = Jsoup.parseBodyFragment(html);
Element body = document.body();

Where

    document − document object represents the HTML DOM.

    Jsoup − main class to parse the given HTML String.

    html − HTML fragment String.

    body − Element object representing the document's body element, which holds the parsed fragment; it is the same element that document.getElementsByTag("body").first() returns.

Description

The parseBodyFragment(String html) method parses the input HTML fragment into the body of a new Document. This Document object can then be used to traverse the fragment and read its details.

Example

Create the following Java program using any editor of your choice in, say, C:\jsoup.

JsoupTester.java


import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupTester {
   public static void main(String[] args) {
      // A fragment: no html or body tags, and the span is left unclosed.
      String html = "<span><p>Sample Content</p>";
      Document document = Jsoup.parseBodyFragment(html);
      Element body = document.body();
      Elements paragraphs = body.getElementsByTag("p");
      for (Element paragraph : paragraphs) {
         System.out.println(paragraph.text());
      }
   }
}

Verify the result

Compile the class using javac compiler as follows:


C:\jsoup>javac JsoupTester.java

Now run JsoupTester to see the result.


C:\jsoup>java JsoupTester

See the result.


Sample Content

jsoup - Loading URL

The following example showcases fetching HTML from the web using a URL and then reading its data.

Syntax


String url = "http://www.google.com";
Document document = Jsoup.connect(url).get();

Where

    document − document object represents the HTML DOM.

    Jsoup − main class to connect to the URL and fetch the HTML.

    url − URL of the HTML page to load.

Description

The connect(url) method creates a connection to the URL, and the get() method fetches the page and returns the HTML of the requested URL, parsed into a Document.

Example

Create the following Java program using any editor of your choice in, say, C:\jsoup.

JsoupTester.java


import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JsoupTester {
   public static void main(String[] args) throws IOException {
      String url = "http://www.google.com";
      // Fetch the page over HTTP and parse the response into a Document.
      Document document = Jsoup.connect(url).get();
      System.out.println(document.title());
   }
}

Verify the result

Compile the class using javac compiler as follows:


C:\jsoup>javac JsoupTester.java

Now run JsoupTester to see the result.


C:\jsoup>java JsoupTester

See the result.


Google
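
The object returned by connect(url) can be configured before the request is sent. The sketch below sets a user agent and a timeout and then inspects the prepared request without touching the network. ConnectionSketch, prepare, the user-agent string, and the 5000 ms timeout are illustrative choices rather than values from this tutorial.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;

public class ConnectionSketch {
   // Builds a configured connection; nothing is sent until get() is called.
   static Connection prepare(String url) {
      return Jsoup.connect(url)
         .userAgent("Mozilla/5.0 (jsoup tutorial sketch)") // illustrative value
         .timeout(5000);                                   // milliseconds
   }

   public static void main(String[] args) {
      Connection connection = prepare("http://www.google.com");
      System.out.println("URL: " + connection.request().url());
      System.out.println("Timeout (ms): " + connection.request().timeout());
      // Calling connection.get() at this point would send the request
      // and return the parsed Document, as in the example above.
   }
}
```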

jsoup - Loading File

The following example showcases loading HTML from a file on disk and then reading its data.

Syntax


URL path = ClassLoader.getSystemResource("test.htm");
File input = new File(path.toURI());
Document document = Jsoup.parse(input, "UTF-8");

Where

    document − document object represents the HTML DOM.

    Jsoup − main class to parse the given HTML file.

    path − URL of the test.htm file, looked up on the classpath.

    input − File object for the HTML file to load.

Description

The parse(input, "UTF-8") call reads the file using the given character set and parses its content into a new Document.

Example

Create the following Java program using any editor of your choice in, say, C:\jsoup.

JsoupTester.java


import java.io.File;
import java.io.IOException;
import java.net.URISyntaxException;
import java.net.URL;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JsoupTester {
   public static void main(String[] args) throws IOException
      , URISyntaxException {
      // Locate test.htm on the classpath and parse it as UTF-8.
      URL path = ClassLoader.getSystemResource("test.htm");
      File input = new File(path.toURI());
      Document document = Jsoup.parse(input, "UTF-8");
      System.out.println(document.title());
   }
}

test.htm

Create the following test.htm file in the C:\jsoup folder.


<html>
   <head>
      <title>Sample Title</title>
   </head>
   <body>
      <p>Sample Content</p>
   </body>
</html>

Verify the result

Compile the class using javac compiler as follows:


C:\jsoup>javac JsoupTester.java

Now run JsoupTester to see the result.


C:\jsoup>java JsoupTester

See the result.


Sample Title
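
The classpath lookup above only works when test.htm sits in a directory listed in CLASSPATH. As an alternative sketch that needs no such setup, the program below writes the same markup to a temporary file and parses that file directly. FileParseSketch, titleOfTempFile, and the temporary file prefix are hypothetical names chosen for this illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class FileParseSketch {
   // Writes the HTML to a temporary file, parses that file, and returns its title.
   static String titleOfTempFile(String html) throws IOException {
      Path temp = Files.createTempFile("jsoup-sketch", ".htm");
      try {
         Files.write(temp, html.getBytes(StandardCharsets.UTF_8));
         Document document = Jsoup.parse(temp.toFile(), "UTF-8");
         return document.title();
      } finally {
         Files.deleteIfExists(temp); // clean up the temporary file
      }
   }

   public static void main(String[] args) throws IOException {
      String html = "<html><head><title>Sample Title</title></head>"
         + "<body><p>Sample Content</p></body></html>";
      System.out.println(titleOfTempFile(html));
   }
}
```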

jsoup - Using DOM Methods

The following example showcases the use of DOM-like methods after parsing an HTML String into a Document object.

Syntax


Document document = Jsoup.parse(html);
Element sampleDiv = document.getElementById("sampleDiv");
Elements links = sampleDiv.getElementsByTag("a");

Where

    document − document object represents the HTML DOM.

    Jsoup − main class to parse the given HTML String.

    html − HTML String.

    sampleDiv − Element object representing the HTML element identified by the id "sampleDiv".

    links − Elements object representing the elements identified by the tag "a".

Description

The parse(String html) method parses the input HTML into a new Document. This Document object can then be used to traverse the HTML DOM and read its details.

Example

Create the following Java program using any editor of your choice in, say, C:\jsoup.

JsoupTester.java


import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupTester {
   public static void main(String[] args) {
      String html = "<html><head><title>Sample Title</title></head>"
         + "<body>"
         + "<p>Sample Content</p>"
         + "<span id='sampleDiv'><a href='www.google.com'>Google</a></span>"
         + "</body></html>";
      Document document = Jsoup.parse(html);
      System.out.println(document.title());
      Elements paragraphs = document.getElementsByTag("p");
      for (Element paragraph : paragraphs) {
         System.out.println(paragraph.text());
      }

      // Look up one element by id, then the links inside it by tag name.
      Element sampleDiv = document.getElementById("sampleDiv");
      System.out.println("Data: " + sampleDiv.text());
      Elements links = sampleDiv.getElementsByTag("a");

      for (Element link : links) {
         System.out.println("Href: " + link.attr("href"));
         System.out.println("Text: " + link.text());
      }
   }
}

Verify the result

Compile the class using javac compiler as follows:


C:\jsoup>javac JsoupTester.java

Now run JsoupTester to see the result.


C:\jsoup>java JsoupTester

See the result.


Sample Title
Sample Content
Data: Google
Href: www.google.com
Text: Google

jsoup - Using Selector Syntax

The following example showcases the use of selector methods after parsing an HTML String into a Document object. jsoup supports selectors similar to CSS selectors.

Syntax


Document document = Jsoup.parse(html);
Elements links = document.select("a[href]");

Where

    document − document object represents the HTML DOM.

    Jsoup − main class to parse the given HTML String.

    html − HTML String.

    links − Elements object holding every element matched by the selector, here each "a" element that has an href attribute.

Description

The document.select(expression) method parses the given CSS selector expression and returns the matching HTML DOM elements.

Example

Create the following Java program using any editor of your choice in, say, C:\jsoup.

JsoupTester.java


import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupTester {
   public static void main(String[] args) {
      String html = "<html><head><title>Sample Title</title></head>"
         + "<body>"
         + "<p>Sample Content</p>"
         + "<span id='sampleDiv'><a href='www.google.com'>Google</a>"
         + "<h3><a>Sample</a></h3>"
         + "</span>"
         + "<span id='imageDiv' class='header'><img name='google' src='google.png' />"
         + "<img name='yahoo' src='yahoo.jpg' />"
         + "</span>"
         + "</body></html>";
      Document document = Jsoup.parse(html);

      // a elements that have an href attribute
      Elements links = document.select("a[href]");

      for (Element link : links) {
         System.out.println("Href: " + link.attr("href"));
         System.out.println("Text: " + link.text());
      }

      // img elements whose src ends with .png
      Elements pngs = document.select("img[src$=.png]");

      for (Element png : pngs) {
         System.out.println("Name: " + png.attr("name"));
      }

      // span with class=header
      Element headerDiv = document.select("span.header").first();
      System.out.println("Id: " + headerDiv.id());

      // a elements that are direct children of an h3
      Elements sampleLinks = document.select("h3 > a");

      for (Element link : sampleLinks) {
         System.out.println("Text: " + link.text());
      }
   }
}

Verify the result

Compile the class using javac compiler as follows:


C:\jsoup>javac JsoupTester.java

Now run JsoupTester to see the result.


C:\jsoup>java JsoupTester

See the result.


Href: www.google.com
Text: Google
Name: google
Id: imageDiv
Text: Sample

jsoup - Extract Attributes

The following example showcases the use of a method to get an attribute of a DOM element after parsing an HTML String into a Document object.

Syntax


Document document = Jsoup.parse(html);
Element link = document.select("a").first();

System.out.println("Href: " + link.attr("href"));

Where

    document − document object represents the HTML DOM.

    Jsoup − main class to parse the given HTML String.

    html − HTML String.

    link − Element object representing the anchor tag.

    link.attr() − the attr(attribute) method retrieves the value of the given attribute.

Description

The Element object represents a DOM element and provides various methods to get the attributes of that element.

Example

Create the following Java program using any editor of your choice in, say, C:\jsoup.

JsoupTester.java


import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupTester {
   public static void main(String[] args) {
      String html = "<html><head><title>Sample Title</title></head>"
         + "<body>"
         + "<p>Sample Content</p>"
         + "<span id='sampleDiv'><a href='www.google.com'>Google</a>"
         + "<h3><a>Sample</a></h3>"
         + "</span>"
         + "</body></html>";
      Document document = Jsoup.parse(html);

      // first a element in the document
      Element link = document.select("a").first();

      System.out.println("Href: " + link.attr("href"));
   }
}

Verify the result

Compile the class using javac compiler as follows:


C:\jsoup>javac JsoupTester.java

Now run JsoupTester to see the result.


C:\jsoup>java JsoupTester

See the result.


Href: www.google.com
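
Beyond attr(), an Element can also report whether an attribute exists and list all of its attributes. The following sketch uses hasAttr() and attributes() for that. AttributeSketch, firstLink, and the sample markup are illustrative names and data, not part of the example above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Attribute;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AttributeSketch {
   // Parses a one-link snippet and returns that link.
   static Element firstLink() {
      Document document = Jsoup.parse(
         "<a id='googleA' href='www.google.com'>Google</a>");
      return document.select("a").first();
   }

   public static void main(String[] args) {
      Element link = firstLink();
      System.out.println("Has href: " + link.hasAttr("href"));
      System.out.println("Has title: " + link.hasAttr("title"));
      // attr() returns an empty string, not null, for a missing attribute.
      System.out.println("Title is empty: " + link.attr("title").isEmpty());
      // Walk every attribute present on the element.
      for (Attribute attribute : link.attributes()) {
         System.out.println(attribute.getKey() + " = " + attribute.getValue());
      }
   }
}
```

Checking hasAttr() first is a simple way to tell a missing attribute apart from one that is present but empty.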

jsoup - Extract Text

The following example showcases the use of methods to get text after parsing an HTML String into a Document object.

Syntax


Document document = Jsoup.parse(html);
Element link = document.select("a").first();
System.out.println("Text: " + link.text());

Where

    document − document object represents the HTML DOM.

    Jsoup − main class to parse the given HTML String.

    html − HTML String.

    link − Element object representing the anchor tag.

    link.text() − the text() method retrieves the text of the element.

Description

The Element object represents a DOM element and provides various methods to get the text of that element.

Example

Create the following Java program using any editor of your choice in, say, C:\jsoup.

JsoupTester.java


import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupTester {
   public static void main(String[] args) {
      String html = "<html><head><title>Sample Title</title></head>"
         + "<body>"
         + "<p>Sample Content</p>"
         + "<span id='sampleDiv'><a href='www.google.com'>Google</a>"
         + "<h3><a>Sample</a></h3>"
         + "</span>"
         + "</body></html>";
      Document document = Jsoup.parse(html);

      // first a element in the document
      Element link = document.select("a").first();

      System.out.println("Text: " + link.text());
   }
}

Verify the result

Compile the class using javac compiler as follows:


C:\jsoup>javac JsoupTester.java

Now run JsoupTester to see the result.


C:\jsoup>java JsoupTester

See the result.


Text: Google
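
One distinction worth knowing among the text methods is that text() includes the text of child elements, while ownText() returns only the text that belongs directly to the element. The sketch below contrasts the two; TextSketch, paragraph, and the sample markup are illustrative.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class TextSketch {
   // Returns a paragraph that mixes its own text with a nested <b> element.
   static Element paragraph() {
      return Jsoup.parse("<p>Hello <b>there</b> now!</p>").select("p").first();
   }

   public static void main(String[] args) {
      Element p = paragraph();
      // text() combines the element's text with that of all its children.
      System.out.println("text():    " + p.text());    // Hello there now!
      // ownText() skips the text inside child elements such as <b>.
      System.out.println("ownText(): " + p.ownText()); // Hello now!
   }
}
```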

jsoup - Extract HTML

The following example showcases the use of methods to get the inner HTML and outer HTML after parsing an HTML String into a Document object.

Syntax


Document document = Jsoup.parse(html);
Element link = document.select("a").first();

System.out.println("Outer HTML: " + link.outerHtml());
System.out.println("Inner HTML: " + link.html());

Where

    document − document object represents the HTML DOM.

    Jsoup − main class to parse the given HTML String.

    html − HTML String.

    link − Element object representing the anchor tag.

    link.outerHtml() − the outerHtml() method retrieves the complete HTML of the element, including its own tag.

    link.html() − the html() method retrieves the inner HTML of the element.

Description

The Element object represents a DOM element and provides various methods to get the HTML of that element.

Example

Create the following Java program using any editor of your choice in, say, C:\jsoup.

JsoupTester.java


import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupTester {
   public static void main(String[] args) {
      String html = "<html><head><title>Sample Title</title></head>"
         + "<body>"
         + "<p>Sample Content</p>"
         + "<span id='sampleDiv'><a href='www.google.com'>Google</a>"
         + "<h3><a>Sample</a></h3>"
         + "</span>"
         + "</body></html>";
      Document document = Jsoup.parse(html);

      // first a element in the document
      Element link = document.select("a").first();

      System.out.println("Outer HTML: " + link.outerHtml());
      System.out.println("Inner HTML: " + link.html());
   }
}

Verify the result

Compile the class using javac compiler as follows:


C:\jsoup>javac JsoupTester.java

Now run JsoupTester to see the result.


C:\jsoup>java JsoupTester

See the result.


Outer HTML: <a href="www.google.com">Google</a>
Inner HTML: Google

jsoup - Working with URLs

The following example showcases methods that return the relative as well as the absolute URLs present in an HTML page.

Syntax


String url = "http://www.tutorialspoint.com/";
Document document = Jsoup.connect(url).get();
Element link = document.select("a").first();

System.out.println("Relative Link: " + link.attr("href"));
System.out.println("Absolute Link: " + link.attr("abs:href"));
System.out.println("Absolute Link: " + link.absUrl("href"));

Where

    document − document object represents the HTML DOM.

    Jsoup − main class to connect to a URL and get the HTML content.

    link − Element object representing the anchor tag.

    link.attr("href") − provides the value of href present in the anchor tag. It may be relative or absolute.

    link.attr("abs:href") − provides the absolute URL after resolving against the document's base URI.

    link.absUrl("href") − provides the absolute URL after resolving against the document's base URI.

Description

The Element object represents a DOM element and provides methods to get the relative as well as the absolute URLs present in the HTML page.

Example

Create the following Java program using any editor of your choice in, say, C:\jsoup.

JsoupTester.java


import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupTester {
   public static void main(String[] args) throws IOException {
      String url = "http://www.tutorialspoint.com/";
      // The fetched document remembers the URL it came from as its base URI.
      Document document = Jsoup.connect(url).get();

      Element link = document.select("a").first();
      System.out.println("Relative Link: " + link.attr("href"));
      System.out.println("Absolute Link: " + link.attr("abs:href"));
      System.out.println("Absolute Link: " + link.absUrl("href"));
   }
}

Verify the result

Compile the class using javac compiler as follows:


C:\jsoup>javac JsoupTester.java

Now run JsoupTester to see the result.


C:\jsoup>java JsoupTester

See the result.


Relative Link: index.htm
Absolute Link: https://www.tutorialspoint.com/index.htm
Absolute Link: https://www.tutorialspoint.com/index.htm
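
The same resolution can be tried without a network connection by passing a base URI to Jsoup.parse(html, baseUri). The sketch below does that and also shows what happens when no base URI is available. BaseUriSketch, firstLink, the example.com address, and the sample link are made up for this illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class BaseUriSketch {
   // Parses a snippet containing one relative link against the given base URI.
   static Element firstLink(String baseUri) {
      String html = "<a href='guide/index.htm'>Guide</a>";
      return Jsoup.parse(html, baseUri).select("a").first();
   }

   public static void main(String[] args) {
      Element link = firstLink("https://example.com/docs/");
      System.out.println("Relative Link: " + link.attr("href"));
      // Both calls resolve the href against the base URI.
      System.out.println("Absolute Link: " + link.attr("abs:href"));
      System.out.println("Absolute Link: " + link.absUrl("href"));

      // With an empty base URI a relative link cannot be resolved,
      // so absUrl() returns an empty string.
      Element noBase = firstLink("");
      System.out.println("Without base URI: [" + noBase.absUrl("href") + "]");
   }
}
```

This is why a document parsed from a plain String, with no base URI supplied, yields empty results from absUrl() for relative links.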

jsoup - Set Attributes

The following example showcases the use of methods to set attributes of a DOM element, perform bulk updates, and add or remove classes after parsing an HTML String into a Document object.

Syntax


Document document = Jsoup.parse(html);
Element link = document.select("a").first();
link.attr("href","www.yahoo.com");
link.addClass("header");
link.removeClass("header");

Where

    document − document object represents the HTML DOM.

    Jsoup − main class to parse the given HTML String.

    html − HTML String.

    link − Element object representing the anchor tag.

    link.attr() − the attr(attribute,value) method sets the element's attribute to the given value.

    link.addClass() − the addClass(class) method adds the class to the class attribute.

    link.removeClass() − the removeClass(class) method removes the class from the class attribute.

Description

The Element object represents a DOM element and provides various methods to set the attributes of that element.

Example

Create the following Java program using any editor of your choice in, say, C:\jsoup.

JsoupTester.java


import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupTester {
   public static void main(String[] args) {
      String html = "<html><head><title>Sample Title</title></head>"
         + "<body>"
         + "<p>Sample Content</p>"
         + "<span id='sampleDiv'><a id='googleA' href='www.google.com'>Google</a></span>"
         + "<span class='comments'><a href='www.sample1.com'>Sample1</a>"
         + "<a href='www.sample2.com'>Sample2</a>"
         + "<a href='www.sample3.com'>Sample3</a>"
         + "</span>"
         + "<span id='imageDiv' class='header'><img name='google' src='google.png' />"
         + "<img name='yahoo' src='yahoo.jpg' />"
         + "</span>"
         + "</body></html>";
      Document document = Jsoup.parse(html);

      // Example: set attribute
      Element link = document.getElementById("googleA");
      System.out.println("Outer HTML Before Modification :"  + link.outerHtml());
      link.attr("href","www.yahoo.com");
      System.out.println("Outer HTML After Modification :"  + link.outerHtml());
      System.out.println("---");
      // Example: add class (to the link, shown through its parent span)
      Element span = document.getElementById("sampleDiv");
      System.out.println("Outer HTML Before Modification :"  + span.outerHtml());
      link.addClass("header");
      System.out.println("Outer HTML After Modification :"  + span.outerHtml());
      System.out.println("---");
      // Example: remove class
      Element span1 = document.getElementById("imageDiv");
      System.out.println("Outer HTML Before Modification :"  + span1.outerHtml());
      span1.removeClass("header");
      System.out.println("Outer HTML After Modification :"  + span1.outerHtml());
      System.out.println("---");
      // Example: bulk update of every link inside span.comments
      Elements links = document.select("span.comments a");
      System.out.println("Outer HTML Before Modification :"  + links.outerHtml());
      links.attr("rel", "nofollow");
      System.out.println("Outer HTML After Modification :"  + links.outerHtml());
   }
}

Verify the result

Compile the class using javac compiler as follows:


C:\jsoup>javac JsoupTester.java

Now run JsoupTester to see the result.


C:\jsoup>java JsoupTester

See the result.


Outer HTML Before Modification :<a id="googleA" href="www.google.com">Google</a>
Outer HTML After Modification :<a id="googleA" href="www.yahoo.com">Google</a>
---
Outer HTML Before Modification :<span id="sampleDiv">
 <a id="googleA" href="www.yahoo.com">Google</a>
</span>
Outer HTML After Modification :<span id="sampleDiv">
 <a id="googleA" href="www.yahoo.com" class="header">Google</a>
</span>
---
Outer HTML Before Modification :<span id="imageDiv" class="header">
 <img name="google" src="google.png">
 <img name="yahoo" src="yahoo.jpg">
</span>
Outer HTML After Modification :<span id="imageDiv" class="">
 <img name="google" src="google.png">
 <img name="yahoo" src="yahoo.jpg">
</span>
---
Outer HTML Before Modification :<a href="www.sample1.com">Sample1</a>
<a href="www.sample2.com">Sample2</a>
<a href="www.sample3.com">Sample3</a>
Outer HTML After Modification :<a href="www.sample1.com" rel="nofollow">Sample1</a>
<a href="www.sample2.com" rel="nofollow">Sample2</a>
<a href="www.sample3.com" rel="nofollow">Sample3</a>

jsoup - Set HTML

The following example showcases the use of methods to set, prepend, or append HTML to a DOM element after parsing an HTML String into a Document object.

Syntax


Document document = Jsoup.parse(html);
Element span = document.getElementById("sampleDiv");
span.html("<p>This is a sample content.</p>");
span.prepend("<p>Initial Text</p>");
span.append("<p>End Text</p>");

Where

    document − document object represents the HTML DOM.

    Jsoup − main class to parse the given HTML String.

    html − HTML String.

    span − Element object representing the element with the id "sampleDiv".

    span.html() − the html(content) method replaces the element's inner HTML with the given content.

    span.prepend() − the prepend(content) method adds the content at the start of the element's inner HTML.

    span.append() − the append(content) method adds the content at the end of the element's inner HTML.

Description

The Element object represents a DOM element and provides various methods to set, prepend, or append HTML to that element.

Example

Create the following Java program using any editor of your choice in, say, C:\jsoup.

JsoupTester.java


import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupTester {
   public static void main(String[] args) {
      String html = "<html><head><title>Sample Title</title></head>"
         + "<body>"
         + "<span id='sampleDiv'><a id='googleA' href='www.google.com'>Google</a></span>"
         + "</body></html>";
      Document document = Jsoup.parse(html);

      Element span = document.getElementById("sampleDiv");
      System.out.println("Outer HTML Before Modification :\n" + span.outerHtml());
      // Replace everything inside the span.
      span.html("<p>This is a sample content.</p>");
      System.out.println("Outer HTML After Modification :\n" + span.outerHtml());
      // Insert as the first child of the span.
      span.prepend("<p>Initial Text</p>");
      System.out.println("After Prepend :\n" + span.outerHtml());
      // Insert as the last child of the span.
      span.append("<p>End Text</p>");
      System.out.println("After Append :\n" + span.outerHtml());
   }
}

Verify the result

Compile the class using javac compiler as follows:


C:\jsoup>javac JsoupTester.java

Now run JsoupTester to see the result.


C:\jsoup>java JsoupTester

See the result.


Outer HTML Before Modification :
<span id="sampleDiv">
 <a id="googleA" href="www.google.com">Google</a>
</span>
Outer HTML After Modification :
<span id="sampleDiv">
 <p>This is a sample content.</p>
</span>
After Prepend :
<span id="sampleDiv">
 <p>Initial Text</p>
 <p>This is a sample content.</p>
</span>
After Append :
<span id="sampleDiv">
 <p>Initial Text</p>
 <p>This is a sample content.</p>
 <p>End Text</p>
</span>

jsoup - Set Text Content

The following example showcases the use of methods to set, prepend, or append text to a DOM element after parsing an HTML String into a Document object.

Syntax


Document document = Jsoup.parse(html);
Element span = document.getElementById("sampleDiv");
span.text("This is a sample content.");
span.prepend("Initial Text.");
span.append("End Text.");

Where

    document − document object represents the HTML DOM.

    Jsoup − main class to parse the given HTML String.

    html − HTML String.

    span − Element object representing the element with the id "sampleDiv".

    span.text() − the text(content) method replaces the element's content with the given text.

    span.prepend() − the prepend(content) method adds the content at the start of the element's content.

    span.append() − the append(content) method adds the content at the end of the element's content.

Description

The Element object represents a DOM element and provides various methods to set, prepend, or append text to that element.

Example

Create the following Java program using any editor of your choice in, say, C:\jsoup.

JsoupTester.java


import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupTester {
   public static void main(String[] args) {
      String html = "<html><head><title>Sample Title</title></head>"
         + "<body>"
         + "<span id='sampleDiv'><a id='googleA' href='www.google.com'>Google</a></span>"
         + "</body></html>";
      Document document = Jsoup.parse(html);

      Element span = document.getElementById("sampleDiv");
      System.out.println("Outer HTML Before Modification :\n" + span.outerHtml());
      // Replace the span's content with plain text.
      span.text("This is a sample content.");
      System.out.println("Outer HTML After Modification :\n" + span.outerHtml());
      span.prepend("Initial Text.");
      System.out.println("After Prepend :\n" + span.outerHtml());
      span.append("End Text.");
      System.out.println("After Append :\n" + span.outerHtml());
   }
}

Verify the result

Compile the class using javac compiler as follows −


C:\jsoup>javac JsoupTester.java

Now run JsoupTester to see the result.


C:\jsoup>java JsoupTester

See the result.


Outer HTML Before Modification :
<span id="sampleDiv">
   <a id="googleA" href="www.google.com">Google</a>
</span>
Outer HTML After Modification :
<span id="sampleDiv">
   This is a sample content.
</span>
After Prepend :
<span id="sampleDiv">
   Initial Text.This is a sample content.
</span>
After Append :
<span id="sampleDiv">
   Initial Text.This is a sample content.End Text.
</span>
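
A practical difference between text(content) and html(content) is how markup in the argument is treated: text() stores it as literal characters, while html() parses it into child elements. The sketch below passes the same string to both. TextVersusHtmlSketch, freshSpan, and the sample string are illustrative names and data.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TextVersusHtmlSketch {
   // Returns a newly parsed span so each experiment starts from the same state.
   static Element freshSpan() {
      Document document = Jsoup.parse("<span id='sampleDiv'>Old</span>");
      return document.getElementById("sampleDiv");
   }

   public static void main(String[] args) {
      // text(): the angle brackets are kept as text and escaped on output.
      Element asText = freshSpan();
      asText.text("<b>bold</b>");
      System.out.println("Children after text(): " + asText.children().size());
      System.out.println("Inner HTML after text(): " + asText.html());

      // html(): the same string is parsed, producing a real <b> child element.
      Element asHtml = freshSpan();
      asHtml.html("<b>bold</b>");
      System.out.println("Children after html(): " + asHtml.children().size());
      System.out.println("Inner HTML after html(): " + asHtml.html());
   }
}
```

When the content comes from users, text() is usually the safer choice because any tags in it are never turned into elements.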

jsoup - Sanitize HTML

The following example showcases the prevention of XSS (cross-site scripting) attacks.

Syntax


String safeHtml = Jsoup.clean(html, Safelist.basic());

Where

    Jsoup − main class to clean the given HTML String.

    html − Initial HTML String.

    safeHtml − Cleaned HTML.

    Safelist − Object that provides default configurations to safeguard HTML.

    clean() − cleans the HTML using the given Safelist.

Description

The Jsoup class sanitizes HTML using Safelist configurations.

Example

Create the following Java program using any editor of your choice in, say, C:\jsoup.

JsoupTester.java


import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

public class JsoupTester {
   public static void main(String[] args) {
      // The onclick handler is the untrusted part of this markup.
      String html = "<p><a href='http://example.com/'"
         + " onclick='checkData()'>Link</a></p>";

      System.out.println("Initial HTML: " + html);
      String safeHtml = Jsoup.clean(html, Safelist.basic());
      System.out.println("Cleaned HTML: " + safeHtml);
   }
}

Verify the result

Compile the class using javac compiler as follows −


C:\jsoup>javac JsoupTester.java

Now run JsoupTester to see the result.


C:\jsoup>java JsoupTester

See the result.


Initial HTML: <p><a href='http://example.com/' onclick='checkData()'>Link</a></p>
Cleaned HTML: <p><a href="http://example.com/" rel="nofollow">Link</a></p>
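
Safelist.basic() is one of several preset configurations. The sketch below compares it with the stricter Safelist.none(), and uses Jsoup.isValid() to check input without modifying it. SanitizeSketch, the UNTRUSTED constant, and the sample markup are made up for this illustration, and it assumes a jsoup version that ships the Safelist class, as jsoup-1.14.3 does.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

public class SanitizeSketch {
   // Sample input with an event handler attribute and a script element.
   static final String UNTRUSTED =
      "<p onclick='steal()'>Hello <b>World</b><script>alert(1)</script></p>";

   // Cleans the sample input against the given safelist.
   static String clean(Safelist safelist) {
      return Jsoup.clean(UNTRUSTED, safelist);
   }

   public static void main(String[] args) {
      // none() keeps text only; every tag is removed.
      System.out.println("none():  " + clean(Safelist.none()));
      // basic() keeps simple text formatting tags but drops the
      // onclick attribute and the script element.
      System.out.println("basic(): " + clean(Safelist.basic()));
      // isValid() reports whether the input already satisfies the safelist.
      System.out.println("valid?   " + Jsoup.isValid(UNTRUSTED, Safelist.basic()));
   }
}
```

Choosing the strictest safelist that still permits the formatting an application needs keeps the set of allowed markup as small as possible.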