All about JSoup To Fetch Information From HTML
For the last few days I was trying to get data from an HTML page that is very dynamic in nature.
In other words, I needed a good HTML parser to read HTML files.
I tried a few readily available solutions, and I am happy to say that among them JSoup is quite satisfactory.
JSoup A Nice Initiative To Fetch Information From HTML
Let me tell you what is good about it.
- It is a Java library that can easily be added to a project in a leading Java IDE (I tried JDeveloper and Eclipse).
- It can be used to fetch and manipulate HTML data.
- It is very useful for report analysis.
- It can find the exact data you need in a few easy steps.
- Very little code is required.
- It is useful for both structured and unstructured HTML.
- It is open source and the code is available on GitHub.
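To give a feel for how little code is needed, here is a minimal sketch that parses an HTML string, reads the text of one element, and changes it. The class name, the helper methods, and the sample HTML are made up for illustration; only the Jsoup.parse, select, first, and text calls come from the library.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class, not part of JSoup itself.
class JsoupBasics {

    // Parses an HTML string and returns the text of the first element
    // matching the CSS selector, or null when nothing matches.
    static String firstText(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        Element match = doc.select(cssQuery).first();
        return match == null ? null : match.text();
    }

    // Replaces the text of the first matching element (if any) and
    // returns the visible text of the whole body afterwards.
    static String replaceFirstText(String html, String cssQuery, String newText) {
        Document doc = Jsoup.parse(html);
        Element match = doc.select(cssQuery).first();
        if (match != null) {
            match.text(newText);
        }
        return doc.body().text();
    }

    public static void main(String[] args) {
        // Made-up sample markup
        String html = "<p id=\"status\">Build passed</p><p>Done</p>";
        System.out.println(firstText(html, "p#status"));
        System.out.println(replaceFirstText(html, "p#status", "Build failed"));
    }
}
```

The same select call accepts any CSS-style selector, so the helper works unchanged for tables, links, or headings.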
The documentation is available here.
The source code is available here.
Very nice examples and discussions can be found at the links below.
- http://www.mkyong.com/java/jsoup-html-parser-hello-world-examples/
- http://stackoverflow.com/questions/tagged/jsoup
- http://stackoverflow.com/questions/15853002/extract-and-parse-html-table-using-jsoup
- http://stackoverflow.com/questions/6236972/jsoup-second-element-instead-of-first
- http://java.dzone.com/articles/htmlunit-vs-jsoup-html-parsing
- http://stackoverflow.com/questions/12361925/html-parsing-with-jsoup
The JAR can be downloaded from here.
How To Read Data From HTML Via JSoup In Java?
I have a requirement where a URL is provided to me, and the page at that URL contains multiple tables. Table 1 holds the summary report and table 2 holds the detailed report.
My objective is to get data from the first table.
I found the table's class by viewing the page source. The selector for it is table[class=details]
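A small sketch of how that selector behaves may help here. The sample markup and the class and method names below are assumptions for illustration, not the real report page. Note that table[class=details] matches only when the whole class attribute equals "details", while the shorter table.details also matches a table that carries additional classes.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical demo class; the markup is a made-up stand-in for the report page.
class SelectDetailsTable {

    static final String SAMPLE =
        "<table class=\"details\"><tr><th>Tests</th></tr><tr><td>12</td></tr></table>"
      + "<table class=\"details wide\"><tr><td>other</td></tr></table>";

    // Counts how many elements in the document match the selector.
    static int count(String html, String cssQuery) {
        return Jsoup.parse(html).select(cssQuery).size();
    }

    // Returns the text of the first td inside the first matching table,
    // or null when there is no such table or cell.
    static String firstCell(String html, String tableQuery) {
        Document doc = Jsoup.parse(html);
        Element table = doc.select(tableQuery).first();
        if (table == null) {
            return null;
        }
        Element cell = table.select("td").first();
        return cell == null ? null : cell.text();
    }

    public static void main(String[] args) {
        // Exact attribute match: only the first table qualifies
        System.out.println(count(SAMPLE, "table[class=details]"));
        // Class match: both tables carry the details class
        System.out.println(count(SAMPLE, "table.details"));
        System.out.println(firstCell(SAMPLE, "table[class=details]"));
    }
}
```

Calling first() on the result is what picks table 1 when several tables match.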
This table has one header row and one row of data. The header row gives me the column names.
As I am processing a test result, the columns include Total test, Pass, fail, time to execute, etc.
Let’s see how to fetch that info…
import java.io.IOException;
import java.net.URL;
import java.util.Iterator;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class readHTML {

    // Fetches the page at the given URL and prints the values of the summary table
    public void getValues(String url) throws IOException {
        URL getUrl = new URL(url);
        // Fetch and parse the page, with a timeout of 3000 ms
        Document doc = Jsoup.parse(getUrl, 3000);

        // As I want to fetch the table with the details class.
        // Note: first() returns null if the page has no such table.
        Element table = doc.select("table[class=details]").first();

        // This is for fetching header values
        Iterator<Element> iteh = table.select("th").iterator();
        String test = iteh.next().text();
        String fail = iteh.next().text();
        String err = iteh.next().text();
        String knownIss = iteh.next().text();
        String pass = iteh.next().text();
        String skip = iteh.next().text();
        String suc_rate = iteh.next().text();
        String time = iteh.next().text();

        // This is for fetching row values
        Iterator<Element> ite = table.select("td").iterator();
        String testV = ite.next().text();
        String failV = ite.next().text();
        String errorV = ite.next().text();
        String knownIssueV = ite.next().text();
        String passV = ite.next().text();
        String skipV = ite.next().text();
        // Keep only the number between the ":" and the "%" in the success rate cell
        String sucv = ite.next().text().split(":")[1].split("%")[0].trim();
        String timeV = ite.next().text();

        System.out.println("Value of: " + test + " is " + testV);
        System.out.println("Value of: " + fail + " is " + failV);
        System.out.println("Value of: " + err + " is " + errorV);
        System.out.println("Value of: " + knownIss + " is " + knownIssueV);
        System.out.println("Value of: " + pass + " is " + passV);
        System.out.println("Value of: " + skip + " is " + skipV);
        System.out.println("Value of: " + suc_rate + " is " + sucv);
        System.out.println("Value of: " + time + " is " + timeV);
    }
}
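If the number of columns can change, one possible variation is to pair each header with its value in a loop instead of calling next() once per column. This is only a sketch under assumptions: the class name, the method name, and the sample HTML string are invented here so the example runs without a network call, and the real report page may look different.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Hypothetical variation on the class above; works on an HTML string.
class SummaryTable {

    // Made-up sample with one header row and one data row
    static final String SAMPLE =
        "<table class=\"details\">"
      + "<tr><th>Tests</th><th>Failures</th><th>Skipped</th><th>Time</th></tr>"
      + "<tr><td>10</td><td>1</td><td>0</td><td>1.5s</td></tr>"
      + "</table>";

    // Returns header -> value pairs from the first details table,
    // in column order. Returns an empty map when no table is found.
    static Map<String, String> readSummary(String html) {
        Map<String, String> summary = new LinkedHashMap<>();
        Element table = Jsoup.parse(html).select("table[class=details]").first();
        if (table == null) {
            return summary;
        }
        Elements headers = table.select("th");
        Elements cells = table.select("td");
        // Guard against a row that has fewer cells than headers, or vice versa
        int columns = Math.min(headers.size(), cells.size());
        for (int i = 0; i < columns; i++) {
            summary.put(headers.get(i).text(), cells.get(i).text());
        }
        return summary;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> entry : readSummary(SAMPLE).entrySet()) {
            System.out.println("Value of: " + entry.getKey() + " is " + entry.getValue());
        }
    }
}
```

A LinkedHashMap keeps the columns in the order they appear in the table, so the printed lines follow the report layout.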
I have successfully printed this info. Over to you. Try it and let me know what you think about JSoup.