Scraping a JavaScript object array from Website in Java. - Overclock.net - An Overclocking Community

post #1 of 3, 02-12-2018, 03:57 PM - Thread Starter
OverClockyNPC (New to Overclock.net)
Join Date: Feb 2018
Posts: 1
Rep: 0

I am trying to scrape a website in Java to extract some percentages from a table (see the photo I've uploaded).

These percentages only appear after the HTML source has been processed, so we know the elements are rendered via JavaScript, which makes scraping harder (oops, problem).

So this is the difference between the element BEFORE being rendered:

Code:
 <div class="user_forecasts" id="57464" />
and AFTER being rendered:

Code:
 <div class="user_forecasts" id="57464"> <b>1</b> <div class="percents">61% | 25% | 14%</div> </div>
Obviously, I want to get the "61% | 25% | 14%" string, and the rest of the percentages in the table.

Well, in fact, yes, it's rendered by JavaScript. I found the .js file and, luckily, the interesting part:

Code:
 // ajax user_forecast load - one call
 if ($('div.user_forecasts').length > 0) {
     $.ajax({
         url: '/vote/percentage',
         global: false,
         type: 'GET',
         data: { a: $('#jornadaq').val() },
         success: function(percentages) {
             perc_obj = eval(percentages);
             $('div.user_forecasts').each(function(ind, val) {
                 if (ind == 14) {
                     $(this).html("<b>" + perc_obj[ind].value + "</b><div class='percents'>" + perc_obj[ind].porcent + "%" + "</div>");
                 } else {
                     $(this).html("<b>" + perc_obj[ind].forecast + "</b><div class='percents'>" + perc_obj[ind].local + "% | " + perc_obj[ind].tie + "% | " + perc_obj[ind].visitor + "%" + "</div>");
                 }
             });
         }
     });
 }
As you can see, it's an AJAX call. I checked whether I could get the percentages by pasting this code into the Chrome DevTools console, and yes, I got what I wanted: the group of elements that contains the data I need for my program.

Please look at the screenshot I've uploaded (Chrome DevTools console).

The thing is, I don't know how to make this XMLHttpRequest from Java and then get the data. What libraries do you recommend for this, and how could I use them specifically for this case?



post #2 of 3, 03-08-2018, 04:02 AM
chunkII123 (C# Whisperer)
Join Date: Jan 2010
Location: Grand Rapids, MI
Posts: 200
Rep: 26 (Unique: 23)
Welcome to Overclock.net, OverClockyNPC!

The long and short of gathering this data from the website you mentioned is to use HttpURLConnection instead of a plain URLConnection. You will also need to add several request headers. In your specific case, you're going to need to send 'Content-Type', 'Accept', 'Referer', 'User-Agent', and 'X-Requested-With' with a GET request to "https://www.quiniela15.com/vote/percentage?a=44".
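
For what it's worth, here is a minimal sketch of that request using only the JDK (Java 11 or later). The endpoint and the header names come from the post above; the header values, the timeouts, and the names `PercentageRequest`, `open`, and `fetch` are assumptions made up for illustration, so check the real values in your browser's network tab. 'Content-Type' is set only because it is in the list; a GET without a body does not normally need it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

class PercentageRequest {

    // Endpoint as given in the post; the "a" parameter is the value of #jornadaq.
    static final String ENDPOINT = "https://www.quiniela15.com/vote/percentage";

    /** Builds the GET request. Nothing is sent until the response is read. */
    static HttpURLConnection open(String jornada) throws IOException {
        String query = "a=" + URLEncoder.encode(jornada, StandardCharsets.UTF_8);
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(ENDPOINT + "?" + query).toURL().openConnection();
        conn.setRequestMethod("GET");
        // Assumed header values, chosen to resemble a browser's jQuery AJAX call.
        conn.setRequestProperty("Accept", "application/json, text/javascript, */*; q=0.01");
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
        conn.setRequestProperty("Referer", "https://www.quiniela15.com/");
        conn.setRequestProperty("User-Agent", "Mozilla/5.0");
        conn.setRequestProperty("X-Requested-With", "XMLHttpRequest");
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        return conn;
    }

    /** Sends the request and returns the raw response body as text. */
    static String fetch(String jornada) throws IOException {
        HttpURLConnection conn = open(jornada);
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            return in.lines().collect(Collectors.joining("\n"));
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // "44" is the example value from the URL in the post.
        System.out.println(fetch(args.length > 0 ? args[0] : "44"));
    }
}
```

Keeping `open` separate from `fetch` lets you inspect the URL and headers without touching the network, which helps when comparing against what the browser sends.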

Alternatively, and depending on your use case, you may want to look into the Spring Framework for Java. It provides HTTP clients such as RestTemplate and the non-blocking WebClient, which may be beneficial depending on the scope of your application. However, if you're only looking to learn Java, and the point of your scraping application is to learn, then bringing in Spring would not be advised.

Hope the above info helps.

-Deuce

post #3 of 3, 08-07-2018, 11:51 PM
spinFX (New001)
Join Date: Feb 2016
Posts: 2,668
If this is still active, you could use Apache HttpClient to create the request (to the same URL as the AJAX request, quiniela15.com/vote/percentage) with the same data (the value contained in the node with id #jornadaq) and receive the response object.
Then you can parse the JSON response using the org.json library into a Java object (I'd assume a map of maps).
Once you work out how the response is formatted, write Java classes that mirror the levels of the tree in the JSON response, and use those class types to navigate down the map and populate your objects.
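
As a rough sketch of the parsing step: the example below turns the response text into a list of maps and rebuilds the string the site's JavaScript renders. To stay dependency-free it uses a small regex in place of org.json, which only copes with a flat array of objects holding simple values; for real use, a proper JSON library is the safer choice. The response shape is an assumption inferred from the field names in the JavaScript (forecast, local, tie, visitor, and value/porcent at index 14), the sample data is made up from the rendered element in the first post, and `PercentageParser`, `parse`, and `percents` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PercentageParser {

    // One flat {...} object; nested objects are not supported by this sketch.
    private static final Pattern OBJECT = Pattern.compile("\\{([^{}]*)\\}");
    // "key":"text" or "key":123 inside a single object (no escaped quotes).
    private static final Pattern FIELD =
            Pattern.compile("\"(\\w+)\"\\s*:\\s*(?:\"([^\"]*)\"|([^,\\s}]+))");

    /** Parses an assumed flat JSON array of objects into a list of key/value maps. */
    static List<Map<String, String>> parse(String json) {
        List<Map<String, String>> rows = new ArrayList<>();
        Matcher object = OBJECT.matcher(json);
        while (object.find()) {
            Map<String, String> row = new LinkedHashMap<>();
            Matcher field = FIELD.matcher(object.group(1));
            while (field.find()) {
                // Group 2 is a quoted value, group 3 an unquoted one (e.g. a number).
                String value = field.group(2) != null ? field.group(2) : field.group(3);
                row.put(field.group(1), value);
            }
            rows.add(row);
        }
        return rows;
    }

    /** Mirrors the site's JavaScript: index 14 shows one percentage, the rest show three. */
    static String percents(int index, Map<String, String> row) {
        if (index == 14) {
            return row.get("porcent") + "%";
        }
        return row.get("local") + "% | " + row.get("tie") + "% | " + row.get("visitor") + "%";
    }

    public static void main(String[] args) {
        // Made-up sample in the assumed response shape.
        String sample = "[{\"forecast\":\"1\",\"local\":61,\"tie\":25,\"visitor\":14}]";
        List<Map<String, String>> rows = parse(sample);
        for (int i = 0; i < rows.size(); i++) {
            // prints "1 -> 61% | 25% | 14%"
            System.out.println(rows.get(i).get("forecast") + " -> " + percents(i, rows.get(i)));
        }
    }
}
```

Once the real response format is known, the maps can be swapped for small classes with named fields, as suggested above.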