omz:forum

frenchesco

@ccc Thanks for your response. It worked well, except that I couldn't get it to work without displaying the WebView, which I was hoping to avoid. I ended up using the method suggested by @JonB to get around this.

frenchesco

@JonB Thanks! That's exactly what I was looking for. For reference for anyone else that needs to do this in the future. Here's my final code:

# coding: utf-8
import ui
import threading

class Scraper (object):
	def __init__(self, url, js = 'document.documentElement.outerHTML'):
		self.wv = ui.WebView()
		self.wv.delegate = self
		self.wv.load_url(url)
		self.js = js
		self.response = ''
		self.ready_event = threading.Event()
		self.ready_event.wait()
	
	def webview_did_finish_load(self, webview):
		self.response = webview.eval_js(self.js)
		self.ready_event.set()

def main():
	r = Scraper('https://www.google.com', 'document.title;').response
	print 'Response: ' + r

if __name__ == '__main__':
	main()

I basically got rid of the callback and just get the final response when the processing returns to the main thread as that's where I want to continue processing.

frenchesco

I'm trying to implement a Web Scraper that uses ui.WebView to render a page and then return HTML from that page. The pages can be rendered with JavaScript so I can't use requests or mechanize like I normally would.

The problem I have at the moment with my current implementation (below) is that I want the main thread to wait until the page has finished rendering and then return the HTML of that page back to the main thread. At the moment the main thread finishes executing before the page has finished loading.

# coding: utf-8
import ui

class Scraper (object):
	def __init__(self, callback, url, js = 'document.documentElement.outerHTML'):
		self.wv = ui.WebView()
		self.wv.delegate = self
		self.wv.load_url(url)
		self.callback = callback
		self.js = js
	
	def webview_did_finish_load(self, webview):
		self.callback(webview.eval_js(self.js))

# Example:
def parse_response(response):
	print 'Webview finished loading - ' + response

def main():
	s = Scraper(parse_response, 'https://www.google.com', 'document.title;')
	# How can I wait for the Web View to finish loading here and return the HTML of the webpage before proceeding on the main thread?
	print 'Main thread finished executing'
if __name__ == '__main__':
	main()

Does anyone know the best way of resolving this issue?

frenchesco

@frenchesco

Latest posts made by frenchesco