
Selenium / Python Notes June 2016

For the past 24 hours I’ve been jumping into some Selenium fun in Python.

Notes:

  • Firefox 47 doesn’t work with Selenium, but Firefox 47.0.1 does.
Selenium Setup


import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.proxy import *
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

def init_driver():
    driver = webdriver.Firefox()
    # Attach an explicit wait (5-second timeout) to the driver for later lookups
    driver.wait = WebDriverWait(driver, 5)
    return driver

def lookup(driver):
    driver.get("http://www.google.com")

if __name__ == "__main__":
    driver = init_driver()
    try:
        lookup(driver)
        time.sleep(5)  # keep the page open for a few seconds
    finally:
        driver.quit()  # always close the browser, even if something fails

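Two ways to find an element on the page. The first uses the explicit wait from init_driver(), so it keeps retrying for up to 5 seconds and raises TimeoutException if the element never appears. The second looks the element up once and raises NoSuchElementException if it isn't there yet.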


try:
    textbox = driver.wait.until(EC.presence_of_element_located((By.ID, "formTxtBox")))
except TimeoutException:
    print("Login Form Not Found!")

or

textbox = driver.find_element(By.ID, "formTxtBox")
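The practical difference between the two lookups above is that `driver.wait.until(...)` keeps re-checking the page, while the direct lookup checks only once. To illustrate the idea, here is a simplified, pure-Python sketch of a polling wait. It is not Selenium's actual implementation: `wait_until`, `timeout` and `poll` are hypothetical names, a plain `TimeoutError` stands in for Selenium's `TimeoutException`, and the 0.5-second default interval mirrors `WebDriverWait`'s default poll frequency.

```python
import time


def wait_until(condition, timeout=5.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Simplified illustration of an explicit wait (hypothetical helper,
    not Selenium's code): returns the first truthy result, or raises
    TimeoutError once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


if __name__ == "__main__":
    checks = []

    def element_present():
        # Simulate an element that only "appears" on the third check
        checks.append(1)
        return "formTxtBox" if len(checks) >= 3 else None

    print(wait_until(element_present, timeout=2, poll=0.1))  # formTxtBox
```

A one-shot lookup is the same thing without the loop: call the condition once and fail immediately if it comes back empty.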

Taking HTML and feeding it into BeautifulSoup


from bs4 import BeautifulSoup

div = driver.find_element(By.ID, "id_of_div_here")
# Hand the div's inner HTML to BeautifulSoup for easier parsing
soup = BeautifulSoup(div.get_attribute('innerHTML'), 'html.parser')
#Do something with the soup....

Creating a list of dates (requires pandas)


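import pandas as pd

def get_dates():  # illustrative function name, so that `return` is valid
    dates = []
    # 52 weekly dates, each a Wednesday, starting 1 July 2015
    index = pd.date_range('2015-7-1', periods=52, freq='W-WED')
    for i in index:
        # Convert each pandas Timestamp to a plain datetime.datetime
        dates.append(i.to_pydatetime())
    return dates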
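If pandas isn't installed, the same list can be built with just the standard library. This is a minimal sketch, assuming the goal is what `freq='W-WED'` produces above: consecutive Wednesdays starting from the first Wednesday on or after the start date (1 July 2015 happens to be a Wednesday). `weekly_wednesdays` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

WEDNESDAY = 2  # datetime.weekday(): Monday == 0 ... Sunday == 6


def weekly_wednesdays(start, periods=52):
    """Return `periods` consecutive Wednesdays (as midnight datetimes),
    beginning with the first Wednesday on or after `start`.
    Hypothetical stdlib-only stand-in for pd.date_range(..., freq='W-WED')."""
    start = datetime(start.year, start.month, start.day)  # drop any time of day
    offset = (WEDNESDAY - start.weekday()) % 7  # days to the next Wednesday (0 if already one)
    first = start + timedelta(days=offset)
    return [first + timedelta(weeks=i) for i in range(periods)]


dates = weekly_wednesdays(datetime(2015, 7, 1))
print(dates[0].date(), dates[-1].date(), len(dates))  # 2015-07-01 2016-06-22 52
```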

Convert Table into CSV with BeautifulSoup

(there are probably 100 better ways to do this…)

import csv
from bs4 import BeautifulSoup

#Get the element of the table using CSS Selector
table = driver.wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "#table > tbody")))
#Take our table and pass it into BeautifulSoup so we can easily traverse it.
soup = BeautifulSoup(table.get_attribute('innerHTML'), 'html.parser')
#Make an empty list
rows = []
#Loop over the soup: one list of cell texts per <tr>
for row in soup.find_all('tr'):
    # strip() removes surrounding whitespace, including newlines and non-breaking spaces
    rows.append([val.text.strip() for val in row.find_all('td')])

#Save out to CSV (append mode), skipping rows that have no <td> cells.
with open('output_file.csv', 'a', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerows(row for row in rows if row)
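For a dependency-free variant, the sketch below swaps BeautifulSoup for the standard library's `html.parser.HTMLParser`: it pulls the cell text out of a `<tbody>` innerHTML string and writes the non-empty rows as CSV. It's a simplified illustration assuming a plain table with no nested tables; `TableRowParser`, `table_html_to_csv` and the sample HTML are all hypothetical.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &nbsp; into '\xa0'
        self.rows = []
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # str.strip() also removes non-breaking spaces ('\xa0') in Python 3
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_html_to_csv(inner_html):
    """Return CSV text for every row that contains at least one <td>."""
    parser = TableRowParser()
    parser.feed(inner_html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out)
    # Rows made only of <th> cells come out as empty lists and are skipped
    writer.writerows(row for row in parser.rows if row)
    return out.getvalue()


# Made-up sample standing in for table.get_attribute('innerHTML')
sample = "<tr><th>Name</th></tr><tr><td>Alice</td><td>&nbsp;42\n</td></tr>"
print(table_html_to_csv(sample))  # Alice,42
```

Writing into an `io.StringIO` keeps the sketch testable; swapping it for `open('output_file.csv', 'a', newline='', encoding='utf-8')` gives the file-based behaviour of the snippet above.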

Disclaimer: I’ve been programming in Python for about 5 minutes and using Selenium for about 2, so while this code worked for my project, it’s most likely littered with errors and bad practices. This is mostly a reference for myself, in case I need to refer back to this stuff in 6-12 months’ time. I am definitely not the person to ask for help when it comes to this stuff.