
Convert Data To Dataframe In Python

With the help of @JaSON, here's code that lets me get the data in the table from a local HTML file. The code uses Selenium: from selenium import webdriver driver = webdriver.Ch

Solution 1:

I have modified your code to produce a simple output. This is not very Pythonic, as it does not build the DataFrame in a vectorized way, but here is how it works. First, import pandas. Second, set up an empty DataFrame (we don't know the columns yet). Third, set up the columns on the first pass (this will cause problems if rows have different numbers of columns). Finally, put the values into the DataFrame.

import pandas as pd
from selenium import webdriver

driver = webdriver.Chrome("C:/chromedriver.exe")
driver.get('file:///C:/Users/Future/Desktop/local.html')
counter = len(driver.find_elements_by_id("Section3"))
xpath = "//div[@id='Section3']/following-sibling::div[count(preceding-sibling::div[@id='Section3'])={0} and count(following-sibling::div[@id='Section3'])={1}]"
print(counter)

df = pd.DataFrame()  # empty frame; the columns are set on the first pass

for i in range(counter):
    print('\nRow #{} \n'.format(i + 1))
    _xpath = xpath.format(i + 1, counter - (i + 1))
    cells = driver.find_elements_by_xpath(_xpath)
    if i == 0:
        # first pass: create one column per cell, labelled 0..n-1
        df = pd.DataFrame(columns=range(len(cells)))
    for j, cell in enumerate(cells):
        value = cell.find_element_by_xpath(".//td").text
        # print(value)
        if value:  # check the string is not empty
            df.at[i, j] = value  # put the value in row i, column j

df.to_csv('filename.txt') # output the dataframe to a file

This could be improved by collecting the items of each row in a dictionary and putting the dictionaries into the DataFrame, but I am writing this on my phone, so I cannot test that.
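The dictionary idea could look roughly like the sketch below. It leaves Selenium out and uses a hard-coded `scraped_rows` list as a stand-in for the cell texts collected in each row; the sample values and the `col_N` labels are assumptions for illustration, not something taken from the original page.

```python
import pandas as pd

# Hypothetical stand-in for the text of the cells found in each row.
# In the real script these strings would come from the Selenium loop.
scraped_rows = [
    ["a1", "b1", "c1"],
    ["a2", "b2"],  # a shorter row, to show variable lengths
]

records = []
for cells in scraped_rows:
    # One dictionary per row: key = column label, value = cell text.
    record = {"col_{}".format(j): text for j, text in enumerate(cells)}
    records.append(record)

# Build the frame in one call; keys missing from a row become NaN.
df = pd.DataFrame(records)
print(df)
```

Building the frame once from a list of dictionaries avoids growing it cell by cell, and rows with fewer cells are simply padded with NaN.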

Solution 2:

With the great help of @Paul Brennan, I was able to modify the code to get the desired final output:

import pandas as pd
from selenium import webdriver

driver = webdriver.Chrome("C:/chromedriver.exe")
driver.get('file:///C:/Users/Future/Desktop/local.html')
counter = len(driver.find_elements_by_id("Section3"))
xpath = "//div[@id='Section3']/following-sibling::div[count(preceding-sibling::div[@id='Section3'])={0} and count(following-sibling::div[@id='Section3'])={1}]"
finallist = []

for i in range(counter):
    #print('\nRow #{} \n'.format(i + 1))
    rowlist=[]
    _xpath = xpath.format(i + 1, counter - (i + 1))
    cells = driver.find_elements_by_xpath(_xpath)
    #if i == 0:
        #df = pd.DataFrame(columns=cells) # fill the dataframe with the column names
    for cell in cells:
        try:
            value = cell.find_element_by_xpath(".//td").text
            rowlist.append(value)
        except Exception:  # e.g. this div contains no <td>; stop reading the row
            break
    finallist.append(rowlist)
    
df = pd.DataFrame(finallist)
df = df[df.columns[[2, 0, 1, 7, 9, 8, 3, 5, 6, 4]]]  # reorder the columns; assign the result to keep it
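A note on the last line of the script: indexing a DataFrame with a list of column labels returns a new frame and leaves the original untouched, so the result has to be assigned to be kept. The small sketch below shows this on a made-up three-column frame (the data and the `order` list are illustrative only), together with the equivalent positional form `iloc`.

```python
import pandas as pd

# Made-up frame with the default column labels 0, 1, 2.
df = pd.DataFrame([["a", "b", "c"], ["d", "e", "f"]])

order = [2, 0, 1]  # desired column positions

# Same pattern as in the script: pick the labels by position, then index.
reordered = df[df.columns[order]]

# Equivalent positional selection.
same = df.iloc[:, order]

print(reordered)
print(list(df.columns))  # the original frame keeps its column order
```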

The code works well now but it is too slow. Is there a way to make it faster?
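One possible reason for the slowness, offered as a guess rather than a measurement: every loop iteration asks the browser to evaluate an XPath that counts siblings, and every cell costs one more round-trip to the driver. The XPath selects the `div` siblings that sit between one `Section3` marker and the next, so the same grouping can be done in a single pass over the siblings. The sketch below shows only that grouping logic on a hypothetical list of `(div_id, text)` pairs; `group_rows` and the sample data are made up for illustration, and collecting the pairs in one go (for example by parsing `driver.page_source` once) is left out.

```python
def group_rows(siblings, marker_id="Section3"):
    """Split a flat list of (div_id, text) pairs into rows.

    A div whose id equals marker_id opens a new row; the divs that
    follow it, up to the next marker, supply that row's cell texts.
    """
    rows = []
    for div_id, text in siblings:
        if div_id == marker_id:
            rows.append([])        # a marker opens a new row
        elif rows:
            rows[-1].append(text)  # divs before the first marker are ignored
    return rows


# Hypothetical sibling divs in document order.
siblings = [
    ("Section3", ""), ("", "a1"), ("", "b1"),
    ("Section3", ""), ("", "a2"), ("", "b2"), ("", "c2"),
]

rows = group_rows(siblings)
print(rows)
```

The resulting list of lists has the same shape as `finallist` in the script, so it could be passed to `pd.DataFrame` in the same way.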
