Working with HTML Tables using Python Selenium with Pytest
Last updated: Oct 24, 2024
Working with HTML tables is a crucial aspect of web automation and testing, especially when you’re dealing with dynamic web content.
In this article, we’ll explore how to interact with HTML tables using Python Selenium in combination with Pytest, a powerful testing framework. Whether you’re extracting data, validating table content, or automating tasks like pagination, Selenium offers a robust set of tools to simplify the process.
Without any delay, let us get started!
Table of contents
- Working with HTML Tables using Python Selenium with Pytest
- The Power of the Combination
- Web Scraping and Data Extraction:
- Web Testing and Automation:
- Data Entry and Automation:
- Web Monitoring and Scraping:
- Key Considerations and Challenges
- Project Structure
- Code & its description
- Import The Necessary Modules
- Create an Instance of the SumanHTMLTable Class
- Define the Test Class
- Test Case 1: Start Python Selenium Automation
- Test Case 2: Positive Test Case - Data Validation
- Test Case 3: Stop Python Selenium Automation
- Conclusion
Working with HTML Tables using Python Selenium with Pytest
Before diving into the practical aspects of using Python Selenium and HTML tables together, let’s establish a solid foundation.
Python: A versatile and widely used programming language, Python is known for its readability and ease of use. Its extensive ecosystem of libraries and frameworks makes it a popular choice for various tasks, including web automation.
Selenium: A powerful toolset designed for automating web browsers, Selenium enables you to interact with web applications as a real user would. It supports multiple programming languages, including Python, and can be used to perform actions like clicking buttons, filling forms, and extracting data from web pages.
HTML Tables: A fundamental structure in HTML (HyperText Markup Language) for organizing data into rows and columns. They are commonly used to display tabular information in a structured and visually appealing manner.
The Power of the Combination
When combined, Python Selenium and HTML tables offer a robust solution for automating various web-related tasks. Here’s a breakdown of their key benefits:
1. Web Scraping and Data Extraction:
- Efficiently extract data: Selenium allows you to navigate web pages, locate elements, and extract data from HTML tables.
- Handle dynamic content: Selenium can handle dynamic content, which is often a challenge for traditional web scraping methods.
- Clean and organize data: Extract data from HTML tables and transform it into a structured format for further analysis or processing.
2. Web Testing and Automation:
- Create automated tests: Selenium can be used to automate repetitive testing tasks, ensuring consistent quality and reducing manual effort.
- Test web applications: Verify the functionality of web applications by simulating user interactions and validating results.
- Identify and report issues: Automatically detect and report errors or inconsistencies in web applications.
3. Data Entry and Automation:
- Automate repetitive tasks: Automate data entry processes, saving time and reducing errors.
- Fill forms and submit data: Use Selenium to interact with web forms, fill in fields, and submit data.
- Integrate with other systems: Automate data transfer between web applications and other systems.
4. Web Monitoring and Scraping:
- Monitor websites: Track changes on websites and receive notifications when specific conditions are met.
- Scrape data regularly: Automatically extract data from websites on a scheduled basis.
- Analyze trends and insights: Use scraped data to gain valuable insights and make informed decisions.
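As a minimal sketch of the "clean and organize data" step above, assume the header and cell texts have already been read from the page (for instance from the .text of each th and td element). The helper name rows_to_records and the sample values are illustrative only and are not part of the project described later.

```python
from typing import Dict, List


def rows_to_records(headers: List[str], rows: List[List[str]]) -> List[Dict[str, str]]:
    """Pair each row's cell texts with the column headers.

    Hypothetical helper: `headers` and `rows` are plain strings that were
    already extracted from the table, so no browser is needed here.
    """
    records = []
    for row in rows:
        if len(row) != len(headers):
            # A ragged row usually means a colspan or a missing cell.
            raise ValueError(f"expected {len(headers)} cells, got {len(row)}: {row}")
        cleaned = [cell.strip() for cell in row]
        records.append(dict(zip(headers, cleaned)))
    return records


# Sample data invented for illustration only.
sample_headers = ["Name", "City"]
sample_rows = [[" Venkat ", "Chennai"], ["Asha", " Pune"]]
records = rows_to_records(sample_headers, sample_rows)
```

Each record is a plain dictionary keyed by column header, so it can be handed to a CSV writer or a data-analysis library without further reshaping.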
Key Considerations and Challenges
While the combination of Python Selenium and HTML tables offers numerous advantages, it’s important to consider the following:
- Website structure and design: The structure and design of the target website can impact the ease of extracting data or performing actions.
- Dynamic content and JavaScript: Dealing with dynamic content and JavaScript-heavy websites may require additional techniques and libraries.
- Ethical considerations: Ensure that your web automation activities comply with website terms of service and avoid excessive load on servers.
- Browser compatibility: Test your scripts across different browsers to ensure compatibility and avoid unexpected behavior.
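For the dynamic-content point above, the usual remedy is to wait until the table rows actually exist before reading them. Selenium provides WebDriverWait for this purpose. The sketch below hand-rolls the same polling idea instead, so that it depends only on a driver object with a Selenium-style find_elements method. The function name wait_for_rows and its default timings are assumptions made for illustration.

```python
import time


def wait_for_rows(driver, xpath, timeout=10.0, poll=0.5):
    """Poll until at least one element matches `xpath`, then return the matches.

    Hand-rolled stand-in for an explicit wait. `driver` is any object with a
    Selenium-style find_elements(by, value) method; "xpath" is the string
    value of Selenium's By.XPATH.
    """
    deadline = time.monotonic() + timeout
    while True:
        rows = driver.find_elements("xpath", xpath)
        if rows:
            return rows
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no element matched {xpath!r} within {timeout}s")
        time.sleep(poll)
```

A call such as wait_for_rows(driver, "//table[@id='table_1']/tbody/tr") would then block until the rows have been rendered or the timeout expires.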
In the Next Part…
In the following sections, we will delve deeper into the practical aspects of using Python Selenium and HTML tables. We will explore specific use cases, provide code examples, and discuss advanced techniques to overcome common challenges.
Project Structure
This project structure separates concerns for testing HTML table functionality: Python Selenium handles the browser automation and the Pytest framework runs the tests. Here is the breakdown:
Directories:
- PageObjects (folder): This folder stores reusable code for interacting with specific elements on the page under test. It follows a common practice in test automation frameworks where page elements are encapsulated for better maintainability and reusability.
- HTMLTable.py: This file contains the Python Selenium automation classes representing the HTML table on the page under test. These classes encapsulate methods for locating the table, counting its rows and columns, and checking for data inside it.
- __init__.py: This empty file is a convention in Python to signify that the folder (PageObjects) is a Python package.
- Reports (folder): This folder stores Pytest reports generated during testing in HTML format.
- TestCases (folder): This folder stores your actual test cases written with a Pytest testing framework.
- test_HTMLTable.py: This file contains the Python test methods representing individual test cases that verify the HTML table on the page under test. It imports the SumanHTMLTable class from PageObjects and uses its methods to test the table.
- __init__.py: Similar to PageObjects, this empty file signifies that TestCases is a Python package.
Overall, this structure follows the Page Object Model (POM):
- PageObjects: Reusable code for interacting with HTML elements.
- TestCases: Specific tests focused on verifying functionality.
- Reports: Output generated during testing.
This approach improves code organization and maintainability, and helps you write cleaner and more focused tests.
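To make the layout concrete, here is a small sketch that creates the same folders and empty files in a temporary directory and lists them. It assumes the three folders sit side by side under one project root, which the description implies but does not state. The function names scaffold_project and list_layout are hypothetical.

```python
from pathlib import Path
import tempfile


def scaffold_project(root: Path) -> None:
    """Create the folders and empty files described above under `root`."""
    for package in ("PageObjects", "TestCases"):
        (root / package).mkdir(parents=True, exist_ok=True)
        (root / package / "__init__.py").touch()
    (root / "PageObjects" / "HTMLTable.py").touch()
    (root / "TestCases" / "test_HTMLTable.py").touch()
    (root / "Reports").mkdir(exist_ok=True)


def list_layout(root: Path) -> list:
    """Return every path under `root`, relative to it, in sorted order."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


# Build the layout in a throwaway directory and record what was created.
with tempfile.TemporaryDirectory() as tmp:
    scaffold_project(Path(tmp))
    layout = list_layout(Path(tmp))
```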
Code & its description
The Python Selenium automation code is given below, followed by a description of each part:
"""
HTMLTable.py
Working with HTML Tables & Python Selenium XPATH
"""
from selenium import webdriver
from selenium.common import NoSuchElementException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager


# Python Class to hold Data for the Python Selenium Automation code
class Data:
    url = "https://suman-table.netlify.app/"


# Python Class to hold Web Element Locators for the Python Selenium Automation code
class Locators:
    table_xpath = "//table[@id='table_1']/tbody/tr/td"
    rows_finder = "//table[@id='table_1']/tbody/tr"
    column_finder = "//table[@id='table_1']/thead/tr/th"


# Main Execution Class for Python Selenium Automation
class SumanHTMLTable(Data, Locators):

    def __init__(self):
        self.driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

    # Method : To start Python Selenium Automation
    def start(self):
        self.driver.maximize_window()
        self.driver.get(self.url)
        self.driver.implicitly_wait(10)
        return True

    # Method : To Shutdown/Stop/Close Python Selenium Automation
    def shutdown(self):
        self.driver.quit()
        return None

    # Method : To count total rows in a HTML table
    def row_count(self):
        try:
            rows = self.driver.find_elements(by=By.XPATH, value=self.rows_finder)
            return len(rows)
        except NoSuchElementException as error:
            print(error)
            return False

    # Method : To count total columns in a HTML table
    def column_count(self):
        try:
            column_count = self.driver.find_elements(by=By.XPATH, value=self.column_finder)
            return len(column_count)
        except NoSuchElementException as error:
            print(error)
            return False

    # Method : To validate the presence of data inside HTML table
    # The browser is closed by shutdown(), not here, so this method can be called repeatedly.
    def data_validator(self, table_data):
        try:
            data_finder = "//table[@id='table_1']/tbody//child::tr//child::td[text()='{data}']".format(data=table_data)
            if self.driver.find_element(by=By.XPATH, value=data_finder):
                return True
        except NoSuchElementException as error:
            print(error)
            return False
Here’s a line-by-line breakdown of the Python code:
Lines 1-4:
- from selenium import webdriver – Imports the webdriver module from the selenium package. It provides the browser driver classes, such as webdriver.Chrome, that are used to control a web browser for automation purposes.
- from selenium.common import NoSuchElementException – Imports the NoSuchElementException class from the selenium.common package. This exception is raised when find_element cannot locate an element on the webpage.
- from selenium.webdriver.chrome.service import Service – Imports the Service class from the selenium.webdriver.chrome.service module. This class manages the ChromeDriver process that Selenium talks to.
- from selenium.webdriver.common.by import By – Imports the By class from the selenium.webdriver.common.by module. This class provides the supported strategies for locating web elements on a webpage, such as ID, XPath, and CSS selector.
- from webdriver_manager.chrome import ChromeDriverManager – Imports the ChromeDriverManager class from the webdriver_manager.chrome module, part of the third-party webdriver-manager package. This class automatically downloads and installs the appropriate Chrome driver for your system.
Lines 6-9:
- This defines a Python class named Data. It acts as a container to hold data relevant to the automation script. It has a single attribute:
- url = "https://suman-table.netlify.app/" – This attribute stores the URL of the web page containing the HTML table we want to work with.
Lines 11-14:
- This defines another Python class named Locators. It acts as a container to hold the XPath expressions used to locate specific elements on the webpage. It has three attributes:
- table_xpath = "//table[@id='table_1']/tbody/tr/td" – This XPath expression locates all the data cells (identified by the td tag) within the table with ID "table_1".
- rows_finder = "//table[@id='table_1']/tbody/tr" – This XPath expression locates all the rows (identified by the tr tag) within the body of the table with ID "table_1".
- column_finder = "//table[@id='table_1']/thead/tr/th" – This XPath expression locates all the header cells (identified by the th tag) within the table with ID "table_1".
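The locators above match every cell, row, or header at once. To address a single cell, the same pattern can be narrowed with positional predicates. The following is a sketch under the assumption that the table has a plain tbody/tr/td structure with no merged cells. The helper cell_xpath is hypothetical and not part of HTMLTable.py.

```python
def cell_xpath(table_id: str, row: int, column: int) -> str:
    """Build an XPath for one body cell; row and column are 1-based, as in XPath."""
    if row < 1 or column < 1:
        raise ValueError("XPath positions start at 1")
    return f"//table[@id='{table_id}']/tbody/tr[{row}]/td[{column}]"


# XPath for the first cell of the first body row of table_1.
first_cell = cell_xpath("table_1", 1, 1)
```

The resulting string can be passed to find_element with By.XPATH in the same way as the attributes of the Locators class.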
Lines 16-24:
- This defines the main class named SumanHTMLTable. It inherits from both Data and Locators classes, meaning it has access to the attributes defined in those classes. This class handles the main execution logic of the script.
- def __init__(self): – This is the constructor method that gets called when an instance of this class is created.
- self.driver = webdriver.Chrome(service=Service(ChromeDriverManager().install())) – This line creates a Chrome WebDriver instance. ChromeDriverManager().install() downloads a matching Chrome driver and returns its path, and the Service object uses that path to launch the driver.
- def start(self): – This method starts the automation process.
- self.driver.maximize_window() – Maximizes the browser window.
- self.driver.get(self.url) – Opens the URL stored in the url attribute.
- self.driver.implicitly_wait(10) – Sets an implicit wait of 10 seconds for the browser to locate elements before throwing an exception.
- return True – Returns True indicating successful execution.
- def shutdown(self): – This method closes the browser window and quits the WebDriver instance.
- self.driver.quit() – Quits the WebDriver instance.
- return None – Returns None explicitly, which the test suite later asserts on.
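Because start() and shutdown() are separate calls, a failure between them can leave a browser window open. One possible way to pair them is a context manager, sketched below. The name table_session is hypothetical, and the sketch only assumes an object with start() and shutdown() methods, which the SumanHTMLTable class provides.

```python
from contextlib import contextmanager


@contextmanager
def table_session(factory):
    """Start a page object and always shut it down, even if the body raises.

    `factory` is any zero-argument callable that returns an object with
    start() and shutdown() methods, such as the SumanHTMLTable class.
    """
    page = factory()
    try:
        page.start()
        yield page
    finally:
        page.shutdown()
```

With this in place, a block such as "with table_session(SumanHTMLTable) as page:" would open the browser on entry and close it on exit, whether or not the code inside the block raises.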
Lines 26-34:
- def row_count(self): – This method calculates the total number of rows in the HTML table.
- try: – Starts a try block to handle potential exceptions.
- rows = self.driver.find_elements(by=By.XPATH, value=self.rows_finder) – Finds all the elements matching the rows_finder XPath (which locates all table rows) using the find_elements method.
- return len(rows) – Returns the total number of rows found.
- except NoSuchElementException as error: – Guards against a NoSuchElementException. Note that find_elements returns an empty list when nothing matches, so a missing table normally shows up as a row count of 0 rather than as this exception.
- print(error) – Prints the error message.
- return False – Returns False if the exception was raised.
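In Selenium's Python bindings, find_elements reports "nothing found" by returning an empty list, so a stricter check has to test for emptiness explicitly. The sketch below shows one way to do that for both counts at once. The function table_shape and the choice of LookupError are assumptions for illustration. It accepts any object with a Selenium-style find_elements(by, value) method.

```python
ROWS_XPATH = "//table[@id='table_1']/tbody/tr"
COLUMNS_XPATH = "//table[@id='table_1']/thead/tr/th"


def table_shape(driver):
    """Return (row_count, column_count), raising if the table looks empty.

    `driver` is any object with a Selenium-style find_elements(by, value)
    method; "xpath" is the string value of Selenium's By.XPATH.
    """
    rows = driver.find_elements("xpath", ROWS_XPATH)
    columns = driver.find_elements("xpath", COLUMNS_XPATH)
    if not rows or not columns:
        # find_elements signals "nothing found" with an empty list.
        raise LookupError("table_1 has no rows or no header cells")
    return len(rows), len(columns)
```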
"""
test_HTMLTable.py
Pytest test cases for the HTML table page object
"""
from PageObjects.HTMLTable import SumanHTMLTable
import pytest

# Create the Object of the Class SumanHTMLTable()
suman = SumanHTMLTable()


# Test Scenario
class TestPresenceOfDataInsideHTMLTable:

    # Test Case : Start Python Selenium Automation
    def test_start_automation(self):
        assert suman.start() == True
        print("SUCCESS : Python Selenium Automation Started !")

    # Positive Test Case
    def test_data_inside_table_1(self):
        test_data = "Venkat"
        assert suman.data_validator(test_data) == True
        print("SUCCESS : {data} is present inside HTML table".format(data=test_data))

    # Test Case : Stop Python Selenium Automation
    def test_stop_automation(self):
        assert suman.shutdown() == None
        print("SUCCESS : Python Selenium Automation Stopped !")
Import The Necessary Modules
- from PageObjects.HTMLTable import SumanHTMLTable: Imports the SumanHTMLTable class from the PageObjects.HTMLTable module. This class contains the methods for interacting with the HTML table.
- import pytest: Imports the pytest testing framework, which is used for writing and running automated tests.
Create an Instance of the SumanHTMLTable Class
- suman = SumanHTMLTable(): Creates an object named suman of the SumanHTMLTable class. This object will be used to access the methods defined in the class.
Define the Test Class
- class TestPresenceOfDataInsideHTMLTable:: Defines a class named TestPresenceOfDataInsideHTMLTable. This class will contain the test cases.
Test Case 1: Start Python Selenium Automation
- def test_start_automation(self):: Defines a test case named test_start_automation.
- assert suman.start() == True: Asserts that the start() method of the suman object returns True. The method maximizes the browser window and navigates to the required webpage.
- print("SUCCESS : Python Selenium Automation Started !"): Prints a success message if the assertion passes.
Test Case 2: Positive Test Case – Data Validation
- def test_data_inside_table_1(self):: Defines a test case named test_data_inside_table_1.
- test_data = "Venkat": Sets a variable test_data to the value "Venkat".
- assert suman.data_validator(test_data) == True: Asserts that the data_validator() method of the suman object returns True when passed the test_data. The method checks whether a cell with exactly that text is present within the HTML table.
- print("SUCCESS : {data} is present inside HTML table".format(data=test_data)): Prints a success message if the assertion passes, indicating that the data is present.
Test Case 3: Stop Python Selenium Automation
- def test_stop_automation(self):: Defines a test case named test_stop_automation.
- assert suman.shutdown() == None: Asserts that the shutdown() method of the suman object returns None. The method quits the Selenium WebDriver.
- print("SUCCESS : Python Selenium Automation Stopped !"): Prints a success message if the assertion passes.
Conclusion
In this blog, we have explored the power of Python Selenium automation for validating HTML tables. By leveraging the flexibility and efficiency of Python, we were able to create a robust and reusable code snippet that can effectively check for the presence of specific data within HTML tables.
The provided XPath, written with the child:: axis, offers a versatile approach that can traverse and inspect various HTML table structures. However, its effectiveness depends on the target web application using proper HTML table semantics (table, thead, tbody, tr, td). If the HTML tables deviate from the standard structure, the XPath may require modifications to ensure accurate validation.
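One modification worth considering concerns the text being searched for. The data-finder expression wraps the value in single quotes, so a value that itself contains an apostrophe would produce an invalid expression. XPath 1.0 has no escape character inside string literals, so the usual workaround is to switch quote style or fall back to concat(). The helpers below are a hypothetical sketch of that idea and are not part of the original code.

```python
def xpath_literal(value: str) -> str:
    """Quote `value` as an XPath 1.0 string literal."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # Both quote types present: join the apostrophe-free pieces with concat().
    parts = value.split("'")
    return "concat(" + ", \"'\", ".join(f"'{part}'" for part in parts) + ")"


def data_finder_xpath(table_data: str) -> str:
    """Build the data-finder XPath for any cell text."""
    return (
        "//table[@id='table_1']/tbody//child::tr//child::td"
        f"[text()={xpath_literal(table_data)}]"
    )
```

For a value without quotes, such as "Venkat", data_finder_xpath produces the same expression as the format() call in data_validator.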
Beyond the core functionality, this code serves as a valuable foundation for Python automation testers seeking to automate the validation of HTML tables in their projects. By understanding the underlying concepts and adapting the code to specific use cases, testers can streamline their testing processes and enhance the overall quality of their applications.
Key takeaways from this blog:
- Python Selenium automation is a powerful tool for validating HTML tables.
- The provided XPath offers a versatile approach for inspecting HTML table structures.
- Adherence to proper HTML semantics is crucial for the effectiveness of the XPath.
- The code serves as a valuable foundation for Python automation testers.
Additional considerations:
- Error Handling: Implement error handling mechanisms to gracefully handle unexpected scenarios, such as when the target element is not found or when the validation fails.
- Performance Optimization: Consider performance optimization techniques to ensure that the automation scripts execute efficiently, especially when dealing with large or complex HTML tables.
- Integration with testing frameworks: Integrate the code with popular testing frameworks like pytest or unittest to leverage their features and benefits.
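On the performance point, each find_element call is a separate round trip to the browser, which adds up when many values are checked against a large table. One option is to read every cell in a single execute_script call and do the lookups in Python. The sketch below assumes the table keeps the id table_1. The names READ_TABLE_JS, read_table, and contains_value are hypothetical.

```python
# JavaScript run in the page: collect the trimmed text of every body cell.
READ_TABLE_JS = """
return Array.from(document.querySelectorAll("#table_1 tbody tr")).map(
    tr => Array.from(tr.cells).map(td => td.textContent.trim())
);
"""


def read_table(driver):
    """Fetch all body rows in one round trip as a list of lists of strings."""
    return driver.execute_script(READ_TABLE_JS)


def contains_value(rows, wanted):
    """Check for an exact cell match in rows already held in memory."""
    return any(wanted in row for row in rows)
```

After one call to read_table(driver), any number of contains_value checks run without touching the browser again.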
By incorporating these considerations and building upon the knowledge gained from this blog, you can effectively leverage Python Selenium automation to validate HTML tables in your projects, ensuring the accuracy and reliability of your web applications.