Senior Software Engineer (R-19020)
Eyeota
We are looking for a skilled Data Engineer to join our Global Product Data (GPD) team in Hyderabad. You will play a critical role in building and maintaining automated web scraping pipelines that extract structured data from diverse online sources, transforming raw data into production-ready datasets for our Master Data Repository (MDR).
This role is part of a strategic initiative to bring web scraping and data acquisition capabilities in-house, replacing external vendor dependencies. You will work closely with the data engineering and product teams to ensure high-quality, reliable, and timely data delivery.
Key Responsibilities:
- Design, develop, and maintain scalable web scraping solutions to extract data from a wide range of websites and online platforms
- Build robust data pipelines and automation workflows for data collection, cleaning, validation, and transformation
- Process and prepare scraped data into MDR production-ready formats, meeting strict quality and timeline requirements
- Monitor and troubleshoot scraping jobs, handling anti-bot mechanisms, CAPTCHAs, rate limiting, and site structure changes
- Collaborate with cross-functional teams to understand data requirements, prioritize sources, and define scraping specifications
- Document scraping processes, data schemas, and technical decisions for knowledge sharing and continuity
- Identify opportunities for process improvement and automation to increase efficiency and reduce turnaround time
- Support the transition of work from external vendors, ensuring seamless continuity of data deliveries
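The collection, cleaning, validation, and transformation steps above can be sketched in miniature. The following is an illustrative Python example only, using just the standard library; the parser class, the `clean_and_validate` rule, and all field names are hypothetical stand-ins, not Eyeota's actual MDR schema or tooling:

```python
import html.parser

class LinkExtractor(html.parser.HTMLParser):
    """Collection step: gather (text, href) pairs from anchor tags,
    a stand-in for extracting structured data from a scraped page."""
    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None

def clean_and_validate(links):
    """Cleaning/validation step: drop records with empty names or
    non-absolute URLs (a hypothetical quality rule), and transform
    the remainder into dict records ready for loading."""
    return [
        {"name": text, "url": href}
        for text, href in links
        if text and href and href.startswith("https://")
    ]

page = ('<ul><li><a href="https://example.com/a">Alpha</a></li>'
        '<li><a href="/relative">Beta</a></li></ul>')
parser = LinkExtractor()
parser.feed(page)
records = clean_and_validate(parser.links)
```

In a production pipeline each step would of course be a separate, monitored stage (typically orchestrated by a scheduler), but the shape is the same: extract, validate against explicit rules, emit records in the target schema.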
Key Skills:
- 8+ years of professional experience in web scraping, data extraction, or data engineering
- Strong proficiency in Python, with hands-on experience using scraping libraries and frameworks (Scrapy, BeautifulSoup, Selenium, Playwright, or similar)
- Experience building and scheduling automated data pipelines (cron, Airflow, or equivalent orchestration tools)
- Solid understanding of HTML, CSS, DOM structure, and browser developer tools for inspecting and reverse-engineering web pages
- Familiarity with REST APIs, JSON, and techniques for extracting data from API endpoints
- Experience with relational databases (PostgreSQL, MySQL) and proficiency in SQL
- Ability to handle anti-scraping measures: proxy rotation, headless browsers, CAPTCHA handling, and request throttling
- Strong problem-solving skills and attention to data quality and accuracy
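Several of the skills above, request throttling and coping with rate limits in particular, come down to patterns like exponential backoff with retry. A minimal sketch, assuming a caller-supplied fetch callable and a hypothetical `TransientError` standing in for 429s and timeouts:

```python
import random
import time

class TransientError(Exception):
    """Stand-in for retryable failures: rate limits, 429s, timeouts."""

def backoff_delay(attempt, base=1.0, cap=60.0, jitter=False):
    """Exponential backoff: base * 2**attempt seconds, capped at `cap`,
    with optional full jitter to desynchronize parallel workers."""
    delay = min(cap, base * (2 ** attempt))
    if jitter:
        delay = random.uniform(0, delay)
    return delay

def fetch_with_retry(fetch, url, max_attempts=5, sleep=time.sleep):
    """Call `fetch(url)`, retrying transient failures with backoff.
    `sleep` is injectable so the throttling is testable."""
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except TransientError:
            if attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt))
```

Proxy rotation and headless-browser sessions layer on top of the same loop: the retry handler swaps the proxy or resets the session before the next attempt instead of only sleeping.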
Good-to-Have Skills:
- Experience with cloud platforms (AWS, GCP, or Azure) for deploying and scaling scraping infrastructure
- Familiarity with containerization (Docker) and CI/CD pipelines
- Experience with data transformation tools or ETL frameworks
- Knowledge of natural language processing (NLP) or AI-assisted data extraction techniques
- Prior experience in education data, institutional data, or similar structured-data domains
- Experience with NoSQL databases (MongoDB, Elasticsearch) for handling semi-structured data

