Key Responsibilities:
- Lead the testing strategy for GenAI-powered features of the SDLC automation platform, focusing on the relevance and accuracy of AI-generated results.
- Design and implement comprehensive test plans for GenAI features, including use-case validation, model performance testing, and checks that generated outputs are correct and consistent.
- Collaborate closely with the AI/ML and development teams to define quality standards for GenAI features, ensuring they align with user expectations and business goals.
- Develop test scripts that automate validation of AI-generated code, ensuring correct application of business logic, coding standards, and security protocols (see the illustrative sketch after this list).
- Evaluate AI models for their performance across diverse scenarios, continuously testing for edge cases and data biases.
- Conduct manual and automated testing to validate the integration of GenAI features with other platform components, ensuring seamless functionality.
- Provide feedback for improving AI models based on test results, working iteratively with AI engineers to enhance accuracy and reliability.
- Continuously evaluate the relevance of GenAI model output, assessing results for errors, hallucinations, and unwanted behavior, and collaborate with the product team to adjust feature scope.
- Lead exploratory testing of new features, identifying potential failure points and ensuring high-quality releases.
- Participate in AI model monitoring post-deployment to ensure ongoing accuracy and relevance of generated content as data evolves.
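To give a flavor of the automated validation described above, here is a minimal sketch of one such check; the `generate_code` stand-in, the prompt, and the rule set are illustrative assumptions rather than part of DhiWise's actual stack:

```python
# Minimal sketch of an automated check on AI-generated Python code.
# `generate_code` is a hypothetical stand-in for the platform's generation endpoint.
import ast

FORBIDDEN_CALLS = {"eval", "exec"}  # example security rule


def generate_code(prompt: str) -> str:
    # Placeholder: in practice this would call the GenAI code-generation service.
    return "def total(xs):\n    return sum(xs)"


def violations(source: str) -> list[str]:
    """Return basic syntax/security findings for a generated snippet."""
    try:
        tree = ast.parse(source)  # generated code must at least parse
    except SyntaxError as err:
        return [f"syntax error: {err}"]
    return [
        f"forbidden call: {node.func.id}"
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id in FORBIDDEN_CALLS
    ]


def test_generated_code_is_clean():
    assert violations(generate_code("sum a list of integers")) == []
```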
Key Skills & Requirements:
- Experience: 5+ years in software quality assurance, including at least 2 years focused on testing AI/ML platforms or GenAI models.
- In-depth understanding of AI/ML testing frameworks and methodologies, including model validation, bias testing, and relevance assessment.
- Hands-on experience testing AI/ML-driven applications, focusing on generated output quality, including accuracy, relevance, and user satisfaction.
- Strong proficiency in test automation tools (e.g., Selenium, Appium, JUnit) and scripting languages (e.g., Python, Java) for testing AI-based platforms.
- Experience in evaluating AI models for correctness and relevance, including testing for edge cases and unexpected results.
- Familiarity with CI/CD pipelines and automated testing in a GenAI context.
- Ability to collaborate effectively with AI/ML teams, providing feedback on model performance and quality improvements.
- Problem-solving mindset with a focus on identifying weaknesses or inaccuracies in AI-generated outputs.
- Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
Preferred Qualifications:
- Experience with testing GenAI platforms, including text, code, or image generation applications.
- Knowledge of model training techniques and how they affect output variability and accuracy.
- Experience with cloud infrastructure testing (AWS, Azure, or GCP), especially for AI services.
- Understanding of Natural Language Processing (NLP) and how it impacts the quality of generated results.
- Familiarity with A/B testing frameworks for evaluating AI model performance (a brief sketch follows this list).
- ISTQB or equivalent certification is a plus.
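For candidates curious about the A/B evaluation mentioned above, here is a minimal sketch of how two model variants might be compared on a labeled evaluation set; the scoring rule and significance test are illustrative assumptions only:

```python
# Minimal sketch of an A/B comparison between two model variants on a labeled
# evaluation set. The pass/fail scoring rule is an illustrative assumption;
# real relevance scoring would be task-specific.
from scipy.stats import fisher_exact


def passes(output: str, expected: str) -> bool:
    return expected.lower() in output.lower()  # toy relevance check


def ab_compare(outputs_a, outputs_b, expected):
    """Compare pass rates of two variants and test whether the gap is significant."""
    hits_a = sum(passes(o, e) for o, e in zip(outputs_a, expected))
    hits_b = sum(passes(o, e) for o, e in zip(outputs_b, expected))
    n = len(expected)
    _, p_value = fisher_exact([[hits_a, n - hits_a], [hits_b, n - hits_b]])
    return hits_a / n, hits_b / n, p_value
```

In practice the toy pass/fail rule above would be replaced by task-specific relevance metrics or human ratings.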
Why Join DhiWise?
- Be a key player in the development of a transformative GenAI-driven SDLC automation platform.
- Work at the cutting edge of AI and software automation, with a focus on real-world use cases and impact.
- Opportunity for professional growth and leadership in a rapidly evolving domain.
About DhiWise:
At DhiWise, we are revolutionizing the software development lifecycle (SDLC) by building a cutting-edge automation platform that leverages the power of Generative AI (GenAI) to automate development from client brief to code generation. Our platform’s effectiveness depends on the accuracy and relevance of AI-generated outputs, and we are expanding our QA team to ensure those outputs meet the highest standards.
We are seeking Senior QA Engineers to build out a QA team with expertise in testing GenAI platforms, focused specifically on evaluating the relevance, accuracy, and reliability of AI-generated results in the context of development workflows.