Simply Testable Updates #111: Performance++ Released, URL Discovery Encoding Bug Fixed
October 15, 2014
This is the online archive for the Simply Testable weekly behind-the-scenes newsletter.
Subscribe to get weekly updates on the latest changes and the newest planned features.
This is the 111th weekly progress update on the development of Simply Testable, your professional automated web frontend testing service providing one-click testing for your entire site.
This week saw the release of recent performance improvements (big improvements!). I also fixed a bug in the URL discovery process relating to character encoding.
I turned the task assignment process inside out and in doing so significantly decreased the time it takes to carry out tests.
Quick recap: the core application is the brains of the operation. When you start a new test, the core application figures out what is to be tested and from this creates a collection of test tasks. Each test task performs a specific test against a specific URL. Tasks are farmed out to workers to carry out the actual testing.
Previously, the core application pushed the next set of tasks out to workers every now and again. It took no account of which worker was receiving the tasks, nor of how busy each worker was.
This process has now been turned inside out: workers request additional tasks to work on when they're not busy. As a result, workers no longer occasionally sit idle while there is still more work to do.
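To make the change concrete, here is a minimal sketch of the pull model in Python. It is an illustration only, not the Simply Testable code: the `Core` and `Worker` classes, the `request_tasks` method and the `capacity` setting are all hypothetical names invented for this example.

```python
from collections import deque


class Core:
    """Hypothetical core application: holds pending tasks, hands them out on request."""

    def __init__(self, tasks):
        self.pending = deque(tasks)

    def request_tasks(self, limit):
        # Return up to `limit` tasks; an empty list means no work is left.
        batch = []
        while self.pending and len(batch) < limit:
            batch.append(self.pending.popleft())
        return batch


class Worker:
    """Hypothetical worker: asks for work only when it has nothing in progress."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity  # how many tasks this worker handles per round
        self.in_progress = []
        self.completed = []

    def tick(self, core):
        # Finish the current batch, then pull a new one sized to this worker.
        self.completed.extend(self.in_progress)
        self.in_progress = core.request_tasks(self.capacity)


def run(core, workers):
    # Keep going until the queue is empty and every worker has finished.
    while core.pending or any(w.in_progress for w in workers):
        for w in workers:
            w.tick(core)


core = Core([f"task-{i}" for i in range(10)])
workers = [Worker("a", capacity=3), Worker("b", capacity=1)]
run(core, workers)
```

Because each worker pulls only as much as it can handle, the larger worker naturally ends up doing more of the work and neither waits on a schedule set by the core application.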
Maximum task throughput has increased from 280 tasks per minute to 410 tasks per minute (about 46% more), and many test times have been reduced to nearly half of what they previously were.
I wrote a detailed blog post covering this change.
URL Discovery Character Encoding Bug Fixed
URL discovery is the process that the system goes through when finding pages to test for a site.
When checking a specific page for new URLs, the URL discovery driver will first parse the HTML for a page into a DOM to allow the structured content to be understood. That's how pretty much anything that needs to extract information from a web page will work.
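As a rough illustration of that parse-then-extract step, here is a small Python sketch. It is not the URL discovery driver itself: it uses Python's event-based `HTMLParser` rather than a full DOM, and `LinkCollector` and `discover_urls` are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute URLs.
                self.urls.append(urljoin(self.base_url, value))


def discover_urls(html, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.urls


page = '<p><a href="/about">About</a> <a href="contact.html">Contact</a></p>'
found = discover_urls(page, "http://example.com/index.html")
```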
The PHP DOMDocument object that is used to do this apparently doesn't always pay close attention to the character encoding specified in the web page.
Correct understanding of the character encoding is essential. All characters are ultimately, at the lowest level of abstraction, stored as a series of 0s and 1s. There are many different character encoding specifications, with ASCII (and its various 8-bit extensions) being historically the most common and UTF-8 being a current favourite.
If you don't pay attention to the character encoding when reading a string of characters you'll end up in one of three cases: you read the string correctly by luck, you misread the string into nonsense or you fail entirely.
And that's what was happening. In some cases, the URL discovery task driver was finding nonsense URLs in web pages instead of what the author of the web page provided.
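The three cases are easy to reproduce. The Python sketch below is only a demonstration of the general problem, not of the PHP code involved; `read_as` is a hypothetical helper and the example paths are made up.

```python
def read_as(raw, encoding):
    """Decode bytes with the given encoding; return None if decoding fails."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


plain = "/cafe".encode("utf-8")      # ASCII-only characters
accented = "/café".encode("utf-8")   # "é" is stored as two bytes in UTF-8

# Case 1: the wrong encoding, but it works by luck (ASCII is a subset of UTF-8).
lucky = read_as(plain, "ascii")

# Case 2: the wrong encoding silently produces nonsense.
nonsense = read_as(accented, "latin-1")

# Case 3: the wrong encoding fails entirely.
failed = read_as(accented, "ascii")
```

Here `lucky` is `"/cafe"`, `nonsense` is `"/cafÃ©"` and `failed` is `None`. The second case is the dangerous one, because nothing signals that the URL is wrong.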
That's all been fixed now. If you don't fancy percent-encoding non-ASCII characters in URLs, you no longer have to.
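For anyone unfamiliar with the term, percent-encoding replaces each byte of a non-ASCII character with `%` followed by two hex digits. A quick Python illustration, using a made-up path and the standard library's `urllib.parse`:

```python
from urllib.parse import quote, unquote

raw_path = "/café"

# quote() encodes the string as UTF-8 and percent-encodes the non-ASCII bytes.
encoded_path = quote(raw_path)

# unquote() reverses the process.
decoded_path = unquote(encoded_path)
```

Here `encoded_path` is `"/caf%C3%A9"` and `decoded_path` is the original `"/café"`.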
I'm starting work today on providing clearer test results. This is part of a larger project, not only to present clearer test results but also to make it easier to find what you need in them. This will lead on to being able to provide an analysis of results at the site level, not just the page level.
As always, if you'd like to see web testing you find boring handled automatically for you, add a suggestion or vote up those that interest you. This really helps.
Feedback, thoughts or ideas: email email@example.com, follow @simplytestable or keep an eye on the Simply Testable blog.