Simply Testable Updates September 2012 #3: Performance++, RSS Feed Support, Test Failure Support
September 19, 2012
You're receiving this email because you joined Simply Testable's updates list.
This is the ninth of the weekly progress updates on the development of Simply Testable, a brand new automated web frontend testing service providing one-click testing for your entire site.
Last week saw the private alpha release at alpha1.simplytestable.com. Over 170,000 HTML validation tests have been carried out to date.
Feedback is very much needed. Does it work, does it not? When carrying out front-end web testing, what do you need to do that I can automate?
At the time it mostly worked (for the subset of the web it can work for), but in some cases it failed. This week has been focused on making the alpha release fail less:
- RSS feed support: for sites that have no sitemap.xml, URLs are retrieved from the RSS feed (if present)
- Graceful test failure handling: URLs for which HTML validation cannot happen are marked as failed and no longer hold up the rest of the full-site test
- Progress and results page improvements: these pages now almost always load for larger (1500+ URL) sites, where previously they failed most of the time

RSS feed support is a nice feature to add as it opens up the service to sites with no sitemap.xml. There are two common feed formats, RSS and ATOM, and the term 'RSS' is commonly used to refer to both. We support RSS at present; ATOM support will come shortly (it works locally, we just need to deploy some changes).
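The service's own feed handling isn't shown in this update, but feed-based URL discovery can be sketched in a few lines of Python. The channel/item/link layout is standard RSS 2.0; the function name and sample feed are illustrative only:

```python
import xml.etree.ElementTree as ET

def urls_from_rss(feed_xml: str) -> list[str]:
    """Collect the <link> of every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    # RSS 2.0 layout: <rss><channel><item><link>URL</link></item>...</channel></rss>
    return [link.text for link in root.findall("./channel/item/link") if link.text]

SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example blog</title>
  <item><link>http://example.com/post-1</link></item>
  <item><link>http://example.com/post-2</link></item>
</channel></rss>"""

print(urls_from_rss(SAMPLE_FEED))
```

ATOM needs separate handling because its structure differs: entries are `<entry>` elements whose URL lives in a `<link href="...">` attribute rather than in element text.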
Graceful test failure handling makes a huge difference. I hadn't previously considered how common it can be for the W3C HTML validator to not be able to validate a page. In cases where this was occurring, the whole full-site test would get stuck (around the 99% mark) and would never finish. That was not good.
Cases where the HTML validator cannot perform a validation are now identified: the relevant HTML validation test is marked as failed, and the details of why the validation could not happen are presented on the test's results page. Using the wrong character encoding, or incorrectly encoding UTF-8 characters, are the surest ways to make the W3C HTML validator fail.
Performance improvements were made again this week. These addressed progress and results page load times and the performance of dynamically updating test progress.
Rendering the progress page for a 100 page site was taking about 10 seconds. For a 1000 page site this was about 30 seconds. Collecting the set of errors for each test when viewing the results was taking a long time. For a 100 page site this was taking about 20 seconds. For a 1000 page site this was taking about a minute.
Rendering the progress page for a 100 page site now takes less than a second. A 1000 page site now takes about 2 seconds. Collecting the set of errors to present on the results page now takes, for a 100 page site, about half a second. For a 1000 page site it takes about a second.
Dynamically updating the progress of a test was a painful experience for sites of 1000 URLs or more. The full set of 1000 (or more) URLs shown on the page would be updated at the same time. This was bad in two ways: it would take about 5 seconds to retrieve all the data that might be needed, and it would then freeze the browser for a second or two whilst displaying any changes.
If you are testing a 1000 page site and checking for updates every 2 seconds, there's a very good chance that only a small number of the 1000 HTML validation tests will have completed or changed state in any way between one check and the next. Changes are now requested only for the small number of tests that have not yet finished.
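The change amounts to polling only the tests that are still in a non-terminal state. A rough Python illustration (the state names are assumed):

```python
def ids_needing_update(test_states: dict[int, str]) -> list[int]:
    """Return the ids of tests that may still change state.

    Completed and failed tests are terminal, so each 2-second poll
    no longer asks the server about them.
    """
    terminal = {"completed", "failed"}
    return [tid for tid, state in sorted(test_states.items()) if state not in terminal]

states = {1: "completed", 2: "in-progress", 3: "failed", 4: "queued"}
print(ids_needing_update(states))
```

For a 1000 URL site where only a handful of tests change between polls, each request now covers a handful of ids instead of all 1000.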
That covers in one week what I expected to cover in two weeks.
This coming week I will again look at performance for larger and larger sites. I expect I'll need to examine paginating the list of URLs being tested - no matter how quickly the server can generate a page listing 10,000 URLs, it's never going to be a nice page for a user's browser to handle.
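Pagination here would amount to slicing the URL list server-side and rendering one slice per page. A trivial sketch (the page size of 100 is an arbitrary choice):

```python
def url_page(urls: list[str], page: int, per_page: int = 100) -> list[str]:
    """Return one page of URLs; page numbers start at 1."""
    start = (page - 1) * per_page
    return urls[start:start + per_page]

all_urls = [f"http://example.com/page-{n}" for n in range(250)]
print(len(url_page(all_urls, page=3)))
```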
I'll also look at gathering URLs from a site's HTML sitemap page for cases where a site has neither a sitemap.xml file, an RSS feed, nor an ATOM feed. That'll cover the easy ways of discovering URLs, leaving the hard way - crawling a site to find all of its URLs - work on which won't start until after the October 10 public launch.
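Gathering URLs from an HTML sitemap page is essentially link extraction. A minimal sketch using Python's stdlib parser; a real implementation would also need to resolve relative URLs and filter out external links:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.urls.append(href)

SITEMAP_HTML = """<html><body><ul>
  <li><a href="http://example.com/about">About</a></li>
  <li><a href="http://example.com/contact">Contact</a></li>
</ul></body></html>"""

collector = LinkCollector()
collector.feed(SITEMAP_HTML)
print(collector.urls)
```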
Feedback, thoughts or ideas: email email@example.com, follow @simplytestable or keep an eye on the Simply Testable blog.