Web Crawling, Click Fraud Detection, and Coffee. Oh My!

CAYK’s Current Research & Development Project (Part 1) 

11,407,243 – This is our current count of websites that have participated in the Google Ad Network to date, meaning websites that allow Google to place ads on them.

Now that you know that, how do you pick which websites to show your advertisements on? That’s a big number, and it only represents the ad placements we’ve come across during this R&D project. Google isn’t publishing that list anytime soon, so we’re left to build it without their help.

It’s not difficult to rule out large portions of available ad placements with the tools provided within the ad management platform. However, even a small targeting change often wipes away millions of domains, and you never learn with any degree of certainty whether they would have brought value to a client’s advertising campaign. Worse, the placements that remain after targeting updates haven’t actually been vetted. While this is the nature of the industry, it’s certainly not good enough for us.

Data-Driven Marketing vs Research and Development
Where lies the line between the two?

By its very nature, nearly every online advertising decision borders on a big data project. Selecting advertising placements is a great example: with millions of websites that can be targeted, how do you decide which are worthy of advertising funds and which are not? It may be a bold approach, but we’re working towards vetting each one individually.

It starts with tracking down domains participating in the ad network. This means first crawling as many sites as you can and gathering previous crawl data wherever you can find it. Of the 248,555,113 domains in our overall database (more than 248.5 million!), approximately 4.6% have had tell-tale embed scripts or DNS calls that show participation in Google’s Ad Networks.
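To make the "tell-tale embed scripts" idea concrete, here is a minimal Python sketch of how a crawler might flag a page. It is not our production code. The marker list, the function names, and the choice to look only inside script tags are all assumptions made for illustration. The sketch also reproduces the rough 4.6% figure from the two counts quoted in this post.

```python
import re

# Hypothetical marker list (an assumption for this sketch): substrings that
# commonly show up when a page embeds Google ad code.
AD_MARKERS = (
    "pagead2.googlesyndication.com",
    "adsbygoogle",
    "doubleclick.net",
)

# Matches each <script ...>...</script> element, including ones that only carry a src.
SCRIPT_TAG = re.compile(r"<script\b[^>]*>.*?</script>", re.IGNORECASE | re.DOTALL)


def find_ad_markers(html: str) -> list[str]:
    """Return the markers found inside any script tag of the page, in list order."""
    found = []
    for tag in SCRIPT_TAG.findall(html):
        lowered = tag.lower()
        for marker in AD_MARKERS:
            if marker in lowered and marker not in found:
                found.append(marker)
    return found


def participation_rate(participating: int, total: int) -> float:
    """Share of domains showing ad network participation, as a percentage."""
    return 100.0 * participating / total


if __name__ == "__main__":
    sample = (
        '<html><head><script async '
        'src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js">'
        "</script></head><body><p>Hello</p></body></html>"
    )
    print(find_ad_markers(sample))
    # Counts quoted in this post: participating sites vs. all domains in the database.
    print(round(participation_rate(11_407_243, 248_555_113), 1))
```

A real crawl would also need to handle markup outside script tags, scripts injected at runtime, and the DNS-based signals mentioned above, none of which this sketch attempts.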

The scope required to do anything on this scale has led to a long journey of learning with the singular purpose of doing the best job we possibly can for our clients. It’s been actively worked on nearly every single day (weekends too!) for over a year! In our upcoming mini-series, we will break down some of the key milestones of this project, speak to the many hurdles and challenges we had to surmount, and explain how these learnings are already bringing value to our clients!