Does It Still Make Sense to Write How-to Advice Now That Google Scrapes It?

by Tadeusz Szewczyk | Last Updated Apr. 15th, 2015

When you ask Google a question like [how to boil eggs] you won’t see search results on top, sometimes not even ads. Google will often provide the answer right away.

All the years of telling webmasters to “just create great content” do not sound like very good advice anymore.

Is there a way out of the dilemma? How can you still get found on Google without giving everything away?

Think outside the Google answer box


Some short-sighted, old-school SEO practitioners advise you these days to achieve “Total SERP domination” by optimizing your content for the so-called Google answer box. This box basically scrapes the content from your site and displays it right on Google, so that search users do not even need to click through to your site for queries like

  • how to boil eggs
  • treat sunburn
  • how to make apple pie

Why would you want to dominate that? I’m not sure but being eager to support Google might be a reason when you own enough of their shares. Another reason might be: give Google all your great content so maybe there will be some leftover traffic for you?

Google is effectively cutting out and ripping off the middleman, that is, you, the

  • content creator
  • online publisher
  • website owner

For most of these people, optimizing for total content loss seems like a very short-term business model.

The great Google knowledge grab

Google dispossesses content creators and publishers of all types. Both book authors and news publishers fought Google for years, to no avail, to stop it from monetizing their content without remuneration.

Google Books and Google News force publishers out of business because users no longer need to buy books or view ads on newspaper sites. They can just get a quick overview on Google services and move on.

Only a small percentage of users need to look up more than Google offers and visit the original source to read in depth or offline.

Then Google started to grab images for its image search, so publishers don’t get much traffic from it anymore. Now everybody can get your images straight from Google. There is not much left for Google besides the actual content on all other websites. They are working on that too, adding more cases where they take third-party content and place it on Google. You don’t need to have a PhD in economics to understand what that means. Google is actually taking your money.

Blocking Google?

There is of course a radical measure you can take to stop Google from grabbing your knowledge and monetizing it in case you want to earn money by publishing yourself. Many photographers do it by now. News publishers have at least fought for the right to “opt out” of Google News. Some of them even went as far as blocking Google search altogether or at least setting up paywalls. News Corp has been the foremost of them.

Most publishers are frantically trying to find a compromise.

They are shielding only parts of the content and giving away enough fodder for Google to stay in the index.

Image copyright owners tried to use technical means of reminding users to click through to the original sources but Google started penalizing them for doing that a year ago. The “image mismatch Google penalty” makes sure you don’t try to bring Google searchers back to your site when they are looking at your images.

Are you accidentally optimizing for the Google scraper?

Taking a look at the examples where Google already grabs your knowledge without sending visitors through, I noticed some technical peculiarities you don’t necessarily see on all pages. For example, Google seems to exploit HTML 5 markup intended for internal use only, using it to identify the content piece to steal from you.

Just take a look at what the BBC uses internally on its recipe pages that do get scraped by Google. I tested with [how to make apple pie], where this page got scraped for its on-page content on Google. The BBC uses HTML 5 data attributes, which are not intended to compete with microformats. It is clearly stated in the spec that the data is not intended to be publicly usable. External software should not interact with it.

They look something like this:

data-title="Proper apple pie" data-appid="food" data-type="recipe"

So clearly Google doesn’t play by the rules here once again. This data is by no means intended to be hijacked by Google to steal content. If you want to give all your content away for free for Google to monetize, you need to use so-called microformats.

In case you don’t want Google to scrape your how-to advice automatically, I strongly suggest you remove HTML 5 data attributes and not use microformats either. Most sites don’t use them, and only some experts very fond of Google tell people to do so.
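If you want to try removing data attributes from existing templates, here is a minimal Python sketch of the idea. It is only an illustration under a few assumptions: the function name `strip_data_attributes` is made up for this example, attribute values are assumed to be quoted, and the regular expressions will not cope with every HTML edge case (for example a `>` inside an attribute value). Whether removing the attributes actually changes what Google displays is untested.

```python
import re

# Matches one data-* attribute with a quoted value, e.g. data-type="recipe".
# Assumption: values are always quoted; unquoted values are not handled.
DATA_ATTR = re.compile(r'\s+data-[\w-]+\s*=\s*(?:"[^"]*"|\'[^\']*\')')


def strip_data_attributes(html: str) -> str:
    """Remove data-* attributes that appear inside tags (hypothetical helper)."""

    def clean_tag(match: re.Match) -> str:
        # Drop every data-* attribute found within this single tag.
        return DATA_ATTR.sub("", match.group(0))

    # Only rewrite text between < and >, so visible prose that happens
    # to mention "data-..." is left untouched.
    return re.sub(r"<[^<>]+>", clean_tag, html)


snippet = (
    '<div data-title="Proper apple pie" data-appid="food" '
    'data-type="recipe" class="recipe">Pie</div>'
)
print(strip_data_attributes(snippet))
```

Run on the BBC-style attributes quoted above, the sketch keeps the `class` attribute and the visible text and drops the three `data-` attributes. Keep in mind that scripts on your own site may rely on those attributes, so check that before removing them.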

Practical solutions without removing your content from Google entirely

Most of you will neither want to block Google completely, as the dependency is too strong, nor optimize for data attributes or microformats. So you still need a solution other than not publishing how-to advice at all anymore.

I have come up with some simple, practical solutions worth trying out. I haven’t tested them myself yet, so I’d be glad if some of you would do that and report back on whether they worked, at least to some extent.

1. Removing essential items from bulleted lists

Readers will have to click through then because some details will remain unclear just from looking at the scraped content on Google.

2. Adding CTAs to your bulleted content

Entice searchers with additional calls to action in the list itself, like “click the link below this bullet point to see pictures and examples”.

3. Adding context within bullet points

Google can’t copy and paste your whole website content; they just try to cite the most important items. Try to make the list items part of the whole page so that they sound incomplete without clicking through. You could add directions, for example “as seen in chapter 5” or “as the picture above shows”.

Desperate measures

Probably the easiest way to get rid of the Google content theft problem as of now seems to be not using bulleted lists at all. At least not lists that look like lists in the HTML code. You can style your

  1. span
  2. p
  3. div

elements almost any way you want, so replacing the HTML list element may be the desperate measure that helps. Maybe removing

  1. “how to”
  2. “tutorial”
  3. “guide”

from your headline would be helpful too. So it still does make sense to provide how-to advice for people, but you have to consider what happens when Google grabs it from your site and monetizes the content instead of you.
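The list replacement described under “Desperate measures” could be sketched roughly like this in Python. Treat it as an assumption-laden illustration, not a tested recipe: the function name `delist` and the class names `steps` and `step` are invented for this example, existing attributes on the list tags are simply dropped, and nobody has verified that Google stops scraping content marked up this way.

```python
import re


def delist(html: str) -> str:
    """Swap list markup for div elements that CSS can style like a list.

    Hypothetical helper: class names "steps" and "step" are placeholders.
    """
    # Opening <ul ...> or <ol ...> becomes a container div
    # (any attributes on the original tag are discarded).
    html = re.sub(r"<(?:ul|ol)\b[^>]*>", '<div class="steps">', html,
                  flags=re.IGNORECASE)
    # Each opening <li ...> becomes an item div. The \b keeps tags such
    # as <link> from matching.
    html = re.sub(r"<li\b[^>]*>", '<div class="step">', html,
                  flags=re.IGNORECASE)
    # Every closing list tag becomes a closing div.
    html = re.sub(r"</(?:ul|ol|li)\s*>", "</div>", html,
                  flags=re.IGNORECASE)
    return html


print(delist("<ul><li>Boil water</li><li>Add eggs</li></ul>"))
```

You would then use your stylesheet to give the `step` elements bullets or numbers so the page still looks like a list to human readers. Note that this also removes list semantics for screen readers, which is part of what makes it a desperate measure.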