Struggling with duplicate content in Drupal 7

1 minute read

I struggled for many hours to improve the technical side of SEO on Vejle Idrætshøjskole's website. Here is what I discovered:

Google Webmaster Tools showed me duplicate content

After reviewing Google Webmaster Tools (under Diagnostics -> HTML suggestions) I discovered that I had quite a lot of duplicate content, mainly caused by Views adding order and sort parameters to the query string, which makes Google index essentially the same page twice (for example, example.com/nyheder and example.com/nyheder?order=title&sort=asc serve the same list). Although it seems Google no longer thinks webmasters need to take special measures against this, I do not like having my pages indexed twice.

I also came across a discussion at drupal.org that addresses the duplicate content problem.

To keep the links Views introduces from being indexed by Google, I added the following to my robots.txt (both Google and Bing honour the * wildcard, even though it is not part of the original robots.txt standard):

Disallow: /*sort=
Disallow: /*order=
Disallow: /*page=1

In a week or so, I hope to be rid of all the duplicate content created by the extra parameters Views adds to the links.

Views should not fall back to the base view

If you have set up a Views page on e.g. example.com/nyheder, requests like example.com/nyheder/34 will automatically serve the same content as the base view. To avoid this, you need to set up contextual filters on your view. I just leave everything at the basic settings, except that I add a validation rule; if the validation does not pass, I show a 404. That, I hope, basically solves my biggest problem with duplicate content, and hopefully it will make Google very happy and award me some SEO love.
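
A rough sketch of how the relevant part of a Views 3 export (via hook_views_default_views() or a Features export) might look with such a contextual filter; the exact keys can differ between Views releases, and the nid argument and node validator below are placeholders:

// Hypothetical excerpt of a Views 3 export; key names may vary slightly.
$handler = $view->new_display('page', 'Nyheder', 'page');
$handler->display->display_options['path'] = 'nyheder';
/* Contextual filter (placeholder): Content: Nid, basic settings kept. */
$handler->display->display_options['arguments']['nid']['id'] = 'nid';
$handler->display->display_options['arguments']['nid']['table'] = 'node';
$handler->display->display_options['arguments']['nid']['field'] = 'nid';
// The extra bit: validate the argument and return a 404 when validation
// fails, so example.com/nyheder/34 no longer falls back to the base view.
$handler->display->display_options['arguments']['nid']['specify_validation'] = TRUE;
$handler->display->display_options['arguments']['nid']['validate']['type'] = 'node';
$handler->display->display_options['arguments']['nid']['validate']['fail'] = 'not found';

The same settings can be made entirely through the Views UI under "Contextual filters", without exporting anything.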

What are you doing to avoid duplicate content?

Comments

Kars-T

Hi, at https://www.google.com/webmasters/tools/crawl-url-parameters you can tell Google how to handle URL parameters. With this you can have it ignore a parameter, and the problem should be gone. Did you try that? Otherwise I think the robots.txt solution should work well.

DVDB

Once your content is indexed by Google, adding a disallow rule in your robots.txt won't fix your issue. You need to give Google a 404 to force it to remove the content. Another option is to add a noindex, follow meta tag:

<meta name="robots" content="noindex, follow" />

I'm doing the last one on each project where you have filters, sorting, …
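
In Drupal 7, a minimal sketch of this approach could be a tiny custom module that adds the tag whenever sort, order or pager parameters are present in the query string (the module name mymodule and the exact parameter list are assumptions):

/**
 * Implements hook_preprocess_html().
 *
 * Sketch: emit a "noindex, follow" meta tag on pages that carry Views
 * sort/order or pager parameters in the query string.
 */
function mymodule_preprocess_html(&$variables) {
  if (isset($_GET['sort']) || isset($_GET['order']) || isset($_GET['page'])) {
    $noindex = array(
      '#type' => 'html_tag',
      '#tag' => 'meta',
      '#attributes' => array(
        'name' => 'robots',
        'content' => 'noindex, follow',
      ),
    );
    drupal_add_html_head($noindex, 'mymodule_noindex');
  }
}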

lsolesen

@Kars-T I have seen the option to handle URL parameters in Google Webmaster Tools. However, I also care about Bing and other search engines, so I think robots.txt is a better solution.

@DVDB I have also scheduled the duplicate URLs for removal in Google using Google Webmaster Tools. I would prefer not to use the meta tag you suggest, as I want the other links to be indexed.
