Struggling with duplicate content in Drupal 7
I spent many hours improving the technical side of SEO on the Vejle Idrætshøjskole website. Here is what I discovered:
Google Webmaster Tools showed me duplicate content
After reviewing Google Webmaster Tools (under Diagnose --> HTML suggestions), I discovered that I had quite a lot of duplicate content, mainly caused by Views adding order and sort parameters to the query string, which makes Google index essentially the same page more than once. Although it seems that Google no longer thinks webmasters need to take special measures against this, I do not like having my pages indexed twice.
As I discovered, there is also a discussion on drupal.org that addresses the duplicate content problem.
Blocking duplicate links in robots.txt
To keep the links introduced by Views from being indexed by Google, I added the following to my robots.txt:
Disallow: /*sort=
Disallow: /*order=
Disallow: /*page=1
In a week or so, I hope to be rid of all the duplicate content created by the extra parameters Views adds to the links.
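To check which URLs the three Disallow rules actually cover, the wildcard matching can be sketched in a few lines of Python. This is only an approximation of how crawlers that support `*` and `$` in robots.txt paths (Google and Bing do) evaluate a rule; the helper names and the sample paths are made up for illustration, and Allow rules and rule precedence are ignored.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters, a trailing '$' anchors the end
    of the URL, and everything else is taken literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path, disallow_rules):
    """Return True if any Disallow rule matches the start of the path."""
    # re.match anchors at the beginning, which mirrors prefix matching.
    return any(robots_pattern_to_regex(rule).match(path) for rule in disallow_rules)


# The rules from this post; the paths below are hypothetical examples.
RULES = ["/*sort=", "/*order=", "/*page=1"]

for path in ["/nyheder", "/nyheder?sort=asc&order=title", "/nyheder?page=1",
             "/nyheder?page=2", "/nyheder?page=12"]:
    print(path, "blocked" if is_blocked(path, RULES) else "allowed")
```

One thing the sketch makes visible: because rules are matched as prefixes, `/*page=1` also covers `page=10` through `page=19` and so on, not just `page=1`.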
Views should not default to base view
If you have set up a Views page at e.g. example.com/nyheder, requests like example.com/nyheder/34 will automatically show the base view as well. To avoid this, you need to set up contextual filters on your views. I leave everything at the default settings, except that I add a validation rule. If the validation does not pass, I show a 404. That, I hope, has solved my biggest problem with duplicate content. And hopefully this will make Google very happy and earn me some SEO LOVE.
What are you doing to avoid duplicate content?
Comments
Kars-T (not verified)
Thu, 12/08/2011 - 17:43
Webmastertools
Hi,
At https://www.google.com/webmasters/tools/crawl-url-parameters you can tell Google how to handle URL parameters. With this you can have it ignore a parameter, and the problem should be gone. Did you try that? Otherwise I think the robots.txt solution should work well.
DVDB (not verified)
Thu, 12/08/2011 - 18:02
Once your content is indexed by Google, adding a disallow rule to your robots.txt won't fix your issue. You need to give Google a 404 to force it to remove the content. Another option is to add a noindex, follow meta tag:
<meta name="robots" content="noindex, follow" />
I do the latter on every project that has filters, sorting, ...
George Paapdongonas (not verified)
Thu, 12/08/2011 - 20:49
To avoid duplicate content issues with Views, you should definitely use rel="canonical".
Google also recently introduced rel="next" and rel="prev" to address duplicate content caused by pagination.
lsolesen
Thu, 12/08/2011 - 20:51
How in Drupal 7?
How can you achieve this in Drupal 7 with Views?
lsolesen
Thu, 12/08/2011 - 20:50
Also remove them from Google
@Kars-T I have seen the option to configure URL parameters in Google Webmaster Tools. However, I also care about Bing and other search engines, so I think robots.txt is a better solution.
@DVDB I also scheduled them for removal from Google using Google Webmaster Tools. I would prefer not to use the meta tag you suggested, as I want the other links to be indexed.
Tim Kamanin (not verified)
Fri, 12/09/2011 - 12:27
It is enough to set a canonical URL in the head of your pages. Search engines will generally skip the page if the canonical URL differs from the page URL.