For a long time, I was under the impression that because Archived content is not included in widgets, it would also be excluded from external search. While testing a search tool, I've learned that isn't the case.
Example: Google Search. (Our former president's profile should really be Hidden rather than just Archived, but it's a good example at the moment.)
It makes sense in hindsight, particularly for timed content like News/Stories, or for site search built on widgets. However, we have some items that we'd like kept out of widgets and out of search engines, but still accessible to users who have the link.
I’m interested in potentially pursuing some of the following:
If any item is Archived, then append NoIndex to it.
If an item of a developer-defined data type (e.g. blurbs, profiles) is Archived, then append NoIndex.
If an item of a specific blurb/profile type (e.g. staff) is Archived, then append NoIndex.
If an item is Archived, then signal that to the search tool somehow. (For widget-like behavior.)
For these, I need a way to determine if an individual item is Archived in a module context, such as in an “onOutput” handler. Does anyone have an example of how I may be able to do that?
From there, I believe I can write the logic to run $_LW->appendMetaTag and append NoIndex directly, or add a custom meta tag that only this search tool will use.
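Once an item's archived status is known, the two options above differ only in which array gets handed to `$_LW->appendMetaTag`. A minimal sketch, assuming the same `array('name'=>..., 'content'=>...)` shape used by the noindex module later in this thread; the function name `archivedMetaTag` and the tag name `search-tool-status` are hypothetical placeholders for whatever the search tool actually reads.

```php
<?php
// Hypothetical helper: returns the meta tag array to pass to $_LW->appendMetaTag(),
// or null when nothing should be appended.
// $mode is 'robots' (standard noindex for all search engines) or
// 'custom' (a tag that only the site search tool is assumed to look for).
function archivedMetaTag(bool $isArchived, string $mode = 'robots'): ?array
{
    if (!$isArchived) {
        return null; // live items are left alone
    }
    if ($mode === 'custom') {
        // 'search-tool-status' is a placeholder, not a LiveWhale or vendor convention
        return ['name' => 'search-tool-status', 'content' => 'archived'];
    }
    return ['name' => 'robots', 'content' => 'noindex, nofollow'];
}

// Inside an onOutput() handler this would be used roughly as:
// if ($tag = archivedMetaTag($isArchived)) { $_LW->appendMetaTag($tag); }
```

Keeping the decision in one function means switching from a global noindex to a search-tool-only signal is a one-argument change.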
Lol. We were literally just talking about this same thing this morning.
Our news team was under the impression that archiving content removed it from the web, but I explained that it only removes it from widgets. Google and others still know about it.
I suggested we do one/both of the following:
Add a disclaimer at the top of archived stories stating that it is old and may not contain accurate information, etc.
Append a noindex tag to archived content.
Either option requires the ability to know that content is archived, so I had the same question about how to determine that.
Archived items (and past events) should be universally set to NoIndex
A config option should be added to toggle between the current behavior and the above (in case some prefer the current arrangement)
Assuming the above has no granularity/exceptions, then anyone wanting more specific behavior (i.e. everything but stories) can toggle universal off and make their own solution.
Add the option to also set noindex on past events, with a configurable timeframe
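The proposal above boils down to a small decision function. This is a sketch only: the config keys `noindex_archived`, `noindex_past_events`, and `past_event_days` are hypothetical and do not exist in LiveWhale; defaults keep the current behavior, per the toggle suggestion.

```php
<?php
// Hypothetical config shape for the proposal above; none of these keys exist in LiveWhale today.
const DEFAULT_NOINDEX_CONFIG = [
    'noindex_archived'    => false, // false keeps current behavior
    'noindex_past_events' => false,
    'past_event_days'     => 0,     // 0 = noindex as soon as the event has ended
];

// Decide whether an item should get a robots noindex tag.
// $eventEnd is the event's end time as a Unix timestamp, or null for non-event items.
function shouldNoindex(array $config, bool $isArchived, ?int $eventEnd, int $now): bool
{
    $config = array_merge(DEFAULT_NOINDEX_CONFIG, $config);
    if ($isArchived && $config['noindex_archived']) {
        return true;
    }
    if ($eventEnd !== null && $config['noindex_past_events']) {
        // grace period in days after the event ends, never negative
        $cutoff = $eventEnd + max(0, (int) $config['past_event_days']) * 86400;
        return $now > $cutoff;
    }
    return false;
}
```

A `past_event_days` value of 0 noindexes an event as soon as its end time has passed; larger values keep recent events findable for a while.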
This just came up in another context yesterday afternoon. Someone found a couple of very old news releases in Google with a phone number on them that now belongs to a different office. They want the phone number removed, but it is the policy of our News office to not edit archival/historical news. The stories have been archived for years. We just need them to not be available in Google anymore.
Because of the amount of attention this topic seems to be getting on campus lately, I am probably going to have to write something. I’m just trying to figure out how to identify content as being archived. I know that is stored in the database, so presumably it shouldn’t be too difficult to access that in a module.
I’m assuming this is a fairly universal module – preventing certain profile/blurb types from ever being indexed in any state – but it might have a clue for your efforts:
<?php
$_LW->REGISTERED_APPS['noindex']=array(
    'title'=>'NoIndex',
    'handlers'=>array('onOutput'),
    'custom'=>array(
        'types'=>array(
            // Add comma-separated profile or blurb type IDs to the matching line below to make their individual pages NoIndex
            'profiles'=>array(27,33,55,61,66,75),
            'blurbs'=>array(16,22,52)
        )
    )
);
class LiveWhaleApplicationNoindex {
    public function onOutput($buffer) { // on page output
        global $_LW;
        if (!empty($_LW->REGISTERED_APPS['noindex']['custom']['types']) && is_array($_LW->REGISTERED_APPS['noindex']['custom']['types'])) { // if noindex types are configured
            $config=$_LW->REGISTERED_APPS['noindex']['custom']['types'];
            if (!empty($_LW->details_module) && isset($config[$_LW->details_module]) && is_array($config[$_LW->details_module]) && !empty($GLOBALS[$_LW->details_module.'_tid']) && in_array($GLOBALS[$_LW->details_module.'_tid'], $config[$_LW->details_module])) { // if on a details page for one of the noindex types
                $_LW->appendMetaTag(array('name'=>'robots', 'content'=>'noindex, nofollow'));
            }
        }
        return $buffer;
    }
}
?>
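The nested condition in `onOutput()` above performs three checks at once. Pulled out into a plain function (the name `isNoindexDetailsPage` is hypothetical), it is easier to see why it only fires on details pages, and the type configuration can be tested without a LiveWhale install:

```php
<?php
// Standalone version of the condition in onOutput() above, split into steps.
// $detailsModule stands in for $_LW->details_module and $typeId for
// $GLOBALS[$_LW->details_module.'_tid']; $types is the 'custom' => 'types' array.
function isNoindexDetailsPage(?string $detailsModule, $typeId, array $types): bool
{
    if (empty($detailsModule) || empty($typeId)) {
        return false; // not on a details page, or the item has no type
    }
    if (!isset($types[$detailsModule]) || !is_array($types[$detailsModule])) {
        return false; // this module has no noindex types configured
    }
    // Loose comparison, as in the original, so a string type ID like '33' still matches 33
    return in_array($typeId, $types[$detailsModule]);
}
```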
I have a proof of concept for archived News stories working on our Dev server.
<?php
$_LW->REGISTERED_APPS['news_archives']=[
    'title'=>'News Archives',
    'handlers'=>['onLoad'],
];
class LiveWhaleApplicationNewsArchives {
    public function onLoad() {
        global $_LW;
        // if this is a LiveURL news request
        if (!empty($GLOBALS['LIVE_URL']['REQUEST_URI']) && strpos($GLOBALS['LIVE_URL']['REQUEST_URI'], '/live/news/')===0) {
            // get request info
            $request=explode('-', $GLOBALS['LIVE_URL']['REQUEST'][0]);
            if (is_numeric($request[0])) {
                // If this story is archived
                if ($_LW->dbo->query('select', 'is_archived', 'livewhale_news', 'livewhale_news.id='.(int)$request[0])->firstRow('is_archived')->run() == 1) {
                    // Add noindex meta tag
                    $_LW->appendMetaTag(array('name'=>'robots', 'content'=>'noindex, nofollow'));
                    // Add a global variable for use with XPHP
                    $GLOBALS['is_archived']=1;
                }
            }
        }
    }
}
?>
When a News story is loaded (onLoad), the module checks the database to see if that story is archived. If so, it appends a noindex meta tag and sets a global is_archived variable. That variable can be used in the details template to, for example, add a disclaimer at the top of archived stories.
We are meeting next week to decide exactly what direction we want to take, so I am not adding this to Prod yet. In the meantime, I’d appreciate any feedback from others.
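The proof of concept above mixes two steps: pulling the story ID out of the LiveURL request, then asking the database about it. Separating the first step makes it testable on its own. This sketch assumes, as the module does, that the first request segment looks like `12345-story-slug`; the function name `newsIdFromLiveUrl` is hypothetical, and it deliberately uses a digits-only check where the original uses `is_numeric()`, which also accepts strings such as `1e3`.

```php
<?php
// Hypothetical helper mirroring the URL check in the onLoad() proof of concept above.
// $requestUri corresponds to $GLOBALS['LIVE_URL']['REQUEST_URI'] and $firstSegment to
// $GLOBALS['LIVE_URL']['REQUEST'][0].
function newsIdFromLiveUrl(string $requestUri, string $firstSegment): ?int
{
    if (strpos($requestUri, '/live/news/') !== 0) {
        return null; // not a LiveURL news request
    }
    $parts = explode('-', $firstSegment);
    // digits only; the D modifier stops $ from matching before a trailing newline
    return preg_match('/^\d+$/D', $parts[0]) === 1 ? (int) $parts[0] : null;
}
```

The returned ID (or null) would then feed the unchanged `is_archived` query.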
Looks good. I tried poking around with it to see if I could make it work like the existing “noindex” module, where you simply define the type (news, events, etc.) and it works. Current issues that I don't have time to dig into and solve at the moment:
$_LW->details_module doesn’t seem to exist yet in onLoad()?
So I'm not sure how to confirm that we're on a details page, get the current item's type to compare, and then get the current item's ID for the query. (I'd rather not do URL parsing unless absolutely required.)
Rough idea building from your code:
// Check if on any details page, continue.
// Define current page's type as $details_type.
// Compare $details_type with 'custom' list for match, continue.
// Define current item's ID as $details_id.
// Broken out query fragments for readability / understanding.
$table = $_LW->getTableForDataType($details_type);
$action = 'select';
$object = $table.'.is_archived'; // may not need table, but habit from join queries
$from = $table;
$where = $table.'.id='.(int)$details_id;
$order = '';
if ($_LW->dbo->query($action, $object, $from, $where, $order)->firstRow('is_archived')->run() == 1) {
    // define variable for XPHP to act on archived status
    $GLOBALS['is_archived']=true; // I prefer the boolean, not sure if it matters
    // set noindex and define variable for XPHP
    $_LW->appendMetaTag(array('name'=>'robots', 'content'=>'noindex, nofollow'));
    $GLOBALS['made_noindex']=true; // may be useful in a similar manner as 'is_archived'
}
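The rough idea above can be tightened into a function that returns the `dbo->query()` arguments, or null when the page should be skipped. This is a sketch under assumptions: `archivedQueryArgs` is a hypothetical name, `$tableForType` stands in for `$_LW->getTableForDataType`, and `$detailsType`/`$detailsId` remain the open question from the pseudocode (how to obtain them in `onLoad()`).

```php
<?php
// Sketch of the pseudocode above as a testable function. Returns the five arguments
// (action, object, from, where, order) for $_LW->dbo->query(), or null to skip the page.
function archivedQueryArgs(?string $detailsType, $detailsId, array $configuredTypes, callable $tableForType): ?array
{
    if (empty($detailsType) || !in_array($detailsType, $configuredTypes, true)) {
        return null; // not a details page, or not a type we noindex
    }
    if (!is_numeric($detailsId)) {
        return null; // no usable item ID
    }
    $table = $tableForType($detailsType);
    return [
        'select',                           // action
        $table . '.is_archived',            // object
        $table,                             // from
        $table . '.id=' . (int) $detailsId, // where: the cast keeps the ID safe to interpolate
        '',                                 // order
    ];
}

// In a module it would be used roughly as:
// $args = archivedQueryArgs($details_type, $details_id, ['news', 'events'], [$_LW, 'getTableForDataType']);
// if ($args && $_LW->dbo->query(...$args)->firstRow('is_archived')->run() == 1) { ... }
```

Checking the configured type list before touching the ID means unrelated pages never reach the database.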
Thanks @mischlern for the feedback. I think we only need this for stories at the moment. We don’t really archive anything else unless we are also hiding it. Our News office always sets all Stories they write to automatically archive after 2 years. I wish we could do the same with Events, but it doesn’t look like you can schedule Events to expire/archive. So I may create a similar module to noindex events based on their date instead of, or in addition to, archive status.
We met about this today and here is what was decided:
We will be adding a disclaimer to archived stories like my screenshot above.
We decided not to add noindex to archived stories. Our news office wants them to remain findable, as long as they have the disclaimer.
We do want to add noindex to past events. And we decided to do so immediately after the event has passed.
I guess this highlights the fact that there is no one-size-fits-all approach to this stuff. @karl, if you are listening: if LiveWhale were to consider adding any of these features, it would make sense for them to be opt-in and configurable.