Having Amazon Echo Read Your Local Paper

So I just got my Amazon Echo and I really like it. It’s somewhat limited, but fun nonetheless. When I got it, I couldn’t wait to write a skill for it, and I am very happy to say it’s not that hard. The first thing I wanted it to do was read the top headlines of the Spencer Daily Reporter.

Side note: I am using this for personal use only. I don’t know what rules this breaks, so use it at your own risk.

My town has a local paper that basically keeps up to date on all the local happenings. Their website is mostly a day or two behind, but I probably wouldn’t read a daily paper anyway.

It didn’t take me too long to get it to work: I started at 7:50, it’s now 9:10, and it’s working well enough.

So setting up an Alexa App is really simple. I went to https://developer.amazon.com, clicked on Alexa, then logged in by clicking at the top right. Click on Apps and Services, then click on Alexa again. Click on Alexa Skills Kit (an Alexa App is called a Skill). I pressed add a new skill and I was off!
The first screen I filled in looks like this:
screen1

Yours will be similar. Make sure your URL starts with https://

Push Next. This is where you add your intent. An intent is basically something the user intends to do. Mine is fairly bare bones:

Screen Shot 2015-08-20 at 9.15.23 PM

Push Next

Make sure you have a server that has a cert installed. It can’t be a wildcard cert; I was not able to make that work. The good news is that the same site can be used for multiple projects.

Screen Shot 2015-08-20 at 9.16.35 PM

Push Next

There we go, our Alexa App now “Just Works” on our Amazon Echo:

Screen Shot 2015-08-20 at 9.17.40 PM

Now it’s time to code it up! You can write it in whatever language you already know, so there is nothing new to learn. I wrote mine in PHP.

I will break it down:

<?php
// Read the raw JSON body that Amazon POSTs to the skill (not used any further here)
$data = file_get_contents("php://input");
// Next, get the RSS feed
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.spencerdailyreporter.com/feed/rss/news/week.rss");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec($ch);
curl_close($ch);
$news= new SimpleXMLElement($output);

When the Echo posts to you, the request actually comes from Amazon (your request is proxied at Amazon), so we need to read the full input (don’t worry, it’s JSON). For this app, I don’t really care what the person said, or about intents or anything, because if this app gets called, I am just going.
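If the skill ever needs to care about what was asked, the posted JSON can be decoded and inspected. Here is a minimal sketch, assuming the body has a top-level "request" object with a "type" field (and an "intent" with a "name" for intent requests). The helper name alexaRequestSummary and the intent name ReadNews are made up for illustration.

```php
<?php
// Hypothetical helper: summarize the JSON that gets POSTed to the skill endpoint.
// Returns array('type' => ..., 'intent' => ...) or null if the body is not usable JSON.
function alexaRequestSummary($rawBody) {
    $request = json_decode($rawBody, true);
    if (!is_array($request) || !isset($request['request']['type'])) {
        return null;
    }
    $type = $request['request']['type'];
    $intent = null;
    if ($type === 'IntentRequest' && isset($request['request']['intent']['name'])) {
        $intent = $request['request']['intent']['name'];
    }
    return array('type' => $type, 'intent' => $intent);
}

// In the skill itself the input would be file_get_contents("php://input").
// "ReadNews" is a made-up intent name for this sample.
$sample = '{"version":"1.0","request":{"type":"IntentRequest","intent":{"name":"ReadNews"}}}';
print_r(alexaRequestSummary($sample));
```

A null result is a reasonable cue to bail out early instead of fetching the feed.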

The Daily Reporter has an RSS feed!!! We can just read that and load it into an XML doc.

$report_title = array();
$report_date = array();
$report_link = array();
foreach ($news->channel->item as $item) {
        // Cast to string so the arrays hold plain text, not SimpleXMLElement objects
        array_push($report_title, (string) $item->title);
        array_push($report_date, (string) $item->pubDate);
        array_push($report_link, (string) $item->link);
}

Now we are simply parsing the XML from the RSS. Nothing to see here, move along.
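To try the parsing without hitting the real site, the same loop can be run against a small inline feed. This is only a sketch: the sample items and URLs are invented, and parseFeedItems is a hypothetical helper that collects each item into one array instead of three parallel ones.

```php
<?php
// Made-up sample feed standing in for the real RSS download.
$sampleRss = <<<XML
<rss version="2.0">
  <channel>
    <title>Example Paper</title>
    <item>
      <title>First headline</title>
      <link>http://example.com/story/1</link>
      <pubDate>Thu, 20 Aug 2015 09:00:00 -0500</pubDate>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/story/2</link>
      <pubDate>Wed, 19 Aug 2015 09:00:00 -0500</pubDate>
    </item>
  </channel>
</rss>
XML;

// Hypothetical helper: turn RSS text into a list of title/date/link arrays.
function parseFeedItems($xml) {
    $feed = new SimpleXMLElement($xml);
    $items = array();
    foreach ($feed->channel->item as $item) {
        $items[] = array(
            'title' => (string) $item->title,   // casts give plain strings
            'date'  => (string) $item->pubDate,
            'link'  => (string) $item->link,
        );
    }
    return $items;
}

foreach (parseFeedItems($sampleRss) as $entry) {
    echo $entry['date'], ": ", $entry['title'], "\n";
}
```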

$fulltext = "";
for ($i = 0; $i < 3 && $i < count($report_link); $i++) {
        $fulltext .= getArticle($report_link[$i],$report_title[$i],$report_date[$i]);
}

All I want is the top 3 articles. Adjust at will, but don’t make it too big, as you can exceed the response size the Echo allows (24576 bytes).
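One way to stay under that limit no matter how long the articles are is to stop adding text once a byte budget is used up. A rough sketch follows; joinWithinBudget is a hypothetical helper, and the 24000 figure is an arbitrary choice that leaves some headroom for the JSON wrapper.

```php
<?php
// Hypothetical helper: join article texts in order, stopping before the
// combined length would exceed $maxBytes.
function joinWithinBudget(array $articles, $maxBytes) {
    $joined = "";
    foreach ($articles as $article) {
        if (strlen($joined) + strlen($article) > $maxBytes) {
            break; // the next article would not fit, so stop here
        }
        $joined .= $article;
    }
    return $joined;
}

// Assumed budget: a bit below 24576 to leave room for the surrounding JSON.
$fulltext = joinWithinBudget(array("First story. ", "Second story. "), 24000);
echo strlen($fulltext), " bytes\n";
```

Keep in mind that escaping the text for JSON can add bytes, so measuring after escaping is the safer order.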

function getArticle($link,$title,$date) {
	$ch = curl_init(); 
	curl_setopt($ch, CURLOPT_URL, $link); 
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
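	$t = microtime(true); // float timestamp, so the file name has no spaces
	$tmpfname = "/tmp/$t"; // double quotes so $t is actually interpolated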
    	curl_setopt($ch, CURLOPT_COOKIEJAR, $tmpfname);
    	curl_setopt($ch, CURLOPT_COOKIEFILE, $tmpfname);
	$rand1 = rand (1,10000000);
	$rand2 = rand (1,99999999);
	$rand3 = rand (1,9999999999);
 
	curl_setopt($ch,CURLOPT_USERAGENT,"Mozilla/5.0 (Windows; U; Windows NT $rand1.$rand2; en-US; rv:1.8.1.13) Gecko/$rand3 Firefox/2.0.0.13");
	$output = curl_exec($ch); 
	curl_close($ch);  
	$article = "";
	$start = 0;
	$lines =explode("\n",$output);
	foreach($lines as $line) {
		if (strstr($line,"</div>") && $start == 1) {
			if ($title == "Sirens") {
				if (strstr($line,"class=\"third\"")) {
					$start = 0;
				}
			} else {
				$article .=$line ." ";
				$start = 0;
			}
		}		
		if ($start == 1) {
			$article .=$line ." ";
		}
		if (strstr($line,"<div class=\"text\">")||strstr($line,"<div class=\"storyhead1\">")) {
			$start = 1;
		}
	}
	$article=strip_tags($article);
	$article = str_replace("googletag.display('dfp_unit_SDR_Rectangle_News');        googletag.display('dfp_unit_SDR_Rectangle2_Other');       \r \r \r \r \r","",$article);
	$article = str_replace("&copy; 2015 Spencer Daily Reporter  Contact Us Terms of Service Media Partners Search","",$article);
	$article = str_replace(" Home  News Sports Opinion Records Blogs Classifieds Calendar             ,","",$article);
	$out = "$date,$title,$article,";
	$out = escapeJsonString($out);
	$out = str_replace("p.m.","P M",$out);
	$out = str_replace("a.m.","A M",$out);
	return $out;
 
}

There is a TON going on in the above code. Let me break it down a bit:

	$ch = curl_init(); 
	curl_setopt($ch, CURLOPT_URL, $link); 
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
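	$t = microtime(true); // float timestamp, so the file name has no spaces
	$tmpfname = "/tmp/$t"; // double quotes so $t is actually interpolated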
    	curl_setopt($ch, CURLOPT_COOKIEJAR, $tmpfname);
    	curl_setopt($ch, CURLOPT_COOKIEFILE, $tmpfname);
	$rand1 = rand (1,10000000);
	$rand2 = rand (1,99999999);
	$rand3 = rand (1,9999999999);
 
	curl_setopt($ch,CURLOPT_USERAGENT,"Mozilla/5.0 (Windows; U; Windows NT $rand1.$rand2; en-US; rv:1.8.1.13) Gecko/$rand3 Firefox/2.0.0.13");
	$output = curl_exec($ch); 
	curl_close($ch);

This is the curl request that actually gets the data. I struggled with this, as it looks like the Reporter limits your number of monthly queries, but it’s not based on IP or a cookie. It does a browser fingerprint... I HATE browser fingerprints, it makes me feel like a person on the internet... EWWWW. This is the code I was seeing:

Screen Shot 2015-08-20 at 8.31.34 PM

So I figured it out: if I altered the user agent, BOOM, I could load unlimited articles. I am sure there are much prettier ways to do this, but this is what I got!

	$article = "";
	$start = 0;
	$lines =explode("\n",$output);
	foreach($lines as $line) {
		if (strstr($line,"</div>") && $start == 1) {
			if ($title == "Sirens") {
				if (strstr($line,"class=\"third\"")) {
					$start = 0;
				}
			} else {
				$article .=$line ." ";
				$start = 0;
			}
		}		
		if ($start == 1) {
			$article .=$line ." ";
		}
		if (strstr($line,"<div class=\"text\">")||strstr($line,"<div class=\"storyhead1\">")) {
			$start = 1;
		}
	}

This code is some specific parsing for different kinds of articles. It basically works, though it might take some tweaking. That is what is so great about an Alexa App: I can change one thing and it’s rolled out everywhere!
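A less fragile alternative to scanning line by line is to let PHP's DOM extension find the article container. This is not what the skill above does, just a sketch of the idea. extractDivText is a hypothetical helper, the sample HTML is invented, and only the class name "text" comes from the parser above.

```php
<?php
// Hypothetical helper: return the text of every <div> carrying the given class.
function extractDivText($html, $class) {
    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match the class as a whole word inside the class attribute.
    $query = sprintf('//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]', $class);
    $parts = array();
    foreach ($xpath->query($query) as $node) {
        // Collapse runs of whitespace so the speech text reads cleanly.
        $parts[] = trim(preg_replace('/\s+/', ' ', $node->textContent));
    }
    return implode(' ', $parts);
}

// Made-up page fragment for illustration.
$sampleHtml = '<html><body><div class="nav">Home News</div>'
            . '<div class="text"><p>Council approves   new budget.</p></div></body></html>';
echo extractDivText($sampleHtml, 'text'), "\n";
```

Because only the matching div is read, the navigation and footer text never make it into the article, which would remove the need for some of the cleanup replacements.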

	$article=strip_tags($article);
	$article = str_replace("googletag.display('dfp_unit_SDR_Rectangle_News');        googletag.display('dfp_unit_SDR_Rectangle2_Other');       \r \r \r \r \r","",$article);
	$article = str_replace("&copy; 2015 Spencer Daily Reporter  Contact Us Terms of Service Media Partners Search","",$article);
	$article = str_replace(" Home  News Sports Opinion Records Blogs Classifieds Calendar             ,","",$article);
	$out = "$date,$title,$article,";
	$out = escapeJsonString($out);
	$out = str_replace("p.m.","P M",$out);
	$out = str_replace("a.m.","A M",$out);
	return $out;

This is just some cleanup for some stuff I was too lazy to fix in the parser.
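A more general cleanup pass can cover some of this without hardcoding exact strings. The sketch below uses a hypothetical helper, tidyForSpeech, that strips tags, decodes HTML entities such as &amp;, and collapses stray whitespace. It would not remove the site-specific menu and footer text, so it complements the replacements above and does not replace them.

```php
<?php
// Hypothetical helper: make scraped HTML friendlier for text-to-speech.
function tidyForSpeech($html) {
    $text = strip_tags($html);                              // drop markup
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8'); // &amp; becomes &
    $text = preg_replace('/\s+/', ' ', $text);              // collapse \r, \n, tabs, spaces
    return trim($text);
}

echo tidyForSpeech("<p>Fire &amp; Rescue\r\n   met at 7 p.m.</p>"), "\n";
```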

function escapeJsonString($value) { # list from www.json.org: (\b backspace, \f formfeed)
    $escapers = array("\\", "/", "\"", "\n", "\r", "\t", "\x08", "\x0c");
    $replacements = array("\\\\", "\\/", "\\\"", "\\n", "\\r", "\\t", "\\b", "\\f");
    $result = str_replace($escapers, $replacements, $value);
    return $result;
}

Don’t flame me. I know I could have used json_encode, but I wrote a function instead to make the response JSON compliant. I may change this some day.

header('Content-Type: application/json;charset=UTF-8');
 
$text = '{
    "version" : "1.0",
    "response" : {
        "outputSpeech" : {
            "type" : "PlainText",
            "text" : "'.$fulltext.'"
        },
        "shouldEndSession" : true
    }
}';
 
 
header('Content-Length: ' . strlen($text));
echo $text;

This is the output that the Echo expects: some JSON that returns all the data, along with a Content-Length header giving its size.
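For comparison, here is roughly what the same response looks like when built with json_encode instead of string concatenation. It is a sketch, not the code in the attached zip. buildSpeechResponse is a hypothetical helper, and the text passed in must be the raw article text, not text that has already been through escapeJsonString, or it would be escaped twice.

```php
<?php
// Hypothetical helper: build the response body and let json_encode do the escaping.
function buildSpeechResponse($speech) {
    $response = array(
        'version' => '1.0',
        'response' => array(
            'outputSpeech' => array(
                'type' => 'PlainText',
                'text' => $speech,
            ),
            'shouldEndSession' => true,
        ),
    );
    // json_encode returns false if $speech is not valid UTF-8.
    return json_encode($response);
}

$body = buildSpeechResponse("Mayor says \"hello\"\nto the fair.");

// In the skill, send these before echoing the body:
// header('Content-Type: application/json;charset=UTF-8');
// header('Content-Length: ' . strlen($body));
echo $body, "\n";
```

The Content-Length here is measured on the finished body, so it always matches what is actually sent.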

So how does it work? You tell me:

Attached is the source code

This is John Signing off.

recorder.zip
