So I just got my Amazon Echo. I really like it; it’s somewhat limited, but fun nonetheless. When I got it, I couldn’t wait to write a skill for it, and I am very happy to say it’s not that hard. The first thing I wanted it to do was read the top headlines of the Spencer Daily Reporter.
Side note: I am using this for personal use only. I don’t know what rules this breaks, so use it at your own risk.
My town has a local paper that basically keeps up to date on all the local happenings. Their website is mostly a day or two behind, but I probably wouldn’t read a daily paper anyway.
It didn’t take me too long to get it to work. I started at 7:50, it’s now 9:10, and it’s working well enough.
Setting up an Alexa app is really simple. I went to https://developer.amazon.com and clicked on Alexa, then logged in by clicking at the top right. Click on Apps and Services, then click on Alexa again. Click on Alexa Skills Kit (an Alexa app is called a Skill). I pressed add a new skill and I was off!
The first screen I filled in looks like this:
Yours will be similar. Make sure your URL starts with https://
Push Next. This is where you add your intent. An intent is basically something you intend to do. Mine is fairly bare bones:
Push Next
Make sure you have a server that has a cert installed. Not a wildcard cert; I was not able to make that work. The good news is the same site can be used for multiple projects.
Push Next
There we go, our Alexa App now “Just Works” on our Amazon Echo:
Now it’s time to code it up! I can write it in a language I already know, so there is nothing new to learn. I wrote mine in PHP.
I will break it down:
    <?php
    // read the raw request body that Amazon POSTs to us
    $data = file_get_contents("php://input");

    // first get the rss feed
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, "http://www.spencerdailyreporter.com/feed/rss/news/week.rss");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $output = curl_exec($ch);
    curl_close($ch);
    $news = new SimpleXMLElement($output);
When the Echo does a POST to you, it does so from Amazon (your request is proxied at Amazon), so we need to read the full input (don’t worry, it’s JSON). For this app, I don’t really care what the person said or which intent fired, because if this app gets called, I am just going ahead and reading the news.
The Daily Reporter has an RSS feed!!! We can just read that and load it into an XML doc.
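If you do want to look at what was asked, the posted body can be decoded with json_decode. Here is a minimal sketch, assuming the usual request layout with a top-level "request" object holding a "type" and, for intent requests, an "intent" with a "name". The helper name parseAlexaRequest and the intent name GetNews are made up for illustration and are not part of my skill.

```php
<?php
// Hypothetical helper: decode the JSON body posted to the skill endpoint
// and pull out the request type and intent name (if any).
function parseAlexaRequest(string $raw): array
{
    $body = json_decode($raw, true);
    if (!is_array($body) || !isset($body['request']['type'])) {
        // Not JSON, or not shaped like a skill request
        return ['type' => null, 'intent' => null];
    }
    return [
        'type'   => $body['request']['type'],
        'intent' => $body['request']['intent']['name'] ?? null,
    ];
}

// Sample payload trimmed down to the fields used here ("GetNews" is made up)
$sample = '{"version":"1.0","request":{"type":"IntentRequest","intent":{"name":"GetNews"}}}';
$info = parseAlexaRequest($sample);
echo $info['type'], ' / ', $info['intent'], "\n";
```

In the real endpoint you would pass it the string read from php://input instead of the sample.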
    $report_title = array();
    $report_date = array();
    $report_link = array();
    foreach ($news->channel->item as $item) {
        #print_r($item->title);
        array_push($report_title, $item[0]->title);
        array_push($report_date, $item[0]->pubDate);
        array_push($report_link, $item[0]->link);
    }
Now we are simply parsing the XML from the RSS. Nothing to see here, move along.
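If you want to try the parsing without hitting the real site, here is a small sketch that runs the same channel/item walk over a made-up two-item feed. The feed contents, the example.com links, and the helper name readFeedItems are all invented for illustration; casting each field to a string is my own choice here so the results are plain PHP strings rather than SimpleXMLElement objects.

```php
<?php
// Made-up two-item feed standing in for the real RSS document
$xml = <<<XML
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <item>
      <title>First story</title>
      <pubDate>Mon, 01 Jun 2015 08:00:00 GMT</pubDate>
      <link>http://example.com/story/1</link>
    </item>
    <item>
      <title>Second story</title>
      <pubDate>Tue, 02 Jun 2015 08:00:00 GMT</pubDate>
      <link>http://example.com/story/2</link>
    </item>
  </channel>
</rss>
XML;

// Hypothetical helper: return each <item> as an array of plain strings
function readFeedItems(string $xml): array
{
    $feed = new SimpleXMLElement($xml);
    $items = [];
    foreach ($feed->channel->item as $item) {
        $items[] = [
            'title' => (string) $item->title,
            'date'  => (string) $item->pubDate,
            'link'  => (string) $item->link,
        ];
    }
    return $items;
}

foreach (readFeedItems($xml) as $item) {
    echo $item['title'], ' -> ', $item['link'], "\n";
}
```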
    $fulltext = "";
    // top 3 articles: indexes 0, 1 and 2
    for ($i = 0; $i < 3; $i++) {
        $fulltext .= getArticle($report_link[$i], $report_title[$i], $report_date[$i]);
    }
All I want is the top 3 articles. Adjust at will, but don’t make it too big, as you can exceed the size the Echo allows (24576 bytes).
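One way to stay under that limit regardless of how long the articles are is to stop adding articles once the next one would push the text over a byte budget. This is only a sketch of the idea, not what my skill does; the helper name joinWithinBudget is made up, and you would pick a budget somewhat below 24576 to leave room for the JSON wrapper.

```php
<?php
// Hypothetical helper: append articles in order and stop before the
// combined text would exceed $maxBytes.
function joinWithinBudget(array $articles, int $maxBytes): string
{
    $text = '';
    foreach ($articles as $article) {
        if (strlen($text) + strlen($article) > $maxBytes) {
            break; // the next article would not fit
        }
        $text .= $article;
    }
    return $text;
}

// Tiny numbers so the behaviour is easy to see: the third item does not fit
echo strlen(joinWithinBudget(['aaaa', 'bbbb', 'cccc'], 10)), "\n"; // prints 8
```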
    function getArticle($link, $title, $date) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $link);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        $t = microtime();
        // double quotes so $t is actually interpolated into the file name
        $tmpfname = "/tmp/$t";
        curl_setopt($ch, CURLOPT_COOKIEJAR, $tmpfname);
        curl_setopt($ch, CURLOPT_COOKIEFILE, $tmpfname);
        $rand1 = rand (1,10000000);
        $rand2 = rand (1,99999999);
        $rand3 = rand (1,9999999999);
        curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT $rand1.$rand2; en-US; rv:1.8.1.13) Gecko/$rand3 Firefox/2.0.0.13");
        $output = curl_exec($ch);
        curl_close($ch);

        $article = "";
        $start = 0;
        $lines = explode("\n", $output);
        foreach ($lines as $line) {
            if (strstr($line, "</div>") && $start == 1) {
                if ($title == "Sirens") {
                    if (strstr($line, "class=\"third\"")) {
                        $start = 0;
                    }
                } else {
                    $article .= $line . " ";
                    $start = 0;
                }
            }
            if ($start == 1) {
                $article .= $line . " ";
            }
            if (strstr($line, "<div class=\"text\">") || strstr($line, "<div class=\"storyhead1\">")) {
                $start = 1;
            }
        }

        $article = strip_tags($article);
        $article = str_replace("googletag.display('dfp_unit_SDR_Rectangle_News'); googletag.display('dfp_unit_SDR_Rectangle2_Other'); \r \r \r \r \r", "", $article);
        $article = str_replace("© 2015 Spencer Daily Reporter Contact Us Terms of Service Media Partners Search", "", $article);
        $article = str_replace(" Home News Sports Opinion Records Blogs Classifieds Calendar ,", "", $article);
        $out = "$date,$title,$article,";
        $out = escapeJsonString($out);
        $out = str_replace("p.m.", "P M", $out);
        $out = str_replace("a.m.", "A M", $out);
        return $out;
    }
There is a TON going on in the above code. Let me break it down a bit:
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $link);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $t = microtime();
    // double quotes so $t is actually interpolated into the file name
    $tmpfname = "/tmp/$t";
    curl_setopt($ch, CURLOPT_COOKIEJAR, $tmpfname);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $tmpfname);
    $rand1 = rand (1,10000000);
    $rand2 = rand (1,99999999);
    $rand3 = rand (1,9999999999);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT $rand1.$rand2; en-US; rv:1.8.1.13) Gecko/$rand3 Firefox/2.0.0.13");
    $output = curl_exec($ch);
    curl_close($ch);
This is the curl request that actually gets the data. I struggled with this, as it looks like the Reporter limits your number of monthly queries, but it’s not based on IP or a cookie. It does a browser fingerprint... I HATE browser fingerprints, it makes me feel like a person on the internet... EWWWW. This is the code I was seeing:
So I figured it out: if I altered the user agent, BOOM, I could load unlimited articles. I am sure there are much prettier ways to do this, but this is what I got!
    $article = "";
    $start = 0;
    $lines = explode("\n", $output);
    foreach ($lines as $line) {
        if (strstr($line, "</div>") && $start == 1) {
            if ($title == "Sirens") {
                if (strstr($line, "class=\"third\"")) {
                    $start = 0;
                }
            } else {
                $article .= $line . " ";
                $start = 0;
            }
        }
        if ($start == 1) {
            $article .= $line . " ";
        }
        if (strstr($line, "<div class=\"text\">") || strstr($line, "<div class=\"storyhead1\">")) {
            $start = 1;
        }
    }
This code is some specific parsing for different kinds of articles. It basically works, though it might take some tweaking. That is what is so great about an Alexa app: I can change one thing and it’s rolled out everywhere!
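If the line-by-line scanning ever gets too fiddly, one alternative is to let PHP’s DOM extension find the article container. The sketch below is not what my skill uses; it assumes the article body sits in a div whose class attribute is exactly "text" (one of the markers the scanner above looks for), and the helper name extractArticleText and the sample HTML are made up.

```php
<?php
// Hypothetical alternative to the line scanner: collect the text of every
// <div class="text"> using DOMDocument and DOMXPath.
function extractArticleText(string $html): string
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML, so keep parser warnings quiet
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $parts = [];
    foreach ($xpath->query('//div[@class="text"]') as $node) {
        // Collapse runs of whitespace so the speech text reads cleanly
        $parts[] = trim(preg_replace('/\s+/', ' ', $node->textContent));
    }
    return implode(' ', $parts);
}

// Made-up page: a navigation div that should be skipped, and one article div
$html = '<html><body><div class="nav">Home News</div>'
      . '<div class="text"><p>Council meets   tonight at 7.</p></div></body></html>';
echo extractArticleText($html), "\n";
```

The exact-match on the class attribute is deliberate for the sketch; a div with several classes would need a looser XPath expression.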
    $article = strip_tags($article);
    $article = str_replace("googletag.display('dfp_unit_SDR_Rectangle_News'); googletag.display('dfp_unit_SDR_Rectangle2_Other'); \r \r \r \r \r", "", $article);
    $article = str_replace("© 2015 Spencer Daily Reporter Contact Us Terms of Service Media Partners Search", "", $article);
    $article = str_replace(" Home News Sports Opinion Records Blogs Classifieds Calendar ,", "", $article);
    $out = "$date,$title,$article,";
    $out = escapeJsonString($out);
    $out = str_replace("p.m.", "P M", $out);
    $out = str_replace("a.m.", "A M", $out);
    return $out;
This is just some cleanup for some stuff I was too lazy to fix in the parser.
    function escapeJsonString($value) {
        # list from www.json.org: (\b backspace, \f formfeed)
        $escapers = array("\\", "/", "\"", "\n", "\r", "\t", "\x08", "\x0c");
        # \x08 is backspace (\b) and \x0c is formfeed (\f), in that order
        $replacements = array("\\\\", "\\/", "\\\"", "\\n", "\\r", "\\t", "\\b", "\\f");
        $result = str_replace($escapers, $replacements, $value);
        return $result;
    }
Don’t flame me. I know I could have used json_encode, but I wrote a function instead to make the response JSON compliant. I may change this some day.
    header('Content-Type: application/json;charset=UTF-8');
    $text = '{
      "version" : "1.0",
      "response" : {
        "outputSpeech" : {
          "type" : "PlainText",
          "text" : "'.$fulltext.'"
        },
        "shouldEndSession" : true
      }
    }';
    // Content-Length has to be the size of the body we actually send
    header('Content-Length: ' . strlen($text));
    echo $text;
This is the output that the Echo expects: a JSON body that carries the speech text, plus a Content-Length header with the size of that body.
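For anyone who would rather not hand-build the JSON, here is a sketch of the same response shape produced with json_encode, which takes care of the escaping. The helper name buildAlexaResponse is made up. Note that with this approach the text should be passed in raw, without running it through escapeJsonString first, otherwise it gets escaped twice.

```php
<?php
// Hypothetical helper: build the same response structure with json_encode
function buildAlexaResponse(string $speech): string
{
    $payload = [
        'version'  => '1.0',
        'response' => [
            'outputSpeech' => [
                'type' => 'PlainText',
                'text' => $speech,
            ],
            'shouldEndSession' => true,
        ],
    ];
    return json_encode($payload);
}

// Quotes and newlines in the text are escaped by json_encode
$body = buildAlexaResponse("Council said \"no\".\nMore at 6 P M.");

// In the real endpoint the headers would go out before the body:
// header('Content-Type: application/json;charset=UTF-8');
// header('Content-Length: ' . strlen($body));
echo $body, "\n";
```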
So how does it work? You tell me:
Attached is the source code
This is John Signing off.