About this blog
Why write your own blog software?
You don't have to have been on the Internet for long to realise that terms and conditions, available features, and pricing on corporate sites - like LinkedIn, Blogger, Facebook, Medium, or Twitter - can and do change.
I wrote this software so that the journals, news items, and articles I find can be the starting point for my learning - things I have read that I can find again, share, and reflect on. The software fetches the title, a short piece of text if available, and a picture from the URL of interest. I can then post a link to my blog entry on social media platforms instead of the original link - no more broken links for people clicking on my posts.
Technologies used:
- URL fetching with PHP - I wrote a PHP class that follows redirects, expands shortened URLs, and manages cookies until it reaches the final URL for a link. It uses Open Graph or Twitter Card information (or looks as best it can at the HTML) to find the Title, Description, and Image representing the web page.
- tf-idf information retrieval - to identify similar blog entries and keywords I have written information retrieval code in Python that analyses the term frequency - inverse document frequency (tf-idf) of all the blog entries. In theory this shouldn't work well on the blog entries as they are all quite short documents, but it does a reasonable job. I did this at OnExamination in the 2000s (pre-BMJ days) for finding similar multiple-choice questions (MCQs) - it was very good at identifying accidental duplicates of questions. The Python code back then was quite slow so I rewrote it in C++, but the current Python libraries are amazing at doing this sort of work. Don't do it in PHP or JavaScript just because you can - use the proper tools for the job.
- PHP, MySQL, and Apache stack - a traditional web server stack for storing and displaying the blog.
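
To give a feel for the metadata step in the URL fetching bullet, here is a minimal sketch of reading Open Graph and Twitter Card tags from a page that has already been downloaded. It is written in Python rather than PHP, it is not the class used on this blog, and the names CardParser and extract_card are made up for illustration. It assumes the usual og:title, og:description, og:image and twitter:* meta names, and falls back to the <title> tag and the plain description meta tag when those are missing.

```python
from html.parser import HTMLParser


class CardParser(HTMLParser):
    """Collect <meta> tags and the <title> text from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}        # e.g. {"og:title": "...", "twitter:image": "..."}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Open Graph uses property="og:...", Twitter Cards use name="twitter:..."
            key = attrs.get("property") or attrs.get("name")
            content = attrs.get("content")
            if key and content and key not in self.meta:
                self.meta[key] = content  # keep the first value seen

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_card(html):
    """Return title, description and image, preferring Open Graph values."""
    parser = CardParser()
    parser.feed(html)

    def first(*keys):
        for key in keys:
            if parser.meta.get(key):
                return parser.meta[key]
        return None

    return {
        "title": first("og:title", "twitter:title") or parser.title.strip() or None,
        "description": first("og:description", "twitter:description", "description"),
        "image": first("og:image", "twitter:image"),
    }
```

Calling extract_card(page_html) returns a dictionary with the three fields, each set to None when nothing suitable was found. Following redirects and handling cookies would happen before this step and are left out of the sketch.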
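
The tf-idf idea from the list above can also be sketched in a few lines of plain Python. This is an illustration under my own assumptions, not the code that runs on this blog: the function names are hypothetical, the tokeniser is a crude lowercase word split, and the idf uses a smoothed variant so that terms appearing in every entry still get a small weight. In practice a library does this faster and better.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dictionary per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter()                      # number of documents containing each term
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # relative term frequency times smoothed inverse document frequency
            t: (c / len(tokens)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(docs, index):
    """Index of the document most similar to docs[index], or None if alone."""
    vectors = tfidf_vectors(docs)
    scores = [(cosine(vectors[index], v), i)
              for i, v in enumerate(vectors) if i != index]
    return max(scores)[1] if scores else None


def top_keywords(vector, k=3):
    """The k highest-weighted terms, ties broken alphabetically."""
    ranked = sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

Two entries with a cosine score close to 1 are near duplicates, which is how accidental repeats show up, and top_keywords gives a rough keyword list for a single entry.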