PHP and Machine Learning – Untapped Opportunities

We have been experimenting with PHP-ML at Pupa Clic and feel it is time PHP was recognized as a viable platform for machine learning, now that PHP 7 has arrived.

Machine learning algorithms are notoriously CPU-intensive. With PHP 7 touted as twice as fast as its predecessor, is it now practical for simple machine learning tasks? In this post, PHP 7 is compared with both Java and Python to show why speed isn’t everything.

Despite years of flak, PHP still dominates the server-side scripting language market. Rightly or wrongly, part of its success is that it is a forgiving and incredibly accessible language. In the right hands, a developer can quickly build all manner of web applications; in the wrong hands, one can quickly make a mess! This flexibility does not only carry a cost at the code level: internally, PHP has to do a lot of type juggling, and this is where processor time is lost.

Each minor version of PHP 5 made small incremental improvements to performance, but PHP 7’s refactored core is something we haven’t seen before. Zend’s infographic shows big improvements for many frameworks and CMSs. These traditional uses of PHP typically hit databases, which suits PHP 7’s optimised data structures. Most of the time spent on machine learning problems, however, is actual computation, so will there be similar returns when the work is not dominated by juggling data types?

The motivation for this post was my wish to run simple classification tasks natively in PHP. In the past, I have used PHP/Java Bridge, but it has always felt a bit wobbly. Now I prefer creating a simple web service for PHP to query, which also means the brain can be moved off the web server to a more suitable machine. Still, this separation feels like overkill for simple problems like spam filtering, and it adds a point of failure. Would it not be nice to query the model natively rather than through an API? Does PHP 7 now make this feasible?

Methods

To measure PHP 7’s grunt, we will use the k-nearest neighbours algorithm, probably the most basic classifier there is. It is like sticking a pin in a scatter chart and determining its class by the colours of the surrounding points. In the example below, the new point is assigned to the red pluses because two of its three nearest neighbours belong to that class.
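To make the pin-in-the-chart idea concrete, here is a minimal Python sketch of k-nearest neighbours with majority voting. Python is used only because it is one of the languages benchmarked in this post; this is not the author’s benchmark code from GitHub. The function name knn_classify, the toy points and the “blue circle” class are hypothetical, and Euclidean distance is assumed as the metric.

```python
import math
from collections import Counter


def knn_classify(query, points, labels, k=3):
    """Return the majority label among the k training points nearest to `query`.

    Hypothetical helper for illustration; assumes Euclidean distance.
    """
    # Pair every training point's distance to the query with its label,
    # then sort so the closest points come first.
    ranked = sorted(
        (math.dist(query, point), label) for point, label in zip(points, labels)
    )
    # Let the k nearest neighbours vote; the most common label wins.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D data: two clusters standing in for the scatter chart.
points = [(1.0, 1.0), (2.0, 2.0), (3.0, 1.0), (6.0, 6.0), (7.0, 6.0)]
labels = ["red plus", "red plus", "blue circle", "blue circle", "blue circle"]

# Two of the three nearest neighbours are red pluses, so the pin is a red plus.
print(knn_classify((1.8, 1.4), points, labels, k=3))  # prints: red plus
```

Note that nothing is learned up front: all of the distance work happens inside the call, which is exactly the “lazy” behaviour described next.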

K-nearest neighbours is known as a lazy classifier: there is no model as such, and all the work is done when a new instance is classified. That is why this classifier is ideal for this experiment. To keep things simple, we only look for the single closest neighbour (k = 1). The script has to calculate the distance to every point in the dataset in n-dimensional space to find the closest match. This means lots of floating-point operations, and a more efficient language implementation spends more CPU time on our problem rather than on its own overhead.
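With k = 1 the vote disappears and classification becomes a linear scan for the closest point. A minimal Python sketch, assuming Euclidean distance; the function name nearest_neighbour and the Iris-shaped sample rows are illustrative only, not taken from the author’s repository or the real dataset.

```python
def nearest_neighbour(query, points, labels):
    """1-NN: scan every training point and return the label of the closest one.

    Hypothetical helper for illustration, not the original benchmark code.
    """
    best_label = None
    best_dist = float("inf")
    for point, label in zip(points, labels):
        # Squared Euclidean distance across all n dimensions. The square root
        # is skipped because it does not change which point is closest.
        dist = sum((q - x) ** 2 for q, x in zip(query, point))
        if dist < best_dist:
            best_dist, best_label = dist, label
    return best_label


# Illustrative four-feature rows shaped like Iris measurements
# (sepal length, sepal width, petal length, petal width); not the real dataset.
points = [(5.1, 3.5, 1.4, 0.2), (7.0, 3.2, 4.7, 1.4), (6.3, 3.3, 6.0, 2.5)]
labels = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

print(nearest_neighbour((5.0, 3.4, 1.5, 0.2), points, labels))  # prints: Iris-setosa
```

Comparing squared distances is a choice made for this sketch: it saves one square root per point without changing the answer, whereas the original benchmarks may well compute the full distance.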

The script loads the Iris dataset from a CSV file and then runs leave-one-out cross-validation continuously for thirty seconds. The more instances classified in this time, the better the performance of that language implementation. We don’t care about accuracy, so each answer is discarded and the script simply moves on to the next instance.
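The benchmark procedure can be sketched in Python as follows. This is a reconstruction under stated assumptions, not the author’s script: it assumes a headerless CSV with the class label in the last column, uses 1-NN with squared Euclidean distance, and the names load_rows, nearest_label and benchmark are hypothetical. Each pass holds out one instance at a time, classifies it against the remaining rows, throws the prediction away and counts it, until the time budget runs out.

```python
import csv
import time

# Hypothetical helper names; not from the original benchmark repository.


def load_rows(lines):
    """Parse CSV lines into (features, label) pairs.

    Assumes no header row and the class label in the last column.
    """
    return [
        (tuple(float(v) for v in row[:-1]), row[-1])
        for row in csv.reader(lines)
        if row  # skip blank lines
    ]


def nearest_label(query, rows, skip):
    """1-NN over every row except index `skip`, the held-out instance."""
    best_label, best_dist = None, float("inf")
    for i, (features, label) in enumerate(rows):
        if i == skip:
            continue
        dist = sum((q - x) ** 2 for q, x in zip(query, features))
        if dist < best_dist:
            best_dist, best_label = dist, label
    return best_label


def benchmark(rows, seconds=30.0):
    """Repeat leave-one-out cross-validation until `seconds` have elapsed.

    Returns how many instances were classified. Predictions are discarded
    because only throughput is measured, not accuracy.
    """
    if not rows:
        return 0
    classified = 0
    deadline = time.perf_counter() + seconds
    while True:
        for i, (features, _) in enumerate(rows):
            if time.perf_counter() >= deadline:
                return classified
            nearest_label(features, rows, skip=i)  # answer deliberately ignored
            classified += 1
```

To run it for the full thirty seconds against a local copy of the dataset (iris.csv is a hypothetical filename), one could open the file with open("iris.csv", newline=""), pass the file object to load_rows, and hand the result to benchmark. Checking the clock before every instance adds a little overhead, but it keeps the stop time within one classification of the thirty-second mark.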

The source code for each language is available on GitHub. I have tried to code each in the style of that language rather than a straight port from PHP. Sorry if it’s not the way you would implement it. I am sure there are a myriad of ways to squeeze more performance from each test but we are trying to imitate how someone would reasonably use the languages.

To keep things fair, the benchmarks are run on the same single-core virtual machine. We are not interested in memory use, but the VM has only 1 GB of memory available, not that a 150-instance dataset should make much of a dent in it.

Results

Running the benchmarks multiple times showed there is a small variance from one run to the next. For this reason, the exact figures have been omitted from the charts. Feel free to run the code yourself if exact numbers for a toy problem matter to you.

Interpreters

These implementations of PHP and Python run purely as interpreters. Java can be run in an interpreter-only mode, but no one would do that in the real world.

Here we see that PHP 5.6 and Python performed similarly, but PHP 7 stands out with a 35% improvement on its predecessor. The Python 3 implementation is marginally slower than Python 2 for reasons beyond the scope of this article.

Just-In-Time compilers

Unlike Zend’s benchmarks, where PHP 7 sometimes outperforms HHVM, our benchmarks show HHVM crunching nearly 30% more instances than PHP 7. If your PHP code is CPU-intensive and you’re not using unusual extensions, HHVM could help you. PyPy clearly dwarfs HHVM, but like HHVM it has compatibility issues that could instead drastically slow your development down. More on this in the discussion below.

Adding the Java result to this graph would make it unreadable: Java is simply in another league. Outside VirtualBox, it is pushing nearly one million instances per second!

Discussion

It is clear that any dynamically-typed language, compiled or not, will be vastly slower than something like Java. Hopefully, that doesn’t surprise you. Once HotSpot kicks in, the Java code is running much closer to the metal.

But as fun as performance benchmarks are, it is clear there is more to the story than speed, or we would all be using assembly language. Anyone who has dabbled with Java will know how it can sometimes feel like walking through molasses compared to dynamically-typed scripting languages. This reminds me of something a lecturer pointed out during the first year of my undergraduate course: developer time is vastly more expensive than CPU time, and this gap is only going to widen further. Using a language or implementation just because it performs better in a benchmark is a form of premature optimisation.

If we use human time as the primary measure of efficiency, it makes sense to use the best-performing language that meets the requirements and that the team can maintain. Unlike in the benchmarks above, in the real world one would use a machine learning library, and this is where PHP falls flat on its face: there just isn’t a viable machine learning library for PHP. Any advantage gained from using the team’s common language goes straight out of the window once they start writing mountains of code from scratch. A Python team would have the luxury of NumPy and scikit-learn, both of which do their heavy lifting in compiled code, so there is a significant performance boost there too.

Until there is a viable machine learning toolkit for PHP, don’t even bother. Even if PHP 7 were faster than the competition, the development time would not be recouped in all but the most exceptional circumstances. It’s all about using the right tool for the job, and PHP doesn’t even have any tools to offer.

So, to answer the question: is PHP now suitable for machine learning? Practically, no, not until there is a viable library available. For now, if you want to keep your codebase homogeneous, use a web service such as Google’s Prediction API or Amazon Machine Learning.


Originally published at Synthetic Minds on 25th November 2015.

An insight into Google’s I’m Feeling Lucky

The concept of the “I’m Feeling Lucky” button has always been very simple. When a user typed a search query into Google and clicked the button, it took them straight to the page listed at the top of the results!

The idea behind the button was that a user would need a dose of confidence that they would be taken to the exact page they were thinking of, or at least a relevant one, in a single try. In other words, they would have to feel pretty lucky, hence the button’s name.

On the other hand, “I’m Feeling Lucky” has also been seen as a display of Google’s own confidence that it can take you to the desired page in one try! Today, the way the “I’m Feeling Lucky” button works depends on a feature called “Google Instant”. When Google Instant is enabled and working, you have no real chance to click the button, as suggestions and results start appearing the moment you begin typing a query.

However, if you hover over a suggestion beneath the search box, the “I’m Feeling Lucky” option appears to the right! When Google Instant is disabled, the “I’m Feeling Lucky” button goes back to normal and behaves as it previously did. Whether Google Instant is enabled or disabled, clicking the button while the search box is empty always takes the user to the Google Doodles gallery.

The video below explains this in more detail.

Remove WordPress emoji code without a plugin

Since the 4.2 update, you will probably find the following code in your WordPress pages:


window._wpemojiSettings = {"baseUrl":"http:\/\/s.w.org\/images\/core\/emoji\/72x72\/","ext":".png","source":{"concatemoji":"http:\/\/your-url\/wp-includes\/js\/wp-emoji-release.min.js?ver=4.2.1"}};
    !function(a,b,c){function d(a){var c=b.createElement("canvas"),d=c.getContext&&c.getContext("2d");return d&&d.fillText?(d.textBaseline="top",d.font="600 32px Arial","flag"===a?(d.fillText(String.fromCharCode(55356,56812,55356,56807),0,0),c.toDataURL().length>3e3):(d.fillText(String.fromCharCode(55357,56835),0,0),0!==d.getImageData(16,16,1,1).data[0])):!1}function e(a){var c=b.createElement("script");c.src=a,c.type="text/javascript",b.getElementsByTagName("head")[0].appendChild(c)}var f;c.supports={simple:d("simple"),flag:d("flag")},c.supports.simple&&c.supports.flag||(f=c.source||{},f.concatemoji?e(f.concatemoji):f.wpemoji&&f.twemoji&&(e(f.twemoji),e(f.wpemoji)))}(window,document,window._wpemojiSettings);
 
img.wp-smiley,
img.emoji {
    display: inline !important;
    border: none !important;
    box-shadow: none !important;
    height: 1em !important;
    width: 1em !important;
    margin: 0 .07em !important;
    vertical-align: -0.1em !important;
    background: none !important;
    padding: 0 !important;
}

To remove it without a plugin, add the following lines to your theme’s functions.php:

// Stop WordPress printing the emoji detection script and styles on the front end
remove_action('wp_head', 'print_emoji_detection_script', 7);
remove_action('wp_print_styles', 'print_emoji_styles');

// ...and in the admin area
remove_action('admin_print_scripts', 'print_emoji_detection_script');
remove_action('admin_print_styles', 'print_emoji_styles');

Email response vs reputation

Email response rates are signals of your reputation. Under 10%? You’ve got a problem.

You’re only as good as the people you know, which is why it’s so worth making your reputation a priority. Your brand is what people think or say about you when you’re not around. When they see your name in their inbox, think about what you want their first reaction to be. If you follow these principles, your network will naturally widen into a solid foundation.

It’s not just about having another high-caliber LinkedIn connection or someone’s email address. Those are nice to have, sure, but the real win is knowing those people will respond when you need them to.

Google I/O 2017 vs Pupa Clic

  • Similar to Google Lens, at Pupa Clic we’ve incorporated real-world mapping and image recognition into our augmented reality browser, Clic AR

  • Translation was incorporated into Clic View, a Pupa Clic project that uses real-world image recognition and translation

  • A conversational commerce platform with end-to-end offline voice recognition was developed for an e-commerce portal

The entire Google I/O keynote, summed up in 10 minutes by The Verge

Wannacrypt Prevention – Advisory from CERT-In (Government of India)

Over the last few days, many systems around the world have been affected by ransomware named “WannaCrypt”. CERT-In, the Government of India’s computer emergency response team, has issued an advisory and precautionary measures against it.

http://www.cyberswachhtakendra.gov.in/alerts/wannacry_ransomware.html

CERT-In – Vulnerability Note CIVN-2017-0032

Multiple vulnerabilities in Windows SMB
Original Issue Date: March 15, 2017
Severity Rating: HIGH

Software Affected
  • Windows Vista Service Pack 2 and Windows Vista x64 Edition Service Pack 2
  • Windows 7 for 32-bit Service Pack 1 and Windows 7 for x64-based Systems Service Pack 1
  • Windows 8.1 for 32-bit and 64-bit systems
  • Windows RT 8.1
  • Windows 10 for 32-bit and 64-bit systems
  • Windows Server 2012
  • Windows Server 2012 R2
  • Windows Server 2008 R2 for x64-based Systems Service Pack 1
  • Windows Server 2008 SP2 for 32-bit and 64-bit systems (Server Core Installation)
  • Windows Server 2008 SP1 R2 for 64-bit Systems (Server Core Installation)
  • Windows Server 2008 R2 for Itanium-based Systems Service Pack 1
  • Windows Server 2008 for Itanium-based Systems Service Pack 2
  • Windows Server 2012 (Server Core Installation)
  • Windows Server 2012 R2 (Server Core Installation)
  • Windows Server 2016 for 64-bit Systems (Server Core Installation)
  • Windows Server 2016 for 64-bit Systems

Overview
Multiple remote code execution vulnerabilities and an information disclosure vulnerability exist in the way that the Microsoft Server Message Block 1.0 (SMBv1) server handles certain requests, which could be exploited by a remote attacker to execute code on the target server.

Description
1. Remote Code Execution Vulnerabilities (CVE-2017-0143, CVE-2017-0144, CVE-2017-0145, CVE-2017-0146, CVE-2017-0148)
These vulnerabilities exist in the way that the Microsoft Server Message Block 1.0 (SMBv1) server handles certain specially crafted requests. An unauthenticated attacker could exploit these vulnerabilities by sending specially crafted packets to a targeted SMBv1 server, which could allow the attacker to run arbitrary code.

2. Windows SMB Information Disclosure Vulnerability (CVE-2017-0147)
This vulnerability exists in the way that the Microsoft Server Message Block 1.0 (SMBv1) server handles certain specially crafted requests. An unauthenticated attacker could exploit this vulnerability by sending a specially crafted packet to a targeted SMBv1 server, which could lead to information disclosure from the server.

Solution

Apply the appropriate patches as mentioned in Microsoft Security Bulletin MS17-010.

Vendor Information
Microsoft
https://technet.microsoft.com/en-us/library/security/ms17-010.aspx

References
Microsoft
https://technet.microsoft.com/en-us/library/security/ms17-010.aspx

Cisco
https://tools.cisco.com/security/center/viewAlert.x?alertId=52834
https://tools.cisco.com/security/center/viewAlert.x?alertId=52838

CVE Name
CVE-2017-0143
CVE-2017-0144
CVE-2017-0145
CVE-2017-0146
CVE-2017-0147
CVE-2017-0148
Disclaimer
The information provided herein is on “as is” basis, without warranty of any kind.

Contact Information
Email: info@cert-in.org.in
Phone: +91-11-24368572

Postal address
Indian Computer Emergency Response Team (CERT-In)
Ministry of Electronics and Information Technology
Government of India
Electronics Niketan
6, CGO Complex, Lodhi Road,
New Delhi 110 003
India

Saturo Global – Redefining Knowledge Management Services

Saturo Global is an online portal through which users can engage the firm to purchase knowledge management services such as:

  • Custom Curation and Analytics
  • Patent Search and Analytics
  • Business Research and Analytics
  • Paralegal Support
  • Publishing and Document Management
  • Engagement Models

The platform includes a robust internal project management system that helps end users purchase, monitor and track work orders right from their handheld device or desktop.

Saturo Global works with scientists and legal and business leaders of global management enterprises, integrating knowledge and information from varied sources to support transformative decisions based on strategic insights. Their expertise spans Life Sciences, Healthcare, Pharmaceuticals, Biotechnology, Physics, Chemistry, Cleantech, Computers & IT, Telecom & Networks, ICT, Digital Media and Electrical & Electronics. Over the last 3 years, they have built a strong intellectual team of experienced researchers, search experts and analysts with qualifications from premier educational institutions and work experience in multiple industries.

Know more: https://www.saturoglobal.com/