When Yandex Behaves Oddly: Discovering Spider Bots with SYSLOG-NG and Striim

F. Clark, SOC analyst, Striim

Domo arigato, Mr. Roboto,
Mata ah-oo hima de
Domo arigato, Mr. Roboto,
Himitsu wo shiri tai

Every once in a while an analyst is called upon to look at things that are unique, that pique the interest and make the day go by faster. (For other days, there is coffee.) As the web server admin approached my desk, I was just about to put on another pot of coffee. I decided to hold off.

Our admin came to me with a bit of a puzzle. Although there are a myriad of well-known robots spidering our website daily, there was one that he was not happy with. It was called Yandex, a Russian search engine. Given the amount of malware and other less-than-wanted things coming out of Russian networks, the admin was concerned about this indexer accessing our website. With a grin a mile wide, I set aside the coffee and reached for my green tea with honey, and responded with a heartfelt, “I’m on it.”

FIRST STEPS

The first thing I needed to do was get our web logs into a place where I could use Striim to analyze them. At my request, the admins had implemented SYSLOG-NG and were using a central repository for all of our logs, which made it far easier to access them using Striim. The access logs for our primary and backup production web servers resided in /var/log/www-prod-1 and /var/log/www-prod-2 on the central logging system. From there, all I had to do was get them into Striim and we could start having fun. From the UI, I whipped up a pair of text readers and configured them to take data from the access logs of both production servers.

The next step was to parse the log files so that their contents were organized into fields that could be processed. A little wave of the REGEX wand and we had both logs parsed and combined into a single flow for Striim to analyze.
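For readers who want to try the parsing step outside of Striim, here is a minimal Python sketch of the same idea. It is not the Striim configuration I used. It assumes the servers write the common "combined" access log format, and the names COMBINED_LOG, parse_line, and parse_logs, along with the two sample lines, are made up for illustration.

```python
import re

# Assumption: the web servers write the common "combined" access log format.
COMBINED_LOG = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = COMBINED_LOG.match(line)
    return match.groupdict() if match else None

def parse_logs(*sources):
    """Parse several iterables of log lines and merge them into one flow."""
    for source in sources:
        for line in source:
            record = parse_line(line)
            if record is not None:
                yield record

# Made-up sample lines standing in for the two production access logs.
prod_1 = [
    '203.0.113.7 - - [10/Oct/2016:13:55:36 +0000] "GET /index.html HTTP/1.1" '
    '200 2326 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"'
]
prod_2 = [
    '198.51.100.4 - - [10/Oct/2016:13:56:01 +0000] "GET /about HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0"'
]

records = list(parse_logs(prod_1, prod_2))
for record in records:
    print(record["host"], record["path"], record["status"])
```

Lines that do not match the pattern are skipped rather than raising an error, which seemed the safer choice for logs collected from more than one server.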


Next, I created a dashboard to show me just the information related to the web requests from Yandex. This would give me a clean and up-to-date view of the data I needed in real time. A quick TQL query combined with a table and I was on my way!
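The filter behind that dashboard can be approximated in a few lines of Python. This is a sketch rather than the TQL query itself, and it assumes each parsed record is a dict with an "agent" field holding the User-Agent string. The helper names and sample records are hypothetical. Keep in mind that a User-Agent string is supplied by the client and can be spoofed, so a match here only says what the client claims to be.

```python
def is_yandex(record):
    """True when the User-Agent string claims to be a Yandex crawler."""
    return "yandex" in record.get("agent", "").lower()

def yandex_requests(records):
    """Keep only the records whose User-Agent mentions Yandex."""
    return [record for record in records if is_yandex(record)]

# Made-up records for illustration.
sample = [
    {"host": "203.0.113.7", "path": "/index.html",
     "agent": "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"},
    {"host": "198.51.100.4", "path": "/about", "agent": "Mozilla/5.0"},
]

for record in yandex_requests(sample):
    print(record["host"], record["path"])
```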

 


Immediately the information started flowing in. At first, it looked like just the normal traffic one would expect from an indexing spider bot. Sure enough, however, my keen eyes spotted something that was not quite right.

As Dorothy Parker would say, “What fresh hell is this?” The spider was making a GET request to the search function of our website! That is not normal behavior for a spider if all it is doing is indexing a site. A little analyst magic performed on that request revealed it was using our own site search feature to look for “Fun HB Slot Machine” and the domain qpyl18.com. A quick check of this domain showed that it was protected by Cloudflare (Hi Otto!), but that the origin server was having issues. A quick check of the IP addresses involved and my spidey-analyst senses were tingling.
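The "analyst magic" here mostly amounts to decoding the query string of the request. The sketch below shows one way to do that in Python. The search path "/search" and the parameter name "q" are assumptions for illustration, since every site names these differently, and the sample request is made up rather than copied from our logs.

```python
from urllib.parse import urlsplit, parse_qs

SEARCH_PATH = "/search"  # hypothetical path of the site search feature
SEARCH_PARAM = "q"       # hypothetical name of the search-term parameter

def search_terms(request_path):
    """Return the decoded search terms if the request hit the search page."""
    parts = urlsplit(request_path)
    if parts.path != SEARCH_PATH:
        return []
    return parse_qs(parts.query).get(SEARCH_PARAM, [])

# Made-up request resembling the kind of search the spider was issuing.
print(search_terms("/search?q=Fun+HB+Slot+Machine+qpyl18.com"))
print(search_terms("/index.html"))
```

An ordinary page request returns an empty list, so the same function doubles as a test for "is this a site search?"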

The next burning question was how often was this happening, and at what volume? Back to the dashboard I went! I altered my query to show me the requests that were performing the internal searches, and was quickly rewarded with the information I was looking for:

 


Not only was this happening, but it was happening frequently.

 


I configured Striim to keep watch on this, making it part of my overall security application. Customized queries gave me indicators for the number of instances in a day and the average number of instances over a week, and a special dashboard page with alerts let me know if things got out of hand.
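The indicators themselves are simple arithmetic. As a rough Python sketch (not the queries I built in Striim), the following counts instances per day, averages them over a seven-day window, and flags days above a threshold. It assumes timestamps in the access log's usual form and an English locale for the month abbreviation. The threshold value and the sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"  # assumed access-log timestamp format
ALERT_THRESHOLD = 50                  # hypothetical daily limit

def daily_counts(timestamps):
    """Count how many suspicious requests were seen on each calendar day."""
    return Counter(datetime.strptime(ts, TIME_FORMAT).date() for ts in timestamps)

def weekly_average(counts, last_day):
    """Mean daily count over the 7 days ending on last_day (quiet days count as 0)."""
    window = [counts.get(last_day - timedelta(days=i), 0) for i in range(7)]
    return sum(window) / 7

def days_over_threshold(counts, threshold=ALERT_THRESHOLD):
    """Days on which the count exceeded the threshold, oldest first."""
    return sorted(day for day, n in counts.items() if n > threshold)

# Made-up timestamps of requests that hit the site search.
hits = [
    "10/Oct/2016:13:55:36 +0000",
    "10/Oct/2016:14:02:11 +0000",
    "10/Oct/2016:18:40:05 +0000",
    "11/Oct/2016:09:15:00 +0000",
]

counts = daily_counts(hits)
latest = max(counts)
print(dict(counts))
print(weekly_average(counts, latest))
print(days_over_threshold(counts, threshold=2))
```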

Like any good analyst, I gathered data for 60 days and then presented it to the web admin, and we both decided this was not something we wanted on our network. A few adjustments to the web server, firewall, and IDS, and we were off for a celebratory lunch.

The ease of use, speed, flexibility, and myriad tools of Striim allowed the web admin and me to quickly and efficiently acquire, process, enrich, and report on the data about the unusual traffic. They also let us create an environment where any of the shift analysts could keep an eye on the activity, both streaming in real time and stored for historical purposes.

So you want to empower your analysts with tools like this? Request a demo today. We will be happy to guide you through all of the features of Striim and help you improve your security footprint.