2011-10-26

Our own Chef cookbook for Ceph

My first task at Ragnarson was to deploy a new distributed file system, Ceph. The only proper way to deploy software on our servers is with Chef. When I started I didn't know anything about Chef or Ceph, but it wasn't so hard after all.

In the Ceph architecture we can distinguish three types of nodes:
  • mds - metadata server daemon. It's crucial to have at least one mds node; its role is to coordinate access to osd nodes and to cache and manage metadata. Our cluster has one "primary" mds. Each additional node is on 'standby', which means that we always have backup mds nodes ready to take over.
  • mon - monitor daemon. We need one, three or another odd number of monitor nodes. This is closely related to the Paxos algorithm, which is used to achieve consensus in distributed systems. The monitors manage the cluster map.
  • osd - object storage daemon. Basically it holds the data; at least two osd nodes are required. Data distribution is described in the CRUSH map. At the moment our cookbook provides a CRUSH map only for data redundancy.
As the NewDream developers promise, Ceph should offer:
  1. "Distributed file system, easily accessed via kernel client or fuse driver". The fuse driver works great, the kernel client not so much. We've had some random crashes with it, and even the Ceph wiki confirms that the fuse driver is more stable and is the recommended one.
  2. "Object storage - Clients talk directly with storage nodes to store named blobs of data and attributes, while the cluster transparently handles replication and recovery internally". This part works very well and seems to be as reliable as it should be. Replication is almost seamless and always on time, even if your osd/mds daemons crash under heavy load.
  3. "Robust, open-source distributed storage". We are using another distributed file system on a different cluster and, from what I know, it isn't as reliable as it should be. So there is a big chance for Ceph to become our primary clustered fs. So far it behaves great; simulated crash tests went well enough for us to move our log backups to Ceph.
This is a very brief description of Ceph and a comment on its most interesting features. More can be found in the article posted on the IBM website. The Ceph docs and wiki are also good places to start.


Now for our Chef cookbook: it's publicly available on GitHub. It's my first 'big' Chef thing, so the code may be a little rough, but it's reliable and heavily tested. Forks, pull requests and comments are welcome.

Available recipes (each node recipe provides a service definition; the code is well commented, so there isn't much to add):
  • default.rb - basic recipe. Installs the necessary packages on Debian/Ubuntu and generates the ceph.conf required by nodes and clients.
  • mds.rb - configures an mds. Two cases: setting up the first mds and expanding the cluster. The first mds is the primary, each subsequent one is a backup.
  • osd.rb - configures an osd. Two cases, as with mds. It also generates consecutive osd ids, because osds can't have literal names. At the moment each osd holds the same data and you need two to start the cluster.
  • mon.rb - configures a mon. Three cases this time: the first is the initial mon; the second is expanding the cluster with an additional mon, which requires mon_snapshot.rb; the third is registering new mons with an existing one.
  • mon_snapshot.rb - creates the mon snapshot that is required to expand the cluster, as described on the wiki. Because of a data_bag bug this is done by hand for now; it will be fixed after we upgrade Chef to the latest version.
  • prepare.rb - our internal recipe. Creates the directory structure on the cluster.

To start, we need an initial cluster: one node with each recipe fully executed (1 mds, 1 osd, 1 mon). At this point the cluster will be in a degraded state, because we should have at least two osd nodes. Expanding is done by adding recipes to the run_list. When you add a new osd or mds, after its first chef-client run you have to run chef-client once on the mon.

IMPORTANT: While you are adding a mon that brings the total to an even number, the cluster will be offline. As described above, the total number of mon nodes has to be odd.
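
The odd-number rule comes from majority voting: a Paxos-style quorum needs a strict majority of the monitors to agree. The arithmetic can be sketched in a few lines of JavaScript. This is only an illustration; the helper names quorumSize and tolerableFailures are made up here and are not part of Ceph or of our cookbook.

```javascript
// Hypothetical helpers for illustration only; not part of Ceph or the cookbook.

// Smallest number of monitors that forms a strict majority of `total`.
function quorumSize(total) {
  return Math.floor(total / 2) + 1;
}

// How many monitors can be lost while a majority is still reachable.
function tolerableFailures(total) {
  return total - quorumSize(total);
}

[1, 2, 3, 4, 5].forEach(function (n) {
  console.log(n + ' mon(s): quorum ' + quorumSize(n) +
    ', tolerates ' + tolerableFailures(n) + ' failure(s)');
});
```

With 3 and with 4 monitors the cluster survives the loss of only one monitor, so going from an odd to an even count adds a node without adding any fault tolerance.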

The recipes also provide templates for monitoring software: munin and monit. Munin monitoring is done by ceph-perf, which communicates with the Ceph administrative sockets.


Ceph also has great developers who are always willing to help. The only pity is that they are in US time zones. You can contact them on the #ceph IRC channel on the OFTC network.


2011-10-11

Łódź Ruby User Group - October 20, 18:00

We invite everyone interested in web application development to the meetings of the Łódź Ruby User Group. The next LRUG takes place on October 20 at 18:00.
Agenda:
  1. Introduction to websockets ‐ Bartłomiej Kozal, Ragnarson
  2. Faye in Ruby on Rails ‐ Grzegorz Kołodziejczyk
  3. Processing gigabytes of data in real time with Goliath and Resque ‐ Łukasz Piestrzeniewicz, Ragnarson
We meet at the Ragnarson office (Łąkowa 11, main brick building, 1st staircase, 1st floor)
Details on the poster: http://cl.ly/273S0N3v3k1A1a1Q291b
And on the website: http://www.lrug.pl

2011-10-07

Navigating through many pages with phantom.js

Phantom.js is a light, minimalistic WebKit library. Basically it acts like any modern WebKit browser, but you control it with scripts written in JavaScript or CoffeeScript. This turns out to be really handy when you need to automate a task that requires a real web browser.

It's really simple when you want to use phantom just to get a page from one specific URL. In that case you define an onLoadFinished callback on phantom's WebPage object and do all your work there.

onLoadFinished is called every time a page finishes loading after a new request has been made, but the only parameter it receives is the load status. In most situations we want to perform different operations depending on the currently loaded page. To do that we have to store some kind of information that identifies the current page.

A very simple solution is to store the current location in some persistent attribute, so that we can access and change it from onLoadFinished.

Example: We want to use phantom to get a list of chicken soup recipes from BBC Recipes and print it to the console. To do that, we first have to fill in the search box with 'chicken soup' and submit the search form. We are then redirected to the search results page, from which we can get the list of recipes.

Here is a simple CoffeeScript script that does the job:


Please note that phantom.state is not defined in the original phantomjs code base. We define it dynamically on the phantom object. It's a popular method among phantomjs users. If you don't like it or find it dangerous, you can always use another attribute or create a global variable for that purpose.
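
As a rough illustration of the state-tracking pattern itself, here is a minimal sketch in plain JavaScript. It is not the script from this post: the phantom object is stubbed and the callback is called by hand so that the sketch runs outside PhantomJS, and the state names and the log array are made up for illustration. In a real script the function would be assigned to the WebPage object's onLoadFinished and the handlers would work on the loaded page.

```javascript
// Stand-in for PhantomJS's global `phantom` object so the sketch runs anywhere.
// Inside a real PhantomJS script this object already exists.
var phantom = { state: 'search-form' };

// Collects what each step would do (hypothetical, for illustration only).
var log = [];

// One handler per state. Each handler does its work and then sets the state
// that the next page load is expected to arrive in.
var handlers = {
  'search-form': function () {
    log.push('fill in the search box and submit the form');
    phantom.state = 'results';
  },
  'results': function () {
    log.push('read the recipe list from the results page');
    phantom.state = 'done';
  }
};

// Same shape as the onLoadFinished callback: the only argument is the load
// status, 'success' or 'fail'. Everything else comes from phantom.state.
function onLoadFinished(status) {
  if (status !== 'success') {
    log.push('load failed in state ' + phantom.state);
    return;
  }
  var handler = handlers[phantom.state];
  if (handler) {
    handler();
  }
}

// Simulate the two page loads from the example: the search page, then the
// results page we are redirected to.
onLoadFinished('success');
onLoadFinished('success');
console.log(log.join('\n'));
```

Keeping one handler per state means that supporting another page in the flow only requires adding one more entry to the handlers table.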

2011-10-06

A New Master's Degree

Yesterday Marcin Baliński, one of our developers, defended his master's thesis "Testowo Zorientowane Metodyki Rozwoju Oprogramowania" ("Test-Oriented Software Development Methodologies").

It describes how to develop software in Ruby properly, making full use of tests, and is available online on GitHub: github.com/marcinb/thesis/.

Congratulations!

2011-09-21

Unfuddle - do not use

There are many different bug trackers.

For me, hosted solutions that serve a single purpose, do it right and integrate with others are the best. This is why my toolset of choice is Pivotal Tracker, GitHub, Basecamp and Campfire. Each of those services is perfected in its own area.

At the other end of the spectrum are tools that want to do everything at once. Unfuddle falls into this category. What you get is not a tool that is good in all areas. Rather, you get a half-assed product that gives mediocre performance in every feature.

Unfuddle is slow, suffers from heavy featuritis and looks like it was written by a guy who had just found out about Ajax and wanted a testing ground for all its possibilities. Avoid at all costs.