In the Ceph architecture we can distinguish three types of nodes:
- mds - metadata server daemon. It's crucial to have at least one mds node; its role is to coordinate access to osd nodes and to cache and manage metadata. Our cluster has one "primary" mds. Each additional node is on 'standby', which means that we always have backup mds nodes ready to take over.
- mon - monitoring daemon. We need one, three or another odd number of monitoring nodes. This is closely related to the Paxos algorithm, which is used to achieve consensus in distributed systems. The monitor manages the cluster map.
- osd - object storage daemon. Basically it holds the data; at least two osd nodes are required. Data distribution is described in the CRUSH map. At the moment our cookbook provides a CRUSH map only for data redundancy.
- "Distributed file system, easily accessed via kernel client or fuse driver". The fuse driver works great, the kernel client not so much. We've had some random crashes with the kernel client, and even the Ceph wiki confirms that the fuse driver is more stable and is the recommended one.
- "Object storage - Clients talk directly with storage nodes to store named blobs of data and attributes, while the cluster transparently handles replication and recovery internally". This part works very well and seems to be as reliable as it should be. Replication is almost seamless and always on time, even if your osd/mds daemons crash under heavy load.
- "Robust, open-source distributed storage". As far as I know, we are using another distributed file system on a different cluster and it isn't as reliable as it should be. So this is a big chance for Ceph to become our primary clustered fs. So far it behaves great, and the simulated crash tests went well enough for us to move our log backups to Ceph.
Now for our Chef cookbook: it's publicly available on GitHub. It's my first 'big' Chef thing so the code may be a little rough, but it's reliable and heavily tested. Forks, pull requests and comments are welcome.
Available recipes (each node recipe provides a service definition; the code is well commented, so there isn't much to add):
- default.rb - basic recipe. Installs the necessary packages on debian/ubuntu and generates the ceph.conf required by nodes and clients.
- mds.rb - configures mds. Two cases: setting up the first mds and expanding the cluster. The first mds is primary, each subsequent one is a backup.
- osd.rb - configures osd. Two cases, like mds; it also generates subsequent osd ids, because osds can't have literal names. At the moment each osd holds the same data and you need two to start the cluster.
- mon.rb - configures mon. Three cases this time: the first is the initial mon, the second is expanding the cluster with an additional mon (this requires mon_snapshot.rb), and the third is registering new mons with an existing one.
- mon_snapshot.rb - it should create the mon snapshot which is required to expand the cluster, as described on the wiki. Because of a data_bag bug this is done by hand for now; it will be fixed after upgrading Chef to the latest version.
- prepare.rb - our internal recipe. Creates the directory structure on the cluster.
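To give a feel for what default.rb and osd.rb have to produce, here is a minimal Python sketch (not the cookbook's actual template or Ruby code). It assumes the classic ceph.conf layout with one `[mon.NAME]`, `[osd.ID]` or `[mds.NAME]` section per daemon; the host names, the address and both helper functions are made up for illustration.

```python
import configparser
import io


def next_osd_id(existing_ids):
    """Return the next numeric osd id: one past the highest id in use, or 0."""
    return max(existing_ids) + 1 if existing_ids else 0


def render_ceph_conf(mons, osds, mdss):
    """Render a ceph.conf-style INI file as a string.

    mons: dict of mon name -> (host, "ip:port")
    osds: dict of numeric osd id -> host
    mdss: dict of mds name -> host
    """
    conf = configparser.ConfigParser(interpolation=None)
    for name, (host, addr) in sorted(mons.items()):
        conf[f"mon.{name}"] = {"host": host, "mon addr": addr}
    for osd_id, host in sorted(osds.items()):
        conf[f"osd.{osd_id}"] = {"host": host}
    for name, host in sorted(mdss.items()):
        conf[f"mds.{name}"] = {"host": host}
    out = io.StringIO()
    conf.write(out)
    return out.getvalue()


# Hypothetical two-node example: node2 gets the next free numeric osd id.
osds = {0: "node1"}
osds[next_osd_id(osds)] = "node2"
print(render_ceph_conf({"a": ("node1", "192.168.0.10:6789")}, osds, {"a": "node1"}))
```

Note that mon and mds sections use literal names here while osd sections use numbers, which is why only the osd ids have to be generated.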
To start, we need an initial cluster: one node with each recipe fully executed (1 mds, 1 osd, 1 mon). At this point the cluster will be in a degraded state, since we should have at least two osd nodes. Expanding is done by adding recipes to the run_list. When adding a new osd or mds, after the first chef-client run on the new node you have to run chef-client once on the mon.
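As a rough illustration of the run_list part, the Python sketch below builds the node JSON for the initial node and for a second osd node. The cookbook name `ceph` is an assumption (check the name in the repository), and the helper `run_list` is made up for this example; only the `recipe[cookbook::recipe]` syntax is standard Chef.

```python
import json

COOKBOOK = "ceph"  # assumed cookbook name, not confirmed by the text above


def run_list(*recipes):
    """Build Chef run_list entries; 'default' maps to the bare cookbook name."""
    return [
        f"recipe[{COOKBOOK}]" if name == "default" else f"recipe[{COOKBOOK}::{name}]"
        for name in recipes
    ]


# Initial node: every daemon recipe on one machine (1 mon, 1 mds, 1 osd).
initial_node = {"run_list": run_list("default", "mon", "mds", "osd")}

# Second node: adding another osd brings the cluster out of the degraded state.
second_node = {"run_list": run_list("default", "osd")}

print(json.dumps(initial_node, indent=2))
print(json.dumps(second_node, indent=2))
```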
IMPORTANT: While an even-numbered mon is being added, the cluster will be offline. As described above, the total number of mon nodes has to be odd.
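The reason for the odd number is easy to see with a bit of arithmetic. The Python sketch below assumes a plain majority quorum, which is what Paxos-style consensus needs; the function names are made up for this example.

```python
def quorum_size(mon_count):
    """Smallest number of mons that forms a strict majority."""
    return mon_count // 2 + 1


def tolerated_failures(mon_count):
    """How many mons can be down while a majority is still reachable."""
    return mon_count - quorum_size(mon_count)


for mon_count in range(1, 7):
    print(mon_count, "mons: quorum", quorum_size(mon_count),
          "tolerates", tolerated_failures(mon_count), "failures")
```

Going from 3 to 4 mons raises the quorum from 2 to 3 but still tolerates only one failure, so an even mon count costs an extra node without adding any safety.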
The recipes also provide templates for monitoring software: munin and monit. Munin monitoring is done by ceph-perf, which communicates with the Ceph administrative sockets.
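I won't reproduce ceph-perf here, but the general idea can be sketched in Python. This is an assumption-heavy illustration, not the plugin's code: it assumes the counters are read with `ceph --admin-daemon <socket> perf dump` (the command name has changed between Ceph releases, so check yours), that the result is nested JSON, and all function names are made up.

```python
import json
import subprocess


def perf_dump_command(socket_path):
    """Command line that asks a daemon's admin socket for its counters (assumed form)."""
    return ["ceph", "--admin-daemon", socket_path, "perf", "dump"]


def read_counters(socket_path):
    """Run the command and parse its JSON output; needs a running Ceph daemon."""
    return json.loads(subprocess.check_output(perf_dump_command(socket_path)))


def flatten(counters, prefix=""):
    """Turn nested counter dicts into a flat {dotted.name: number} dict."""
    flat = {}
    for key, value in counters.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            flat[name] = value
    return flat


def munin_lines(counters):
    """Format counters the way a munin plugin prints them: 'field.value N'."""
    return [
        f"{name.replace('.', '_')}.value {value}"
        for name, value in sorted(flatten(counters).items())
    ]


# Made-up sample data standing in for a real admin socket reply.
sample = {"osd": {"op": 10, "op_latency": {"avgcount": 2, "sum": 0.5}}}
print("\n".join(munin_lines(sample)))
```

With a real daemon you would call `munin_lines(read_counters(path))`, where `path` is the daemon's admin socket file.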
Ceph also has great developers who are always willing to help. The only pity is that they are in US time zones. You can contact them on the #ceph irc channel on the oftc network.