Elasticsearch Handbook

ADVANCED

Cluster Management

Adding and removing nodes, rolling upgrades, and cluster state management in a production cluster.


Level: Advanced+. This section requires production experience.

Rolling Upgrade / Node Maintenance Flow

1. Drain Node: set allocation.exclude so the node's shards are moved elsewhere.
2. Wait Green: check _cluster/health until all shards have relocated.
3. Stop Node: systemctl stop, then upgrade or perform maintenance.
4. Start Node: start it on the new version so it rejoins the cluster.
5. Remove Exclude: clear the exclude setting so shards rebalance.

Warning: Always take a snapshot before upgrading! PUT _snapshot/backup/pre-upgrade → wait_for_completion=true → then start the rolling upgrade.
Rollback strategy: restore the snapshot and start nodes on the previous version.
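The "Wait Green" step can be scripted by letting the health API block on the server side instead of polling in a loop. The following is a minimal sketch, not a definitive implementation: the function names and the `ES_URL` variable are assumptions made for illustration, while `wait_for_status`, `wait_for_no_relocating_shards`, and `timeout` are parameters of the `_cluster/health` API.

```shell
#!/bin/sh
# Assumption: the cluster is reachable at ES_URL (defaults to the address used on this page).
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: reads a _cluster/health JSON body on stdin and
# succeeds only if it reports status green and the wait did not time out.
health_is_green() {
  body=$(cat)
  printf '%s' "$body" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"green"' &&
    printf '%s' "$body" | grep -Eq '"timed_out"[[:space:]]*:[[:space:]]*false'
}

# Hypothetical helper: blocks until the cluster is green with no relocating
# shards, or until the timeout (default 120s) expires on the server side.
wait_for_green() {
  timeout="${1:-120s}"
  curl -s "$ES_URL/_cluster/health?wait_for_status=green&wait_for_no_relocating_shards=true&timeout=$timeout" |
    health_is_green
}
```

A maintenance script could then call `wait_for_green 300s || exit 1` before stopping the node, so a timeout aborts the procedure rather than continuing on a non-green cluster.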

Decision Guide

Situation | Recommendation | Example or rationale
Rolling restart | Suitable for: minor upgrade, config change | 9.3→9.4 upgrade
Full cluster restart | Suitable for: major upgrade, critical fix | 8.x→9.x migration
Node drain (exclude) | Suitable for: planned maintenance | Hardware replacement
Snapshot/restore | Suitable for: backup, DR, migration | Nightly backup
Shard allocation filter | Suitable for: tier-based (hot/warm) | ILM tier routing
Voting config exclusion | Suitable for: removing master nodes | Cluster shrink
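The last row of the guide has no matching command elsewhere on this page, so here is a hedged sketch of the voting configuration exclusions API. The function names, the `ES_URL` variable, and the node name in the usage note are assumptions for illustration. As far as the Elasticsearch documentation describes it, explicit exclusions are mainly needed when half or more of the master-eligible nodes are removed at once; when removing fewer, one at a time, the cluster adjusts its voting configuration on its own.

```shell
#!/bin/sh
# Assumption: the cluster is reachable at ES_URL.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: builds the exclusion URL for one or more
# comma-separated master-eligible node names.
voting_exclusion_url() {
  printf '%s/_cluster/voting_config_exclusions?node_names=%s' "$ES_URL" "$1"
}

# Ask the cluster to remove the named node(s) from the voting configuration
# before they are shut down.
exclude_from_voting() {
  curl -s -X POST "$(voting_exclusion_url "$1")"
}

# Clear the exclusion list after the excluded nodes have left the cluster.
clear_voting_exclusions() {
  curl -s -X DELETE "$ES_URL/_cluster/voting_config_exclusions"
}
```

For a cluster shrink the order would be `exclude_from_voting master-3`, then stopping that node, then `clear_voting_exclusions`, where `master-3` is a placeholder name.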
# Cluster health
curl -s "http://localhost:9200/_cluster/health?pretty"

# Node info
curl -s "http://localhost:9200/_cat/nodes?v&h=name,role,heap.percent,disk.used_percent,cpu"

# Shard allocation explain (unassigned shard debug)
curl -X GET "http://localhost:9200/_cluster/allocation/explain?pretty" -H "Content-Type: application/json" -d'
{
  "index": "products",
  "shard": 0,
  "primary": true
}'

# Rolling restart - node drain
curl -X PUT "http://localhost:9200/_cluster/settings" -H "Content-Type: application/json" -d'
{
  "persistent": {
    "cluster.routing.allocation.exclude._name": "node-to-remove"
  }
}'

# Remove the exclude (once the node is back)
curl -X PUT "http://localhost:9200/_cluster/settings" -H "Content-Type: application/json" -d'
{
  "persistent": {
    "cluster.routing.allocation.exclude._name": null
  }
}'

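# Snapshot (backup): register a filesystem repository, then take a snapshot
# (the location must be listed under path.repo in elasticsearch.yml on every node)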
curl -X PUT "http://localhost:9200/_snapshot/my_backup" -H "Content-Type: application/json" -d'
{
  "type": "fs",
  "settings": { "location": "/mnt/backups/es" }
}'

curl -X PUT "http://localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true" -H "Content-Type: application/json" -d'
{
  "indices": "products,orders",
  "ignore_unavailable": true
}'
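using Elastic.Clients.Elasticsearch;

// Summary of the health fields read below
// (the int types assume the client's HealthResponse properties are ints)
public record ClusterHealthInfo(
    string Status, int NumberOfNodes, int ActiveShards,
    int UnassignedShards, int NumberOfPendingTasks);

public class ClusterManagementService
{
    private readonly ElasticsearchClient _client;

    public ClusterManagementService(ElasticsearchClient client) => _client = client;

    // Reads cluster health and returns the fields relevant for maintenance decisions
    public async Task<ClusterHealthInfo> GetClusterHealthAsync()
    {
        var health = await _client.Cluster.HealthAsync();
        return new ClusterHealthInfo(
            health.Status.ToString(),
            health.NumberOfNodes,
            health.ActiveShards,
            health.UnassignedShards,
            health.NumberOfPendingTasks);
    }

    // Excludes the node from shard allocation so its shards move to other nodes
    public async Task DrainNodeAsync(string nodeName)
    {
        await _client.Cluster.PutSettingsAsync(s => s
            .Persistent(p => p.Add("cluster.routing.allocation.exclude._name", nodeName)));
    }
}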

Example: Upgrading ES in production. Proceed one node at a time with a rolling restart: drain the node → upgrade → rejoin. The shard allocation exclude moves that node's shards to the other nodes, after which the node can be restarted safely.
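Before restarting a drained node it helps to confirm that no shards are left on it. The sketch below counts the rows of `_cat/shards` output that belong to a given node; the function names and the `ES_URL` variable are assumptions for illustration, and the column list `index,shard,prirep,state,node` is requested explicitly so the node name is always the fifth field.

```shell
#!/bin/sh
# Assumption: the cluster is reachable at ES_URL.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: reads "_cat/shards?h=index,shard,prirep,state,node"
# output on stdin and prints how many shard copies are on the given node.
# A shard that is still relocating away lists its source node first,
# so it is counted until the move finishes.
count_shards_on_node() {
  awk -v node="$1" '$5 == node { c++ } END { print c + 0 }'
}

# Succeeds only when the given node holds no shard copies.
node_is_drained() {
  remaining=$(curl -s "$ES_URL/_cat/shards?h=index,shard,prirep,state,node" |
    count_shards_on_node "$1")
  [ "$remaining" -eq 0 ]
}
```

In a maintenance script, `node_is_drained node-to-remove` can gate the `systemctl stop` step.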

Operational Safety Procedures

# === SNAPSHOT RESTORE (Disaster Recovery) ===
# 1. List the existing snapshots
curl -X GET "http://localhost:9200/_snapshot/my_backup/_all?verbose=false"

# 2. Before an in-place restore (same index name): close or delete the target index.
#    Not needed when restoring under a new name, as in step 3.
curl -X POST "http://localhost:9200/products/_close"

# 3. Restore (a specific index, with rename)
curl -X POST "http://localhost:9200/_snapshot/my_backup/snapshot_2026-05-30/_restore" -H "Content-Type: application/json" -d'
{
  "indices": "products",
  "rename_pattern": "(.+)",
  "rename_replacement": "restored_$1",
  "index_settings": {
    "index.number_of_replicas": 0
  }
}'

# 4. Monitor the restore progress
curl -X GET "http://localhost:9200/restored_products/_recovery?active_only=true"

# 5. Validation: compare the doc count
curl -X GET "http://localhost:9200/restored_products/_count"

# === ZERO-DOWNTIME REINDEX (Alias Swap) ===
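# 1. Create the new index (with the new mapping; the turkish_custom analyzer
#    must also be defined under settings.analysis, which is omitted here)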
curl -X PUT "http://localhost:9200/products_v2" -H "Content-Type: application/json" -d'
{
  "mappings": { "properties": { "name": { "type": "text", "analyzer": "turkish_custom" } } }
}'

# 2. Reindex (old → new)
curl -X POST "http://localhost:9200/_reindex?wait_for_completion=false" -H "Content-Type: application/json" -d'
{
  "source": { "index": "products_v1" },
  "dest": { "index": "products_v2" }
}'

# 3. Monitor the task status
curl -X GET "http://localhost:9200/_tasks/<task_id>"

# 4. Atomic alias swap (zero downtime)
curl -X POST "http://localhost:9200/_aliases" -H "Content-Type: application/json" -d'
{
  "actions": [
    { "remove": { "index": "products_v1", "alias": "products" } },
    { "add": { "index": "products_v2", "alias": "products" } }
  ]
}'
# Your application queries the "products" alias, so the swap is instant and atomic

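# === UPGRADE ROLLBACK ===
# If a node upgrade fails:
# 1. Stop the upgraded node
# 2. Reinstall the previous binary
# 3. Start the node. Reusing the same data directory only works if the node never
#    started successfully on the new version: Elasticsearch does not support in-place
#    downgrades. Otherwise start with an empty data directory, or fall back to the
#    pre-upgrade snapshot on previous-version nodes.
# 4. Remove the allocation exclude
curl -X PUT "http://localhost:9200/_cluster/settings" -H "Content-Type: application/json" -d'
{
  "persistent": { "cluster.routing.allocation.exclude._name": null }
}'
# The node rejoins the cluster and shards are allocated to it again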

After a snapshot restore, ALWAYS validate the result: doc count with the _count API, a sample query with _search, and a health=green check with _cat/indices. Run an automated restore test every month.
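The doc count check is the easiest of these validations to automate. The sketch below assumes the expected count was recorded when the snapshot was taken; the function names and the `ES_URL` variable are hypothetical, and the JSON is parsed with `sed` only to avoid an extra dependency.

```shell
#!/bin/sh
# Assumption: the cluster is reachable at ES_URL.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: extracts the numeric "count" field from a _count
# response body read on stdin.
extract_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Fetch the current document count of an index.
doc_count() {
  curl -s "$ES_URL/$1/_count" | extract_count
}

# Succeeds only when the index holds exactly the expected number of documents.
verify_count() {
  actual=$(doc_count "$1")
  [ -n "$actual" ] && [ "$actual" -eq "$2" ]
}
```

A monthly restore test could end with `verify_count restored_products "$EXPECTED" || echo "restore validation failed"`, where `EXPECTED` is the count saved alongside the snapshot.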

.NET Client (Snapshot Restore + Alias Swap)
using Elastic.Clients.Elasticsearch;

public class OperationalSafetyService
{
    private readonly ElasticsearchClient _client;

    public OperationalSafetyService(ElasticsearchClient client) => _client = client;

    // === Snapshot Restore ===
    public async Task<RestoreResult> RestoreSnapshotAsync(
        string repo, string snapshot, string index)
    {
        // Close the index (required when restoring over an existing index)
        await _client.Indices.CloseAsync(index);

        var response = await _client.Snapshot.RestoreAsync(repo, snapshot, r => r
            .Indices(index)
            .IndexSettings(s => s.Add("index.number_of_replicas", "0"))
            .WaitForCompletion(true));

        // Do not report a document count for a failed restore
        if (!response.IsValidResponse)
            return new RestoreResult(false, 0);

        // Validation: check the doc count
        var count = await _client.CountAsync<object>(c => c.Indices(index));
        return new RestoreResult(true, count.Count);
    }

    // === Zero-Downtime Alias Swap ===
    public async Task<bool> AtomicAliasSwapAsync(
        string alias, string oldIndex, string newIndex)
    {
        // Both actions are sent in one request, so the alias never points at nothing
        var response = await _client.Indices.UpdateAliasesAsync(a => a
            .Actions(actions => actions
                .Remove(r => r.Index(oldIndex).Alias(alias))
                .Add(add => add.Index(newIndex).Alias(alias))));

        return response.IsValidResponse;
    }

    // === Reindex with Progress ===
    public async Task<string> StartReindexAsync(string source, string dest)
    {
        var response = await _client.ReindexAsync(r => r
            .Source(s => s.Index(source))
            .Dest(d => d.Index(dest))
            .WaitForCompletion(false));

        return response.Task!; // Task ID for monitoring
    }

    // === Rollback: Clear Allocation Exclude ===
    public async Task ClearAllocationExcludeAsync()
    {
        await _client.Cluster.PutSettingsAsync(s => s
            .Persistent(p => p.Add("cluster.routing.allocation.exclude._name", null!)));
    }
}

// Named RestoreResult to avoid confusion with the client's own RestoreResponse type
public record RestoreResult(bool Success, long DocumentCount);