CI/CD with GitHub Actions
Automated CI/CD pipeline using GitHub Actions with Workload Identity Federation for keyless GCP authentication.
Overview
The GitHub Actions pipeline provides:
- Production Deployments: Auto-deploy to the torale namespace on push to main
- Staging Deployments: Deploy to torale-staging via label or manual trigger
- Security Scanning: Trivy vulnerability detection
- Parallel Builds: Fast builds with matrix strategy
- Keyless Auth: Workload Identity Federation (no service account keys)
Pipeline Flow
Validate → Build (3 parallel) → Scan → Deploy → Verify
   ↓               ↓             ↓       ↓        ↓
 Check        API/Worker/      Trivy  Helmfile  Health
 Files          Frontend                Sync    Checks
Workflow Files
- Production: .github/workflows/production.yml
- Staging: .github/workflows/staging.yml
- Runtime: ~5-10 minutes
Setup
Prerequisites
- GKE cluster running
- gcloud CLI authenticated
- GitHub repository access
Keyless Authentication (Workload Identity Federation)
Benefits:
- ✅ No long-lived credentials
- ✅ No key rotation required
- ✅ Automatic short-lived tokens via OIDC
- ✅ More secure than service account keys
Setup:
bash
./scripts/setup-github-wif.sh
This creates:
- Workload Identity Pool
- Workload Identity Provider
- Service account with GKE/GCR permissions
- Workload Identity binding
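For orientation, two identifiers with fixed Google Cloud formats tie these resources together: the provider resource name (the kind of value GCP_WORKLOAD_IDENTITY_PROVIDER holds) and the principalSet member used in the Workload Identity binding. The sketch below only assembles those strings. It is not the contents of setup-github-wif.sh, and the project number, pool ID github-pool, provider ID github-provider, and repository example-org/example-repo are placeholders chosen for illustration.

```bash
# Build the full resource name of a Workload Identity Provider.
# Usage: wif_provider_name <project-number> <pool-id> <provider-id>
wif_provider_name() {
  printf 'projects/%s/locations/global/workloadIdentityPools/%s/providers/%s\n' "$1" "$2" "$3"
}

# Build the IAM member that matches tokens issued for one GitHub repository.
# Assumes the provider maps attribute.repository from the OIDC "repository" claim.
# Usage: wif_repo_member <project-number> <pool-id> <owner/repo>
wif_repo_member() {
  printf 'principalSet://iam.googleapis.com/projects/%s/locations/global/workloadIdentityPools/%s/attribute.repository/%s\n' "$1" "$2" "$3"
}

# Placeholder values for illustration only.
wif_provider_name 123456789 github-pool github-provider
wif_repo_member 123456789 github-pool example-org/example-repo
```

The member string is what a binding would grant roles/iam.workloadIdentityUser on the service account, which restricts impersonation to workflows from that one repository.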
Add GitHub Secrets:
Go to repository Settings → Secrets and variables → Actions, or set them with the GitHub CLI:
bash
gh secret set GCP_PROJECT_ID --body 'your-project-id'
gh secret set GCP_SERVICE_ACCOUNT --body 'github-actions@PROJECT.iam.gserviceaccount.com'
gh secret set GCP_WORKLOAD_IDENTITY_PROVIDER --body 'projects/123/locations/global/...'
Environment Flow
Local Development → Staging → Production
Production Workflow
Triggers on push to main:
bash
git push origin main
Steps:
- ✅ Validate project structure
- ✅ Build API, Worker, Frontend images (parallel)
- ✅ Scan for vulnerabilities with Trivy
- ✅ Deploy to the torale namespace via Helmfile
- ✅ Verify rollout and health checks
- ✅ Update https://torale.ai
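The verification step can be pictured as a rollout wait followed by a polling health check. The helper below is a minimal sketch of the polling half, not the workflow's actual step. The function name wait_for_healthy and the /health path in the usage comment are assumptions.

```bash
# Poll a URL until it answers with a successful HTTP status, or give up.
# Usage: wait_for_healthy <url> [attempts] [delay-seconds]
wait_for_healthy() {
  local url="$1" attempts="${2:-10}" delay="${3:-5}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f turns HTTP 4xx/5xx responses into a non-zero exit status.
    if curl -fsS -o /dev/null --max-time 10 "$url"; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    # Pause between attempts, but not after the final one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "still unhealthy after $attempts attempt(s)" >&2
  return 1
}

# Example (the /health path is an assumption, not taken from the workflow):
#   kubectl rollout status deployment/torale-api -n torale --timeout=10m
#   wait_for_healthy https://torale.ai/health 12 5
```

Retrying with a bounded attempt count keeps a slow pod start from failing the run immediately, while still failing the job if the service never comes up.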
Staging Workflow
Triggers on deploy label or manual dispatch.
Deploy via Label:
- Add the deploy label to any PR
- Workflow runs once
- To redeploy after new commits: remove and re-add label
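The remove-and-re-add step can be scripted with the GitHub CLI. This is a small sketch that assumes the staging workflow reacts to the label being added, as the steps above imply. The function name redeploy_staging is hypothetical.

```bash
# Re-trigger the staging deploy for a PR by removing and re-adding the label.
# Usage: redeploy_staging <pr-number> [label]
redeploy_staging() {
  local pr="$1" label="${2:-deploy}"
  if [ -z "$pr" ]; then
    echo "usage: redeploy_staging <pr-number> [label]" >&2
    return 2
  fi
  # Only re-add the label if the removal succeeded.
  gh pr edit "$pr" --remove-label "$label" &&
    gh pr edit "$pr" --add-label "$label"
}

# Example: redeploy_staging 42
```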
Deploy Manually:
bash
# Via GitHub UI: Actions → "Deploy to Staging" → Run workflow
# Via CLI:
gh workflow run staging.yml
Staging Environment
| Resource | Staging | Production |
|---|---|---|
| GKE Cluster | Shared | Shared |
| Namespace | torale-staging | torale |
| Database | Shared | Shared |
| Clerk App | Shared | Shared |
| Temporal Namespace | Shared | Shared |
| Temporal Task Queue | torale-staging | torale-tasks |
| Static IP | Separate | Separate |
| SSL Certificate | Separate | Separate |
| Domains | staging.torale.ai | torale.ai |
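For scripts that need to target either environment, the rows that differ in the table can be captured in one lookup. The helper below is an illustrative sketch. The function name torale_env and the variable names are assumptions, while the values are copied from the table.

```bash
# Print the per-environment settings from the table as KEY=VALUE lines.
# Usage: torale_env <staging|production>
torale_env() {
  case "$1" in
    staging)
      echo "NAMESPACE=torale-staging"
      echo "TASK_QUEUE=torale-staging"
      echo "DOMAIN=staging.torale.ai"
      ;;
    production)
      echo "NAMESPACE=torale"
      echo "TASK_QUEUE=torale-tasks"
      echo "DOMAIN=torale.ai"
      ;;
    *)
      echo "unknown environment: $1" >&2
      return 1
      ;;
  esac
}

# Example:
#   eval "$(torale_env staging)"
#   kubectl get pods -n "$NAMESPACE"
```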
Lifecycle:
- Staging persists indefinitely
- Each deploy updates existing environment
- No automatic teardown on PR close/merge
- Manual teardown: helm uninstall torale -n torale-staging
Monitoring
View Workflow Runs
GitHub UI: Repository → Actions tab
CLI:
bash
gh run list # List recent runs
gh run view <run-id> # View specific run
gh run watch # Watch live run
Check Deployment Status
bash
# Production
kubectl get pods -n torale
kubectl get ingress -n torale
# Staging
kubectl get pods -n torale-staging
kubectl get ingress -n torale-staging
View Logs
bash
# Production
kubectl logs -n torale -l app.kubernetes.io/component=api -f
# Staging
kubectl logs -n torale-staging -l app.kubernetes.io/component=api -f
Troubleshooting
Build Fails: "Permission denied to push to GCR"
Solution:
bash
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:github-actions@$PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/storage.admin"
Deployment Fails: "Error from server (Forbidden)"
Solution:
bash
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:github-actions@$PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/container.developer"
Pods Not Starting: ImagePullBackOff
Check if image exists:
bash
gcloud container images list --repository=gcr.io/$PROJECT_ID
kubectl describe pod <pod-name> -n torale
kubectl get deployment torale-api -n torale -o yaml | grep image:
Health Check Timeout
Increase timeout in workflow:
bash
kubectl rollout status deployment/torale-api -n torale --timeout=10m
Advanced Usage
Manual Workflow Trigger
bash
gh workflow run production.yml --ref main
Rollback Deployment
bash
# Rollback to previous version
kubectl rollout undo deployment/torale-api -n torale
# View rollout history
kubectl rollout history deployment/torale-api -n torale
# Rollback to specific revision
kubectl rollout undo deployment/torale-api -n torale --to-revision=3
Skip CI for Commit
bash
git commit -m "docs: update README [skip ci]"
Security Best Practices
- Never commit GCP keys to git
- Use GitHub secrets for sensitive data
- Limit service account permissions to minimum required
- Enable branch protection on main
- Require PR reviews before merging
- Use Trivy scanning for vulnerability detection
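To make the Trivy item concrete, a scan that fails on serious findings might look like the sketch below. The severity threshold and the --ignore-unfixed choice are assumptions, not settings taken from the workflow. The wrapper name scan_image and the image reference in the usage comment are hypothetical.

```bash
# Scan one container image and fail on HIGH or CRITICAL findings.
# Usage: scan_image <image-ref>
scan_image() {
  local image="$1"
  # --exit-code 1 makes trivy return non-zero when matching findings exist;
  # --ignore-unfixed skips vulnerabilities that have no fixed version yet.
  trivy image --severity HIGH,CRITICAL --ignore-unfixed --exit-code 1 "$image"
}

# Example (image name is a placeholder):
#   scan_image "gcr.io/$PROJECT_ID/torale-api:$GITHUB_SHA"
```

Returning a non-zero status is what lets a CI job stop before the deploy stage when the scan finds something.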
Cost Optimization
GitHub Actions:
- Public repos: Unlimited free minutes
- Private repos: 2,000 minutes/month free
- Typical build: ~10 minutes
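As a rough worked example of what these figures allow, the free allowance divided by the typical build time gives the number of builds per month. The sketch assumes standard Linux runners, which are billed at 1x minutes. The function name builds_per_month is hypothetical.

```bash
# Estimate how many builds fit in a monthly free-minute allowance.
# Usage: builds_per_month <free-minutes> <minutes-per-build>
builds_per_month() {
  echo $(( $1 / $2 ))
}

builds_per_month 2000 10   # prints 200
```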
GKE Resources:
- Production: ~$12-19/month with Spot pods
- Staging: ~$12-19/month with Spot pods (same resources)
Tips:
- Use Spot pods (already configured)
- Right-size resources: kubectl top pods
- Monitor with GKE dashboard
Quick Reference
bash
# Workflow management
gh run list # List workflow runs
gh run watch # Watch current run
gh workflow run production.yml # Trigger manually
# Kubernetes
kubectl get pods -n torale # Production pods
kubectl get pods -n torale-staging # Staging pods
kubectl logs -n torale -l app.kubernetes.io/component=api -f # API logs
# Deployment
git push origin main # Deploy to production
# Add 'deploy' label to PR # Deploy to staging