Hello team!
Me and my team are developing a solution for developers to be able to manage infrastructure resources through an API. This solution relies on crossplane to manage cloud resources as kubernetes CRs.
We've started using terrajet for AWS, but changed to the native provider because of this issue. New we also need to manage other resources for which there are no native providers (namely vault, boundary and datadog) and it would be great to use terrajet.
Existing Context
initial slack thread with problem description
What happened?
The expected behaviour is that terrajet resources time-to-readiness wouldn't depend on the amount of resources that already exist in the cluster.
In reality, the terrajet controller takes roughly 3
* (number of existing resources)
* (time it takes for one resource to be reconciled)
seconds to consider a new resource ready.
The examples in this issue are using provider-jet-aws
, but all terrajet providers have the same behaviour, from what I can see.
Below are the debug logs related to one recently created policy (terrajetloadtestslow
). It was created on a provider that is already managing 200 other aws policies.
crossplane-provider-jet-aws-599c4ea494ca-55b9dd885f-zr2pb provider-jet-aws 1.6630252994777613e+09 DEBUG provider-jet-aws Reconciling {"controller": "managed/iam.aws.jet.crossplane.io/v1alpha2, kind=policy", "request": "/terrajetloadtestslow"}
crossplane-provider-jet-aws-599c4ea494ca-55b9dd885f-zr2pb provider-jet-aws 1.6630253028353422e+09 DEBUG provider-jet-aws refresh ended {"workspace": "/tmp/636bbd3e-94bb-40ae-acde-f866e095b0e6", "out": "{\"@level\":\"info\",\"@message\":\"Terraform 1.0.5\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2022-09-12T23:28:20.558331Z\",\"terraform\":\"1.0.5\",\"type\":\"version\",\"ui\":\"0.1.0\"}\n{\"@level\":\"info\",\"@message\":\"aws_iam_policy.terrajetloadtestslow: Refreshing state...\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2022-09-12T23:28:22.822179Z\",\"hook\":{\"resource\":{\"addr\":\"aws_iam_policy.terrajetloadtestslow\",\"module\":\"\",\"resource\":\"aws_iam_policy.terrajetloadtestslow\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_iam_policy\",\"resource_name\":\"terrajetloadtestslow\",\"resource_key\":null},\"id_key\":\"id\"},\"type\":\"refresh_start\"}\n{\"@level\":\"info\",\"@message\":\"aws_iam_policy.terrajetloadtestslow: Refresh complete\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2022-09-12T23:28:22.822992Z\",\"hook\":{\"resource\":{\"addr\":\"aws_iam_policy.terrajetloadtestslow\",\"module\":\"\",\"resource\":\"aws_iam_policy.terrajetloadtestslow\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_iam_policy\",\"resource_name\":\"terrajetloadtestslow\",\"resource_key\":null}},\"type\":\"refresh_complete\"}\n{\"@level\":\"info\",\"@message\":\"aws_iam_policy.terrajetloadtestslow: Drift detected (delete)\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2022-09-12T23:28:22.823374Z\",\"change\":{\"resource\":{\"addr\":\"aws_iam_policy.terrajetloadtestslow\",\"module\":\"\",\"resource\":\"aws_iam_policy.terrajetloadtestslow\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_iam_policy\",\"resource_name\":\"terrajetloadtestslow\",\"resource_key\":null},\"action\":\"delete\"},\"type\":\"resource_drift\"}\n{\"@level\":\"info\",\"@message\":\"Plan: 0 to add, 0 to change, 0 to destroy.\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2022-09-12T23:28:22.823487Z\",\"changes\":{\"add\":0,\"change\":0,\"remove\":0,\"operation\":\"plan\"},\"type\":\"change_summary\"}\n{\"@level\":\"info\",\"@message\":\"Apply complete! Resources: 0 added, 0 changed, 0 destroyed.\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2022-09-12T23:28:22.829034Z\",\"changes\":{\"add\":0,\"change\":0,\"remove\":0,\"operation\":\"apply\"},\"type\":\"change_summary\"}\n{\"@level\":\"info\",\"@message\":\"Outputs: 0\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2022-09-12T23:28:22.829081Z\",\"outputs\":{},\"type\":\"outputs\"}\n"}
crossplane-provider-jet-aws-599c4ea494ca-55b9dd885f-zr2pb provider-jet-aws 1.6630253059385006e+09 DEBUG provider-jet-aws apply ended {"workspace": "/tmp/636bbd3e-94bb-40ae-acde-f866e095b0e6", "out": "{\"@level\":\"info\",\"@message\":\"Terraform 1.0.5\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2022-09-12T23:28:22.897931Z\",\"terraform\":\"1.0.5\",\"type\":\"version\",\"ui\":\"0.1.0\"}\n{\"@level\":\"info\",\"@message\":\"aws_iam_policy.terrajetloadtestslow: Plan to create\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2022-09-12T23:28:25.127798Z\",\"change\":{\"resource\":{\"addr\":\"aws_iam_policy.terrajetloadtestslow\",\"module\":\"\",\"resource\":\"aws_iam_policy.terrajetloadtestslow\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_iam_policy\",\"resource_name\":\"terrajetloadtestslow\",\"resource_key\":null},\"action\":\"create\"},\"type\":\"planned_change\"}\n{\"@level\":\"info\",\"@message\":\"Plan: 1 to add, 0 to change, 0 to destroy.\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2022-09-12T23:28:25.127975Z\",\"changes\":{\"add\":1,\"change\":0,\"remove\":0,\"operation\":\"plan\"},\"type\":\"change_summary\"}\n{\"@level\":\"info\",\"@message\":\"aws_iam_policy.terrajetloadtestslow: Creating...\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2022-09-12T23:28:25.630013Z\",\"hook\":{\"resource\":{\"addr\":\"aws_iam_policy.terrajetloadtestslow\",\"module\":\"\",\"resource\":\"aws_iam_policy.terrajetloadtestslow\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_iam_policy\",\"resource_name\":\"terrajetloadtestslow\",\"resource_key\":null},\"action\":\"create\"},\"type\":\"apply_start\"}\n{\"@level\":\"info\",\"@message\":\"aws_iam_policy.terrajetloadtestslow: Creation complete after 0s [id=arn:aws:iam::ACCOUNT_NUMBER:policy/terrajetloadtestslow]\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2022-09-12T23:28:25.924980Z\",\"hook\":{\"resource\":{\"addr\":\"aws_iam_policy.terrajetloadtestslow\",\"module\":\"\",\"resource\":\"aws_iam_policy.terrajetloadtestslow\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_iam_policy\",\"resource_name\":\"terrajetloadtestslow\",\"resource_key\":null},\"action\":\"create\",\"id_key\":\"id\",\"id_value\":\"arn:aws:iam::ACCOUNT_NUMBER:policy/terrajetloadtestslow\",\"elapsed_seconds\":0},\"type\":\"apply_complete\"}\n{\"@level\":\"info\",\"@message\":\"Apply complete! Resources: 1 added, 0 changed, 0 destroyed.\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2022-09-12T23:28:25.932091Z\",\"changes\":{\"add\":1,\"change\":0,\"remove\":0,\"operation\":\"apply\"},\"type\":\"change_summary\"}\n{\"@level\":\"info\",\"@message\":\"Outputs: 0\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2022-09-12T23:28:25.932195Z\",\"outputs\":{},\"type\":\"outputs\"}\n"}
crossplane-provider-jet-aws-599c4ea494ca-55b9dd885f-zr2pb provider-jet-aws 1.663025305947192e+09 DEBUG provider-jet-aws Successfully requested creation of external resource {"controller": "managed/iam.aws.jet.crossplane.io/v1alpha2, kind=policy", "request": "/terrajetloadtestslow", "uid": "636bbd3e-94bb-40ae-acde-f866e095b0e6", "version": "573953565", "external-name": "", "external-name": "arn:aws:iam::ACCOUNT_NUMBER:policy/terrajetloadtestslow"}
crossplane-provider-jet-aws-599c4ea494ca-55b9dd885f-zr2pb provider-jet-aws 1.6630253059475644e+09 DEBUG events Normal {"object": {"kind":"Policy","name":"terrajetloadtestslow","uid":"636bbd3e-94bb-40ae-acde-f866e095b0e6","apiVersion":"iam.aws.jet.crossplane.io/v1alpha2","resourceVersion":"573965831"}, "reason": "CreatedExternalResource", "message": "Successfully requested creation of external resource"}
crossplane-provider-jet-aws-599c4ea494ca-55b9dd885f-zr2pb provider-jet-aws 1.663025826864625e+09 DEBUG provider-jet-aws Reconciling {"controller": "managed/iam.aws.jet.crossplane.io/v1alpha2, kind=policy", "request": "/terrajetloadtestslow"}
crossplane-provider-jet-aws-599c4ea494ca-55b9dd885f-zr2pb provider-jet-aws 1.6630258282078466e+09 DEBUG provider-jet-aws refresh ended {"workspace": "/tmp/636bbd3e-94bb-40ae-acde-f866e095b0e6", "out": "{\"@level\":\"info\",\"@message\":\"Terraform 1.0.5\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2022-09-12T23:37:06.895864Z\",\"terraform\":\"1.0.5\",\"type\":\"version\",\"ui\":\"0.1.0\"}\n{\"@level\":\"info\",\"@message\":\"aws_iam_policy.terrajetloadtestslow: Refreshing state... [id=arn:aws:iam::ACCOUNT_NUMBER:policy/terrajetloadtestslow]\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2022-09-12T23:37:08.112383Z\",\"hook\":{\"resource\":{\"addr\":\"aws_iam_policy.terrajetloadtestslow\",\"module\":\"\",\"resource\":\"aws_iam_policy.terrajetloadtestslow\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_iam_policy\",\"resource_name\":\"terrajetloadtestslow\",\"resource_key\":null},\"id_key\":\"id\",\"id_value\":\"arn:aws:iam::ACCOUNT_NUMBER:policy/terrajetloadtestslow\"},\"type\":\"refresh_start\"}\n{\"@level\":\"info\",\"@message\":\"aws_iam_policy.terrajetloadtestslow: Refresh complete [id=arn:aws:iam::ACCOUNT_NUMBER:policy/terrajetloadtestslow]\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2022-09-12T23:37:08.197666Z\",\"hook\":{\"resource\":{\"addr\":\"aws_iam_policy.terrajetloadtestslow\",\"module\":\"\",\"resource\":\"aws_iam_policy.terrajetloadtestslow\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_iam_policy\",\"resource_name\":\"terrajetloadtestslow\",\"resource_key\":null},\"id_key\":\"id\",\"id_value\":\"arn:aws:iam::ACCOUNT_NUMBER:policy/terrajetloadtestslow\"},\"type\":\"refresh_complete\"}\n{\"@level\":\"info\",\"@message\":\"Plan: 0 to add, 0 to change, 0 to destroy.\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2022-09-12T23:37:08.198356Z\",\"changes\":{\"add\":0,\"change\":0,\"remove\":0,\"operation\":\"plan\"},\"type\":\"change_summary\"}\n{\"@level\":\"info\",\"@message\":\"Apply complete! Resources: 0 added, 0 changed, 0 destroyed.\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2022-09-12T23:37:08.203956Z\",\"changes\":{\"add\":0,\"change\":0,\"remove\":0,\"operation\":\"apply\"},\"type\":\"change_summary\"}\n{\"@level\":\"info\",\"@message\":\"Outputs: 0\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2022-09-12T23:37:08.203989Z\",\"outputs\":{},\"type\":\"outputs\"}\n"}
crossplane-provider-jet-aws-599c4ea494ca-55b9dd885f-zr2pb provider-jet-aws 1.6630258282080562e+09 DEBUG provider-jet-aws External resource is up to date {"controller": "managed/iam.aws.jet.crossplane.io/v1alpha2, kind=policy", "request": "/terrajetloadtestslow", "uid": "636bbd3e-94bb-40ae-acde-f866e095b0e6", "version": "573965832", "external-name": "arn:aws:iam::ACCOUNT_NUMBER:policy/terrajetloadtestslow", "requeue-after": 1663025888.2080543}
crossplane-provider-jet-aws-599c4ea494ca-55b9dd885f-zr2pb provider-jet-aws 1.6630263118990378e+09 DEBUG provider-jet-aws Reconciling {"controller": "managed/iam.aws.jet.crossplane.io/v1alpha2, kind=policy", "request": "/terrajetloadtestslow"}
crossplane-provider-jet-aws-599c4ea494ca-55b9dd885f-zr2pb provider-jet-aws 1.6630263132372632e+09 DEBUG provider-jet-aws refresh ended {"workspace": "/tmp/636bbd3e-94bb-40ae-acde-f866e095b0e6", "out": "{\"@level\":\"info\",\"@message\":\"Terraform 1.0.5\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2022-09-12T23:45:11.952075Z\",\"terraform\":\"1.0.5\",\"type\":\"version\",\"ui\":\"0.1.0\"}\n{\"@level\":\"info\",\"@message\":\"aws_iam_policy.terrajetloadtestslow: Refreshing state... [id=arn:aws:iam::ACCOUNT_NUMBER:policy/terrajetloadtestslow]\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2022-09-12T23:45:13.135934Z\",\"hook\":{\"resource\":{\"addr\":\"aws_iam_policy.terrajetloadtestslow\",\"module\":\"\",\"resource\":\"aws_iam_policy.terrajetloadtestslow\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_iam_policy\",\"resource_name\":\"terrajetloadtestslow\",\"resource_key\":null},\"id_key\":\"id\",\"id_value\":\"arn:aws:iam::ACCOUNT_NUMBER:policy/terrajetloadtestslow\"},\"type\":\"refresh_start\"}\n{\"@level\":\"info\",\"@message\":\"aws_iam_policy.terrajetloadtestslow: Refresh complete [id=arn:aws:iam::ACCOUNT_NUMBER:policy/terrajetloadtestslow]\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2022-09-12T23:45:13.226661Z\",\"hook\":{\"resource\":{\"addr\":\"aws_iam_policy.terrajetloadtestslow\",\"module\":\"\",\"resource\":\"aws_iam_policy.terrajetloadtestslow\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_iam_policy\",\"resource_name\":\"terrajetloadtestslow\",\"resource_key\":null},\"id_key\":\"id\",\"id_value\":\"arn:aws:iam::ACCOUNT_NUMBER:policy/terrajetloadtestslow\"},\"type\":\"refresh_complete\"}\n{\"@level\":\"info\",\"@message\":\"Plan: 0 to add, 0 to change, 0 to destroy.\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2022-09-12T23:45:13.227599Z\",\"changes\":{\"add\":0,\"change\":0,\"remove\":0,\"operation\":\"plan\"},\"type\":\"change_summary\"}\n{\"@level\":\"info\",\"@message\":\"Apply complete! Resources: 0 added, 0 changed, 0 destroyed.\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2022-09-12T23:45:13.232695Z\",\"changes\":{\"add\":0,\"change\":0,\"remove\":0,\"operation\":\"apply\"},\"type\":\"change_summary\"}\n{\"@level\":\"info\",\"@message\":\"Outputs: 0\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2022-09-12T23:45:13.232734Z\",\"outputs\":{},\"type\":\"outputs\"}\n"}
crossplane-provider-jet-aws-599c4ea494ca-55b9dd885f-zr2pb provider-jet-aws 1.6630263132466455e+09 DEBUG provider-jet-aws External resource is up to date {"controller": "managed/iam.aws.jet.crossplane.io/v1alpha2, kind=policy", "request": "/terrajetloadtestslow", "uid": "636bbd3e-94bb-40ae-acde-f866e095b0e6", "version": "573976982", "external-name": "arn:aws:iam::ACCOUNT_NUMBER:policy/terrajetloadtestslow", "requeue-after": 1663026373.2466433}
From these logs we can also see the time it took for the resource to be Ready
. From 23:28:19 to 23:45:13, ~17 minutes. (using GMT time to make it easier)
It should also be noted that the resource is created in the cloud provider at 23:28:25 (12 seconds), when the logs say Successfully requested creation of external resource
, but it takes a long time until the resource is marked as Ready
in kubernetes
The reason why it takes so long seems to be that resource change events are not given any priority over the normal reconciliation loop. The controller stays in an endless loop reconciling every resource it knows about, which is necessary to override external changes in the providers, but any resource that is added or changed in kubernetes doesn't get immediately picked up. Rather it seems to be added to the queue and only seen after all existing reconcile loops are done, then the resource is picked up, and a creation flow starts, but the result will only be seen on the next reconciliation loop. Once that comes, terrajet will pick up the resource's status from the terraform state and requeue once more, only after this last requeue the status will be updated to Ready
.
It's also worth noting that changing the MaxConcurrentReconciles configuration to something bigger, like 5 or 10 reduces the time by that factor, but the underlying behavior persists and all resources are reconciled before considering new resource or updates Ready
.
What I'm having trouble understanding is how this terrajet
workflow differs from the native provider-aws
one, as both seem to use the same Reconcile function from crossplane-runtime.
Also, the UseAsync
configuration for a resource doesn't seem to change anything. I've tried both for policies.iam
and for repositories.ecr
with the same behavior.
How can we reproduce it?
- Create a new kubernetes cluster (with kind or in the cloud).
- Install crossplane
- Install a terrajet provider (I'll use AWS because it can be easily compared with the native
provider-aws
)
- Create a handful of resources to be managed by terrajet (I'll use ECR because it is quickly created and dont incur costs)
- Wait until all resources are created and ready
it will take some minutes, but a burst of resources is expected to take a bit.
Althought it does take much longer than
provider-aws
for the same resource.
- Create one more resource to be managed by terrajet
The last step will take a long time, which is the problem this bug report is about.
Open collapsible for reproducible commands
# Create kind cluster
kind create cluster --name terrajet-load --image kindest/node:v1.23.10
# Install crossplane with helm
kubectl create namespace crossplane-system
helm repo add crossplane-stable https://charts.crossplane.io/stable
helm repo update
helm upgrade --install crossplane --namespace crossplane-system crossplane-stable/crossplane
# Install AWS terrajet provider
kubectl apply -f - <<YAML
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
name: crossplane-provider-jet-aws
spec:
controllerConfigRef:
name: config
package: crossplane/provider-jet-aws:v0.5.0
---
apiVersion: pkg.crossplane.io/v1alpha1
kind: ControllerConfig
metadata:
name: config
spec:
args:
- --debug
YAML
## use your aws credentials in the secret, if you have a custom way to interact with AWS, change the credentials key of this secret.
kubectl create secret generic -n crossplane-system --from-file=credentials=$HOME/.aws/credentials aws-credentials
kubectl apply -f - <<YAML
apiVersion: aws.jet.crossplane.io/v1alpha1
kind: ProviderConfig
metadata:
finalizers:
- in-use.crossplane.io
name: default
spec:
credentials:
secretRef:
key: credentials
name: aws-credentials
namespace: crossplane-system
source: Secret
YAML
# Create a handful of resources managed by terrajet. I chose policies because they are created near-instantly in the cloud and don't incur costs
## Note: seq on MacOS doesn't seem to support the -w flag, it can be removed safely below
for n in $(seq -w 200); do
echo "---"
sed "s/NUMBER/$n/" <<YAML
apiVersion: iam.aws.jet.crossplane.io/v1alpha2
kind: Policy
metadata:
name: terrajetloadtestNUMBER
spec:
forProvider:
name: terrajetloadtestNUMBER
policy: |
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Deny",
"Action": "*",
"Resource": ["*"]
}
]
}
providerConfigRef:
name: default
YAML
done | kubectl apply -f -
# Wait until all are ready, it took about 15 minutes for me
kubectl get policies.iam.aws.jet.crossplane.io
# Create one more resource
kubectl apply -f - <<YAML
apiVersion: iam.aws.jet.crossplane.io/v1alpha2
kind: Policy
metadata:
name: terrajetloadtestslow
spec:
forProvider:
name: terrajetloadtestslow
policy: |
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Deny",
"Action": "*",
"Resource": ["*"]
}
]
}
providerConfigRef:
name: default
YAML
# It takes a long time for the resource to become ready.
kubectl get policies.iam.aws.jet.crossplane.io terrajetloadtestslow
########
# Cleanup cloud resources
kubectl delete policy.iam.aws.jet $(kubectl get policy.iam.aws.jet | grep terrajetloadtest | awk '{print $1}')
# Delete kind cluster
kind destroy cluster --name terrajet-load
Possibly related discussions
Please let me know if I can help figure out what's happening.
bug