How to manage/monitor multiple AWS Batch jobs that should all complete successfully before the next step?

I am working on a system that involves launching multiple AWS Batch jobs, each of which takes between 5 and 15 minutes on average to complete.

I need to provide a mechanism that will let me know once all the jobs have completed successfully, or if any failures have occurred, so I can proceed to the next step accordingly.

Also, I don’t yet have a good strategy for handling errors. For example, how should I deal with the case where only one or a few jobs fail after a certain number of retries? My first guess is to have a failure threshold that dictates whether or not the overall step in the process (i.e. the collection of AWS Batch jobs) can/should proceed. Something along the lines of

if failed_jobs > failed_job_threshold:
    raise RuntimeError("Too many failed jobs")
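To flesh out that guess a little, here is a minimal sketch of what the monitoring step could look like, assuming the Batch job IDs are known after submission and polling with boto3's `describe_jobs`. The function names and the threshold parameter are hypothetical, not an established API:

```python
# Terminal AWS Batch job states; anything else means the job is still in flight.
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}

def check_outcome(statuses, failed_job_threshold=0):
    """Pure decision function: 'running' until every job is terminal,
    then 'succeeded', or a RuntimeError if too many jobs failed."""
    if any(s not in TERMINAL_STATES for s in statuses):
        return "running"
    failed = sum(1 for s in statuses if s == "FAILED")
    if failed > failed_job_threshold:
        raise RuntimeError(f"Too many failed jobs: {failed}")
    return "succeeded"

def wait_for_jobs(job_ids, failed_job_threshold=0, poll_seconds=60):
    """Poll AWS Batch until all jobs reach a terminal state.
    Requires AWS credentials; describe_jobs accepts up to 100 job IDs per call."""
    import time
    import boto3
    batch = boto3.client("batch")
    while True:
        resp = batch.describe_jobs(jobs=job_ids)
        statuses = [job["status"] for job in resp["jobs"]]
        if check_outcome(statuses, failed_job_threshold) == "succeeded":
            return
        time.sleep(poll_seconds)
```

Keeping the threshold check in a pure function makes it unit-testable without AWS; the polling loop itself could equally be replaced by a Step Functions Map state or an EventBridge rule on Batch job state-change events.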

The process recreates/repopulates a database table periodically (the overall job runs each month to rebuild the table). Therefore any individual batch job failure will leave the table incomplete and will require attention.

Is there a “best practice” architecture for handling this use case?

My development landscape includes Python, Terraform, Bitbucket Pipelines, AWS (Lambda, Batch, SQS, RDS/Aurora, etc.), and PostgreSQL.

Go to Source
Author: James Adams

Infinite loop when trying to launch a symbolic link to a bash script

I am trying to create a symbolic link using:

$ ln -s path/to/ ~/.local/bin/foo

to the following bash script:


#!/usr/bin/env bash
appname=`basename $0 | sed s,.sh$,,`
dirname=`dirname $0`
tmp="${dirname#?}"
if [ "${dirname%$tmp}" != "/" ]; then
  dirname=$PWD/$dirname
fi
LD_LIBRARY_PATH=$dirname
export LD_LIBRARY_PATH
$dirname/$appname $*

I don’t understand why I get the following error when I try to launch the script above from the symbolic link:

bash: warning: shell level (1000) too high, resetting to 1
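For what it’s worth, tracing the name resolution the script performs when it is launched via the symlink suggests where the loop comes from (a sketch; the paths are taken from the question, and `foo` is the symlink name):

```shell
# When launched as ~/.local/bin/foo, $0 is the symlink's path, so the
# script computes:
link="$HOME/.local/bin/foo"
appname=$(basename "$link" | sed 's/\.sh$//')  # "foo" -- no .sh to strip
dirname=$(dirname "$link")                     # "$HOME/.local/bin"
target="$dirname/$appname"
# $target is the symlink itself: the script's last line re-executes the
# symlink, which runs the script again, recursing until bash complains
# that the shell level (SHLVL) is too high.
echo "$target"
```

In other words, `$dirname/$appname` resolves back to the launcher rather than to the real binary next to the script, because `$0` is the symlink’s path, not its target’s.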

Go to Source
Author: z3r0p1r