Diagnosing a multithreaded programming issue in Flink’s unit tests

Unfortunately, my contribution to resolve FLINK-11568 (“Exception in Kinesis ShardConsumer hidden by InterruptedException”) introduced a race condition in the unit test code, leading to flaky unit tests reported in FLINK-12595. I made a concerted effort to fix the issue quickly once it was brought to my attention. Race conditions can be difficult to reproduce and explain experimentally. To pinpoint the problem, I wanted a full understanding of the multithreaded test and the surrounding code, so I created a big ad-hoc diagram (mainly inspired by UML sequence diagrams) on a whiteboard, which I thought would be fun to post here.

Each color represents a different thread: blue is the unit test (on the left), black is the consumer thread (middle), and green is the KinesisShardConsumer thread (right). Ordering guarantees only exist within each lifeline (vertical line); multiple lifelines can run in parallel on the same object in different threads. By drawing the overall diagram carefully, especially the blocking method calls and ordering guarantees, I was able to analyze the logic well enough to form a hypothesis about the likely cause of the issue (circled in red in the picture), and subsequently prove that hypothesis in the code. After that, fixing it was the easy part! The whole exercise took a couple of hours or so, but I enjoyed it, and I was happy to be able to provide a solution quickly.

AWS: allow an assumed role to assume another role

You may occasionally wish to allow an assumed IAM role, such as a role assumed via an EC2 instance profile, to assume another role. This is described in Switching to an IAM Role (AWS CLI) as “role chaining”. If we wish for role A to be able to assume role B, for example, we must add a statement like this to the “trust policy” of role B:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "...",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::000000000000:role/a"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

On the EC2 instance, the identity of assumed role A will start out looking something like this:

$ aws sts get-caller-identity
{
    "Account": "000000000000", 
    "UserId": "AROAJQTW5F5O55I5ZXQ24:i-00000000000000000", 
    "Arn": "arn:aws:sts::000000000000:assumed-role/a/i-00000000000000000"
}

Even though this is an assumed-role ARN, and looks different from the role-A Principal we referenced in the trust policy, it will still be allowed to assume role B.
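
To perform the second hop from code running as role A, you simply call sts:AssumeRole using role A's credentials. Here is a minimal sketch using the AWS SDK for Java v1; the role ARN and session name are placeholders:

import com.amazonaws.services.securitytoken.AWSSecurityTokenService;
import com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClientBuilder;
import com.amazonaws.services.securitytoken.model.AssumeRoleRequest;
import com.amazonaws.services.securitytoken.model.Credentials;

public class RoleChainingExample {
    public static void main(String[] args) {
        // Picks up role A's credentials from the EC2 instance profile
        // via the default credential provider chain.
        AWSSecurityTokenService sts = AWSSecurityTokenServiceClientBuilder.defaultClient();

        // The role ARN and session name below are placeholders.
        Credentials roleB = sts.assumeRole(new AssumeRoleRequest()
                .withRoleArn("arn:aws:iam::000000000000:role/b")
                .withRoleSessionName("role-chaining-example"))
                .getCredentials();

        System.out.println("Temporary access key for role B: " + roleB.getAccessKeyId());
    }
}

One caveat: AWS limits role-chained sessions to a maximum duration of one hour, regardless of the maximum session duration configured on role B.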

IAM Role Policy for Kinesis “Enhanced Fan-Out” Consumers

When I switched to version 2 of the KCL Java library and its “Enhanced Fan-Out” consumer mode, it was difficult to determine the appropriate IAM policy because the AWS documentation did not mention any differences between the old consumer and the new one. However, through trial and error I arrived at a policy like the one below (substitute your own account id, region, stream name, and consumer name), which should make a reasonable first draft. Of course, you could also split specific actions out to more specific resources.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:CreateTable",
                "dynamodb:DeleteItem",
                "dynamodb:DescribeTable",
                "dynamodb:GetItem",
                "dynamodb:PutItem",
                "dynamodb:Scan",
                "dynamodb:UpdateItem"
            ],
            "Resource": [
                "arn:aws:dynamodb:us-west-2:111111111111:table/my-consumer-name"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "kinesis:DescribeStream",
                "kinesis:DescribeStreamConsumer",
                "kinesis:DescribeStreamSummary",
                "kinesis:GetShardIterator",
                "kinesis:GetRecords",
                "kinesis:ListShards",
                "kinesis:PutRecord",
                "kinesis:PutRecords",
                "kinesis:RegisterStreamConsumer",
                "kinesis:SubscribeToShard"
            ],
            "Resource": [
                "arn:aws:kinesis:us-west-2:111111111111:stream/my-stream-name",
                "arn:aws:kinesis:us-west-2:111111111111:stream/my-stream-name/consumer/my-consumer-name",
                "arn:aws:kinesis:us-west-2:111111111111:stream/my-stream-name/consumer/my-consumer-name:*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:PutMetricData"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}
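
KCL 2.x normally registers the enhanced fan-out consumer for you, but a quick way to verify the RegisterStreamConsumer and DescribeStreamConsumer permissions above is to register it yourself using the AWS SDK for Java v2 (which KCL 2.x is built on). A minimal sketch, reusing the placeholder stream and consumer names from the policy:

import software.amazon.awssdk.services.kinesis.KinesisClient;
import software.amazon.awssdk.services.kinesis.model.RegisterStreamConsumerRequest;
import software.amazon.awssdk.services.kinesis.model.RegisterStreamConsumerResponse;

public class RegisterConsumerExample {
    public static void main(String[] args) {
        try (KinesisClient kinesis = KinesisClient.create()) {
            // The stream ARN and consumer name are placeholders.
            RegisterStreamConsumerResponse response = kinesis.registerStreamConsumer(
                    RegisterStreamConsumerRequest.builder()
                            .streamARN("arn:aws:kinesis:us-west-2:111111111111:stream/my-stream-name")
                            .consumerName("my-consumer-name")
                            .build());
            System.out.println("Consumer ARN: " + response.consumer().consumerARN());
        }
    }
}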

Grammar Nazi PSA

“Checkout” is a noun. “Check out” is a verb (specifically a “phrasal verb”).

“Setup” is a noun. “Set up” is a verb.

Bad:

“How to setup X”
“First, setup X”
“Checkout the git repo.”
“For more info, checkout the documentation on the wiki.”

Good:

“How to set up X”
“First, set up X”
“Check out the git repo.”
“For more info, check out the documentation on the wiki.”

Bad:

“That’s a great set up.” (You have a great set which is up?)
“Maybe the check out failed.”

Good:

“That’s a great setup.”
“Maybe the checkout failed.”

Ciphers supported by AWS (classic) ELBs

I recently had some trouble deploying an app to AWS after enabling HTTPS/TLS on the application: the ELB health check kept failing. It turned out that, because I had restricted the list of ciphers my app could use (per my organization’s security recommendations), the ELB was unable to connect to the app at all: it supported none of my app’s ciphers. Unfortunately, the AWS docs do not explain which ciphers a classic ELB supports when connecting to the backend app. So, here is the list as of this writing:

TLS_RSA_WITH_AES_256_GCM_SHA384 (0x009d)
TLS_RSA_WITH_AES_256_CBC_SHA256 (0x003d)
TLS_RSA_WITH_AES_256_CBC_SHA (0x0035)
TLS_RSA_WITH_CAMELLIA_256_CBC_SHA (0x0084)
TLS_RSA_WITH_AES_128_GCM_SHA256 (0x009c)
TLS_RSA_WITH_AES_128_CBC_SHA256 (0x003c)
TLS_RSA_WITH_AES_128_CBC_SHA (0x002f)
TLS_RSA_WITH_CAMELLIA_128_CBC_SHA (0x0041)
TLS_RSA_WITH_RC4_128_SHA (0x0005)
TLS_RSA_WITH_3DES_EDE_CBC_SHA (0x000a)
TLS_DHE_DSS_WITH_AES_256_GCM_SHA384 (0x00a3)
TLS_DHE_RSA_WITH_AES_256_GCM_SHA384 (0x009f)
TLS_DHE_RSA_WITH_AES_256_CBC_SHA256 (0x006b)
TLS_DHE_DSS_WITH_AES_256_CBC_SHA256 (0x006a)
TLS_DHE_RSA_WITH_AES_256_CBC_SHA (0x0039)
TLS_DHE_DSS_WITH_AES_256_CBC_SHA (0x0038)
TLS_DHE_RSA_WITH_CAMELLIA_256_CBC_SHA (0x0088)
TLS_DHE_DSS_WITH_CAMELLIA_256_CBC_SHA (0x0087)
TLS_DHE_DSS_WITH_AES_128_GCM_SHA256 (0x00a2)
TLS_DHE_RSA_WITH_AES_128_GCM_SHA256 (0x009e)
TLS_DHE_RSA_WITH_AES_128_CBC_SHA256 (0x0067)
TLS_DHE_DSS_WITH_AES_128_CBC_SHA256 (0x0040)
TLS_DHE_RSA_WITH_AES_128_CBC_SHA (0x0033)
TLS_DHE_DSS_WITH_AES_128_CBC_SHA (0x0032)
TLS_DHE_RSA_WITH_CAMELLIA_128_CBC_SHA (0x0045)
TLS_DHE_DSS_WITH_CAMELLIA_128_CBC_SHA (0x0044)
TLS_DHE_RSA_WITH_3DES_EDE_CBC_SHA (0x0016)
TLS_DHE_DSS_WITH_3DES_EDE_CBC_SHA (0x0013)
TLS_EMPTY_RENEGOTIATION_INFO_SCSV (0x00ff)
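
If your app runs on the JVM, one way to check whether your restricted cipher list still overlaps with the ELB’s is to intersect the list above with what the default SSLContext has enabled. A minimal sketch; the three suites below are just a sample from the full list:

import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;

public class CipherOverlapCheck {
    public static void main(String[] args) throws Exception {
        // A sample of the ELB-supported suites listed above; extend as needed.
        List<String> elbSuites = Arrays.asList(
                "TLS_RSA_WITH_AES_256_GCM_SHA384",
                "TLS_RSA_WITH_AES_128_GCM_SHA256",
                "TLS_DHE_RSA_WITH_AES_256_GCM_SHA384");

        // Cipher suites enabled by default in this JVM.
        String[] enabled = SSLContext.getDefault().getDefaultSSLParameters().getCipherSuites();

        // Print the suites both sides can agree on; if nothing prints,
        // the ELB health check will fail just like mine did.
        Arrays.stream(enabled)
                .filter(elbSuites::contains)
                .forEach(suite -> System.out.println("Overlap: " + suite));
    }
}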

A. A. Klaf Calculus Refresher errata

In going through A. Albert Klaf’s Calculus Refresher, republished by Dover, I came across a mistake in Appendix A: the answer given for question 7 on page 88 is incorrect. The question is:

7. What are the most economical dimensions of a right circular cylindrical tank made of steel of uniform thickness and of fixed volume = 6,000 cu. ft.?

And the answer given in Appendix A, p. 377, is:

7. r = h = 12.41 ft

However, plugging r = h = 12.41 into the formulas gives:

V = 6004.3392 cu. ft. (not the required 6,000)
dA/dr = 78.0307 (not zero, so the surface area is not at a minimum)

I believe the correct answer is:

r = 9.8475 ft
h = 19.6949 ft
giving
A = 1827.8966 sq. ft.
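
For the record, here is the worked derivation, minimizing the total surface area of a closed cylinder subject to the fixed volume:

A = 2\pi r^2 + 2\pi r h, \qquad V = \pi r^2 h = 6000

Substituting h = V / (\pi r^2):

A(r) = 2\pi r^2 + \frac{2V}{r}, \qquad
\frac{dA}{dr} = 4\pi r - \frac{2V}{r^2} = 0
\;\Rightarrow\;
r^3 = \frac{V}{2\pi}
\;\Rightarrow\;
r = \left(\frac{6000}{2\pi}\right)^{1/3} \approx 9.8475

so that h = 2r \approx 19.6949 and A = 6\pi r^2 \approx 1827.8966.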

If you notice any other errors in this book, let me know in the comments.

PMD XPathRule

Getting this warning when creating a custom XPathRule in a PMD ruleset file?

Oct 11, 2016 12:37:05 PM net.sourceforge.pmd.PMD removeBrokenRules
WARNING: Removed misconfigured rule: OldHadoopPackageImport cause: Missing xPath expression

Make sure your rule definition includes an xpath property element, with the XPath expression in a value element inside it. For example:

    <rule name="OldHadoopPackageImport"
          message="Avoid importing old Hadoop mapred package, use mapreduce package instead"
          language="java"
          class="net.sourceforge.pmd.lang.rule.XPathRule">
        <properties>
            <property name="xpath" description="XPath expression">
                <value>
                    //ImportDeclaration[Name[contains(@Image, 'org.apache.hadoop.mapred')]]
                </value>
            </property>
        </properties>
    </rule>

Avro Schemas in Multiple Files

Please don’t follow the advice given in the InfoQ article on building up an Avro schema from multiple files. The article recommends doing string replacement to mutate the schemas in order to combine them. It was written in 2011, and there have clearly been some improvements to Avro since then.

A better (I’m not sure if it’s the best) way to do this, assuming you don’t want to or can’t use .avdl files, is to parse all of your files with the same Schema.Parser object, which will then give you a map of type name to Schema object:

import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

import org.apache.avro.Schema;

public static Map<String, Schema> parseSchemas(ClassLoader classLoader) throws IOException {
    List<String> schemaResourceNames = Arrays.asList("avro/foo.avsc", "avro/bar.avsc");

    // Use a single parser so that later files can reference types defined in earlier ones.
    Schema.Parser parser = new Schema.Parser();
    for (String schemaResourceName : schemaResourceNames) {
        try (InputStream schemaInputStream = classLoader.getResourceAsStream(schemaResourceName)) {
            if (schemaInputStream == null) {
                throw new RuntimeException("Resource not found: " + schemaResourceName);
            }
            parser.parse(schemaInputStream);
        }
    }
    // Map of fully-qualified type name to Schema.
    return parser.getTypes();
}
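
With that in place, you can look up any parsed schema by its fully-qualified type name; the name com.example.Foo here is hypothetical:

Map<String, Schema> schemas = parseSchemas(Thread.currentThread().getContextClassLoader());
Schema fooSchema = schemas.get("com.example.Foo");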