Avro Schemas in Multiple Files

Please don’t follow the advice in the InfoQ article on building an Avro schema up from multiple files. The article recommends doing string replacement on the schema text to combine the files. It was written in 2011, and Avro has clearly improved since then.

A better (I’m not sure if it’s the best) way to do this, assuming you don’t want to or can’t use .avdl files, is to parse all of your files with the same Schema.Parser instance. The parser registers each named type as it goes, so parse files that define types before the files that reference them. Its getTypes() method then gives you a map from full type name to Schema object:

    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Arrays;
    import java.util.List;
    import java.util.Map;

    import org.apache.avro.Schema;

    public static Map<String, Schema> parseSchemas(ClassLoader classLoader) throws IOException {
        List<String> schemaResourceNames = Arrays.asList("avro/foo.avsc", "avro/bar.avsc");

        Schema.Parser parser = new Schema.Parser();
        for (String schemaResourceName : schemaResourceNames) {
            try (InputStream schemaInputStream = classLoader.getResourceAsStream(schemaResourceName)) {
                if (schemaInputStream == null) {
                    throw new RuntimeException("Resource not found: " + schemaResourceName);
                }
                // Each parse() registers the file's named types with this parser,
                // so later files can reference types defined in earlier ones.
                parser.parse(schemaInputStream);
            }
        }
        return parser.getTypes();
    }
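To see why parse order matters, here’s a small sketch using inline schema strings in place of the .avsc files (the record names com.example.Foo and com.example.Bar are hypothetical, not from my real project):

```java
import java.util.Map;

import org.apache.avro.Schema;

public class AvroTypesDemo {
    public static void main(String[] args) {
        Schema.Parser parser = new Schema.Parser();

        // Stand-in for foo.avsc: defines the record com.example.Foo.
        parser.parse("{\"type\":\"record\",\"name\":\"com.example.Foo\","
                + "\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}");

        // Stand-in for bar.avsc: can reference com.example.Foo by name
        // only because the parser already knows about it.
        parser.parse("{\"type\":\"record\",\"name\":\"com.example.Bar\","
                + "\"fields\":[{\"name\":\"foo\",\"type\":\"com.example.Foo\"}]}");

        Map<String, Schema> types = parser.getTypes();
        System.out.println(types.keySet());  // contains com.example.Foo and com.example.Bar
    }
}
```

If you parse the files in the wrong order, the reference to the not-yet-defined type fails, which is why the list of resource names has to be ordered by dependency.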

Docker unable to pull images from Docker Hub registry

In AWS, my Docker-based Elastic Beanstalk apps were repeatedly removing and adding instances, resulting in many messages like “Adding instance ‘i-465f6382’ to your environment.” In docker-events.log, the message “Could not reach any registry endpoint” was repeated. I got the same message when running “sudo docker pull ubuntu” manually on the EC2 instance, even though network connectivity seemed fine.

As it turns out, Docker Hub has deprecated pulls from Docker clients version 1.5 and earlier. See https://blog.docker.com/2015/10/docker-hub-deprecation-1-5/ for more information.

Also, due to a bug in the Elastic Beanstalk console UI, I had to use the EB CLI command “eb config” to trigger an update of the platform.

Corrupted scala-library jar causes build failure

I had a heck of a time figuring out why, when I ran “mvn compile” on my project that uses scalatest and the scala-maven-plugin, I always got:

[ERROR] java.lang.NoClassDefFoundError: scala/Function1
[INFO] 	at java.lang.Class.getDeclaredMethods0(Native Method)
[INFO] 	at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
[INFO] 	at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
[INFO] 	at java.lang.Class.getMethod0(Class.java:3018)
[INFO] 	at java.lang.Class.getMethod(Class.java:1784)
[INFO] 	at scala_maven_executions.MainHelper.runMain(MainHelper.java:155)
[INFO] 	at scala_maven_executions.MainWithArgsInFile.main(MainWithArgsInFile.java:26)

The cause? A corrupted scala-library JAR in the local Maven repository (under ~/.m2/repository/org/scala-lang/scala-library). Once I deleted it and allowed Maven to re-download it, everything started working.

Implicit conversion of Strings

I recently replaced some code that looked like:

return "" + str;

with:

return str.toString();

I thought I was fixing bad code style. Unfortunately, I had forgotten how implicit conversion to String works, and I wasn’t thorough enough: I failed to write a test covering the case where str is null. Contrary to my intuition, the two statements are not equivalent. According to the Java Language Specification, string conversion turns a null reference into the string “null”. My replacement, on the other hand, throws a NullPointerException. I can’t help but think this is a weakness in the language design. It makes printing null references easier, but it causes at least two problems:

  • It hides the invocation of toString(). If you’re interested in a call hierarchy of toString(), your IDE needs to be intelligent enough to also show implicit string conversions. A call hierarchy of toString() might not be very useful anyway, but if you can restrict its scope, it could be handy. IntelliJ IDEA, unfortunately, does not show implicit string conversions as invocations of toString(); I might suggest a feature to fix that.
  • It hides significant logic from the programmer. The implicit conversion performs something like:
    if (ref == null) {
      return "null";
    } else {
      String result = ref.toString();
      if (result == null) {
        return "null";
      } else {
        return result;
      }
    }
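
To make the difference concrete, here’s a minimal sketch (the class name is mine):

```java
public class NullStringDemo {
    public static void main(String[] args) {
        String str = null;

        // Implicit string conversion: a null reference becomes the string "null".
        System.out.println("" + str);  // prints: null

        // Explicit toString() dereferences the null reference and throws.
        try {
            System.out.println(str.toString());
        } catch (NullPointerException e) {
            System.out.println("NullPointerException");  // this is what my "fix" introduced
        }
    }
}
```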
    

Sharing test resources from a Ruby gem

We recently split all of our Rails model classes out into a separate gem so that multiple apps/engines can share the models. As a result, all of our Fabricators (from the Fabrication gem), which serve as our test fixtures, also moved into the new gem. Since the original Rails app’s specs use those fabricators, they no longer passed.

Unfortunately, RubyGems and Bundler have no equivalent of Maven’s test artifacts, so it is not obvious how to share testing resources between gems.

To resolve this, I added the fabricators to the test_files of the gem. In my case, it looks like this in my gemspec:

...
Gem::Specification.new do |s|
  ...
  s.test_files = Dir['spec/fabricators/**/*']
  ...
end

If your gem is a Rails engine, be sure not to include the dummy app’s log folder in the test_files of the gem!

Next, to make the fabricators available to code that uses that gem as a dependency, I added a file called “spec/support/fabrication.rb” (which gets loaded by spec_helper.rb) that looks like:

Fabrication.configure do |config|
  # Find the installed models gem and point Fabrication at the
  # fabricators packaged inside it.
  nameofgem_gem_spec = Bundler.rubygems.find_name('nameofgem').first
  config.path_prefix = nameofgem_gem_spec.full_gem_path
end

Voilà, the Fabrication gem can load the fabricators from my external models gem! Obviously, this is a one-off solution, not a general mechanism for sharing test resources, but it worked for me. Let me know if you have success with this approach, or if you have ideas about how to do it better.

Considerations for MongoDB, Mongoid and Eventual Consistency in General

I wrote this brain-dump back in October of 2013 and am just now publishing it in 2016. As you will see, I was a bit irritated at having been forced to use Mongo in a relational way. I had warned my coworkers, architectural leadership, and the CTO (of the small embedded “startup” I was working for) repeatedly about the dangers of misusing a non-transactional NoSQL system. By the time we started to see real occurrences of non-transactional updates interfering with each other, all the other people on my team were gone or on their way out. I took it to heart that I should stand up for my technical opinions more. I’m finally publishing this post without making edits.

Oh, and a disclaimer: I haven’t used Mongoid or MongoDB since that project, so many things have probably changed. Also, all my projects that use NoSQL since then have used it properly (not relationally).

Don’t downplay the consistency issues you will encounter with MongoDB. You will hit them at some point, and they will be difficult to resolve: your data will become inconsistent, and it may be hard to detect the problem, identify its cause, and repair the data. It will also be difficult to determine the right approach in your code to avoid such inconsistencies. Pay attention to those who advise starting with a transactional datastore as your primary source of truth, and introducing NoSQL datastores as solutions to targeted performance problems when they appear later.

Mongoid: don’t try to build up changes in memory and validate everything before saving; each object is still saved individually, not atomically. You’ll also run into problems where you have multiple copies of the same object in memory, each in a different state. And it’s impossible to perform a delete in memory and validate it before actually persisting it. Instead, save each modification in turn, and watch for places where reload will be necessary in order to see the previous modifications. Doing this is not easy.

Also see my post at: https://groups.google.com/forum/#!searchin/mongoid/shannon/mongoid/0m3i2pwjh-0/neN0HzZcmccJ

A “live” UI that persists a small change (ideally to a single document) on each user interaction is the best fit. It’s not honest to present the user with a large edit page with “save” and “cancel” buttons, because clicking “save” may result in only some of the changes taking effect. That is, unless you have written a full transaction management system on top of Mongo, in which case you’ll be tackling all the concurrency issues that database researchers, programmers, and vendors have already addressed in mature, transactionally consistent (SQL) datastores. It’s unlikely you’ll match the performance, accuracy, and reliability of the existing transactional databases.

Achieving eventual consistency: e.g., how do you ensure that both ends of a many-to-many relationship are updated? Or how do you make it not matter to the reader whether they’re consistent (e.g., by ignoring anything that disagrees, at a cost to performance)? Ensure that anything that could be considered a denormalization is a lazy action that can be queued and retried repeatedly??