Today I Learned

3 posts by adamkerr

Record File Handle Usage in OSX

lsof is a helpful tool for looking at what files a process currently has open. However, a process may only access a file for a moment, and a single lsof snapshot can miss it.

For OSX we also have Instruments. This is included with Xcode and is pretty straightforward to use:

  • Open Instruments
  • Select File Activity
  • Select the process
  • Hit Record
  • Perform your action
  • Stop Recording

You can also save the log for later analysis.

ZDT Column Rename in a Distributed System

When deploying code to a highly available distributed system with zero downtime (ZDT), any two sequential versions of the code can be running at the same time, so they need to be compatible. A column rename therefore happens in four steps:

  1. Add the new column, keep the columns in sync when updating.
  2. Migrate the data and start using the new column, but fall back to the old column if the new column is blank; continue keeping the columns in sync.
  3. Remove all dependencies on the old column, only use the new column, do not sync them anymore.
  4. Drop the column.
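The first two steps can be sketched in plain Ruby, outside ActiveRecord. The `Record` class and the `old_name`/`new_name` pair below are hypothetical stand-ins for a real model and its columns:

```ruby
# A plain-Ruby sketch of steps 1 and 2 (no database). Record and the
# old_name/new_name columns are hypothetical stand-ins for a real model.
class Record
  attr_accessor :old_name, :new_name

  # Step 1: every write goes to both columns, so code running against
  # either version of the schema sees a value.
  def name=(value)
    @old_name = value
    @new_name = value
  end

  # Step 2: reads prefer the new column but fall back to the old one
  # while the backfill migration is still in flight.
  def name
    blank?(@new_name) ? @old_name : @new_name
  end

  private

  def blank?(value)
    value.nil? || value.strip.empty?
  end
end
```

A not-yet-migrated row that only has `old_name` set still answers `name`, while migrated rows read from `new_name`, so both code versions see consistent data.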

In Rails, step #3 requires some special care, as the column needs to be marked for removal so that ActiveRecord no longer reads or writes it:

module MarkColumnsForRemoval
  def mark_columns_for_removal(*columns_marked_for_removal)
    @columns_marked_for_removal = columns_marked_for_removal.map(&:to_s)
  end

  ##
  # Overrides ActiveRecord's list of the database columns in order to hide a column which we intend to delete
  # This ensures that ActiveRecord does not try to read or write to the column
  #
  def columns
    cols = super
    cols.reject { |col| (@columns_marked_for_removal || []).include?(col.name.to_s) }
  end
end

class SomeModel < ActiveRecord::Base
  # Remove this as part of step 4 when dropping the old_column
  extend MarkColumnsForRemoval
  mark_columns_for_removal :old_column
end
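To see the override in isolation, the module can be exercised without a database by extending it into a toy class. The `Column` struct and `FakeBase` below are stand-ins for the relevant slice of ActiveRecord, not Rails API, and the module is repeated so the snippet runs standalone:

```ruby
module MarkColumnsForRemoval
  def mark_columns_for_removal(*columns_marked_for_removal)
    @columns_marked_for_removal = columns_marked_for_removal.map(&:to_s)
  end

  def columns
    cols = super
    cols.reject { |col| (@columns_marked_for_removal || []).include?(col.name.to_s) }
  end
end

# Toy stand-ins for ActiveRecord: a column object with a name, and a base
# class whose `columns` class method lists the table's columns.
Column = Struct.new(:name)

class FakeBase
  def self.columns
    [Column.new("id"), Column.new("old_column"), Column.new("new_column")]
  end
end

class FakeModel < FakeBase
  extend MarkColumnsForRemoval
  mark_columns_for_removal :old_column
end

puts FakeModel.columns.map(&:name).inspect  # => ["id", "new_column"]
```

Because `extend` inserts the module ahead of the inherited class method, `super` still reaches the original `columns`, and the marked column simply disappears from the list.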

Loading Data into ELK

Scenario

I want to load data into Elasticsearch

Solution

Modify this script.

# Run this inside the Rails app (e.g. via `rails runner` or the console);
# BackgroundTask is an example ActiveRecord model -- substitute your own.
require "elasticsearch"
require "typhoeus"
require "typhoeus/adapters/faraday"

client = Elasticsearch::Client.new(host: "localhost:9200")

scope = BackgroundTask
  .where("created_at > '2015-01-01'")
  .where("created_at < '2016-01-01'")

# Cache the total so we don't run a COUNT query on every batch
total = scope.count
count_so_far = 0

puts "Processing #{total} records"

scope.find_in_batches do |tasks|
  puts "#{count_so_far} of #{total}"
  count_so_far += tasks.count

  task_array = tasks.map do |task|
    {
      create: {
        _index: "background_tasks",
        _id: task.id,
        _type: "task",
        data: {
          task_type: task.type,
          created_at: task.created_at,
          waiting_time: task.queued_at - task.created_at,
          queued_time: task.run_at - task.queued_at,
          processing_time: task.completed_at - task.run_at
        },
      }
    }
  end

  client.bulk(body: task_array)
end
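One thing to watch: `client.bulk` does not raise when individual documents fail; failures are reported per item in the response body. A sketch of the check, using a stubbed `response` hash in the shape Elasticsearch returns (in the script you would check the value returned by `client.bulk(body: task_array)`):

```ruby
# `response` stubs the shape of a real bulk response so the check can be
# shown standalone; "errors" is a top-level flag, and each failed item
# carries an "error" object under its action name ("create" here).
response = {
  "took"   => 30,
  "errors" => true,
  "items"  => [
    { "create" => { "_id" => "1", "status" => 201 } },
    { "create" => { "_id" => "2", "status" => 400,
                    "error" => { "type" => "mapper_parsing_exception" } } }
  ]
}

if response["errors"]
  failed = response["items"].select { |item| item["create"]["error"] }
  puts "#{failed.size} of #{response['items'].size} documents failed to index"
end
```

Without this check a mapping problem can silently drop part of the data while the script reports success.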