Matching Chinese characters in InSpec http tests

I test our infrastructure with InSpec, and one of the regions where we operate our product is China. That market required Chinese branding. Unfortunately InSpec, or rather Ruby, did not cooperate with what I wanted to do:

control 'CN-PROD-DEALER-AUTH' do
  dealer_auth_base_url = "https://accounts.example.cn/auth"
  dealer_auth_realm_name = "dealers"
  dealer_auth_client_id = "dealer-client"
  dealer_auth_redirect_uri = "https://cn1-prod.example.cn"
  dealer_auth_scopes = "openid%20profile%20username%20email"

  describe http("#{dealer_auth_base_url}/realms/#{dealer_auth_realm_name}/protocol/openid-connect/auth?client_id=#{dealer_auth_client_id}&redirect_uri=#{dealer_auth_redirect_uri}&response_type=code&scope=#{dealer_auth_scopes}", method: 'GET') do
    its('status') { should eq 200 }
    its('body') { should match /My Product Name 随星畅驭/ }
  end
end

Running this control produces a stack trace with the following error:

/opt/inspec/embedded/lib/ruby/gems/2.5.0/gems/pry-0.12.2/lib/pry/history.rb:131:in `write': "\xE9" from ASCII-8BIT to UTF-8 (Encoding::UndefinedConversionError)

According to a Stack Overflow post, Net::HTTP tags the response body as binary (ASCII-8BIT), i.e. the data carries no encoding information, and leaves it to you to decide how to interpret the bytes, e.g.:

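require 'net/http'
# encoding the server actually used for the response (has to be known or guessed)
server_encoding = "ISO-8859-1"
resp = Net::HTTP.get_response('www.telize.com', "/geoip/190.88.39.27")
# relabel the binary body with its real encoding, then transcode it to UTF-8
json = resp.body.force_encoding(server_encoding).encode("UTF-8")
puts json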
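
To see why the distinction matters, here is a minimal sketch that fakes such a binary-tagged body with a plain string. There is no network call involved, and the `body` variable is only a stand-in for what Net::HTTP returns. It contrasts transcoding with `encode`, which fails, with relabelling via `force_encoding`, which only changes the encoding tag:

```ruby
# Stand-in for a response body: UTF-8 bytes tagged as ASCII-8BIT (binary).
body = "My Product Name 随星畅驭".b

# encode tries to convert every byte from "binary" to UTF-8 and gives up
# on the first byte above 0x7F.
encode_error =
  begin
    body.encode(Encoding::UTF_8)
    nil
  rescue Encoding::UndefinedConversionError => e
    e
  end

# force_encoding leaves the bytes alone and only relabels them.
relabelled = body.dup.force_encoding(Encoding::UTF_8)

puts encode_error.class          # the same error class as in the stack trace
puts relabelled.valid_encoding?  # the bytes are well-formed UTF-8
puts relabelled
```

The bytes of `relabelled` and `body` are identical; only the tag differs, which is all the regex match needs.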

Even telling InSpec to use a specific encoding yields the same conversion error:

 its('body.force_encoding(Encoding::UTF_8)') { should match /My Product Name 随星畅驭/ }

Apparently you can control the encoding of a regex by appending a modifier. The Ruby documentation says:

Regular expressions are assumed to use the source encoding. This can be overridden with one of the following modifiers.

  • /pat/u - UTF-8
  • /pat/e - EUC-JP
  • /pat/s - Windows-31J
  • /pat/n - ASCII-8BIT

A regexp can be matched against a string when they either share an encoding, or the regexp’s encoding is US-ASCII and the string’s encoding is ASCII-compatible.

If a match between incompatible encodings is attempted an Encoding::CompatibilityError exception is raised.
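
The compatibility rule quoted above can be checked with plain strings. The following sketch again uses a hand-made binary string as a stand-in for the HTTP body (the variable names are mine, not InSpec's). It shows that an ASCII-only pattern still matches binary data, while a pattern containing Chinese characters is rejected until the string is relabelled:

```ruby
# Stand-in for a binary-tagged HTTP body containing UTF-8 bytes.
body = "My Product Name 随星畅驭".b

ascii_pattern = /My Product Name/          # ASCII only, so its encoding is US-ASCII
utf8_pattern  = /My Product Name 随星畅驭/  # non-ASCII literal, so it is fixed to UTF-8

# A US-ASCII regexp may be matched against any ASCII-compatible string.
ascii_match = ascii_pattern.match?(body)

# A UTF-8 regexp against a binary string with non-ASCII bytes is rejected.
utf8_error =
  begin
    utf8_pattern.match?(body)
    nil
  rescue Encoding::CompatibilityError => e
    e
  end

# After relabelling the body as UTF-8, both sides share an encoding.
utf8_match = utf8_pattern.match?(body.dup.force_encoding(Encoding::UTF_8))

puts ascii_match
puts utf8_error.class
puts utf8_match
```

This is probably why a modifier such as /u makes no difference here: the pattern is already UTF-8, and it is the string side of the match that carries the wrong tag.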

But even that did not solve my issue. After browsing countless GitHub issues I came up with a working solution:

control 'CN-PROD-DEALER-AUTH' do
  dealer_auth_base_url = "https://accounts.example.cn/auth"
  dealer_auth_realm_name = "dealers"
  dealer_auth_client_id = "dealer-client"
  dealer_auth_redirect_uri = "https://cn1-prod.example.cn"
  dealer_auth_scopes = "openid%20profile%20username%20email"

  response = http("#{dealer_auth_base_url}/realms/#{dealer_auth_realm_name}/protocol/openid-connect/auth?client_id=#{dealer_auth_client_id}&redirect_uri=#{dealer_auth_redirect_uri}&response_type=code&scope=#{dealer_auth_scopes}")

  describe "Keycloak dealer login page" do
    after(:each) do |example|
      # remember whether the example that just ran failed
      @@previous_failed = !example.exception.nil?
    end

    it "Status should cmp == 200" do
      expect(response.status).to cmp 200
    end

    it "Body should include Chinese Product Suffix" do
      fail "Skipping body validation due to status != 200" if @@previous_failed
      # use expect syntax because the force encoding method cannot yet be used in the matcher
      # https://github.com/inspec/inspec/issues/2256
      expect(response.body.force_encoding(Encoding::UTF_8)).to match(/My Product Name 随星畅驭/)
    end
  end

end
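
The control above hard-codes UTF-8, which is fine for this login page. If the server might answer in a different charset, the relabelling step could be pulled into a small helper that reads the charset from the Content-Type header. The sketch below is plain Ruby; `decode_body` is a hypothetical name of my own and not part of InSpec, and falling back to UTF-8 is an assumption:

```ruby
# Hypothetical helper (not an InSpec API): relabel a binary response body
# using the charset from a Content-Type header value, then return it as UTF-8.
def decode_body(raw_body, content_type = nil)
  # pull "utf-8" out of e.g. "text/html; charset=utf-8"
  charset = content_type.to_s[/charset=["']?([^;"'\s]+)/i, 1]
  encoding =
    begin
      charset ? Encoding.find(charset) : Encoding::UTF_8
    rescue ArgumentError
      Encoding::UTF_8 # unknown charset label: assume UTF-8
    end
  # dup so the caller's string keeps its original encoding tag
  raw_body.dup.force_encoding(encoding).encode(Encoding::UTF_8)
end

page = decode_body("My Product Name 随星畅驭".b, "text/html; charset=UTF-8")
puts page.match?(/My Product Name 随星畅驭/)
```

Inside the control, the expectation would then receive the result of the helper instead of calling force_encoding directly.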