PKIX path validation failed - Debugging

October 4, 2017, 9:57 am

≫ Next: Liquid Galaxy at ASTC 2017

≪ Previous: Disaster Recovery - Miami to Dallas in one hit

I recently ran into a case working on a application with a PKIX path validation error on a site that had a valid certificate. I was able to solve the issue using OpenSSL to debug.

Typically, the PKIX path validation error arises due to SSL certificate expiry, but I ran into the same error even when the system was configured with a valid certificate. There are two web applications in our scenario, AppX and AppY. AppX uses AppY's authentication mechanism to allow the users to login with same user account. AppX sends a POST request using HttpClient with necessary arguments to SSL enabled AppY and allows the user to login based on the response.

HttpClient httpclient = new DefaultHttpClient();
// ...
HttpPost httppost = new HttpPost("https://app2domain.com/sessions");

try {
    resp = httpclient.execute(httppost);
}
catch (Exception e) {
    throw new Exception("Exception: ", e);
}

Error

The AppX was isolated to new server and it started throwing PKIX path validation failed error while sending requests to AppY.

Exception: javax.net.ssl.SSLHandshakeException: 
sun.security.validator.ValidatorException: PKIX path validation failed: 
java.security.cert.CertPathValidatorException: timestamp check failed

PKIX (Public-Key Infrastructure - X.509) is standard for key based encryption mechanism. The PKIX path related errors come up due to the failure establishing the connection with SSL applications.

Debug

It is good to identify the root cause of the problem since there are few possible reasons for the same error. Let's start debugging...

Check the Certificate status and expiration date in your browser
OpenSSL validation

openssl

$ echo -n | openssl s_client -CApath /etc/ssl/certs/ -connect app2domain.com:443 
...
Start Time: 1482921042
Timeout   : 300 (sec)
Verify return code: 20 (unable to get local issuer certificate)


$ echo -n | openssl s_client -CApath /etc/ssl/certs/ -connect app2domain.com:443 </dev/null | openssl x509 -noout -dates

verify error:num=20:unable to get local issuer certificate
DONE
notBefore=May  4 00:00:00 2013 GMT
notAfter=May 14 23:59:59 2015 GMT

Root Cause

The openssl tool reported certificate details another unused and expired domain. Since this is configured on the same server, it is causing the error in our case. The same scenario happened to the AppX when sending request to AppX. It may have tried to establish connection through the expired certificate. So, the lesson here is that it is necessary to clean up the expired certificates when the connection is established through HttpClient utilities. Also, a specific domain name can be validated by passing the -servername option (for SNI) to the openssl, which in this case reports appYdomain.com has valid certificate.

$ echo -n | openssl s_client -CApath /etc/ssl/certs/ -connect app2domain.com:443 -servername app2domain.com
...
Start Time: 1482920942
Timeout   : 300 (sec)
Verify return code: 0 (ok)

$ echo -n | openssl s_client -CApath /etc/ssl/certs/ -connect app2domain.com:443 -servername app2domain.com </dev/null | openssl x509 -noout -dates
...
verify return:0
DONE
notBefore=Sep 26 11:52:51 2015 GMT
notAfter=Apr  1 12:35:52 2018 GMT

Conclusion

In most cases, the PKIX path validation error comes up when the SSL certificate is expired for the domain name, however, there may be different reasons such as certificate expiry, picking wrong certificate, etc. It is always helpful to debug with the openssl tool to identify the root cause. This specific issue was fixed by removing the unused expired certificate.

↧

Liquid Galaxy at ASTC 2017

October 18, 2017, 9:46 am

≫ Next: Using tail_n_mail after hours

≪ Previous: PKIX path validation failed - Debugging

End Point is pleased to be participating in ASTC 2017, alongside our partners BWC Visual Technology. ASTC, which stands for the Association of Science-Technology Centers, is holding their annual conference at The Tech Museum of Innovation, located in San Jose, CA. We were excited to hear that the conference takes place at The Tech Museum, as we have a Liquid Galaxy set up in the museum!

A 3-screen desktop Liquid Galaxy display will be set up by Liquid Galaxy Engineer Josh Ausborne at Booth 1103. This display will be showcasing content that includes Sketchfab and Unity 3D Models, Cesium content with interactive weather data, 360 panoramic video, Google Earth/Google Streetview content, and engaging presentations about National Parks and National Marine Sanctuaries.

We are very excited to be showcasing our technology with BWC Visual Technology. BWC is a distributor and licensed re-seller of state-of-the-art, interactive exhibit technology for museums and science centers. We have great respect for their team and technologies, and are excited to be showcasing Liquid Galaxy with them.

Liquid Galaxy is currently featured in many science and technology centers around the world. Please view the information featured below to learn more. This handout will be available to all attendees at ASTC, and can be picked up at booth 1103.

↧

Using tail_n_mail after hours

October 23, 2017, 3:30 am

≫ Next: Hot-deploy Java classes and assets in Wildfly 8/9/10

≪ Previous: Liquid Galaxy at ASTC 2017

(Photo of Turtle Island by Edwin Poon)

Someone recently asked me something about tail_n_mail, a program that watches over your log files, scans for certain patterns, and sends out an email if matches are found. It is frequently used to watch over Postgres logs so you can receive an automatic email alert when Bad Things start happening to your database. The questioner wanted to know if it was possible for tail_n_mail to change its behavior based on the time of day - would it be able to do things differently outside of "business hours"? Although tail_n_mail cannot do so directly, a simple solution is to use alternate configuration files - which get swapped by cron - and the INHERIT keyword.

To demonstrate the solution, let's spin up a Postgres 10 instance, route the logs to syslog, setup tail_n_mail, and then create separate configuration files for different times of the week. First, some setup:

$ initdb --version
initdb (PostgreSQL) 10.0$ initdb --data-checksums data
$ cat >> data/postgresql.conf << EOT
log_line_prefix=''
log_destination='syslog'
EOT
$ echo 'local0.*  /var/log/postgres.log' | sudo tee -a /etc/rsyslog.conf > /dev/null
$ sudo systemctl restart rsyslog
$ pg_ctl start -D data -l logfile

Grab the latest version of tail_n_mail and verify it:

$ wget --no-verbose https://bucardo.org/downloads/tail_n_mail{,.asc} 
2017-03-03 10:00:33 URL:https://bucardo.org/downloads/tail_n_mail [98767/98767] -> "tail_n_mail" [1]
2017-03-03 10:00:33 URL:https://bucardo.org/downloads/tail_n_mail.asc [163/163] -> "tail_n_mail.asc" [1]
FINISHED --2017-03-03 10:00:33--
Total wall clock time: 0.3s
Downloaded: 2 files, 96K in 0.1s (702 KB/s)
$ gpg --verify tail_n_mail.asc
gpg: assuming signed data in `tail_n_mail'
gpg: Signature made Sun 01 Oct 2017 11:14:07 AM EDT using DSA key ID 14964AC8
gpg: Good signature from "Greg Sabino Mullane "
gpg:                 aka "Greg Sabino Mullane (End Point Corporation) "
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 2529 DF6A B8F7 9407 E944  45B4 BC9B 9067 1496 4AC8

The main way to configure tail_n_mail is through its configuration file, which is always the first argument given to the program. This file describes where the log files are, what to look for, and a few other important items. In addition, it automatically updates itself each time tail_n_mail is run to keep track of where the last run left of, so the next run can start at the exact same file, and the correct place within that file. In this example, let's assume the DBA wants to get email for every error that pops up in the database (in practice, this means any severity levels that are ERROR, FATAL, or PANIC). The configuration file would look like this:

$ cat > tnm.conf << EOT
FILE: /var/log/postgres.log
PGLOG: syslog
EMAIL: greg@example.com
MAILSUBJECT: HOST Postgres errors NUMBER
INCLUDE: PANIC:
INCLUDE: FATAL:
INCLUDE: ERROR:
## Okay, we don't want to get emailed on *every* error:
EXCLUDE: could not serialize access due to concurrent update
EXCLUDE: canceling statement due to user request
EOT

To test it out, we will generate some errors, and then run tail_n_mail from the command line. If all goes well, it sends out an email and then rewrites the configuration file to indicate how far along it got. The --dry-run option can be used to view the email without actually sending it.

$ for i in 2 4 6 8; do psql -tc "select $i/0"; done
ERROR:  division by zero
ERROR:  division by zero
ERROR:  division by zero
ERROR:  division by zero

$ perl tail_n_mail tnm.conf --dry-run
Subject: localhost.localdomain Postgres errors 4
Auto-Submitted: auto-generated
Precedence: bulk
X-TNM-VERSION: 1.30.0
To: greg@example.com

Date: Tue Oct  3 03:19:22 2017 EDT
Host: localhost.localdomain
Unique items: 1
Matches from /var/log/postgres.log: 4

[1] (between lines 139 and 142, occurs 4 times)
First: Oct   3 03:19:00 localhost postgres[28483]: [6-1]
Last:  Oct   3 03:19:00 localhost postgres[28495]: [6-1]
ERROR: division by zero
STATEMENT: select ?/0
-
ERROR: division by zero
STATEMENT: select 2/0

  DRYRUN: /usr/sbin/sendmail 'greg@example.com'< tnmBWaG6QA1.tnm2

Running it in normal mode rewrites the configuration file:

$ perl tail_n_mail tnm.conf
$ cat tnm.conf
## Config file for the tail_n_mail program
## This file is automatically updated
## Last updated: Mon Oct  2 12:09:29 2017
PGLOG: syslog
EMAIL: greg@example.com
MAILSUBJECT: HOST Postgres errors NUMBER

INCLUDE: PANIC:
INCLUDE: FATAL:
INCLUDE: ERROR:
## Okay, we don't want to get emailed on *every* error:
EXCLUDE: could not serialize access due to concurrent update
EXCLUDE: canceling statement due to user request

FILE1: /var/log/postgres.log
LASTFILE1: /var/log/postgres.log
OFFSET1: 333

Note how the file was rewritten to include state information about the files we are tracking, but leaves the exclusion rules and their comments in place. Tail_n_mail also attempts to "flatten" similar queries, which is why the four division-by-zero errors all appear as "SELECT ?/0". A sample of one of the literal errors appears below the normalized version.

You are not limited to a single configuration file, however, as the main config file can read in other configuration files via the INHERITS keyword. This allows you to import one or more other configuration files. Not only does this allow different tail_n_mail invocations to share common items to search for, but (as you will see in a bit) can solve the problem at the top of this post: how to change what is being looked for based on the time of day.

Using INHERITS also allows us to store files in version control, without worrying about them getting rewritten on each invocation, as we can store the ephemeral data in one file, and the constant data in a separate, version controlled file. Let's apply that idea to our example:

$ cat > tnm.global.conf << EOT
PGLOG: syslog
EMAIL: greg@example.com
INCLUDE: PANIC:
INCLUDE: FATAL:
INCLUDE: ERROR:
## Okay, we don't want to get emailed on *every* error:
EXCLUDE: could not serialize access due to concurrent update
EXCLUDE: canceling statement due to user request
EOT
$ git add tnm.global.conf && git commit tnm.global.conf \
  -m "Global config for tail_n_mail"[master 2441df8] Global config for tail_n_mail
 1 file changed, 7 insertions(+)
 create mode 100644 tnm.global.conf$ cat > tnm.conf << EOT
FILE: /var/log/postgres.log
MAILSUBJECT: HOST Postgres errors NUMBER
INHERIT: tnm.global.conf
EOT

After another run, we observe that the inherited file does not change:

$ perl tail_n_mail tnm.conf
$ git status
On branch master
nothing to commit, working tree clean$ cat tnm.conf

## Config file for the tail_n_mail program
## This file is automatically updated
## Last updated: Mon Oct  2 12:28:10 2017
MAILSUBJECT: HOST Postgres errors NUMBER

INHERIT: tnm.global.conf

FILE1: /var/log/postgres.log
LASTFILE1: /var/log/postgres.log
OFFSET1: 13219

Another advantage to moving common items to another file is that we can run multiple tail_n_mails, with slightly different purposes, but all sharing some of the same underlying rules. A common usage is to get an immediate email about almost all database problems, as well as a daily report about all problems. To do this, we create two configuration files and set them up in cron:

$ cp tnm.conf tnm.fatals.conf
$ mv tnm.conf tnm.errors.conf
$ perl -pi -e 's/Postgres errors/Postgres fatals/' tnm.fatals.conf
$ crontab -e
## Every five minutes, check for important problems
*/5 * * * * perl tail_n_mail tnm.fatals.conf
## Once every morning, generate a report of all errors in the last 24 hours.
30 6 * * * perl tail_n_mail tnm.errors.conf
## Note: it is usually easier to have separate "fatals" and "errors" exclusions.

What if we want to change the rules depending on the time of day, per the question that started this article? Simple enough - we just create two "inherited" configuration files, then have cron swap things around as needed. For example, let's say that after 5pm on weekdays, and all weekend, we do not want to receive emails about "division by zero" errors. First, create files named tnm.global.hometime.conf and tnm.global.workday.conf:

$ cp tnm.global.conf tnm.global.workday.conf
$ cp tnm.global.conf tnm.global.hometime.conf
$ ln -sf tnm.global.workday.conf tnm.global.conf
$ echo "EXCLUDE: ERROR:  division by zero">> tnm.global.hometime.conf

Finally, have cron swap the files around at the start and end of business hours:

$ crontab -e
## May need to use 1-5 instead of Mon-Fri on some systems
0 9 * * Mon-Fri ln -sf tnm.global.workday.conf tnm.global.conf
0 17 * * Mon-Fri ln -sf tnm.global.hometime.conf tnm.global.conf

Voila! We've changed the way tail_n_mail runs depending on the time of day. There are many other tricks you can do with tail_n_mail - check out the documentation or post to the mailing list for more help and/or inspiration.

↧

Hot-deploy Java classes and assets in Wildfly 8/9/10

October 27, 2017, 4:00 pm

≫ Next: A Collaborative Timezone Utility

≪ Previous: Using tail_n_mail after hours

Introduction

Java development can be really frustrating when you need to re-build your project and restart a server every time you change something. I know about JRebel, but while it’s a good tool, it’s also pretty expensive. You can use the open-source version, but then you need to send project statistics to the JRebel server, which is not a viable option for your more serious projects.

Fortunately, there is an open-source project called HotSwapAgent and it does the same thing as JRebel, for free (thank you, guys!).

I will explain how to combine it with Widlfly in order to hot-deploy Java classes as well as how to hot-deploy other resources (Javascript, CSS, images).

Wildfly configuration

Let’s assume that we use the standalone-full.xml configuration file.

We need to use exploded deployment instead of deploying WAR or EAR. You can do this in production as well to allow for application changes with zero downtime.

Start by configuring the metaspace size; we had to increase defaults for our application, but it’s possible that it will be just fine in your case. It’s encouraged that you play with these values after completing all steps.

In:

WILDFLY_DIR/bin/standalone.conf

set:

-XX:MetaspaceSize=256M -XX:MaxMetaspaceSize=512m

so it looks like this:

JAVA_OPTS="-Xms512m -Xmx1024m -XX:MetaspaceSize=256M -XX:MaxMetaspaceSize=512m".

Now, look for the deployment-scanner node in:

WILDFLY_DIR/standalone/configuration/standalone-full.xml

Replace it so it looks like this:

<deployment-scanner path="PATH_TO_DEPLOYMENT_DIR" relative-to="RELATIVE_TO_PATH" scan-enabled="true" scan-interval="2000" auto-deploy-exploded="false" runtime-failure-causes-rollback="${jboss.deployment.scanner.rollback.on.failure:false}"/>

Note:

PATH_TO_DEPLOYMENT_DIR is WILDFLY_DIR/standalone/deployments

RELATIVE_TO_PATH is, as the name suggests the dir that the PATH_TO_DEPLOYMENT_DIR is relative to.

HotSwapAgent installation and configuration

We need to download and install the latest release of DCEVM Java patch from here: https://github.com/dcevm/dcevm/releases . Why it’s needed? It will allow us unlimited redefinition of loaded classes at runtime. This is not possible with the original Java HotSpot VM. Make sure you update to the same Java version that you’re going to use to run the Wildfly server.

Now, download the latest release of the Hotswap agent from here:

https://github.com/HotswapProjects/HotswapAgent/releases .

The only thing that you need to do is get the JAR and put it anywhere on your hard drive (I recommend to add it to your Java project).

Ok, great, now just some configuration.

Open:

WILDFLY_DIR/bin/standalone.conf

and add new Java opts:

-XXaltjvm=dcevm -javaagent:PATH_TO_HOTSWAPAGENT_JAR.

What does this do?

The altjvm option sets an alternative Java Virtual Machine.
The javagent is just an interceptor on the top of your classes that allows the HotSwapAgent library to manipulate your code on the fly.

That’s all you need. It’s a good idea to create a configuration file for the HotSwapAgent. This is well explained here:

http://hotswapagent.org/mydoc_configuration.html

Basically create a new file, name it hotswap-agent.properties, set all needed configuration inside and add it to the classpath of the application.

If you use Netbeans, Eclipse or Intellij you should check the HotSwapAgent page for some helpful plugins here: http://hotswapagent.org/mydoc_setup_intellij_idea.html , http://hotswapagent.org/mydoc_setup_eclipse.html and http://hotswapagent.org/mydoc_setup_netbeans.html .

Application configuration

Now that we have everything in place, I will explain how to put it all together.

I doesn’t really matter which build-tool you use (Ant, Gradle or Maven). The process should look like this (you can do it in many ways, in our case, it’s pretty specific as our build process is really complicated):

Build your application and deploy it to the PATH_TO_DEPLOYMENT_DIR in the exploded version,
Create a script that will look for changes in the application directory (this one is interesting: https://gist.github.com/wernight/11401031 ),
On change, trigger a job that will:
1. Copy all resources like JSP, JavaScript, CSS and copy to the PATH_TO_DEPLOYMENT_DIR,
2. Compile classes and copy them to the PATH_TO_DEPLOYMENT_DIR.

That’s it, after you replace files in the PATH_TO_DEPLOYMENT_DIR HotSwapAgent and Wildfly will do the rest really fast. We have a ton of assets and classes and the whole process takes only a few seconds!

Summary

I feel this process is really worth doing. It doesn’t take a lot of time to configure everything and saves a lot of manual work. Just multiply the number of manual deployments and the number of developers in your team and you understand how much time you lose everyday without hot-deployment.

From now on, focus on development, forget about deployment!

Lastly, good luck!

↧

A Collaborative Timezone Utility

October 30, 2017, 7:37 pm

≫ Next: Series Digital joins End Point!

≪ Previous: Hot-deploy Java classes and assets in Wildfly 8/9/10

Try It Out Yourself

The code for this project is hosted on Github and can be cloned from here.

At End Point Corporation, our team is spread out across 10 time zones. This gives us the advantage of being able to work around the clock on projects. When one co-worker leaves for day, another can take over. Consider this scenario. It's Monday evening and Martin needs to continue installing software on that Linux cluster, but it's already 6pm and his wife is going to murder him if he's not ready to go out for their anniversary dinner. Let's see who can take over... Ah, yes, Sanjay in Bangalore can continue with the maintenance. Tuesday morning, the client wakes up to be surprised that 16 hours of work was completed in a day. With respect to software development, the same efficiencies can be realized by parallelizing tasks across time-zones. Code reviews and further development can be continued after normal business hours.

With all the blessings of a distributed engineering team, collaborating with co-workers can be, occasionally, challenging. Some of these challenges stem from complexities of our system of time. Every co-worker may be operating in a timezone that is different than yours. Time-zones have an associated offset relative to Coordinated Universal Time (UTC). These offsets are usually in whole hour increments but they may be any real-valued number.

For example, Eastern Standard Time (EST) has an offset of -5 (five hours behind UTC) and Indian Standard Time (IST) has an offset of 5.5 (five and half hours ahead of UTC). Furthermore, these UTC offsets can be completely arbitrary. In 1995, Kiribati, an island nation in the Pacific, changed its UTC offset from -10 to +14 so that all of its outlying islands can share the same time. To further complicate things, some regions may not observe daylight savings time (DST) while other regions do. In fact, in the United States, Indiana started observing DST on April 2, 2006. Some states like Arizona and Hawaii do not observe DST. Other countries, like Australia, have a similar situation where it's left to local governments to decide whether DST is observed or not. Moreover, although DST usually accounts for adding or subtracting an hour of time, it isn't always one hour. This has historically changed from time to time.

Now you may begin to imagine the headaches that arise when you need to coordinate with anything involving multiple time-zones. To make all of this easier, you can use a utility that we wrote to do all the time conversions for you. First, you have to add each co-worker's information to a configuration file stored at ~/.timezoner. This configuration file will describe all of your co-worker's contact information and their associated IANA time-zone. As an example, this is what the configuration file looks like:

# Timezone            Email                  Name              OfficePhone        MobilePhone
America/New_York      "edward@example.com""Edward Teach ""n/a""+1 731 555 1234"
America/New_York      "henry@dexample.com""Henry Morgan""+1 646 555 5678""+1 954 555 5678"
America/New_York      "john@example.com""John Auger""n/a""+1 902 555 1234"
America/Denver        "sam@example.com""Samuel Bellamy""+1 347 535 1234""+1 994 555 5678"
America/Los_Angeles   "william@example.com""William Kidd""+1 330 555 5678""+1 305 555 1234"
America/Los_Angeles   "israel@example.com""Israel Hands""+1 507 555 1234""+1 208 555 5678"

Now when I need to coordinate a meeting, I can run the utility with the -T option to see each team member's local time.

With the -U option, you can display each contact separated in groups based on UTC offset.

Let us know what you think and if you found this tool helpful.

↧

Series Digital joins End Point!

November 11, 2017, 12:37 pm

≫ Next: Amazon AWS upgrades to Postgres with Bucardo

≪ Previous: A Collaborative Timezone Utility

End Point has the pleasure to announce some very big news!

After an amicable wooing period, End Point has purchased the software consulting company Series Digital, a NYC-based firm that designs and builds custom software solutions. Over the past decade, Series Digital has automated business processes, brought new ideas to market, and built large-scale dynamic infrastructure.

Series Digital launched in 2006 in New York City. From the start, Series Digital managed large database installations for financial services clients such as Goldman Sachs, Merrill Lynch, and Citigroup. They also worked with startups including Drop.io, Byte, Mode Analytics, Domino, and Brewster.

These growth-focused, data-intensive businesses benefited from Series Digital’s expertise in scalable infrastructure, project management, and information security. Today, Series Digital supports clients across many major industry sectors and has focused its development efforts on the Microsoft .NET ecosystem. They have strong design and user experience expertise. Their client list is global.

The Series Digital team began working at End Point on April 3rd, 2017.

The CEO of Series Digital is Jonathan Blessing. He joins End Point’s leadership team as Director of Client Engagements. End Point has had a relationship with Jonathan since 2010, and looks forward with great anticipation to the role he will play expanding End Point’s consulting business.

To help support End Point’s expansion into .NET solutions, End Point has hired Dan Briones, a 25-year veteran of IT infrastructure engineering, to serve as Project and Team Manager for the Series Digital group. Dan started working with End Point at the end of March.

The End Point leadership team is very excited by the addition of Dan, Jonathan, and the rest of the talented Series Digital team: Jon Allen, Ed Huott, Dylan Wooters, Vasile Laur, Liz Flyntz, Andrew Grosser, William Yeack, and Ian Neilsen.

End Point’s reputation has been built upon its excellence in e-commerce, managed infrastructure, and database support. We are excited by the addition of Series Digital, which both deepens those abilities, and allows us to offer new services.

Talk to us to hear about the new ways we can help you!

↧

Amazon AWS upgrades to Postgres with Bucardo

November 11, 2017, 12:37 pm

≫ Next: Successful First GEOINT Symposium for End Point Liquid Galaxy

≪ Previous: Series Digital joins End Point!

(Bird-chasing photo by Doug Waldron)

Many of our clients at End Point are using the incredible Amazon Relational Database Service (RDS), which allows for quick setup and use of a database system. Despite minimizing many database administration tasks, some issues still exist, one of which is upgrading. Getting to a new version of Postgres is simple enough with RDS, but we've had clients use Bucardo to do the upgrade, rather than Amazon's built-in upgrade process. Some of you may be exclaiming "A trigger-based replication system just to upgrade?!"; while using it may seem unintuitive, there are some very good reasons to use Bucardo for your RDS upgrade:

Minimize application downtime

Many businesses are very sensitive to any database downtime, and upgrading your database to a new version always incurs that cost. Although RDS uses the ultra-fast pg_upgrade --links method, the whole upgrade process can take quite a while - or at least too long for the business to accept. Bucardo can reduce the application downtime from around seven minutes to ten seconds or less.

Upgrade more than one version at once

As of this writing (June 2017), RDS only allows upgrading of one major Postgres version at a time. Since pg_upgrade can easily handle upgrading older versions, this limitation will probably be fixed someday. Still, it means even more application downtime - to the tune of seven minutes for each major version. If you are going from 9.3 to 9.6 (via 9.4 and 9.5), that's at least 21 minutes of application downtime, with many unnecessary steps along the way. The total time for Bucardo to jump from 9.3 to 9.6 (or any major version to another one) is still under ten seconds.

Application testing with live data

The Bucardo upgrade process involves setting up a second RDS instance running the newer version, copying the data from the current RDS server, and then letting Bucardo replicate the changes as they come in. With this system, you can have two "live" databases you can point your applications to. With RDS, you must create a snapshot of your current RDS, upgrade that, and then point your application to the new (and frozen-in-time) database. Although this is still useful for testing your application against the newer version of the database, it is not as useful as having an automatically-updated version of the database.

Control and easy rollback

With Bucardo, the initial setup costs, and the overhead of using triggers on your production database, is balanced a bit by ensuring you have complete control over the upgrade process. The migration can happen when you want, at a pace you want, and can even happen in stages as you point some of the applications in your stack to the new version, while keeping some pointed at the old. And rolling back is as simple as pointing apps back at the older version. You could even set up Bucardo as "master-master", such that both new and old versions can write data at the same time (although this step is rarely necessary).

Database bloat removal

Although the pg_upgrade program that Amazon RDS uses for upgrading is extraordinarily fast and efficient, the data files are seldom, if ever, changed at all, and table and index bloat is never removed. On the other hand, an upgrade system using Bucardo creates the tables from scratch on the new database, and thus completely removes all historical bloat. (Indeed, one time a client thought something had gone wrong, as the new version's total database size had shrunk radically - but it was simply removal of all table bloat!).

Statistics remain in place

The pg_upgrade program currently has a glaring flaw - no copying of the information in the pg_statistic table. Which means that although an Amazon RDS upgrade completes in about seven minutes, the performance will range somewhere from slightly slow to completely unusable, until all those statistics are regenerated on the new version via the ANALYZE command. How long this can take depends on a number of factors, but in general, the larger your database, the longer it will take - a database-wide analyze can take hours on very large databases. As mentioned above, upgrading via Bucardo relies on COPYing the data to a fresh copy of the table. Although the statistics also need to be created when using Bucardo, the time cost for this does NOT apply to the upgrade time, as it can be done any time earlier, making the effective cost of generating statistics zero.

Upgrading RDS the Amazon way

Having said all that, the native upgrade system for RDS is very simple and fast. If the drawbacks above do not apply to you - or can be suffered with minimal business pain - then this way should always be the upgrade approach to use. Here is a quick walk through of how an Amazon RDS upgrade is done.

For this example, we will create a new Amazon RDS instance. The creation is amazingly simple: just log into aws.amazon.com, choose RDS, choose PostgreSQL (always the best choice!), and then fill in a few details, such as preferred version, server size, etc. The "DB Engine Version" was set as PostgreSQL 9.3.16-R1", the "DB Instance Class" as db.t2.small -- 1 vCPU, 2 GiB RAM, and "Multi-AZ Deployment" as no. All other choices are the default. To finish up this section of the setup, "DB Instance Identifier" was set to gregtest, the"Master Username" to greg, and the "Master Password" to b5fc93f818a3a8065c3b25b5e45fec19

Clicking on "Next Step" brings up more options, but the only one that needs to change is to specify the "Database Name" as gtest. Finally, the "Launch DB Instance" button. The new database is on the way! Select "View your DB Instance" and then keep reloading until the "Status" changes to Active.

Once the instance is running, you will be shown a connection string that looks like this: gregtest.zqsvirfhzvg.us-east-1.rds.amazonaws.com:5432. That standard port is not a problem, but who wants to ever type that hostname out, or even have to look at it? The pg_service.conf file comes to the rescue with this new entry inside the ~/.pg_service.conf file:

[gtest]
host=gregtest.zqsvirfhzvg.us-east-1.rds.amazonaws.com
port=5432
dbname=gtest
user=greg
password=b5fc93f818a3a8065c3b25b5e45fec19
connect_timeout=10

Now we run a quick test to make sure psql is able to connect, and that the database is an Amazon RDS database:

$ psql service=gtest -Atc "show rds.superuser_variables"
session_replication_role

We want to use the pgbench program to add a little content to the database, just to give the upgrade process something to do. Unfortunately, we cannot simply feed the "service=gtest" line to the pgbench program, but a little environment variable craftiness gets the job done:

$ unset PGSERVICEFILE PGSERVICE PGHOST PGPORT PGUSER PGDATABASE
$ export PGSERVICEFILE=/home/greg/.pg_service.conf PGSERVICE=gtest
$ pgbench -i -s 4
NOTICE:  table "pgbench_history" does not exist, skipping
NOTICE:  table "pgbench_tellers" does not exist, skipping
NOTICE:  table "pgbench_accounts" does not exist, skipping
NOTICE:  table "pgbench_branches" does not exist, skipping
creating tables...
100000 of 400000 tuples (25%) done (elapsed 0.66 s, remaining 0.72 s)
200000 of 400000 tuples (50%) done (elapsed 1.69 s, remaining 0.78 s)
300000 of 400000 tuples (75%) done (elapsed 4.83 s, remaining 0.68 s)
400000 of 400000 tuples (100%) done (elapsed 7.84 s, remaining 0.00 s)
vacuum...
set primary keys...
done.

At 68MB in size, this is still not a big database - so let's create a large table, then create a bunch of databases, to make pg_upgrade work a little harder:

## Make the whole database 1707 MB:
$ psql service=gtest -c "CREATE TABLE extra AS SELECT * FROM pgbench_accounts"
SELECT 400000
$ for i in {1..5}; do psql service=gtest -qc "INSERT INTO extra SELECT * FROM extra"; done

## Make the whole cluster about 17 GB:
$ for i in {1..9}; do psql service=gtest -qc "CREATE DATABASE gtest$i TEMPLATE gtest" ; done
$ psql service=gtest -c "SELECT pg_size_pretty(sum(pg_database_size(oid))) FROM pg_database WHERE datname ~ 'gtest'"
17 GB

To start the upgrade, we log into the AWS console, and choose "Instance Actions", then "Modify". Our only choices for instances are"9.4.9" and "9.4.11", plus some older revisions in the 9.3 branch. Why anything other than the latest revision in the next major branch (i.e. 9.4.11) is shown, I have no idea! Choose 9.4.11, scroll down to the bottom, choose "Apply Immediately", then "Continue", then "Modify DB Instance". The upgrade has begun!

How long will it take? All one can do is keep refreshing to see when your new database is ready. As mentioned above, 7 minutes and 30 seconds is the total time. The logs show how things break down:

11:52:43 DB instance shutdown
11:55:06 Backing up DB instance
11:56:12 DB instance shutdown
11:58:42 The parameter max_wal_senders was set to a value incompatible with replication. It has been adjusted from 5 to 10.
11:59:56 DB instance restarted
12:00:18 Updated to use DBParameterGroup default.postgres9.4

How much of that time is spent on upgrading though? Surprisingly little. We can do a quick local test to see how long the same database takes to upgrade from 9.3 to 9.4 using pg_upgrade --links: 20 seconds! Ideally Amazon will improve upon the total downtime at some point.

Upgrading RDS with Bucardo

As an asynchronous, trigger-based replication system, Bucardo is perfect for situations like this where you need to temporarily sync up two concurrent versions of Postgres. The basic process is to create a new Amazon RDS instance of your new Postgres version (e.g. 9.6), install the Bucardo program on a cheap EC2 box, and then have Bucardo replicate from the old Postgres version (e.g. 9.3) to the new one. Once both instances are in sync, just point your application to the new version and shut the old one down. One way to perform the upgrade is detailed below.

Some of the steps are simplified, but the overall process is intact. First, find a temporary box for Bucardo to run on. It doesn't have to be powerful, or have much disk space, but as network connectivity is important, using an EC2 box is recommended. Install Postgres (9.6 or better, because of pg_dump) and Bucardo (latest or HEAD recommended), then put your old and new RDS databases into your pg_service.conf file as "rds93" and "rds96" to keep things simple.

The next step is to make a copy of the database on the new Postgres 9.6 RDS database. We want the bare minimum schema here: no data, no triggers, no indexes, etc. Luckily, this is simple using pg_dump:

$ pg_dump service=rds93 --section=pre-data | psql -q service=rds96

From this point forward, no DDL should be run on the old server. We take a snapshot of the post-data items right away and save it to a file for later:

$ pg_dump service=rds93 --section=post-data -f rds.postdata.pg

Time to get Bucardo ready. Recall that Bucardo can only replicate tables that have a primary key or unique index. But if those tables are small enough, you can simply copy them over at the final point of migration later.

$ bucardo install
$ bucardo add db A dbservice=rds93
$ bucardo add db B dbservice=rds96
## Create a sync and name it 'migrate_rds':
$ bucardo add sync migrate_rds tables=all dbs=A,B

That's it! The current database will now have triggers that are recording any changes made, so we may safely do a bulk copy to the new database. This step might take a very long time, but that's not a problem.

$ pg_dump service=rds93 --section=data | psql -q service=rds96

Before we create the indexes on the new server, we start the Bucardo sync to copy over any rows that were changed while the pg_dump was going on. After that, the indexes, primary keys, and other items can be created:

$ bucardo start
$ tail -f log.bucardo ## Wait until the sync finishes once
$ bucardo stop
$ psql service=rds96 -q -f rds.postdata.pg

For the final migration, we simply stop anything from writing to the 9.3 database, have Bucardo perform a final sync of any changed rows, and then point your application to the 9.6 database. The whole process can happen very quickly: well under a minute for most cases.

Upgrading major Postgres versions is never a trivial task, but both Bucardo and pg_upgrade allow it to be orders of magnitude faster and easier than the old method of using the pg_dump utility. Upgrading your Amazon AWS Postgres instance is fast and easy using the AWS pg_upgrade method, but it has limitations, so having Bucardo help out can be a very useful option.

↧

Successful First GEOINT Symposium for End Point Liquid Galaxy

November 11, 2017, 12:37 pm

≫ Next: Liquid Galaxy at The Ocean Conference

≪ Previous: Amazon AWS upgrades to Postgres with Bucardo

This past week, End Point attended GEOINT Symposium to showcase Liquid Galaxy as an immersive panoramic GIS solution to GEOINT attendees and exhibitors alike.

At the show, we showcased Cesium integrating with ArcGIS and WMS, Google Earth, Street View, Sketchfab, Unity, and panoramic video. Using our Content Management System, we created content around these various features so visitors to our booth could take in the full spectrum of capabilities that the Liquid Galaxy provides.

Additionally, we were able to take data feeds for multiple other booths and display their content during the show! Our work served to show everyone at the conference that the Liquid Galaxy is a data-agnostic immersive platform that can handle any sort of data stream and offer data in a brilliant display. This can be used to show your large complex data sets in briefing rooms, conference rooms, or command centers.

Given the incredible draw of the Liquid Galaxy, the GEOINT team took special interest in our system and formally interviewed Ben Goldstein in front of the system to learn more! You can view the video of the interview here:

We look forward to developing the relationships we created at GEOINT, and hope to participate further in this great community moving forward. If you would like to learn more please visit our website or email ask@endpoint.com.

↧

Liquid Galaxy at The Ocean Conference

November 11, 2017, 12:37 pm

≫ Next: Zero Pricing in Interchange using CommonAdjust

≪ Previous: Successful First GEOINT Symposium for End Point Liquid Galaxy

End Point had the privilege of participating in The Ocean Conference at the United Nations, hosted by the International Union for Conservation of Nature (IUCN), these past two weeks. The health of the oceans is critical, and The Ocean Conference, the first United Nations conference on this issue, presents a unique and invaluable opportunity for the world to reverse the precipitous decline of the health of the oceans and seas with concrete solutions.

A Liquid Galaxy was set up in a prominent area on the main floor of the United Nations. End Point created custom content for the Ocean Conference, using the Liquid Galaxy’s Content Management System. Visiting diplomats and government officials were able to experience this content - Liquid Galaxy’s interactive panoramic setup allows visitors to feel immersed in the different locations, with video and information spanning their periphery.

Liquid Galaxy content created for The Ocean Conference included:

-A study of the Catlin Seaview Survey and how the world's coral reefs are changing

-360 panoramic underwater videos

-All Mission Blue Ocean Hope Spots

-A guided tour of the Monaco Explorations 3 Year Expedition

-National Marine Sanctuaries around the United States

We were grateful to be able to create content for such a good cause, and hope to be able to do more good work for the IUCN and the UN! If you’d like to learn more, please visit our website or email ask@endpoint.com.

From left to right: Dan Thomas, Susan Thomas, Sylvia Earle, Jenifer Austin, David Koch, Trammel Crowe.

↧

Zero Pricing in Interchange using CommonAdjust

November 11, 2017, 3:35 pm

≫ Next: Liquid Galaxy at 2017 BOMA International Annual Conference & Expo

≪ Previous: Liquid Galaxy at The Ocean Conference

Product pricing can be quite complex. A typical Interchange catalog will have at least one table in the ProductFiles directive (often products plus either options or variants) and those tables will often have one or more pricing fields (usually price and sales_price). But usually a single, static price isn't sufficient for more complex needs, such as accessory adjustments, quantity pricing, product grouping--not to mention promotions, sales, or other conditional features that may change a product's price for a given situation, dependent on the user's account or session.

Typically to handle these variety of pricing possibilities, a catalog developer will implement a CommonAdjust algorithm. CommonAdjust can accommodate all the above pricing adjustments and more, and is a powerful tool (yet can become quite arcane when reaching deeper complexity). CommonAdjust is enabled by setting the PriceField directive to a non-existent field value in the tables specified in ProductFiles.

To give an adequate introduction and treatise on CommonAdjust would be at a minimum its own post, and likely a series. There are many elements that make up a CommonAdjust string, and subtle operator nuances that instruct it to operate in differing patterns. It is even possible for elements themselves to return new CommonAdjust atoms (a feature we will be leveraging in this discussion). So I will assume for this writing that the reader is familiar generally with CommonAdjust and we will implement a very simple example to demonstrate henceforth.

To start, let's create a CommonAdjust string that simply replaces the typical PriceField setting, and we'll allow it to accommodate a static sales price:

ProductFiles products
PriceField 0
CommonAdjust :sale_price ;:price

The above, in words, indicates that our products live in the products table, and we want CommonAdjust to handle our pricing by setting PriceField to a non-existent field (0 is a safe bet not to be a valid field in the products table). Our CommonAdjust string is comprised of two atoms, both of which are settors of type database lookup. In the products table, we have 2 fields: sale_price and price. If sale_price is "set" (meaning a non-zero numeric value or another CommonAdjust atom) it will be used as it comes first in the list. The semicolon indicates to Interchange "if a previous atom set a price by this point, we're done with this iteration" and, thus, the price field will be ignored. Otherwise, the next atom is checked (the price field), and as long as the price field is set, it will be used instead.

A few comments here:

The bare colon indicates that the field is not restricted to a particular table. Typically, to specify the field, you would have a value like "products:price" or "variants:price". But part of the power of ProductFiles holding products in different tables is you can pick up a sku from any of them. And at that point, you don't know whether you're looking at a sku from products, variants, or as many additional tables as you'd like to grab products from. But if all of them have a price and sales_price field, then you can pick up the pricing from any of them by leaving the table off. You can think of ":price" as "*:price" where asterisk is "table this sku came from".
The only indicator that CommonAdjust recognizes as a terminal value is a non-zero numeric value. The proposed price is coerced to numeric, added on to the accumulated price effects of the rest of the CommonAdjust string (if applicable), and the final value is tested for truth. If it is false (empty, undef, or 0) then the process repeats.
What happens if none of the atoms produce a non-zero numeric value? If Interchange reaches the end of the original CommonAdjust string without hitting a set atom, it will relent and return a zero cost.

At this point, we finally introduce our situation, and one that is not at all uncommon. What if I want a zero price? Let's say I have a promotion for buy one product, get this other product for free. Typically, a developer would be able to expect to override the prices from the database optionally by leveraging the "mv_price" parameter in the cart. So, let's adjust our CommonAdjust to accommodate that:

CommonAdjust $ ;:sale_price ;:price

The $ settor in the first atom means "look in the line-item hash for the mv_price parameter and use that, if it's set". But as we've discussed above, we "set" an atom by making it a non-zero numeric value or another CommonAdjust atom. So if we set mv_price to 0, we've gained nothing. CommonAdjust will move on to the next atom (sale_price database settor) and pick up that product's pricing from the database. And even if we set that product's sale_price and price fields to 0, it means everyone purchasing that item would get it for free (not just our promotion that allows the item to be free with the specific purchase of another item).

In the specific case of using the $ settor in CommonAdjust, we can set mv_price to the keyword "free", and that will allow us to price the item for 0. But this restricts us to only be able to use $ and mv_price to have a free item. What if the price comes from a complex calculation, out of a usertag settor? Or out of a calc block settor? The special "free" keyword doesn't work there.

Fortunately, there is a rarely used CommonAdjust settor that will allow for a 0 price item in a general solution. As I mentioned above, CommonAdjust calculations can themselves return other CommonAdjust atoms, which will then be operated on in a subsequent iteration. This frees us from just the special handling that works on $ and mv_price as such an atom can be returned from any of the CommonAdjust atoms and work.

The settor of interest is >>, and according to what documentation there is on it, it was never even intended to be used as a pricing settor! Rather, it was to be a way of redirecting to additional modes for shipping or tax calculations, which can also leverage CommonAdjust for their particular purposes. However, the key to its usefulness here is thus: it does not perform any test on the value tied to it. It is set, untested, into the final result of this call to the chain_cost() routine and returned. And with no test, the fact that it's Perly false as numeric 0 is irrelevant.

So building on our current CommonAdjust, let's leverage >> to allow our companion product to have a zero cost (assuming it is the 2nd line item in the cart):

[calcn]
    $Items->[1]{mv_price} = '>>0';
    return;
[/calcn]

Now what happens is, $ in the first atom picks up the value out of mv_price and, because it's a CommonAdjust atom, is processed in a second iteration. But this CommonAdjust atom is very simple: take the value tied to >> and return it, untested.

Perhaps our pricing is more complex than we can (or would like to) support with using $. So we want to write a usertag, where we have the full power of global Perl at our disposal, but we still have circumstances where that usertag may need to return zero-cost items. Using the built-in "free" solution, we're stuck, short of setting mv_price in the item hash within the usertag, which we may not want to do for a variety of reasons. But using >>, we have no such restriction. So let's change CommonAdjust:

CommonAdjust $ ;[my-special-pricing] ;:sale_price ;:price

Now instead of setting mv_price in the item, let's construct [my-special-pricing] to do some heavy lifting:

UserTag my-special-pricing Routine <<EOR
sub {
    # A bunch of conditional, complicated code, but then ...
    elsif (buy_one_get_one_test($item)) {
        # This is where we know this normally priced item is supposed to be
        # free because of our promotion. Excellent!

        return '>>0';
    }
    # remaining code we don't care about for this discussion
}
EOR

Now we haven't slapped a zero cost onto the line item in a sticky fashion, like we do by setting mv_price. So presumably, above, if the user gets sneaky and removes the "buy one" sku identified by our promotion, our equally clever buy_one_get_one_test() sniffs it out, and the 0 price is no longer in effect.

For more information on CommonAdjust, see the Custom Pricing section of 'price' glossary entry. And for more examples of leveraging CommonAdjust for quantity and attribute pricing adjustments, see the Examples section of the CommonAdjust document entry.

↧

Liquid Galaxy at 2017 BOMA International Annual Conference & Expo

November 11, 2017, 3:35 pm

≫ Next: Postgres migrating SQL_ASCII to UTF-8 with fix_latin

≪ Previous: Zero Pricing in Interchange using CommonAdjust

We just returned from Nashville after bringing Liquid Galaxy to the 2017 BOMA International Annual Conference & Expo. Commercial real estate has always been a seamless fit for Liquid Galaxy due to the system's ability to showcase real estate and property data in an interactive and panoramic setting. Within real estate, Liquid Galaxy was first used by CBRE and has since been adopted by Hilton, JLL, and Prologis to name a few.

In preparation for BOMA (Building Owners and Managers Association), we prepared sample commercial real estate content on our content management system to be displayed on the Liquid Galaxy. This included the creation of content about Hudson Yards, the new development being built in lower Manhattan.

The content that was created demonstrates how brokers and directors at commercial real estate companies can tell an immersive, panoramic, and interactive story to their potential clients and investors. Future developments can be shown in developing areas, and with the content management system you can create stories that include videos, images, and 3D models of these future developments. The following video demonstrates this well:

We were able to effectively showcase our ability to incorporate 3D models and mapping layers at BOMA through the use of Google Earth, Cesium, ArcGIS, Unity, and Sketchfab. We were also able to pull data and develop content for neighboring booths and visitors, demonstrating what an easy and data-agnostic platform Liquid Galaxy can be.

We’re very excited about the increasing traction we have in the real estate industry, and hope our involvement with BOMA will take that to the next level. If you’d like to learn more about Liquid Galaxy, please visit our website or email ask@endpoint.com.

↧

Postgres migrating SQL_ASCII to UTF-8 with fix_latin

November 11, 2017, 3:35 pm

≫ Next: co:collective Doable Innovation Software

≪ Previous: Liquid Galaxy at 2017 BOMA International Annual Conference & Expo

(photograph by NOAA National Ocean Service)

Upgrading Postgres is not quite as painful as it used to be, thanks primarily to the pg_upgrade program, but there are times when it simply cannot be used. We recently had an existing End Point client come to us requesting help upgrading from their current Postgres database (version 9.2) to the latest version (9.6 - but soon to be 10). They also wanted to finally move away from their SQL_ASCII encoding to UTF-8. As this meant that pg_upgrade could not be used, we also took the opportunity to enable checksums as well (this change cannot be done via pg_upgrade). Finally, they were moving their database server to new hardware. There were many lessons learned and bumps along the way for this migration, but for this post I'd like to focus on one of the most vexing problems, the database encoding.

When a Postgres database is created, it is set to a specific encoding. The most common one (and the default) is "UTF8". This covers 99% of all user's needs. The second most common one is the poorly-named "SQL_ASCII" encoding, which should be named"DANGER_DO_NOT_USE_THIS_ENCODING", because it causes nothing but trouble. The SQL_ASCII encoding basically means no encoding at all, and simply stores any bytes you throw at it. This usually means the database ends up containing a whole mess of different encodings, creating a "byte soup" that will be difficult to sanitize by moving to a real encoding (i.e. UTF-8).

Many tools exist which convert text from one encoding to another. One of the most popular ones on Unix boxes is "iconv". Although this program works great if your source text is using one encoding, it fails when it encounters byte soup.

For this migration, we first did a pg_dump from the old database to a newly created UTF-8 test database, just to see which tables had encoding problems. Quite a few did - but not all of them! - so we wrote a script to import tables in parallel, with some filtering for the problem ones. As mentioned above, iconv was not particularly helpful: looking at the tables closely showed evidence of many different encodings in each one: Windows-1252, ISO-8859-1, Japanese, Greek, and many others. There were even large bits that were plainly binary data (e.g. images) that simply got shoved into a text field somehow. This is the big problem with SQL_ASCII: it accepts everything, and does no validation whatsoever. The iconv program simply could not handle these tables, even when adding the //IGNORE option.

To better explain the problem and the solution, let's create a small text file with a jumble of encodings. Discussions of how UTF-8 represents characters, and its interactions with Unicode, are avoided here, as Unicode is a dense, complex subject, and this article is dry enough already. :)

First, we want to add some items using the encodings 'Windows-1252' and 'Latin-1'. These encoding systems were attempts to extend the basic ASCII character set to include more characters. As these encodings pre-date the invention of UTF-8, they do it in a very inelegant (and incompatible) way. Use of the "echo" command is a great way to add arbitrary bytes to a file as it allows direct hex input:

$ echo -e "[Windows-1252]   Euro: \x80   Double dagger: \x87" > sample.data
$ echo -e "[Latin-1]   Yen: \xa5   Half: \xbd" >> sample.data
$ echo -e "[Japanese]   Ship: \xe8\x88\xb9" >> sample.data
$ echo -e "[Invalid UTF-8]  Blob: \xf4\xa5\xa3\xa5" >> sample.data

This file looks ugly. Notice all the "wrong" characters when we simply view the file directly:

$ cat sample.data
[Windows-1252]   Euro: �   Double dagger: �
[Latin-1]   Yen: �   Half: �
[Japanese]   Ship: 船
[Invalid UTF-8]  Blob: ����

Running iconv is of little help:

## With no source encoding given, it errors on the Euro:
$ iconv -t utf8 sample.data >/dev/null
iconv: illegal input sequence at position 23

## We can tell it to ignore those errors, but it still barfs on the blob:
$ iconv -t utf8//ignore sample.data >/dev/null
iconv: illegal input sequence at position 123

## Telling it the source is Window-1252 fixes some things, but still sinks the Ship:
$ iconv -f windows-1252 -t utf8//ignore sample.data
[Windows-1252]   Euro: €   Double dagger: ‡
[Latin-1]   Yen: ¥   Half: ½
[Japanese]   Ship: èˆ¹
[Invalid UTF-8]  Blob: ô¥£¥

After testing a few other tools, we discovered the nifty Encoding::FixLatin, a Perl module which provides a command-line program called "fix_latin". Rather than being authoritative like iconv, it tries its best to fix things up with educated guesses. Its documentation gives a good summary:

  The script acts as a filter, taking source data which may contain a mix of
  ASCII, UTF8, ISO8859-1 and CP1252 characters, and producing output will be
  all ASCII/UTF8.

  Multi-byte UTF8 characters will be passed through unchanged (although
  over-long UTF8 byte sequences will be converted to the shortest normal
  form). Single byte characters will be converted as follows:

    0x00 - 0x7F   ASCII - passed through unchanged
    0x80 - 0x9F   Converted to UTF8 using CP1252 mappings
    0xA0 - 0xFF   Converted to UTF8 using Latin-1 mappings

While this works great for fixing the Windows-1252 and Latin-1 problems (and thus accounted for at least 95% of our table's bad encodings), it still allows"invalid" UTF-8 to pass on through. Which means that Postgres will still refuse to accept it. Let's check our test file:


$ fix_latin sample.data
[Windows-1252]   Euro: €   Double dagger: ‡
[Latin-1]   Yen: ¥   Half: ½
[Japanese]   Ship: 船
[Invalid UTF-8]  Blob: ����

## Postgres will refuse to import that last part:
$ echo "SELECT E'"  "$(fix_latin sample.data)"  "';" | psql
ERROR:  invalid byte sequence for encoding "UTF8": 0xf4 0xa5 0xa3 0xa5

## Even adding iconv is of no help:
$ echo "SELECT E'"  "$(fix_latin sample.data | iconv -t utf-8)"  "';" | psql
ERROR:  invalid byte sequence for encoding "UTF8": 0xf4 0xa5 0xa3 0xa5

The UTF-8 specification is rather dense and puts many requirements on encoders and decoders. How well programs implement these requirements (and optional bits) varies, of course. But at the end of the day, we needed that data to go into a UTF-8 encoded Postgres database without complaint. When in doubt, go to the source! The relevant file in the Postgres source code responsible for rejecting bad UTF-8 (as in the examples above) is src/backend/utils/mb/wchar.c Analyzing that file shows a small but elegant piece of code whose job is to ensure only "legal" UTF-8 is accepted:

bool
pg_utf8_islegal(const unsigned char *source, int length)
{
  unsigned char a;

  switch (length)
  {
    default:
      /* reject lengths 5 and 6 for now */
      return false;
    case 4:
      a = source[3];
      if (a < 0x80 || a > 0xBF)
        return false;
      /* FALL THRU */
    case 3:
      a = source[2];
      if (a < 0x80 || a > 0xBF)
        return false;
      /* FALL THRU */
    case 2:
      a = source[1];
      switch (*source)
      {
        case 0xE0:
          if (a < 0xA0 || a > 0xBF)
            return false;
          break;
        case 0xED:
          if (a < 0x80 || a > 0x9F)
            return false;
          break;
        case 0xF0:
          if (a < 0x90 || a > 0xBF)
            return false;
          break;
        case 0xF4:
          if (a < 0x80 || a > 0x8F)
            return false;
          break;
        default:
          if (a < 0x80 || a > 0xBF)
            return false;
          break;
      }
      /* FALL THRU */
    case 1:
      a = *source;
      if (a >= 0x80 && a < 0xC2)
        return false;
      if (a > 0xF4)
        return false;
      break;
  }
  return true;
}

Now that we know the UTF-8 rules for Postgres, how do we ensure our data follows it? While we could have made another standalone filter to run after fix_latin, that would increase the migration time. So I made a quick patch to the fix_latin program itself, rewriting that C logic in Perl. A new option "--strict-utf8" was added. Its job is to simply enforce the rules found in the Postgres source code. If a character is invalid, it is replaced with a question mark (there are other choices for a replacement character, but we decided simple question marks were quick and easy - and the surrounding data was unlikely to be read or even used anyway).

Voila! All of the data was now going into Postgres without a problem. Observe:

$ echo "SELECT E'"  "$(fix_latin  sample.data)"  "';" | psql
ERROR:  invalid byte sequence for encoding "UTF8": 0xf4 0xa5 0xa3 0xa5

$ echo "SELECT E'"  "$(fix_latin --strict-utf8 sample.data)"  "';" | psql
                   ?column?
----------------------------------------------
  [Windows-1252]   Euro: €   Double dagger: ‡+
 [Latin-1]   Yen: ¥   Half: ½                +
 [Japanese]   Ship: 船                       +
 [Invalid UTF-8]  Blob: ????
(1 row)

What are the lessons here? First and foremost, never use SQL_ASCII. It's outdated, dangerous, and will cause much pain down the road. Second, there are an amazing number of client encodings in use, especially for old data, but the world has pretty much standardized on UTF-8 these days, so even if you are stuck with SQL_ASCII, the amount of Windows-1252 and other monstrosities will be small. Third, don't be afraid to go to the source. If Postgres is rejecting your data, it's probably for a very good reason, so find out exactly why. There were other challenges to overcome in this migration, but the encoding was certainly one of the most interesting ones. Everyone, the client and us, is very happy to finally have everything using UTF-8!

↧

co:collective Doable Innovation Software

November 11, 2017, 12:37 pm

≫ Next: JBoss 4/5/6 to Wildfly migration tips

≪ Previous: Postgres migrating SQL_ASCII to UTF-8 with fix_latin

co:collective is a growth accelerator that works with leadership teams to conceive and execute innovation in the customer experience using a proprietary methodology called StoryDoing.

Doable, one of co:collective’s recent innovations, is a cloud-based platform designed to empower employees to meaningfully contribute and collaborate on ideas that move their business forward. The tool allows companies to solicit ideas from employees at all levels of an organization, filter down those ideas, make decisions as a team, and then implement a project — all the while collaborating in a fun and easy-to-use application. Over 200 companies across multiple sectors use Doable to create new products, new features, and problem solve to keep their business growing.

co:collective engaged End Point’s front-end developer Kamil Ciemniewski to work with their in-house development team led by Tommy Dunn. Kamil joined the Doable effort to refactor the Doable application which was moving from a classic Ruby on Rails-based application to an Angular frontend application with a Ruby on Rails backend. Kamil has been working with Doable since March to complete the application re-write and the project has gone immensely well. Kamil brings extensive frontend and backend knowledge to co:collective and together they’ve been able to refactor their site to be more efficient and powerful than before.

↧

JBoss 4/5/6 to Wildfly migration tips

November 11, 2017, 12:37 pm

≫ Next: How to check process duration in Linux with the "ps" command

≪ Previous: co:collective Doable Innovation Software

Introduction

Recently, we have taken over a big Java project that ran on the old JBoss 4 stack. As we know how dangerous for a business is outdated software, we and our client agreed that the most important task is to upgrade the server stack to the latest WildFly version.

It’s definitely not an easy job, but it’s worth to invest to sleep well and don’t worry about software problems.

This time it was even more work because of a complicated and not documented application, that’s why I wanted to share some tips and problem resolutions for issues I encountered.

Server configuration

You can set it up using multiple configuration files in the standalone/configuration directory.

I can recommend to use the standalone-full.xml file for most of setup, it contains a default full stack as opposed to standalone.xml.

You can also set up an application specific configuration using various configuration XML files (https://docs.jboss.org/author/display/WFLY10/Deployment+Descriptors+used+In+WildFly). Remember to keep the appliction specific configuration in the Classpath.

Quartz as a message queue

The Quartz library was used as a native message queue in previous JBoss versions. If you struggle and try to use its resource adapter with WildFly just skip it. It’s definitely too much work, even if it’s possible.

In the latest WildFly version (as of today 10.1) the default message queue library is ActiveMQ. It has almost the same API as the old Quartz has, so it’s easy to use it.

Quartz as a job scheduler

We had multiple cron-like jobs to migrate as well. All the jobs used Quartz to schedule runs.

The best solution here is to update Quartz to the latest version (yes!) and use a new API (http://www.quartz-scheduler.org/documentation/quartz-2.2.x/tutorials/tutorial-lesson-06.html) to create CronTriggers for the jobs.

trigger = newTrigger()
    .withIdentity("trigger3", "group1")
    .withSchedule(cronSchedule("0 42 10 * * ?"))
    .forJob(myJobKey)
    .build();

You can use the same cron syntax (e.g 0 42 10 * * ?) as in the 12 years old Quartz version. Yes!

JAR dependencies

In WildFly you can set up an internal module for each JAR dependency. It can be pretty time consuming to create declarations for more than 100 libraries (exactly 104 in our case). We decided to use Maven to handle dependencies of our application and skip declaring modules in WildFly. Why? In our opinion it’s better to encapsulate everything in an EAR file and keep WildFly configuration minimal as we won’t use our server for any other application in the future.

Just keep your dependencies in the Classpath and you will be fine.

JBoss CLI

I really prefer the bin/jboss-cli.sh interface to the web interface to handle deployments. It’s a powerful tool and much faster to work with than clicking through the UI.

JNDI path

If you can’t access your JNDI definition try to use the global namespace. Up to JAVA EE6 developers defined their own JNDI names. These names had a global scope. This doesn’t work anymore. To access a previously globally scoped name use this pattern: java:global/OLD_JNDI_NAME.

The java:global namespace was introduced in JAVA EE 6.

Reverse proxy

To configure a WildFly application with a reverse proxy you need to, of course, set up a virtual host with a reverse proxy declaration.

In addition, you must add an attribute to the server’s http-listener in the standalone-full.xml file. The attribute is proxy-address-forwarding and must be set to true.

Here is an example declaration:

<subsystem xmlns="urn:jboss:domain:undertow:3.1"><buffer-cache name="default"><server name="default-server"><http-listener enable-http2="true" name="default" proxy-address-forwarding="true" redirect-socket="https" socket-binding="http"></http-listener></server></buffer-cache></subsystem>

Summary

If you consider to upgrade to WildFly I can recommend it, it’s much faster than JBoss 4/5/6, scalable and fully prepared for modern applications.

↧

How to check process duration in Linux with the "ps" command

November 11, 2017, 12:37 pm

≫ Next: How to split Git repositories into two

≪ Previous: JBoss 4/5/6 to Wildfly migration tips

In certain cases we might want to get a certain process' elapsed time for our own reason. Turns out "ps" command could easily assist us in that. According to "ps" manual, etime could put the duration of time in [[DD-]hh:]mm:ss. format, while etimes in seconds.

From "ps" manpage:

etime       ELAPSED   elapsed time since the process was started, in the form [[DD-]hh:]mm:ss.
etimes      ELAPSED   elapsed time since the process was started, in seconds.

To use that, we could use (in [[DD-]hh:]mm:ss. format):

ps -p "pid" -o etime

or in seconds:

ps -p "pid" -o etimes

In this case the "pid" should be replaced with your intended process ID.

The following will help to nicely reporting the output. We can put -o etime or -o etimes with other argument, that is "command", in order to show the executed command along with its very own absolute path:

ps -p "28590" -o etime,command

ELAPSED COMMAND
21:45 /usr/bin/perl ./fastcgi-wrapper.pl 7999

We can also get the start date of the process' execution:

najmi@ubuntu-ampang:~$ ps -p 21745 -o etime,command,start
    ELAPSED COMMAND                      STARTED
 1-19:47:45 /usr/lib/firefox/firefox      Aug 02

What if we do not want to manually parsing the PID, instead (since we are very sure) to just get the name of the running application? We could just simply use pgrep or pidof

najmi@ubuntu-ampang:~$ ps -p $(pgrep firefox) -o pid,cmd,start,etime
  PID CMD                          STARTED     ELAPSED
21745 /usr/lib/firefox/firefox      Aug 02  2-04:29:36

najmi@ubuntu-ampang:~$ ps -p $(pidof firefox) -o pid,cmd,start,etime
  PID CMD                          STARTED     ELAPSED
21745 /usr/lib/firefox/firefox      Aug 02  2-04:29:42

What if the command issued many processes? Take an example of the Chrome browser:

najmi@ubuntu-ampang:~$ ps -p $(pidof chrome) -o pid,comm,cmd,start,etime
error: process ID list syntax error

Usage:
 ps [options]

 Try 'ps --help <simple|list|output|threads|misc|all>'
  or 'ps --help <s|l|o|t|m|a>'
 for additional help text.

For more details see ps(1).
</s|l|o|t|m|a></simple|list|output|threads|misc|all>

The best way (so far) that I could get is by creating a loop. It seems pidof is much more accurate when parsing the exact application (string) that we feed into it.

With pgrep:

najmi@ubuntu-ampang:~$ for i in `pgrep chrome`;do ps -p $i -o pid,comm,cmd,start,etime|tail -n +2;done
 2255 chrome          /opt/google/chrome/chrome - 08:05:43    02:55:17
 4990 chrome          /opt/google/chrome/chrome - 10:39:16       21:44
 5567 chrome          /opt/google/chrome/chrome - 10:53:13       07:47
 9448 chrome          /opt/google/chrome/chrome -   Jul 31  3-12:25:08
10033 chrome          /opt/google/chrome/chrome     Jul 27  8-10:43:42
10044 chrome          /opt/google/chrome/chrome -   Jul 27  8-10:43:42
10050 chrome          /opt/google/chrome/chrome -   Jul 27  8-10:43:42
10187 chrome          /opt/google/chrome/chrome -   Jul 27  8-10:43:39
10234 chrome          /opt/google/chrome/chrome -   Jul 27  8-10:43:37
10236 chrome          /opt/google/chrome/chrome -   Jul 27  8-10:43:37
19440 chrome          /opt/google/chrome/chrome - 22:30:34    12:30:26
20229 chrome          /opt/google/chrome/chrome -   Aug 03  1-09:51:25
20514 chrome          /opt/google/chrome/chrome - 22:52:25    12:08:35
20547 chrome          /opt/google/chrome/chrome - 22:52:36    12:08:24
21009 chrome          /opt/google/chrome/chrome -   Aug 03  1-09:27:11
22458 chrome          /opt/google/chrome/chrome -   Jul 27  8-03:44:07
22474 chrome-gnome-sh /usr/bin/python3 /usr/bin/c   Jul 27  8-03:44:07
23681 chrome          /opt/google/chrome/chrome -   Aug 03  1-03:33:45
23691 chrome          /opt/google/chrome/chrome -   Aug 03  1-03:33:45
23870 chrome          /opt/google/chrome/chrome - 00:15:15    10:45:45
24544 chrome          /opt/google/chrome/chrome - 00:40:17    10:20:43
25116 chrome          /opt/google/chrome/chrome - 00:51:31    10:09:29
25466 chrome          /opt/google/chrome/chrome - 00:59:55    10:01:05
29060 chrome          /opt/google/chrome/chrome - 02:15:42    08:45:18

With pidof:

najmi@ubuntu-ampang:~$ for i in `pidof chrome`;do ps -p $i -o pid,comm,cmd,start,etime|tail -n +2;done
29060 chrome          /opt/google/chrome/chrome - 02:15:42    08:47:40
25466 chrome          /opt/google/chrome/chrome - 00:59:55    10:03:27
25116 chrome          /opt/google/chrome/chrome - 00:51:31    10:11:51
24544 chrome          /opt/google/chrome/chrome - 00:40:17    10:23:05
23870 chrome          /opt/google/chrome/chrome - 00:15:15    10:48:07
23691 chrome          /opt/google/chrome/chrome -   Aug 03  1-03:36:07
23681 chrome          /opt/google/chrome/chrome -   Aug 03  1-03:36:07
22458 chrome          /opt/google/chrome/chrome -   Jul 27  8-03:46:29
21009 chrome          /opt/google/chrome/chrome -   Aug 03  1-09:29:33
20547 chrome          /opt/google/chrome/chrome - 22:52:36    12:10:46
20514 chrome          /opt/google/chrome/chrome - 22:52:25    12:10:57
20229 chrome          /opt/google/chrome/chrome -   Aug 03  1-09:53:47
19440 chrome          /opt/google/chrome/chrome - 22:30:34    12:32:48
10236 chrome          /opt/google/chrome/chrome -   Jul 27  8-10:45:59
10234 chrome          /opt/google/chrome/chrome -   Jul 27  8-10:45:59
10187 chrome          /opt/google/chrome/chrome -   Jul 27  8-10:46:01
10050 chrome          /opt/google/chrome/chrome -   Jul 27  8-10:46:04
10044 chrome          /opt/google/chrome/chrome -   Jul 27  8-10:46:04
10033 chrome          /opt/google/chrome/chrome     Jul 27  8-10:46:04
 9448 chrome          /opt/google/chrome/chrome -   Jul 31  3-12:27:30
 5567 chrome          /opt/google/chrome/chrome - 10:53:13       10:09
 4990 chrome          /opt/google/chrome/chrome - 10:39:16       24:06
 2255 chrome          /opt/google/chrome/chrome - 08:05:43    02:57:39

There is other tool, called as stat which records the timestamp of a file but for slightly different purpose. Stay tune for the next blogpost!

↧

How to split Git repositories into two

November 11, 2017, 12:37 pm

≫ Next: Client Case Study: Vervante - Publishing, Production and Fulfillment Services

≪ Previous: How to check process duration in Linux with the "ps" command

Ever wondered how to split your Git repo into two repos?

First you need to find out what files and directories you want to move to separate repos. In the above example we're moving dir3, dir4 and dir7 to repo A, and dir1, dir2, dir5 and dir8 to repo B.

Steps

What you need to do is to go through*each and every commit* in git history for every branch and filter out commits that modify directories that you dont care about in your new repo. The only flaw of this method is that it will leave those empty, filtered out commits in the history.

Track all branches

First we need to start tracking all branches locally:

for i in $(git branch -r | grep -vE "HEAD|master" | sed 's/^[ ]\+//');
  do git checkout --track $i
done

Then copy your original repo to two separate dirs: repo_a and repo_b.

cp -a source_repo repo_a
cp -a source_repo repo_b

Filter the history

Following command will delete all dirs that exclusively belong to repo B, thus we create repo A. Filtering is not limited to directories. You can provide relative paths to files, dirs etc.

cd repo_a
git filter-branch --index-filter 'git rm --cached -r dir8, dir2 || true' -- --all

cd repo_b
git filter-branch --index-filter 'git rm --cached -r dir3, dir4, dir7 || true' -- --all

Note that the || true prevents git from failing to filter our dirs mentioned in the rm clause in early stages of the git history where the dirs did not yet exist.

Look at the list of branches once again (in both repos):

git branch -l

Set new origins and push

In every repo, we need to remove the old origin and set up new origin. After it's done, we're ready to push.

Remove old origin:

git remote rm origin

Add new origin:

git remote add origin git@github.com:YourOrg/repo_a.git

Push all tracked branches:

git push origin --all

That's it!

↧

Client Case Study: Vervante - Publishing, Production and Fulfillment Services

November 11, 2017, 3:35 pm

≫ Next: Custom eCommerce Development

≪ Previous: How to split Git repositories into two

A real-life scenario

The following is a real-life example of services we have provided for one of our clients.

Vervante Corporation provides a print on demand and order fulfillment service for thousands of customers, in their case, "Authors". Vervante needed a way for these authors to keep track of their products. Essentially they needed an Inventory management system. So we designed a complete system from the ground up that allows Vervante's authors many custom functions that simply are not offered in a pre-built package anywhere.

This is also a good time to mention that you should always view your web presence, in fact your business itself, as a process, not a one time "setup". Your products will change, your customers will change, the web will change, everything will change. If you want your business to be successful, you will change.

Some Specifics

While it is beyond the scope of this case study to describe all of the programs that were developed for Vervante, it will be valuable for the reader to sample just a few of the areas to understand how diverse a single business can be. Here are a few of the functions we have built from scratch, over several years to continue to provide Vervante, their authors, and even their vendors with efficient processes to achieve their daily business needs.

Requirements

Author Requirement - First, in some cases, the best approach to a problem is to use someone else's solution! Vervante's authors have large data files that are converted to a product, and then shipped on demand as the orders come in. So we initially provided a custom file transfer process so that customers could directly upload their files to a server we set up for Vervante. Soon Vervante's rapid growth outpaced the efficacy of this system, so we investigated and determined the most efficient and cost-effective approach was to incorporate a 3rd party service. So we recommended a well known file transfer service and wrote a program to communicate with the file transfer service API. Now a client can easily describe and upload large files to Vervante.

View File Save Process 1. Storage Requirement - The remote storage of these large files caused Vervante a dramatic inefficiency as relates to access times, as they worked daily on these files to format, organize, and create product masters. So we needed to provide Vervante with a local server to act as a file server that was on their local network (LAN), where it could be rapidly accessed and manipulated. This was a challenge, as Vervante did not have IT personnel on site. So we purchased an appropriate server, set up everything in our offices, and shipped the complete server to them! They plugged the server into their local network, and with a long phone call, we had the server up and running and remotely managed.

Author Requirement - On the website, the authors first wanted to see what they had in inventory. Some customers provided Vervante with some product components that needed to be included with a complete product, while others relied on Vervante to build all components of their products. They also requested a way to set minimum inventory stock requirements.

So we built an interface that would allow authors to:

(a) See their current stock levels for all products,

(b) View outstanding orders for these items,

For example a finished product may consist of a book, a CD, and a DVD. A customer may supply the CD and require Vervante to produce the book and the DVD "on demand" for the product. We created a system that tracked all items at a "base" item level, and then allowed Vervante to "build" products with as many of these "base" items as necessary, to create the final product. The base items could be combined to create an item, and two or more items could be combined to produce yet another item. It is a recursive item inventory system, built from scratch specifically for Vervante.

Vervante Vendor (fulfillment warehouse) Requirement - Additionally, the fulfillment warehouse that receives, stores, builds and ships end user products, needed access to this system. They had several needs including:
- Retrieving pending orders for shipment - Creating packing / pick slips for the orders - Create shipping labels for orders - Manage returns - Input customer supplied inventory - Input fulfillment created inventory

In our administrative user interface for the fulfillment house, we developed a series of customer specific processes to address the above needs. Here is a high level example of how a few of the items on the list above are achieved:

- The fulfillment house logs into the user admin first thing in the morning, and prints the outstanding orders. - The "orders" are formatted similar to a packing slip, and each slip has all line items of the order, and a bar code imprinted on the slip. - This document is used as a "pick" slip, and is placed in a "pick" basket. The user then goes through the warehouse, gathers the appropriate items, and when complete the order is placed on a feed belt to the shipper location. - When the basket lands in front of the shipper, that person checks the contents of the basket against the slip, and then uses a bar code scanner to scan the order. That scan triggers a query into our system that returns all applicable shipping data into an Endicia or UPS system. - A shipping label is created, and the shipping cost and tracking information is returned to the our system. - Additionally the inventory is decremented accordingly when the order receives a shipping label and tracking number.

Requirements: administrative / accounting - Vervante also needed an administrative / accounting arm, designed to control all of the accounting functions such as:
- Recording customers' fulfillment charges - Recording customers' sales (Vervante sells product for the customers as well as fulfilling outside orders) - Determining fulfillment vendor fees and payments - Tracking shipping costs - Monthly billing of all customers - Monthly payments for all customers. - Interface with in-house accounting systems and keeping systems in sync - Tracking and posting outside order transactions

The above described processes are just a few of the processes that we developed from scratch, and matched to Vervante's needs. It is also a tiny portion of their system.

Last, but not least

Oh, and one other interesting fact: When Vervante first came to us several years ago, they had fewer than 20 customers. Today, they provide order fulfillment and print on demand services for nearly 4000 customers. So when we say to plan ahead for growth, we have experience in that area.

↧

Custom eCommerce Development

November 11, 2017, 12:37 pm

≫ Next: Web Security Services Roundup

≪ Previous: Client Case Study: Vervante - Publishing, Production and Fulfillment Services

"Custom eCommerce" means different things to different people and organizations. For some eCommerce shopping cart sites that pump out literally hundreds of websites a year, it may mean you get to choose from a dizzying array of "templates" that can set your website apart from others.

For others, it may be a slightly more involved arrangement where you can "create categories" to group display of your products on your website ... after you have entered your products into a prearranged database schema.

There are many levels of "custom" out there. Generally speaking, the closer you get to true "custom", the more accurate the term "development" becomes.

It is very important for your business that you decide what fits your needs, and that you match your needs to a platform or company that can provide appropriate services. As you can imagine, this will depend entirely on your business.

Example scenarios

For example, a small one- or two-person business that does fulfillment of online orders may be well suited for a pre-built approach, where you pay a monthly fee to simply log into an admin, add your products, and some content, and the company does the rest. It handles all of the "details."

A slightly larger company that has maybe 5-10 employees, and possibly a staff member with some understanding of websites, may choose to purchase a package that requires more customization and company related input, and perhaps even design or choice of templates.

From this level up, decisions become far more important and complex. For example, even though the previously described company may be perfectly suited with the choice described, if sales are expected to increase dramatically in the near future, or if the company is in a niche market where custom accounting or regulations require specific handling of records, a more advanced approach may be warranted.

What we do

The purpose of this post is not to give you guidelines as to what sort of website service you should buy, or consultancy you should hire for your company. Rather it is to point out some of the types of things that we at End Point do for companies that need a higher level of custom eCommerce development. In fact, the development we do is not limited to eCommerce.

We offer a full range of business consultancy and IT development services. We can guide you through many areas of your business development. True, we primarily provide services to companies that sell things on the web. But we also provide support for inventory management systems in your warehouses, accounts receivable / payable integration with your websites, management of your POS (point of sale) machines, strategic pricing for seasonal products with expiry dates, and the list goes on.

Real-life scenarios

The following is a real-life example of services we have provided for one client.

Case Study: Vervante

Consultant vs Service

Hopefully, the real life scenarios will help serve as an example as to how complex business needs can be, and how using an out of the box "eCommerce" website, will not work in every circumstance. When you find a good business consultant, you will know it. A consultant will not try to make your business fit into their template, they will listen to you and then assemble and tailor products to fit your business.

Regardless of the state of maturity of your business, very seldom will a single "system" or "website" cover all of your business needs. It will more likely be a collection of systems. Which systems and how they work together is likely what will determine success or failure. The more mature your business, the broader the scope of systems required to support the growing requirements of your business.

So whether you are a sole proprietor getting started with your business, or you are a CTO tasked with organizing and optimizing the many systems in your organization, understanding what type of service or partner you need, is the first step. In the future I will spotlight a few other examples of how we have assisted businesses in growing and improving how they do business.

↧

Web Security Services Roundup

November 13, 2017, 11:11 am

≫ Next: Liquid Galaxy Goes to Chile for IMPAC4

≪ Previous: Custom eCommerce Development

Security is often a very difficult thing to get right, especially when it’s not easy to find reliable or up-to-date information or the process of testing can be confusing and complicated. We have a lot of history and experience working on the security of websites and servers, and we’ve found many tools and websites to be very helpful. Here is a collection of those.

Server-side security

There are a number of tools available that can scan your website to check for common vulnerabilities and the quality of SSL/TLS configuration, as well as give great tips on how to improve security for your website.

Qualys SSL Labs Server Test takes a simple domain name, performs a series of tests from a variety of clients, and returns a simple letter grade (from A+ down to F) indicating the quality of your SSL/TLS configuration, as well as a detailed summary for a host of configuration options. It covers certificates key and algorithms; TLS and SSL configurations; cipher suites; handshakes on a wide variety of platforms including Android, iOS, Chrome, Firefox, Internet Explorer and Edge, Safari, and others; common protocols and vulnerabilities; and other details.
HTTP Security Report does a similar scan, but provides a much more simplified summary of a website, with a numeric score from 0 to 100. It gives a simple, easy to understand list of results, with a green check mark or a red X to indicate whether something is configured for security or not. It also provides short paragraphs explaining settings and recommended configurations.
HT-Bridge SSL/TLS Server Test is very similar to Qualys SSL Labs Server Test, but provides some valuable extra information, such as PCI-DSS, HIPAA, and NIST guidelines compliance, as well as industry best practices and basic analysis of third-party content.
- securityheaders.io is another letter-grade scan, but focuses on server headers only. It provides simple explanations for each recommended server header and links to guides on how to configure them correctly.
Observatory by Mozilla scans and gives information on HTTP, TLS, and SSH configuration, as well as simple summaries from other websites, including Qualys, HT-Bridge, and securityheaders.io as covered above.
SSL-Tools is focused on SSL and TLS configuration and certificates, with tools to scan websites and mail servers, check for common vulnerabilities, and decode certificates.
Microsoft Site Scan performs a series of simple tests, focused more on general website guidelines and best practices, including tests for outdated libraries and plugins which can be a security issue.
testssl.sh, the final website scanning tool I’ll cover, is a more advanced bash script that covers many of the same things these other websites do, but provides lots of options for fine-tuning test methods, returned information, and testing abnormal configurations. It’s also open source and doesn’t rely on any third parties.

These websites provide valuable information on SSL/TLS which can be used to create a secure, fast, and functional server configuration:

Security/Server Side TLS on the Mozilla wiki is a fantastic page which provides great summaries, recommendations, and reference information on many TLS topics, including handshakes, OCSP Stapling, HSTS, HPKP, certificate ciphers, and common attacks.
Mozilla SSL Configuration Generator is a simple tool that generates boilerplate server configuration files for common servers, including Apache and Nginx, and specific server and OpenSSL versions. It also allows you to target “modern”, “intermediate”, or “old” clients and servers, which will give the best configuration possible for each level.
Is TLS Fast Yet? is a great, simple, and to-the-point informational website which explains why TLS is so important and how to improve its performance so it has the smallest impact possible on your website’s speed.

Client-side security

These websites provide information and diagnostic tools to ensure that you are using a secure browser.

badssl.com gives a list of links to subdomains with various SSL configurations, including badly configured SSL, so you can have a good idea of what a well-configured website looks like versus one with errors in configuration, weak ciphers or key exchange protocols, or insecure HTTP forms.
IPv6 Test checks your network and browser for IPv6 support, showing you your ISP, reverse DNS pointers, both your IPv4 and IPv6 addresses, and giving an idea of when your computer or network may have problems with dual-stack IPv4 + IPv6 remote hosts or DNS.
How’s My SSL? and Qualys Labs SSL Client Test both check your browser for support of SSL/TLS versions, protocols, ciphers, and features, as well as susceptibility to common vulnerabilities.

General Tools

NeverSSL is a simple website that promises to never use SSL. Many public wifi networks require you to go through a payment or login page, which can be blocked when trying to access a well-secured website such as Google, Facebook, Twitter, or Amazon, which can cause trouble connecting to that website. NeverSSL provides an easy and simple way to access that login website.
crt.sh is a search engine for public TLS certificate information. It provides a history of certificates for a given domain name, with information including issuer and issue date, as well as an advanced search.
Digital Attack Map is an interactive map showing DDoS attacks across the world.
The Internet-Wide Scan Data Repository is a public archive of scans across the internet, intended for research and provided by the University of Michigan Censys Team.
take-a-screenshot.org is a simple website that shows how to take a screenshot on a variety of operating systems and desktop environments. It’s a fantastic tool to help less technically-minded people share their screens or issues they’re having.

↧

Liquid Galaxy Goes to Chile for IMPAC4

November 11, 2017, 3:35 pm

≫ Next: Working on production systems

≪ Previous: Web Security Services Roundup

Marine Protected Areas: Bringing the people and ocean together

Earlier this month, The Marine Conservation Institute and the Waitt Foundation brought a Liquid Galaxy to Chile to showcase interactive and panoramic ocean content at the Fourth International Marine Protected Areas Congress in La Serena. This conference was a convergence of marine biologists, ocean agencies, and environmentalists from around the globe to discuss the state of and future development initiatives of Marine Protected Areas worldwide.

The Marine Conservation Institute is working on a mapping project called MPAtlas that visually catalogs the development of Marine Protected Areas across the globe as well as the Global Ocean Refuge System which pushes for elevated standards for Marine Protected Areas and advocates for a 30% protected marine ecosystem by 2030.

mpatlas.org

We built new content to showcase the GLORES areas in Google Earth as well as data visualizations of the global system of Marine Protected Areas. In addition, we collected any past oceanographic related content we’ve developed for the Liquid Galaxy platform. This included underwater panoramic media from the Catlin Seaview Survey, Mission Blue Hope Spots, Google Street View collections of underwater photos such as this project which catalogs the Great Barrier Reef.

Min @marcelomena prueba Liquid Galaxy. Muestra imágenes en directo del fondo marino y permite fiscalizar áreas marinas protegidas #IMPAC4 pic.twitter.com/PxJ4c6Nxxd
— Min. Medio Ambiente (@MMAChile) September 5, 2017

Minister of the Environment Marcelo Mena Carrasco gives the
Liquid Galaxy a go

Aside from showcasing this curated content, the Liquid Galaxy served as a comprehensive interactive tool for scientists and environmentalists to showcase their research areas during the break periods between lectures and events occurring at the Congress. Since the Liquid Galaxy utilizes the entire globe, conference attendees were able to free fly to their respective research and/or proposed protected areas as an impromptu presentation aid and further explore underwater terrain and panoramic media in their location.

Liquid Galaxy at IMPAC4 - La Serena, Chile 2017

The Liquid Galaxy platform is featured in museums and aquariums around the world, and we are thrilled that it is being used as a tool to study and conserve oceans and nature. We recently had the opportunity to participate in The Ocean Conference at the United Nations, and are excited the system continues to be utilized to study and prevent future detrimental changes to our planet. We hope to have more opportunities to create content geared toward nature conservation, as well as opportunities to share the Liquid Galaxy with environmentalists so that the system will continue be used as a tool for visualizing research data in new and interesting ways.

↧